Dr. Mark Gardener
Using R for statistical analyses - Graphs 2

This page is intended to help you get to grips with the powerful statistical program called R. It is not intended as a course in statistics. If you have an analysis to perform, I hope that you will be able to find the commands you need here and copy and paste them into R to get going. This page covers producing a range of graphs to illustrate your analyses: specifically scatter plots, stem-and-leaf plots and pie charts. To find out about bar charts, histograms and box-whisker plots, go to the graphs1 page.
What is R?

R is an open-source (GPL) statistical environment modeled after S and S-Plus. The S language was developed at AT&T's Bell Laboratories from the mid-1970s onwards. The R project was started by Robert Gentleman and Ross Ihaka (hence the name, R) of the Statistics Department of the University of Auckland in the early 1990s, and R was released as free software under the GPL in 1995. It quickly gained a widespread audience. It is currently maintained by the R Core Team, a hard-working, international team of volunteer developers. The R project web page is the main site for information on R. At this site are directions for obtaining the software, accompanying packages and other sources of documentation.

R is a powerful statistical program but it is first and foremost a programming language. Many routines have been written for R by people all over the world and made freely available from the R project website as "packages". However, the basic installation (for Linux, Windows or Mac) contains a powerful set of tools for most purposes.

Because R is a programming language it can seem a bit daunting; you have to type in commands to get it to work. However, it does have a Graphical User Interface (GUI) to make things easier, and you can copy and paste text from other applications (e.g. word processors) into it. So, if you have a library of these commands, it is easy to pop in the ones you need for the task at hand. That is the purpose of this web page: to provide a library of basic commands that you can copy and paste into R to perform a variety of statistical analyses.
Introduction to Graphing

R has great graphical power but it is not a point-and-click interface. This means that you must use typed commands to get it to produce the graphs you want. This can be a bit tedious at first, but once you have the hang of it you can save a list of useful commands as text and copy and paste them into the R command line.
Scatter Plots

A scatter plot is used when you have two variables to plot against one another. R has a basic command to perform this task: plot(). As usual with R, there are many additional parameters that you can add to customise your plots. The basic command is:

plot(x, y)

where x is the name of your x-variable and y is the name of your y-variable. This is fine if you have two separate variables, but if they are part of a bigger data set you have to remember to attach(data.file) your data set first. A more powerful command is:

plot(y ~ x, data = your.data)

Note the use of the model (formula) syntax, y ~ x, read as "y against x". This model syntax is used widely in R, for example for setting up ANOVA and regression analyses (see also its use in the box-whisker plot). R comes with a number of built-in data sets; these are used in the examples and can be useful to 'play with'. For example, the data set cars contains two variables, speed and dist. To see a basic scatter plot try the following:

plot(dist ~ speed, data = cars)
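The three ways of pointing plot() at the cars variables described here can be compared side by side. This is a minimal sketch using only base R. The with() call at the end is a swapped-in suggestion, not part of the original text: it evaluates the plot inside the data frame, so nothing is left on the search path the way attach() leaves it until you call detach().

```r
# The built-in 'cars' data frame has two numeric columns: speed and dist.
str(cars)

# 1. Pass the two vectors directly: x first, then y.
plot(cars$speed, cars$dist)

# 2. Attach the data set so the column names can be used on their own,
#    then detach it again when finished.
attach(cars)
plot(speed, dist)
detach(cars)

# 3. Use the model (formula) syntax, y ~ x, and name the data set.
plot(dist ~ speed, data = cars)

# Alternative to attach(): evaluate the call inside the data frame.
with(cars, plot(speed, dist))
```

All four calls draw the same points; the formula version also labels the axes with the variable names.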
This basic scatter plot takes the axis labels from the variable names and uses open circles as the plotting symbol. As usual with R, we have a wealth of additional commands at our disposal to beef up the display. A useful addition is a line of best fit. The command for this adds to the current plot (like the title() command). For the above example we'd type:

abline(lm(dist ~ speed, data = cars))

The basic command is abline(a, b), where a = intercept and b = slope. Here we use a linear model command, lm(), to calculate the best-fit equation for us (try typing the lm() command on its own: it returns the intercept and slope). If we combine this with a couple of extra lines we can produce a better-looking plot. The plot() command for this is:

plot(dist ~ speed, data = cars, xlab = "Speed", ylab = "Distance", col = "blue")
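The better-looking plot combines three commands (plot, title and best-fit line), but only the plot() call is given here. A sketch of the whole sequence follows. The title text is a hypothetical placeholder; font.main = 4 is the standard par() value for bold italic (1 plain, 2 bold, 3 italic). The sketch also pulls the intercept and slope out of lm() to show that they are the a and b values abline() expects.

```r
# Fit a straight line (linear model) of stopping distance on speed.
fit <- lm(dist ~ speed, data = cars)
intercept <- coef(fit)[["(Intercept)"]]
slope     <- coef(fit)[["speed"]]

# Scatter plot with axis labels and blue plotting symbols.
plot(dist ~ speed, data = cars, xlab = "Speed", ylab = "Distance", col = "blue")

# Main title in bold italic (font.main = 4). Placeholder title text.
title(main = "Stopping distance and speed", font.main = 4)

# Best-fit line in red: a is the intercept, b is the slope.
# abline(fit, col = "red") draws exactly the same line.
abline(a = intercept, b = slope, col = "red")
```

For the cars data the fitted line is roughly dist = -17.58 + 3.93 × speed.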
This illustrates several of the additional parameters. We have set the axis labels and the colour of the plotting symbols. Next we added a main title and set its font to bold italic (try other values). Finally we added the best-fit line and made it red.

We can alter the plotting symbol using the parameter pch = n, where n is a number code for the symbol (0 to 25 give the standard symbols). We can also alter the range of the x and y axes using xlim = c(lower, upper) and ylim = c(lower, upper). The size of the plotted points is controlled with cex = n, where n is the magnification factor. Here is a command that illustrates these parameters:

plot(dist ~ speed, data = cars, pch = 19, xlim = c(0, 25), ylim = c(-20, 120), cex = 2)
Here the plotting symbol is set to 19 (a solid circle) and enlarged by a factor of 2. Both the x and y axes have been rescaled. The axis labels were not specified, so they default to the variable names taken from the data set.
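Before fixing xlim and ylim by hand it helps to check the range of each variable; the limits used here, c(0, 25) and c(-20, 120), comfortably cover the cars data. The last two commands are a small addition, not from the original page: they draw one point per plotting symbol (pch.codes is just an illustrative name) so you can choose a pch value by eye.

```r
# Check the data ranges before choosing xlim and ylim.
range(cars$speed)   # smallest and largest speed
range(cars$dist)    # smallest and largest stopping distance

# Solid circles (pch = 19) at twice normal size, with rescaled axes.
plot(dist ~ speed, data = cars, pch = 19, cex = 2,
     xlim = c(0, 25), ylim = c(-20, 120))

# Reference chart: point i is drawn with plotting symbol i (0 to 25).
pch.codes <- 0:25
plot(pch.codes, pch = pch.codes, cex = 2, xlab = "", ylab = "pch value")
```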
Stem and leaf plots

A very basic yet useful plot is a stem-and-leaf plot. It is a quick way to represent the distribution of a single sample. The basic command is:

stem(variable)

Here is a vector of numbers saved as the variable test.data:

[1] 2.1 2.6 2.7 3.2 4.1 4.3 5.2 5.1 4.8 1.8 1.4 2.5 2.7 3.1 2.6 2.8

To see the stem plot of these data we type:

stem(test.data)

R prints the plot in the console, beginning with the line "The decimal point is at the |". This means each stem (left of the bar) is the whole-number part and each leaf (right of the bar) is the first decimal digit.
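A sketch that re-creates the sample and the command follows; the commented lines are what stem() should print for these 16 values, read as stem | leaves (so "2 | 1566778" stands for 2.1, 2.5, 2.6, 2.6, 2.7, 2.7, 2.8). The final call is an optional extra: the scale argument of stem() stretches or compresses the display.

```r
# The sample from the text, typed in as a numeric vector.
test.data <- c(2.1, 2.6, 2.7, 3.2, 4.1, 4.3, 5.2, 5.1,
               4.8, 1.8, 1.4, 2.5, 2.7, 3.1, 2.6, 2.8)

# Stem-and-leaf display, printed in the console:
stem(test.data)
#   The decimal point is at the |
#
#   1 | 48
#   2 | 1566778
#   3 | 12
#   4 | 138
#   5 | 12

# Larger scale values spread the stems over more lines.
stem(test.data, scale = 2)
```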
The plot shows that the values are bunched in the 2s with a longer tail towards the higher values, so the data do not appear to be normally distributed. This is a useful command for moderately small samples, because you can easily reconstruct the original data from the plot. For larger samples the barplot() function may be used to create a frequency plot. Alternatively a histogram may be more useful.
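For larger samples, where a stem plot gets unwieldy, the counts can be tabulated and drawn as bars, or the raw values passed to hist(). A minimal sketch using the same test.data; grouping by floor() into whole-number bands is an illustrative choice, made so that each bar corresponds to one stem of the stem-and-leaf plot.

```r
test.data <- c(2.1, 2.6, 2.7, 3.2, 4.1, 4.3, 5.2, 5.1,
               4.8, 1.8, 1.4, 2.5, 2.7, 3.1, 2.6, 2.8)

# Count how many values fall in each whole-number band (1.x, 2.x, ...).
freq <- table(floor(test.data))
freq

# Frequency plot: one bar per band.
barplot(freq, xlab = "Value (whole-number band)", ylab = "Frequency")

# Histogram of the raw values, with breaks at whole numbers;
# right = FALSE makes each bin include its lower edge, e.g. [2, 3).
hist(test.data, breaks = 1:6, right = FALSE, main = "", xlab = "Value")
```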
Pie charts

Pie charts are not necessarily the most useful way of displaying data but they remain popular. We can produce pie charts easily in R using the basic command pie(). To start with, get your data organised into a .CSV file. Make a file with multiple columns, where each column has a title and a single value (to plot). Here is a simple example file:
To produce a simple pie chart we type the following:

pie(pie.data)
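pie() expects a numeric vector, so the CSV data need to be read in and converted before pie(pie.data) will work. The sketch below is one way to do that, assuming a one-row CSV laid out as described (a header row of slice names, then one row of values). The column names and values are hypothetical, and they are written to a temporary file only so the example runs on its own; with your own data you would give read.csv() your file path, or use read.csv(file.choose()) to pick the file interactively.

```r
# Hypothetical file contents: a header of slice names, then one row of values.
csv.file <- tempfile(fileext = ".csv")
writeLines(c("Apples,Bananas,Cherries,Dates,Elderberries,Figs",
             "25,15,20,10,18,12"), csv.file)

# read.csv() returns a one-row data frame with one column per slice.
pie.df <- read.csv(csv.file)

# pie() needs a numeric vector; unlist() keeps the column names as element
# names, and pie() uses those names as the slice labels.
pie.data <- unlist(pie.df)
pie(pie.data)
```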
This is a basic chart; we can see that the names of the columns have been used to label the slices. We can add a title in the usual way using the title() command. By default the slices are drawn in anti-clockwise order; we can alter this by adding clockwise = TRUE. The colours are set to pastel shades by default; to alter them, add a vector of colours to the command in the form col = c("col1", "col2", "col3"). Here is the finished article:

pie(pie.data, clockwise = TRUE, col = c("red", "orange", "yellow", "green", "blue", "purple"))
Now we have clockwise slices with our own selection of colours. The title was set with a separate command and its font set to bold italic (try other values).
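The separate title command for the finished chart is not shown. A sketch of the complete chart follows, assuming hypothetical slice names and values and a placeholder title; as with the scatter plot, font.main = 4 is the par() value for bold italic. With clockwise = TRUE, pie() starts the first slice at 12 o'clock rather than 3 o'clock.

```r
# Hypothetical values standing in for the example file.
pie.data <- c(Apples = 25, Bananas = 15, Cherries = 20,
              Dates = 10, Elderberries = 18, Figs = 12)

# One colour per slice, in the same order as the data.
slice.cols <- c("red", "orange", "yellow", "green", "blue", "purple")

# Draw the slices clockwise with the chosen colours.
pie(pie.data, clockwise = TRUE, col = slice.cols)

# Add the title separately in bold italic (placeholder text).
title(main = "Fruit sales", font.main = 4)
```

If there are more slices than colours, pie() recycles the colour vector, so give one colour per slice to avoid repeats.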