Dr. Mark Gardener
Using R for statistical analyses - Graphs 2
This page is intended to be a help in getting to grips with the powerful statistical program called R. It is not intended as a course in statistics (see here for details about those). If you have an analysis to perform I hope that you will be able to find the commands you need here and copy/paste them into R to get going.
I run training courses in data management, visualisation and analysis using Excel and R: The Statistical Programming Environment. From 2013 courses will be held at The Field Studies Council Field Centre at Slapton Ley in Devon. Alternatively I can come to you and provide the training at your workplace. See details on my Courses Page.
On this page you can find out about producing a range of graphs to illustrate your analyses. Specifically, this page covers scatter plots, stem-leaf plots and pie charts. To find out about bar charts, histograms and box-whisker plots go to the graphs1 page.
R is Open Source
R is Free
R is an open-source (GPL) statistical environment modeled after S and S-Plus. The S language was developed at AT&T Bell Laboratories, starting in the mid-1970s. The R project was started by Robert Gentleman and Ross Ihaka (hence the name, R) of the Statistics Department of the University of Auckland in the early 1990s, and R has been available under the GPL since 1995. It has quickly gained a widespread audience. It is currently maintained by the R core-development team, a hard-working, international team of volunteer developers. The R project web page is the main site for information on R. At this site are directions for obtaining the software, accompanying packages and other sources of documentation.
R is a powerful statistical program but it is first and foremost a programming language. Many routines have been written for R by people all over the world and made freely available from the R project website as "packages". However, the basic installation (for Linux, Windows or Mac) contains a powerful set of tools for most purposes.
Because R is a programming language it can seem a bit daunting; you have to type in commands to get it to work. However, it does have a Graphical User Interface (GUI) to make things easier. You can also copy and paste text from other applications into it (e.g. word processors). So, if you have a library of these commands it is easy to pop in the ones you need for the task at hand. That is the purpose of this web page; to provide a library of basic commands that the user can copy and paste into R to perform a variety of statistical analyses.
R has great graphical power but it is not a point and click interface. This means that you must use typed commands to get it to produce the graphs you desire. This can be a bit tedious at first but once you have the hang of it you can save a list of useful commands as text that you can copy and paste into the R command line.
A scatter plot is used when you have two variables to plot against one another. R has a basic command to perform this task. The command is plot(). As usual with R there are many additional parameters that you can add to customise your plots.
The basic command is:
plot(x, y)
Where x is the name of your x-variable and y is the name of your y-variable. This is fine if you have two separate variables, but if they are part of a bigger data set then you have to remember to attach(data.file) your data set first. A more powerful command is:
plot(y ~ x, data= your.data)
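As a minimal sketch of the difference between the two forms, assume a small made-up data frame called your.data with columns x and y (the values are invented purely for illustration):

```r
# Hypothetical data frame; the values are invented for illustration only
your.data <- data.frame(x = c(1, 2, 3, 4, 5),
                        y = c(2.0, 4.1, 5.9, 8.2, 9.8))

# Without attach(): refer to the columns explicitly with $
plot(your.data$x, your.data$y)

# Formula form: x and y are looked up inside your.data
plot(y ~ x, data = your.data)
```

Both calls should draw the same points; the formula form also takes its axis labels directly from the column names.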
R comes with a number of data sets built-in; these are used in the examples and can be useful to 'play with'. For example the data set cars contains two variables speed and dist.
To see a basic scatter plot try the following:
plot(dist ~ speed, data= cars)
This basic scatter plot takes the axis labels from the variables and uses open circles as the plotting symbol. As usual with R we have a wealth of additional commands at our disposal to beef up the display. A useful additional command is to add a line of best-fit. This is a command that adds to the current plot (like the title() command). For the above example we'd type:
abline(lm(dist ~ speed, data= cars))
The basic command uses abline(a, b), where a= intercept and b= slope. Here we use a linear model command to calculate the best-fit equation for us (try typing the lm() command separately; you get the intercept and slope).
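To see which coefficient is which, here is a small sketch that pulls the two values out of the model and passes them to abline() by name. The object names fit, a and b are chosen only for illustration:

```r
# Fit the linear model for the built-in cars data
fit <- lm(dist ~ speed, data = cars)

# coef() returns the intercept first, then the slope
a <- coef(fit)[["(Intercept)"]]   # intercept
b <- coef(fit)[["speed"]]         # slope

plot(dist ~ speed, data = cars)
abline(a = a, b = b)              # draws the same line as abline(fit)
```

Naming the arguments (a = , b = ) avoids any confusion over their order.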
If we combine this with a couple of extra lines we can produce a better looking plot:
plot(dist ~ speed, data= cars, xlab="Speed",
ylab="Distance", col= "blue")
This illustrates several of the additional commands. We have set the axis labels and the colour of the plotting symbols. Next we added a main title and set the font to bold italic (try other values). Finally we set the best-fit line and made it red.
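A sketch of the complete sequence described above might look like the following. The title text is a placeholder invented for illustration; font.main= 4 is the setting for bold italic:

```r
# Scatter plot with custom axis labels and blue plotting symbols
plot(dist ~ speed, data = cars, xlab = "Speed",
     ylab = "Distance", col = "blue")

# Main title in bold italic (font.main = 4); the title text is a placeholder
title(main = "Speed and stopping distance", font.main = 4)

# Best-fit line from a linear model, drawn in red
abline(lm(dist ~ speed, data = cars), col = "red")
```

Trying font.main values of 1, 2 or 3 should give plain, bold and italic titles respectively.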
We can alter the plotting symbol using the command pch= n, where n is a simple number. We can also alter the range of the x and y axes using xlim= c(lower, upper) and ylim= c(lower, upper). The size of the plotted points is manipulated using the cex= n command, where n = the 'magnification' factor. Here are some commands that illustrate these parameters:
plot(dist ~ speed, data= cars, pch= 19, xlim= c(0,25), ylim= c(-20, 120), cex= 2)
Here the plotting symbol is set to 19 (a solid circle) and expanded by a factor of 2. Both x and y axes have been rescaled. No axis labels were given, so they default to the names of the variables (which are taken from the data set).
A very basic yet useful plot is a stem and leaf plot. It is a quick way to represent the distribution of a single sample. The basic command is stem(), which takes a vector of numbers.
Here is a vector of numbers saved as the variable test.data:
 2.1 2.6 2.7 3.2 4.1 4.3 5.2 5.1 4.8 1.8 1.4 2.5 2.7 3.1 2.6 2.8
To see the stem plot of these data we type:
stem(test.data)

  The decimal point is at the |

  1 | 48
  2 | 1566778
  3 | 12
  4 | 138
  5 | 12
We can now see quite clearly that the data are not normally distributed. This is a useful command for moderately small samples as you can easily re-construct the original data from the plot. For larger samples the barplot() function may be used to create a frequency plot. Alternatively a histogram may be more useful.
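The alternatives mentioned above can be sketched as follows, using the same sample. Grouping the values by their whole-number part with floor() is just one illustrative way of building the frequencies, and the name stem.counts is invented here:

```r
# The sample from the text
test.data <- c(2.1, 2.6, 2.7, 3.2, 4.1, 4.3, 5.2, 5.1,
               4.8, 1.8, 1.4, 2.5, 2.7, 3.1, 2.6, 2.8)

# Stem and leaf plot, printed to the console
stem(test.data)

# Frequency of values falling under each whole-number stem
stem.counts <- table(floor(test.data))

# Bar plot of those frequencies
barplot(stem.counts, xlab = "Stem", ylab = "Frequency")

# Histogram of the same data (bins chosen by hist() itself)
hist(test.data)
```

Note that barplot() needs the frequencies worked out beforehand, whereas hist() does the binning itself.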
Pie charts are not necessarily the most useful way of displaying data but they remain popular. We can produce pie charts easily in R using the basic command pie().
To start with, get your data organised into a .CSV file. Make a file with multiple columns, where each column has a title and a single value (to plot). Here is a simple example file:
To produce a simple pie chart we type the following:
This is a basic chart; we can see that the names of the columns have been appended to each slice. We can add a title in the usual way using the title() command.
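As a hedged sketch of the whole process, the following writes a small made-up CSV file to a temporary location, reads it back and plots it. The column names, values, title text and object names (csv.file, pie.data, slices) are all invented for illustration:

```r
# Hypothetical example file: six made-up columns, one value each
csv.file <- tempfile(fileext = ".csv")
writeLines(c("Site1,Site2,Site3,Site4,Site5,Site6",
             "10,20,15,30,5,20"), csv.file)

# Read the file; the first row supplies the column names
pie.data <- read.csv(csv.file)

# pie() needs a numeric vector, so flatten the single row;
# the column names are kept and become the slice labels
slices <- unlist(pie.data[1, ])

pie(slices)
title(main = "Example pie chart")   # placeholder title
```

With your own data you would replace the temporary file with the path to your .CSV file in read.csv().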
By default the slices are presented in anti-clockwise order. We can alter this by adding a simple command: clockwise= TRUE
The colours are set to pastel shades by default. To alter them you can add a list of colours to the command line in the form col= c("col1", "col2", "col3"). Here is the finished article:
clockwise=TRUE, col= c("red", "orange", "yellow", "green", "blue", "purple"))
Now we have clockwise slices with our own selection of colours. The title was set with a separate command and the font set to bold italic (try other values).
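Put together as a self-contained sketch, the finished chart might be produced like this. The slice names, values and title text are invented for illustration, as are the object names slices and slice.cols:

```r
# Hypothetical values: six invented categories, purely for illustration
slices <- c(Site1 = 10, Site2 = 20, Site3 = 15,
            Site4 = 30, Site5 = 5, Site6 = 20)

# One colour per slice, drawn clockwise
slice.cols <- c("red", "orange", "yellow", "green", "blue", "purple")
pie(slices, clockwise = TRUE, col = slice.cols)

# Title added separately; font.main = 4 gives bold italic
title(main = "Example pie chart", font.main = 4)
```

If there are fewer colours than slices, pie() recycles the colours, so it is safest to supply one per slice.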