Dr. Mark Gardener |
Using R for statistical analyses - ANOVA

This page is intended to help you get to grips with the powerful statistical program called R. It is not intended as a course in statistics. If you have an analysis to perform I hope that you will be able to find the commands you need here and copy/paste them into R to get going. On this page you can learn how to conduct analysis of variance, including one-way ANOVA, post-hoc testing and more complex ANOVA models.
R is Open Source R is Free |
What is R?

R is an open-source (GPL) statistical environment modeled after S and S-Plus. The S language was developed in the late 1980s at AT&T labs. The R project was started by Robert Gentleman and Ross Ihaka (hence the name, R) of the Statistics Department of the University of Auckland in 1995. It has quickly gained a widespread audience. It is currently maintained by the R core-development team, a hard-working, international team of volunteer developers. The R project web page is the main site for information on R. At this site are directions for obtaining the software, accompanying packages and other sources of documentation.

R is a powerful statistical program but it is first and foremost a programming language. Many routines have been written for R by people all over the world and made freely available from the R project website as "packages". However, the basic installation (for Linux, Windows or Mac) contains a powerful set of tools for most purposes.

Because R is a programming language it can seem a bit daunting; you have to type in commands to get it to work. However, it does have a Graphical User Interface (GUI) to make things easier. You can also copy and paste text from other applications into it (e.g. word processors). So, if you have a library of these commands it is easy to pop in the ones you need for the task at hand. That is the purpose of this web page: to provide a library of basic commands that the user can copy and paste into R to perform a variety of statistical analyses.
ANOVA - analysis of variance

The analysis of variance is a commonly used method to determine differences between several samples. R provides a function to conduct ANOVA:

aov(model, data)

The first stage is to arrange your data in a .CSV file. Use a column for each variable and give it a meaningful name. Don't forget that variable names in R can contain letters and numbers but the only punctuation allowed is a period or an underscore. Set out your data file so that each column represents one variable in your analysis. Usually the 1st column will be your dependent variable (i.e. what you are actually measuring) and subsequent columns will be the independent factors (e.g. site, treatment).

The second stage is to read your data file into memory and give it a sensible name.

The next stage is to attach your data set so that the individual variables can be referred to directly by name.

Finally we need to define the model and run the analysis.
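The stages above can be sketched as follows. This is a minimal illustration rather than a definitive recipe: the CSV file is a temporary one written by the script itself, and the names `fly.data`, `fly.aov`, `growth` and `sugar`, the diet labels and all of the numbers are invented for the example. It passes the data frame to `aov()` through the `data` argument instead of calling `attach()`, which leaves the workspace a little tidier.

```r
# Stage 1: a small, made-up data set saved as a CSV file.
# (In practice you would prepare this file in a spreadsheet.)
csv.file <- tempfile(fileext = ".csv")
example <- data.frame(
  growth = c(75, 72, 73, 61, 67, 64, 62, 63, 58, 57, 61, 59),
  sugar  = rep(c("control", "fructose", "glucose", "sucrose"), each = 3)
)
write.csv(example, csv.file, row.names = FALSE)

# Stage 2: read the file into memory under a sensible name.
fly.data <- read.csv(csv.file)

# The grouping column should be a factor for an ANOVA.
fly.data$sugar <- factor(fly.data$sugar)
str(fly.data)

# Stages 3 and 4: define the model and run the analysis.
# Supplying data = fly.data makes attach() unnecessary.
fly.aov <- aov(growth ~ sugar, data = fly.data)
summary(fly.aov)
```

With your own data you would replace `csv.file` with the path to your file, or use `file.choose()` to pick it interactively.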
R uses a powerful model syntax, e.g. y ~ x1 * x2, that allows you to specify complex analyses.
ANOVA One-way

Analysis of variance and regression have much in common. Both examine a dependent variable and determine how much of its variability can be attributed to various factors. The simplest ANOVA is where we have a single dependent variable and a single factor. For example, we may have raised broods of flies on various sugars. We measure the size of the individual flies and record the diet for each. Our data file would consist of two columns: one for growth and one for sugar, e.g.
... and so on. In this case we have a column for the dependent variable (growth) and a column for the independent factor (sugar). The first column contains numeric data but the second contains letters. We could assign a number to each diet but it is more meaningful to assign a character string. If you do use numeric codes, convert the column with factor() first; otherwise R treats it as a continuous variable rather than a set of groups. Either way, the results are easier to interpret if you use meaningful names. Keep the names simple, though: letters, numbers and periods are always safe.

The next step is to run the analysis. It is always a good idea to assign the result of the analysis to a variable, like so:

> your.aov = aov(growth ~ sugar)

Notice the funny symbol (a tilde) in the model. It means: take growth as the dependent variable, which depends on sugar. We will look at more complex models later but the form is similar to that used in multiple regression. To see the result of the analysis type in the name of the variable you gave it, e.g.

> your.aov
Call:
Terms:
Residual standard error: 2.279184

The basic result does not give a great deal of information. We need to view the summary, so try:

> summary(your.aov)
---
This is rather more useful as we can now see the F value and the level of significance (the p-value).
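To try the commands in this section without a data file, the sketch below simulates a data set of the same shape. The values are randomly generated and purely illustrative (they are not the fly data behind the output shown above), and the diet labels and the names `fly.data` and `fly.aov` are assumptions made for the example.

```r
set.seed(1)  # make the simulated values reproducible

# Hypothetical diets: 6 treatments, 10 flies each.
diets <- c("control", "fructose", "glucose", "lactose", "maltose", "sucrose")
fly.data <- data.frame(
  sugar = factor(rep(diets, each = 10))
)

# Simulated growth: a different mean per diet plus random noise.
diet.means <- c(70, 58, 59, 64, 62, 66)
fly.data$growth <- rnorm(60, mean = rep(diet.means, each = 10), sd = 2.5)

# Fit the one-way model: growth depends on sugar.
fly.aov <- aov(growth ~ sugar, data = fly.data)

fly.aov            # brief result: sums of squares and degrees of freedom
summary(fly.aov)   # full ANOVA table with the F value and p-value

# The p-value can also be extracted from the summary table.
p.value <- summary(fly.aov)[[1]][["Pr(>F)"]][1]
p.value < 0.05
```

Extracting the p-value this way is handy when the result feeds into a later step, such as deciding whether a post-hoc test is worth running.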
Tukey HSD is the most commonly used post-hoc test. |
Post-hoc testing

So far we have conducted a simple one-way ANOVA. In this instance we see that there is a significant effect of diet upon growth. However, there are 6 treatments. We would like to know which of these treatments are significantly different from the controls and from the other treatments. We need a post-hoc test. R provides a simple function to carry out the Tukey HSD test:

> TukeyHSD(your.aov)

This will show all the paired comparisons like so (here the result of the analysis was named fly.aov):

> TukeyHSD(fly.aov)
Fit: aov(formula = growth ~ sugar)
$sugar
The output table shows us the difference between each pair of means, the 95% confidence interval of that difference and the adjusted p-value of each pairwise comparison. All we need to know!
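A self-contained sketch of the post-hoc step is given below. The data are simulated and the treatment names are hypothetical, so the numbers will not match the output above; the aim is only to show how the `TukeyHSD()` result is laid out and how the significant pairs might be picked out.

```r
set.seed(2)  # make the simulated values reproducible

# Simulated data: 3 hypothetical treatments, 8 observations each.
treatment <- factor(rep(c("control", "low", "high"), each = 8))
response  <- rnorm(24, mean = rep(c(50, 52, 60), each = 8), sd = 3)

my.aov  <- aov(response ~ treatment)
posthoc <- TukeyHSD(my.aov)
posthoc

# The comparisons are stored as a matrix with the columns
# diff, lwr, upr and "p adj" (one row per pair of treatments).
pairs.table <- posthoc$treatment
colnames(pairs.table)

# Keep only the pairs that differ at the 5% level.
pairs.table[pairs.table[, "p adj"] < 0.05, , drop = FALSE]
```

The element of the result is named after the factor in the model, which is why the fly example above shows `$sugar` and this sketch uses `$treatment`.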
Use the model syntax to specify complex analyses in R |
ANOVA models

So far we have only considered a simple one-way analysis. However, you will often have a more complex situation with several factors. The interaction between factors may also be important. Fortunately R has a model syntax that works for many sorts of analysis. Look at the section on Linear Regression Models for examples. When conducting an ANOVA we have a single dependent variable and a number of explanatory factors. We set up our ANOVA in a general way:

dependent ~ explanatory1... explanatory2...

The model can take a variety of forms:
In reality we would give the variables more meaningful names. However, we can see that it is pretty simple to alter our basic model to cope with more complex analyses.
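As an illustration of how the model formula changes, the sketch below fits three common forms to simulated data. The variable names (`y`, `x1`, `x2`) follow the placeholder style used on this page, the factor levels are invented, and the data are randomly generated, so only the structure of the output is meaningful.

```r
set.seed(3)  # make the simulated values reproducible

# Simulated balanced design: 2 factors, 5 replicates per combination.
dat <- expand.grid(rep = 1:5,
                   x1 = c("a", "b", "c"),
                   x2 = c("low", "high"))
dat$x1 <- factor(dat$x1)
dat$x2 <- factor(dat$x2)
dat$y  <- rnorm(nrow(dat), mean = 20, sd = 2)

# One factor only.
m1 <- aov(y ~ x1, data = dat)

# Two factors, main effects only.
m2 <- aov(y ~ x1 + x2, data = dat)

# Two factors plus their interaction;
# y ~ x1 * x2 is shorthand for y ~ x1 + x2 + x1:x2.
m3 <- aov(y ~ x1 * x2, data = dat)

summary(m3)

# The terms each model contains.
attr(terms(m1), "term.labels")
attr(terms(m2), "term.labels")
attr(terms(m3), "term.labels")
```

Listing the term labels is a quick way to check that a formula expands to the terms you intended before interpreting the ANOVA table.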
ANOVA Step by Step