Dr. Mark Gardener |
Using R for statistical analyses - Simple correlation

This page is intended to help you get to grips with the powerful statistical program called R. It is not intended as a course in statistics. If you have an analysis to perform, I hope that you will be able to find the commands you need here and copy and paste them into R to get going.

On this page, learn how to conduct simple correlations to find correlation coefficients and to test their significance (e.g. Spearman Rank and Pearson). There is also an introduction to graphing (see the main graph page for more detail).
R is Open Source R is Free |
What is R?

R is an open-source (GPL) statistical environment modeled after S and S-Plus. The S language was developed at AT&T's Bell Laboratories, beginning in the mid-1970s. The R project was started by Robert Gentleman and Ross Ihaka (hence the name, R) of the Statistics Department of the University of Auckland in 1995. It has quickly gained a widespread audience. It is currently maintained by the R core development team, a hard-working, international team of volunteer developers. The R project web page is the main site for information on R. At this site are directions for obtaining the software, accompanying packages and other sources of documentation.

R is a powerful statistical program but it is first and foremost a programming language. Many routines have been written for R by people all over the world and made freely available from the R project website as "packages". However, the basic installation (for Linux, Windows or Mac) contains a powerful set of tools for most purposes.

Because R is a programming language it can seem a bit daunting; you have to type in commands to get it to work. However, it does have a Graphical User Interface (GUI) to make things easier. You can also copy and paste text from other applications into it (e.g. word processors). So, if you have a library of these commands, it is easy to pop in the ones you need for the task at hand. That is the purpose of this web page: to provide a library of basic commands that the user can copy and paste into R to perform a variety of statistical analyses.
You can get Spearman, Kendall or Pearson correlation coefficients. You can also obtain a matrix of pairwise comparisons in a data set. |
Correlation

R can perform correlation with the cor() function. Built into the base distribution of the program are three routines: Pearson, Kendall and Spearman Rank correlations.

The first stage is to arrange your data in a .CSV file. Use a column for each variable and give it a meaningful name. Remember that variable names in R can contain letters and numbers, but the only punctuation allowed is a period or an underscore.

The second stage is to read your data file into memory and give it a sensible name.

The next stage is to attach your data set so that the individual variables can be referred to by name.

To get the correlation coefficient you type:

> cor(var1, var2, method = "method")

The default method is "pearson" so you may omit this if that is what you want. If you type "kendall" or "spearman" then you will get the appropriate correlation coefficient.
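The stages above can be sketched as follows. This is only an illustration under stated assumptions: the data values, the column names (height, light, soil.ph), the object name plants and the temporary file are all made up so that the example runs on its own. It also uses with() in place of attach(), which gives the same access to the columns without changing the search path. The last step shows the matrix of pairwise coefficients you get when cor() is given a whole data set.

```r
# Stage 1 (assumed data): made-up values written to a temporary .CSV file
# so that this sketch is self-contained. In practice you would already
# have your own .CSV file.
example <- data.frame(
  height  = c(10, 12, 15, 19, 24, 30),
  light   = c(2, 4, 5, 8, 9, 13),
  soil.ph = c(6.8, 6.1, 6.5, 5.9, 5.2, 5.6)
)
csv.file <- tempfile(fileext = ".csv")
write.csv(example, csv.file, row.names = FALSE)

# Stage 2: read the file into memory and give it a sensible name.
plants <- read.csv(csv.file)

# Stage 3: refer to the variables by name. with() is used here as an
# alternative to attach(); it evaluates the expression inside the data set.
r.pearson  <- with(plants, cor(height, light))                       # default method
r.spearman <- with(plants, cor(height, light, method = "spearman"))
r.kendall  <- with(plants, cor(height, light, method = "kendall"))

print(c(pearson = r.pearson, spearman = r.spearman, kendall = r.kendall))

# Passing the whole data set returns a matrix of pairwise coefficients.
cor.matrix <- cor(plants, method = "spearman")
print(cor.matrix)
```

If you prefer attach(), as described in the text, attach(plants) followed by cor(height, light) should give the same coefficient; remember to detach(plants) afterwards.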
You can test the significance of a correlation using Pearson, Kendall or Spearman methods. |
Correlation and Significance tests

Getting a correlation coefficient is generally only half the story; you will want to know if the relationship is significant. R provides a companion to cor() for the significance testing required: the cor.test() function.

As above, you need to read your data into R from a .CSV file and attach the data set so that the variables can be referred to by name.

To run a correlation test we type:

> cor.test(var1, var2, method = "method")

The default method is "pearson" so you may omit this if that is what you want. If you type "kendall" or "spearman" then you will get the appropriate significance test. As usual with R, it is a good idea to assign your result to a named object in case you want to perform additional operations.
To see a summary of your correlation test, type the name of the object, e.g.

> cor.s

Spearman's rank correlation rho

data: y and x1

>
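A minimal sketch of storing and inspecting a test result is given below. The data values and the data set name dat are made up for illustration; only the names y, x1 and cor.s are taken from the output shown above. As before, with() stands in for attach().

```r
# Assumed data: y and x1 are made-up values, chosen without ties so that
# the Spearman test runs without a warning about ties.
dat <- data.frame(
  x1 = c(1.2, 2.5, 3.1, 4.8, 5.0, 6.7, 7.3, 8.9),
  y  = c(2.0, 1.5, 3.9, 3.6, 5.8, 5.1, 7.7, 7.0)
)

# Run the test and store the result under a name.
cor.s <- with(dat, cor.test(y, x1, method = "spearman"))

# Typing the name (or calling print) shows the summary of the test.
print(cor.s)

# The stored result is a list, so individual parts can be extracted
# for further work.
rho     <- unname(cor.s$estimate)  # the correlation coefficient
p.value <- cor.s$p.value           # the significance of the test
print(c(rho = rho, p.value = p.value))
```

Storing the result this way means the coefficient and p-value can be reused later, for example when annotating a graph, without rerunning the test.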
Find out more graphical methods |
Graphing the Correlation

You will usually want to use a scatter plot to graph your correlation. The basic command is plot(). R sets various default parameters, e.g. the axes are labelled with the variable names and the plotting symbol is an open circle.
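The defaults described above, and one way of overriding them, can be sketched as follows. The data values, the data set name dat and the axis titles are assumptions for illustration. The plots are written to a temporary PDF file only so that the example runs non-interactively; leave out the pdf() and dev.off() lines to draw on screen instead.

```r
# Assumed data: made-up values for illustration only.
dat <- data.frame(
  x1 = c(1.2, 2.5, 3.1, 4.8, 5.0, 6.7, 7.3, 8.9),
  y  = c(2.0, 1.5, 3.9, 3.6, 5.8, 5.1, 7.7, 7.0)
)

# Send the graphics to a temporary PDF file (one page per plot).
plot.file <- tempfile(fileext = ".pdf")
pdf(plot.file)

# Basic scatter plot using the defaults: axes labelled with the variable
# names (x1 and y) and open circles as the plotting symbol.
plot(y ~ x1, data = dat)

# The same plot with some defaults overridden: custom axis titles and a
# filled circle (pch = 16) as the plotting symbol.
plot(y ~ x1, data = dat,
     xlab = "Predictor (x1)", ylab = "Response (y)", pch = 16)

# Close the graphics device so that the file is written.
dev.off()
```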
Correlation Step by Step