Dr. Mark Gardener

On this page...

Basic correlation

Significance testing

Graphing the correlation

Correlation step-by-step

Using R for statistical analyses - Simple correlation

This page is intended to help you get to grips with the powerful statistical program called R. It is not intended as a course in statistics. If you have an analysis to perform I hope that you will be able to find the commands you need here and copy/paste them into R to get going.

On this page learn how to conduct simple correlations to find correlation coefficients as well as significance testing (e.g. Spearman Rank and Pearson). Also find an introduction to graphing (see the main graph page for more detail).

What is R?

R is an open-source (GPL) statistical environment modeled after S and S-Plus. The S language was developed at AT&T Bell Laboratories, beginning in the mid-1970s. The R project was started in the early 1990s by Robert Gentleman and Ross Ihaka (hence the name, R) of the Statistics Department of the University of Auckland, and R was released under the GPL in 1995. It has quickly gained a widespread audience. It is currently maintained by the R core-development team, a hard-working, international team of volunteer developers. The R project web page is the main site for information on R. At this site are directions for obtaining the software, accompanying packages and other sources of documentation.

R is a powerful statistical program but it is first and foremost a programming language. Many routines have been written for R by people all over the world and made freely available from the R project website as "packages". However, the basic installation (for Linux, Windows or Mac) contains a powerful set of tools for most purposes.

Because R is a programming language it can seem a bit daunting; you have to type in commands to get it to work. However, it does have a Graphical User Interface (GUI) to make things easier. You can also copy and paste text from other applications into it (e.g. word processors). So, if you have a library of these commands it is easy to pop in the ones you need for the task at hand. That is the purpose of this web page: to provide a library of basic commands that you can copy and paste into R to perform a variety of statistical analyses.



You can get Spearman, Kendall or Pearson correlation coefficients. You can also obtain a matrix of pairwise comparisons in a data set.

Correlation

R can perform correlation with the cor() function. Three methods are built in to the base distribution of the program: Pearson, Kendall and Spearman Rank correlations.

The first stage is to arrange your data in a .CSV file. Use a column for each variable and give it a meaningful name. Don't forget that variable names in R can contain letters and numbers, but the only punctuation allowed is a period or an underscore, and a name should start with a letter.
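If you are unsure whether your column headings will survive the trip into R, the base function make.names() shows what R would turn them into. This is only a small sketch: the headings below are made up for illustration. read.csv() applies the same conversion by default when it reads the header row.

```r
# Hypothetical column headings as they might appear in a spreadsheet
raw.names <- c("leaf length", "2nd.count", "soil_pH", "height.cm")

# make.names() converts each heading into a syntactically valid R name:
# spaces become periods and a name starting with a digit gets an "X" prefix
valid.names <- make.names(raw.names)

print(valid.names)
```

Headings that are already valid (the last two here) come back unchanged, so it is simplest to use valid names in the spreadsheet from the start.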

The second stage is to read your data file into memory and give it a sensible name.

The next stage is to attach your data set so that the individual variables can be referred to directly by name.
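The reading and attaching stages might look something like the sketch below. The file contents, the name my.data and the column names var1 and var2 are all invented for illustration; with your own data you would point read.csv() at your file (or use file.choose()). The last line shows one alternative to attach(), which is to refer to a column through the data set with the $ operator.

```r
# Write a small made-up data set to a temporary .CSV file (illustrative values only)
csv.path <- tempfile(fileext = ".csv")
writeLines(c("var1,var2",
             "1.2,3.4",
             "2.3,4.1",
             "3.1,5.9",
             "4.8,6.2"), csv.path)

# Read the file into memory and give it a sensible name
my.data <- read.csv(csv.path)

# attach() makes the columns available directly by name
attach(my.data)
mean.var1 <- mean(var1)
detach(my.data)  # tidy up so that a later data set does not clash

# Alternative without attach(): name the data set explicitly
mean.var2 <- mean(my.data$var2)
```

Detaching when you have finished is a good habit, because attached variables with the same names as other objects can mask one another.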

To get the correlation coefficient you type:

> cor(var1, var2, method = "method")

The default method is "pearson" so you may omit this if that is what you want. If you type "kendall" or "spearman" then you will get the appropriate correlation coefficient.

Correlation coefficients
The default correlation returns the Pearson correlation coefficient
> cor(var1, var2)
If you specify "spearman" you will get the Spearman correlation coefficient
> cor(var1, var2, method = "spearman")
If you use a data set instead of separate variables you will get a matrix of all the pairwise correlation coefficients
> cor(dataset, method = "pearson")
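To try these commands without preparing a file first, you can use one of the data sets that ships with R. The sketch below uses the built-in cars data (columns speed and dist); the object names r.pearson, r.spearman, r.kendall and cor.mat are just my choices for illustration.

```r
# cars is a built-in data frame with two numeric columns: speed and dist

# Coefficient for a single pair of variables, one call per method
r.pearson  <- cor(cars$speed, cars$dist)                       # default method
r.spearman <- cor(cars$speed, cars$dist, method = "spearman")
r.kendall  <- cor(cars$speed, cars$dist, method = "kendall")

# Passing the whole data set returns a matrix of pairwise coefficients
cor.mat <- cor(cars, method = "pearson")

print(c(pearson = r.pearson, spearman = r.spearman, kendall = r.kendall))
print(round(cor.mat, 3))
```

The matrix is symmetrical with 1s on the diagonal, and its off-diagonal entry is the same value as the single Pearson coefficient.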


You can test the significance of a correlation using Pearson, Kendall or Spearman methods.

Correlation and Significance tests

Getting a correlation coefficient is generally only half the story; you will want to know if the relationship is significant. R provides a companion to cor() for this purpose: the cor.test() function, which returns the coefficient together with its significance test.

As above you need to read your data into R from a .CSV file and attach the data set so that the variables can be referred to by name.

To run a correlation test we type:

> cor.test(var1, var2, method = "method")

The default method is "pearson" so you may omit this if that is what you want. If you type "kendall" or "spearman" then you will get the appropriate significance test.

As usual with R it is a good idea to assign a variable name to your result in case you want to perform additional operations.

Correlation Significance tests
The default method is "pearson"
> cor.p = cor.test(var1, var2)
If you specify "spearman" you will get the Spearman rank correlation test
> cor.s = cor.test(var1, var2, method = "spearman")

To see a summary of your correlation test type the name of the variable e.g.

> cor.s

Spearman's rank correlation rho

data: y and x1
S = 147.713, p-value = 0.00175
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.7362267

>
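Because the result is stored in a variable, you can pull out individual parts of it rather than reading them off the printed summary. Here is a minimal sketch using the built-in cars data instead of the y and x1 variables shown above; the names rho, p.val and s.stat are my own. I have set exact = FALSE because these data contain tied values, for which R cannot compute an exact p-value.

```r
# Spearman rank correlation test on the built-in cars data
cor.s <- cor.test(cars$speed, cars$dist, method = "spearman", exact = FALSE)

# The result is a list; its components can be extracted with $
rho    <- unname(cor.s$estimate)   # the correlation coefficient
p.val  <- cor.s$p.value            # the p-value of the test
s.stat <- unname(cor.s$statistic)  # the test statistic S

print(c(rho = rho, p.value = p.val, S = s.stat))
```

Typing names(cor.s) lists all the components that are available.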



Graphing the Correlation

You will usually want to use a scatter plot to graph your correlation. The basic function is plot()

R has various default parameters set e.g. the axes are labelled with the variable names and the plotting symbol is an open circle.

Correlation graphs
Use the basic defaults to create a scatter plot of your two variables
> plot(x.var, y.var)
This changes the axes titles
> plot(x.var, y.var, xlab="X-axis", ylab="Y-axis")
This changes the plotting symbol to a solid circle
> plot(x.var, y.var, pch=16)
This adds a line of best fit to your scatter plot (don't do this for non-parametric correlations)
> abline(lm(y.var ~ x.var))
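Put together, the plotting commands might be used as in this sketch. It again borrows the built-in cars data, and the axis titles are simply my own labels for those two columns. The fitted model is stored in a variable called fit (a name I chose) so that the same fit can be drawn and inspected.

```r
# Use the built-in cars data as the two variables
x.var <- cars$speed
y.var <- cars$dist

# Scatter plot with custom axis titles and a solid circle as the symbol
plot(x.var, y.var, xlab = "Speed (mph)", ylab = "Stopping distance (ft)", pch = 16)

# Fit a straight line (response ~ predictor) and add it to the existing plot
fit <- lm(y.var ~ x.var)
abline(fit)

# The intercept and slope of the plotted line
print(coef(fit))
```

Note that abline() only adds to a plot that is already open, so plot() has to be run first.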


Correlation Step by Step

Step-by-step Correlation
First create your data file. Use a spreadsheet and make each column a variable. Each row is a replicate. The first row should contain the variable names. Save this as a .CSV file

Read the data into R and save it under some name
> your.data = read.csv(file.choose())

Allow the variables within the data to be accessible to R
> attach(your.data)

Decide on the method, run the correlation and assign the result to a new variable. Methods are "pearson" (default), "kendall" and "spearman"
> your.cor = cor(var1, var2, method = "pearson")

Have a look at the resulting correlation coefficient
> your.cor

Perform a pairwise correlation on all the variables in the data set. Decide on the method ("pearson" (default), "kendall" or "spearman")
> cor.mat = cor(your.data, method = "pearson")

Have a look at the resulting correlation matrix
> cor.mat

To evaluate the statistical significance of your correlation decide on the appropriate method (Pearson is the default, see above), assign a variable and run the test
> your.cor = cor.test(var1, var2, method="spearman")

Have a look at the result of your significance test
> your.cor

Plot a graph of the two variables from your correlation. pch=21 plots an open circle, pch=19 plots a solid circle. Try other values.
> plot(x.var, y.var, xlab="x-label", ylab="y-label", pch=21)

Add a line of best fit (if appropriate)
> abline(lm(y.var ~ x.var))
 
