Dr. Mark Gardener |
Using R for statistical analyses - Multiple Regression

This page is intended to help you get to grips with the powerful statistical program called R. It is not intended as a course in statistics. If you have an analysis to perform I hope that you will be able to find the commands you need here and copy/paste them into R to get going.

On this page you can learn about multiple regression analysis, including how to set up models and how to extract the coefficients, beta coefficients and R-squared values. There is a short section on graphing, but see the main graph page for more detailed information.
R is Open Source. R is Free.
What is R?

R is an open-source (GPL) statistical environment modeled after S and S-Plus. The S language was developed at AT&T Bell Laboratories from the mid-1970s onwards, with the modern version of the language appearing in the late 1980s. The R project was started by Robert Gentleman and Ross Ihaka (hence the name, R) of the Statistics Department of the University of Auckland in 1995. It has quickly gained a widespread audience. It is currently maintained by the R core-development team, a hard-working, international team of volunteer developers. The R project web page is the main site for information on R. At this site are directions for obtaining the software, accompanying packages and other sources of documentation.

R is a powerful statistical program but it is first and foremost a programming language. Many routines have been written for R by people all over the world and made freely available from the R project website as "packages". However, the basic installation (for Linux, Windows or Mac) contains a powerful set of tools for most purposes.

Because R is a programming language it can seem a bit daunting; you have to type in commands to get it to work. However, it does have a Graphical User Interface (GUI) to make things easier. You can also copy and paste text from other applications into it (e.g. word processors). So, if you have a library of these commands it is easy to pop in the ones you need for the task at hand. That is the purpose of this web page: to provide a library of basic commands that the user can copy and paste into R to perform a variety of statistical analyses.
Multiple Regression

R can perform multiple regression quite easily. The basic function is: lm(model, data)

The first stage is to arrange your data in a .CSV file. Use a column for each variable and give it a meaningful name. Don't forget that variable names in R can contain letters and numbers, but the only punctuation allowed is the period and the underscore, and a name cannot begin with a number.

The second stage is to read your data file into memory and give it a sensible name.

The next stage is to attach your data set, using attach(), so that the individual variables can be referred to directly by name.

Finally we need to define the model and run the analysis.
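The stages above can be sketched as follows. This is only an illustration under stated assumptions: the data are simulated, the names demo, csv.file and field.data are made up for the example, and a temporary file stands in for your own .CSV file. It also shows the data argument of lm(), which lets you name the data frame in the call instead of attaching it.

```r
# Hypothetical data: 20 rows with columns x1, x2, x3 and y (simulated, not real measurements)
set.seed(1)
demo <- data.frame(x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20))
demo$y <- 2 + 1.5 * demo$x1 + rnorm(20)

# Stage 1: the data live in a .CSV file (a temporary file stands in for yours)
csv.file <- tempfile(fileext = ".csv")
write.csv(demo, csv.file, row.names = FALSE)

# Stage 2: read the file into memory and give it a sensible name
field.data <- read.csv(csv.file)
str(field.data)   # check the column names and types

# Stages 3 and 4: instead of attach(), name the data frame in the call
field.lm <- lm(y ~ x1 + x2 + x3, data = field.data)
```

Using the data argument keeps the variables inside the data frame, which avoids clashes if you have several data sets in memory with the same column names.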
R has a powerful model syntax. This is used in other analyses too e.g. anova |
Linear Regression Models

The basic form of a linear regression is: y = m1x1 + m2x2 + m3x3 + ... + c

Given a series of ys and a series of x1, x2 etc. we can determine the coefficients (the ms) and the intercept (c). We can also determine the relative strength of the factors and how well correlated each factor (or combination) is with y. The general form of our models in R is: y ~ x1 + x2...

There are a number of options, depending upon your data set. Let's consider the situation where you have a single dependent variable (y) and 3 factors that you think are important in determining y; we'll call them x1, x2 and x3. In reality we'd give them more meaningful names. We can set up our model in a number of ways; the simplest, used in the rest of this page, includes each factor as a separate additive term: y ~ x1 + x2 + x3
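As a rough illustration of the options, here are some common ways of writing a model with R's formula syntax. The data frame d and the model names are invented for the example and the data are simulated; which form is appropriate depends entirely on your own data.

```r
# Simulated data purely for illustration
set.seed(2)
d <- data.frame(x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))
d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(30)

m.additive    <- lm(y ~ x1 + x2 + x3, data = d)     # each factor as a separate term
m.interaction <- lm(y ~ x1 * x2, data = d)          # x1, x2 and their interaction
m.spelled.out <- lm(y ~ x1 + x2 + x1:x2, data = d)  # the same model written in full
m.no.const    <- lm(y ~ x1 + x2 - 1, data = d)      # drop the intercept (c)
m.quadratic   <- lm(y ~ x1 + I(x1^2), data = d)     # I() protects arithmetic in a formula

# The coefficient names show which terms each model contains
names(coef(m.interaction))
names(coef(m.no.const))
```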
To run an analysis we use the lm() function on our data e.g.
> lm(y ~ x1 + x2 + x3)
We don't need to specify the data file because we have already read it into memory and used attach() to link the variable names. It is good practice to use a variable to "hold" the result of the analysis; we can then do other things with the result that would be tedious to type in every time. In this instance, e.g.
> field.lm = lm(y ~ x1 + x2 + x3)
If we now type the name of our new variable we see the result: the model call, followed by the estimated intercept and one coefficient for each factor.
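To show why holding the result in a variable is useful, here is a small sketch of some of the things we can do with a stored model. The data are simulated and the name field.data is an assumption for the example; the extractor functions coef(), fitted(), residuals() and predict() are part of the basic R installation.

```r
# Simulated data standing in for a real data set
set.seed(3)
field.data <- data.frame(x1 = rnorm(25), x2 = rnorm(25), x3 = rnorm(25))
field.data$y <- 5 + 3 * field.data$x1 + rnorm(25)

# Fit once and keep the result
field.lm <- lm(y ~ x1 + x2 + x3, data = field.data)

field.lm                    # prints the call and the coefficients
coef(field.lm)              # intercept and one coefficient per factor
head(fitted(field.lm))      # values of y predicted by the model
head(residuals(field.lm))   # observed y minus fitted y

# Predict y for a new, made-up set of x values
predict(field.lm, newdata = data.frame(x1 = 0.5, x2 = 0, x3 = 0))
```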
This is fine but the information is a bit thin. To get a bit more information we can use summary() on our lm result e.g.
> summary(field.lm)
This is much more useful; we can see the coefficients and which factors are significant. We can also see the overall R-squared value for our model. The final statistic (the F-statistic and its p-value) shows us the significance of the overall model. In the example above only x1 proved to be of significance.
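The pieces of the summary can also be pulled out individually, which is handy if you want to reuse the numbers. The following is a sketch on simulated data (field.data and field.sum are names chosen for the example). Note that the overall p-value is printed by summary() but not stored, so here it is recomputed from the F-statistic with pf().

```r
# Simulated data standing in for a real data set
set.seed(4)
field.data <- data.frame(x1 = rnorm(25), x2 = rnorm(25), x3 = rnorm(25))
field.data$y <- 5 + 3 * field.data$x1 + rnorm(25)
field.lm <- lm(y ~ x1 + x2 + x3, data = field.data)

field.sum <- summary(field.lm)

field.sum$coefficients    # estimate, standard error, t value and p-value per term
field.sum$r.squared       # overall R-squared
field.sum$adj.r.squared   # adjusted R-squared

# Significance of the overall model, from the stored F-statistic
f <- field.sum$fstatistic   # named vector: value, numdf, dendf
p.model <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)
p.model
```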
Regression coefficients

Once you have the basic information you may wish to delve further and examine more components of your regression model. The basic lm() function gives us a list of the basic coefficients but you may wish to utilize them individually. To see an individual coefficient we use:
> variable = lm(model)
> variable$coefficients["factor"]
In the example above we called our basic model field.lm. To see the coefficient for x1 we would type:
> field.lm$coefficients["x1"]
The function coef() gives the same values, e.g. coef(field.lm)["x1"].

Beta coefficients

We can now determine the beta coefficients; that is, the coefficients standardized against one another to show us the relative strengths of the factors. A beta coefficient is determined mathematically as: beta = coeff * SD(x) / SD(y), where SD is the standard deviation. First assign a variable for each coefficient e.g.
> coeff.x1 = your.lm$coefficients["x1"]
Next determine the beta coefficient e.g.
> beta.x1 = coeff.x1 * sd(x1) / sd(y)

R squared

If we use
> summary(your.lm)
we can see extra information like the R-squared value, which tells us how strong the fit is (the proportion of the variance that is explained). However, R only shows us the value for the overall model. We can find each factor's share of that value once we know the beta coefficients. Mathematically each share is: R2 = beta * r (where r is the correlation between y and the x factor); the shares for all the factors in the model add up to the overall R-squared. To get r we use cor(y, x). So, in R we type:
> R2.x1 = beta.x1 * cor(y, x1)
You merely alter the names of the variables to suit your data.
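Rather than typing the calculation once per factor, the beta coefficients and R-squared shares can be worked out for all the factors in one go. This is a sketch under assumptions: the data are simulated and the names d, factors, b, beta, r and R2 are chosen for the example. The last two lines are a useful check, because the shares should add up to the overall R-squared reported by summary().

```r
# Simulated data standing in for a real data set
set.seed(5)
d <- data.frame(x1 = rnorm(40), x2 = rnorm(40), x3 = rnorm(40))
d$y <- 1 + 2 * d$x1 + 0.5 * d$x2 + rnorm(40)
field.lm <- lm(y ~ x1 + x2 + x3, data = d)

factors <- c("x1", "x2", "x3")

# Raw coefficients for the factors (the intercept is left out)
b <- coef(field.lm)[factors]

# beta = coeff * SD(x) / SD(y)
beta <- b * sapply(d[factors], sd) / sd(d$y)

# r = correlation between y and each factor
r <- sapply(d[factors], cor, y = d$y)

# R2 = beta * r, one share per factor
R2 <- beta * r

beta
R2
sum(R2)                       # total of the individual shares
summary(field.lm)$r.squared   # overall R-squared for comparison
```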
See the main graph page for more information. |
Graphing the regression

Now we have a basic numeric summary of the regression. What would be nice would be to have a graphical summary. There are two basic graphs that we can call on quickly to summarize our data.
> pairs(your.data)
> plot(y.var ~ x.var)
The first graph draws a scatterplot for each pair of variables. This is a useful quick summary but can be rather messy if you have lots of factors. You run the pairs plot on the original data, not on the linear model. The second plot produces a scatter graph of any two variables. You might want to add a best-fit line to the scatter:
> abline(lm(y.var ~ x.var))
When you draw a graph in R the graph appears in a separate window. You can resize this and also copy it to the clipboard for use in another program. You can also print directly from R.
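Here is a small sketch of the two graphs on simulated data. The names your.data, x.var, z.var, y.var and plot.file are placeholders for the example. The graphs are sent to a temporary PDF file with pdf() so that the commands run even where no graph window is available; leave out the pdf() and dev.off() lines to draw in the usual window.

```r
# Simulated data standing in for a real data set
set.seed(6)
your.data <- data.frame(x.var = rnorm(30), z.var = rnorm(30))
your.data$y.var <- 2 + your.data$x.var + rnorm(30)

# Send the graphs to a file (omit pdf() and dev.off() to use the graph window)
plot.file <- tempfile(fileext = ".pdf")
pdf(plot.file)

# Graph 1: a scatterplot for every pair of variables in the original data
pairs(your.data)

# Graph 2: a scatter graph of two variables, with a best-fit line added
plot(y.var ~ x.var, data = your.data)
abline(lm(y.var ~ x.var, data = your.data))

dev.off()
```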
Regression step-by-step

Here is a step-by-step guide to performing a regression. Just copy the commands you need (one at a time) and paste them into R. Edit as required for your data set and variable names.
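As one possible walk-through of the steps, the sketch below uses mtcars, a data set that is supplied with R, so that it can be pasted in as it stands. The choice of mpg as the dependent variable and wt, hp and disp as the factors is purely for illustration, as are the names car.data and car.lm; with your own data you would start from read.csv() instead.

```r
# Step 1: get the data into memory (mtcars is supplied with R)
car.data <- mtcars[, c("mpg", "wt", "hp", "disp")]

# Step 2: define the model and run the analysis
car.lm <- lm(mpg ~ wt + hp + disp, data = car.data)

# Step 3: view the coefficients, significance and overall R-squared
summary(car.lm)

# Step 4: beta coefficients, beta = coeff * SD(x) / SD(y)
factors <- c("wt", "hp", "disp")
beta <- coef(car.lm)[factors] * sapply(car.data[factors], sd) / sd(car.data$mpg)
beta

# Step 5: R-squared share for each factor, R2 = beta * r
R2 <- beta * sapply(car.data[factors], cor, y = car.data$mpg)
R2
```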