Dr. Mark Gardener
Statistics for Ecologists using R and Excel: Data Collection, Exploration, Analysis and Presentation
Available from Pelagic Publishing.
Find supplementary material on this page: data examples used in the book. For an outline/overview of the book see here.
Statistics for Ecologists: Data examples

Throughout the book we use examples of data to illustrate various ideas, concepts and statistical analyses. This web page contains examples taken from the book that you may download and use for practice. I hope that these examples help your understanding of the ideas in the book and consolidate your learning of the processes illustrated.

The data are all plain text in Tab-delimited format, so they can be viewed in a range of programs: your web browser, a word processor or Notepad, or a spreadsheet such as Excel. You can also import the data into R. Each file contains a few lines of comment at the top; these give a brief introduction and notes on getting the data into R. Slightly more detailed notes on each file are on this page (jump direct to instruction list).

In general, to get the data into Excel simply open the program and then the file. You will need to tell Excel that the data are in delimited format and that the delimiter is the Tab character; Excel can then display the data in separate cells. The top few lines are for information and begin with the hash character #; you may ignore them or delete the rows.

To get the data into R you need either the scan() command or the read.table() command, depending on the data file. The comment rows at the top of each file give appropriate instructions. In the book we generally use the read.csv() command, which is a special form of read.table() with different defaults. CSV files do not display clearly in web browsers, so the decision was made to use Tab-delimited files here.

The following text gives brief notes about each file in turn.
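As a minimal sketch of the import step described above, the following R code writes a small Tab-delimited file with # comment lines and reads it back. The file contents and the names lines, tmp, dat and dat2 are made up for illustration; they are not taken from the book's data files.

```r
# Hypothetical file contents: comment lines, a header row, then Tab-separated values.
lines <- c(
  "# Example data (made up for illustration)",
  "# Read with read.table()",
  "count\thabitat",
  "12\tgrass",
  "7\twood"
)
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# sep = "\t" sets the Tab delimiter; comment.char = "#" makes R skip the comments.
dat <- read.table(tmp, sep = "\t", comment.char = "#", header = TRUE)

# read.delim() is read.table() with Tab-oriented defaults (header = TRUE, sep = "\t"),
# but its default comment.char is "", so the comment character is still supplied.
dat2 <- read.delim(tmp, comment.char = "#")

str(dat)
```

With a real file, replace tmp with file.choose() to pick the downloaded file interactively.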
Each file name is a hyperlink; clicking it opens the data in your browser. To download a file, right click and select “Save file as…” or similar (this varies according to browser). Click on an example name below to go directly to the instructions for that example:
Pivot.txt | Beetle size.txt | Beetle comparison.txt | Leaf sizes.txt | Ridge Furrow.txt | Paired data.txt

Example file for practice: Pivot.txt
Title: Pivot table example
File structure: The file contains 3 columns: count, habitat and obs. These relate to the number of butterfly species, the habitat type and the observation number (each habitat was visited several times).
Uses: We will use these data to make a simple pivot table in a spreadsheet. We want to end up with a table with a column for each habitat. Once the data are assembled into 3 separate samples you could try creating summary statistics for each sample and drawing appropriate graphs. We will not carry out a pivot table operation using R (although there is a similar function) but we may wish to use the data for other purposes (e.g. a Kruskal-Wallis test).
Excel: To get the data into Excel, open the program and then the file. Tell Excel that the data are in delimited format and that the delimiter is the Tab character; Excel can then display the data in separate cells. The top few lines are for information and begin with the hash character #. You may ignore them or delete the rows.
R: To get these data into R we need to use a command like so:
Pivot.data = read.table(file.choose(), sep = "\t", comment.char = "#", header = TRUE)
You can replace Pivot.data with a name of your own choosing. The sep = "\t" part tells R that the data are separated by Tab characters, and the comment.char = "#" part tells R that the # character begins comment lines, which are ignored.
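For readers who want to try the same rearrangement in R, here is a rough sketch using unstack(), which is one base R option and not necessarily the function the book has in mind. The values are invented; only the column names (count, habitat, obs) follow the file description above.

```r
# Hypothetical values in the same three-column layout as Pivot.txt.
pivot <- data.frame(
  count   = c(12, 15, 11, 7, 9, 8, 20, 18, 22),
  habitat = rep(c("grass", "heath", "wood"), each = 3),
  obs     = rep(1:3, times = 3)
)

# One column per habitat (works as a data frame because each habitat has 3 visits).
wide <- unstack(pivot, count ~ habitat)
wide

# The long layout can be used directly for a Kruskal-Wallis test.
kw <- kruskal.test(count ~ habitat, data = pivot)
kw
```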

Example file for practice: Beetle size.txt
Title: Beetle sizes
File structure: The data consist of 5 columns of figures; they are split like this simply to make them more compact. The values are the sizes of a water beetle in millimetres.
Uses: Summary statistics. We use these data to look at ways to summarize samples, using averages for instance. The data can also be used to make a distribution graph, e.g. a histogram.
Excel: To get the data into Excel, open the program and then the file. Tell Excel that the data are in delimited format and that the delimiter is the Tab character; Excel can then display the data in separate cells. The top few lines are for information and begin with the hash character #. You may ignore them or delete the rows.
R: To get these data into R we need to use a command like so:
Beetle = scan(file.choose(), sep = "\t", comment.char = "#")
You can replace Beetle with a name of your own choosing. The sep = "\t" part tells R that the data are separated by Tab characters, and the comment.char = "#" part tells R that the # character begins comment lines, which are ignored.
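The sketch below shows how scan() turns a multi-column file of numbers into one sample, followed by a few summary statistics. The ten sizes are made up for illustration and the file is a temporary stand-in for Beetle size.txt.

```r
# Hypothetical beetle sizes (mm) laid out in 5 Tab-separated columns.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "# Made-up beetle sizes for illustration",
  "10.2\t11.5\t9.8\t12.1\t10.9",
  "11.0\t10.4\t12.6\t9.5\t11.8"
), tmp)

# scan() reads the values row by row into a single numeric vector.
beetle <- scan(tmp, sep = "\t", comment.char = "#", quiet = TRUE)

mean(beetle)    # arithmetic mean
median(beetle)  # median
sd(beetle)      # standard deviation

# Histogram bins without drawing; drop plot = FALSE to draw the graph.
h <- hist(beetle, plot = FALSE)
h$counts
```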

Example file for practice: Beetle comparison.txt
Title: Comparison of two samples of beetle sizes
File structure: There are two columns in the data file, one for Jun and one for Mar. The values are the sizes of the beetles in millimetres.
Uses: The main use for these data is to illustrate differences in data distribution. You may create a histogram or stem-leaf plot for each sample, for example. There are many other things that may be done with these data, such as summary statistics, comparison of two samples and graphs.
Excel: To get the data into Excel, open the program and then the file. Tell Excel that the data are in delimited format and that the delimiter is the Tab character; Excel can then display the data in separate cells. The top few lines are for information and begin with the hash character #. You may ignore them or delete the rows.
R: To get the data into R we have a couple of choices: we may open the file in Excel and copy each column to the clipboard to create a data object for each month, or we can read the file in its current form. In the first case we use something like the following:
Jun = scan()
Then paste the copied column (in this case the Jun sample). We then repeat the process using a new name and the other (Mar) column, and end up with two separate data items in R. Alternatively we can read the data as one item like so:
Beetles = read.table(file.choose(), sep = "\t", comment.char = "#", header = TRUE)
You can replace Beetles with a name of your own choosing. The sep = "\t" part tells R that the data are separated by Tab characters, and the comment.char = "#" part tells R that the # character begins comment lines, which are ignored. We now have a single data object that contains both samples.

Example file for practice: Leaf sizes.txt
Title: Leaf sizes and running averages
File structure: The data comprise 10 columns; each column is a sample of 10 leaf sizes.
Uses: We used these data to illustrate standard error. We can summarize each sample and create a running average (mean or median). We might also examine the distribution of each sample and of the entire 100 values in the set. If we read the data into R with scan() they will appear as a single sample (of 100 values); see if you can work out how to create a data object containing 10 separate samples.
Excel: To get the data into Excel, open the program and then the file. Tell Excel that the data are in delimited format and that the delimiter is the Tab character; Excel can then display the data in separate cells. The top few lines are for information and begin with the hash character #. You may ignore them or delete the rows.
R: To get these data into R we need to use a command like so:
Leaf = scan(file.choose(), sep = "\t", comment.char = "#")
You can replace Leaf with a name of your own choosing. The sep = "\t" part tells R that the data are separated by Tab characters, and the comment.char = "#" part tells R that the # character begins comment lines, which are ignored.
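One possible answer to the exercise above, offered as a sketch and not as the book's own solution: because scan() reads the file row by row, refilling a 10-column matrix by row puts each sample back in its own column. The leaf sizes here are simulated, and "running average" is interpreted as the cumulative mean of the sample means, which is an assumption.

```r
# Simulated stand-in for Leaf sizes.txt: 10 samples (columns) of 10 values each.
set.seed(1)
orig <- matrix(round(rnorm(100, mean = 50, sd = 5), 1), nrow = 10, ncol = 10)
tmp <- tempfile(fileext = ".txt")
writeLines("# Simulated leaf sizes, one sample per column", tmp)
write.table(orig, tmp, sep = "\t", row.names = FALSE, col.names = FALSE,
            append = TRUE)

# scan() returns all 100 values as one vector, in row-by-row order.
leaf <- scan(tmp, sep = "\t", comment.char = "#", quiet = TRUE)

# Refill by row so that each column is again one sample.
samples <- matrix(leaf, ncol = 10, byrow = TRUE)

sample.means <- colMeans(samples)
running.mean <- cumsum(sample.means) / seq_along(sample.means)

# Standard error of each sample: sd / sqrt(n).
se <- apply(samples, 2, sd) / sqrt(nrow(samples))
```

Reading the file with read.table() and header = FALSE would give the 10 columns directly as a data frame, at the cost of losing the single-vector form.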

Example file for practice: Ridge Furrow.txt
Title: Test for significant difference
File structure: The data comprise two columns, one for the ridge sample and one for the furrow. The values represent the number of plant species found in quadrats in the two kinds of habitat.
Uses: The main use of these data is to illustrate the t-test, a parametric test of differences between two samples. However, the data may also be used to look at summary statistics, distribution and graphing.
Excel: To get the data into Excel, open the program and then the file. Tell Excel that the data are in delimited format and that the delimiter is the Tab character; Excel can then display the data in separate cells. The top few lines are for information and begin with the hash character #. You may ignore them or delete the rows.
R: To get these data into R we need to use a command like so:
Ridge.Furrow = read.table(file.choose(), sep = "\t", comment.char = "#", header = TRUE)
You can replace Ridge.Furrow with a name of your own choosing. The sep = "\t" part tells R that the data are separated by Tab characters, and the comment.char = "#" part tells R that the # character begins comment lines, which are ignored.
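Once the two columns are in a data frame, a t-test might look like the sketch below. The counts and the column names Ridge and Furrow are assumptions made for illustration; check the header row of the real file for the actual names.

```r
# Hypothetical species counts per quadrat in the two habitats.
rf <- data.frame(
  Ridge  = c(4, 3, 5, 6, 8, 6, 5, 7),
  Furrow = c(9, 11, 6, 8, 10, 12, 9, 10)
)

# t.test() uses the Welch (unequal variance) form by default;
# add var.equal = TRUE for the classic Student's t-test.
res <- t.test(rf$Ridge, rf$Furrow)
res$statistic
res$p.value
```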

Example file for practice: Paired data.txt
Title: Matched pair data
File structure: These data are in two columns, one for captures on white targets and the other for captures on yellow targets. The data are paired, with each row being a single bi-coloured target.
Uses: The principal use for these data is to illustrate the use of matched pair data for examining differences. We may use the t-test or the Wilcoxon test. We can use these data for other purposes too, graphing for example.
Excel: To get the data into Excel, open the program and then the file. Tell Excel that the data are in delimited format and that the delimiter is the Tab character; Excel can then display the data in separate cells. The top few lines are for information and begin with the hash character #. You may ignore them or delete the rows.
R: To get these data into R we need to use a command like so:
Whitefly = read.table(file.choose(), sep = "\t", comment.char = "#", header = TRUE)
You can replace Whitefly with a name of your own choosing. The sep = "\t" part tells R that the data are separated by Tab characters, and the comment.char = "#" part tells R that the # character begins comment lines, which are ignored.
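A sketch of the two matched-pair tests mentioned above, using invented capture counts. The column names white and yellow are hypothetical.

```r
# Hypothetical captures; each row is one bi-coloured target.
whitefly <- data.frame(
  white  = c(4, 3, 4, 1, 6, 4, 6, 4),
  yellow = c(4, 7, 2, 2, 7, 10, 5, 8)
)

# Paired t-test on the within-target differences.
t.res <- t.test(whitefly$white, whitefly$yellow, paired = TRUE)

# Wilcoxon matched-pairs test; exact = FALSE requests the normal
# approximation, which is what R falls back to when there are ties.
w.res <- wilcox.test(whitefly$white, whitefly$yellow, paired = TRUE,
                     exact = FALSE)

t.res$p.value
w.res$p.value
```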

Example file for practice: Correlation.txt
Title: Simple correlation
File structure: These data comprise 2 columns: one represents water speed and the other the abundance of mayfly at the location where the corresponding speed was measured.
Uses: The main use is to illustrate simple correlation between two variables. We could also use the data to create a graph.
Excel: To get the data into Excel, open the program and then the file. Tell Excel that the data are in delimited format and that the delimiter is the Tab character; Excel can then display the data in separate cells. The top few lines are for information and begin with the hash character #. You may ignore them or delete the rows.
R: To get these data into R we need to use a command like so:
Mayfly = read.table(file.choose(), sep = "\t", comment.char = "#", header = TRUE)
You can replace Mayfly with a name of your own choosing. The sep = "\t" part tells R that the data are separated by Tab characters, and the comment.char = "#" part tells R that the # character begins comment lines, which are ignored.

Example file for practice: Pearson.txt
Title: Pearson correlation example
File structure: These data comprise 2 columns: one represents water speed and the other the abundance of a freshwater invertebrate at the location where the corresponding speed was measured.
Uses: The main use is to illustrate simple correlation between two variables. We could also use the data to create a graph.
Excel: To get the data into Excel, open the program and then the file. Tell Excel that the data are in delimited format and that the delimiter is the Tab character; Excel can then display the data in separate cells. The top few lines are for information and begin with the hash character #. You may ignore them or delete the rows.
R: To get these data into R we need to use a command like so:
Fwater = read.table(file.choose(), sep = "\t", comment.char = "#", header = TRUE)
You can replace Fwater with a name of your own choosing. The sep = "\t" part tells R that the data are separated by Tab characters, and the comment.char = "#" part tells R that the # character begins comment lines, which are ignored.
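As a brief sketch of a correlation test in R, assuming hypothetical column names speed and abund and invented values:

```r
# Hypothetical water speeds and invertebrate abundances.
fw <- data.frame(
  speed = c(2, 3, 5, 9, 14, 24, 29, 34),
  abund = c(6, 3, 5, 23, 16, 12, 48, 43)
)

# Pearson's product moment correlation; method = "spearman" gives the
# rank-based alternative.
res <- cor.test(fw$speed, fw$abund, method = "pearson")
res$estimate  # correlation coefficient r
res$p.value
```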

Example file for practice: Polynomial.txt
Title: Polynomial regression
File structure: Here we have two columns: one relates to light intensity and the other to the abundance of a plant, bluebell.
Uses: We use these data to illustrate curvilinear relationships. Here we have a polynomial example. We can also draw a graph of the relationship.
Excel: To get the data into Excel, open the program and then the file. Tell Excel that the data are in delimited format and that the delimiter is the Tab character; Excel can then display the data in separate cells. The top few lines are for information and begin with the hash character #. You may ignore them or delete the rows.
R: To get these data into R we need to use a command like so:
Bbel = read.table(file.choose(), sep = "\t", comment.char = "#", header = TRUE)
You can replace Bbel with a name of your own choosing. The sep = "\t" part tells R that the data are separated by Tab characters, and the comment.char = "#" part tells R that the # character begins comment lines, which are ignored.
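A polynomial (here quadratic) regression can be fitted with lm() by adding a squared term. The sketch below uses made-up values and the hypothetical column names light and abund.

```r
# Hypothetical light intensities and bluebell abundances (rise then fall).
bbel <- data.frame(
  light = c(2, 7, 9, 15, 21, 30, 44, 50, 57, 64),
  abund = c(230, 287, 250, 369, 420, 496, 471, 462, 331, 228)
)

# I() protects the ^ so that light^2 is treated as arithmetic, not formula syntax.
fit <- lm(abund ~ light + I(light^2), data = bbel)
coef(fit)
summary(fit)$r.squared
```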

Example file for practice: Logarithmic.txt
Title: Logarithmic regression
File structure: Here we have two columns: one relates to soil nutrient concentration and the other to the growth of plants at each concentration.
Uses: We use these data to illustrate curvilinear relationships. Here we have a logarithmic example. We can also draw a graph of the relationship.
Excel: To get the data into Excel, open the program and then the file. Tell Excel that the data are in delimited format and that the delimiter is the Tab character; Excel can then display the data in separate cells. The top few lines are for information and begin with the hash character #. You may ignore them or delete the rows.
R: To get these data into R we need to use a command like so:
Nitrate = read.table(file.choose(), sep = "\t", comment.char = "#", header = TRUE)
You can replace Nitrate with a name of your own choosing. The sep = "\t" part tells R that the data are separated by Tab characters, and the comment.char = "#" part tells R that the # character begins comment lines, which are ignored.
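For a logarithmic relationship, one common approach is to regress the response on the log of the predictor. This is a sketch with invented values and hypothetical column names conc and growth; the book may set the model up differently.

```r
# Hypothetical nutrient concentrations and plant growth.
nitrate <- data.frame(
  conc   = c(1, 2, 4, 8, 16, 32),
  growth = c(2.1, 4.0, 6.2, 7.9, 10.1, 12.0)
)

# Fits growth = a + b * log(conc); log() is the natural logarithm.
fit <- lm(growth ~ log(conc), data = nitrate)
coef(fit)
```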

Example file for practice: Chi Sq.txt
Title: Association test example
File structure: These data comprise 3 columns of insect taxa and 4 rows of habitats. We have a contingency table of observations, and the table has row and column headings.
Uses: The main use for these data is to illustrate the chi-squared test for association. We can also use the data to create graphs, e.g. pie charts.
Excel: To get the data into Excel, open the program and then the file. Tell Excel that the data are in delimited format and that the delimiter is the Tab character; Excel can then display the data in separate cells. The top few lines are for information and begin with the hash character #. You may ignore them or delete the rows.
R: To get these data into R we need to use a command like so:
Inverts = read.table(file.choose(), sep = "\t", comment.char = "#", row.names = 1, header = TRUE)
You can replace Inverts with a name of your own choosing. The sep = "\t" part tells R that the data are separated by Tab characters, and the comment.char = "#" part tells R that the # character begins comment lines, which are ignored. In this case we require the first column (the habitat names) to be treated as row names, so we use the row.names = 1 part.
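To show what row.names = 1 does and how the resulting table feeds a chi-squared test, here is a small sketch. The habitat names, taxa and counts are all invented for illustration.

```r
# Hypothetical contingency table: 4 habitats (rows) by 3 insect taxa (columns).
lines <- c(
  "# Made-up counts for illustration",
  "Habitat\tAphid\tBug\tBeetle",
  "Upper\t230\t34\t72",
  "Lower\t175\t40\t88",
  "Stem\t90\t25\t110",
  "Bud\t60\t30\t45"
)

# row.names = 1 moves the first column into the row names,
# leaving only the counts as data columns.
inverts <- read.table(text = lines, sep = "\t", comment.char = "#",
                      row.names = 1, header = TRUE)

res <- chisq.test(as.matrix(inverts))
res$expected  # expected counts under no association
res$p.value
```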

Example file for practice: Goodness of Fit.txt
Title: Goodness of Fit example
File structure: These data comprise 2 columns and 4 rows of values, along with an extra row of heading names and a column of row names.
Uses: The main use for these data is to illustrate goodness of fit testing.
Excel: To get the data into Excel, open the program and then the file. Tell Excel that the data are in delimited format and that the delimiter is the Tab character; Excel can then display the data in separate cells. The top few lines are for information and begin with the hash character #. You may ignore them or delete the rows.
R: To get these data into R we need to use a command like so:
Peas = read.table(file.choose(), sep = "\t", comment.char = "#", row.names = 1, header = TRUE)
You can replace Peas with a name of your own choosing. The sep = "\t" part tells R that the data are separated by Tab characters, and the comment.char = "#" part tells R that the # character begins comment lines, which are ignored. In this case we require the first column to be treated as row names, so we use the row.names = 1 part.
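A goodness of fit test compares observed counts with an expected ratio. The sketch below assumes four categories and a 9:3:3:1 expected ratio; both the counts and the ratio are invented and may not match the contents of the file.

```r
# Hypothetical observed counts in 4 categories and an assumed expected ratio.
observed <- c(A = 116, B = 40, C = 31, D = 13)
ratio    <- c(9, 3, 3, 1)

# rescale.p = TRUE converts the ratio into proportions that sum to 1.
res <- chisq.test(observed, p = ratio, rescale.p = TRUE)
res$expected
res$p.value
```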

Example file for practice: Three sites.txt
Title: Sward height at three sites
File structure: The data comprise 3 columns; each is a sample of sward heights from a site.
Uses: We use these data to look at differences between more than two samples. We may use analysis of variance or a Kruskal-Wallis test. We can also use the data to create a graph.
Excel: To get the data into Excel, open the program and then the file. Tell Excel that the data are in delimited format and that the delimiter is the Tab character; Excel can then display the data in separate cells. The top few lines are for information and begin with the hash character #. You may ignore them or delete the rows.
R: To get these data into R we need to use a command like so:
Sward = read.table(file.choose(), sep = "\t", comment.char = "#", header = TRUE)
You can replace Sward with a name of your own choosing. The sep = "\t" part tells R that the data are separated by Tab characters, and the comment.char = "#" part tells R that the # character begins comment lines, which are ignored.
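Data with one column per sample usually need stacking into a value column and a grouping column before analysis of variance in R. A minimal sketch, with invented heights and hypothetical site names:

```r
# Hypothetical sward heights, one column per site.
sward <- data.frame(
  Site1 = c(3, 5, 4, 6, 5),
  Site2 = c(8, 9, 7, 10, 9),
  Site3 = c(5, 6, 7, 6, 5)
)

# stack() gives two columns: 'values' (the heights) and 'ind' (the site).
long <- stack(sward)

fit <- aov(values ~ ind, data = long)            # one-way anova
summary(fit)
kw <- kruskal.test(values ~ ind, data = long)    # non-parametric alternative
kw$p.value
```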

Example file for practice: Twoway anova xl.txt
Title: Two way anova in Excel
File structure: The data are in 3 columns; the first column shows the grazing regime and the next 2 columns show the abundance of a plant species in two sites.
Uses: The principal use for these data is to illustrate 2-way analysis of variance. These data are in a layout that only works sensibly in Excel. We may also use these data to create a graph.
Excel: To get the data into Excel, open the program and then the file. Tell Excel that the data are in delimited format and that the delimiter is the Tab character; Excel can then display the data in separate cells. The top few lines are for information and begin with the hash character #. You may ignore them or delete the rows.
R: These data are in a layout that is not really suitable for analysis in R. It would be possible to use the read.table() command in a similar manner to other data, but some rearrangement would be needed before any meaningful analyses could be carried out. The data are also presented in a more “normal” layout in a separate file (Twoway anova br.txt); see the following example.

Example file for practice: Twoway anova br.txt
Title: Plant abundance in relation to grazing and site
File structure: The file contains 3 columns; the first is the abundance of a plant species in a series of quadrats. The next column is the site where the observations were made and the final column shows the grazing treatment at that location. These data are the same as the previous example (Twoway anova xl.txt) but are in regular recording format.
Uses: The main use for these data is to illustrate 2-way analysis of variance. We may also use the data for other purposes, to draw a graph for example.
Excel: To get the data into Excel, open the program and then the file. Tell Excel that the data are in delimited format and that the delimiter is the Tab character; Excel can then display the data in separate cells. The top few lines are for information and begin with the hash character #. You may ignore them or delete the rows.
R: To get these data into R we need to use a command like so:
Graze = read.table(file.choose(), sep = "\t", comment.char = "#", header = TRUE)
You can replace Graze with a name of your own choosing. The sep = "\t" part tells R that the data are separated by Tab characters, and the comment.char = "#" part tells R that the # character begins comment lines, which are ignored.
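With the data in recording format (one response column plus two grouping columns), a 2-way analysis of variance can be sketched as follows. The values, the column names abund, site and graze, and the level names are all hypothetical.

```r
# Hypothetical abundances: 2 sites x 2 grazing treatments x 3 quadrats.
graze <- data.frame(
  abund = c(12, 14, 11, 20, 22, 19, 8, 9, 7, 15, 17, 16),
  site  = rep(c("Top", "Bottom"), each = 6),
  graze = rep(rep(c("Mown", "Unmown"), each = 3), times = 2)
)

# site * graze expands to both main effects plus their interaction.
fit <- aov(abund ~ site * graze, data = graze)
summary(fit)
```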

Example file for practice: Hoglouse Three sites.txt
Title: Hoglouse abundance at three sites
File structure: The data are in 3 columns, one for each site. Each column contains values for the abundance of a freshwater invertebrate at that site.
Uses: The main use for these data is to illustrate use of the Kruskal-Wallis test for differences. We might also use the data to draw a graph.
Excel: To get the data into Excel, open the program and then the file. Tell Excel that the data are in delimited format and that the delimiter is the Tab character; Excel can then display the data in separate cells. The top few lines are for information and begin with the hash character #. You may ignore them or delete the rows.
R: To get these data into R we need to use a command like so:
Hog = read.table(file.choose(), sep = "\t", comment.char = "#", header = TRUE)
You can replace Hog with a name of your own choosing. The sep = "\t" part tells R that the data are separated by Tab characters, and the comment.char = "#" part tells R that the # character begins comment lines, which are ignored.

Example file for practice: Regression.txt
Title: Butterflies and food – a regression
File structure: The data comprise 3 columns. The first is the abundance of a butterfly species. The next two contain values for the abundance of larval and adult food plants.
Uses: The principal use for these data is to illustrate multiple regression. We may also use these data for other purposes, drawing graphs for example.
Excel: To get the data into Excel, open the program and then the file. Tell Excel that the data are in delimited format and that the delimiter is the Tab character; Excel can then display the data in separate cells. The top few lines are for information and begin with the hash character #. You may ignore them or delete the rows.
R: To get these data into R we need to use a command like so:
Bfly = read.table(file.choose(), sep = "\t", comment.char = "#", header = TRUE)
You can replace Bfly with a name of your own choosing. The sep = "\t" part tells R that the data are separated by Tab characters, and the comment.char = "#" part tells R that the # character begins comment lines, which are ignored.

Example file for practice: Post pivot.txt
Title: Butterfly abundance at 3 sites – post pivot table
File structure: These data are in 3 columns, one for each site. The values represent the numbers of butterfly species seen on repeated visits to each site.
Uses: These data are the result of creating a pivot table using data we met earlier (Pivot.txt). In this form we may use them to carry out a differences test (e.g. Kruskal-Wallis or anova) or perhaps create a graph.
Excel: To get the data into Excel, open the program and then the file. Tell Excel that the data are in delimited format and that the delimiter is the Tab character; Excel can then display the data in separate cells. The top few lines are for information and begin with the hash character #. You may ignore them or delete the rows.
R: To get these data into R we need to use a command like so:
Bfly = read.table(file.choose(), sep = "\t", comment.char = "#", header = TRUE)
You can replace Bfly with a name of your own choosing. The sep = "\t" part tells R that the data are separated by Tab characters, and the comment.char = "#" part tells R that the # character begins comment lines, which are ignored.

Example file for practice: Pie chart.txt
Title: Garden birds and habitat selection
File structure: The data consist of 5 columns and 6 rows of values. The columns are various habitats and the rows are various common bird species. We have both row and column headings.
Uses: We use these data to illustrate the use of pie charts, but they are also suitable for a chi-squared test for association.
Excel: To get the data into Excel, open the program and then the file. Tell Excel that the data are in delimited format and that the delimiter is the Tab character; Excel can then display the data in separate cells. The top few lines are for information and begin with the hash character #. You may ignore them or delete the rows.
R: To get these data into R we need to use a command like so:
Birds = read.table(file.choose(), sep = "\t", comment.char = "#", row.names = 1, header = TRUE)
You can replace Birds with a name of your own choosing. The sep = "\t" part tells R that the data are separated by Tab characters, and the comment.char = "#" part tells R that the # character begins comment lines, which are ignored. In this case we require the first column (the bird species names) to be treated as row names, so we use the row.names = 1 part.

Example file for practice: Mayfly regression.txt
Title: Mayfly and multiple regression
File structure: These data comprise a column of mayfly sizes and 4 columns of habitat data.
Uses: The main use is to illustrate multiple regression; in the text we looked at stepwise regression. The data may also be used for other purposes such as summarizing samples and drawing graphs.
Excel: To get the data into Excel, open the program and then the file. Tell Excel that the data are in delimited format and that the delimiter is the Tab character; Excel can then display the data in separate cells. The top few lines are for information and begin with the hash character #. You may ignore them or delete the rows.
R: To get these data into R we need to use a command like so:
Mfly = read.table(file.choose(), sep = "\t", comment.char = "#", header = TRUE)
You can replace Mfly with a name of your own choosing. The sep = "\t" part tells R that the data are separated by Tab characters, and the comment.char = "#" part tells R that the # character begins comment lines, which are ignored.

Example file for practice: Beach hoppers.txt
Title: Californian beach hoppers logistic regression
File structure: The first column contains latitude information. The next two columns show the number of individuals at each latitude that had one version or the other of a particular allele. The final column is the proportion of individuals at that latitude with the second version of the allele.
Uses: The main use is to illustrate one form of logistic regression.
Excel: This example is intended to be used in R, but to get the data into Excel, open the program and then the file. Tell Excel that the data are in delimited format and that the delimiter is the Tab character; Excel can then display the data in separate cells. The top few lines are for information and begin with the hash character #. You may ignore them or delete the rows.
R: To get these data into R we need to use a command like so:
CBH = read.table(file.choose(), sep = "\t", comment.char = "#", header = TRUE)
You can replace CBH with a name of your own choosing. The sep = "\t" part tells R that the data are separated by Tab characters, and the comment.char = "#" part tells R that the # character begins comment lines, which are ignored.
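One way to fit a logistic regression to counts of two outcomes is to give glm() a two-column response. This is a sketch under assumptions: the column names lat, a1 and a2 and all the values are invented, and the book may fit the model in a different form (for example using the proportion column with weights).

```r
# Hypothetical latitudes and counts of individuals with each allele version.
cbh <- data.frame(
  lat = c(34, 36, 38, 40, 42, 44, 46, 48),
  a1  = c(30, 28, 25, 20, 18, 12, 10, 6),
  a2  = c(2, 5, 8, 12, 15, 20, 24, 28)
)

# cbind(successes, failures): here "success" means the second allele version.
fit <- glm(cbind(a2, a1) ~ lat, family = binomial, data = cbh)
coef(fit)

# Fitted proportion of the second version at each latitude.
fitted(fit)
```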