Workshop 1: Getting R to work for you R for Landscape Ecology Workshop Series Fall 2011 Paul Galpern NRI, University of Manitoba Contents 1 Workshop overview 2 2 First steps in R 2.1 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Challenge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 3 4 4 3 Help with help 3.1 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 5 5 4 R Objects 4.1 Example . . . . . . . . 4.1.1 Simple objects 4.1.2 Matrices . . . . 4.1.3 Data frames . . 4.2 Exercise . . . . . . . . 4.3 Challenge . . . . . . . 5 Manipulating 5.1 Example . 5.2 Exercise . 5.3 Challenge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 6 6 8 9 11 11 real data 12 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 1 1 Workshop overview For each of these three workshops I will prepare a document like this one. These will serve as a guide for us, and a record for you of what we did. The intended structure of our workshop will be: (1) Paul illustrates an example; (2) you complete a short exercise based on the example, while Paul is on-hand to assist; (3) if there is time, you can do the optional challenge which takes the exercise a step further. Please don’t hesitate to email me if you have R questions! There are also many useful resources online. University of Manitoba has over 50 R specific e-books in its library that are ready to download. Click here for a complete list. These are some of the more intriguing titles that relate to ecology, and only a sample of analyses that are fully-documented and ready to run in R. 1. A beginner’s guide to R 2. A tiny handbook of R 3. Introductory statistics with R 4. A handbook of statistical analyses using R 5. Using R for data management, statistical analysis, and graphics 6. Numerical ecology with R 7. A practical guide to ecological modelling 8. Mixed effects models and extensions in ecology with R 9. A primer of ecology with R 10. Applied spatial data analysis with R 11. Ecological models and data in R 12. A modern approach to regression with R 13. Data manipulation With R 14. Morphometrics with R 15. Nonlinear regression with R 16. An introduction to applied multivariate analysis with R 17. Hidden Markov models for time series 18. Introductory time series with R 19. Bioinformatics and computational biology solutions using R and Bioconductor 20. Extending the linear model with R 2 2 First steps in R Here we familiarize ourselves with the R workspace by writing a bit of code. 2.1 Example Here is some sample code. Note that lines that begin with a number sign are code comments and don’t have any effect in R. > # A textual announcement > cat("Hello world!") Hello world! # A graphical announcement # Create a blank graphical plot with dimensions 100 by 100 plot.default(c(1,100), c(1,100), type="n", xlab="", ylab="") # Add your announcement in the middle of the plot text(50, 50, "Hello world!", cex=5) 60 80 100 > > > > > 0 20 40 Hello world! 0 20 40 60 80 100 We entered the code in the console window, and we saw the plot in its own window. Now let’s see a demo of some basic R graphics. 3 > demo(graphics) Finally, let’s review some of the help system. We can enter the following in the console and use the search engine to explore the function plot > help.start() 2.2 Exercise 1. Open R, and manually type in the code listed above. 2. Add a different message to the ”Hello World” plot. 3. Determine the function of the cex parameter. 2.3 Challenge 1. Use the help. Figure out how to change the colour of your text. 4 3 Help with help 3.1 Example Every R function is documented, but the documentation can sometimes be obscure. Here are a few hints to make sense of it. 1. Search for what you are looking for using help.start() 2. Functions are each associated with a package. A few packages are loaded automatically when you turn on R. Everything else needs to be loaded using the library() function before you use it. For example: > library(stats) 3. Functions in R do things. They may have a statistical, graphical, or data manipulation purpose. They accept arguments (e.g. data or parameters), produce values (e.g. results), and sometimes have side effects (e.g. saving files, or making pictures). Some arguments need to be supplied, and others can be left out. Help should clear this up. 4. Most functions come with examples. You can run these examples, and learn LOTS about how to use each function with example(functionName). > library(stats) > example(glm) 5. Many functions and packages come with demos. You can see the full list of demos using the demo function. Each demo also produces code which indicates how to create the results. > demo() > # Run one of these demos > # demo(persp) Using help, example() and demo() goes a long way to getting R to work for you quickly. If we want to go beyond R as a ”black box” for your analyses, we need to understand a bit better how R works. This is the goal of the next section. 3.2 Exercise 1. Use demo() to find a list of demos. Then run one that interests you. 5 4 R Objects The goal in this section is to cover enough of the basics so that the help section becomes intelligible, and you can begin to independently use the resources that come with R. 4.1 4.1.1 Example Simple objects Let’s start by creating some R objects. Another more generic programming term for an object is a variable. The <- operator means ”assigned to”. > a <- 1 > a [1] 1 > hero <- "Superman" > hero [1] "Superman" > capitalsMatter <- 365 > capitalsMatter==365 [1] TRUE > > > > > + + + + # The following line will not work because object does not exist: # Capitalsmatter==365 # Some logic if (hero=="Superman") { cat(hero, "is my hero too!") } else { cat(hero, "is a villain!") } Superman is my hero too! The basic type of R object is the vector. Here are some examples: > B <- 1:10 > B [1] 1 2 3 4 5 6 7 8 9 10 > tenRandomNumbers <- rnorm(10) > tenRandomNumbers 6 [1] [7] 2.8875586 -0.2024185 -0.1590609 -0.2871859 1.0879563 1.3634984 -0.1082369 -0.4326746 1.1751339 -1.3016199 Note the c() below. This is a way of creating a vector > myChemMarks <- c(0.49, 0.75, 0.86, 0.67) > myChemMarks [1] 0.49 0.75 0.86 0.67 > politicalHeroes <- c("Obama", "Layton", "Gandhi", "Mandela") > politicalHeroes [1] "Obama" "Layton" "Gandhi" "Mandela" > warblerCount <- c(6, 4, 10, 4, 8, 0, 0, 0, 5) > warblerCount [1] 6 4 10 4 8 0 0 0 5 Now let’s do something useful with these vectors: > # How many political heroes do I have? > length(politicalHeroes) [1] 4 > # What is my chemistry average? > mean(myChemMarks) [1] 0.6925 > # How often did I fail a chemistry assignment? > myChemMarks [1] 0.49 0.75 0.86 0.67 > myChemMarks < 0.5 [1] TRUE FALSE FALSE FALSE > sum(myChemMarks < 0.5) [1] 1 > # At which point count locations did I see no warblers? > warblerCount [1] 6 4 10 4 8 0 0 0 5 > warblerCount == 0 [1] FALSE FALSE FALSE FALSE FALSE > which(warblerCount==0) [1] 6 7 8 7 TRUE TRUE TRUE FALSE > # I didn't count yellow-rumpeds, and I saw them everywhere > warblerCount <- warblerCount + 1 > warblerCount [1] 7 5 11 5 9 1 1 1 6 > which(warblerCount==0) integer(0) > ## Aah good! 4.1.2 I saw them everywhere! Matrices Matrices and data frames are two important types of R objects, and it is these that make R especially suitable for data analysis and statistics. Let’s make some matrices first. These are just the two-dimensional equivalent of vectors, and you can treat them in the same way. > # Create a 2 x 5 matrix using a vector of numbers from 1 to 10 > A <- matrix(data=1:10, nrow=2, ncol=5) > A [1,] [2,] [,1] [,2] [,3] [,4] [,5] 1 3 5 7 9 2 4 6 8 10 > # Scalar arithmetic operations on matrices > X <- A*2 > X [1,] [2,] [,1] [,2] [,3] [,4] [,5] 2 6 10 14 18 4 8 12 16 20 > Y <- A+4 > Y [1,] [2,] [,1] [,2] [,3] [,4] [,5] 5 7 9 11 13 6 8 10 12 14 > # A matrix of text > politicalMatrix <- matrix(data=politicalHeroes, nrow=2, ncol=2) > politicalMatrix [,1] [,2] [1,] "Obama" "Gandhi" [2,] "Layton" "Mandela" 8 > > # You can't multiply these however! > # politicalMatrix*2 4.1.3 Error! Data frames Data frames are like matrices but more flexible. Unlike matrices, which can only be of one type (e.g. all numbers, all text), data frames can mix these elements, making them very useful for data analysis. Also notice how R doesn’t mind when we split things onto multiple lines. But watch the nested brackets. > blackBirds <- data.frame(species=c("CORA", "AMCR", "COGR", "EUST"), + count1=c(1, 3, 1, 12), + count2=c(2, 2, 6, 8), + count3=c(0, 1, 0, 6)) > blackBirds 1 2 3 4 species count1 count2 count3 CORA 1 2 0 AMCR 3 2 1 COGR 1 6 0 EUST 12 8 6 We can pull out parts of the data frame as follows: > # Get a row > blackBirds[1, ] 1 species count1 count2 count3 CORA 1 2 0 > # Get a column > blackBirds[ , 1] [1] CORA AMCR COGR EUST Levels: AMCR COGR CORA EUST > # Get a column by name > blackBirds$species [1] CORA AMCR COGR EUST Levels: AMCR COGR CORA EUST > # Another way to get a column by name > blackBirds[, "species"] [1] CORA AMCR COGR EUST Levels: AMCR COGR CORA EUST 9 > # Get a range of columns > blackBirds[ , 2:4] 1 2 3 4 count1 count2 count3 1 2 0 3 2 1 1 6 0 12 8 6 > # Get a single datum > blackBirds[1,2] [1] 1 And we can change a single datum by assigning > blackBirds[1, "count1"] <- 7 Adding columns can often by helpful, for example to produce new variables that are derived from existing ones. Adding rows (i.e. adding a new observation) occurs less often in typical data analysis. Often rows will be added outside R at the stage of raw data preparation (e.g. in Excel). Examples of adding both columns and rows to data frames follow: > # Add a column called "commonName" > blackBirds$commonName <+ c("Common raven", + "American crow", + "Common grackle", + "European starling") > blackBirds 1 2 3 4 species count1 count2 count3 commonName CORA 7 2 0 Common raven AMCR 3 2 1 American crow COGR 1 6 0 Common grackle EUST 12 8 6 European starling > # Adding a row is more complicated, however > blackBirds <- rbind(blackBirds, + data.frame(species="RUBL", + count1=0, + count2=0, + count3=1, + commonName="Rusty blackbird")) > blackBirds 1 2 3 species count1 count2 count3 CORA 7 2 0 AMCR 3 2 1 COGR 1 6 0 commonName Common raven American crow Common grackle 10 4 5 EUST RUBL 12 0 8 0 6 European starling 1 Rusty blackbird > # Add a row total column derived from count columns > blackBirds$totals <- rowSums(blackBirds[ , 2:4]) > blackBirds 1 2 3 4 5 species count1 count2 count3 commonName totals CORA 7 2 0 Common raven 9 AMCR 3 2 1 American crow 6 COGR 1 6 0 Common grackle 7 EUST 12 8 6 European starling 26 RUBL 0 0 1 Rusty blackbird 1 4.2 Exercise 1. Create a vector containing the numbers 10 to 20 2. Create a vector containing the names of five people in the room 3. Create a data frame with your favourite restaurants. It should contain four columns and at least three rows. Columns should be ”name”, ”cuisineStyle”, ”costPerVisit” and ”timesVisited” 4. Find your average expenditure per visit 5. Find your total restaurant expenditure 4.3 Challenge 1. Create a vector containing 25 random numbers using the rnorm() function. 2. Find how many of these numbers are less than zero, and how many are greater than zero. 11 5 Manipulating real data Here we will import some data from a CSV file (produced by Excel) into an R data frame object. We will demonstrate some basic and more advanced manipulations of this data. The datasets slktbbs.csv and slktbbslatlon.csv gives observations from breeding bird survey point counts conducted in the vicinity of Sioux Lookout, Ontario between 2006 and 2010. You should have received a copy of these datasets. They can also be downloaded by clicking on filenames. 5.1 Example We’ll use the dataset slktbbs.csv in this session. The second dataset with lat/lon information will be useful for future tutorials when we explore spatial data. Let’s start by loading the csv file. To do this we need to know where on our computer’s file system the csv file has been saved. In case you are unfamiliar with the concept of a file path we’ll assume that you saved the csv files in your Windows default documents folder (My Documents on Windows XP). In R we can refer to this default documents folder using the tilde. Note that if you do give a file path in R you must replace back slashes with forward slashes! > > > > > > # Assumes the file slktbbs.csv is saved in documents folder bbs <- read.csv("~/slktbbs.csv") # It is now loaded into a data frame with object name bbs # Examine the entire data frame (note "V"iew) View(bbs) You can see by viewing the data frame that it is organized by route, year and species with the counts of the species at each of 50 route locations given as variables. It is not terribly useful in this format. Our first goal is to impose some order on this. Add up counts for 25 consecutive stops. This gives us two abundance observations for each route/year/species combination. We’ll turn these into two new columns (or variables). > # Use column names to find column numbers > names(bbs) [1] [5] [9] [13] "route" "species" "stop3" "stop7" "year" "habitat" "stop4" "stop8" "aou" "stop1" "stop5" "stop9" 12 "speciesname" "stop2" "stop6" "stop10" [17] [21] [25] [29] [33] [37] [41] [45] [49] [53] > > > > > + "stop11" "stop15" "stop19" "stop23" "stop27" "stop31" "stop35" "stop39" "stop43" "stop47" "stop12" "stop16" "stop20" "stop24" "stop28" "stop32" "stop36" "stop40" "stop44" "stop48" "stop13" "stop17" "stop21" "stop25" "stop29" "stop33" "stop37" "stop41" "stop45" "stop49" "stop14" "stop18" "stop22" "stop26" "stop30" "stop34" "stop38" "stop42" "stop46" "stop50" # Sum across rows of selected columns to make 2 new columns bbs$stops1to25 <- rowSums(bbs[ , 7:31]) bbs$stops26to50 <- rowSums(bbs[ , 32:56]) # Make a smaller data frame with just what we need birds <- bbs[, c("route", "year", "species", "habitat", "stops1to25", "stops26to50")] We should also know the total abundance for each route/year/species. We can add the two new variable columns we made. This is vector addition, so the first element of the first vector is added to the first element of the second vector, and so on. > # Create another new column > birds$total <- birds$stops1to25 + birds$stops26to50 Before we move on, let’s take a look at our analysis-ready data frame. We can use View() as shown above, but we can also look at it in two additional ways: > head(birds) 1 2 3 4 5 6 route SilverDollar SilverDollar SilverDollar SilverDollar SilverDollar SilverDollar year species habitat stops1to25 stops26to50 total 2006 COLO water 2 4 6 2006 HEGU water 0 6 6 2006 COGO water 0 1 1 2006 AMBI wetland 3 0 3 2006 WISN wetland 5 0 5 2006 RUGR understory 4 1 5 > # Summarize the structure of the data frame > str(birds) 'data.frame': $ route : $ year : $ species : $ habitat : $ stops1to25 : $ stops26to50: $ total : 351 obs. of 7 variables: Factor w/ 2 levels "BigSandy","SilverDollar": 2 2 2 .. int 2006 2006 2006 2006 2006 2006 2006 2006 2006 20.. Factor w/ 83 levels "ALFL","AMBI",..: 24 39 22 2 75 .. Factor w/ 5 levels "forested","open",..: 4 4 4 5 5 3.. num 2 0 0 3 5 4 0 1 0 0 ... num 4 6 1 0 0 1 1 0 3 1 ... num 6 6 1 3 5 5 1 1 3 1 ... 13 Notice how each column or variable in the data frame is given a different row in this summary. The year variable is currently an integer ”int”. We need to convert this to a ”factor” so we can use it as a grouping variable in the subsequent steps. > birds$year <- as.factor(birds$year) Now comes the interesting part! Let’s ask some questions about the data by grouping observations in different ways. A powerful function for this purposes is aggregate(). We can tell aggregate() to use an entire data frame. This saves us constantly referring to the data frame object. We also use a ”formula” which appears often in R. A formula uses a tilde to separate the dependent from the independent variables. Although in this case we are not doing statistics and dependence doesn’t matter, this is a convenient way of saying how we want things to be grouped. The following finds the total number of birds found in each habitat, aggregating all data from years and routes. > # The bird abundance in each habitat > aggregate(total ~ habitat, data=birds, function(subtotal) sum(subtotal)) habitat total 1 forested 1772 2 open 148 3 understory 865 4 water 154 5 wetland 162 In the above example, aggregate sorts the total column into five groups representing each habitat type. It creates a vector called subtotal with the totals just for that group. The last part ”sum(subtotal)” takes that vector and sums it to produce the total abundance for just that group. We can find abundance by habitat and year in the same way > # The bird abundance in each habitat for each year > aggregate(total ~ habitat + year, data=birds, + function(subtotal) sum(subtotal)) 1 2 3 4 5 6 7 8 habitat forested open understory water wetland forested open understory year total 2006 544 2006 34 2006 277 2006 35 2006 46 2007 319 2007 31 2007 133 14 9 water 2007 10 wetland 2007 11 forested 2008 12 open 2008 13 understory 2008 14 water 2008 15 wetland 2008 16 forested 2009 17 open 2009 18 understory 2009 19 water 2009 20 wetland 2009 21 forested 2010 22 open 2010 23 understory 2010 24 water 2010 25 wetland 2010 31 27 523 45 244 65 51 194 25 118 16 20 192 13 93 7 18 What about species richness? Again, we need to ask for what subgroups in the data set we want species richness (i.e. by year, by route, by year and route). We’ll use a different approach because now we want to count the number of group members rather than perform a calculation based on the abundance in that group. We will find the subset of the data frame we are interested in and then count how many unique bird species names are in that subset. > # Species richness for Big Sandy route all years combined > nrow(unique(subset(birds, route=="BigSandy", "species"))) [1] 71 > # Species richness for the Silver Dollar route all years combined > nrow(unique(subset(birds, route=="SilverDollar", "species"))) [1] 69 > # Species richness for wetland habitats in 2008 > nrow(unique(subset(birds, + habitat=="wetland" & year=="2008", "species"))) [1] 7 > # Species richness for the dataset > length(unique(birds$species)) [1] 83 Now for some plots. On the subsequent page we use aggregate() and subset() to give us full control over a plot. We also can specify a subset with aggregate(). This code looks complicated, but don’t be alarmed. All the principles presented 15 here have been covered in previous pages. Use ?points ?lines and ?plot.default to get more info on these functions. We could go much further to make a more useful plot, with legend etc. But for simplicity we’ll stop here. 5.2 Exercise 1. Use read.csv() to load the slktbbs.csv dataset 2. Make the column variable ”total” as shown 3. Use aggregate to find abundance totals by species 4. Use aggregate and the subset= option to find abundance totals by species in open habitats 5.3 Challenge 1. Modify the plotting code provided to use colour 2. Then, modify the code provided to create a per-year plot of abundance for four species in open habitats on the Silver Dollar route 3. Then, create a third plot showing how species richness and total bird abundance for Silver Dollar in each of five years. 16 # Put an aggregated table into an object aggBird <- aggregate(total ~ species + year, data=birds, subset=(route=="SilverDollar"), function(subtotal) sum(subtotal)) # Turn year back into a number for plotting aggBird$year <- as.integer(as.character(aggBird$year)) # Create an empty plot plot.default(x=c(min(aggBird[,"year"]), max(aggBird[, "year"])), y=c(0, max(aggBird[, "total"])), type="n", xlab="Year", ylab="Abundance") # Plot data for three species subBird <- subset(aggBird, species=="WTSP", c("year", "total")) points(subBird, pch=20) lines(subBird, lty="solid") subBird <- subset(aggBird, species=="ALFL", c("year", "total")) points(subBird, pch=21) lines(subBird, lty="dashed") subBird <- subset(aggBird, species=="COWA", c("year", "total")) points(subBird, pch=22) lines(subBird, lty="dotted", lwd=4) 80 > > + + > > > > + + > > > > > > > > > > ● ● ● 60 ● 40 Abundance ● 20 ● ● ● ● 2009 2010 0 ● 2006 2007 2008 Year Figure 1: Abundance for three species on the Silver Dollar route. Whitethroated sparrow (solid), Alder flycatcher (dashed), Connecticut warbler (dotted) 17
© Copyright 2024