Let’s Practice!

In this section we will cover:

  1. Working directories
  2. Reading in data
  3. Finding help
  4. Basic data manipulation (renaming, subsetting, rotating)

Part I: Working Directories

A working directory is the folder on your computer where you are currently working. You can find out your current working directory by typing getwd()

> getwd()
[1] "/Users/zhian/Documents/Everhart/IntroR"

If you’ve downloaded and un-zipped this directory to your desktop, you might see something like /Users/<yourname>/Desktop/IntroR. This is the default place where R will begin reading and writing files. For example, you can use the function list.files() to view the files in your current working directory. These are the same files that we downloaded earlier. If you’re using Rstudio, you can compare the file list with the “Files” tab in the bottom right panel.

In order to use list.files(), we should provide it with a file path. We can provide it a “.”, which means “here” to most computer programs.

> list.files(".")
 [1] "analysis.R"            "data"                 
 [3] "docs"                  "IntroR.Rproj"         
 [5] "LICENSE"               "Makefile"             
 [7] "Part1-Introduction.R"  "Part2-Analysis.R"     
 [9] "Part3-Visualization.R" "parts.gif"            
[11] "README.md"            

You can see that the first entry in here is “data”. This is where we’ve placed the fungicide example data.

> list.files("data")
[1] "FungicideExample.csv" "README.md"           

Part II: Reading in Data

We can use the read.table() function to read these data in to R. It’s important to remember that while in R, these data are simply a copy kept in memory, not on the disk, so we don’t have to worry too much about accidentally deleting the data :).

So, how do we actually USE the read.table() function? A good first step to figuring out how you can use a function is to look at it’s help page. The way you can do that is by typing either help(“function_name”) or ?function_name.

> stop("
+ 
+      Type ?read.table and answer these three questions:
+      
+      1. What does it do? (Description)
+      2. What are the first three arguments and their defaults? (Usage/Arguments)
+      3. What does it return? (Value)
+      
+      ")
Error in eval(expr, envir, enclos): 

     Type ?read.table and answer these three questions:
     
     1. What does it do? (Description)
     2. What are the first three arguments and their defaults? (Usage/Arguments)
     3. What does it return? (Value)
     
     

In order to read our data into R, we will need to provide three things:

  1. The path to the data set : data/FungicideExample.csv
  2. If the first row are column names : yes
  3. The separator for each cell in the data : comma

Now that we have these elements, we can read the data into a variable, which we can call “fungicide”. Once we do this, we can check the dimensions to make sure that we have all of the data.

> fungicide <- read.csv("data/FungicideExample.csv", header = TRUE, sep = ",")
> nrow(fungicide)
[1] 9
> ncol(fungicide)
[1] 7

We can print the data to screen by simply typing its name

> fungicide
  Julian.Date TwentyOneThirtySevenWheat TwentyOneThirtySevenWheat.trt
1          97                      0.00                          0.00
2         104                      0.00                          0.00
3         111                      0.00                          0.00
4         118                      0.00                          0.00
5         125                      0.00                          0.00
6         132                      0.00                          0.00
7         139                      2.34                          1.81
8         146                      7.56                          7.89
9         154                     28.78                         15.04
  CutterWheat CutterWheat.Trt JaggerWheat JaggerWheat.Trt
1        0.00            0.00        0.00            0.00
2        0.00            0.00        0.00            0.00
3        0.00            0.00        0.00            0.00
4        0.00            0.00        0.00            0.00
5        0.00            0.00        0.00            0.00
6        0.00            0.00        0.00            0.00
7        1.15            1.79        1.85            2.27
8        3.62            2.40        6.92            5.00
9       17.89            6.21       47.39           20.17

We can also use the str() function (short for “structure”) to have a broad overview of what our data looks like. This is useful for data frames with many columns.

> str(fungicide)
'data.frame':   9 obs. of  7 variables:
 $ Julian.Date                  : int  97 104 111 118 125 132 139 146 154
 $ TwentyOneThirtySevenWheat    : num  0 0 0 0 0 ...
 $ TwentyOneThirtySevenWheat.trt: num  0 0 0 0 0 ...
 $ CutterWheat                  : num  0 0 0 0 0 ...
 $ CutterWheat.Trt              : num  0 0 0 0 0 0 1.79 2.4 6.21
 $ JaggerWheat                  : num  0 0 0 0 0 ...
 $ JaggerWheat.Trt              : num  0 0 0 0 0 ...

Part III: Finding the right tool for the job

The data presented here are from http://www.apsnet.org/edcenter/advanced/topics/EcologyAndEpidemiologyInR/DiseaseProgress/Pages/StripeRust.aspx and consist of three cultivars of wheat treated with different fungicides Jagger wheat is a succeptible variety whereas Cutter and 2137 are both resistant varieties. It is assumed that the fungicide will only be able to prevent new infections for two weeks after applicaton. With these data, we want to answer the following questions:

  1. How does the size of the fungicide effect compare to the effect of cultivar?
  2. Would it be the same as cultivar?

To answer these questions, we will use the summary statistic, Area Under the Disease Progress Curve (AUDPC).

So, how do you calculate this? You could code the trapezoid rule yourself, OR you could find a package that was designed for this.

> stop("
+ 
+     Do an internet search for AUDPC in R. What did you find?
+ 
+     ")
Error in eval(expr, envir, enclos): 

    Do an internet search for AUDPC in R. What did you find?

    

The first thing that likely popped up was the function audpc() in the agricolae package. If we want to use it, we can download the package to our computer with the function install.packages(). This will install a package from CRAN and place it into your R Library. Where is your R library? Type .libPaths() to find out!

> .libPaths()
> 
> install.packages("agricolae", repos = "https://cran.rstudio.com")

Now that we have the agricolae package installed, we can load into our R workspace using the library() function, which looks in our library and loads the functions and data sets within that package. You can find out about what functions are avialable with help(package = “agricolae”).

To accomplish our task, however, we need to find out how to use the audpc() function. First we should load the agricolae package

> library("agricolae")

Now we can look up help for audpc()

> ?audpc

This function will take in a data frame of severity and a vector of dates. Since our vector of dates is in the first column, we will need to manipulate the data frame by subsetting.

> jdate <- fungicide$Julian.Date # this is also the first column
> jdate
[1]  97 104 111 118 125 132 139 146 154
> fungicide.audpc <- audpc(evaluation = fungicide[, -1], dates = jdate, type = "relative")
Error:
The number of dates of evaluation 
must agree with the number of evaluations

Well this doesn’t look good. What can this error message mean? Let’s take a look at the audpc help page one more time; this time, we’ll look at the Examples section. Try to copy and paste the code one line at a time and see what happens. Check the evaluaton data relative to the dates and see how it’s relevant.

> stop("
+ 
+      Stop and look at the examples from ?audpc.
+      
+      ")
Error in eval(expr, envir, enclos): 

     Stop and look at the examples from ?audpc.
     
     

We can see from example 3, that the data must be arranged where dates are in separate columns. We need to transpose our data. To do this, we can use the t() function:

> t(fungicide[, -1])
                              [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
TwentyOneThirtySevenWheat        0    0    0    0    0    0 2.34 7.56
TwentyOneThirtySevenWheat.trt    0    0    0    0    0    0 1.81 7.89
CutterWheat                      0    0    0    0    0    0 1.15 3.62
CutterWheat.Trt                  0    0    0    0    0    0 1.79 2.40
JaggerWheat                      0    0    0    0    0    0 1.85 6.92
JaggerWheat.Trt                  0    0    0    0    0    0 2.27 5.00
                               [,9]
TwentyOneThirtySevenWheat     28.78
TwentyOneThirtySevenWheat.trt 15.04
CutterWheat                   17.89
CutterWheat.Trt                6.21
JaggerWheat                   47.39
JaggerWheat.Trt               20.17

Now we can plug this into the audpc() function:

> fungicide.audpc <- audpc(evaluation = t(fungicide[, -1]), dates = jdate, type = "relative")
> fungicide.audpc
    TwentyOneThirtySevenWheat TwentyOneThirtySevenWheat.trt 
                  0.033017544                   0.023158772 
                  CutterWheat               CutterWheat.Trt 
                  0.018729825                   0.009714035 
                  JaggerWheat               JaggerWheat.Trt 
                  0.044633333                   0.023521053 

This gives us a vector with the AUDPC values per treatment. The vector format does not make it particularly easy to compare the effect of cultivar or treatment. A better format to store these data would be in a table. Since all the values are numeric, we can place them into a matrix using the matrix() function. We are going to put the samples in rows and experiments in columns.

> fungicide.res <- matrix(fungicide.audpc, nrow = 3, ncol = 2, byrow = TRUE)
> fungicide.res
           [,1]        [,2]
[1,] 0.03301754 0.023158772
[2,] 0.01872982 0.009714035
[3,] 0.04463333 0.023521053

We have our result, but there are no labels for the rows and columns. Notice how we selected byrow = TRUE. This means that the matrix was filled row by row. So the first two elements of the vector would go in the first row, the next two would go in the second, and so on and so forth.

> names(fungicide.audpc)[1:2]
[1] "TwentyOneThirtySevenWheat"     "TwentyOneThirtySevenWheat.trt"
> names(fungicide.audpc)[3:4]
[1] "CutterWheat"     "CutterWheat.Trt"
> names(fungicide.audpc)[5:6]
[1] "JaggerWheat"     "JaggerWheat.Trt"

Since this is a small matrix, we can easily name these rows and columns ourselves using the rownames() and colnames() functions.

> rownames(fungicide.res) <- c("2137", "Cutter", "Jagger")
> colnames(fungicide.res) <- c("Control", "Treated")
> fungicide.res
          Control     Treated
2137   0.03301754 0.023158772
Cutter 0.01872982 0.009714035
Jagger 0.04463333 0.023521053

If we wanted to use these data as a table in a paper, we should export it to a csv file. We do this using the function write.table()

> dir.create("results")
> write.table(fungicide.res, file = "results/audpc.csv", sep = ",", 
+             col.names = NA,
+             row.names = TRUE)

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.