In this section we will cover:
A working directory is the folder on your computer where you are currently working. You can find out your current working directory by typing getwd()
> getwd()
[1] "/Users/zhian/Documents/Everhart/IntroR"
If you’ve downloaded and un-zipped this directory to your desktop, you might see something like /Users/<yourname>/Desktop/IntroR
. This is the default place where R will begin reading and writing files. For example, you can use the function list.files()
to view the files in your current working directory. These are the same files that we downloaded earlier. If you’re using Rstudio, you can compare the file list with the “Files” tab in the bottom right panel.
In order to use list.files()
, we should provide it with a file path. We can provide it a “.”, which means “here” to most computer programs.
> list.files(".")
[1] "analysis.R" "data"
[3] "docs" "IntroR.Rproj"
[5] "LICENSE" "Makefile"
[7] "Part1-Introduction.R" "Part2-Analysis.R"
[9] "Part3-Visualization.R" "parts.gif"
[11] "README.md"
You can see that the first entry in here is “data”. This is where we’ve placed the fungicide example data.
> list.files("data")
[1] "FungicideExample.csv" "README.md"
We can use the read.table()
function to read these data in to R. It’s important to remember that while in R, these data are simply a copy kept in memory, not on the disk, so we don’t have to worry too much about accidentally deleting the data :).
So, how do we actually USE the read.table()
function? A good first step to figuring out how you can use a function is to look at it’s help page. The way you can do that is by typing either help(“function_name”) or ?function_name.
> stop("
+
+ Type ?read.table and answer these three questions:
+
+ 1. What does it do? (Description)
+ 2. What are the first three arguments and their defaults? (Usage/Arguments)
+ 3. What does it return? (Value)
+
+ ")
Error in eval(expr, envir, enclos):
Type ?read.table and answer these three questions:
1. What does it do? (Description)
2. What are the first three arguments and their defaults? (Usage/Arguments)
3. What does it return? (Value)
In order to read our data into R, we will need to provide three things:
Now that we have these elements, we can read the data into a variable, which we can call “fungicide”. Once we do this, we can check the dimensions to make sure that we have all of the data.
> fungicide <- read.csv("data/FungicideExample.csv", header = TRUE, sep = ",")
> nrow(fungicide)
[1] 9
> ncol(fungicide)
[1] 7
We can print the data to screen by simply typing its name
> fungicide
Julian.Date TwentyOneThirtySevenWheat TwentyOneThirtySevenWheat.trt
1 97 0.00 0.00
2 104 0.00 0.00
3 111 0.00 0.00
4 118 0.00 0.00
5 125 0.00 0.00
6 132 0.00 0.00
7 139 2.34 1.81
8 146 7.56 7.89
9 154 28.78 15.04
CutterWheat CutterWheat.Trt JaggerWheat JaggerWheat.Trt
1 0.00 0.00 0.00 0.00
2 0.00 0.00 0.00 0.00
3 0.00 0.00 0.00 0.00
4 0.00 0.00 0.00 0.00
5 0.00 0.00 0.00 0.00
6 0.00 0.00 0.00 0.00
7 1.15 1.79 1.85 2.27
8 3.62 2.40 6.92 5.00
9 17.89 6.21 47.39 20.17
We can also use the str()
function (short for “structure”) to have a broad overview of what our data looks like. This is useful for data frames with many columns.
> str(fungicide)
'data.frame': 9 obs. of 7 variables:
$ Julian.Date : int 97 104 111 118 125 132 139 146 154
$ TwentyOneThirtySevenWheat : num 0 0 0 0 0 ...
$ TwentyOneThirtySevenWheat.trt: num 0 0 0 0 0 ...
$ CutterWheat : num 0 0 0 0 0 ...
$ CutterWheat.Trt : num 0 0 0 0 0 0 1.79 2.4 6.21
$ JaggerWheat : num 0 0 0 0 0 ...
$ JaggerWheat.Trt : num 0 0 0 0 0 ...
The data presented here are from http://www.apsnet.org/edcenter/advanced/topics/EcologyAndEpidemiologyInR/DiseaseProgress/Pages/StripeRust.aspx and consist of three cultivars of wheat treated with different fungicides Jagger wheat is a succeptible variety whereas Cutter and 2137 are both resistant varieties. It is assumed that the fungicide will only be able to prevent new infections for two weeks after applicaton. With these data, we want to answer the following questions:
To answer these questions, we will use the summary statistic, Area Under the Disease Progress Curve (AUDPC).
So, how do you calculate this? You could code the trapezoid rule yourself, OR you could find a package that was designed for this.
> stop("
+
+ Do an internet search for AUDPC in R. What did you find?
+
+ ")
Error in eval(expr, envir, enclos):
Do an internet search for AUDPC in R. What did you find?
The first thing that likely popped up was the function audpc()
in the agricolae package. If we want to use it, we can download the package to our computer with the function install.packages()
. This will install a package from CRAN and place it into your R Library. Where is your R library? Type .libPaths()
to find out!
> .libPaths()
>
> install.packages("agricolae", repos = "https://cran.rstudio.com")
Now that we have the agricolae package installed, we can load into our R workspace using the library()
function, which looks in our library and loads the functions and data sets within that package. You can find out about what functions are avialable with help(package = “agricolae”).
To accomplish our task, however, we need to find out how to use the audpc()
function. First we should load the agricolae package
> library("agricolae")
Now we can look up help for audpc()
> ?audpc
This function will take in a data frame of severity and a vector of dates. Since our vector of dates is in the first column, we will need to manipulate the data frame by subsetting.
> jdate <- fungicide$Julian.Date # this is also the first column
> jdate
[1] 97 104 111 118 125 132 139 146 154
> fungicide.audpc <- audpc(evaluation = fungicide[, -1], dates = jdate, type = "relative")
Error:
The number of dates of evaluation
must agree with the number of evaluations
Well this doesn’t look good. What can this error message mean? Let’s take a look at the audpc help page one more time; this time, we’ll look at the Examples section. Try to copy and paste the code one line at a time and see what happens. Check the evaluaton data relative to the dates and see how it’s relevant.
> stop("
+
+ Stop and look at the examples from ?audpc.
+
+ ")
Error in eval(expr, envir, enclos):
Stop and look at the examples from ?audpc.
We can see from example 3, that the data must be arranged where dates are in separate columns. We need to transpose our data. To do this, we can use the t()
function:
> t(fungicide[, -1])
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
TwentyOneThirtySevenWheat 0 0 0 0 0 0 2.34 7.56
TwentyOneThirtySevenWheat.trt 0 0 0 0 0 0 1.81 7.89
CutterWheat 0 0 0 0 0 0 1.15 3.62
CutterWheat.Trt 0 0 0 0 0 0 1.79 2.40
JaggerWheat 0 0 0 0 0 0 1.85 6.92
JaggerWheat.Trt 0 0 0 0 0 0 2.27 5.00
[,9]
TwentyOneThirtySevenWheat 28.78
TwentyOneThirtySevenWheat.trt 15.04
CutterWheat 17.89
CutterWheat.Trt 6.21
JaggerWheat 47.39
JaggerWheat.Trt 20.17
Now we can plug this into the audpc()
function:
> fungicide.audpc <- audpc(evaluation = t(fungicide[, -1]), dates = jdate, type = "relative")
> fungicide.audpc
TwentyOneThirtySevenWheat TwentyOneThirtySevenWheat.trt
0.033017544 0.023158772
CutterWheat CutterWheat.Trt
0.018729825 0.009714035
JaggerWheat JaggerWheat.Trt
0.044633333 0.023521053
This gives us a vector with the AUDPC values per treatment. The vector format does not make it particularly easy to compare the effect of cultivar or treatment. A better format to store these data would be in a table. Since all the values are numeric, we can place them into a matrix using the matrix()
function. We are going to put the samples in rows and experiments in columns.
> fungicide.res <- matrix(fungicide.audpc, nrow = 3, ncol = 2, byrow = TRUE)
> fungicide.res
[,1] [,2]
[1,] 0.03301754 0.023158772
[2,] 0.01872982 0.009714035
[3,] 0.04463333 0.023521053
We have our result, but there are no labels for the rows and columns. Notice how we selected byrow = TRUE
. This means that the matrix was filled row by row. So the first two elements of the vector would go in the first row, the next two would go in the second, and so on and so forth.
> names(fungicide.audpc)[1:2]
[1] "TwentyOneThirtySevenWheat" "TwentyOneThirtySevenWheat.trt"
> names(fungicide.audpc)[3:4]
[1] "CutterWheat" "CutterWheat.Trt"
> names(fungicide.audpc)[5:6]
[1] "JaggerWheat" "JaggerWheat.Trt"
Since this is a small matrix, we can easily name these rows and columns ourselves using the rownames()
and colnames()
functions.
> rownames(fungicide.res) <- c("2137", "Cutter", "Jagger")
> colnames(fungicide.res) <- c("Control", "Treated")
> fungicide.res
Control Treated
2137 0.03301754 0.023158772
Cutter 0.01872982 0.009714035
Jagger 0.04463333 0.023521053
If we wanted to use these data as a table in a paper, we should export it to a csv file. We do this using the function write.table()
> dir.create("results")
> write.table(fungicide.res, file = "results/audpc.csv", sep = ",",
+ col.names = NA,
+ row.names = TRUE)