PIRE 2010 R Tutorial Exercises

Authors: Scott Stark and Brad Christoffersen
Sources: http://www.cyclismo.org/tutorial/R/index.html
         "An Introduction to R" at http://www.r-project.org/
         Our brains


Table of Contents


1 Introduction

We will start by loading data into the R Environment where it will become an R object with a name we have supplied. Our first R object will be a 'data frame,' which is a type of object that is conceptually similar to a spread sheet. Data is arrayed in rows and columns; columns can be special formats that correspond to the type of data that they contain, for example, categorical or 'factor' data.

Before you go any further, some spatial arrangement of windows on your screen may help. We suggest resizing this window so that you can see at least a reasonable portion of the R console window either below or beside this window.


Loading Data

NOTE: The assignment operator (‘<-’), which consists of these two characters strictly side-by-side, 'points' to the object receiving the value of the expression. Note that the assignment operator can 'point' in the other direction too (->)

Tell R where to look for data

     > WD <- getwd() # Store original working directory in 'WD'
     > WD

Set the new working directory--you need to fill in the below file path in the next line of code below! In Windows, the file explorer has an option to 'show path' that can help. On a mac just search for the file name in Finder and the full path is displayed when you click on the correct result. Also note that the mac GUI has a function 'Change Working Directory' under 'Misc'. If you use this function make sure to retrieve your path with getwd() and save it in your R script!

     > path <- "C:\\PATH\\TO\\LOCATION\\OF\\FILE" # Windows Format
     > path <- "/Users/YOU/AmazonPIRE/R_Tutorial" # Mac Format
     > setwd(path)

Important notes: 

Windows machines use "\\" to separate folders in a path designation. 
Linux and Mac use "/". This tutorial switches between them but you should use only the
form suitable for your system.

In current use, X = Y means the same thing as X <- Y. (The more familiar X = Y did
not exist in early versions of R. )
Check that "direct.diffuse.data.csv" is there in the folder (download and save from http://people.seas.harvard.edu/~swofsy/PIRE/direct.diffuse.data.csv.

     > dir() # Lists the working directory contents.
Load in Radiation data using the R function 'read.csv()'. 'as.is=TRUE' makes sure that data is not automatically formatted as factors.

     > rad.data <- read.csv("direct.diffuse.data.csv", as.is=TRUE)


Retrieving and setting important attributes of data

Take a look at the number of rows and columns in the data. 'rad.data' has 2 dimensions: Rows and Columns. 'dim(rad.data)' tells you how many rows and columns exist

     > dim(rad.data) # "[1] 13244     8" is the result (13244 rows by 8 columns)
     > nrow(rad.data)
     > ncol(rad.data)
Display the current column names

     > colnames(rad.data)
Rename the 8 columns to what we want

     > colnames(rad.data) <- c("Y", "M", "D", "H", "solar.dec", "predict.rad.Wm2",
                        "tot.rad.Wm2", "diffuse.Wm2")
'cbind()' binds vectors together as columns, making a new table. cbind( 1st vector, 2nd vector, 3rd vector, ... ). NOTE: the vectors must be the same length!! ':' gives a sequence of integers from the first to last integer. Example

     > 1:ncol(rad.data)
gives the numbers 1 2 3 4 5 6 7 8. Below we make a key of the column numbers and their names for quick reference

     > key <- cbind(1:ncol(rad.data),colnames(rad.data))
Take a look. This tells us what column names go with which column numbers

     > key


How to access elements of data in R

'[]' with numbers inside will subset elements of objects in R. '[]' is TOTALLY different than '()'! Example

     > a <- c(3,2,6,9,29)
Think of 'a' as stored like this in your computer:

#  a[1]    a[2]    a[3]    a[4]    a[5]
#-----------------------------------------
#|       |       |       |       |       |
#|   3   |   2   |   6   |   9   |  29   |
#|       |       |       |       |       |
#-----------------------------------------
Try accessing some elements of 'a'

     > a[1]        # 3
     > a[3]        # 29
     > a[1:3]      # 3 2 6
     > a[1,3,5]    # ERROR
     > a[c(1,3,5)] # 3  6  29
Why was there an Error above? use '[]' for vectors (1 dimension), and use '[,]' for matrices and dataframes (2 dimensions).

OK, back to our radiation data. Display the 1st 24 rows of data:

     > rad.data[1:24,] # All columns are displayed by default.
Display the first 24 entries of Column 1 in the data. NOTE: As a single column of data, R displays it on the screen left-to-right

     > rad.data[1:24,1]
Display the first 24 entries of Columns 1 - 5 in the data.

     > rad.data[1:24,1:5]
Displaying Columns 1 - 5, and 7, the column numbers must be given inside 'c()'

     > rad.data[1:24,c(1:5,7)]
Selecting certain columns can be done using their names too.

     > key # Look at the key first
     > rad.data[1:24,c("Y", "M", "D", "H", "solar.dec", "tot.rad.Wm2")]
Suppose we wanted to look at all columns EXCEPT columns 1-5 and 7. A minus sign can take care of that for us:

     > rad.data[1:24,-c(1:5,7)]
We can also look at single columns of dataframes using the '$' operator. Note: no comma is used in '[]' because it is only a single column

     > rad.data$predict.rad.Wm2[1:24] # R displays it on the screen left-to-right


Basic arithmetic operations on vectors and matrices

Some important arithmetic operators in R are:

     + (add)
     - (subtract)
     * (multiply)
     / (divide)
     ^ (exponent)
     %% (remainder of division)
See

     > ?"+"
to get more information on these operators. (Note that to get help on operators, you must put them in quotes, after the '?' symbol.

The order of evaluation of operators in an arithmetic expression is similar to the order learned in mathematics. Technically, R calculates things in this order: 1) Things inside '()', 2) '^', 3) '*' and '/', 4) '+' and '-'. Note that R has several other operators, (like ':'). A good rule of thumb is, when in doubt about the order something will be executed, put things in parentheses such as to get the order of execution that you desire.

     > 7 + (5 * 2) * 2 ^ 4 / 8
     > 7 + 10 * 2 ^ 4 / 8
     > 7 + 10 * 16 / 8
     > 7 + 10 * 2
     > 7 + 20
To add, subtract, multiply, and divide vectors, R performs the operation for each element of a vector. NOTE: Vectors must be the same length! Examine the output of each of the below statements.

     > a + a
     > a + 2
     > a - a
     > a - 2
     > a * a
     > a * 2
     > a / a
     > a / 2
     > a + c(1:5)
     > a + c(1:6) # Not what we want: vectors 'a' and 'c(1:6)' are not the same length!
OK, those were vectors. What about making a matrix? Note the differences below.

     > mat.a <- matrix(c(1:15), nrow=5, ncol=3, byrow=TRUE)
     > mat.a
     > mat.b <- matrix(c(1:15), nrow=5, ncol=3, byrow=FALSE)
     > mat.b
To add, subtract, multiply, and divide for matrices, NOTE that matrices must have the same dimensions!

     > mat.a + mat.a
     > mat.a + 2
     > mat.a - mat.a
     > mat.a - 2
     > mat.a * mat.a
     > mat.a * 2
     > mat.a / mat.a
     > mat.a / 2
     > mat.a / rbind(mat.a, c(1,2,3))  # ERROR One matrix has 5 rows, the other 6
For those familiar with linear algebra, it is important to note that the multiplication above with '*' is not "Matrix Multiplication" but instead the multiplication of the elments of one matrix by the corresponding elements of another. Linear algebra operators are available. For example

     > T.mat.a <- t(mat.a) # the transpose of mat.a
     > T.mat.a
     > mat.a%*%T.mat.a # is "Matrix Multiplication" (see: en.wikipedia.org/wiki/Matrix_multiplication)
To add, subtract, multiply, and divide each column of a matrix by a vector, we must turn the vector into a matrix first. For example, 'a' is 5 elements long, and 'mat.a' is 5 rows and 3 columns. 'rep(vector, n)' will repeat 'vector' n times. Note the difference:

     > rep(a, 3)
     > rep(a, each=3)
Repeat 'a' 3 times as a column, making it a 5 x 3 matrix

     > matrix(rep(a, 3), nrow=5, ncol=3, byrow=FALSE)
Turn

     > mat.a + matrix(rep(a, 3), nrow=5, ncol=3, byrow=FALSE)
We strongly suggest that you go through the first 3 tutorials (more if you like) at http://www.cyclismo.org/ tutorial/R/index.html for more basic info on classes of R objects and basic R styntax and object manipulation.


2 Subsetting and Modifying Data


2.1 Logical operators and deMorgan's Laws

Logical operators in R are very important. '|' means logical OR (Only one component must be true to evaluate to TRUE); '&' means logical AND (Both components must be true to evaluate to TRUE); '!' means logical NOT (Turns TRUE into FALSE and vice versa).

Let's examine how these work with some "logical vectors," which are nothing more than vectors of trues and falses. We will apply the logical operators on these vectors (remember that R will apply these operators to each element of the vectors).

     > a <- c(TRUE,FALSE,TRUE,FALSE)
     > b <- c(TRUE,TRUE,FALSE,FALSE)
     > a | b   # Applies logical OR to each corresponding element of 'a' and 'b'
     > a & b   # Applies logical AND to each corresponding element of 'a' and 'b'
     > !a      # Applies logical NOT to each element of 'a'
     > !b      # Applies logical NOT to each element of 'b'
Given the definitions of logical operators above, DeMorgan's Laws state that: !(a | b) is logically equivalent to !a & !b, and !(a & b) is logically equivalent to !a | !b Try it for yourself (note, we put 2 statements on the same line by separating them with a ';')

     > !(a|b); !a&!b   # These two are EQUIVALENT
     > !(a&b); !a|!b   # These two are EQUIVALENT
		# Note that the ";" tells R to consider what follows as a seperate line of code.
Save a handy table:

     > truth <- cbind(a=a, NOT.a=!a, b=b, NOT.b=!b, a.OR.b=(a|b),
      NOT_a.OR.b=!(a|b), NOT.a_AND_NOT.b=!a&!b,   # Note these two are the same
      a.AND.b=(a&b),
      NOT_a.AND.b=!(a&b), NOT.a_OR_NOT.b=!a|!b)   # Note these two are the same
     > truth


2.2 Index vectors

Now, back to our radiation data. Re-set the working directory if you need to (if you have closed and opend R again since the last time you were in this directory). Use the same path you set previously,

     > path <- "C:\\PATH\\TO\\LOCATION\\OF\\FILE"
     > setwd(path)
Load in data and rename columns

     > rad.data <- read.csv("direct.diffuse.data.csv", as.is=TRUE)
     > colnames(rad.data) <- c("Y", "M", "D", "H", "solar.dec", "predict.rad.Wm2",
                        "tot.rad.Wm2", "diffuse.Wm2")
Make the key again if you need to

     > key <- cbind(1:ncol(rad.data),colnames(rad.data)); key
When we looked at the data we saw that a lot of NaN values in the last two columns, total and diffuse radiation. Let's get rid of NaNs, which are not numbers the system recognizes, and any potential 'NAs,' or missing data. We can do this by using index (or logical) vectors in 4 steps:

1. Index vector: 'TRUE' if 'NaN' in Column 7, and 'FALSE' if 'NaN' in Column 7

     > pick.C7.nan <- is.nan(rad.data[,7])
2. Index vector: 'TRUE' if 'NA' in Column 7, and 'FALSE' if 'NA' in Column 7

     > pick.C7.na <- is.na(rad.data[,7])
3. Index vector: 'TRUE' if 'NA' OR 'NaN' in Column 7

     > pick.C7.na.nan <- pick.C7.nan | pick.C7.na
4. "Pick" out all rows which DO NOT have 'NaN' or 'NA'

     > RAD <-rad.data[!pick.C7.na.nan,]
     > RAD[1:24,]
Note that we have done this in a series of steps by creating intermediate R objects. This is not strictly necessary, but it greatly enhances readability of your code, as does the use of many informative comments. The less readable but overall more compact option is to do all the steps at once.

     > RAD.II <- rad.data[!((is.nan(rad.data[,7]))|(is.na(rad.data[,7]))),]
		# But, we do not need a second version of the same object.  Lets get rid of it.
     > rm(RAD.II)


2.3 Changing contents of data

Suppose we want to change all NaN's in Column 6 of RAD to -9999. Make a new index vector that will select only the NaN's in Column 6.

     > pick.C6.nan <- is.nan(rad.data$predict.rad.Wm2)
     > rad.data[pick.C6.nan,6] <- -9999
     > rad.data[1:24,]
Now let's change them back to NaN. '<' is another logical operator, meaning less than. It evaluates each element of the vector to see if it is less than some value.

     > pick.C6.9999 <- (rad.data$predict.rad.Wm2 < -9998)
     > rad.data[pick.C6.nan,6] <- NaN
     > rad.data[1:24,]


2.4 'for()' and 'while()': How to make your computer work for you.

2.4.1 'for()'

Suppose we wanted to insert -9999 for ALL NaN values in the data (columns 5-8). How do we do this? (Note that there are more than one solutions to this problem, which is usually true in R.) Let's utilize the same procedure as above on a column-by-column basis, except we'll use a "loop" rather than just copying and modifying the basic code for each column. NOTE: Using the "copy" aproach to doing a repatative task in R is usually unnecessary. It is better to automate the task and keep your code compact so that you can review it an understand it more easily. The for() loop--and while() loop--represent the most basic ways to automate tasks.

Execute the code line-by-line; an explanation of how it works is below.

     > rad.data[1:24,]
     > for(i in 5:8){
            pick.nan <- is.nan(rad.data[,i])
            rad.data[pick.nan,i] <- -9999
          }
     > rad.data[1:24,]
How did that work? 'for()' loops have 3 components: Sequence: in this case, 5 6 7 8. Index: 'i' in this case, which takes on each successive value in sequence. Expression: all commands inside the closed brackets '{ }'

Revert back to NaN

     > rad.data[1:24,]
     > for(i in 5:8){
     >   pick.9999 <- (rad.data[,i] < -9998)
     >   rad.data[pick.9999,i] <- NaN
     > }
     > rad.data[1:24,]

2.4.1 'while()'

Probably 'for' is the most used and important type of loop in R. Still there is another type of loop that can do things that 'for' cannot, 'while.' We will breifly intruduce this here. If you do not completely understand the example at first do not worry too much, this is something that you may want to reference when you have a little more experience. To show why the 'while' loop is special we will introduce another important function of R, RANDOM NUMBER GENERATION. R can generate random numbers from many statistical sampling distributions like the most famous normal distribution.

     > rnorm(10,mean=0,sd=5) # Gives 10 samples from a normal distribution centered on zero with a standard deviation of 5.
Suppose we want 25 random samples (or any number, 1 million, for example) from this distribution but only from between the ranges -1 and 1. This is 'TRUNCATED' random sampling. Note that many of the samples that we drew above are not in this range. How do we avoid these and get exactly 25 samples? We can use a while loop to do this. 'while()' takes the action you tell it (inside "{}") "while" a logical expression inside the '()' brackets is true.

######  Select and run everything at once from here..... 
     > random.numbers <- c(); i <- 0 # These just 'initialize' or 'set up' objects needed in the while loop. 
          while(length(random.numbers) < 25) {
          	i <- i + 1; print(i); print(random.numbers) 
	# This (line above) just lets you see the progress of the loop and is not neccesary.  
	# Try commenting it out with an "#" when you run the loop again.
	
          single.sample <- rnorm(1,mean=0,sd=5)
          if((single.sample >= -1)&(single.sample <= 1)) {random.numbers <- c(random.numbers, single.sample)}
          } 
######  .......to here.

     > i # The final output took i steps to generate th sample
     > random.numbers # Here is the final sample
     > length(random.numbers) # the vector is 25 units long.  Try making it longer, see the 25 in the code above? Change it and run it again.


2.5 if()' and using 'if()' with 'else'

You might have noticed that we snuck another important concept into the 'while' loop too: 'if()'. When the expression inside the "()" brackets is TRUE, 'if' evaluates the expression inside the "{}" braces. 'if' can be used with 'else' to create if-else statements (also, known as if-then statements).

     > atrue <- TRUE; afalse <- FALSE
     > if(atrue){c(1,2,3)} else {c(NA,NA,NA)}
     > if(afalse) {c(1,2,3)} else {c(NA,NA,NA)}
If you have multiple statements, you need to write your if() statements this way

     > if(atrue){
            c(1,2,3)
            c(4,5,6)
          } else {
            c(NA,NA,NA)
          }
Another useful if-else statement format is an R function

     > ifelse(atrue,1,0)
     > ifelse(afalse,1,0)
     > ifelse(atrue==afalse,1,0) # "atrue==afalse" tests if atrue and afalse are the same.


2.6 Help yourself: How to use R Help, and handy functions to make life easy

Suppose you need help figuring out how to use a known function

     > ?read.csv  # Opens up a help page automatically.
Note a few things on the Help page: Top left-hand corner says "read.table(utils)". This means 'read.csv' is in the family of functions "read.table". This also shows that 'read.csv' is in the "utils" package.

1. "Usage": Enclosed inside parentheses '()' are all the "arguments" of the function. Arguments followed by a '=' have a "default" value. If you supply no value for an argument, the default is used. In other words, the below

     > rad.data <- read.csv("direct.diffuse.data.csv", as.is=TRUE)
Actually is converted by R to the below before it is evaluated.

     > rad.data <- read.csv(file = "direct.diffuse.data.csv", header = TRUE, sep = ",",
        quote="\"", dec=".", fill = TRUE, comment.char="", as.is=TRUE)
If an argument has no default and you supply nothing, you get an error

     > rad.data <- read.csv( , as.is=TRUE)  # Error, no value for the 'file' argument was given
Writing the argument name before the value is necessary when the argument is out of order as shown in the Help. Below, TRUE is applied to 'header', not the 'as.is' argument.

     > rad.data <- read.csv("direct.diffuse.data.csv", TRUE)
2. "Arguments": This section describes each argument, and specifies what type it needs to be.

3. "Details": This Gives more info on how it works.

4. "Value": This tells you what kind of object the function returns. In other words, it's what gets stored in the object on the left side of "<-". NOTE: VERY USEFUL!!!

5. "Examples": Many R help pages have Examples at the bottom. Work through them! Here are some handy functions. You can use '?function' to teach yourself how to use all of these.

# round()     Rounds numbers
# trunc()     Truncates numbers
# floor()     Takes the next-lowest value numbers for a specified precision
# ceiling()   Takes the next-highest value numbers for a specified precision
# abs()       Takes the absolute value of numbers
# sign()      Tells you if numbers are positive or negative
# log()       Takes the logarithm of numbers (defaults to 'natural' base-e log, i.e., 'ln')
# sqrt()      Takes the square root of numbers
# exp()       exponential: natural 'e' raised to argument
# sin()       sine of numbers (enter argument in radians for all trig. functions)
# asin()      inverse sine of numbers
# cosh()      hyperbolic cosine of numbers
# sum()       Takes the sum of numbers
# prod()      Takes the product of numbers
# cumsum()    Takes the cumulative sum of numbers
# mean()      Takes the mean of numbers
# var()       Takes the variances of numbers
# sd()        Takes the standard deviation of numbers


2.7 Creating Your Own functions -or- Your 'Robot Army'

One of the great things about R is that you can turn virtually any piece of code--even very large pieces-- into an automated function. These functions take input of a specific format and generate specific output just like the built-in functions we introduced above. In fact, R users all over the world have gathered their most useful functions into "packages" that you can install in R and then access. We will show you how to instal a package later. First, though, a simple mathematical function.

CALCULATE SUN ANGLE. The angle of the sun above the horizon depends on the time of day, the latitude and the solar declination (which is related to latitude and the time of year and, ultimately, the tilt of the earth relative to the sun). See http://en.wikipedia.org/wiki/Solar_elevation_angle.

Solar zenith is the angle of the sun off of vertical, it is just 90 - solar elevation.

Here are the functions: (Do not worry about understanding the math, this is just an example)

beta=asin((sin(lat)*sin(sigma))+(cos(lat)*cos(sigma)*cos(15*(HR-12))))
zenith = 90 - beta
where lat is latitude in degrees, beta is the angle of the sun above the horizon (solar elevation), # sigma is the solar declination, and HR is the hour.

Our dataset contains the solar declination (sigma) and hour (HR) for each measurement. The latitude of our the Tapajos is -2.856648, so we have all of the data we need to calculate the predicted solar zenith for each data point. Lets do this, but lets use a function where we can set the latitude to anywhere and use data that is in a data frame of just time and solar declination angles.

First, though, lets start simpler. R has all normal triganometry functions built in like 'arcsine,' e.g., asin(), but the input must be in radians but many of us are accustomed to thinking in degrees. Let's make a functions that convert degrees to radians and vice versa.

     > radfx <- function(x){x*(pi/180)}
     > degfx <- function(x){x*(180/pi)}

     > rad.lat <- radfx(-2.856648); rad.lat 
     > degfx(rad.lat) 
	# Works on a vector of angles too
     > rads.ii <- radfx(0:90); rads.ii
     > degfx(rads.ii)
So you see we have created our own R functions, radfx() and degfx() !

Now let's make the important function for solar zenith (output in radians)

######  Select and run everything at once from here..... 
     > solar.zenith.fx <- function(lat.in.degrees, data) {
	# Create a vector of this latidude value, in radians, that is as long as
	# the rumber of rows in the data.
		# With this vector we can use simple format to calculate all of the values at once.  
	radians.lat.vec <- rep(radfx(lat.in.degrees), nrow(data))
	# The radian version of solar declination
	radians.HR.term <- radfx(15*(data[,1]-12))
	# Calculate the solar elevation with the function above
	radians.sigma <- radfx(data[,2])
	# The radian version of the hour angle term; e.g., 15*(HR-12)
	radians.solar.elevation <- asin((sin(radians.lat.vec)*sin(radians.sigma))+(cos(radians.lat.vec)*cos(radians.sigma)*cos(radians.HR.term)))
	# Now the function output, the solar zenith 
	radfx(90) - radians.solar.elevation
	} # End Solar zenith function definition
######  .......to here.
Lets apply it to the data, it has no defaults so we need to enter these.

     > solar.zenith.radians <- solar.zenith.fx(lat.in.degrees=-2.856648, data=rad.data[,c(4,5)])
Is it the right size to be a new data column? (It is supposed to be.)

     > length(solar.zenith.radians)==nrow(rad.data)
How did we know to enter columns 4 & 5 of rad data? We used the key we made above. The key told us that the hour was in column 4 and the solar declination in column 5. In the function, 'data[,1]' is the hour and 'data[,2]' is the solar declination. So, since we said 'data=rad.data[,c(4,5)]' and 'rad.data[,c(4,5)]' has two columns, hour and solar declination we entered the right thing.

Lets plot to see if our results make sense. Plot total radiation vs. solar zenith angle

     > plot(solar.zenith.radians, rad.data[,7], xlab="Solar Zenith Angle (radians)", ylab="Total Solar Radiation (W m^-2)", main="Total Radiation vs. Solar Zenith Angle") 
We will cover a little more on plotting in the next part of this tutorial. Does this figure make sense? Remember, solar zenith angle gets bigger as the sun moves away from being directly overhead. Our function does not tell us if the angle is negative or positive, we could go back and figure this our based upon whether it is before or after solar noon (when the sun hits a zenith of 0; we could say that after noon the angle becomes negative) but we need not do that to interpret the general pattern. So, what do you think? Have we done the calculation right?

(Do note that for some reason this dataset does not have many values from noon. This a feature of the data, not our calculations.)


3 Analyzing Data


3.1 Aggregating data to a desired timescale

Right now we have data every hour. Suppose we wanted daily average data (average the 24 hours for each day). We can use the 'aggregate()' function to do this.

Re-set the working directory if you need to (if you have closed and opened R again since the last time you were in this directory).

     > path <- "C:\\PATH\\TO\\LOCATION\\OF\\FILE"
     > setwd(path)
If necessary, load in data and rename columns

     > rad.data <- read.csv("direct.diffuse.data.csv", as.is=TRUE)
     > colnames(rad.data) <- c("Y", "M", "D", "H", "solar.dec", "predict.rad.Wm2",
                        "tot.rad.Wm2", "diffuse.Wm2")
Open the help page and read the argument descriptions for 'x', 'by', and 'FUN'

     > ?aggregate
Now we will take the mean of columns 6-8 in the data. We must specify the columns we will average by. NOTE: We need them to be in the order of fastest varying first (day, month, year). This is why we must use columns ordered as 3:1 instead of 1:3. Try it the other way (1:3), look at the first 5 rows, and you will see what we mean.

     > rad.data.ag <- aggregate(rad.data[,6:8], by=rad.data[,3:1], FUN=mean, na.rm=TRUE)
     > rad.data.ag[1:5,] # Daily averaged data.
NOTE: If you want to average based only on 1 column of data (like hour), do it this way:

     > rad.data.ag <- aggregate(rad.data[,6:8], by=list(H=rad.data[,4]), FUN=mean, na.rm=TRUE)
     > rad.data.ag # Average diurnal cycle of data.
     > ?aggregate


3.2 Plotting data in different ways

The important functions to learn are: 'plot()', for both general and advanced plotting, 'hist()', for histograms, and 'par()' for setting graphing parameters to make your plots look cool.

hist() gives us a histogram with the x-axis split up into bins of values present in the data. The y-axis tells you how many datapoints are in a given bin.

     > hist(rad.data$tot.rad.Wm2)
Look at the histogram only for daytime values

     > pick.day <- rad.data$H < 19 & rad.data$H > 5
     > hist(rad.data$tot.rad.Wm2[pick.day])
'plot()' allows us to plot either 2 variables against each other (a scatterplot), or to plot data on one axis, and an index (usually time) on the other axis (usually x-axis). Let's plot our daily-averaged total data we calculated above 'rad.data.ag'

First, let's aggregate to daily values.

     > rad.data.ag <- aggregate(rad.data[,6:8], by=rad.data[,3:1], FUN=mean, na.rm=TRUE)
     > rad.data.ag[1:5,] # Daily averaged data.
Approach #1 to plotting: Give plot() just the data; it will plot using a default index on the x-axis, using the order that the data is ordered in. This simple approach, however, gives you no x-axis values for time.

     > plot(rad.data.ag$predict.rad.Wm2, col="blue", ylab="Radiation (W m-2)")
     > points(rad.data.ag$tot.rad.Wm2, col="red")
Approach #2 to plotting: Give plot() a time object for the x-axis, and the data for the y-axis. Now we will get, by default, an x-axis with month and year. To do this, we will use some functions in a new package called 'chron'.

Let's download and install, then load the 'chron' package.

     > install.packages("chron") # This downloads the package. Select a location closest to you.
     > library(chron) # The package must be loaded
Create a chron date/time object

     > rad.time <- chron(dates=paste(rad.data.ag$M, "/", rad.data.ag$D, "/", rad.data.ag$Y, sep=""))
Plot the data

     > plot(rad.time, rad.data.ag$predict.rad.Wm2, col="blue")
     > points(rad.time, rad.data.ag$tot.rad.Wm2, col="red")
The par() function allows you to change all kinds of plotting parameters Take a quick look at what it can do and try an example or two at the bottom of the help page.

     > ?par


3.3 Joining different datasets together

Take a look at the help for the 'merge' function (i.e., type: ?merge), also think how you could use the function findInterval().
Here is a little script that makes a matrix with 2 columns, the first with integers
1:20, the 2nd with random numbers in the odd rows and NA in the evens. 
# Run this on your computer, figure out what was done at each step.
x=1:20
y=cbind(seq(1,19,2),rnorm(9)) ;
N=findInterval(y[,1],x) ;
X1=cbind(x,rep(NA,20)) ;
X1[N,2]=y[,2] ;
print(X1)


3.4 A little statistics (i): Doing a linear regression

See tutorial at http://www.cyclismo.org/tutorial/R/linearLeastSquares.html


3.5 A little statistics (ii): Doing SMA regression

See SMATR (smatr) and lmodel2 (lmodel2) under "Packages" on the R website cran.r-project.org and use Web resources to learn about Robust Linear Regressions, "Fitting a line with errors in both X and Y", etc.


3.6 Saving your results

The important functions for saving your work are 'save()', 'dump()', and 'write.csv()'. 'save()' saves only the objects you choose to a desired location. See '?save' for more details.

     > save(rad.data.ag, file=paste(path, "\\rad.data.ag", sep="")) 
'save.image()' saves ALL your objects in one file. Read in later using 'load()'

     > save.image()
'write.csv' save only 1 object (usually a matrix or data frame) of your choosing. Use this for saving data which might need to distribute to a colleague, or open up in another program (e.g., Excel). Note: We request it not to save the row numbers by specifying 'row.names=FALSE' as an argument.

     > write.csv(rad.data.ag, paste(path, "\\rad.data.ag.csv", sep=""), row.names=FALSE)