Imperial College/Courses/Fall2008/Synthetic Biology (MRes class)/'R' Tutorial/Practical

From OpenWetWare

Fall 2008 - Synthetic Biology (MRes class)



Introduction to 'R'



Exercise 1 (Vectors and indexes)

  • First create a vector x, which contains 100 random values drawn from the standard normal distribution.
    • x <- rnorm(100)
  1. Plot x as a histogram (see the hist() function).
  2. How do you form a vector which contains the entries of x at the positions 2, 30 and 67?
  3. How do you form a vector which contains all the entries of x except the first and the second?
  4. How do you create a logical vector b whose i'th entry is TRUE if and only if the i'th entry of x is greater than -1.5?
  5. How do you select those entries of x which are greater than -1.5 and less than 1?
  6. How do you find out how many entries of x are greater than -1.5 and less than 1?

(Try to formulate your answers so that they work not only for your particular random sample but for any random sample drawn as above.)

Suggested solutions (there are many ways to solve this exercise):

  1. hist(x)
  2. two possibilities: x[c(2, 30, 67)] OR the longer c(x[2], x[30], x[67])
  3. three alternatives: x[-c(1, 2)] OR x[c(-1, -2)] OR x[3:100]
  4. b <- x > -1.5
  5. x[-1.5 < x & x < 1]
  6. sum(-1.5 < x & x < 1)
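A small script that runs the suggested solutions side by side may help show why they agree. It is only a sketch: the fixed seed and the names picked, rest, in_range, selected and n_in_range are illustrative choices, not part of the exercise.

```r
# Suggested solutions to Exercise 1, run side by side
set.seed(1)                  # illustrative: fixed seed only so the run is reproducible
x <- rnorm(100)

# 2. A vector of positive indexes picks those entries in that order
picked <- x[c(2, 30, 67)]

# 3. Negative indexes drop entries; 3:length(x) avoids hard-coding 100
rest <- x[-c(1, 2)]

# 4. A comparison is applied entry by entry and returns a logical vector
b <- x > -1.5

# 5. A logical vector can itself be used as an index: TRUE entries are kept
in_range <- -1.5 < x & x < 1
selected <- x[in_range]

# 6. In arithmetic TRUE counts as 1 and FALSE as 0, so sum() counts the TRUEs
n_in_range <- sum(in_range)
print(n_in_range)
```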

Exercise 2 (Regular sequences)

  1. Write an expression which produces a vector with the entries 0, 10, ..., 50, 60 followed by 11 equally spaced values from 70 to 100.
  2. By which command do you find out the length of the generated vector?

Suggested solution:

  1. v <- c(seq(0, 60, by = 10), seq(70, 100, length.out = 11))
  2. length(v)
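Splitting the expression into its two parts makes the length easy to check by hand: seq(0, 60, by = 10) gives 7 values and seq(70, 100, length.out = 11) gives 11 values spaced (100 - 70) / 10 = 3 apart. The names part1 and part2 are illustrative only.

```r
part1 <- seq(0, 60, by = 10)             # 0, 10, ..., 60: 7 values
part2 <- seq(70, 100, length.out = 11)   # 70, 73, ..., 100: 11 values
v <- c(part1, part2)
length(v)                                # 7 + 11 = 18
```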

Exercise 3 (Ordering data according to the values of a variable)

  • Suppose that you have to make a line plot of data which resembles the data we generate as follows.
    • x <- runif(100, -pi, pi)
    • y <- sin(x)
    • Here we first sample 100 values uniformly on the interval (-pi, pi) and then calculate the sine function.
  • Try the command: plot(x, y, type = 'l') (there is a lower case L inside the quotation marks) and notice that the resulting line drawing does not resemble the graph of the sine function.
  • The result was a line plot, where the point (x[1], y[1]) is connected to the points (x[2], y[2]), (x[3], y[3]) and so on. Since the x-values are not ordered, the line plot looks messy. Instead, you want a line plot which resembles the graph of the sine function. The trick is to sort the x vector into increasing order, and to apply the same permutation also to the y vector prior to plotting. How do you do this in practice?

Suggested solution: Since we know that the y values are just the sines of the x values, we could sort x first and then recalculate y. However, the intention was to solve the exercise without using that knowledge. Instead, we can sort x and reorder y using the permutation that sorts x:

plot(sort(x), y[order(x)], type = 'l')

If you do not understand the idea right away, look at the first few values of order(x) and think about what happens when you index x and y, respectively, with the vector order(x). (Remember that x[order(x)] is the same as sort(x).)
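A tiny example with four made-up values shows what order() returns, followed by the plotting step rewritten so that the permutation is computed once and applied to both x and y. The vector v and the name o are illustrative.

```r
v <- c(0.3, -2.1, 1.7, -0.4)
order(v)       # 2 4 1 3: position of the smallest entry, then the next smallest, ...
v[order(v)]    # -2.1 -0.4 0.3 1.7, i.e. the same as sort(v)

# Apply one permutation to both vectors so the (x, y) pairs stay together
x <- runif(100, -pi, pi)
y <- sin(x)
o <- order(x)
plot(x[o], y[o], type = 'l')
```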

Exercise 4 (Writing functions)

  1. Write a function which returns the average of three numbers given as parameters.
  2. In the U.S. temperatures are usually expressed in degrees Fahrenheit (F) instead of degrees Celsius (C), which are used in the rest of the world. The conversion formula between the two temperature scales is the following.
    • C = 5 / 9 * (F - 32)
    • Write function FtoC which converts temperatures given in degrees Fahrenheit into degrees Celsius. Also write function CtoF which converts temperatures given in degrees Celsius into degrees Fahrenheit.
  3. Write a function which takes a two-column array (matrix) of data and generates three plots in the same frame: a histogram of each column and a scatterplot of one column against the other.
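No solution is given on this page for Exercise 4, so the following is only one possible sketch. The names average3 and plot_columns, and the use of par(mfrow = c(1, 3)) to place the three plots in one window, are assumptions; FtoC and CtoF follow the exercise. CtoF uses the given formula solved for F, that is F = 9 / 5 * C + 32.

```r
# 1. Average of three numbers passed as parameters (hypothetical name average3)
average3 <- function(a, b, c) {
  (a + b + c) / 3
}

# 2. Temperature conversions based on C = 5 / 9 * (F - 32)
FtoC <- function(fahrenheit) 5 / 9 * (fahrenheit - 32)
CtoF <- function(celsius) 9 / 5 * celsius + 32   # same formula solved for F

# 3. Two histograms and a scatterplot side by side (hypothetical name plot_columns)
plot_columns <- function(data) {
  data <- as.matrix(data)
  stopifnot(ncol(data) == 2)            # the exercise implies two columns
  old_par <- par(mfrow = c(1, 3))       # one row, three panels
  on.exit(par(old_par))                 # restore the previous layout afterwards
  hist(data[, 1], main = "Column 1", xlab = "value")
  hist(data[, 2], main = "Column 2", xlab = "value")
  plot(data[, 1], data[, 2], xlab = "Column 1", ylab = "Column 2")
  invisible(NULL)
}
```

For example, plot_columns(cbind(rnorm(100), runif(100))) should draw the two histograms and the scatterplot next to each other, and CtoF(FtoC(50)) should give back 50 up to rounding.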

Exercise 5 (For loop)

Supposedly, at the age of seven, C. F. Gauss was set the problem at school of summing the integers 1, 2, ..., 100, and found the correct answer almost instantly (without having been taught about arithmetic series).

  1. How do you find that sum by using the function sum()?
  2. For the sake of practicing, do the same calculation using a for-loop. Of course, your first solution, with the sum-function, is much clearer and shorter.
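Both approaches might look as follows; total_sum and total_loop are illustrative names. Either way the result should match Gauss's shortcut, 100 * 101 / 2 = 5050.

```r
# 1. Vectorised: 1:100 builds the sequence, sum() adds it up
total_sum <- sum(1:100)

# 2. Explicit for-loop accumulating the same total
total_loop <- 0
for (i in 1:100) {
  total_loop <- total_loop + i
}

total_sum    # 5050
total_loop   # 5050
```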

Exercise 6 (Linear regression)

  • Create a function taking three input parameters (a, b, x) and returning y such that y = a*x + b.
  • Create a vector X of 50 values uniformly sampled from [-5, 20]
  • Create the vector Y by applying the function defined above to X, with a = -4.2 and b = 10.
  • Create the vector Ynoisy from Y by adding 10% Gaussian noise to the Y values.
  • Fit a linear regression of Ynoisy on X.
  • Plot Y, Ynoisy, and the linear regression results on the same plot.
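A possible sketch of the whole exercise follows. The exercise does not say exactly what "10% Gaussian noise" means; here it is assumed to be zero-mean normal noise whose standard deviation is 10% of the absolute value of each Y entry. The function name linear and the fixed seed are also illustrative choices. The fit uses lm(), R's standard function for linear models.

```r
# y = a*x + b (hypothetical name "linear")
linear <- function(a, b, x) a * x + b

set.seed(42)                          # illustrative: only for reproducibility
X <- runif(50, min = -5, max = 20)    # 50 values uniformly sampled from [-5, 20]
Y <- linear(a = -4.2, b = 10, x = X)

# Assumption: "10% Gaussian noise" read as N(0, sd = 0.1 * |Y|) for each value
Ynoisy <- Y + rnorm(length(Y), mean = 0, sd = 0.1 * abs(Y))

# Linear regression of Ynoisy on X; coefficients should be near 10 and -4.2
fit <- lm(Ynoisy ~ X)
coef(fit)

# Noisy points, the true line (x sorted as in Exercise 3) and the fitted line
o <- order(X)
plot(X, Ynoisy, pch = 16, col = "grey40", xlab = "X", ylab = "Y")
lines(X[o], Y[o], col = "blue", lwd = 2)
abline(fit, col = "red", lty = 2)
legend("topright", legend = c("Ynoisy", "Y (true line)", "lm fit"),
       col = c("grey40", "blue", "red"), pch = c(16, NA, NA), lty = c(NA, 1, 2))
```

Making the noise proportional to |Y| is one reading of "10%"; a constant standard deviation, such as 0.1 * sd(Y), would be an equally reasonable choice.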