R2

Open Source Statistics with R

=Writing Faster R Code=

In the previous sections we've introduced a number of features of R and have begun the journey to becoming a proficient and productive user of the language. In the remaining sections, we'll change tack and focus on a question commonly asked by those beginning to use R in anger: "My R code is slow. How can I speed it up?". In this section we'll consider the related tasks of finding which parts of your R code are responsible for the majority of the run-time, and what you can do about it.

Profiling & Timing
In order to remain productive (and sane, and have a social life...), it is essential that we first identify which portions of your R code are responsible for the majority of the run-time. We could spend ages optimising a portion that we think may be running slowly, but computers have a gift(!) for constantly surprising us, and if that portion of your program accounts for, say, only 10% of the run-time, then you will have sweated for very little useful gain.

The simplest method of investigation is simply to time the application of a function, using the built-in system.time():
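For example (a minimal sketch; the function and input size are arbitrary):

x <- rnorm(100000)
# report the user, system and elapsed time taken to evaluate the expression
system.time(y <- loess.smooth(x, x))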

You can get a more detailed analysis of a block of code using the built-in R profiler. The general pattern of invocation is:
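Something along these lines (the output file name is arbitrary):

Rprof("profile.out")           # start the profiler, writing samples to profile.out
# ... the code to be profiled ...
Rprof(NULL)                    # stop the profiler
summaryRprof("profile.out")    # summarise where the time went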

For example, here's an R script, profile.r:
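A script along these lines produces the kind of breakdown shown below (a sketch; the vector length is an assumption, chosen so that the smoothing takes a few seconds):

Rprof()                        # profile to the default file, Rprof.out
x <- rnorm(100000)
y <- loess.smooth(x, x)
Rprof(NULL)
summaryRprof()                 # print the breakdown in the .Rout file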

I ran this by typing:

R CMD BATCH profile.r

In the output file, profile.r.Rout, I found the following breakdown:

               self.time self.pct total.time total.pct
"simpleLoess"       4.84    88.00       5.10     92.73
"rnorm"             0.22     4.00       0.22      4.00
"loess.smooth"      0.18     3.27       5.28     96.00

The profile tells us that the function simpleLoess takes 88% of the run-time, whereas rnorm takes only 4%.

Preallocation of Memory
As with other scripting languages, such as MATLAB, the simplest method that you can use to speed up your R code is to pre-allocate the storage for variables whenever possible. To see the benefits of this, consider the following two functions:
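The two sketches below are illustrative (the function names and the use of squaring are assumptions). The first grows its result vector one element at a time:

grow <- function(n) {
  result <- c()                 # start with an empty vector
  for (i in 1:n) {
    result <- c(result, i^2)    # extending the vector forces a copy on every iteration
  }
  result
}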

and:
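# a pre-allocated version (again, only a sketch)
prealloc <- function(n) {
  result <- numeric(n)          # allocate the full-length vector up front
  for (i in 1:n) {
    result[i] <- i^2            # fill in place, with no copying
  }
  result
}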

Timing calls to each of them shows that the pre-allocation of memory gives a whopping ~30x speed-up. Your mileage will vary depending upon the details of your code.
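For example, using the sketch functions above (your timings will differ):

system.time(grow(30000))        # grows the vector on every iteration
system.time(prealloc(30000))    # pre-allocated: typically far faster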

Vectorised Operations
The other principal method for speeding up your R code is to eliminate loops whenever you can. Many functions and operators in R will accept whole arrays as input, rather than just single values, and this may allow you to dispense with a loop altogether. The examples in the previous section used for loops to step through an array, squaring each element. However, you can achieve the same result far more quickly by passing the array en masse to the exponentiation operator:
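For example (a sketch, using the same squaring task as above):

x <- rnorm(1000000)
# the ^ operator is vectorised: it squares every element in a single call
system.time(y <- x^2)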

Here we've been able to square 1,000,000 items in half the time it took to process 30,000!

Calling Functions Written in a Compiled Language (e.g. C or Fortran)
Another way to get more speed is to outsource the portions of your R code that are found to be slow to a compiled language, such as C or Fortran; a small sketch follows the link below. A good starting point on this topic is:


 * http://mazamascience.com/WorkingWithData/?p=1067
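For instance, the Rcpp package offers one particularly convenient route (this is only a sketch of one option, not necessarily the approach taken in the article above; the function name squareC is made up):

library(Rcpp)
# compile a small C++ function and make it callable from R
cppFunction('
NumericVector squareC(NumericVector x) {
  int n = x.size();
  NumericVector out(n);
  for (int i = 0; i < n; ++i) out[i] = x[i] * x[i];
  return out;
}')
squareC(1:5)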

=R and HPC=

If you've profiled your code and tried all that you can to speed it up, as described in the previous section, you might be interested in the various initiatives that exist to run R on high performance computers, such as bluecrystal:


 * http://cran.r-project.org/web/views/HighPerformanceComputing.html

As we will see in the following examples, the general approach to running R in parallel is to arrange your task so that a function is applied to a list of inputs, and then to split that list over several CPU cores or cluster worker nodes.

Multicore
The multicore package allows us to make use of several CPU cores within a single machine. Note, however, that the package does not work on MS Windows computers.

As an example, let's look at the use of the package's mclapply function, a multicore equivalent of R's built-in list apply mapper, lapply. I saved the following commands into an R script called multicore.r:
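In outline (matching the output shown below):

library(multicore)
# how many cores are present?
multicore:::detectCores()
# Create a 10 x 10,000 matrix of random numbers
data <- lapply(1:10, function(x) {rnorm(10000)})
# Map a function over the matrix. First in serial..
system.time(x <- lapply(data, function(x) {loess.smooth(x,x)}))
# .. and secondly in parallel (using multicore, within a node)
system.time(x <- mclapply(data, function(x) {loess.smooth(x,x)}))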

And used the following submission script to run it on bluecrystal phase2:
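A script broadly of this form will do the job (the resource requests and module name are assumptions; adjust them to your own setup):

#!/bin/bash
#PBS -l nodes=1:ppn=8
#PBS -l walltime=00:10:00
# move to the directory the job was submitted from
cd $PBS_O_WORKDIR
# load an R module (assumed name; use whichever R module BCp2 provides)
module add languages/R
R CMD BATCH multicore.r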

After the job had run, I got the following output in the file multicore.r.Rout:

> library(multicore)
> # how many cores are present?
> multicore:::detectCores()
[1] 8
> # Create a 10 x 10,000 matrix of random numbers
> data <- lapply(1:10, function(x) {rnorm(10000)})
> # Map a function over the matrix. First in serial..
> system.time(x <- lapply(data, function(x) {loess.smooth(x,x)}))
   user  system elapsed
  0.674   0.007   0.749
> # .. and secondly in parallel (using multicore, within a node)
> system.time(x <- mclapply(data, function(x) {loess.smooth(x,x)}))
   user  system elapsed
  0.301   0.074   0.113

Rmpi
The Rmpi package allows us to create and use cohorts of message passing processes from within R. It does so by providing an interface to the MPI (Message Passing Interface) library.

In order to use the Rmpi package on BCp2, you will need the ofed/openmpi/gcc/64/1.4.2-qlc module loaded.

Here's a short example that I saved as Rmpi.r:
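In outline (matching the output shown below):

library(Rmpi)
# spawn as many slaves as possible
mpi.spawn.Rslaves()
# ask each slave which node it is running on
mpi.remote.exec(mpi.get.processor.name)
# draw a random number on each slave
mpi.remote.exec(runif(1))
# shut the slaves down and exit
mpi.close.Rslaves()
mpi.quit()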

I submitted the job to BCp2 using the following submission script:
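A script broadly of this form works (the node counts and walltime are assumptions; the module is the one named above):

#!/bin/bash
#PBS -l nodes=4:ppn=1
#PBS -l walltime=00:10:00
cd $PBS_O_WORKDIR
module add ofed/openmpi/gcc/64/1.4.2-qlc
# start a single master process; Rmpi spawns the slaves itself
mpirun -np 1 R CMD BATCH Rmpi.r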

and got the following output:

> library(Rmpi)
> # spawn as many slaves as possible
> mpi.spawn.Rslaves()
        4 slaves are spawned successfully. 0 failed.
master (rank 0, comm 1) of size 5 is running on: u03n074
slave1 (rank 1, comm 1) of size 5 is running on: u03n098
slave2 (rank 2, comm 1) of size 5 is running on: u04n029
slave3 (rank 3, comm 1) of size 5 is running on: u04n030
slave4 (rank 4, comm 1) of size 5 is running on: u03n074
> mpi.remote.exec(mpi.get.processor.name)
$slave1
[1] "u03n098"

$slave2
[1] "u04n029"

$slave3
[1] "u04n030"

$slave4
[1] "u03n074"

> mpi.remote.exec(runif(1))
         X1        X2        X3        X4
1 0.5154871 0.5154871 0.5154871 0.5154871
> mpi.close.Rslaves()
[1] 1
> mpi.quit()

Snow
Calling MPI routines directly from within R may be too low-level for many people to use comfortably. Happily, the snow package provides a higher-level abstraction for distributed memory programming from within R.

Here's my example program that I saved as snow.r:
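In outline (matching the output shown below):

library(snow)
# request a cluster of 3 worker nodes
cl <- makeCluster(3)
# ask each worker which node it is running on
clusterCall(cl, function() Sys.info()[c("nodename","machine")])
# Create a 10 x 10,000 matrix of random numbers
data <- lapply(1:10, function(x) {rnorm(10000)})
# Map a function over the matrix. First in serial..
system.time(x <- lapply(data, function(x) {loess.smooth(x,x)}))
# .. and secondly in parallel (using snow, across a cluster of workers)
system.time(x <- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))
# shut the workers down
stopCluster(cl)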

I ran it on BCp2 using the same submission script given for Rmpi, save for changing Rmpi.r to snow.r. The output was:

> library(snow)
> # request a cluster of 3 worker nodes
> cl <- makeCluster(3)
Loading required package: Rmpi
        3 slaves are spawned successfully. 0 failed.
> clusterCall(cl, function() Sys.info()[c("nodename","machine")])
[[1]]
 nodename   machine
"u01n105"  "x86_64"

[[2]]
 nodename   machine
"u02n014"  "x86_64"

[[3]]
 nodename   machine
"u03n098"  "x86_64"

> # Create a 10 x 10,000 matrix of random numbers
> data <- lapply(1:10, function(x) {rnorm(10000)})
> # Map a function over the matrix. First in serial..
> system.time(x <- lapply(data, function(x) {loess.smooth(x,x)}))
   user  system elapsed
  0.711   0.001   0.715
> # .. and secondly in parallel (using snow, across a cluster of workers)
> system.time(x <- clusterApply(cl, data, function(x) {loess.smooth(x,x)}))
   user  system elapsed
  0.259   0.001   0.260
> stopCluster(cl)

Parallel
The parallel package is an amalgamation of functionality from the multicore and snow packages. The shared memory parallelism in this package also works on MS Windows machines (unlike the multicore package).

A trivial translation of our previous multicore example is:
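In sketch form (essentially only the library line changes; mclapply and detectCores are provided by parallel itself):

library(parallel)
# how many cores are present?
detectCores()
# Create a 10 x 10,000 matrix of random numbers
data <- lapply(1:10, function(x) {rnorm(10000)})
# Map a function over the matrix, in parallel within a node
system.time(x <- mclapply(data, function(x) {loess.smooth(x,x)}))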

I have not been able to get a distributed memory cluster working on BCp2 using the parallel package.

=Further Reading=


 * R in a Nutshell
 * Parallel R