# High performance computing
## Rcpp: using C++ within R
While R is a relatively new language essentially tailored for statisticians and data
analysts, C++ is a well-established general-purpose programming language.
This chapter does not aim to provide an overview of programming in C++ but rather to show how it can be used
to address specific problems related to R.
The point of view taken here is that of an R programmer with no prior knowledge of C++
or similar languages. Therefore we will not go into the details of fundamental C++ concepts but rather provide
the concepts necessary to start programming with this language and use it within R.
Having said this, R is an [interpreted language](https://en.wikipedia.org/wiki/Interpreted_language)
which makes it particularly inefficient when performing tasks that require a repetition of operations such as `for` and `while` loops,
or recursive calling of functions. On the other hand, C++ is a
[compiled language](https://en.wikipedia.org/wiki/Compiled_language) which means that,
before being able to use your code, it is translated (compiled) to lower-level
machine language, which can considerably reduce the running time of the code.
Note that C++ is just one of many solutions that can make the execution of R code more efficient such as [vectorization](http://www.dartistics.com/fast-r-code.html) and
[memory management](http://adv-r.had.co.nz/memory.html). [Profiling](http://adv-r.had.co.nz/Profiling.html)
also helps to identify which parts of the code are the slowest to execute and therefore provides useful information
to understand where to improve code efficiency.
It is also worth noting that C++ is compiled *ahead of time* (before the code is used), whereas other
languages, such as the [Julia language](https://julialang.org/), are compiled *just in time* (at runtime).
### Installation
Working with C++ in R does not come "out of the box", but recent contributions to R have made the setup straightforward. You need two elements: a C++ compiler and a
package that connects R and C++. For the latter, you need to install
the `Rcpp` package:
```{r,eval=F}
# For CRAN official release
install.packages("Rcpp")
# For development version
devtools::install_github("RcppCore/Rcpp")
```
For the former, it depends on your OS:
1. On Windows, install `RTools`.
2. On Mac, install `Xcode` from the App Store.
3. On Linux, install `r-base-dev` (or similar) from
your package manager.
Verify that your installation works by running:
```{r, echo=TRUE, message=FALSE, warning=FALSE, include=TRUE, eval=FALSE}
require(Rcpp)
cppFunction("double sumC(double x, double y){
double s;
s = x + y;
return s;
} ")
sumC(3.5, 6.5)
```
```{block2, type="rmdcaution"}
In C++, a statement must end with a semicolon `;` otherwise the compiler will return an error.
```
The function `sumC(x,y)` takes two double (numeric) values (`x` and `y`)
and returns a double (the sum of the two values). Can you predict what happens if you run the following lines? Can you explain why?
```{r, eval = FALSE}
sumC(3L, 6L)
sumC(TRUE, FALSE)
sumC(c(3,4), c(5,6))
```
### A first program
The above `cppFunction()` creates an *inline* C++ function (within your R code).
However, it is a better practice to create a separate file dedicated to C++ functions.
It is possible then to compile your C++ functions and call them within R.
In RStudio, you can create a C++ file by clicking on *File* > *New File* > *C++ File*.
RStudio provides you with an example. At the top of the C++ file, you should see:
```{Rcpp, eval = FALSE}
#include <Rcpp.h>
using namespace Rcpp;
```
The first line gives your program access to the functionality of the `Rcpp` library.
The second line lets you use that functionality without writing the `Rcpp::` prefix
each time. For example, suppose there is a function `mean()` in `Rcpp`.
With the first line only, you would have to call this function as `Rcpp::mean()`.
Adding the second line allows you to write `mean()` directly in your program.
This is somewhat similar to calling `library(Rcpp)` in your R code.
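To make the role of the `using namespace` line concrete, here is a minimal sketch in plain C++ that compiles without Rcpp. The namespace `demo` and its `mean()` function are hypothetical and defined only for this illustration; they stand in for `Rcpp` and one of its functions.

```cpp
#include <vector>

namespace demo {
  // Hypothetical helper, defined here only for illustration
  double mean(const std::vector<double>& x) {
    double total = 0.0;
    for (double value : x) {
      total += value;
    }
    return total / x.size();
  }
}

// Before any using directive, the namespace prefix is required
double mean_qualified(const std::vector<double>& x) {
  return demo::mean(x);
}

// After the directive, the prefix can be dropped
using namespace demo;

double mean_unqualified(const std::vector<double>& x) {
  return mean(x);
}
```

Both functions call the same code; only the way the name is written differs.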
Once the C++ code is written, it should be saved with an appropriate extension, either
`*.cc` or `*.cpp`. You can then compile your first C++ program by running the following command in R:
```{r, eval=FALSE}
sourceCpp("file_name.cpp")
```
To test this, try to write the `sumC()` function within a separate C++ file and compile it in R.
By default, C++ functions are not accessible within your R program.
One key element is to place the attribute `// [[Rcpp::export]]`, written as a comment,
**just before the function definition**. For example, the separate C++ file could look something like this:
```{Rcpp, eval=FALSE}
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double sumC(
double x,
double y
) {
double s;
s = x + y;
return s;
}
```
```{block2, type="rmdnote"}
In C++, the symbol `//` is used for single line comment. The symbol `/*` starts a comment
which can possibly cover multiple lines and `*/` ends the comment.
```
### Data structures
One difference you can immediately notice while working with C++ is
that you need to tell the program what type of data structure you
intend to use before using it. This makes the writing (and execution) of the code "safer" since
the programmer pays more attention to the implementation details
thereby preventing the code from being used in a way for which it was not intended.
On the other hand, a drawback of this characteristic is that the programmer will have to spend (at least at the beginning)
more time conceiving the code.
Here is a list of primitive data types with their keywords:
- `int` (4) : integer,
- `float` (4) : single precision floating number (numeric),
- `double` (8) : double precision floating number (numeric),
- `bool` : logical (true or false),
- `char` : character,
- `void` : without any value.
The numbers in parentheses represent the typical memory (in bytes) required to store the variable.
Single and double precision floating-point numbers represent how the computer
stores real numbers in the memory. `float` requires less memory than `double`
but is less precise. As a general rule, we only use type `double` since this is the default type in R:
> R has no single precision data type. All real numbers are stored in double precision format.
>
> `help(double)` in R
These data types can be modified by using one (or more) of the following keywords:
- `long` : allocates more space in memory,
- `short` : allocates less space in memory,
- `unsigned` : restricts numbers to non-negative values.
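The sizes and the precision difference described above can be checked directly. The following is a small sketch in plain C++; the function names are chosen for this illustration only, and the byte counts it reports depend on the platform (4, 4 and 8 are typical for `int`, `float` and `double`).

```cpp
#include <cmath>
#include <cstddef>

// Bytes used by each type on the current platform
std::size_t bytes_int()    { return sizeof(int); }
std::size_t bytes_float()  { return sizeof(float); }
std::size_t bytes_double() { return sizeof(double); }

// Modifiers change the storage: short int <= int <= long long int
std::size_t bytes_short_int() { return sizeof(short int); }
std::size_t bytes_long_long() { return sizeof(long long int); }

// Absolute difference between 1/3 stored as a float and as a double
double single_precision_error() {
  float  third_single = 1.0f / 3.0f;
  double third_double = 1.0 / 3.0;
  return std::fabs(third_double - third_single);
}
```

The error returned by `single_precision_error()` is small but not zero, which is one reason to stay with `double` when exchanging numbers with R.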
For more complex data structures such as vectors and matrices, the story is slightly more complicated.
Luckily for R programmers, `Rcpp` provides many predefined structures that make
the transition to C++ easier. Here is a non-exhaustive list:
Table : Common data structures in R with their Rcpp equivalents.
Rcpp | R | Role
--- | --- | ---
`NumericVector` | `c(...)` or `vector(mode = "double", ...)` | A vector of double
`IntegerVector` | `c(...)` or `vector(mode = "integer", ...)` | A vector of integer
`LogicalVector` | `c(...)` or `vector(mode = "logical", ...)` | A vector of boolean
`NumericMatrix` | `matrix()` | A matrix of double
`IntegerMatrix` | `matrix()` | A matrix of integer
`LogicalMatrix` | `matrix()` | A matrix of boolean
`List`| `list` | A list
`DataFrame` | `data.frame()` | A data frame
#### Vectors
Before using a vector in C++, you need to create one.
For example,
```{Rcpp, eval=FALSE}
NumericVector v(10);
```
creates a vector of doubles of length 10, with every element initialized to 0.
To access elements of a vector, the syntax is the same as in R with one notable exception:
**indexing starts at 0**! So if you want to access the first element of a vector `v`,
you should write `v[0]`, and for the ith element, `v[i-1]`. Note that you can equivalently
write `v(0)` and `v(i-1)`.
The length of the vector `v` can be obtained as follows
```{Rcpp, eval=FALSE}
int n = v.length();
```
Here the length of the vector is saved into an integer `n`.
Note that `length()` is a *method* for the *class* `NumericVector`
and uses the "dot-function" syntax. These concepts are beyond
the scope of this introduction and the only aspect to underline is that the above
notation gives the same result as if you wrote `length(v)` in R.
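Zero-based indexing is easy to get wrong when coming from R, so here is a short sketch. As an assumption made so that it compiles without R, it uses the standard `std::vector<double>` as a stand-in for `NumericVector`; `std::vector` reports its length with `size()`. The function names are illustrative only.

```cpp
#include <vector>

// Sum the elements of v using zero-based indexing
double sum_elements(const std::vector<double>& v) {
  int n = static_cast<int>(v.size());  // number of elements, like length(v) in R
  double total = 0.0;
  for (int i = 0; i < n; ++i) {
    total += v[i];                     // valid indices run from 0 to n - 1
  }
  return total;
}

// Return the i-th element, counting from 1 as in R
double r_style_element(const std::vector<double>& v, int i) {
  return v[i - 1];
}
```

Note that the loop condition is `i < n` rather than `i <= n`: the element `v[n]` does not exist.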
#### Matrices
Creating a matrix is similar to creating a vector. The command
```{Rcpp, eval=FALSE}
NumericMatrix m(3,4);
```
creates a matrix with 3 rows and 4 columns, filled with zeros.
This is equivalent to writing the following in R
```{r, eval=FALSE}
m <- matrix(0, nrow = 3, ncol = 4)
```
The syntax for accessing matrix elements is slightly different.
For example, accessing a column or a row can be done as follows
```{Rcpp, eval=FALSE}
// Copy first column into a vector
NumericVector first_col = m( _ , 0);
// Copy first row into a vector (a different name avoids declaring the same variable twice)
NumericVector first_row = m(0, _);
// Access the ith row, jth column
double x = m(i-1, j-1); // i and j are positive integers already defined
```
The number of columns/rows of a matrix can be retrieved using
```{Rcpp, eval=FALSE}
// number of rows
int nr = m.nrow();
// number of columns
int nc = m.ncol();
```
#### List
List creation can be accomplished as follows:
```{Rcpp, eval=FALSE}
// Create a list with elements e1 and e2
List L = List::create(e1, e2);
// Create a list with elements e1 and e2 with names
List L_named = List::create(Named("name1") = e1, Named("name2") = e2);
```
The elements `e1` and `e2` can be of any type (e.g. vector, matrix, etc.).
### Control structures
Logical operators in R and C++ are similar.
Here is a list where `x` and `y` are scalars (not vectors!).
Table: Common logical operators for C++.
Operator | Description
--- | ---
`x > y` | `x` greater than `y`
`x >= y` | `x` greater than or equal to `y`
`x < y` | `x` less than `y`
`x <= y` | `x` less than or equal to `y`
`x == y` | `x` equal to `y`
`x != y` | `x` not equal to `y`
`x && y` | `x` and `y`
`x || y` | `x` or `y`
`!x` | not `x`
#### if/else/else if statements
R and C++ also share the same syntax for `if`/`else`/`else if` statements
(in both languages, `else if` is written with a space). For example,
the following pattern has the same structure in both R and C++:
```{r,eval=FALSE}
if (condition){
plan A;
}else{
plan B;
}
```
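As a concrete C++ instance of this pattern, the sketch below chains conditions with `else if` and combines two comparisons with `&&`. The function names are chosen for this illustration only.

```cpp
// Return -1, 0 or 1 depending on the sign of x
int sign_of(double x) {
  if (x < 0) {
    return -1;
  } else if (x > 0) {
    return 1;
  } else {
    return 0;
  }
}

// Combine two comparisons with the logical "and" operator
bool in_interval(double x, double lower, double upper) {
  return x >= lower && x <= upper;
}
```

Unlike in R, every argument and the returned value carry an explicit type.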
#### loops
Loops are much faster in C++ than in R and the syntax is slightly different.
For example, the following R `for` loop
```{r, eval=FALSE}
for(i in 1:10){
cat("The value is:", i, "\n")
}
```
can be written in C++ with the following code
```{Rcpp, eval=FALSE}
for(int i(1); i<=10; i++){
Rcpp::Rcout << "The value is: " << i << std::endl;
}
```
Three elements need to be defined:
1. The initialization statement: an iterator (`i` here), generally of type integer, needs to be initialized
(here `i` is initialized to 1).
2. The test expression: the expression is evaluated and should return a boolean. If `false` the loop stops and
if `true`, the body of the for loop is evaluated.
3. The update statement: it defines the increment of the iterator, usually set to `i++`.
Note that we can also allow the iterators to decrease via the decrement operator `i--`.
C++ also offers variants of assignment operators.
Table: Assignment operators in C++. `x` and `y` are already defined.
| Operator | Equivalence | Description |
| ------------ | ------------ | --------------------------------------------------------------- |
| `x = y;` | | Assign the value of `y` to `x` |
| `x += y;` | `x = x + y;` | Take the value of `x` and add `y`, then assign to `x` |
| `x -= y;` | `x = x - y;` | Take the value of `x` and subtract `y`, then assign to `x` |
| `x *= y;` | `x = x * y;` | Take the value of `x` and multiply by `y`, then assign to `x` |
| `x /= y;` | `x = x / y;` | Take the value of `x` and divide by `y`, then assign to `x` |
Therefore, `i++` is equivalent to `i += 1`. To increment by 2 at each iteration, one would use `i += 2`.
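The update statement is where these operators are most often used. The following sketch, with function names chosen for this illustration, shows one loop that steps by 2 and one that counts down with the decrement operator.

```cpp
// Sum of the odd numbers from 1 to n, stepping by 2
int sum_odd(int n) {
  int total = 0;
  for (int i = 1; i <= n; i += 2) {
    total += i;
  }
  return total;
}

// Count down from n to 1; returns the number of iterations performed
int count_down(int n) {
  int steps = 0;
  for (int i = n; i >= 1; i--) {
    steps += 1;
  }
  return steps;
}
```

For example, `sum_odd(9)` adds 1, 3, 5, 7 and 9, while `count_down(0)` never enters the loop because the test expression is false from the start.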
The C++ `while` loop has the same syntax as R. The following expression
```{Rcpp,eval=FALSE}
while( condition ) {
body;
}
```
is the same in both languages.
Note that C++ is also equipped with a `do ... while` loop.
This may be useful, for example, in the situation where
you need a first run of the body prior to checking the condition.
```{Rcpp, eval=FALSE}
do{
body;
}
while(condition);
```
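The difference between the two loops shows up when the condition is false from the start. As a sketch, with function names chosen for this illustration, counting the decimal digits of a non-negative integer needs the body to run at least once so that 0 is reported as having one digit.

```cpp
// Number of decimal digits of a non-negative integer (do ... while version)
int count_digits_do_while(int n) {
  int digits = 0;
  do {
    digits += 1;   // executed at least once, even when n is 0
    n /= 10;
  } while (n > 0);
  return digits;
}

// Same loop written with while: the body is skipped entirely when n is 0
int count_digits_while(int n) {
  int digits = 0;
  while (n > 0) {
    digits += 1;
    n /= 10;
  }
  return digits;
}
```

The two versions agree for every positive input and differ only for 0.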
### Functions
Creating functions in C++ is fundamentally not much different from writing them in R.
Nevertheless, C++ requires an additional layer of precision.
A function can follow this pattern:
```{Rcpp, eval=FALSE}
type function_name(type arg1, ..., type argn){
body of the function;
return obj;
}
```
First, the return type of the function should be specified.
The `obj` that is returned should be of the same type as the one specified
at the start of the function declaration and each argument should have a type.
Otherwise, as in R, the body of the function contains statements that define what the function does.
As mentioned earlier, by default a C++ function cannot be used within R and the attribute `// [[Rcpp::export]]`, written as a comment
**just before the function definition**, needs to be specified. For example
```{Rcpp, eval=FALSE}
// [[Rcpp::export]]
type function_name(type arg1, ..., type argn){
body of the function;
return obj;
}
```
will make `function_name` available within R.
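A concrete instance of this pattern might look like the sketch below. The function `euclidean_distance()` is a hypothetical example, not part of `Rcpp`. Because the export attribute is an ordinary comment for a standard compiler, the same code compiles with or without Rcpp.

```cpp
#include <cmath>

// [[Rcpp::export]]
double euclidean_distance(double x1, double y1, double x2, double y2) {
  // Each argument has a declared type, and so does the returned value
  double dx = x2 - x1;
  double dy = y2 - y1;
  return std::sqrt(dx * dx + dy * dy);
}
```

Once compiled with `sourceCpp()`, such a function would be called from R like any other, for instance `euclidean_distance(0, 0, 3, 4)`.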
### Comparing with R
Writing C++ code instead of R code takes effort, so it should deliver advantages.
One way to measure these advantages is by comparing the computational efficiency
of a C++ function with the equivalent R function.
This can be achieved, for example, using the `microbenchmark` package.
For example, suppose you create a function that computes the factorial of
a positive integer.
In R, you can write
```{r}
my_factorialR <- function(n){
factorial <- 1
  for(i in seq_len(n)){ # seq_len() also behaves correctly when n is 1
factorial <- factorial * i
}
return(factorial)
}
```
An equivalent C++ function could be
```{Rcpp, eval=FALSE}
double my_factorialC(int n){
  // A double accumulator mirrors R's numeric type; an int would overflow beyond 12!
  double factorial = 1;
  for(int i(2); i<=n; ++i){
    factorial *= i;
  }
  return factorial;
}
```
```{r, cache=FALSE, echo=FALSE,include=FALSE,eval=TRUE}
Rcpp::cppFunction(
"double my_factorialC(int n){
  double factorial = 1;
  for(int i(2); i<=n; ++i){
    factorial *= i;
  }
  return factorial;
}")
microbenchmark::microbenchmark(my_factorialR(500), my_factorialC(500))
```
```{r, cache = FALSE, eval=TRUE}
require(microbenchmark)
microbenchmark(my_factorialR(500), my_factorialC(500))
```
It is clear that the computational performance is considerably different.
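When comparing the two implementations, the type used to accumulate the product matters as much as the speed. The sketch below, in plain C++ with function names chosen for this illustration, accumulates the factorial in a `double` (the counterpart of R's numeric type) and checks how far an `int` or a `double` can go. It assumes the usual 32-bit `int` and IEEE double precision.

```cpp
#include <climits>
#include <cmath>

// Factorial accumulated in a double, mirroring R's numeric type
double factorial_double(int n) {
  double result = 1.0;
  for (int i = 2; i <= n; ++i) {
    result *= i;
  }
  return result;
}

// Largest n whose factorial still fits in an int
int largest_n_fitting_int() {
  int n = 1;
  while (factorial_double(n + 1) <= static_cast<double>(INT_MAX)) {
    ++n;
  }
  return n;
}

// True when n! can still be represented as a finite double
bool is_finite_factorial(int n) {
  return std::isfinite(factorial_double(n));
}
```

With a 32-bit `int`, the product already overflows at 13!, and a `double` stops being finite at 171!. An input such as 500 therefore exercises the loop for timing purposes, but the value returned is infinite.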
### Example: Buffon's needle (cont'd)
Let's reconsider Buffon's needle problem.
We propose to recode the R functions `cast_needle()` and
`buffon_experiment()` into C++.
For this purpose, first create a C++ file with header
```{Rcpp, eval=FALSE}
#include <Rcpp.h>
using namespace Rcpp;
```
The `cast_needle()` function was defined as
```{r, eval=FALSE}
cast_needle <- function(plane_width = 20){
available_range <- plane_width/2 - 1 # where 1 is the length of the needle (unit)
x_start <- runif(2, -available_range, available_range)
angle <- runif(1, 0, 2*pi)
x_end <- c(cos(angle), sin(angle)) + x_start # where the angles are multiplied by the needle length which is 1 in this example
cross <- floor(x_start[2]) != floor(x_end[2])
out <- list(start = x_start, end = x_end, cross = cross)
out
}
```
A possible C++ implementation is
```{Rcpp, eval=FALSE}
// [[Rcpp::export]]
List cast_needle(
unsigned int plane_width = 20
){
// Variable declaration
  double available_range;
  double angle,my_pi;
  NumericVector x_start(2), x_end(2), tmp_angle(1);
  bool cross;
  // Computation
  my_pi = atan(1.0) * 4; // defines pi (for simplicity)
  available_range = plane_width / 2.0 - 1; // 1 is the length of the needle; 2.0 avoids integer division
  x_start = runif(2, -available_range, available_range);
  tmp_angle = runif(1, 0, 2 * my_pi);
  angle = tmp_angle[0];
  x_end[0] = cos(angle) + x_start[0];
  x_end[1] = sin(angle) + x_start[1];
  // Indices start at 0 in C++: the second coordinate is element 1
  cross = floor(x_start[1]) != floor(x_end[1]);
return List::create(
Named("start") = x_start,
Named("end") = x_end,
Named("cross") = cross
);
}
```
Based on the above function, we can make a few remarks:
- For better readability, it is good practice to separate variable declaration and computation.
- The value of $\pi$ is not accessible by default so in this case we simply approximate it (but better solutions exist).
- Basic functions such as `atan()`, `runif()`, `floor()`, `cos()`, `sin()` are made available
by `Rcpp` (check for **Rcpp sugar**).
- To avoid an error we have first to save the result of `runif()` as `NumericVector` (its default result type).
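The geometric part of `cast_needle()` can be isolated and checked without any random draw. The following sketch in plain C++ is a simplification under stated assumptions: the starting point and the angle are passed as arguments instead of being drawn with `runif()`, and the `Needle` structure, the constant `kPi` and the function `place_needle()` are hypothetical names introduced for this illustration.

```cpp
#include <cmath>

// Approximation of pi, as in the function above
const double kPi = 4.0 * std::atan(1.0);

// Hypothetical container for one needle of length 1
struct Needle {
  double x_start, y_start;
  double x_end, y_end;
  bool cross;
};

// Place a needle from a given starting point and angle (in radians)
Needle place_needle(double x_start, double y_start, double angle) {
  Needle needle;
  needle.x_start = x_start;
  needle.y_start = y_start;
  needle.x_end = x_start + std::cos(angle);
  needle.y_end = y_start + std::sin(angle);
  // The needle crosses a line when its two ends lie in different unit bands
  needle.cross = std::floor(needle.y_start) != std::floor(needle.y_end);
  return needle;
}
```

A vertical needle starting at height 0.5 ends at height 1.5 and crosses the line at 1, whereas a horizontal needle starting at the same point stays within its band.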
The `buffon_experiment()` function was declared in R as follows:
```{r,eval=FALSE}
buffon_experiment <- function(B = 2084, plane_width = 10, seed = NULL){
if (!is.null(seed)){
set.seed(seed)
}
X_start <- X_end <- matrix(NA, B, 2)
cross <- rep(NA, B)
for (i in 1:B){
inter <- cast_needle(plane_width = plane_width)
X_start[i, ] <- inter$start
X_end[i, ] <- inter$end
cross[i] <- inter$cross
}
out <- list(start = X_start, end = X_end, cross = cross, plane = plane_width)
class(out) <- "buffon_experiment"
out
}
```
The C++ alternative we propose is the following
```{Rcpp, eval=FALSE}
// [[Rcpp::export]]
List buffon_experiment(
int B = 2084,
int plane_width = 10
){
// Variable declaration
NumericMatrix X_start(B,2), X_end(B,2);
LogicalVector cross(B);
List L;
NumericVector tmp(2);
// Computation
for(int i(0); i<B; ++i){
L = cast_needle(plane_width);
tmp = L["start"];
X_start(i,_) = tmp;
tmp = L["end"];
X_end(i,_) = tmp;
cross(i) = L["cross"];
}
return List::create(
Named("start") = X_start,
Named("end") = X_end,
Named("cross") = cross,
Named("plane") = plane_width
);
}
```
Also in this case we have a few remarks:
- C++ does not know the content of `L["start"]` in advance, so for this reason
it is first saved into a vector `tmp` of the correct dimension (the same for `L["end"]`).
- To avoid complexity, the list returned by the C++ implementation is not of
the class *buffon_experiment*. This should be added within R.
### C++ in an R package
Writing an R package with C++ code follows essentially the same steps
as a regular R package. Here we underline the main differences:
1. Create an "empty" R package: the option *R package using Rcpp* should be selected as
Project Type.
2. Edit the `DESCRIPTION` file: `Rcpp` has automatically been added to the `Imports` and `LinkingTo` fields.
3. Move your R scripts into the R folder. In addition, **move the C++ files into the `src` folder**;
`src` stands for *source*. The C++ function must have the attribute
`// [[Rcpp::export]]` immediately before its definition in order to make it available within R.
4. Documentation: follow the exact same steps as for R files. For C++ files, documentation
uses the same syntax except that `#'` should be replaced by `//'` (roxygen comments in C++).
5. The steps that follow are identical.
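As an illustration of steps 3 and 4, a file placed in the `src` folder might look like the following sketch. The roxygen tags shown here are an assumption about how one could document the `sumC()` function from earlier in the chapter. Since both the documentation and the export attribute are comments, the file is also valid plain C++.

```cpp
//' Sum of two numbers
//'
//' @param x A numeric value.
//' @param y A numeric value.
//' @return The sum of `x` and `y`.
//' @export
// [[Rcpp::export]]
double sumC(double x, double y) {
  return x + y;
}
```

The documentation block comes first and the export attribute stays directly above the function definition.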