forked from hadley/adv-r
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Expressions.Rmd
770 lines (569 loc) · 27.7 KB
/
Expressions.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
# Expressions
```{r setup, include = FALSE}
source("common.R")
library(rlang)
library(lobstr)
```
To compute on the language, we first need to understand its structure. That requires some new vocabulary, some new tools, and some new ways of thinking about R code. \index{expressions}
The first thing you'll need to understand is the distinction between an operation and its result. Take this code, which takes a variable `x` multiplies it by 10 and saves the result to a new variable called `y`. It doesn't work because we haven't defined a variable called `x`:
```{r, error = TRUE}
y <- x * 10
```
It would be nice if we could capture the intent of the code, without executing the code. In other words, how can we separate our description of the action from performing it? One way is to use `base::quote()`: it captures an expression without evaluating it: \indexc{quote()}
```{r}
z <- quote(y <- x * 10)
z
```
`quote()` returns a quoted __expression__: an object that contains R code. In this chapter, you'll learn about the structure of those expressions, which will also help you understand how R executes code. Later, we'll learn about `eval()` which allows you to take such an expression and perform, or __evaluate__, it:
```{r}
x <- 4
eval(z)
y
```
## Abstract syntax trees
Quoted expressions are also called abstract syntax trees (AST) because the structure of code is fundamentally hierarchical and can be naturally represented as a tree. To make that more obvious we're going to introduce some graphical conventions, illustrated with the very simple call `f(x, "y", 1)`. \index{abstract syntax tree}
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/expression-simple.png", dpi = 450)
```
* Function __calls__ define the hierarchy of the tree. Calls are shown
with an orange square. The first child is the function that gets called,
here `f`. The second and subsequent children are the arguments.
NB:Unlike many tree diagrams the order of the children is important:
`f(x, 1)` is not the same as `f(1, x)`.
* The leaves of the tree are either __symbols__, like `f` and `x`, or
__constants__ like `1` or `"y"`. Symbols have a purple border and rounded
corners. Constants, which are atomic vectors of length one, have black
borders and square corners. Strings are always surrounded in quotes so
you can more easily distinguish from symbols --- more on that important
difference later.
Every call in R can be written in this form, even if it doesn't look like it at first glance. Take `y <- x * 10` again: what function is being called? It not as easy to spot as `f(x, 1)` because this expression uses __infix__ form. Infix functions come **in**between their arguments (so an infix function can only have two arguments). Most functions in R are __prefix__ functions where the name of the function comes first.
(Some programming languages use __postfix__ functions where the name of the function comes last. If you ever used an old HP calculator, you might have fallen in love with reverse Polish notation, postfix notation for algebra. There is also a family of "stack"-based programming languages descending from Forth which takes this idea as far as it might possibly go.)
### Infix vs. prefix
In R, any infix call can be converted to a prefix call if you you escape the function name with backticks. That means that these two lines of code are equivalent:
```{r}
y <- x * 10
`<-`(y, `*`(x, 10))
```
And their AST looks like this:
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/expression-prefix.png", dpi = 450)
```
Drawing these diagrams by hand takes me some time, and obviously you can't rely on me to draw diagrams for your own code. So to supplement the hand-drawn trees, we'll also use some computer-drawn trees made by `lobstr::ast()`. `ast()` tries to make trees as similar as possible to my hand-drawn trees, while respecting the limitations of the console. I don't think they're quite as easy to visually parse, but they're not too bad. If you're running in an interactive terminal, you'll see calls in orange and names in purple.
```{r}
lobstr::ast(y <- x * 10)
```
`ast()` also prints the argument names when they're used:
```{r}
lobstr::ast(mean(x = mtcars$cyl, na.rm = TRUE))
```
We can use `ast()` to peek into even more complex calls. In this call, note the special forms used by `function`, `if`, and `{}` are just regular nodes in the tree.
```{r}
lobstr::ast(function(x, y, z) {
if (x > y) {
z - x
} else {
z + y
}
})
```
For more complex code, you can also use RStudio's tree viewer to explore the AST interactively. Activate with `View(quote(y <- x * 10))`.
### Constants
__constants__ include the length one atomic vectors, like `"a"` or `10`,
They have the interesting property that quoting is idempotent:
```{r}
identical(1, quote(1))
identical("test", quote("test"))
```
Note that you can't directly create longer vectors because there's no way to do that without calling a function:
```{r}
lobstr::ast(c(1, 2, 3))
lobstr::ast(1:3)
```
### Symbols {#names}
Symbols, represent the name of an object rather than its value. `ast()` prefixes names with a backtick. \index{names} \index{symbols|see{names}}
```{r}
lobstr::ast(x)
lobstr::ast(mean)
lobstr::ast(`an unusual name`)
```
Names that would otherwise be invalid are automatically surrounded by backticks
\index{non-syntactic names}.
### Unquoting
Note that `ast()` supports "unquoting" with `!!` (pronounced bang-bang). We'll talk about this in detail later; for now notice that this is useful if you've already used `quote` to captured the expression in a variable.
```{r}
lobstr::ast(z)
lobstr::ast(!!z)
```
### Base R naming conventions
Before we go much further, you need a word of caution about the naming conventions we've used here. Unfortunately, base R does not have a consistent set of conventions that are used throughout functions and documentation, and this book introduces another set of mild variations (which at least we do use consistently).
* Language object is used in two senses in base R. `is.language()` defines
as the union of symbols, calls, and expressions; `str()` uses as synonoym
for call.
* Symbol and name used interchangeably in base R. `as.name()` and `is.name()`
are identical to `as.symbol()` and `is.symbol()`. We prefer symbol because
name is too overloaded (i.e. the name of a variable).
* `expression()` and `parse()` create expression objects, which are basically a
list of calls. This is not very useful since you can just put calls in a list.
There is no reason to ever use `expression()`.
### Exercises
1. Use `ast()` and experimentation to figure out the three arguments in an
`if()` call. Which components are required? What are the arguments to
the `for()` and `while()` calls?
1. What does the call tree of an `if` statement with multiple `else`
conditions look like?
1. Why can't an expression contain an atomic vector of length greater than one?
Which two of the six types of atomic vector can't appear in an expression?
Why?
## Understanding R's grammar
The set of rules used to go from a sequence of tokens (like `x`, `y`, `+`) to a tree is known as a grammar. In this section, we'll explore some of the details of R's grammar, learning more about how a potentially ambiguous string is turned into a tree.
### Operator precedence and associativity
The AST has to resolve two sources of ambiguity when parsing infix operators. First, what does `1 + 2 * 3` yield? Do you get 6 (i.e. `(1 + 2) * 3`), or 7 (i.e. `1 + (2 * 3)`). Which of the two possible parse trees below does R use?
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/expression-ambig-order.png", dpi = 450)
```
Infix functions introduce an ambiguity in the parser in a way that prefix functions do not. Programming langauges resolve this using a set of conventions known as __operator precedence__. We can reveal the answer using `ast()`:
```{r}
lobstr::ast(1 + 2 * 3)
```
Second, is `1 + 2 + 3` parsed as `(1 + 2) + 3` or `1 + (2 + 3)`?
```{r}
lobstr::ast(1 + 2 + 3)
```
This is called __left-associativity__ because the the operations on the left are evaluated first. The order of arithmetic doesn't usually matter because `x + y == y + x`. However, some S3 classes define `+` in a non-associative way. For example, in ggplot2 the order of arithmetic does matter.
(These two sources of ambiguity do not exist in postfix languages which is one reason that people like them. They also don't exist in prefix languages, but you have to type a bunch of extra parentheses. For example, in LISP you'd write `(+ (+ 1 2) 3))`. In a postfix language you write `1 2 + 3 +`)
You override the default precendence rules by using parentheses. These also appear in the AST:
```{r}
lobstr::ast((1 + 2) + 3)
```
### Whitespace
R, in general, is not very sensitive to white space. Most white space is not signficiant and is not recorded in the AST. `x+y` yields exactly the same AST as `x + y`. There's is only one place where whitespace is quite important:
```{r}
lobstr::ast(y <- x)
lobstr::ast(y < -x)
```
### The function component
The first component of the call is usually a symbol that resolves to a function:
```{r}
lobstr::ast(f(a, 1))
```
But it might also be a function factory, a function that when called returns another function:
```{r}
lobstr::ast(f()(a, 1))
```
And of course that function might also take arguments:
```{r}
lobstr::ast(f(a, 1)())
```
And finally, there might be arguments at multiple levels:
```{r}
lobstr::ast(f(a, b)(1, 2))
```
These forms are relatively rare, but it's good to be able to recognise them when they crop up.
### Exercises
1. Which arithmetic operation is right associative?
1. Why does `x1 <- x2 <- x3 <- 0` work? There are two reasons.
1. Compare `x + y %+% z` to `x ^ y %+% z`. What does that tell you about
the precedence of custom infix functions?
## Extracting, modifying, and creating calls {#calls}
A call behaves similarly to a list. It has a `length()` and you can extract elements with `[[`, `[`, and `$`. Like lists can contain other lists, calls contain other calls. The main difference is that the first element of a call is special: it's the function that will get called. \index{calls}
Let's explore these ideas with a simple example:
```{r}
x <- quote(read.table("important.csv", row = FALSE))
lobstr::ast(!!x)
```
The length of a call minus one gives the number of arguments:
```{r}
length(x) - 1
```
The names of a call are always empty for the first element and for unnamed arguments.
```{r}
names(x)
```
### Subsetting
You can extract the leaves of the call by position and by name using `[[` and `$` in the usual way:
```{r}
x[[1]]
x[[2]]
x$row
```
You can use `[` to extract multiple components, but if you drop the the first element, you're usually going to end up with a weird call:
```{r}
x[2:3]
```
If you do want to extract multiple elements in this way, it's good practice to coerce the results to a list:
```{r}
as.list(x[2:3])
```
Note that if you want to extract a specific argument from a function call, it's going to be challenge because it could potentially be in any location, with the full name, with an abreviated name, or with no name. To work around this problem, you can use `rlang::lang_standardise()` which standardises all arguments to use the full name: \indexc{standardise\_call()}
```{r}
rlang::lang_standardise(x)
```
(Note that if the function uses `...` it's not possible to standardise all arguments.)
It's also possible to modify calls:
```{r}
x$header <- TRUE
x
```
But there are some important caveats which we'll come back to after we discuss how to construct a call from scratch.
### Function calls and pairlists
Pairlists are a holdover from R's past. They behave identically to lists, but have a different internal representation (as a linked list rather than a vector). Pairlists have been replaced by lists everywhere except in function arguments. \index{pairlists}
By and large it doesn't matter, but you might be surprised if you extract the function list.
```{r}
f <- quote(function(x = 10) x + 1)
length(f)
str(f[[1]])
str(f[[2]])
str(f[[3]])
str(f[[4]])
```
There's one special name that needs a little extra discussion: the empty name. It is used to represent missing arguments. This object behaves strangely. You can't bind it to a variable. If you do, it triggers an error about missing arguments. It's only useful if you want to programmatically create a function with missing arguments. \index{names|empty}
```{r, error = TRUE}
f <- function(x) 10
formals(f)$x
is.name(formals(f)$x)
as.character(formals(f)$x)
missing_arg <- formals(f)$x
# Doesn't work!
is.name(missing_arg)
```
To explicitly create it when needed, call `quote()` with a named argument:
```{r}
quote(expr = )
```
## Parsing and deparsing {#parsing-and-deparsing}
Sometimes code is represented as a string, rather than as an expression. You can convert a string to an expression with `parse()`. `parse()` is the opposite of `deparse()`: it takes a character vector and returns an expression object. The primary use of `parse()` is parsing files of code to disk, so the first argument is a file path. Note that if you have code in a character vector, you need to use the `text` argument: \indexc{parse()}
```{r}
z <- quote(y <- x * 10)
deparse(z)
parse(text = deparse(z))
```
Because there might be many top-level calls in a file, `parse()` doesn't return just a single expression. Instead, it returns an expression object, which is essentially a list of expressions: \index{expression object}
```{r}
exp <- parse(text = c("
x <- 4
x
5
"))
length(exp)
typeof(exp)
exp[[1]]
exp[[2]]
```
You can create expression objects by hand with `expression()`, but I wouldn't recommend it. There's no need to learn about this esoteric data structure if you already know how to use expressions. \indexc{expression()}
With `parse()` and `eval()`, it's possible to write a simple version of `source()`. We read in the file from disk, `parse()` it and then `eval()` each component in a specified environment. This version defaults to a new environment, so it doesn't affect existing objects. `source()` invisibly returns the result of the last expression in the file, so `simple_source()` does the same. \index{source()}
```{r}
simple_source <- function(file, envir = new.env()) {
stopifnot(file.exists(file))
stopifnot(is.environment(envir))
lines <- readLines(file, warn = FALSE)
exprs <- parse(text = lines)
n <- length(exprs)
if (n == 0L) return(invisible())
for (i in seq_len(n - 1)) {
eval(exprs[i], envir)
}
invisible(eval(exprs[n], envir))
}
```
The real `source()` is considerably more complicated because it can `echo` input and output, and also has many additional settings to control behaviour.
`_label()`, `_name()`, `_text()`.
### Exercises
1. What are the differences between `quote()` and `expression()`?
1. Read the help for `deparse()` and construct a call that `deparse()`
and `parse()` do not operate symmetrically on.
1. Compare and contrast `source()` and `sys.source()`.
1. Modify `simple_source()` so it returns the result of _every_ expression,
not just the last one.
1. The code generated by `simple_source()` lacks source references. Read
the source code for `sys.source()` and the help for `srcfilecopy()`,
then modify `simple_source()` to preserve source references. You can
test your code by sourcing a function that contains a comment. If
successful, when you look at the function, you'll see the comment and
not just the source code.
1. One important feature of `deparse()` to be aware of when programming is that
it can return multiple strings if the input is too long. For example, the
following call produces a vector of length two:
```{r, eval = FALSE}
g(a + b + c + d + e + f + g + h + i + j + k + l + m +
n + o + p + q + r + s + t + u + v + w + x + y + z)
```
Why does this happen? Carefully read the documentation for `?deparse`. Can you write a
wrapper around `deparse()` so that it always returns a single string?
## Case study: Walking the AST with recursive functions {#ast-funs}
It's easy to modify a single call with `substitute()` or `pryr::modify_call()`. For more complicated tasks we need to work directly with the AST. The base `codetools` package provides some useful motivating examples of how we can do this: \index{recursion!over ASTs}
* `findGlobals()` locates all global variables used by a function. This
can be useful if you want to check that your function doesn't inadvertently
rely on variables defined in their parent environment.
* `checkUsage()` checks for a range of common problems including
unused local variables, unused parameters, and the use of partial
argument matching.
To write functions like `findGlobals()` and `checkUsage()`, we'll need a new tool. Because expressions have a tree structure, using a recursive function would be the natural choice. The key to doing that is getting the recursion right. This means making sure that you know what the base case is and figuring out how to combine the results from the recursive case. For calls, there are two base cases (atomic vectors and names) and two recursive cases (calls and pairlists). This means that a function for working with expressions will look like:
```{r, eval = FALSE}
recurse_call <- function(x) {
if (is.atomic(x)) {
# Return a value
} else if (is.name(x)) {
# Return a value
} else if (is.call(x)) {
# Call recurse_call recursively
} else if (is.pairlist(x)) {
# Call recurse_call recursively
} else {
# User supplied incorrect input
stop("Don't know how to handle type ", typeof(x),
call. = FALSE)
}
}
```
### Finding F and T
We'll start simple with a function that determines whether a function uses the logical abbreviations `T` and `F`. Using `T` and `F` is generally considered to be poor coding practice, and is something that `R CMD check` will warn about. Let's first compare the AST for `T` vs. `TRUE`:
```{r}
ast(TRUE)
ast(T)
```
`TRUE` is parsed as a logical vector of length one, while `T` is parsed as a name. This tells us how to write our base cases for the recursive function: while an atomic vector will never be a logical abbreviation, a name might, so we'll need to test for both `T` and `F`. The recursive cases can be combined because they do the same thing in both cases: they recursively call `logical_abbr()` on each element of the object. \indexc{logical\_abbr}
```{r}
logical_abbr <- function(x) {
if (is.atomic(x)) {
FALSE
} else if (is.name(x)) {
identical(x, quote(T)) || identical(x, quote(F))
} else if (is.call(x) || is.pairlist(x)) {
for (i in seq_along(x)) {
if (logical_abbr(x[[i]])) return(TRUE)
}
FALSE
} else {
stop("Don't know how to handle type ", typeof(x),
call. = FALSE)
}
}
logical_abbr(quote(TRUE))
logical_abbr(quote(T))
logical_abbr(quote(mean(x, na.rm = T)))
logical_abbr(quote(function(x, na.rm = T) FALSE))
```
### Finding all variables created by assignment
`logical_abbr()` is very simple: it only returns a single `TRUE` or `FALSE`. The next task, listing all variables created by assignment, is a little more complicated. We'll start simply, and then make the function progressively more rigorous. \indexc{find\_assign()}
Again, we start by looking at the AST for assignment:
```{r}
ast(x <- 10)
```
Assignment is a call where the first element is the name `<-`, the second is the object the name is assigned to, and the third is the value to be assigned. This makes the base cases simple: constants and names don't create assignments, so they return `NULL`. The recursive cases aren't too hard either. We `lapply()` over pairlists and over calls to functions other than `<-`.
```{r}
find_assign <- function(x) {
if (is.atomic(x) || is.name(x)) {
NULL
} else if (is.call(x)) {
if (identical(x[[1]], quote(`<-`))) {
x[[2]]
} else {
lapply(x, find_assign)
}
} else if (is.pairlist(x)) {
lapply(x, find_assign)
} else {
stop("Don't know how to handle type ", typeof(x),
call. = FALSE)
}
}
find_assign(quote(a <- 1))
find_assign(quote({
a <- 1
b <- 2
}))
```
This function works for these simple cases, but the output is rather verbose and includes some extraneous `NULL`s. Instead of returning a list, let's keep it simple and use a character vector. We'll also test it with two slightly more complicated examples:
```{r}
find_assign2 <- function(x) {
if (is.atomic(x) || is.name(x)) {
character()
} else if (is.call(x)) {
if (identical(x[[1]], quote(`<-`))) {
as.character(x[[2]])
} else {
unlist(lapply(x, find_assign2))
}
} else if (is.pairlist(x)) {
unlist(lapply(x, find_assign2))
} else {
stop("Don't know how to handle type ", typeof(x),
call. = FALSE)
}
}
find_assign2(quote({
a <- 1
b <- 2
a <- 3
}))
find_assign2(quote({
system.time(x <- print(y <- 5))
}))
```
This is better, but we have two problems: dealing with repeated names and neglecting assignments inside other assignments. The fix for the first problem is easy. We need to wrap `unique()` around the recursive case to remove duplicate assignments. The fix for the second problem is a bit more tricky. We also need to recurse when the call is to `<-`. `find_assign3()` implements both strategies:
```{r}
find_assign3 <- function(x) {
if (is.atomic(x) || is.name(x)) {
character()
} else if (is.call(x)) {
if (identical(x[[1]], quote(`<-`))) {
lhs <- as.character(x[[2]])
} else {
lhs <- character()
}
unique(c(lhs, unlist(lapply(x, find_assign3))))
} else if (is.pairlist(x)) {
unique(unlist(lapply(x, find_assign3)))
} else {
stop("Don't know how to handle type ", typeof(x),
call. = FALSE)
}
}
find_assign3(quote({
a <- 1
b <- 2
a <- 3
}))
find_assign3(quote({
system.time(x <- print(y <- 5))
}))
```
We also need to test subassignment:
```{r}
find_assign3(quote({
l <- list()
l$a <- 5
names(l) <- "b"
}))
```
We only want assignment of the object itself, not assignment that modifies a property of the object. Drawing the tree for the quoted object will help us see what condition to test for. The second element of the call to `<-` should be a name, not another call.
```{r}
ast(l$a <- 5)
ast(names(l) <- "b")
```
Now we have a complete version:
```{r}
find_assign4 <- function(x) {
if (is.atomic(x) || is.name(x)) {
character()
} else if (is.call(x)) {
if (identical(x[[1]], quote(`<-`)) && is.name(x[[2]])) {
lhs <- as.character(x[[2]])
} else {
lhs <- character()
}
unique(c(lhs, unlist(lapply(x, find_assign4))))
} else if (is.pairlist(x)) {
unique(unlist(lapply(x, find_assign4)))
} else {
stop("Don't know how to handle type ", typeof(x),
call. = FALSE)
}
}
find_assign4(quote({
l <- list()
l$a <- 5
names(l) <- "b"
}))
```
While the complete version of this function is quite complicated, it's important to remember we wrote it by working our way up by writing simple component parts.
### Modifying the call tree {#modifying-code}
The next step up in complexity is returning a modified call tree, like what you get with `bquote()`. `bquote()` is a slightly more flexible form of quote: it allows you to optionally quote and unquote some parts of an expression (it's similar to the backtick operator in Lisp). Everything is quoted, _unless_ it's encapsulated in `.()` in which case it's evaluated and the result is inserted: \index{bquote()}
```{r}
a <- 1
b <- 3
bquote(a + b)
bquote(a + .(b))
bquote(.(a) + .(b))
bquote(.(a + b))
```
This provides a fairly easy way to control what gets evaluated and when. How does `bquote()` work? Below, I've rewritten `bquote()` to use the same style as our other functions: it expects input to be quoted already, and makes the base and recursive cases more explicit:
```{r}
bquote2 <- function (x, where = parent.frame()) {
if (is.atomic(x) || is.name(x)) {
# Leave unchanged
x
} else if (is.call(x)) {
if (identical(x[[1]], quote(.))) {
# Call to .(), so evaluate
eval(x[[2]], where)
} else {
# Otherwise apply recursively, turning result back into call
as.call(lapply(x, bquote2, where = where))
}
} else if (is.pairlist(x)) {
as.pairlist(lapply(x, bquote2, where = where))
} else {
# User supplied incorrect input
stop("Don't know how to handle type ", typeof(x),
call. = FALSE)
}
}
x <- 1
y <- 2
bquote2(quote(x == .(x)))
bquote2(quote(function(x = .(x)) {
x + .(y)
}))
```
The main difference between this and the previous recursive functions is that after we process each element of calls and pairlists, we need to coerce them back to their original types.
Note that functions that modify the source tree are most useful for creating expressions that are used at run-time, rather than those that are saved back to the original source file. This is because all non-code information is lost:
```{r}
bquote2(quote(function(x = .(x)) {
# This is a comment
x + # funky spacing
.(y)
}))
```
These tools are somewhat similar to Lisp macros, as discussed in [Programmer's Niche: Macros in R](http://www.r-project.org/doc/Rnews/Rnews_2001-3.pdf#page=10) by Thomas Lumley. However, macros are run at compile-time, which doesn't have any meaning in R, and always return expressions. They're also somewhat like Lisp [fexprs](http://en.wikipedia.org/wiki/Fexpr). A fexpr is a function where the arguments are not evaluated by default. The terms macro and fexpr are useful to know when looking for useful techniques from other languages. \index{macros} \index{fexprs}
### Exercises
1. Why does `logical_abbr()` use a for loop instead of a functional
like `lapply()`?
1. `logical_abbr()` works when given quoted objects, but doesn't work when
given an existing function, as in the example below. Why not? How could
you modify `logical_abbr()` to work with functions? Think about what
components make up a function.
```{r, eval = FALSE}
f <- function(x = TRUE) {
g(x + T)
}
logical_abbr(f)
```
1. Write a function called `ast_type()` that returns either "constant",
"name", "call", or "pairlist". Rewrite `logical_abbr()`, `find_assign()`,
and `bquote2()` to use this function with `switch()` instead of nested if
statements.
1. Write a function that extracts all calls to a function. Compare your
function to `pryr::fun_calls()`.
1. Write a wrapper around `bquote2()` that does non-standard evaluation
so that you don't need to explicitly `quote()` the input.
1. Compare `bquote2()` to `bquote()`. There is a subtle bug in `bquote()`:
it won't replace calls to functions with no arguments. Why?
```{r}
bquote(.(x)(), list(x = quote(f)))
bquote(.(x)(1), list(x = quote(f)))
```
1. Improve the base `recurse_call()` template to also work with lists of
functions and expressions (e.g., as from `parse(path_to_file))`.
## Exercises that need a home
1. You can use `formals()` to both get and set the arguments of a function.
Use `formals()` to modify the following function so that the default value
of `x` is missing and `y` is 10.
```{r}
g <- function(x = 20, y) {
x + y
}
```
1. Write an equivalent to `get()` using `as.name()` and `eval()`. Write an
equivalent to `assign()` using `as.name()`, `substitute()`, and `eval()`.
(Don't worry about the multiple ways of choosing an environment; assume
that the user supplies it explicitly.)
1. Implement a pure R version of `do.call()`.
1. Concatenating a call and an expression with `c()` creates a list. Implement
`concat()` so that the following code works to combine a call and
an additional argument.
```{r, eval = FALSE}
c(quote(f()), list(a = 1, b = quote(mean(a)))
concat(quote(f), a = 1, b = quote(mean(a)))
#> f(a = 1, b = mean(a))
```