-
Notifications
You must be signed in to change notification settings - Fork 1
/
3_writing_functions.qmd
771 lines (523 loc) · 23 KB
/
3_writing_functions.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
---
output:
html_document:
df_print: paged
code_download: TRUE
toc: true
toc_depth: 1
editor_options:
chunk_output_type: console
---
```{r, setup, include=FALSE}
knitr::opts_chunk$set(
eval=FALSE, warning=FALSE, error=FALSE
)
```
```{r}
if (!require("babynames")) install.packages("babynames")
if (!require("palmerpenguins")) install.packages("palmerpenguins")
if (!require("tidyverse")) install.packages("tidyverse")
library(palmerpenguins)
library(tidyverse)
library(babynames)
```
# Writing your own functions
In this section we are going to learn some advanced concepts that are going to make you into a full-fledged R programmer. Before this you only used the built-in functions R came with, as well as the functions contained in packages. Now we will learn about building functions ourselves.
## Declaring functions in R
The standard syntax to declare functions in R is as follows:
```
function_name <- function(arguments) {
expressions
return(value) # optional
}
```
Suppose you want to create the following function: $$f(x) = 1/\sqrt{x}$$ Writing this in R is quite simple:
```{r}
my_function <- function(x) {
1/sqrt(x)
}
```
The argument of the function, `x`, gets passed to the `function()` function and the *body* of the function contains the function definition. Of course, you could define functions that use more than one input.
```{r}
my_function <- function(x, y) {
1/sqrt(x + y)
}
```
Functions written by the user get called just the same way as functions included in R:
```{r}
my_function(20, 100)
```
You could also provide default values to the function's arguments, which are values that are used if the user omits them:
```{r}
my_function <- function(x, y=1){
1/sqrt(x + y)
}
# call the function
my_function(20)
```
```{r}
1 / sqrt(20 + 1)
```
What if you don't know how many arguments you will have to the function?
For example, say you are collecting the grades of all students who gave an exam to calculate the average, but different numbers of students give the exam each year. You can pass a list or vector of values as the function argument as below:
```{r}
# define the function
average_grade_calculator <- function(x) {
mean(x)
}
# call the function
grades <- c(45, 67, 89, 99, 100, 34, 78, 94)
average_grade_calculator(grades)
```
What if you wanted to calculate two different values, such as the mean and the standard deviation?
To be able to do this, you must put your values in a list, and return the list of values. For example:
```{r}
# define the function
average_and_sd_calculator <- function(x) {
c(mean(x), sd(x))
}
# call the function
average_and_sd_calculator(grades)
```
You're still returning a single object, but it's a vector.
You can also return a named list:
```{r}
# define the function
average_and_sd_calculator <- function(x) {
list("mean_grade" = mean(x),
"sd_grade" = sd(x))
}
# call the function
average_and_sd_calculator(grades)
```
In general, function names should be of the form: `function_name()`. Verbs or action words are preferred. Always give your function very explicit names!
### EXERCISE 1
Write a function to calculate the volume of a cube given the length, width, and height values. Set default value for each argument to 1.
Formula for a volume of a cube: $f(l,w,h) = lwh$
```{r}
```
### EXERCISE 2
Write a function to calculate present age given birth year. Add the current year as a default value for the calculation
```{r}
```
## Returning values
Usually, the last evaluated value is returned as the function output. Therefore the function below will only return one value as the output.
Let's re-write the `average_and_sd_calculator` function to see how this works:
```{r}
# define the function
average_and_sd_calculator <- function(x) {
mean_x <- mean(x)
sd_x <- sd(x)
}
# call the function
average_and_sd_calculator(grades)
```
What did the function return? Essentially, the function evaluated all the statements but did not return any value since it was not asked to! In order to write robust code, you can use an explicit `return()` statement at the end of your functions to return the values you are interested in.
```{r}
# define the function
average_and_sd_calculator <- function(x) {
mean_x <- mean(x)
sd_x <- sd(x)
return(mean_x)
}
# call the function
average_and_sd_calculator(grades)
```
You are still returning a single object as earlier, but doing so more explicitly.
Let's see what happens when we try to return two objects explicitly with a `return()` statement:
```{r, error=TRUE, eval=FALSE}
# define the function
average_and_sd_calculator <- function(x) {
mean_x <- mean(x)
sd_x <- sd(x)
return(mean_x, sd_x) # does not work
}
# call the function
average_and_sd_calculator(grades)
```
We can bundle together several objects and return them as a single list of vector:
```{r}
# ------return as a vector------
average_and_sd_calculator <- function(x) {
mean_x <- mean(x)
sd_x <- sd(x)
return(c(mean_x, sd_x)) # works!
}
# call the function
average_and_sd_calculator(grades)
# ------return as a list------
average_and_sd_calculator <- function(x) {
mean_x = mean(x)
sd_x = sd(x)
return(list(mean_x, sd_x)) # also works!
}
# call the function
average_and_sd_calculator(grades)
# ------return as a named list------
average_and_sd_calculator <- function(x) {
mean_x = mean(x)
sd_x = sd(x)
return(list("mean_grade" = mean_x, "sd_grade" = sd_x)) # also works!
}
# call the function
average_and_sd_calculator(grades)
```
### EXERCISE 3
Write a function which takes in a data frame, removes all incomplete cases from it (can use `na.omit()` or `drop_na()`), adds a column called `complete` with value `TRUE` for all rows and returns the cleaned data back. Test your function on the `penguins` data set from package `{palmerpenguins}`
```{r}
if (!require(palmerpenguins)) install.packages("palmerpenguins")
```
### EXERCISE 4
Write a function which takes in two vectors `x` and `y` and returns the $R^2$ value for a linear model y \~ x. Use it to find the $R^2$ value for a linear fit between `body_mass_g` and `bill_length_mm` for the `penguins` data.
**Hint**: use `summary` on the output of `lm` and extract the `r.squared`
```{r}
```
## Anonymous functions
As the name implies, anonymous functions are functions that do not have a name. These are useful inside functions that have other functions as arguments such as `map`. Anonymous functions get defined in a very similar way to regular functions, you just skip the name and that's it.
Let's see a simple example:
```{r}
# named function
squaring_func <- function(x) {x^2}
squaring_func(3)
# anonymous function
(function(x) x^2)(3)
```
Let's see a few more examples
```{r}
# named
vol_cube <- function(x, y, z){x*y*z}
vol_cube(3,4,2)
# volume of a cube
(function(x,y,z) x*y*z)(3,4,2)
```
```{r}
# named
age <- function(birth_year, present=2023) {present - birth_year}
age(1998)
# present age given birth year
(function(birth_year, present=2023) present - birth_year)(1998)
```
What's the point of anonymous functions?
Suppose I'm trying to calculate the column means of a data frame `X` . We learned in intro to Tidyverse workshop that we can use `across()` with `summarize()` to calculate the column means.
```{r}
X <- data.frame(matrix(1:30, nrow = 10, ncol = 3))
X[c(2,3,4),2] <- NA # add some missing values
X[8,3] <- NA # add some missing values
print(X)
X %>%
summarize(across(everything(), mean))
```
Because there are missing values in some columns, I want to specify `na.rm=TRUE` inside the `mean()` function. I could define a new mean function with that specific argument.
```{r}
mean_ignore_na <- function(x) {
mean(x, na.rm = TRUE)
}
X %>%
summarize(across(everything(), mean_ignore_na))
```
We can avoid creating a new function for a single use case; we can use an anonymous function!
```{r}
X %>%
summarize(across(everything(), function(x) mean(x, na.rm = TRUE)))
```
### EXERCISE 5
Rewrite the following function as an anonymous function. Use it to select all columns that end with "mm" in the data frame `penguins`
```
pattern_match_colnames <- function(df, pattern) {
df %>%
select(ends_with(pattern))
}
```
```{r}
```
## Passing columns of data to functions with `{{}}`
So far we have looked at functions that take vectors as arguments, however, in many situations, you will want to write functions that look similar to this:
```
my_function(my_data, one_column_inside_data)
```
Such a function will be useful in situations where you will have to apply a certain number of operations to columns for different data frames, especially if you are working with `tidyverse`.
For example, if you need to create tables of descriptive statistics or graphs periodically, it might be more efficient to put these operations inside a function and then call the function whenever you need it, on the fresh batch of data.
Let's see how that approach might work:
```{r}
# get the mean value of a column
simple_function <- function(dataset, col_name) {
dataset %>%
summarize(mean = mean(col_name, na.rm=TRUE))
}
# call the function
simple_function(penguins, "body_mass_g")
```
The variable `col_name` is passed to `simple_function()` as a string, but `group_by()` requires a variable name.
What we're would like to achieve above is:
```{r}
penguins %>% summarize(mean = mean(body_mass_g, na.rm = TRUE))
```
Maybe we don't quote `body_mass_g` like we do with `dplyr` functions?
```{r error=TRUE}
simple_function(penguins, body_mass_g)
```
Alright, so that also didn't work.
So what is happening here? To understand what's happening, we will take a quick detour and make a distinction between two kinds of variables: **env-variables** and **data variables**
- **env-variables**: "programming"variables that live inside your environment. Usually created with `<-` operator
- **data-variables**: "statistical" variables that live inside your data frame. Usually come with data files (`.csv`, `.xlsx`), and are accessed in the context of your data frame.
In our case, `penguins` is an env-variable with data-variables such as `species`, `island`, `bill_length_mm`, and so on. With `dplyr` functions, when we pass the column names as variables, they are interpreted as data-variables. This technique, which allows us to refer to column names as if they are variables that live inside your environment, is called **data masking**.
In the function we defined above, the `col_name` inside `summarize()` function is first interpreted as a data-variable, and the function looks for the `col_name` column inside the data set, which doesn't exist. Then it checks what `col_name` represents as an environment variable. We passed `"body_mass_g"`, which is a string to `col_name`, but taking a mean of a string doesn't make sense!
What we would like to do is use `col_name` as a placeholder for data-variable. Then we can pass whatever column name to the function.
To be able to pass column names of your data as arguments, we need to use a framework that was introduced in the `{tidyverse}`, called *tidy evaluation*. This framework can be used by installing the `{rlang}` package version 0.4.0 or higher.
Let's rewrite the function from before with **curly curly** syntax (this is also referred to as **embracing** in tidyverse:
```{r}
# function using tidyeval
simple_function <- function(dataset, col_name) {
dataset %>%
summarize(mean = mean({{ col_name }}, na.rm=TRUE)) # "embrace" the argument
}
# call tidyeval function
print(simple_function(penguins, body_mass_g))
print(mean(penguins$body_mass_g, na.rm = TRUE))
# call tidyeval function
print(simple_function(penguins, bill_length_mm ))
print(mean(penguins$bill_length_mm , na.rm = TRUE))
```
Let's see another example:
```{r}
simple_function <- function(dataset, filter_col, mean_col, value) {
dataset %>%
filter( {{ filter_col }} == value) %>%
summarize(mean = mean( {{ mean_col }} , na.rm=TRUE))
}
# mean of body mass for females
simple_function(penguins, sex, body_mass_g, "female")
# mean of body mass for Adelie species
simple_function(penguins, species, body_mass_g, "Adelie")
```
Say you want to make the function generate more readable output, it would be easier to name the summary column similar to the input data column. Let's try that:
```{r, eval=FALSE, error=TRUE}
simple_function <- function(dataset, filter_col, mean_col, value) {
dataset %>%
filter( {{ filter_col }} == value) %>%
summarize( {{ mean_col }} = mean( {{mean_col}} , na.rm=TRUE)) # assign summary column name with =
}
simple_function(penguins, sex, body_mass_g, "female")
```
This throws an error! In this case, you have to use the `:=` operator instead of `=` to be able to assign the column name to the new column which stores the summary statistic.
```{r}
simple_function <- function(dataset, filter_col, mean_col, value) {
dataset %>%
filter( {{ filter_col }} == value) %>%
summarize( {{ mean_col }} := mean( {{ mean_col }} , na.rm=TRUE)) # assign summary column name with :=
}
simple_function(penguins, sex, body_mass_g, "female")
simple_function(penguins, sex, bill_length_mm, "female")
```
If you're creating new columns names with dplyr verbs, you can also enclose the data-variables inside quotation marks, which lets you programmatically generate names.
```{r}
simple_function <- function(dataset, filter_col, mean_col, value) {
dataset %>%
filter( {{ filter_col }} == value) %>%
summarize(
"{{ mean_col }}_mean" := mean( {{ mean_col }} , na.rm=TRUE),
"{{ mean_col }}_sd" := sd({{ mean_col }}, na.rm=TRUE)
) # assign summary column name with :=
}
simple_function(penguins, sex, body_mass_g, "female")
simple_function(penguins, sex, bill_length_mm, "female")
```
### EXERCISE 6
Load the starwars dataset with `data(starwars)` for this exercise.
```{r}
data(starwars)
View(starwars)
names(starwars)
```
Write a function `how_many_na` which counts the number of missing values for a given column of the dataset. Using this function, find the number of missing values for `hair_color` and `birth_year` in `starwars`
```{r}
```
### EXERCISE 7
Write a function `sorted_mean_mass` to calculate the average `mass` grouped by any categorical column of choice (e.g. `homeworld`, `species`), and arrange the results from lightest to heaviest.
The function should accept two arguments: dataset and grouping column.
Hint: First write out the tidyverse commands as you would normally, then convert it to a function.
```{r}
```
### EXERCISE 8
Write a function `sorted_mean_col` to calculate the average of *any* numeric column grouped by any categorical column, and arrange the results from lowest to highest. Name the summary statistic column the same as the input column.
Bonus: Have the function also calculate the median and have the column names in the summarized output include the input variable name.
```{r}
```
### Practical example for `{{}}`
A very useful application for `{{}}` is for making plot templates. Like `dplyr` verbs, `ggplot2` also uses data masking inside the `aes()` part.
```{r}
# plot code
penguins %>%
ggplot(aes(x = flipper_length_mm, y = body_mass_g, group = species)) +
geom_point(aes(color = species), size=2, alpha=0.8) +
geom_smooth(aes(color = species, fill = species), method="lm", alpha =0.1) +
theme_classic() +
labs(title = "Penguins Data Scatterplot")
```
Let's make this into a function:
```{r}
# functionalized plot code
scatterplot_with_lm <- function(data, x_col, y_col, grouping_var, title) {
data %>%
ggplot(aes(x = {{ x_col }}, y = {{ y_col }}, group = {{ grouping_var }})) +
geom_point(aes(color = {{ grouping_var }}), size=2, alpha=0.8) +
geom_smooth(aes(color = {{ grouping_var }}, fill = {{ grouping_var }}),
method="lm", alpha =0.1) +
theme_classic() + labs(title = title)
}
```
Call this function several times to make different plots
```{r}
# call on penguins data
scatterplot_with_lm(penguins, flipper_length_mm, body_mass_g, species, "penguins Data")
scatterplot_with_lm(penguins, bill_length_mm, body_mass_g, species, "penguins Data")
scatterplot_with_lm(penguins, bill_depth_mm, body_mass_g, species, "penguins Data")
# call on a different dataset
scatterplot_with_lm(mtcars, wt, mpg, gear, "mtcars Data")
scatterplot_with_lm(mtcars, wt, mpg, factor(gear), "mtcars Data")
```
### EXERCISE 9
Load the `{babynames}` package as below:
```{r}
library(babynames)
print(head(babynames))
```
```{r}
# plot the popularity of a name across time
chosen_name = "Tyler"
babynames %>%
filter(name == chosen_name) %>%
ggplot(aes(x=year, y=prop)) +
geom_col(color="orange", fill="orange", alpha=0.5) + theme_classic()
```
Rewrite the plotting code above as a function that can take any data set similar to `babynames` and plot the popularity measure (as either the `prop` or `n` for example) over time for the variable `chosen_name`. Call the function `name_popularity_plotter`.
```{r}
name_popularity_plotter <- function(){
}
```
## Functional Programming with `{purrr}`
In functional programming, your code is organised into functions that perform the operations you need. Your scripts will only be a sequence of calls to these functions, making them easier to understand. Some examples of functional programming are the `purrr::map` and `purrr::reduce` family of functions.
### The `map*()` family of functions
Instead of using loops, pure functional programming languages use functions that achieve the same result. The first part of these functions' names all start with `map_` and the second part tells you what this function is going to ***return***, such as `map_dbl()` for doubles, or `map_df()` for dataframes.
For example, here $X$ is a vector composed of the several scalar elements. The function we want to map to each element of $X$ is $f( x)=x+1$:
![](https://upload.wikimedia.org/wikipedia/commons/0/06/Mapping-steps-loillibe-new.gif){width="419"}
```{r}
plus_one <- function(x) {x+1} # function to apply to each element
```
You could use a loop to execute this task as follows:
```{r}
numbers <- c(0, 5, 8, 3, 2, 1)
my_results <- numeric(length(numbers))
for (i in seq_along(numbers)) {
my_results[i] <- plus_one(numbers[i])
}
my_results
```
Using the `map*()` functions, we can simplify this code a lot:
```{r}
map(numbers, plus_one)
```
You could also use the base R `apply` functions to same effect:
```{r}
lapply(numbers, plus_one)
```
So what is the added value of using `{purrr}`, you might ask. Well, imagine that instead of a list, I need to an atomic vector of `numeric` values. This is fairly easy with `{purrr}`:
```{r}
map_dbl(numbers, plus_one)
```
In base R, outputting something other than lists involves more effort. Whereas, using the map family of functions you get try `_lgl()`, `_int()`, `_dbl()`, `_chr()` to return a logical, integer, double, or character vector respectively.
### Mapping with two inputs
Sometimes you might want to perform operations by iterating over two lists. Then you would use the `map2` family of functions.
```{r}
x <- c(1, 1, 1)
y <- c(10, 20, 30)
addition <- function(x,y) (x+y)
map2_dbl(x, y, addition)
```
To learn about mapping with multiple inputs see [`pmap()`](https://dcl-prog.stanford.edu/purrr-parallel.html#pmap).
### EXERCISE 10
Write a function with map that takes in a names vector and adds the title "Dr." in front of each name. Make sure it returns a character vector of the modified names.
```{r}
names <- c("Doom", "Strange", "Fate", "Manhattan", "Octopus")
```
### Mapping with anonymous functions
Let's rewrite the addition function above as an anonymous function inside `map`
```{r}
map2_dbl(x, y, function(x,y) x+y)
```
### Mapping with formulas
Tidyverse functions also support formulas, as they get converted into anonymous functions. Using a formula instead of an anonymous function is less verbose; you use `~` instead of `function(x)` and a single dot `.` instead of `x` (or `.x` and `.y` if you have multiple inputs).
```{r}
map_dbl(x, ~{sqrt(.)}) # equivalent to map_dbl(x, function(x) sqrt(x))
```
Let's re-write the addition function from before with a formula to be even more concise:
```{r}
map2_dbl(x, y, ~{.x + .y}) # equivalent to map2_dbl(x, y, function(x,y) x + y)
```
### EXERCISE 11
Rewrite the code from the previous exercise to now make an anonymous function call with map.
```{r}
names <- c("Doom", "Strange", "Fate", "Manhattan", "Octopus")
```
### Practical examples for `map*()`
Let's look at a more practical example. Suppose you want to split your data set into pieces and then fit a linear model to each piece, and then extract the regression coefficient for each piece. You could use map functions to carry this out easily. To follow what's happening let's run the following code piece by piece.
```{r}
penguins %>%
nest(.by = species) %>%
mutate(
models = map(data, function(df) lm(body_mass_g ~ bill_depth_mm, data=df))
) %>%
mutate(
r_squared = map_dbl(models, function(m) summary.lm(m)$r.squared)
) %>%
select(species, r_squared)
```
Let's look at another example from the data frame `babynames`
```{r}
# plot the proportion of babies named "Tyler" from 2000 - 2020
babynames %>%
filter(name == "Tyler", year %in% 2000:2020) %>%
ggplot(aes(x=year, y=prop)) +
geom_col(color="orange", fill="orange", alpha=0.5) + theme_classic()
```
Let's use `map` to find the rate of change of the proportion of selected baby names from 2000-2020.
```{r}
selected_names <- c("Tyler", "Mary", "William", "Jessica")
babynames %>%
filter(name %in% selected_names, year %in% 2000:2020) %>%
group_by(year, name) %>%
summarize(prop = sum(prop), .groups = "drop") %>%
nest(.by = name) %>%
mutate(models = map(data, function(df) lm(prop ~ year, data=df))) %>%
mutate(slope = map_dbl(models, function(m) summary(m)$coefficients[2])) %>%
select(name, slope)
```
[Cheat sheet](https://github.com/rstudio/cheatsheets/blob/main/purrr.pdf) of `{purrr}` functions.
### EXERCISE 12
Write a code with `map()` function that takes in a data frame and returns a summary. Now try it with a variant of map\* family that forces the output to be a data frame.
Use it on the penguins data set after omitting `NA`s
```{r}
```
### EXERCISE 13
Find the female baby name that has increased in popularity from 2000 - 2020. Use only names that appear at least 5,000 times in the year 2000.
NOTE: To model change in popularity from 2000 to 2020, we can run linear regression of `prop` against `year` for each name.
As a reference, this is how you get slope and $R^2$ value from linear regression model
```{r}
# run linear regression model
lm_model <- lm(prop ~ n, data = babynames)
# get slope
summary(lm_model)$coefficients[2]
# get R-squared
summary(lm_model)$r.squared
```
```{r}
```
### EXERCISE 14
Write code to find the change in popularity since 2000 for *all* baby names with at-least 500 occurrences in the year 2000. Select only those models whose $R^2$ was at least 0.1.
```{r}
```
Which of these baby names had the greatest rise in popularity from 2000-2020?
Which of these baby names had the greatest decline in popularity from 2000-2020?
Which of these baby names fits a linear model extremely well?
```{r}
```