-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathPHSMethods.Rmd
614 lines (434 loc) · 27.2 KB
/
PHSMethods.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
---
title: "PHS Methods"
output:
learnr::tutorial:
css: "css/style.css"
progressive: false
allow_skip: true
runtime: shiny_prerendered
---
```{r setup, include=FALSE}
# Author: Russell McCreath
# Original Date: Oct 2020
# Modified by Tina Fu on 15th July 2022
# Version of R: 3.6.1
library(learnr)
library(gradethis)
library(stringr)
library(readr)
library(haven)
library(dplyr)
library(phsmethods)
library(tibble)
knitr::opts_chunk$set(echo = FALSE)
tutorial_options(
exercise.checker = gradethis::grade_learnr
)
```
```{r phs-logo, echo=FALSE, fig.align='right', out.width="40%"}
knitr::include_graphics("images/phs-logo.png")
```
## Introduction
Welcome to PHS Methods training. This course is designed as a self-led introduction to the internally developed `phsmethods` package for use in Public Health Scotland. Throughout this course there will be quizzes to test your knowledge and opportunities to modify and write R code. The course is split with a section for each function in the package.
<div class="info_box">
<h4>Course Info</h4>
<ul>
<li>You can select which parts of the course you want to do. If you're here to learn something specific or you're already comfortable with a particular section, you can skip it.</li>
<li>Most sections have multiple parts to them. Navigate the course by using the buttons at the bottom of the screen to Continue or go to the Next Topic.</li>
<li>The course will also show progress through sections, a green tick will appear on sections you've completed, and it will remember your place if you decide to close your browser and come back later.</li>
</ul>
</div>
</br>
### Overview
The package, `phsmethods`, can be used on desktop and server versions of RStudio. It contains functions for common analytical tasks in Public Health Scotland (PHS). The package is also intended to be in continuous development, with contributions to fixing typos, bugs, and even new functions are all very much encouraged. To learn more about contributing, see [this section on GitHub](https://github.com/Public-Health-Scotland/phsmethods#contributing-to-phsmethods).
These are the current functions in the package which are included in this training app. As more functions are added, the aim is to include them here too. As such, this training can be treated as modular and each section is self-contained:
[`age_group()`](#age_group) [`chi_check()`](#chi_check) [`chi_pad()`](#chi_pad) [`file_size()`](#file_size) [`fin_year()`](#fin_year) [`match_area()`](#match_area) [`postcode()`](#postcode) and the [quarters](#quarters) functions.
### How to Use
The `phsmethods` package is just like any other package. However, it is not *currently* hosted on CRAN which means it needs to be installed from GitHub **or** a local copy of the GitHub repo. These methods are outlined below to get you started. Once the package is installed, it will show in your list of packages on RStudio.
#### Install from GitHub
For this method, the package `remotes` is required. If you don't already have this installed, it's on CRAN so run `install.packages("remotes")`. The `phsmethods` package can then be installed on RStudio from GitHub with:
```{r phsmethods-install, include=TRUE, echo = TRUE, eval = FALSE, message=FALSE, warning=FALSE}
remotes::install_github("Public-Health-Scotland/phsmethods", upgrade = "never")
```
#### Install from Local Copy
It may be necessary to install from a local copy. This is most likely the case if you want to install on RStudio Desktop and network security settings are preventing access to the `remotes::install_github()` function. You'll still need the `remotes` package installed, as above, so if you don't have it installed, run `install.packages("remotes")`. Then, there's 2 steps:
1. Download a [zip of the GitHub repo](https://github.com/Public-Health-Scotland/phsmethods/archive/master.zip)
2. Run the following code, replacing `<>` with the file-path of the zipped file (remember to remove the `<>` symbols too.):
```{r phsmethods-install-local, eval=FALSE, echo = TRUE}
remotes::install_local("<FILEPATH OF ZIPPED FILE>/phsmethods-master.zip", upgrade = "never")
```
#### Load
At this point, it's just the same as any other installed package, just run `library()` when you want to use it in your current session:
```{r phsmethods-load, include=TRUE, echo = TRUE, message=FALSE, warning=FALSE}
library(phsmethods)
```
## create_age_groups()
The function, `create_age_groups()`, categorises ages into groups. The function takes a number of arguments:
- `x` - a vector of numeric values (e.g. `c(1, 2, 3, 21, 18, 88, 91)`)
- `from` - the start of the smallest age group (*default is 0*)
- `to` - the end of the largest age group (*default is 90*)
- `by` - the size of the age groups (*default is 5*)
- `as_factor` - the default behaviour is to return a character vector, assigning `TRUE` to this argument will return a factor vector instead (*default is FALSE*)
**If `to` is not a multiple of `by`, it'll be rounded down to the nearest multiple of `by`.**
#### Exercise
Have a look and click 'Run Code' below to see the output. Then, change the code to return age groups from 0 to 80 in groups of 10.
```{r create_age_groups-example, exercise=TRUE}
ages <- c(54, 7, 77, 1, 26, 101)
create_age_groups(ages)
```
```{r create_age_groups-example-check}
grade_result(
pass_if(~ identical(.result, c("50-59", "0-9", "70-79", "0-9", "20-29", "80+"))),
fail_if(~ identical(.result, c("50-54", "5-9", "75-79", "0-4", "25-29", "90+")), message = "Have you changed the groups? Read the exercise again and use the list of arguments above to construct the right function."),
fail_if(~ TRUE)
)
```
## chi_check()
The CHI (Community Health Index) is a register of all patients in NHS Scotland. Each CHI number is a unique, 10-digit, identifier assigned to each patient on the index.
- **Digits 1 - 6** are a patients date of birth in DDMMYY format. As an example of a check, the first number must be 3 or less.
- **Digits 7 - 8** are additional digits to create the unique number.
- **Digit 9** identifies a patient's sex, odd for men, even for women.
- **Digit 10** is a check digit, denoted 'checksum'.
The function, `chi_check()`, takes a CHI number or a vector of CHI numbers with a `character` class (see why `character below). It returns feedback on the validity of the CHI number(s) and, if found to be invalid, provides detail on why. The function only takes one argument, and that's the CHI number(s).
*While a CHI number is made exclusively of numeric digits, it cannot be stored as a `numeric` class. This is because a leading `0` in numeric values are silently dropped. For this reason, `chi_check()` accepts input values of `character` class only. A leading `0` can be added to a 9-digit CHI number using [`chi_pad()`](#chi_pad).*
These are the criteria `chi_check()` validates:
- no non-numeric characters
- 10 digits in length
- first 6 digits represent a valid date
- the checksum digit is correct
The result of `chi_check()` is a character string. Depending on the validity of the CHI number, it will return one of the following:
- 'Valid CHI'
- 'Invalid character(s) present'
- 'Too many characters'
- 'Too few characters'
- 'Invalid date'
- 'Invalid checksum'
- 'Missing (NA)'
- 'Missing (Blank)'
Have a look and click 'Run Code' below to see the output.
```{r chi_check-example, exercise=TRUE}
chi_numbers_check <- c("0101011237", "3213201234", "123456789", "12345678900", "010120123?")
chi_check(chi_numbers_check)
```
```{r chi_check-quiz}
quiz(
question("What is the result of checking this CHI number `2611210001`",
answer("Valid CHI"),
answer("Invalid character(s) present"),
answer("Too many characters"),
answer("Too few characters"),
answer("Invalid date"),
answer("Invalid checksum", correct = TRUE),
answer("Missing (Blank)"),
incorrect = "Not quite. Try running the CHI number in the code chunk above.",
allow_retry = TRUE,
random_answer_order = TRUE
)
)
```
## chi_pad()
The CHI (Community Health Index) is a register of all patients in NHS Scotland. Each CHI number is a unique, 10-digit, identifier assigned to each patient on the index. The first 6 digits are a patients date of birth in DDMMYY format. Depending on the source, CHI numbers may have a missing leading `0`.
*While a CHI number is made exclusively of numeric digits, it cannot be stored as a `numeric` class. This is because a leading `0` in numeric values are silently dropped. For this reason, `chi_check()` accepts input values of `character` class only. A leading `0` can be added to a 9-digit CHI number using [`chi_pad()`](#chi_pad).* **The function does not check for validity, in this case the [`chi_check()`](#chi_check) function should be used.**
The function, `chi_pad()`, takes a CHI number with `character` class and, if it's 9 digits, adds a leading zero (`0`). Any values provided which are not a string comprised of 9 numeric digits are left unchanged. The function only takes one argument, and that's the CHI number(s).
Have a look and click 'Run Code' below to see the output.
```{r chi_pad-example, exercise=TRUE}
chi_numbers_pad <- c("101011237", "101201234", "3213201234", "123223", "abcdefghi", "12345tuvw")
chi_pad(chi_numbers_pad)
```
## file_size()
The `file_size()` function takes a file-path, with an optional regular expression pattern, and returns the names and sizes of files in that file-path which match the given pattern. The function takes 2 arguments:
- `filepath` - a character string denoting a file-path (*default is working directory, `getwd()`*)
- `pattern` - an optional character string denoting a [regular expression](https://stat.ethz.ch/R-manual/R-devel/library/base/html/regex.html) pattern (*default is `NULL`*). For more on regular expressions, see this [Jumping Rivers blog post](https://www.jumpingrivers.com/blog/regular-expressions-every-r-programmer-should-know/).
The sizes of files with certain extensions are returned with the type of file prefixed. For example, the size of a 12 KB `.xlsx` file is returned as 'Excel 12 KB'. File sizes are returned as the appropriate multiple of the unit byte (bytes (B), kilobytes (KB), megabytes (MB), etc.). Each multiple is taken to be 1,024 units of the preceding denomination. The complete list of handled file extensions and their output prefixes are as follows:
- `.xls`, `.xlsb`, `.xlsm` and `.xlsx` files are prefixed with *'Excel'*
- `.csv` files are prefixed with *'CSV'*
- `.sav` and `.zsav` files are prefixed with *'SPSS'*
- `.doc`, `.docm` and `.docx` files are prefixed with *'Word'*
- `.rds` files are prefixed with *'RDS'*
- `.txt` files are prefixed with *'Text',*
- `.fst` files are prefixed with *'FST',*
- `.pdf` files are prefixed with *'PDF',*
- `.tsv` files are prefixed with *'TSV',*
- `.html` files are prefixed with *'HTML',*
- `.ppt`, `.pptm` and `.pptx` files are prefixed with *'PowerPoint',*
- `.md` files are prefixed with *'Markdown'*
Files with extensions not contained within this list will have their size returned with no prefix. To request that a certain extension be explicitly catered for, please create an issue on [GitHub](https://github.com/Health-SocialCare-Scotland/phsmethods/issues).
The output from the function is a tibble listing the names of files within `filepath` which match `pattern` and their respective sizes. The column names of this tibble are `'name'` and `'size'`. If no pattern is specified, `file_size()` returns the names and sizes of all files within `filepath.` File names and sizes are returned in alphabetical order of file name. Sub-folders contained within `filepath` will return a file size of '0 B'. If `filepath` is an empty folder, or `pattern` matches no files within `filepath`, `file_size()` returns NULL.
*Unfortunately, it's not possible to have the code exercises in this course reach directories. So a run example with output is given below.*
```{r file_size-example, eval=FALSE, echo = TRUE}
file_size(pattern = "\\.xlsx$")
```
```{r file_size-example-output, eval=FALSE, echo = TRUE}
# A tibble: 3 x 2
name size
<chr> <chr>
1 aa.xls Excel 6 KB
2 borders.xlsx Excel 4 KB
3 fife.xlsx Excel 2 KB
```
```{r file_size-quiz}
quiz(
question("What file types are **not** prefixed by `file_size()`?",
answer("Excel (`.xls`, `.xlsb`, `.xlsm` and `.xlsx`)"),
answer("SPSS (`.sav` and `.zsav`)"),
answer("Markdown (`.md`)"),
answer("RMarkdown (`.Rmd`)", correct = TRUE),
incorrect = "Not quite. Have a look at the list above.",
allow_retry = TRUE,
random_answer_order = TRUE
),
question("Which file size will not be returned by `file_size()`?",
answer("0 B"),
answer("256 B"),
answer("1250 B", correct = TRUE),
answer("20 KB"),
allow_retry = TRUE,
random_answer_order = TRUE
)
)
```
## extract_fin_year()
The function, `extract_fin_year()`, takes a date and assigns to a financial year in the PHS specified format `YYYY/YY`, e.g. 2017/18. The function takes only 1 argument, the date. The date which is supplied must be in the class `Date` or `POSIXct`, `as.Date()`, `lubridate::dmy()` and `as.POSIXct()` are examples of functions which can be used to store dates as an appropriate class.
Have a look and click 'Run Code' below to see the output.
```{r extract_fin_year-example, exercise=TRUE}
dates <- lubridate::dmy(c(21012017, 04042017, 17112017))
extract_fin_year(dates)
```
## match_area()
The function, `match_area()`, converts geography codes (or a vector of geography codes), matches to the single corresponding value in the [`area_lookup` dataset](https://github.com/Public-Health-Scotland/phsmethods/blob/master/data-raw/area_lookup.R), and returns the equivalent area name(s). The function only takes 1 argument, the geography code(s).
Current and previous versions of geography codes (e.g. 2014 and 2019 Health Boards) are catered for. The standard 9 digit geography codes are the only exempted by:
- `RA2701`: No Fixed Abode
- `RA2702`: Rest of UK (Outside Scotland)
- `RA2703`: Outside the UK
- `RA2704`: Unknown Residency
The `match_area()` function handles codes related to:
- Health Boards
- Council Areas
- Health and Social Care Partnerships
- Intermediate Zones, Data Zones (2001 and 2011)
- Electoral Wards
- Scottish Parliamentary Constituencies
- UK Parliamentary Constituencies
- Travel to work areas
- National Parks
- Community Health Partnerships
- Localities (S19)
- Settlements (S20)
- Scotland
If an exact match is not found in the `area_lookup` dataset, an `NA` value will be returned. The exact matching is sensitive to both case and spacing. If an unexpected result is produced, it's worth comparing the input data to the `area_lookup` dataset for any differences.
Have a look and click 'Run Code' below to see the output.
```{r match_area-example, exercise=TRUE}
areas <- c("S02000656", "S02001042", "S08000020", "S12000013", "S13002605")
match_area(areas)
```
## format_postcode()
The `format_postcode()` function takes a character string or vector of character strings, extracts the input values which adhere to the standard UK postcode format (with or without spaces), and formats by assigning the appropriate spacing to them (for pc7 and pc8 formats) and ensures all letters are capitalised. The function takes two arguments:
- `x`- a character string or vector of character strings. This input is validated and must be in the standard UK postcode format (excluding case and spacing). Any values not in this format will result in an `NA` and warning (warnings will not prevent the execution of a whole vector, compared to errors which will stop all execution of the function). Warnings are also generated where input contains lower case letters to explain that they will be capitalised.
- `format` - a character string to determine the desired output format. Options are 'pc7' and 'pc8' (*default is 'pc7'*)
- **pc7** - character string of length 7. 7 character postcodes have no space, 6 character postcodes have 1 space after the 2nd character, 5 character postcodes have 2 spaces after the 2nd character.
- **pc8** - character string of maximum length 8. All postcodes have 1 space before the last 3 characters.
The standard format for UK postcodes is:
- 1/2 letters - followed by
- 1 number - followed by
- 1 letter or number (optional) - followed by
- 1 number - followed by
- 2 letters
*[UK government regulations](https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/283357/ILRSpecification2013_14Appendix_C_Dec2012_v1.pdf) mandate which letters and numbers can be used in specific sections of a postcode. However, these regulations are liable to change over time. For this reason, `format_postcode()` does not validate whether a given postcode actually exists, or whether specific numbers and letters are being used in the appropriate places. It only assesses whether the given input is consistent with the above format and, if so, assigns the appropriate amount of spacing and capitalises any lower case letters.*
Have a look and click 'Run Code' below to see the output. Read the warnings output and see if you can understand where they've come from.
```{r format_postcode-example, exercise=TRUE}
postcodes <- c("G26QE", "KA89NB", "PA152TY", "G 4 2 9 B A", "g207al", "Dg98bS", "DD37J y", "80210")
format_postcode(postcodes)
```
#### Exercise
Have a look and click 'Run Code' below to see the output. The default format is 'pc7', change the output to be 'pc8'?
```{r pc8-example, exercise=TRUE}
postcodes_pc8 <- c("G26QE", "KA89NB", "PA152TY", "G 4 2 9 B A", "g207al", "Dg98bS", "DD37J y", "80210")
format_postcode(postcodes_pc8)
```
```{r pc8-example-check}
grade_result(
pass_if(~ identical(.result, c("G2 6QE", "KA8 9NB", "PA15 2TY", "G42 9BA", "G20 7AL", "DG9 8BS", "DD3 7JY", NA))),
fail_if(~ identical(.result, c("G2 6QE", "KA8 9NB", "PA152TY", "G42 9BA", "G20 7AL", "DG9 8BS", "DD3 7JY", NA)), message = "Try changing the output format to be pc8."),
fail_if(~ TRUE)
)
```
## Quarters
This section contains [`qtr()`](), [`qtr_end()`](), [`qtr_next()`](), and [`qtr_prev()`]() which are used to assign dates to quarters. The `qtr` functions take a date input and calculates the relevant quarter-related value from it. They all return the year as part of its value:
- `qtr()` returns the current quarter
- `qtr_end()` returns the last month in the quarter
- `qtr_next()` returns the next quarter
- `qtr_prev()` returns the previous quarter
Each variant of the functions takes 2 arguments:
- `date` - a date supplied with `date` class.
- `format` - a character string specifying the format the quarter should be displayed in. Valid options are 'long' (e.g. January to March 2018) and 'short' (e.g. Jan-Mar 2018). (*Default is 'long'*).
Quarters are defined as:
- January to March / Jan-Mar
- April to June / Apr-Jun
- July to September / Jul-Sep
- October to December / Oct-Dec
### qtr()
#### Exercise
Have a look and click 'Run Code' below to see the output. The default format is 'long', can you change the output to be 'short'?
```{r qtr-example, exercise=TRUE}
qtr_eg_dates <- lubridate::dmy(c(26032012, 04052012, 23092012))
qtr(qtr_eg_dates)
```
```{r qtr-example-check}
grade_result(
pass_if(~ identical(.result, c("Jan-Mar 2012", "Apr-Jun 2012", "Jul-Sep 2012"))),
fail_if(~ identical(.result, c("January to March 2012", "April to June 2012", "July to September 2012")), message = "Try changing the output format to be short."),
fail_if(~ TRUE)
)
```
### qtr_end()
#### Exercise
Have a look and click 'Run Code' below to see the output. The default format is 'long', can you change the output to be 'short'?
```{r qtr_end-example, exercise=TRUE}
qtr_eg_dates <- lubridate::dmy(c(26032012, 04052012, 23092012))
qtr_end(qtr_eg_dates)
```
```{r qtr_end-example-check}
grade_result(
pass_if(~ identical(.result, c("Mar 2012", "Jun 2012", "Sep 2012"))),
fail_if(~ identical(.result, c("March 2012", "June 2012", "September 2012")), message = "Try changing the output format to be short."),
fail_if(~ TRUE)
)
```
### qtr_next()
#### Exercise
Have a look and click 'Run Code' below to see the output. The default format is 'long', can you change the output to be 'short'?
```{r qtr_next-example, exercise=TRUE}
qtr_eg_dates <- lubridate::dmy(c(26032012, 04052012, 23092012))
qtr_next(qtr_eg_dates)
```
```{r qtr_next-example-check}
grade_result(
pass_if(~ identical(.result, c("Apr-Jun 2012", "Jul-Sep 2012", "Oct-Dec 2012"))),
fail_if(~ identical(.result, c("April to June 2012", "July to September 2012", "October to December 2012")), message = "Try changing the output format to be short."),
fail_if(~ TRUE)
)
```
### qtr_prev()
#### Exercise
Have a look and click 'Run Code' below to see the output.
```{r qtr_prev-example, exercise=TRUE}
qtr_eg_dates <- lubridate::dmy(c(26032012, 04052012, 23092012))
qtr_prev(qtr_eg_dates)
```
```{r qtr_prev-example-check}
grade_result(
pass_if(~ identical(.result, c("Oct-Dec 2011", "Jan-Mar 2012", "Apr-Jun 2012"))),
fail_if(~ identical(.result, c("October to December 2011", "January to March 2012", "April to June 2012")), message = "Try changing the output format to be short."),
fail_if(~ TRUE)
)
```
## age_calculate()
This function, `age_calculate()`, calculates the age between two dates using `lubridate`. It calculates age in either years or months. The function takes four arguments:
- `start` - A start date (e.g. date of birth) which must be supplied with `Date` or `POSIXct` or `POSIXlt` class. `as.Date()`, `lubridate::dmy()` and `as.POSIXct()` are examples of functions which can be used to store dates as an appropriate class.
- `end` - An end date which must be supplied with `Date` or `POSIXct` or `POSIXlt` class. Default is `Sys.Date()` or `Sys.time()` depending on the class of `start`.
- `units` - Type of units to be used. years and months are accepted. Default is `years`.
- `round_down` - Should returned ages be rounded down to the nearest whole number. Default is `TRUE`.
Have a look and click ‘Run Code’ below to see the output.
```{r age_calculate-example, exercise=TRUE}
my_date <- as.Date("2020-02-29")
end_date <- as.Date("2022-07-11")
age_calculate(my_date, end_date)
```
## dob_from_chi()
This function, `dob_from_chi()`, takes a CHI number or a vector of CHI numbers and returns the DoB as implied by the CHI number(s). If the DoB is ambiguous it will return `NA`. The function takes four arguments:
- `chi_number` - a CHI number or a vector of CHI numbers with `character` class.
- `min_date` and `max_date` - optional min and/or max dates that the Date of Birth could take as the century needs to be guessed. Must be either length 1 for a 'fixed' date, or the same length as `chi_number` for a date per CHI number if it's a vector. `min_date` can be date based on common sense in the dataset, whilst `max_date` can be date when an event happens such as discharge date.
- `chi_check` - logical, optionally skip checking the CHI for validity which will be faster but should only be used if you have previously checked the CHI(s). The default (`TRUE`) will check the CHI numbers.
Have a look and click ‘Run Code’ below to see the output.
```{r dob_from_chi-example, exercise=TRUE}
data <- tibble(chi = c(
"0101336489",
"0101405073",
"0101625707"
), adm_date = as.Date(c(
"1950-01-01",
"2000-01-01",
"2020-01-01"
)))
data %>%
mutate(chi_dob = dob_from_chi(chi,
min_date = as.Date("1930-01-01"),
max_date = adm_date
))
```
#### Exercise
In the last example, we set the minimum date as `1930-01-01`. How about changing it to `1940-01-01` and see what happens?
```{r min_date-example, exercise=TRUE}
data <- tibble(chi = c(
"0101336489",
"0101405073",
"0101625707"
), adm_date = as.Date(c(
"1950-01-01",
"2000-01-01",
"2020-01-01"
)))
data %>%
mutate(chi_dob = dob_from_chi(chi,
min_date = as.Date("1930-01-01"),
max_date = adm_date
))
```
```{r min_date-example-check}
grade_result(
pass_if(~ identical(.result[[1, 3]], as.Date(NA))),
fail_if(~ identical(.result[[1, 3]], as.Date("1933-01-01")), message = "Try changing the minimum date to 1940-01-01."),
fail_if(~ TRUE)
)
```
As you can see the date of birth of the first record is `NA`, as it can either be 1933 or 2033 - 1933 is not within the range between minimum year 1940 and admission year 1950, 2033 is a future year, so it has to be returned as `NA`.
## age_from_chi()
This function, `age_from_chi()`, takes a CHI number or a vector of CHI numbers and returns the age as implied by the CHI number(s). If the DoB is ambiguous it will return `NA`. It uses `dob_from_chi`. The function takes five arguments:
- `chi_number` - a CHI number or a vector of CHI numbers with `character` class.
- `ref_date` - calculate the age at this date, default is to use `Sys.Date()` i.e. today.
- `min_age` and `max_age` - optional min and/or max dates that the Date of Birth could take as the century needs to be guessed. Must be either length 1 for a 'fixed' age, or the same length as `chi_number` for an age per CHI number if it's a vector. `min_age` can be age based on common sense in the dataset, whilst `max_age` can be age when an event happens such as the age at discharge.
- `chi_check` - logical, optionally skip checking the CHI for validity which will be faster but should only be used if you have previously checked the CHI(s), the default (`TRUE`) will to check the CHI numbers.
Have a look and click ‘Run Code’ below to see the output.
```{r age_from_chi-example, exercise=TRUE}
data <- tibble(chi = c(
"0101336489",
"0101405073",
"0101625707"
), dis_date = as.Date(c(
"1950-01-01",
"2000-01-01",
"2020-01-01"
)))
data %>%
mutate(chi_age = age_from_chi(chi))
```
#### Exercise
In the last example, the default ref_date is date of today. How about calculating the age at date of discharge (dis_date)?
```{r dis_date-example, exercise=TRUE}
data <- tibble(chi = c(
"0101336489",
"0101405073",
"0101625707"
), dis_date = as.Date(c(
"1950-01-01",
"2000-01-01",
"2020-01-01"
)))
data %>%
mutate(chi_age = age_from_chi(chi))
```
```{r dis_date-example-check}
grade_result(
pass_if(~ identical(.result[[3]], c(17, 60, 58))),
fail_if(~ TRUE, "Try changing the reference date to discharge date.")
)
```
## Help & Feedback
Now it's time to embed your new knowledge and skills, expand with related technologies (e.g. git), and when you're ready *Take R Further* with more training opportunities.
#### Help
* Vignettes (Help) / `?<function_name>`
* Google / Stack Overflow (tag queries with "[r]")
* [R User Group Teams](https://teams.microsoft.com/l/team/19%3ae9f55a12b7d94ef49877ff455a07f035%40thread.tacv2/conversations?groupId=ec4250f9-b70a-4f32-9372-a232ccb4f713&tenantId=10efe0bd-a030-4bca-809c-b5e6745e499a) / [Technical Queries](https://teams.microsoft.com/l/channel/19%3a9620ef6cf8234d50a0f95caba65a3edf%40thread.tacv2/Technical%2520Queries?groupId=ec4250f9-b70a-4f32-9372-a232ccb4f713&tenantId=10efe0bd-a030-4bca-809c-b5e6745e499a)
* [Transforming Publishing Team email](mailto:[email protected]?subject=Introduction to R Training Online - Help)
#### Feedback
<iframe width="100%" height= "2300" src= "https://forms.office.com/Pages/ResponsePage.aspx?id=veDvEDCgykuAnLXmdF5JmibxHi_yzZ9Pvduh8IqoF_5UMFRROEtHRFI1OU4xSzg5Qk5FQU1aWDdTMSQlQCN0PWcu&embed=true" frameborder= "0" marginwidth= "0" marginheight= "0" style= "border: none; max-width:100%; max-height:100vh" allowfullscreen webkitallowfullscreen mozallowfullscreen msallowfullscreen> </iframe>