These materials support a series of hour-long, interactive, virtual workshops on using packages that are part of the tidyverse.
Install R and RStudio on your own laptop (both are free). If you can't install these programs or run into issues installing packages, RStudio Cloud is a good option.
Also, install (or update) the tidyverse package. Instructions are at the bottom of the installation instructions page. If you run into installation issues (based on past workshops, about 5-10% of you will), Northwestern folks can email me at [email protected].
I will be using tidyverse version 1.3 and R version 4.1 or later in the workshop (any R version starting with a 4 should be OK). You will want to update the tidyverse package to at least 1.3.0, as there were notable changes from earlier versions. To update a package, start from a fresh R session (no packages loaded) and run install.packages("tidyverse") to install it again.
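If you are not sure whether your installation is recent enough, a quick check like the following may help. This is only a sketch: `needs_update` is a hypothetical helper written for this example, not part of the tidyverse, and the 1.3.0 threshold is taken from the text above.

```r
# Minimum tidyverse version recommended for the workshop (from the text above).
min_version <- "1.3.0"

# Hypothetical helper: TRUE when the package is missing or older than `minimum`.
needs_update <- function(pkg, minimum) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(TRUE)
  }
  utils::packageVersion(pkg) < minimum
}

if (needs_update("tidyverse", min_version)) {
  message("Run install.packages(\"tidyverse\") in a fresh R session.")
} else {
  message("tidyverse ", as.character(utils::packageVersion("tidyverse")),
          " is recent enough.")
}

# R itself: any version starting with a 4 should be OK.
r_ok <- as.integer(R.version$major) >= 4
r_ok
```

Run it in a fresh session; if it suggests reinstalling, restart R before calling install.packages().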
These workshops assume you are familiar with R outside of these packages. Sessions build on each other to some extent. Sessions after the first one assume you understand the material covered in the first session. Later sessions may also use additional concepts from earlier sessions.
You may also want to learn about R Markdown files (and R Notebooks, which are a special type): https://rmarkdown.rstudio.com. They're useful for when you want to combine R code, output, and text together in a document. The files for these workshops are R Markdown documents, which will be run in Notebook mode so the output appears below the code cells in the file.
Links for each session show the rendered HTML file. A button in the upper right of each file downloads the .Rmd file so you can work with it in RStudio yourself. Alternatively, download the entire repository to get the .Rmd files for all sessions.
The July 2022 workshop will include the following five parts (one each day):
- Session 1: Tidyverse basics
- Session 2: dplyr – select, filter, mutate
- Session 3: dplyr – group by, summarize, arrange, across
- Session 4: dplyr – joins: working with two data frames
- Session 5: tidyr – pivot_longer, pivot_wider, separate, separate_rows
Additional Sessions:
- Visualization with ggplot2: a quick intro to ggplot2
- Bonus 1: utility packages – stringr, lubridate, forcats, readxl: this is an overview lecture without exercises
- Regular Expressions with stringr: requires knowledge of regular expressions
- dplyr with databases: our database workshop materials include an example of using dplyr directly with a database connection instead of writing SQL queries
R for Data Science covers the use of tidyverse packages for data analysis in depth. Written by two of the authors of these packages, this free online book is a good place to go for additional information and practice.
Note, however, that this book was written a few years ago, so some of the packages and functions have evolved. In particular, pivot_longer and pivot_wider used to be functions called gather and spread. For a tutorial on pivot_* functions, see https://tidyr.tidyverse.org/articles/pivot.html
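
To make the renaming concrete, here is a small sketch comparing the older and newer functions. The `scores` data frame and its column names are made up for this example, and it assumes a tidyr version that includes the pivot_* functions (1.0.0 or later).

```r
library(tidyr)

# Made-up wide data: one row per student, one column per quiz.
scores <- data.frame(
  student = c("a", "b"),
  quiz1 = c(90, 75),
  quiz2 = c(85, 80)
)

# Older style, as used in older tutorials: gather() stacks the quiz columns.
long_old <- gather(scores, key = "quiz", value = "score", quiz1, quiz2)

# Current style: pivot_longer() does the same reshaping with clearer argument names.
long_new <- pivot_longer(
  scores,
  cols = c(quiz1, quiz2),
  names_to = "quiz",
  values_to = "score"
)

# pivot_wider() (formerly spread()) reverses the operation.
wide_again <- pivot_wider(long_new, names_from = quiz, values_from = score)
```

Both long versions contain the same four student-quiz combinations, though the rows come out in a different order: gather() stacks column by column, while pivot_longer() works row by row.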
RStudio Videos: Data Wrangling with R and the Tidyverse. These cover material from roughly the first four sessions of this workshop series.
For more on ggplot2, see our guide with free (for Northwestern folks) resources.