The next session for Northwestern affiliates will be June 21-25, 2021. More information: https://www.it.northwestern.edu/research/training.html
Quick links to schedule:
- Monday: Introduction
- Tuesday: Data Frames
- Wednesday: Visualization
- Thursday: Data Exploration and Statistics
- Friday: Programming
This introductory workshop is designed to help you get familiar with the core concepts and functions in R. You'll learn the basics that will help you use R in your research and provide a foundation for further learning in the future. This workshop assumes no prior knowledge of R.
This workshop is designed for beginners with limited or no prior programming experience; however, we strongly recommend you also register for the Programming Concepts workshop if you are unfamiliar with any of the following terms: working directory, vector, boolean, string, list index, function, or any of the other content covered in the Programming Concepts workshop.
The live sessions will only cover a fraction of the material we typically cover in person. You need to complete the independent work as well.
If this workshop does not fit with your schedule, or you have significant experience with other programming languages, you may want to use our recommended resources for learning R on your own instead. Request a consultation if you'd like help finding a course or book that fits your needs.
This is a remote workshop that combines live sessions on Zoom with independent work on recommended exercises and materials. We'll use Canvas to manage the Zoom sessions, email communications, and discussion lists.
Morning Zoom sessions will be recorded and available in Canvas for a limited time. But you will get the most out of this workshop if you cover the topics according to the schedule. If this workshop doesn't fit with your schedule at the moment, we recommend one of the many online R courses instead, as they are designed for remote, self-paced learning.
Morning Zoom Session: A ~90-minute live session teaching new skills at 10am each day. These sessions will be demonstration plus a few exercises, but you will NOT need to watch the video and code in R at the same time. We'll pause when you need to switch to doing something yourself in R. Questions are welcome via the chat; there is no need to share your video for this session.
Mid-day Work Session: Each day has online materials for you to work through on your own. The materials include explanations of new concepts as well as exercises. Start with the Start here materials; check out the other resources if you need more information or want more practice. Then do the exercises. As questions arise, ask them on the Canvas discussion board - we'll actively monitor it during the day. Or bring your questions to the afternoon office hours.
Afternoon Zoom Session: A 30-60 minute live session at 3pm daily reviewing material, with the opportunity to ask questions. Think of this session as group office hours. We'll have a quiz to review key concepts, but otherwise, the content will be driven by your questions and the quiz results. Share your video if you can, since this session will be more interactive.
Other Times: Everyone's schedule is different, so work on exercises earlier or later in the day as needed. Post questions on the discussion board on Canvas, and we'll answer them as soon as we can.
Don't just read along with the materials during the independent work sessions -- try out the code in R! Typing it, fixing the errors, and getting used to using R is important.
Focus on the Start here resources. The others are provided in case you want another perspective, additional info on a topic, or more practice. But don't try to do it all!
While some general exercises are provided, you'll learn more by trying to apply your new skills to your own data (or a dataset you're interested in).
Errors are normal.
Try changing the code to see what happens - you won't break anything.
All times are Central Time
Install R and RStudio on your own laptop (both are free). If you can't install these programs or run into issues installing packages, RStudio Cloud is a good option.
Also, install the tidyverse package. Instructions are at the bottom of the installation instructions page. If you run into installation issues (about 5-10% of you are likely to based on past workshops), post an issue on the Canvas discussion board or email the workshop instructors.
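To confirm your setup before the first session, something like the following may help. It is a minimal sketch: it only reports the R version and checks whether tidyverse is already installed, and it does not install anything itself. The variable names (`needed`, `is_installed`, `not_installed`) are illustrative, not part of the workshop materials.

```r
# Report the running R version.
r_version <- R.version.string
print(r_version)

# Packages the workshop asks you to install; extend this vector as needed.
needed <- c("tidyverse")

# requireNamespace() returns TRUE if a package is installed,
# without attaching it to your session.
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
not_installed <- needed[!is_installed]

if (length(not_installed) > 0) {
  message("Still to install: ", paste(not_installed, collapse = ", "))
  message("Run install.packages(\"", not_installed[1], "\") in the console.")
} else {
  message("All set. Load the package with library(tidyverse).")
}
```

If the message says a package is still missing and `install.packages()` fails, that is the kind of issue to post on the Canvas discussion board.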
Download this repository (but wait until a few days before the workshop starts to make sure you get the most up-to-date materials)
Then you will have the datasets and exercises downloaded to your computer to work on. Open the `r-online-2020.Rproj` file in the project directory to make sure your working directory is set correctly for the scripts.
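One way to check that the working directory is set as expected is sketched below. It assumes the repository has a `data/` folder at its top level (the materials refer to one); the messages are illustrative.

```r
# Where is R currently looking for files?
wd <- getwd()
print(wd)

# Paths in the workshop scripts are relative to the project directory,
# so "data" should resolve to the repository's data/ folder.
if (dir.exists("data")) {
  print(head(list.files("data")))
} else {
  message("No data/ folder found here. Open the .Rproj file and try again.")
}
```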
During in-person workshops, these are the handouts we usually provide. You may find them useful to have available as you work.
RStudio Cheat Sheets are short PDFs that summarize key R functions on specific topics. Many people print them out for reference while working in R. The `ggplot2` cheat sheet, in particular, is indispensable when working with that package.
R Reference Card: lists many commonly used functions, so that you can find what you're looking for, since the R help is most useful when you already know the function name you want.
There are links to many free online resources below. A few resources require registration first. None of these are required -- they are supplemental materials.
R Cookbook - create an account with O'Reilly online through the library first.
Cloud Based Data Science - an online set of courses from Johns Hopkins University faculty. Not required for the workshop, but links to relevant material from those courses are included below under extra materials. Pricing is flexible (including free).
10am Zoom session: RStudio Tour, Basics of Using R
Session File: `session_notes/part1.Rmd` or HTML version - you don't need the .Rmd file open today
3pm Zoom session: Quiz, Review, and Office Hours
- Using RStudio
- Installing and loading packages
- Creating an R script
- Using the R console
- Using functions
- Variables
- Data types
- Vectors and indexing vectors
- Missing Data
- Factors
- Create a frequency/count table
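For a rough preview of several of these topics, here is a small base R sketch. All values and variable names are made up for illustration; the session notes may use different examples.

```r
# Variables and data types
n_students <- 25L            # integer
avg_score <- 88.5            # numeric (double)
course <- "Intro to R"       # character
is_remote <- TRUE            # logical
class(avg_score)

# Vectors hold values of a single type; index with [ ], starting at 1
scores <- c(90, 85, NA, 72, 98)
scores[1]
scores[2:3]
scores[-1]                   # everything except the first element

# Missing data: NA propagates unless you ask R to remove it
mean(scores)                 # NA
mean(scores, na.rm = TRUE)   # 86.25
sum(is.na(scores))           # number of missing values

# Factors store categorical data with a fixed set of levels
year <- factor(c("first", "second", "first", "third", "first"),
               levels = c("first", "second", "third"))
levels(year)

# Frequency/count table
year_counts <- table(year)
year_counts
```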
Start here:
- Are you completely new to R and/or RStudio? Start with a few videos
- R Programming 101 - How to Use RStudio
- R Programming 101 - Part3 - covers importing a CSV file and installing a package; the end of the video shows a few commands using the tidyverse packages, which the other materials below don't use (but this workshop does)
- Learning Statistics with R v2: Chapter 2 Getting Started in R; Appendices A1 Vectors and A2 Factors
- Logical Indexing and Changing values of a vector from YaRrr! The Pirate's Guide to R
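As a taste of logical indexing and changing values in a vector, consider the sketch below. The `ages` data are hypothetical, and using `-99` as a missing-value code is an assumption made for the example.

```r
ages <- c(23, 41, 35, -99, 19)   # hypothetical data; -99 marks a missing value

# A comparison returns a logical vector the same length as the input
over_30 <- ages > 30
over_30

# Use a logical vector inside [ ] to keep only the TRUE positions
ages[ages > 30]

# Assign through a logical index to change just the matching values
ages[ages == -99] <- NA
ages

# Combine conditions with & (and) and | (or)
ages[!is.na(ages) & ages < 30]
```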
Alternative/extra material:
- Additional sections of YaRrr! The Pirate's Guide to R may be helpful -- the index is fairly clear for finding relevant material, chapters 4-7 roughly correspond to today's material
- Covers many of the same concepts, except factors: Data Carpentry Data Analysis and Visualization in R for Ecologists: Introduction to R Lesson -- not specific to ecologists!
- Covers many of the same concepts, except factors: Cloud Based Data Science Intro to R Course - first 6 sections (through Working with Logicals)
- Data types and vectors: Software Carpentry Programming in R Data Structures. At the end this introduces data frames, which are the topic for Tuesday.
- Similar material to above, from the same author: Learning Statistics with R v1: Chapter 3 Getting Started with R; Chapter 4 Additional R Concepts Sections 4.1 through 4.7
- Factors: Software Carpentry Programming with R: Understanding Factors. Get the data set used in the lesson from the course setup page. Ask if you need help loading the data file.
- Working directory and RStudio Projects: Stat 545 R Basics
- Packages: Learning Statistics with R v1
- Reference: R Cookbook Chapter 2 Some Basics: mostly about working with vectors
BONUS: You may also want to learn about R Markdown files (and R Notebooks, which are a special type): https://rmarkdown.rstudio.com. They're useful for when you want to combine R code, output, and text together in a document. The files for the morning sessions are partly R Markdown files and partly R Notebook files.
See the exercises in the `exercises/part1` directory. Open the .R files on your own computer and write your answers in the scripts. There are answer files to check your work.
10am Zoom session: Reading in Data, Working with Data Frames
File: `session_notes/part2.Rmd` or HTML version
3pm Zoom session: Quiz, Review, and Office Hours
- Read in a CSV file
- What is a data frame
- Subsetting data frames
- Making new variables in a data frame
- Recoding a variable
- Reading an R help page
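The sketch below walks through these data frame skills on a tiny made-up dataset. It writes its own CSV to a temporary file so that it runs anywhere; the column names and values are invented for illustration and are not from the workshop data.

```r
# Write a tiny CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,age,group",
             "Ana,34,a",
             "Ben,27,b",
             "Cal,45,a",
             "Dee,19,b"), csv_path)

# Read in a CSV file
people <- read.csv(csv_path)
str(people)

# Subset with data_frame[rows, columns]
people$age
people[people$age > 30, ]
people[people$group == "b", c("name", "age")]

# Make a new variable
people$age_months <- people$age * 12

# Recode a variable
people$age_cat <- ifelse(people$age >= 30, "30+", "under 30")
table(people$age_cat)

# Open the help page for a function (run this in the console)
# ?read.csv
```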
Start here:
- Software Carpentry Programming in R Reading and Writing CSV Files. NOTE: Do not change your working directory with `setwd()`; the `car-speeds.csv` and `car-speeds-cleaned.csv` files are already included in the `data/` directory in this repository, so you already have them downloaded. The `data/car-speeds.csv` paths will work with your RStudio project for this workshop.
- Software Carpentry Programming in R Addressing Data. NOTE: Same note as above concerning the data file.
- Learning Statistics with R v1 7.2 Transforming and recoding a variable. Reviews some material from yesterday, and then adds in data frames. NOTE: you do not need to load a file called `likert.Rdata` (although it's available as part of the "Data sets" link here); instead, run the code below to create the `likert.raw` vector she works with in the section:
likert.raw <- c(1, 7, 3, 4, 4, 4, 2, 6, 5, 5)
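To give a sense of what transforming and recoding look like with this vector, here is a short sketch. It is not necessarily the exact sequence of steps the book follows; the break points and labels passed to `cut()` are illustrative choices.

```r
likert.raw <- c(1, 7, 3, 4, 4, 4, 2, 6, 5, 5)

# Transform: centre the 1-7 scale on its midpoint
likert.centred <- likert.raw - 4
likert.centred

# Recode into categories with cut(); intervals are (0,3], (3,4], (4,7]
likert.cat <- cut(likert.raw,
                  breaks = c(0, 3, 4, 7),
                  labels = c("disagree", "neutral", "agree"))
table(likert.cat)
```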
Alternative/extra material:
- YaRrr! The Pirate's Guide to R Matrices and Dataframes
- YaRrr! The Pirate's Guide to R Advanced dataframe manipulation - includes a small part on dplyr, which we aren't covering this week
- Data Carpentry Data Analysis and Visualization in R for Ecologists Lesson 2 Starting with Data: covers data frames and factors (a Monday topic)
- Interactive tutorial on data frames: DataCamp 15 Easy Solutions To Your Data Frame Problems In R
- Subset a data frame: Learning Statistics with R v1 7.5 Extracting a subset of a data frame (7.3 on vectors is also useful)
- Getting Help:
- Cloud Based Data Science Intro to R Course - Getting Help in R section
- R Cookbook Starting Section 1.7
- Reading in Data Files: Cloud Based Data Science Getting Data Section 2 CSV, Excel, and TSV Files
Notes:
- If you see a reference to the stringsAsFactors option for read.csv(): it defaults to TRUE for R versions < 4 but FALSE for R 4.0.0 and later. Some materials have not yet been updated to reflect this change.
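The difference can be seen by setting the option explicitly, which also makes code behave the same way on any R version. This is a small sketch using inline CSV text made up for the example.

```r
csv_text <- "id,color\n1,red\n2,blue\n3,red"

# Ask explicitly for character columns (the default in R 4.0.0 and later)
as_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(as_chr$color)

# Ask explicitly for factors (the default before R 4.0.0)
as_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(as_fct$color)
levels(as_fct$color)
```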
See the exercises in the `exercises/part2` directory. Open the .R files on your own computer and write your answers in the scripts. There are answer files to check your work.
Want more practice? Read one of your own datasets into R and use the skills you've learned to count rows, compute max and min values, make tables, create new variables, etc. Need a data set to work with? Download one from: https://www.openintro.org/data/index.php. It's good to practice doing all of the steps you've learned with data that you haven't been handed as part of the workshop to make sure you really do know how to do it on your own.
10am Zoom session: Base R Plots, ggplot2 basics
Files: `session_notes/part3.Rmd` (or HTML version) and `session_notes/ggplot2.Rmd` (or HTML version)
Note: the ggplot2 part of today is the same material as in the Tidyverse workshop series
3pm Zoom session: Quiz, Review, and Office Hours
- Use plot() and hist()
- Understand ggplot2 code
- Make basic plots with ggplot2
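A minimal sketch of both plotting approaches follows, using the `mtcars` dataset that ships with R (an assumption for illustration; the session notes may use other data). The ggplot2 part only runs if the package is installed.

```r
# Base R: scatterplot and histogram
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
hist(mtcars$mpg, main = "Distribution of mpg", xlab = "Miles per gallon")

# ggplot2: data + aesthetic mapping + one or more geom layers
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point() +
    labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
  print(p)
}
```

In ggplot2 code, each `+` adds a layer or setting to the plot, which is why the pieces can be read one line at a time.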
Start here: (focus on just one)
- ggplot2: R for Data Science Chapter 3 Visualization - this is a lot of material to digest; you don't have to get through it all at once. Focus on the basics, and then skim the rest so you know what features are available. Come back to it as you need to learn more to do what you want with your plots.
- Base R graphics: Learning Statistics with R v1 Drawing Graphs
Alternative/extra material:
- R Cookbook Graphics uses ggplot2
- Cloud Based Data Science Data Visualization
- Data Carpentry Data Analysis and Visualization in R for Ecologists Data visualization with ggplot2
- Learning Statistics with R v2 Pretty Pictures - this chapter uses the `%>%` operator prevalent in the tidyverse; see an explanation here.
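In short, `%>%` passes the result on its left as the first argument of the function on its right. The sketch below compares a nested call with the piped equivalent; it assumes the magrittr package (installed along with tidyverse) and skips the piped version if it is not available.

```r
x <- c(4, 9, 16)

# Nested call: read from the inside out
round(mean(sqrt(x)), 1)

# Piped call: read left to right
if (requireNamespace("magrittr", quietly = TRUE)) {
  library(magrittr)
  x %>% sqrt() %>% mean() %>% round(1)
}
```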
Want more? See our online guide to learning ggplot2 - most of these will take you longer than a few hours.
See the exercises in the `exercises/part3` directory. Open the .R files on your own computer and write your answers in the scripts. There are answer files to check your work.
Want more practice? Read one of your own datasets into R and start making some plots. Don't like what they look like? Start trying to change the styling (lots of googling is to be expected). Looking for data to work with? Try https://github.com/rfordatascience/tidytuesday or https://data.fivethirtyeight.com/. As with the data frame material, trying your new skills on your own data will help you learn faster.
10am Zoom session: Formula syntax, interpreting linear regression output
File: session_notes/part4.Rmd
or HTML version
3pm Zoom session: Quiz, Review, and Office Hours
- Summarize by group (tapply, aggregate)
- Correlation
- Run a t-test
- Formula syntax
- Linear regression
- ANOVA
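Each of these topics maps to one or two base R functions. The sketch below uses the built-in `mtcars` dataset purely for illustration; the choice of variables is arbitrary and is not a recommended analysis.

```r
# Summarize by group
group_means <- tapply(mtcars$mpg, mtcars$cyl, mean)
group_means
aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Correlation
cor(mtcars$mpg, mtcars$wt)

# t-test with formula syntax: outcome ~ grouping variable (am has two groups)
t_result <- t.test(mpg ~ am, data = mtcars)
t_result$p.value

# Linear regression: outcome ~ predictor1 + predictor2
fit <- lm(mpg ~ wt + hp, data = mtcars)
summary(fit)

# One-way ANOVA: wrap cyl in factor() so it is treated as categorical
aov_fit <- aov(mpg ~ factor(cyl), data = mtcars)
summary(aov_fit)
```

The same `outcome ~ predictors` formula pattern appears in `aggregate()`, `t.test()`, `lm()`, and `aov()`, which is why formula syntax gets its own place on the list.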
Start here: Learning Statistics with R Ch. 5 Descriptive Statistics and Part V Statistical Tools: choose the sections relevant to the methods and models for your field. Note: This book teaches statistics in addition to R, so there's theoretical material you may be able to skip.
NOTE: Get the datasets from the "Data sets" link on the book's main page: https://learningstatisticswithr.com/. Unzip the file, and it contains multiple `.RData` files that are named according to the data in them. (`.RData` is a way to save data in an R format -- CSVs and other file types are better for replication and portability.) Move the files you need to the `data/` folder in this project. Load (open) a file with:
load("data/parenthood.RData")
Then you'll see one (or more) new objects in your Environment tab that you can use.
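If you are unsure which objects a file contains, note that `load()` returns their names. The sketch below demonstrates this with a temporary `.RData` file created on the spot as a stand-in for a downloaded one; the object names are made up.

```r
# Save two objects to a temporary .RData file
scores <- c(10, 12, 9)
groups <- c("a", "b", "c")
rdata_path <- tempfile(fileext = ".RData")
save(scores, groups, file = rdata_path)

# Remove them, then load the file back
rm(scores, groups)
loaded_names <- load(rdata_path)

# load() returns the names of the objects it created
loaded_names
scores
```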
Alternative/extra material:
- YaRrr! The Pirate's Guide to R Hypothesis Tests
- YaRrr! The Pirate's Guide to R Regression
- Descriptive and Exploratory Analysis: Cloud Based Data Science Data Analysis Course: focuses on concepts more than the R code
- Statistics: R Cookbook - has several useful sections if you already know what you want to do
- UCLA Statistical Consulting Data Analysis Examples - if you're familiar with Stata, SAS, SPSS, or MPlus, this site has examples worked for many types of different statistical models (mostly regression based) for these programs as well as R.
Want more? See Statistics and Machine Learning resource list, or our online guide to learning linear regression in R.
See the exercises in the `exercises/part4` directory for t-test and linear regression exercises. If these methods aren't relevant to your field, try instead to run, in R, a statistical test or model that you have previously computed in another program. Ask on the discussion forum if you need help finding the right package or function in R for your analysis (because one is almost guaranteed to exist).
10am Zoom session: Functions, for loops, if-else
File: session_notes/part5.Rmd
or HTML version
3pm Zoom session: Quiz, Review, and Office Hours
- If/else statements
- ifelse()
- for loops
- Writing functions
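These four pieces fit together naturally, as the following sketch shows. The function name `describe_temp`, the thresholds, and the data are hypothetical and chosen only for illustration.

```r
# Writing a function that uses if/else (works on a single value)
describe_temp <- function(temp_f) {
  if (temp_f >= 80) {
    "hot"
  } else if (temp_f >= 50) {
    "mild"
  } else {
    "cold"
  }
}
describe_temp(72)

# ifelse() is vectorized: it checks every element at once
temps <- c(85, 40, 65)
ifelse(temps >= 50, "go outside", "stay in")

# for loop: apply the function to each element in turn
temp_labels <- character(length(temps))
for (i in seq_along(temps)) {
  temp_labels[i] <- describe_temp(temps[i])
}
temp_labels
```

Note the division of labor: `if`/`else` decides on one TRUE/FALSE value, while `ifelse()` returns one result per element of a vector.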
Start here:
- YaRrr! The Pirate's Guide to R Custom Functions
- YaRrr! The Pirate's Guide to R Loops - you may want to skip the section on loops and lists, since we didn't cover lists this week
- If/else: Software Carpentry Programming in R Making Choices
NOTE: Download the data for the Software Carpentry lesson from the Setup Page before starting the lesson.
Alternative/extra material:
- Writing Functions: Software Carpentry Programming in R Creating Functions
- Loops: Software Carpentry Programming in R Loops
- Learning Statistics with R v1 Chapter 8 Basic Programming
- R Cookbook Simple Programming - gets less "simple" as the chapter progresses
See the exercises in the exercises/part5 directory.