Day4_stats_and_data_visualization.qmd

---
output:
  html_document:
    df_print: paged
    code_download: TRUE
    toc: true
    toc_depth: 1
editor_options:
  chunk_output_type: console
---

```{r, setup, include=FALSE}
knitr::opts_chunk$set(
  eval=FALSE, warning=FALSE, error=FALSE
)
```

# Brief intro to statistics in R

## How to approach statistics in R

R is a great language for statistics. Most statistical methods that you'll want to use are probably available in R, either in base R or in a package.

There are so many statistical methods implemented in R, and new ones are constantly developed, that we can't teach you everything. (Although we do offer a workshop on [statistical modeling in R](https://github.com/nuitrcs/r-statistical-modeling)).

Learning statistics in R is not about memorizing a set of functions. Once you know base R (as you do now!), you can use any statistical method in R. Some methods may have more or less complexity, but you can do it.

To use statistics in R, you first need to know statistics. Say you already do and you want to use linear regression.

1.  The first step is to Google "linear regression in R." That search will result in many resources (which is one of the great things of using R!).
2.  Once you find a good resource, it's all a matter of using your knowledge of R to implement any statistical method.

Throughout the process, it's important that you don't lose yourself in the code and forget to use your statistical knowledge. Are you using the method in an appropriate way? Are you satisfying the key assumptions? Are you using the right tests and diagnostics?

## Example method: linear regression

To start familiarizing yourself with statistics in R, let's take the example of linear regression, which is a commonly used method and is helpful to illustrate some key aspects about statistics in R.

Here we're focusing on how to implement statistical methods in R. We're not covering the statistics of linear regression. That said, our team does have statisticians and we do teach workshops focused on statistics.

Let's use the `penguins` dataset that we've used before from the package `palmerpenguins`.

```{r}
library(palmerpenguins)
data(penguins)
head(penguins)
```

To compute a linear regression, we use the `lm()` function (linear model) and the special formula syntax.

A formula object specifies a relationship between variables. A formula is specified using the "tilde operator" `~`. A very simple example of a formula is shown below:

```         
outcome_variable ~ predictor_variable
```

The *precise* meaning of this formula depends on exactly what you want to do with it, but in broad terms it means "the outcome variable, analysed in terms of the predictor variable". That said, although the simplest and most common form of a formula uses the "one variable on the left, one variable on the right" format, there are others.

With this knowledge of the formula syntax, we can predict body mass with flipper and bill length:

```{r}
reg1 <- lm(body_mass_g ~ flipper_length_mm + bill_length_mm , data=penguins)
reg1
```

To get more than just the coefficient estimates in the output, use the summary function on the resulting regression object:

```{r}
summary(reg1)
```

The result of the regression is an lm model object, which contains multiple pieces of information. We can access these different components if we need to.

```{r}
names(reg1)
```

```{r}
reg1$coefficients
```

The result of the summary function is a different object, with some different components:

```{r}
names(summary(reg1))
```

But we can still extract the information

```{r}
summary(reg1)$coefficients
```

### EXERCISE

What is the Pearson correlation coefficient between flipper length and bill length?

Hint: Break down this problem into steps. Select the columns for correlation and remove any NA values that might interfere. Then find out the R function you would need to calculate the correlation coefficient between 2 vectors.

```{r}
library(palmerpenguins)
data(penguins)

```

### EXERCISE

Perform a Welch two sample t-test to check if there is a statistically significant difference in body mass between male and female penguins. Assume that the variance of the two groups is unequal. The test should be two-sided with a confidence level of 95%. Get the p-value for this test.

Hint: Find out the R function you would need to calculate the t-test statistic between 2 vectors. Use `names()` to access the p-value of your test statistic.

```{r}

```

## Resources to learn more about statistics in R

You can learn more in our workshop on [statistical modeling in R](https://github.com/nuitrcs/r-statistical-modeling). We also have a [resource guide on linear regression](https://sites.northwestern.edu/researchcomputing/resource-guides/resource-guide-linear-regression/), which includes materials for R. Further, we have a [resource guide on R](https://sites.northwestern.edu/researchcomputing/resource-guides/resource-guide-r/), which includes statistical aspects. Finally, you can always use our [free consultation service](https://services.northwestern.edu/TDClient/30/Portal/Requests/ServiceDet?ID=93) for questions about research using statistical methods and their implementation in R.

# Data Visualization

Often, plotting your data will be much more informative than summary statistics alone. The R package *ggplot2* is the most widely used and aesthetically pleasing graphics framework available in R. It relies on a structure called the "grammar of graphics". Essentially, it follows a layered approach to describe and construct the visualization.

Here is a handy [cheat sheet for ggplot2](https://statsandr.com/blog/files/ggplot2-cheatsheet.pdf)! Most users rely on the cheat-sheet because it is often difficult to remember the exact syntax and options.

Load the *ggplot2* package in your workspace as below.

```{r}
# install.packages("ggplot2")
library(ggplot2)
```

## The grammar of graphics

![](https://miro.medium.com/v2/resize:fit:2000/format:webp/1*mcLnnVdHNg-ikDbHJfHDNA.png){width="665"}

To build or describe any visualization with one or more dimensions, we can use the components as follows.

1.  **Data**: Always start with the data, identify the dimensions you want to visualize.

2.  **Aesthetics**: Confirm the axes based on the data dimensions, positions of various data points in the plot. Also check if any form of encoding is needed including size, shape, color and so on, which are useful for plotting multiple data dimensions.

3.  **Scale:** Do we need to scale the potential values, use a specific scale to represent multiple values or a range? For example, you can scale the data to log of the original values.

4.  **Geometric objects:** These are popularly known as 'geoms'. This would cover the way we would depict the data points on the visualization. Should it be points, bars, lines and so on?

5.  **Statistics:** Do we need to show some statistical measures in the visualization like measures of central tendency, spread, confidence intervals?

6.  **Facets:** Do we need to create subplots based on specific data dimensions?

7.  **Coordinate system:** What kind of a coordinate system should the visualization be based on ---should it be Cartesian or polar?

You don't have to remember everything - refer to your handy [cheat sheet for ggplot2](https://statsandr.com/blog/files/ggplot2-cheatsheet.pdf)!

# Understanding the `{ggplot}` Syntax

The syntax for constructing ggplots could be puzzling if you are a beginner or work primarily with base graphics. The main difference is that, unlike base graphics:

-   `{ggplot}` works with dataframes and not individual vectors. All the data needed to make the plot are typically contained within the dataframe supplied to the `ggplot()` function itself or can be supplied to respective geoms.

-   The second noticeable feature is that you can keep enhancing the plot by adding more layers (and themes) to an existing plot created using the `ggplot()` function.

-   The order of the layers matters, the first command/layer is executed first, and so on. *Sometimes, this can make a difference in your final plot.*

Let's initialize a basic ggplot based on the midwest dataset:

```{r}
# Setup
data(midwest, package = "ggplot2")
View(midwest)
```

```{r}
ggplot(midwest) # Initialize a plot by calling the ggplot() function
```

A blank ggplot is drawn. ggplot knows we want a plot with a given data set!

![](https://miro.medium.com/v2/resize:fit:2000/format:webp/1*mcLnnVdHNg-ikDbHJfHDNA.png){width="665"}

```{r}
ggplot(midwest) + # Initialize a plot by calling the ggplot() function
  aes(x=area, y=poptotal) # area and poptotal are columns in 'midwest'
```

Now ggplot knows the columns we want to plot!

Notice how layering works with the + operator (not %\>%).

Even though `x` and `y` are specified, there are no points or lines in it. This is because ggplot doesn't assume that you want a scatterplot or a line chart to be drawn. We have only told ggplot what dataset to use and what columns should be used for X and Y. We haven't explicitly asked it to draw any points.

Also note that `aes()` function is used to specify the X and Y axes. That's because, any information that is part of the source dataframe has to be specified inside the `aes()` function.

![](https://miro.medium.com/v2/resize:fit:2000/format:webp/1*mcLnnVdHNg-ikDbHJfHDNA.png){width="665"}

# Make a simple scatterplot

Let's make a scatterplot on top of the blank ggplot by adding points using a geom layer called `geom_point`

```{r}
ggplot(midwest) +
  aes(x=area, y=poptotal) +
  geom_point()
```

But the scientific notation doesn't look too appealing - no problem, you can set an R option to turn it off! Options settings allow the user to set and examine a variety of global *options* which affect the way in which R computes and displays its results. Typically you would include these options on top of the script, after loading the packages that you're going to use.

```{r}
options(scipen=999)  # turn off scientific notation like 1e+06

ggplot(midwest) +
  aes(x=area, y=poptotal) +
  geom_point()
```

We got a basic scatterplot, where each point represents a county. However, it lacks some basic components such as the plot title, meaningful axis labels, etc. Moreover, most of the points are concentrated on the bottom portion of the plot, which is not so helpful. You will see how to rectify these in upcoming steps.

![](https://miro.medium.com/v2/resize:fit:2000/format:webp/1*mcLnnVdHNg-ikDbHJfHDNA.png){width="665"}

## Using geoms to make other plot types

Like `geom_point()`, there are many such geom layers such as `geom_line` , `geom_bar`, `geom_density`, `geom_histgram`, etc.

```{r}
# histogram of county area
ggplot(midwest) + 
  aes(x=area) +
  geom_histogram(bins=100)
```

```{r}
# density plot of county area
ggplot(midwest) + 
  aes(x=area) +
  geom_density()
```

```{r}
# make a boxplot
ggplot(midwest) + 
  aes(x=area, y=state) +
  geom_boxplot() 
```

```{r}
# make a violinplot
ggplot(midwest) + 
  aes(x=area, y=state) +
  geom_violin()
```

## EXERCISE

Load the penguins data from before by loading the `palmerpenguins` package.

Make a scatter plot of flipper length and body mass with point representations for each observation.

```{r}
library(palmerpenguins)
data(penguins)


```

## Adding a statistic to your plot

For now, let's just add a smoothing layer using `geom_smooth(method='lm')`. Since the `method` is set as `lm` (short for [*linear model*](http://r-statistics.co/Linear-Regression.html)), it draws the line of best fit.

```{r}
g <- ggplot(midwest) + 
  aes(x=area, y=poptotal) +
  geom_point() + 
  geom_smooth(method="lm")  # set se=FALSE to turnoff confidence bands
g
```

The line of best fit is in blue. Can you find out what other `method` options are available for `geom_smooth` (note: see `?geom_smooth`)?

## EXERCISE

Add a linear fit line to the `penguin_plot` from before.

```{r}

```

## EXERCISE

What if we just want a smooth (non-linear) fit to the plot above? Hint: use method `"loess"`.

```{r}

```

## Adjusting the X and Y axes limits

You might have noticed that the majority of points lie in the bottom of the chart which doesn't really look nice. So, let's change the Y-axis limits to focus on the lower half.

The X and Y axes limits can be controlled in 3 ways.

### **Method 1**: By deleting the points outside the range

*This will change the lines of best fit or smoothing lines as compared to the original data.*

This can be done by `xlim()` and `ylim()`. You can pass a numeric vector of length 2 (with max and min values) or just the max and min values themselves.

```{r}
g <- ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point() + 
  geom_smooth(method="lm")  
g

# Delete the points outside the limits
g + 
  xlim(c(0, 0.1)) + 
  ylim(c(0, 1000000))   # deletes points
```

In this case, the chart was not built from scratch but rather was built on top of `g`. This is because, the previous plot was stored as `g`, a ggplot object, which when called will reproduce the original plot. Using ggplot, you can add more layers, themes and other settings on top of this plot.

Did you notice that the line of best fit became more horizontal compared to the original plot? This is because, when using `xlim()` and `ylim()`, the points outside of the specified range are deleted and will not be considered while drawing the line of best fit (using `geom_smooth(method='lm')`). This feature might come in handy when you wish to know how the line of best fit would change when some extreme values (or outliers) are removed.

### **Method 2**: Zooming In

The other method to adjust the X and Y axes is to zoom in to the region of interest *without* deleting the points. This is done using `coord_cartesian()`.

Let's store this plot as `g1`.

```{r}
g <- ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point() + 
  geom_smooth(method="lm")  # set se=FALSE to turnoff confidence bands

# Zoom in without deleting the points outside the limits. 
# As a result, the line of best fit is the same as the original plot.
g1 <- g + 
  coord_cartesian(xlim=c(0,0.1), ylim=c(0, 1000000))  # zooms in
g1
```

Since all points were considered, the line of best fit did not change.

### **Method 3**: Using the scales layer

Scales in *ggplot2* control the mapping from data to aesthetics. They take your data and turn it into something that you can see, like size, color, position, or shape. They also provide the tools that let you interpret the plot: the axes and legends.

You can generate plots with ggplot2 without knowing how scales work, but understanding scales and learning how to manipulate them will give you much more control. To get an in-depth understanding of how scales work in *ggplot2* see [this book chapter on scales and guides](https://ggplot2-book.org/scales-guides).

An important property of *ggplot2* is the principle that every aesthetic in your plot is associated with exactly one scale. For instance, when you write this:

```{r}
ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point()
```

*ggplot2* adds a default scale for each aesthetic used in the plot:

```{r}
ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point() +
  scale_x_continuous() + 
  scale_y_continuous() 
```

Inside the `scale` layer, you can add limits as:

```{r}
scale_plot <- ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point() +
  scale_x_continuous(limits=c(0, 0.1)) + 
  scale_y_continuous(limits=c(0, 1000000)) +
  geom_smooth(method="lm")
scale_plot
```

Notice that, similarly to method 1, *this method deletes the points outside of the range.*

Position scales have many more features that you can [explore](https://ggplot2-book.org/scales-position).

![](https://miro.medium.com/v2/resize:fit:2000/format:webp/1*mcLnnVdHNg-ikDbHJfHDNA.png){width="665"}

## EXERCISE

Limit the flipper length to values between 180 and 220 in `penguin_plot` **without** removing the original points from the best-fit line calculation.

```{r}

```

## Change the Title and Axes Labels

Let's continue with our previous scatter plot of the midwest data and add the plot title and labels for the X and Y axes. This can be done in one go using the `labs()` function with `title`, `x`, and `y` arguments. Another option is to use `ggtitle()`, `xlab()`, and `ylab()`.

```{r}
g <- ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point() + 
  geom_smooth(method="lm")  # set se=FALSE to turnoff confidence bands

g1 <- g + 
  coord_cartesian(xlim=c(0,0.1), ylim=c(0, 1000000))  # zooms in

# Add Title and Labels
g1 + 
  labs(title="Area Vs Population", 
       subtitle="From midwest dataset", 
       y="Population", 
       x="Area", 
       caption="Midwest Demographics")

# or

g1 + 
  ggtitle("Area Vs Population", 
          subtitle="From midwest dataset") + 
  xlab("Area") + 
  ylab("Population")
```

Excellent! So here is the full function call:

```{r}
ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point() + 
  geom_smooth(method="lm") + 
  coord_cartesian(xlim=c(0,0.1), ylim=c(0, 1000000)) + 
  labs(title="Area Vs Population", 
       subtitle="From midwest dataset", 
       y="Population", 
       x="Area", 
       caption="Midwest Demographics")
```

## EXERCISE

Add suitable axis labels and titles to `penguin_plot`.

```{r}

```

## Change the Color and Size of Points

### Set a static color or size

We can change the aesthetics of a geom layer by modifying the respective geoms. Let's change the color of the points and the line to a static value.

```{r}
ggplot(midwest, aes(x=area, y=poptotal)) +
  geom_point(col="plum3") +
  geom_smooth(method="lm", col="aquamarine3") +
  coord_cartesian(xlim=c(0, 0.1), ylim=c(0, 1000000)) + 
  labs(title="Area Vs Population", 
       subtitle="From midwest dataset", 
       y="Population", 
       x="Area", 
       caption="Midwest Demographics")
```

Let's try new colors and also change the size of the points:

```{r}
ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point(col="steelblue", size=3) +   # Set static color and size for points
  geom_smooth(method="lm", col="firebrick") +  # change the color of line
  coord_cartesian(xlim=c(0, 0.1), ylim=c(0, 1000000)) + 
  labs(title="Area Vs Population", 
       subtitle="From midwest dataset", 
       y="Population", 
       x="Area", 
       caption="Midwest Demographics")
```

## EXERCISE

Change the color of the points to red and the transparency (alpha) to 0.5 by setting parameters within `geom_point` for a plot of flipper_length_mm vs body_mass_g.

```{r}

```

### Change the Color To Reflect Categories in Another Column

If we want the color to change based on another column in the source dataset (`midwest`), it must be specified inside the `aes()` function.

```{r}
gg <- ggplot(midwest, aes(x=area, y=poptotal)) + 
  geom_point(aes(colour=state), size=2) +  # Vary color based on state categories.
  geom_smooth(method="lm", col="firebrick", linewidth=2) + 
  coord_cartesian(xlim=c(0, 0.1), ylim=c(0, 1000000)) + 
  labs(title="Area Vs Population", 
       subtitle="From midwest dataset", 
       y="Population", 
       x="Area", 
       caption="Midwest Demographics")
gg
```

Now each point is colored based on the `state` it belongs to because of `aes(colour=state)`. Not just color, but `size`, `shape`, `stroke` (thickness of boundary), and `fill` (fill color) can be used to discriminate groupings.

As an added benefit, the legend is added automatically. If needed, it can be removed by setting the `legend.position` to `None` from within a `theme()` function. We'll talk about themes later.

### Scales for color

Just like you can add scales for position, you can also add [scales for color](https://ggplot2-book.org/scales-colour).

The default scale for continuous fill scales is `scale_fill_continuous()` which in turn defaults to `scale_fill_gradient()`. As a consequence, these three commands produce the same plot using a gradient scale:

```{r}
gg

gg + scale_fill_continuous()

gg + scale_fill_gradient()
```

You can change the color palette entirely using a different color scale:

```{r}
gg + scale_colour_brewer(palette = "Set1")  # change color palette
```

More of such palettes can be found in the RColorBrewer package.

```{r}
library(RColorBrewer)

head(brewer.pal.info, 10)  # show 10 palettes

gg + scale_colour_brewer(palette = "PuOr")
```

*When selecting colors for your visualizations, in addition to other factors, it's important to consider [friendly colors for people with color vision deficiency](https://www.tableau.com/blog/examining-data-viz-rules-dont-use-red-green-together).*

## EXERCISE

Change the color of the points according to the species column for a plot of flipper_length_mm vs body_mass_g.

```{r}

```

# Themes

Let's make the graph more elegant! Let's use themes to clean up the background.

`theme_pubr()` from the `{ggpubr}` package is a nice option:

```{r}
#install.packages("ggpubr")
library(ggpubr)

ggplot(data=penguins, 
       aes(x = flipper_length_mm, 
           y = body_mass_g)) +
  geom_point(aes(color = species)) +
  geom_smooth(method="lm", aes(color = species)) + 
  labs(x = "Flipper length (mm)", 
       y = "Body mass (g)", 
       title = "Penguins Data") +
  theme_pubr()
```

Themes can be [customized](https://rpubs.com/mclaire19/ggplot2-custom-themes) to every detail - the [possibilities](https://ggplot2.tidyverse.org/reference/theme.html) are endless!

The [`{ggsci}`](https://cran.r-project.org/web/packages/ggsci/vignettes/ggsci.html) package offers a collection of high-quality color palettes inspired by colors used in scientific journals, data visualization libraries, science fiction movies, and TV shows. But there are several built-in themes in ggplot that you can use:

```{r}
ggplot(data=penguins, 
       aes(x = flipper_length_mm, 
           y = body_mass_g, 
           color = species)) +
  geom_point() +
  geom_smooth(method="lm") +
  labs(x = "Flipper length (mm)", 
       y = "Body mass (g)", 
       title = "Penguins Data") +
  theme_classic()
```

## Fonts

Themes are extremely versatile. For example, you change the axis font styles using custom inputs:

```{r}
ggplot(data=penguins, 
       aes(x = flipper_length_mm, 
           y = body_mass_g, 
           color = species)) +
  geom_point() + 
  theme_classic() +
  theme(axis.title.x = element_text(color="dodgerblue", 
                                    size=15, 
                                    face="italic", 
                                    family="Comic Sans MS")) 
```

Notice how the theme custom input overrides the defaults set in `theme_classic()`.

Remember that the order of layering matters! The last passed input will be retained.

```{r}
ggplot(data=penguins, 
       aes(x = flipper_length_mm, 
           y = body_mass_g, 
           color = species)) +
  geom_point() +
  theme(axis.title.x = element_text(color="dodgerblue", 
                                    size=15, 
                                    face="italic", 
                                    family="Comic Sans MS")) +
  theme_classic()
```

## EXERCISE

Add a theme of your choosing from [here](https://r-charts.com/ggplot2/themes/) to `penguin_plot`.

Reminder: if you are using a new package, first install and load that package to access the themes inside it.

```{r, eval=FALSE}
penguin_plot <- ggplot(data = penguins, 
                       aes(x = flipper_length_mm, 
                           y = body_mass_g)) +
  geom_point(aes(color=species)) 

penguin_plot + _________
```

# Separating by Facets

The graph above still looks quite messy. Maybe we want to separate out the three species into individual panels:

```{r}
ggplot(data=penguins, 
       aes(x = flipper_length_mm, 
           y = body_mass_g)) +
  geom_point(aes(color = species), 
             size = 2, 
             alpha = 0.8) +
  geom_smooth(method="lm", 
              aes(color=species, 
                  fill=species), 
              alpha=0.1) + 
  theme_classic() +
  
  facet_grid(~species) + # facet by species -- note the use of a formula!!
  labs(x = "Flipper length (mm)",
       y = "Body mass (g)", 
       title = "Penguins Data")
   
```

You can separate by more than one column. Note also the difference between "color" and "fill" in the example below:

```{r}
ggplot(data=penguins, 
       aes(x = flipper_length_mm, 
           y = body_mass_g, 
           group = species)) +
  geom_point(aes(color = species), 
             size = 2, 
             alpha = 0.8) +
  geom_smooth(method="lm", 
              aes(color=species),
              fill="black",
              alpha=0.15) + 
  facet_grid(~species + island) +
  labs(x = "Flipper length (mm)", 
       y = "Body mass (g)", 
       title = "Penguins Data") +
  theme_classic() 
```

## EXERCISE

Separate the plot above into facets by sex. Make sure to remove observations that have no sex specified.

```{r}

```

# Bonus 1: Saving your beautiful plots!

Suppose you created a bunch of plots and now you want to save them, you can use the [`ggsave()`](https://ggplot2.tidyverse.org/reference/ggsave.html) function:

```{r}
# save the plot object as a variable
p <-
  ggplot(data = penguins, 
         aes(x = flipper_length_mm, 
             y = body_mass_g,
             group = species)) +
  geom_point(aes(color = species), 
             size = 2, 
             alpha = 0.8) +
  geom_smooth(method = "lm",
              aes(color = species, fill = species),
              alpha = 0.1) +
  facet_grid( ~ species) + # facet by species -- note the use of a formula!!
  labs(x = "Flipper length (mm)", 
       y = "Body mass (g)", 
       title = "Penguins Data") +
  theme_classic()

ggsave("my_beautiful_plot.pdf", plot=p, height = 4, width = 6, device="pdf", useDingbats=F) # setting useDingbats to FALSE lets you edit plot fonts in Illustrator
```

![](https://miro.medium.com/v2/resize:fit:2000/format:webp/1*mcLnnVdHNg-ikDbHJfHDNA.png){width="665"}

# Bonus 2: Arranging multiple plots

If you are preparing several different plots for publication, perhaps you want to arrange them in a grid before exporting and saving them.

Let's create another plot first:

```{r}
# We already have a plot p
p

# create a new plot to display the sample size for each species
p_count <- ggplot(penguins, aes(species)) + 
  geom_bar(aes(fill=species)) + # this is coloring the bars by species
  theme_classic() +
  scale_color_brewer(palette = "Accent") +
  labs(title="Count of penguin species")

p_count
```

Now let's put the plots together using the packages `{grid}` and `{gridExtra}`:

```{r}
# arrange plot p and plot_new
library(gridExtra)
library(grid)

grid.arrange(p_count, p, ncol=2)

grid.arrange(p_count, p, nrow=2)
```

You can create very [complicated layouts](https://cran.r-project.org/web/packages/gridExtra/vignettes/arrangeGrob.html) if needed. You can also use the `{patchwork}` package to same effect.

# PRACTICE EXERCISES

Several of these exercises will require you to Google or look up the documentation, especially for custom theme inputs.

Load in the data set below:

```{r}
chic <- readr::read_csv(file="data/chicago_weather_data.csv")
```

```{r}
dim(chic)
str(chic)
View(chic)
```

### EXERCISE 1

Generate a scatter plot of temperature vs date and add color the points `firebrick`

```{r}

```

### EXERCISE 2

Now add a title to the plot and appropriate axis labels.

```{r}

```

### EXERCISE 3

Make the main title bold and grey in color

```{r}

```

### EXERCISE 4

Get rid of axis ticks. Hint: look up the `theme()` and `axis.ticks.y`

```{r}

```

### EXERCISE 5

Limit the range of the axis. Hint: See `ylim()`, `scale_x_continuous()` or `coord_cartesian()`

```{r}

```

### EXERCISE 6

Limit the range of the x-axis from 1995 to 2005. Hint: See `ylim()`, `scale_x_continuous()` or `coord_cartesian()`

```{r}

```

### EXERCISE 7

Color the plot based on season

```{r}

```

### EXERCISE 8

Turn off the legend title. See `legend.title` parameter in `theme()`

```{r}

```

### EXERCISE 9

Change the title of the legend to be more informative

```{r}

```

### EXERCISE 10

Set the theme to make the graph more clean in appearance

```{r}

```

### EXERCISE 11

Create a single row of 4 plots showing data from each year separately. Hint: use facets

```{r}

```

### EXERCISE 12

Make the above plot so that you have a 2x2 grid of plots instead of a row

```{r}

```

### EXERCISE 13

Fit a smooth line to each facet plot. Change the color of the line to `black` and set the fill option to be translucent with `alpha`

```{r}

```

### EXERCISE 14

Set the parameter `scales` to "free" inside the `facet_wrap()` function. What do you notice?

```{r}

```

### EXERCISE 15

Create a grid of plots using 2 variables `year` and `season`

```{r}

```

### EXERCISE 16

Create 2 unrelated plots of your choice from the data. And then arrange them together as you might for a publication.

```{r}

```

### EXERCISE 17

Plot temperature vs date, color by season, and then choose a color palette instead of the default colors

```{r}

```

### EXERCISE 18

Create a boxplot of o3 values by season. Then flip the coordinates for the boxplot

```{r}

```

### EXERCISE 19

Create a violinplot similar to above, that is not trimmed at the ends. Color each season differently.

```{r}

```

### EXERCISE 20

Add the mean and standard deviation for each season to the plot above

```{r}

```

### EXERCISE 21

Generate a scatter plot of temperature vs date and add a trend line to the plot

```{r}

```

### EXERCISE 22

Increase or decrease the smoothness of the trend line

```{r}

```

### EXERCISE 23

Create a density plot of `o3` and color the density lines by factor season. Fill in the density plots also colored by factor season.

```{r}

```

### EXERCISE 24

Add a summary statistic mean for each season to the plot above

```{r}

```

### EXERCISE 25

Save your plot from above with appropriate dimensions

```{r}

```