forked from poldrack/psych10-book
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path15b-ComparingMeansInR.Rmd
355 lines (254 loc) · 13.2 KB
/
15b-ComparingMeansInR.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
---
output:
pdf_document: default
bookdown::gitbook:
lib_dir: "book_assets"
includes:
in_header: google_analytics.html
html_document: default
---
# Comparing means in R
```{r echo=FALSE,warning=FALSE,message=FALSE}
library(tidyverse)
library(ggplot2)
library(tidyr)
library(fivethirtyeight)
library(BayesFactor)
library(lme4)
library(lmerTest)
library(cowplot)
library(knitr)
library(pwr)
library(emmeans)
library(ggfortify)
set.seed(123456) # set random seed to exactly replicate results
# load the NHANES data library
library(NHANES)
# drop duplicated IDs within the NHANES dataset
NHANES <-
NHANES %>%
dplyr::distinct(ID,.keep_all=TRUE)
NHANES_adult <-
NHANES %>%
subset(Age>=18) %>%
drop_na(BMI)
```
## Testing the value of a single mean (Section \@ref(single-mean))
In this example, we will show multiple ways to test a hypothesis about the value of a single mean. As an example, let's test whether the mean systolic blood pressure (BP) in the NHANES dataset (averaged over the three measurements that were taken for each person) is greater than 120 mm, which is the standard value for normal systolic BP.
First let's perform a power analysis to see how large our sample would need to be in order to detect a small difference (Cohen's d = .2).
```{r}
pwr.result <- pwr.t.test(d=0.2, power=0.8,
type='one.sample',
alternative='greater')
pwr.result
```
Based on this, we take a sample of 156 individuals from the dataset.
```{r}
NHANES_BP_sample <- NHANES_adult %>%
drop_na(BPSysAve) %>%
dplyr::select(BPSysAve) %>%
sample_n(pwr.result$n)
print('Mean BP:')
meanBP <- NHANES_BP_sample %>%
summarize(meanBP=mean(BPSysAve)) %>%
pull()
meanBP
```
First let's perform a sign test to see whether the observed mean of `r I(sprintf('%.2f', meanBP))` is significantly different from zero. To do this, we count the number of values that are greater than the hypothesized mean, and then use a binomial test to ask how surprising that number is if the true proportion is 0.5 (as it would be if the distribution were centered at the hypothesized mean).
```{r}
NHANES_BP_sample <- NHANES_BP_sample %>%
mutate(BPover120=BPSysAve>120)
nOver120 <- NHANES_BP_sample %>%
summarize(nOver120=sum(BPover120)) %>%
pull()
binom.test(nOver120, nrow(NHANES_BP_sample), alternative='greater')
```
This shows no significant difference. Next let's perform a one-sample t-test:
```{r}
t.test(NHANES_BP_sample$BPSysAve, mu=120, alternative='greater')
```
Here we see that the difference is not statistically signficant. Finally, we can perform a randomization test to test the hypothesis. Under the null hypothesis we would expect roughly half of the differences from the expected mean to be positive and half to be negative (assuming the distribution is centered around the mean), so we can cause the null hypothesis to be true on average by randomly flipping the signs of the differences.
```{r}
nruns = 5000
# create a function to compute the
# t on the shuffled values
shuffleOneSample <- function(x,mu) {
# randomly flip signs
flip <- runif(length(x))>0.5
diff <- x - mu
diff[flip]=-1*diff[flip]
# compute and return correlation
# with shuffled variable
return(tibble(meanDiff=mean(diff)))
}
index_df <- tibble(id=seq(nruns)) %>%
group_by(id)
shuffle_results <- index_df %>%
do(shuffleOneSample(NHANES_BP_sample$BPSysAve,120))
observed_diff <- mean(NHANES_BP_sample$BPSysAve-120)
p_shuffle <- mean(shuffle_results$meanDiff>observed_diff)
p_shuffle
```
This gives us a very similar p value to the one observed with the standard t-test.
We might also want to quantify the degree of evidence in favor of the null hypothesis, which we can do using the Bayes Factor:
```{r}
ttestBF(NHANES_BP_sample$BPSysAve,
mu=120,
nullInterval = c(-Inf, 0))
```
This tells us that our result doesn't provide particularly strong evidence for either the null or alternative hypothesis; that is, it's inconclusive.
## Comparing two means (Section \@ref(comparing-two-means))
To compare two means from independent samples, we can use the two-sample t-test. Let's say that we want to compare blood pressure of smokers and non-smokers; we don't have an expectation for the direction, so we will use a two-sided test. First let's perform a power analysis, again for a small effect:
```{r}
power_results_2sample <- pwr.t.test(d=0.2, power=0.8,
type='two.sample'
)
power_results_2sample
```
This tells us that we need 394 subjects in each group, so let's sample 394 smokers and 394 nonsmokers from the NHANES dataset, and then put them into a single data frame with a variable denoting their smoking status.
```{r}
nonsmoker_df <- NHANES_adult %>%
dplyr::filter(SmokeNow=="Yes") %>%
drop_na(BPSysAve) %>%
dplyr::select(BPSysAve,SmokeNow) %>%
sample_n(power_results_2sample$n)
smoker_df <- NHANES_adult %>%
dplyr::filter(SmokeNow=="No") %>%
drop_na(BPSysAve) %>%
dplyr::select(BPSysAve,SmokeNow) %>%
sample_n(power_results_2sample$n)
sample_df <- smoker_df %>%
bind_rows(nonsmoker_df)
```
Let's test our hypothesis using a standard two-sample t-test. We can use the formula notation to specify the analysis, just like we would for `lm()`.
```{r}
t.test(BPSysAve ~ SmokeNow, data=sample_df)
```
This shows us that there is a significant difference, though the direction is surprising: Smokers have *lower* blood pressure!
Let's look at the Bayes factor to quantify the evidence:
```{r}
sample_df <- sample_df %>%
mutate(SmokeNowInt=as.integer(SmokeNow))
ttestBF(formula=BPSysAve ~ SmokeNowInt,
data=sample_df)
```
This shows that there is very strong evidence against the null hypothesis of no difference.
## The t-test as a linear model (Section \@ref(ttest-linear-model))
We can also use `lm()` to implement these t-tests.
The one-sample t-test is basically a test for whether the intercept is different from zero, so we use a model with only an intercept and apply this to the data after subtracting the null hypothesis mean (so that the expectation under the null hypothesis is an intercept of zero):
```{r}
NHANES_BP_sample <- NHANES_BP_sample %>%
mutate(BPSysAveDiff = BPSysAve-120)
lm_result <- lm(BPSysAveDiff ~ 1, data=NHANES_BP_sample)
summary(lm_result)
```
You will notice that this p-value is twice as big as the one obtained from the one-sample t-test above; this is because that was a one-tailed test, while `lm()` is performing a two-tailed test.
We can also run the two-sample t-test using `lm()`:
```{r}
lm_ttest_result <- lm(BPSysAve ~ SmokeNow, data=sample_df)
summary(lm_ttest_result)
```
This gives the same p-value for the SmokeNowYes variable as it did for the two-sample t-test above.
## Comparing paired observations (Section \@ref(paired-ttests))
Let's look at how to perform a paired t-test in R. In this case, let's generate some data for a set of individuals on two tests, where each indivdual varies in their overall ability, but there is also a practice effect such that performance on the second test is generally better than the first.
First, let's see how big of a sample we will require to find a medium (d=0.5) sized effect. Let's say that we want to be extra sure in our results, so we will find the sample size that gives us 95% power to find an effect if it's there:
```{r}
paired_power <- pwr.t.test(d=0.5, power=0.95, type='paired', alternative='greater')
paired_power
```
Now let's generate a dataset with the required number of subjects:
```{r}
subject_id <- seq(paired_power$n)
# we code the tests as 0/1 so that we can simply
# multiply this by the effect to generate the data
test_id <- c(0,1)
repeat_effect <- 5
noise_sd <- 5
subject_means <- rnorm(paired_power$n, mean=100, sd=15)
paired_data <- crossing(subject_id,test_id) %>%
mutate(subMean=subject_means[subject_id],
score=subject_means +
test_id*repeat_effect +
rnorm(paired_power$n, mean=noise_sd))
```
Let's perform a paired t-test on these data. To do that, we need to separate the first and second test data into separate variables, which we can do by converting our *long* data frame into a *wide* data frame.
```{r}
paired_data_wide <- paired_data %>%
spread(test_id, score) %>%
rename(test1=`0`,
test2=`1`)
glimpse(paired_data_wide)
```
Now we can pass those new variables into the `t.test()` function:
```{r}
paired_ttest_result <- t.test(paired_data_wide$test1,
paired_data_wide$test2,
type='paired')
paired_ttest_result
```
This analysis is a bit trickier to perform using the linear model, because we need to estimate a separate intercept for each subject in order to account for the overall differences between subjects. We can't do this using `lm()` but we can do it using a function called `lmer()` from the `lme4` package. To do this, we need to add `(1|subject_id)` to the formula, which tells `lmer()` to add a separate intercept ("1") for each value of `subject_id`.
```{r}
paired_test_lmer <- lmer(score ~ test_id + (1|subject_id),
data=paired_data)
summary(paired_test_lmer)
```
This gives a similar answer to the standard paired t-test. The advantage is that it's more flexible, allowing us to perform *repeated measures* analyses, as we will see below.
## Analysis of variance (Section \@ref(ANOVA))
Often we want to compare several different means, to determine whether any of them are different from the others. In this case, let's look at the data from NHANES to determine whether Marital Status is related to sleep quality. First we clean up the data:
```{r}
NHANES_sleep_marriage <-
NHANES_adult %>%
dplyr::select(SleepHrsNight, MaritalStatus, Age) %>%
drop_na()
```
In this case we are going to treat the full NHANES dataset as our sample, with the goal of generalizing to the entire US population (from which the NHANES dataset is mean to be a representative sample). First let's look at the distribution of the different values of the `MaritalStatus` variable:
```{r}
NHANES_sleep_marriage %>%
group_by(MaritalStatus) %>%
summarize(n=n()) %>%
kable()
```
There are reasonable numbers of most of these categories, but let's remove the `Separated` category since it has relatively few members:
```{r}
NHANES_sleep_marriage <-
NHANES_sleep_marriage %>%
dplyr::filter(MaritalStatus!="Separated")
```
Now let's use `lm()` to perform an analysis of variance. Since we also suspect that Age is related to the amount of sleep, we will also include Age in the model.
```{r}
lm_sleep_marriage <- lm(SleepHrsNight ~ MaritalStatus + Age,
data=NHANES_sleep_marriage)
summary(lm_sleep_marriage)
```
This tells us that there is a highly significant effect of marital status (based on the F test), though it accounts for a very small amount of variance (less than 1%).
It's also useful to look in more detail at which groups differ from which others, which we can do by examining the *estimated marginal means* for each group using the `emmeans()` function.
```{r}
# compute the differences between each of the means
leastsquare <- emmeans(lm_sleep_marriage,
pairwise ~ MaritalStatus,
adjust="tukey")
# display the results by grouping using letters
CLD(leastsquare$emmeans,
alpha=.05,
Letters=letters)
```
The letters in the `group` column tell us which individual conditions differ from which others; any pair of conditions that don't share a group identifier (in this case, the letters `a` and `b`) are significantly different from one another. In this case, we see that Divorced people sleep less than Married or Widowed individuals; no other pairs differ significantly.
### Repeated measures analysis of variance
The standard analysis of variance assumes that the observations are independent, which should be true for different people in the NHANES dataset, but may not be true if the data are based on repeated measures of the same individual. For example, the NHANES dataset involves three measurements of blood pressure for each individual. If we want to test whether there are any differences between those, then we would need to use a *repeated measures* analysis of variance. We can do this using `lmer()` as we did above. First, we need to create a "long" version of the dataset.
```{r}
NHANES_bp_all <- NHANES_adult %>%
drop_na(BPSys1,BPSys2,BPSys3) %>%
dplyr::select(BPSys1,BPSys2,BPSys3, ID) %>%
gather(test, BPsys, -ID)
```
Then we fit a model that includes a separate intercept for each individual.
```{r}
repeated_lmer <-lmer(BPsys ~ test + (1|ID), data=NHANES_bp_all)
summary(repeated_lmer)
```
This shows us that the second and third tests are significant different from the first test (which was automatically assigned as the baseline by `lmer()`). We might also want to know whether there is an overall effect of test. We can determine this by comparing the fit of our model to the fit of a model that does not include the test variable, which we will fit here. We then compare the models using the `anova()` function, which performs an F test to compare the two models.
```{r}
repeated_lmer_baseline <-lmer(BPsys ~ (1|ID), data=NHANES_bp_all)
anova(repeated_lmer,repeated_lmer_baseline)
```
This shows that blood pressure differs significantly across the three tests.