---
output:
  bookdown::gitbook:
    lib_dir: "book_assets"
    includes:
      in_header: google_analytics.html
  pdf_document: default
  html_document: default
---
# Practical statistical modeling {#practical-example}
```{r echo=FALSE,warning=FALSE,message=FALSE}
library(tidyverse)
library(ggplot2)
library(BayesFactor)
library(emmeans)
library(brms)
library(cowplot)
library(knitr)
library(ggfortify)
set.seed(123456) # set random seed to exactly replicate results
# load the NHANES data library
library(NHANES)
# drop duplicated IDs within the NHANES dataset
NHANES <-
NHANES %>%
dplyr::distinct(ID,.keep_all=TRUE)
NHANES_adult <-
NHANES %>%
subset(Age>=18)
```
In this chapter we will bring together everything that we have learned and apply it to a practical example.
## The process of statistical modeling
There is a set of steps that we generally go through when we want to use our statistical model to test a scientific hypothesis:
1. Specify your question of interest
2. Identify or collect the appropriate data
3. Prepare the data for analysis
4. Determine the appropriate model
5. Fit the model to the data
6. Criticize the model to make sure it fits properly
7. Test hypothesis and quantify effect size
Let's look at a real example. In 2007, Christopher Gardner and colleagues from Stanford published a study in the *Journal of the American Medical Association* titled "Comparison of the Atkins, Zone, Ornish, and LEARN Diets for Change in Weight and Related Risk Factors Among Overweight Premenopausal Women: The A TO Z Weight Loss Study: A Randomized Trial" [@gard:kiaz:alha:2007].
### 1. Specify your question of interest
According to the authors, the goal of their study was:
> To compare 4 weight-loss diets representing a spectrum of low to high carbohydrate intake for effects on weight loss and related metabolic variables.
### 2. Identify or collect the appropriate data
To answer their question, the investigators randomly assigned each of 311 overweight/obese women to one of four different diets (Atkins, Zone, Ornish, or LEARN), and tracked their weight and other health measures over time.
The authors recorded a large number of variables, but for the main question of interest let's focus on a single variable: Body Mass Index (BMI). Further, since our goal is to measure lasting changes in BMI, we will only look at the measurement taken at 12 months after onset of the diet.
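As a rough illustration of what random assignment looks like in code, the following sketch shuffles diet labels across 311 participant IDs. It is only a hypothetical example of simple randomization, not the procedure used by the study's authors; the names `diets`, `group_sizes`, and `participants` are made up for illustration, and the group sizes are the ones used for the synthetic data later in this chapter.

```r
# Hypothetical sketch of simple randomization (not the study's actual procedure)
set.seed(1)

diets <- c("Atkins", "Zone", "LEARN", "Ornish")
group_sizes <- c(77, 79, 79, 76)  # group sizes used for the synthetic data below

# build a vector with each diet label repeated the required number of times,
# then shuffle it so that assignment does not depend on participant order
assignment <- sample(rep(diets, times = group_sizes))

participants <- data.frame(id = seq_along(assignment),
                           diet = assignment)

# confirm how many participants ended up in each diet
table(participants$diet)
```

Because the labels are shuffled at random, any pre-existing differences between participants should be spread evenly across the diets on average.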
### 3. Prepare the data for analysis
```{r echo=FALSE}
# generate a dataset based on the results of Gardner et al. Table 3
set.seed(123456)
dietDf <-
data.frame(diet=c(rep('Atkins',77),
rep('Zone',79),
rep('LEARN',79),
rep('Ornish',76))) %>%
mutate(
BMIChange12Months=ifelse(diet=='Atkins',
rnorm(n=77,mean=-1.65,sd=2.54),
ifelse(diet=='Zone',
rnorm(n=79,mean=-0.53,sd=2.0),
ifelse(diet=='LEARN',
rnorm(n=79,mean=-0.92,sd=2.0),
rnorm(n=76,mean=-0.77,sd=2.14)))),
physicalActivity=ifelse(diet=='Atkins',
rnorm(n=77,mean=34,sd=6),
ifelse(diet=='Zone',
rnorm(n=79,mean=34,sd=6.0),
ifelse(diet=='LEARN',
rnorm(n=79,mean=34,sd=5.0),
rnorm(n=76,mean=35,sd=7) )))
)
summaryDf <-
dietDf %>%
group_by(diet) %>%
summarize(
n=n(),
meanBMIChange12Months=mean(BMIChange12Months),
varBMIChange12Months=var(BMIChange12Months)
) %>%
mutate(
crit_val_lower = qt(.05, n - 1),
crit_val_upper = qt(.95, n - 1),
ci.lower=meanBMIChange12Months+(sqrt(varBMIChange12Months)*crit_val_lower)/sqrt(n),
ci.upper=meanBMIChange12Months+(sqrt(varBMIChange12Months)*crit_val_upper)/sqrt(n)
)
tableDf <- summaryDf %>%
dplyr::select(-crit_val_lower,
-crit_val_upper,
-varBMIChange12Months) %>%
rename(Diet = diet,
N = n,
`Mean BMI change (12 months)`=meanBMIChange12Months,
`CI (lower limit)`=ci.lower,
`CI (upper limit)`=ci.upper)
```
```{r AtoZBMIChangeDensity,echo=FALSE,fig.cap="Violin plots for each condition, with the 50th percentile (i.e., the median) shown as a black line for each group.", fig.width=4, fig.height=4, out.width="50%"}
ggplot(dietDf,aes(diet,BMIChange12Months)) +
geom_violin(draw_quantiles=.5)
```
The actual data from the A to Z study are not publicly available, so we will use the summary data reported in their paper to generate some synthetic data that roughly match the data obtained in their study. Once we have the data, we can visualize them to make sure that there are no outliers. Violin plots are useful to see the shape of the distributions, as shown in Figure \@ref(fig:AtoZBMIChangeDensity). Those data look fairly reasonable - in particular, there don't seem to be any serious outliers. However, we can see that the distributions seem to differ a bit in their variance, with Atkins and Ornish showing greater variability than the others.
This means that any analyses that assume equal variances across groups might be inappropriate. Fortunately, the ANOVA model that we plan to use is fairly robust to moderate violations of this assumption, particularly when the group sizes are similar, as they are here.
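One way to look at this concern more directly is to compare the group standard deviations and to run a version of the one-way test that does not assume equal variances (Welch's test). The sketch below is a minimal example on freshly simulated data using the same group means and standard deviations as this chapter; the names `sim`, `group_sd`, and `var_ratio` are illustrative, and the chapter's own analysis does not include this step.

```r
# Sketch: checking how unequal the group variances are (illustrative only)
set.seed(123456)

# simulate BMI change using the group sizes, means, and SDs from this chapter
sim <- data.frame(
  diet = factor(rep(c("Atkins", "Zone", "LEARN", "Ornish"),
                    times = c(77, 79, 79, 76))),
  bmi_change = c(rnorm(77, mean = -1.65, sd = 2.54),
                 rnorm(79, mean = -0.53, sd = 2.0),
                 rnorm(79, mean = -0.92, sd = 2.0),
                 rnorm(76, mean = -0.77, sd = 2.14))
)

# sample standard deviation within each diet group
group_sd <- tapply(sim$bmi_change, sim$diet, sd)
group_sd

# ratio of the largest to the smallest group variance
var_ratio <- max(group_sd)^2 / min(group_sd)^2
var_ratio

# Welch's one-way test does not assume equal variances;
# the classic version (var.equal = TRUE) does
welch <- oneway.test(bmi_change ~ diet, data = sim, var.equal = FALSE)
classic <- oneway.test(bmi_change ~ diet, data = sim, var.equal = TRUE)
c(welch = welch$p.value, classic = classic$p.value)
```

If the two p-values lead to the same conclusion, the unequal variances are probably not driving the result.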
### 4. Determine the appropriate model
There are several questions that we need to ask in order to determine the appropriate statistical model for our analysis.
* What kind of dependent variable?
    * BMI: continuous, roughly normally distributed
* What are we comparing?
    * mean BMI across four diet groups
    * ANOVA is appropriate
* Are observations independent?
    * Random assignment and use of difference scores should ensure that the assumption of independence is appropriate
### 5. Fit the model to the data
Let's run an ANOVA on BMI change to compare it across the four diets. It turns out that we don't actually need to generate the dummy-coded variables ourselves; if we pass `lm()` a categorical variable, it will automatically generate them for us.
```{r echo=FALSE}
# perform ANOVA and print result
lmResult <- lm(BMIChange12Months ~ diet, data = dietDf)
summary(lmResult)
```
Note that `lm()` automatically generated dummy variables that correspond to three of the four diets, leaving the Atkins diet without a dummy variable. This means that the intercept models the mean for the Atkins diet, and the other three coefficients model the difference between each of those diets and the Atkins diet. By default, `lm()` treats the first level (in alphabetical order) as the baseline.
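To see the dummy coding itself, we can ask R for the design matrix that it builds from a categorical variable. The following is a small sketch on a toy factor (the vector `diet` here is made up for illustration and is separate from the data frame used above); it also shows how `relevel()` could be used if a different baseline were preferred.

```r
# Sketch: how R dummy-codes a categorical variable (toy example)
diet <- factor(c("Atkins", "LEARN", "Ornish", "Zone", "Atkins", "Zone"))

# factor levels are sorted alphabetically; the first one is the baseline
levels(diet)

# the design matrix has an intercept plus one dummy column
# for every level except the baseline
X <- model.matrix(~ diet)
colnames(X)

# relevel() picks a different baseline, here Zone instead of Atkins
diet_zone_ref <- relevel(diet, ref = "Zone")
levels(diet_zone_ref)
colnames(model.matrix(~ diet_zone_ref))
```

Changing the baseline changes what each coefficient means, but not the overall fit of the model.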
### 6. Criticize the model to make sure it fits properly
```{r diagnosticsPlot, echo=FALSE, fig.width=8, fig.height=4}
autoplot(lmResult,which=1:2)
```
The first thing we want to do is to critique the model to make sure that it is appropriate. One thing we can do is to look at the residuals from the model. In the left panel of Figure \@ref(fig:diagnosticsPlot), we plot the residual for each individual against the model's fitted value; because the fitted value for each person is simply the mean of their diet group, the points fall into four columns, one per diet.
There are no obvious differences in the residuals across conditions, although there are a couple of datapoints (#34 and #304) that seem to be slight outliers.
Another important assumption of the statistical tests that we apply to linear models is that the residuals from the model are normally distributed. The right panel of Figure \@ref(fig:diagnosticsPlot) shows a Q-Q (quantile-quantile) plot, which plots the residuals against their expected values based on their quantiles in the normal distribution. If the residuals are normally distributed then the data points should fall along the dashed line --- in this case it looks pretty good, except for those two outliers that are once again apparent here.
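The quantities behind these two diagnostic plots can also be computed directly. The sketch below uses simulated data with the same group parameters as this chapter (the names `sim`, `fit`, `std_res`, and `qq` are illustrative); it checks that the fitted values are just the group means, flags large standardized residuals, and extracts the coordinates of the Q-Q plot. The cutoff of 3 for flagging residuals is an arbitrary choice for illustration.

```r
# Sketch: the numbers behind the residual and Q-Q plots (illustrative only)
set.seed(123456)

sim <- data.frame(
  diet = factor(rep(c("Atkins", "Zone", "LEARN", "Ornish"),
                    times = c(77, 79, 79, 76))),
  bmi_change = c(rnorm(77, mean = -1.65, sd = 2.54),
                 rnorm(79, mean = -0.53, sd = 2.0),
                 rnorm(79, mean = -0.92, sd = 2.0),
                 rnorm(76, mean = -0.77, sd = 2.14))
)

fit <- lm(bmi_change ~ diet, data = sim)

# with a single categorical predictor, each fitted value is the group mean
group_means <- tapply(sim$bmi_change, sim$diet, mean)
isTRUE(all.equal(as.numeric(fitted(fit)),
                 as.numeric(group_means[sim$diet])))

# standardized residuals; large absolute values flag potential outliers
std_res <- rstandard(fit)
which(abs(std_res) > 3)

# Q-Q plot coordinates: theoretical normal quantiles (x) vs. residuals (y);
# a correlation near 1 suggests the points lie close to a straight line
qq <- qqnorm(resid(fit), plot.it = FALSE)
cor(qq$x, qq$y)
```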
### 7. Test hypothesis and quantify effect size
First let's look back at the summary of results from the ANOVA, shown in Step 5 above. The significant F test shows us that there is a significant difference between diets, but we should also note that the model doesn't actually account for much variance in the data; the R-squared value is only 0.03, showing that the model is only accounting for a few percent of the variance in BMI change. Thus, we would not want to overinterpret this result.
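To make the link between the F test and R-squared concrete, both can be computed from sums of squares. This is a sketch on simulated data with the same group parameters as this chapter, so its numbers will not necessarily match the output above; the names `ss_total`, `ss_resid`, `r_squared`, and `f_stat` are illustrative.

```r
# Sketch: R-squared and the F statistic from sums of squares (illustrative only)
set.seed(123456)

sim <- data.frame(
  diet = factor(rep(c("Atkins", "Zone", "LEARN", "Ornish"),
                    times = c(77, 79, 79, 76))),
  bmi_change = c(rnorm(77, mean = -1.65, sd = 2.54),
                 rnorm(79, mean = -0.53, sd = 2.0),
                 rnorm(79, mean = -0.92, sd = 2.0),
                 rnorm(76, mean = -0.77, sd = 2.14))
)

fit <- lm(bmi_change ~ diet, data = sim)

# total variability around the grand mean, and what is left after the model
ss_total <- sum((sim$bmi_change - mean(sim$bmi_change))^2)
ss_resid <- sum(resid(fit)^2)

# proportion of variance accounted for by diet
r_squared <- 1 - ss_resid / ss_total

# F = (explained SS / model df) / (residual SS / residual df)
k <- nlevels(sim$diet)
n <- nrow(sim)
f_stat <- ((ss_total - ss_resid) / (k - 1)) / (ss_resid / (n - k))

c(r_squared = r_squared, f_stat = f_stat)
```

With more than 300 observations, even a small R-squared can produce a significant F statistic, which is why it helps to report both.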
The significant result also doesn't tell us which diets differ from which others. We can find out more by comparing means across conditions using the ```emmeans()``` ("estimated marginal means") function:
```{r echo=FALSE}
# compute the differences between each of the means
leastsquare <- emmeans(lmResult,
pairwise ~ diet,
adjust="tukey")
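# display the results by grouping using letters
# (CLD() was deprecated in emmeans; multcomp::cld() is its replacement
# and assumes that the multcomp package is installed)
multcomp::cld(leastsquare$emmeans,
              alpha=.05,
              Letters=letters)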
```
The letters in the rightmost column show us which of the groups differ from one another, using a method that adjusts for the number of comparisons being performed. This shows that Atkins and LEARN diets don't differ from one another (since they share the letter a), and the LEARN, Ornish, and Zone diets don't differ from one another (since they share the letter b), but the Atkins diet differs from the Ornish and Zone diets (since they share no letters).
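The same kind of adjusted pairwise comparison can also be obtained with base R alone. The sketch below swaps in `TukeyHSD()` on an `aov()` fit in place of `emmeans()`, again on simulated data with this chapter's group parameters, so it illustrates the idea and does not reproduce the table above.

```r
# Sketch: Tukey-adjusted pairwise comparisons using base R (illustrative only)
set.seed(123456)

sim <- data.frame(
  diet = factor(rep(c("Atkins", "Zone", "LEARN", "Ornish"),
                    times = c(77, 79, 79, 76))),
  bmi_change = c(rnorm(77, mean = -1.65, sd = 2.54),
                 rnorm(79, mean = -0.53, sd = 2.0),
                 rnorm(79, mean = -0.92, sd = 2.0),
                 rnorm(76, mean = -0.77, sd = 2.14))
)

aov_fit <- aov(bmi_change ~ diet, data = sim)

# one row per pair of diets: difference in means, confidence limits,
# and a p-value adjusted for the number of comparisons
tukey <- TukeyHSD(aov_fit, "diet", conf.level = 0.95)
tukey$diet
```

Pairs whose adjusted confidence interval excludes zero are the ones that would be assigned different letters in a compact letter display.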
#### Bayes factor
Let's say that we want to have a better way to describe the amount of evidence provided by the data. One way we can do this is to compute a Bayes factor, which we can do by fitting the full model (including diet) and the reduced model (without diet) and then comparing their fit. For the reduced model, we just include a 1, which tells the fitting program to only fit an intercept. Note that this will take a few minutes to run.
```{r echo=FALSE, results='hide',message=FALSE}
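# save_pars(all = TRUE) keeps all parameter draws, which bayes_factor() needs;
# it replaces the deprecated save_all_pars = TRUE argument
brmFullModel <- brm(BMIChange12Months ~ diet, data = dietDf,
                    save_pars = save_pars(all = TRUE))
brmReducedModel <- brm(BMIChange12Months ~ 1, data = dietDf,
                       save_pars = save_pars(all = TRUE))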
```
```{r echo=FALSE, results='hide',message=FALSE}
bayes_factor(brmFullModel,brmReducedModel)
```
This shows us that there is very strong evidence (Bayes factor of nearly 100) for differences between the diets.
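Because the Bayesian model fits take a while to run, it can be useful to have a quick sanity check. One rough option, which is a different technique from the one used above, is the BIC approximation to the Bayes factor, computed from ordinary `lm()` fits. The sketch below uses simulated data with this chapter's group parameters; the approximation implies its own default prior, so its value should not be expected to match the output of `bayes_factor()`.

```r
# Sketch: rough BIC approximation to a Bayes factor (not the method used above)
set.seed(123456)

sim <- data.frame(
  diet = factor(rep(c("Atkins", "Zone", "LEARN", "Ornish"),
                    times = c(77, 79, 79, 76))),
  bmi_change = c(rnorm(77, mean = -1.65, sd = 2.54),
                 rnorm(79, mean = -0.53, sd = 2.0),
                 rnorm(79, mean = -0.92, sd = 2.0),
                 rnorm(76, mean = -0.77, sd = 2.14))
)

full <- lm(bmi_change ~ diet, data = sim)   # model including diet
reduced <- lm(bmi_change ~ 1, data = sim)   # intercept-only model

# BF (full vs. reduced) is approximately exp((BIC_reduced - BIC_full) / 2);
# values above 1 favor the model with diet, values below 1 favor the null
bf_approx <- exp((BIC(reduced) - BIC(full)) / 2)
bf_approx
```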
### What about possible confounds?
If we look more closely at the Gardner paper, we will see that they also report statistics on how many individuals in each group had been diagnosed with *metabolic syndrome*, which is a syndrome characterized by high blood pressure, high blood glucose, excess body fat around the waist, and abnormal cholesterol levels and is associated with increased risk for cardiovascular problems. Let's first add those data into the summary data frame:
```{r echo=FALSE}
summaryDf <-
summaryDf %>%
mutate(
nMetSym=c(22,20,29,27),
nNoMetSym=n-nMetSym,
pMetSym=nMetSym/(nMetSym+nNoMetSym)
)
displayDf <- summaryDf %>%
dplyr::select(diet,n,pMetSym) %>%
rename(`P(metabolic syndrome)`=pMetSym,
N=n,
Diet=diet)
kable(displayDf, caption="Presence of metabolic syndrome in each group in the AtoZ study.")
```
Looking at the data it seems that the rates are slightly different across groups, with more metabolic syndrome cases in the Ornish and Zone diets -- which were exactly the diets with poorer outcomes. Let's say that we are interested in testing whether the rate of metabolic syndrome was significantly different between the groups, since this might make us concerned that these differences could have affected the results of the diet outcomes.
#### Determine the appropriate model
* What kind of dependent variable?
    * proportions
* What are we comparing?
    * proportion with metabolic syndrome across four diet groups
    * chi-squared test of independence is appropriate against null hypothesis of no difference
Let's compute that statistic using the ```chisq.test()``` function. Here we will use the `simulate.p.value` option, which will help deal with the relatively small
```{r echo=FALSE}
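# build the 4 x 2 table of counts (metabolic syndrome vs. not) for each diet
contTable <- as.matrix(summaryDf[, c("nMetSym", "nNoMetSym")])
# simulate the p-value by Monte Carlo resampling, as described in the text
chisq.test(contTable, simulate.p.value = TRUE)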
```
This test shows that the proportion of participants with metabolic syndrome does not differ significantly across the diet groups. However, it doesn't tell us how certain we are that there is no difference; remember that under NHST, we always assume that the null is true unless the data provide enough evidence to reject it.
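For readers who want to see what the chi-squared test is computing, the statistic can be worked out by hand from the table of counts. The sketch below rebuilds the table from the group sizes and metabolic syndrome counts used in this chapter (the names `counts` and `expected` are illustrative) and compares the result with `chisq.test()` using its default asymptotic p-value.

```r
# Sketch: chi-squared test of independence computed by hand
# rows are diets (alphabetical order), columns are metabolic syndrome status
n_group  <- c(Atkins = 77, LEARN = 79, Ornish = 76, Zone = 79)
n_metsym <- c(Atkins = 22, LEARN = 20, Ornish = 29, Zone = 27)

counts <- cbind(MetSym = n_metsym, NoMetSym = n_group - n_metsym)

# expected counts if metabolic syndrome were unrelated to diet:
# (row total * column total) / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# chi-squared statistic: squared deviations scaled by the expected counts
chi_sq <- sum((counts - expected)^2 / expected)
df <- (nrow(counts) - 1) * (ncol(counts) - 1)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# compare against the built-in test
builtin <- chisq.test(counts)
c(by_hand = chi_sq, builtin = unname(builtin$statistic))
c(by_hand = p_value, builtin = builtin$p.value)
```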
What if we want to quantify the evidence for or against the null? We can do this using the Bayes factor.
```{r echo=FALSE}
bf <- contingencyTableBF(contTable,
sampleType = "indepMulti",
fixedMargin = "cols")
bf
```
This shows us that the data are 0.058 times as likely under the alternative hypothesis as under the null hypothesis, which means that they are 1/0.058 ≈ 17 times more likely under the null hypothesis than under the alternative. This is fairly strong, if not completely overwhelming, evidence in favor of the null hypothesis.