title | author | output
---|---|---
Data Exploration and Predictive Modeling: Spontaneous Abortion Prediction | Peace Maddox |
This notebook is based on the Esophageal Cancer project. It uses data exploration, visualization, and generalized linear modeling to put the data in context and to generate predictions. The data set comes from the Induced abortion and secondary infertility study.
- Exploring the data set (infert), which comes in the R datasets package.
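A first look at the data might resemble the sketch below. It uses only base R functions; the particular checks chosen here (structure, summary, case counts, missing values) are an assumption about what "exploring" covers, not a copy of the notebook.

```r
# infert ships with the datasets package, so no download is needed
data(infert)

str(infert)       # column types: education is a factor, the rest are numeric
summary(infert)   # per-column ranges and education level counts
head(infert)      # first few rows

# Number of controls (0) and cases (1)
case_counts <- table(infert$case)
case_counts

# Count missing values per column before modeling
colSums(is.na(infert))
```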
- Here is a usage example for the data set:
require(stats)
## unadjusted logistic regression of case status on prior abortions
model1 <- glm(case ~ spontaneous + induced, data = infert, family = binomial())
summary(model1)
## adjusted for other potential confounders:
summary(model2 <- glm(case ~ age + parity + education + spontaneous + induced,
                      data = infert, family = binomial()))
## Really should be analysed by conditional logistic regression
## which is in the survival package
if (require(survival)) {
  model3 <- clogit(case ~ spontaneous + induced + strata(stratum), data = infert)
  print(summary(model3))
  detach("package:survival") # survival (conflicts)
}
- Visualizing the relationship between spontaneous abortion case occurrence and age, education, and induced abortions.
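One way to draw these relationships with base R graphics is sketched here. It assumes the outcome being visualized is the `case` column (1 = case, 0 = control) from the usage example; the plot types and the `status` helper variable are illustrative choices and may differ from the notebook's figures.

```r
data(infert)

# Assumption: `case` is the outcome of interest; label it for readable plots
status <- factor(infert$case, levels = c(0, 1), labels = c("control", "case"))

op <- par(mfrow = c(1, 3))   # three panels side by side

# Age distribution by status
boxplot(infert$age ~ status, xlab = "Status", ylab = "Age (years)",
        main = "Age")

# Share of cases and controls within each education level
edu_tab <- prop.table(table(status, infert$education), margin = 2)
barplot(edu_tab, legend.text = TRUE, ylab = "Proportion", main = "Education")

# Share of cases and controls by number of prior induced abortions
ind_tab <- prop.table(table(status, infert$induced), margin = 2)
barplot(ind_tab, xlab = "Prior induced abortions", ylab = "Proportion",
        main = "Induced abortions")

par(op)   # restore the previous plotting layout
```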
- Identifying the groups at risk through analyses and graphs.
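A simple starting point is the share of cases within each subgroup, sketched below under the assumption that `case` is the outcome. The age bands are a hypothetical choice made for illustration. Because infert is a matched case-control sample, these shares describe the sample and should not be read as population risk.

```r
data(infert)
dat <- infert   # work on a copy so the original data set stays untouched

# Share of cases within each education level
aggregate(case ~ education, data = dat, FUN = mean)

# Share of cases by number of prior spontaneous and induced abortions
aggregate(case ~ spontaneous + induced, data = dat, FUN = mean)

# Hypothetical age bands, chosen only for illustration
dat$age_band <- cut(dat$age, breaks = c(20, 25, 30, 35, 45),
                    include.lowest = TRUE)
rate_by_age <- aggregate(case ~ age_band, data = dat, FUN = mean)
rate_by_age
```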
- Building a well-developed generalized linear model.
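As a sketch of how such a model could be built, the code below starts from the adjusted model in the usage example and applies backward selection by AIC with `step()` from the stats package. Using AIC-based selection here is an assumption suggested by the reading list, not a confirmed description of the notebook's procedure.

```r
data(infert)

# Full candidate model: same predictors as the adjusted usage example
full <- glm(case ~ age + parity + education + spontaneous + induced,
            data = infert, family = binomial())

# Backward selection by AIC; trace = 0 silences the step-by-step log
reduced <- step(full, direction = "backward", trace = 0)

summary(reduced)
AIC(full, reduced)

# Odds ratios with Wald 95% confidence intervals
exp(cbind(OR = coef(reduced), confint.default(reduced)))
```

Wald intervals from `confint.default()` are used to keep the example dependent on the stats package alone.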
- Predicting spontaneous abortion percentages among the groups.
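Predicted percentages can be obtained with `predict(..., type = "response")`, as in this sketch. The model formula is taken from the usage example and `case` is assumed to be the outcome; the `profiles` grid is hypothetical. In a case-control sample the fitted probabilities reflect the sampling ratio of cases to controls, so they are best compared across groups and not read as absolute risks.

```r
data(infert)
fit <- glm(case ~ age + parity + education + spontaneous + induced,
           data = infert, family = binomial())

# Hypothetical profiles: vary education and prior spontaneous abortions,
# hold age and parity at their medians and induced abortions at zero
profiles <- expand.grid(
  age         = median(infert$age),
  parity      = median(infert$parity),
  education   = levels(infert$education),
  spontaneous = 0:2,
  induced     = 0
)

# type = "response" returns probabilities; scale to percentages
profiles$predicted_pct <- round(
  100 * predict(fit, newdata = profiles, type = "response"), 1)
profiles
```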
- Testing the robustness of the model via leave-one-out cross validation.
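Leave-one-out cross validation can be written as an explicit loop, as sketched below. The formula comes from the usage example, and the 0.5 classification cutoff and the two summary metrics are illustrative assumptions. `boot::cv.glm()` offers a packaged alternative.

```r
data(infert)
form <- case ~ age + parity + education + spontaneous + induced

n <- nrow(infert)
pred <- numeric(n)

for (i in seq_len(n)) {
  # Fit on every row except i, then predict the held-out row
  fit_i <- glm(form, data = infert[-i, ], family = binomial())
  pred[i] <- predict(fit_i, newdata = infert[i, , drop = FALSE],
                     type = "response")
}

# Accuracy at a 0.5 cutoff (the cutoff is an illustrative choice)
loo_accuracy <- mean((pred > 0.5) == infert$case)

# Brier score: mean squared error of the predicted probabilities
loo_brier <- mean((pred - infert$case)^2)

c(accuracy = loo_accuracy, brier = loo_brier)
```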
Refer to the PDF document for the full markdown!
Induced abortion and secondary infertility study
Practical advice on variable selection and reporting using Akaike information criterion
Common pitfalls in statistical analysis: Logistic regression
Cross-validation under separate sampling: strong bias and how to correct it