# 10_logistic_regression_exercise.py
'''
L O G I S T I C R E G R E S S I O N
Adapted from the example given in Chapter 4 of
An Introduction to Statistical Learning
Data: Default Data Set
'''
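For reference, the model fitted in Part II is the logistic function from Chapter 4 of ISL, here with a single predictor X = balance. The second line is the equivalent log-odds (logit) form, which is linear in X; the beta asked about in Part II, step 2 is $\beta_1$.

```latex
p(X) = \Pr(\text{default} = \text{Yes} \mid X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}

\log\!\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X
```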
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
'''
QUIZ: UNDERSTANDING THE BASIC SHAPE
'''
'''
PART I - Exploration
'''
# 1 - Read in Default.csv and convert all data to numeric
# Convert everything to numeric before splitting
# 2 - Split the data into train and test sets
# If the splits come back as arrays, convert them back into DataFrames for convenience later on
# 3 - Create a histogram of all variables
# 4 - Create a scatter plot of income vs. balance
# 5 - Mark defaults with a different color and symbol
# 6 - What can you infer from this plot?
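A minimal sketch of Part I, steps 1 to 5, assuming the columns of the ISL Default data set (`default`, `student`, `balance`, `income`, with "Yes"/"No" strings in the first two). The helper names `encode_numeric`, `split_train_test` and `plot_income_vs_balance`, the 70/30 split, the seed and the plot styling are illustrative choices, not part of the exercise.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split


def encode_numeric(df):
    """Encode the Yes/No columns as 1/0 and make every column numeric.

    Assumes the ISL Default columns: default, student, balance, income.
    """
    out = df.copy()
    for col in ("default", "student"):
        # Values other than "Yes"/"No" become NaN, which makes bad rows easy to spot
        out[col] = out[col].map({"No": 0, "Yes": 1})
    return out.apply(pd.to_numeric)


def split_train_test(df, test_size=0.3, seed=0):
    """Return (train, test); in current scikit-learn a DataFrame input yields DataFrames."""
    # test_size and seed are illustrative defaults, not values from the exercise
    return train_test_split(df, test_size=test_size, random_state=seed)


def plot_income_vs_balance(df):
    """Scatter income against balance, marking defaulters with another color and marker."""
    fig, ax = plt.subplots()
    paid = df[df["default"] == 0]
    defaulted = df[df["default"] == 1]
    ax.scatter(paid["balance"], paid["income"], color="tab:blue", marker="o",
               alpha=0.3, label="No default")
    ax.scatter(defaulted["balance"], defaulted["income"], color="tab:orange",
               marker="+", label="Default")
    ax.set_xlabel("Balance")
    ax.set_ylabel("Income")
    ax.legend()
    return fig, ax
```

Usage on the exercise file might look like `train, test = split_train_test(encode_numeric(pd.read_csv("Default.csv")))`, followed by `train.hist(bins=30)` for step 3. If the CSV was exported from R with a leading row-name column, `index_col=0` may be needed in `read_csv`.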
'''
PART II - LOGISTIC REGRESSION
'''
# 1 - Run a logistic regression on the balance variable
# 2 - Is the beta value associated with balance significant?
# 3 - Predict the probability of default for someone with a balance of $1.2k and $2k
# 4 - Plot the fitted logistic function on top of the data points
# 5 - Create predictions using the test set
# 6 - Compute the overall accuracy, sensitivity, and specificity
# Accuracy
# What fraction of all observations were classified correctly?
# Specificity
# Of those who didn't default, what fraction did the model predict correctly?
# Sensitivity
# Of those who did default, what fraction did the model predict correctly?
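A minimal sketch of Part II using the `statsmodels` formula interface already imported above, assuming `train` and `test` are numeric DataFrames with `default` coded 1/0 (as produced in Part I). The function names are hypothetical, and the 0.5 probability cutoff for predicting a default is an assumption (a common starting point); moving the cutoff trades sensitivity against specificity.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf


def fit_balance_model(train):
    """Fit P(default = 1 | balance) by maximum likelihood with statsmodels' Logit."""
    # disp=0 silences the optimizer's convergence messages
    return smf.logit("default ~ balance", data=train).fit(disp=0)


def predict_default_prob(model, balances):
    """Predicted default probabilities for the given balance values."""
    return model.predict(pd.DataFrame({"balance": balances}))


def plot_fitted_curve(model, df):
    """Scatter the 0/1 outcomes against balance and overlay the fitted S-curve."""
    fig, ax = plt.subplots()
    ax.scatter(df["balance"], df["default"], marker="o", alpha=0.3, label="Observed")
    grid = np.linspace(df["balance"].min(), df["balance"].max(), 200)
    ax.plot(grid, predict_default_prob(model, grid), color="red", label="Fitted logistic")
    ax.set_xlabel("Balance")
    ax.set_ylabel("P(default)")
    ax.legend()
    return fig, ax


def classification_metrics(y_true, prob, threshold=0.5):
    """Accuracy, sensitivity and specificity when predicting default if prob > threshold."""
    # threshold=0.5 is an assumed cutoff, not one fixed by the exercise
    y_true = np.asarray(y_true).astype(int)
    y_pred = (np.asarray(prob) > threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # defaulters predicted to default
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))  # non-defaulters predicted not to
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # share of actual defaulters caught
        "specificity": tn / (tn + fp),  # share of non-defaulters correctly cleared
    }
```

On the exercise data, `model = fit_balance_model(train)` followed by `model.summary()` reports the z-statistic and p-value for `balance` (step 2), `predict_default_prob(model, [1200, 2000])` answers step 3, and `classification_metrics(test["default"], model.predict(test))` covers steps 5 and 6. Both sensitivity and specificity divide by a class count, so a test set with no defaulters (or no non-defaulters) would raise `ZeroDivisionError`.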