Imbalanced Binary Classification

  • Precision
    • TP / (TP + FP)
    • Measures how reliable the model's positive predictions are: the fraction of predicted positives that are actually positive.
    • Precision doesn't use TN.
    • It cares only about whether the positive predictions are correct.
    • For example, a precision of 0.33 means that 33% of the positive predictions are correct.
  • Recall
    • TP / (TP + FN)
    • Measures how good the model is at finding the positive class: the fraction of actual positives that the model predicts as positive.
    • Recall = Sensitivity = True positive rate
    • Like precision, recall doesn't use TN.
    • It also cares only about correctly predicting the positive class.
  • False positive rate
    • FP / (FP + TN)
    • Measures how often the positive class is predicted when the actual outcome is negative.
    • FPR = 1 - Specificity, where Specificity = TN / (TN + FP).
  • True positive rate
    • TP / (TP + FN), i.e. the same quantity as recall.
  • Meaning of precision and recall
    • Neither uses the true negatives; both are concerned only with correctly predicting the positive minority class (label 1).
  • ROC curve
    • X-axis is False positive rate.
    • Y-axis is True positive rate.
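    • Shows the trade-off between sensitivity (TPR) and specificity (1 - FPR) as the classification threshold varies.
    • The left side of the x-axis means fewer FP and more TN (good).
    • The top of the y-axis means more TP and fewer FN (good).
    • AUC is the area under the ROC curve: 1.0 means a perfect ranking of positives above negatives, and 0.5 is no better than random guessing.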
  • Precision-Recall curve
    • X-axis is Recall.
    • Y-axis is Precision.
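    • Preferred over the ROC curve when there is moderate to large class imbalance (a large skew in the class distribution): with many negatives, TN dominates FP / (FP + TN), so the FPR stays small and the ROC curve can look optimistic, whereas precision and recall ignore TN.
    • The baseline of the precision-recall curve (a no-skill classifier) is a horizontal line at precision = P / (P + N), the proportion of the positive class.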
    • PRCAUC is the area under the precision-recall curve.
  • Single-threshold measure
    • Metrics that require choosing a threshold (e.g. 0.5) to turn each predicted score into a positive or negative label before they can be calculated.
    • e.g. confusion matrix, precision, recall
  • Threshold-free measure
    • Metrics computed from the model's raw scores or probabilities across all thresholds rather than at one fixed cut-off, e.g. ROC AUC and PRCAUC.
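The single-threshold metrics above can be sketched in plain Python. This is an illustrative sketch, not a library API: the helper names (`confusion_counts`, `safe_div`, `threshold_metrics`), the toy labels and scores, and the convention that a score at or above the threshold counts as positive are all assumptions. Labels are assumed to be 0 (negative) and 1 (positive). In practice, `sklearn.metrics` provides `confusion_matrix`, `precision_score` and `recall_score` for the same quantities.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Tuple[int, int, int, int]:
    """Hypothetical helper: threshold the scores, then count (TP, FP, TN, FN).

    Assumes labels are 0 (negative) or 1 (positive) and that a score
    at or above the threshold is predicted positive.
    """
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        pred = 1 if score >= threshold else 0
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1:
            fp += 1  # predicted positive, actually negative
        elif label == 0:
            tn += 1  # predicted negative, actually negative
        else:
            fn += 1  # predicted negative, actually positive
    return tp, fp, tn, fn


def safe_div(num: int, den: int) -> float:
    # Treat 0/0 as 0.0, e.g. precision when nothing is predicted positive
    return num / den if den else 0.0


def threshold_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Hypothetical helper: single-threshold metrics from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_score, threshold)
    return {
        "precision": safe_div(tp, tp + fp),    # TP / (TP + FP), ignores TN
        "recall": safe_div(tp, tp + fn),       # TP / (TP + FN) = TPR = sensitivity
        "fpr": safe_div(fp, fp + tn),          # FP / (FP + TN)
        "specificity": safe_div(tn, tn + fp),  # TN / (TN + FP) = 1 - FPR
    }


# Toy data (made up for illustration): 8 negatives, 2 positives
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.4, 0.1, 0.2, 0.7, 0.6, 0.9, 0.3]

print(confusion_counts(y_true, y_score))   # (1, 2, 6, 1)
print(threshold_metrics(y_true, y_score))  # precision 1/3, recall 0.5, fpr 0.25, specificity 0.75
```

On this toy data, 2 of the 3 positive predictions are false positives, which gives the precision of 0.33 from the example above. Raising `threshold` to 0.8 removes both false positives (precision 1.0) while recall stays at 0.5, showing that every single-threshold score depends on where the cut-off is placed.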
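The threshold-free measures can be sketched by sweeping the threshold down through every distinct score and recording (FPR, TPR) for the ROC curve and (recall, precision) for the precision-recall curve. This is again a plain-Python sketch with hypothetical helper names and made-up data. The area under the PR curve is estimated here with average precision, a step-wise sum of (R_n - R_{n-1}) * P_n, which is one common way to compute it; linear (trapezoidal) interpolation in PR space tends to be optimistic. scikit-learn exposes `roc_curve`, `roc_auc_score`, `precision_recall_curve` and `average_precision_score` in `sklearn.metrics`.

```python
from typing import List, Sequence, Tuple


def sweep(y_true: Sequence[int], y_score: Sequence[float]) -> List[Tuple[int, int]]:
    """Cumulative (TP, FP) each time the threshold drops below a distinct score."""
    pairs = sorted(zip(y_score, y_true), reverse=True)  # highest score first
    points: List[Tuple[int, int]] = []
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Tied scores share one threshold, so emit a point only after the last tie
        if i + 1 == len(pairs) or pairs[i + 1][0] != score:
            points.append((tp, fp))
    return points


def roc_curve_points(y_true, y_score) -> List[Tuple[float, float]]:
    """(FPR, TPR) points; assumes both classes are present."""
    p = sum(1 for y in y_true if y == 1)
    n = len(y_true) - p
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    points += [(fp / n, tp / p) for tp, fp in sweep(y_true, y_score)]
    return points


def trapezoid_auc(points: List[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def pr_curve_points(y_true, y_score) -> List[Tuple[float, float]]:
    """(recall, precision) points, one per distinct threshold."""
    p = sum(1 for y in y_true if y == 1)
    return [(tp / p, tp / (tp + fp)) for tp, fp in sweep(y_true, y_score)]


def average_precision(y_true, y_score) -> float:
    """Step-wise PR area: sum of (R_n - R_{n-1}) * P_n."""
    ap, prev_recall = 0.0, 0.0
    for recall, precision in pr_curve_points(y_true, y_score):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def pr_baseline(y_true) -> float:
    """No-skill precision: P / (P + N)."""
    return sum(1 for y in y_true if y == 1) / len(y_true)


# Toy data (made up for illustration): 8 negatives, 2 positives
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.4, 0.1, 0.2, 0.7, 0.6, 0.9, 0.3]

print(trapezoid_auc(roc_curve_points(y_true, y_score)))  # 0.78125
print(average_precision(y_true, y_score))  # about 0.667
print(pr_baseline(y_true))  # 0.2
```

On this toy data the ROC AUC is about 0.78 and average precision is about 0.67, but the PR number should be read against the no-skill baseline P / (P + N) = 0.2 rather than 0.5. The last PR point (everything predicted positive) has precision 0.2, exactly that baseline.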

SMOTE

  • Synthetic Minority Oversampling Technique
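    • Oversamples the minority class by creating synthetic examples: each new point is interpolated between a minority sample and one of its k nearest minority-class neighbors.
    • It balances the class distribution, but the synthetic samples add no new information to the model.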
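A minimal from-scratch sketch of SMOTE-style oversampling for numeric features. It simplifies the original algorithm by picking a random minority sample as the base of each synthetic point instead of looping over every sample a fixed number of times; the function names, the `k` and `seed` parameters and the toy points are assumptions for illustration. For real use, imbalanced-learn's `SMOTE` class (`imblearn.over_sampling.SMOTE`, with `k_neighbors` and `fit_resample`) is the usual choice.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def k_nearest(points: Sequence[Point], idx: int, k: int) -> List[int]:
    """Indices of the k nearest neighbors of points[idx] (Euclidean), excluding itself."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def smote(minority: Sequence[Point], n_synthetic: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Hypothetical SMOTE-style sketch: new points on segments between minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)        # seeded so the output is reproducible
    k = min(k, len(minority) - 1)    # cannot have more neighbors than other samples
    synthetic: List[Point] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))           # random minority base sample
        j = rng.choice(k_nearest(minority, i, k))  # one of its k nearest minority neighbors
        gap = rng.random()                         # uniform in [0, 1)
        base, neighbor = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
    return synthetic


# Toy minority class with 2 features (made up); assume a majority class of 10 samples
minority = [[1.0, 1.0], [1.2, 0.9], [0.7, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_synthetic=10 - len(minority), k=2, seed=42)
print(len(minority) + len(new_points))  # 10, now matching the majority class size
```

Because every synthetic point lies on a line segment between two existing minority samples, it stays inside the region the minority class already covers, which is why SMOTE balances the classes without adding new information.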

Resource