
Evaluation metrics task force


Brainstorming on the use of metrics for evaluation and benchmarking in MONAI.

Identification of metric definitions, and assessment of their limitations and conditions of use, for:

  • Classification
  • Binary / Probabilistic / Multi Label segmentation
  • Regression / Generative tasks
  • Object detection

Requirements for evaluation suite

Still in progress:

  • Definition of what to do in edge cases (metric not defined, NaN values)
  • Specific tasks with particular metrics: tractography, vessel segmentation
  • Metrics for assessment of distribution / evaluation of uncertainty

Translation of the requirements described above into issues

Generic

  • Allow the evaluation suite to take in a single pair of ref/seg images or a folder of matching pairs (matched by subject name); a loading/conversion sketch follows this list
    • Accept np arrays in memory for the different images (i.e. not forcing everything to be in folders)
    • Develop util functions to load folders / files into memory
    • Computation should be on CPU; ensure torch tensors are converted back to numpy arrays
  • Link classical metrics to their trainable counterparts (GPU-based if possible, with backpropagation where feasible)
  • Allow for binary or probabilistic input
  • For segmentation - provide results at different thresholds (potentially predefined by the user)
  • Allow for multi-label input
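
A minimal sketch of the loading/conversion utilities mentioned above, assuming the suite receives either in-memory arrays (numpy or torch) or folders of files matched by subject name; all function names here are hypothetical, not the MONAI API:

```python
# Sketch only: accept a reference/segmentation pair as in-memory arrays or as
# matching files, and make sure everything ends up as numpy arrays on the CPU
# before metric computation.
from pathlib import Path

import numpy as np
import torch


def to_numpy(image):
    """Convert a torch tensor (possibly on GPU) to a numpy array; pass numpy through."""
    if isinstance(image, torch.Tensor):
        return image.detach().cpu().numpy()
    return np.asarray(image)


def pair_from_memory(ref, seg):
    """Return a (reference, segmentation) pair of numpy arrays with matching shapes."""
    ref_np, seg_np = to_numpy(ref), to_numpy(seg)
    if ref_np.shape != seg_np.shape:
        raise ValueError(f"shape mismatch: {ref_np.shape} vs {seg_np.shape}")
    return ref_np, seg_np


def pairs_from_folders(ref_dir, seg_dir, suffix=".nii.gz"):
    """Match reference and segmentation files by subject name (file name)."""
    ref_files = {p.name: p for p in Path(ref_dir).glob(f"*{suffix}")}
    seg_files = {p.name: p for p in Path(seg_dir).glob(f"*{suffix}")}
    return [(ref_files[n], seg_files[n]) for n in sorted(ref_files) if n in seg_files]
```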

Output of evaluation

  • Produce a report CSV file for the evaluation with aggregate statistics over the different metrics (a reporting sketch follows this list)
    • Use a pandas DataFrame to gather all results; save to csv/xls depending on the evaluation setting (multi-label / mono-label / probability thresholds…)
    • Specify the output format as an option, and suggest one according to the task
  • CSV/XLSX/HTML output per individual subject
  • Potentially HTML for aggregation, building on challengeR (going towards WebToolkit); to discuss with the dev team the best way to integrate
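
A minimal sketch of the per-subject and aggregate reporting, assuming each subject yields a dict of metric values (metric names such as "dice" and "hausdorff_95" are illustrative only):

```python
# Sketch only: gather per-subject results in a pandas DataFrame and write both
# the individual rows and an aggregate summary to CSV.
import pandas as pd


def write_report(per_subject_results, out_prefix="evaluation"):
    """per_subject_results: mapping of subject id -> {metric name: value}."""
    df = pd.DataFrame.from_dict(per_subject_results, orient="index")
    df.index.name = "subject"
    df.to_csv(f"{out_prefix}_per_subject.csv")           # one row per subject
    df.describe().to_csv(f"{out_prefix}_aggregate.csv")  # mean, std, quartiles per metric
    return df


# Example usage (illustrative values):
# write_report({"sub-01": {"dice": 0.91, "hausdorff_95": 3.2},
#               "sub-02": {"dice": 0.88, "hausdorff_95": 4.7}})
```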

Task focus 1 - Segmentation

  • Implement Dice score metrics allowing for multiple options when the metric is not defined (see the sketch after this list)
    • Add an optional epsilon to handle NaNs if needed (on both numerator and denominator)
    • Optional function to handle NaNs in aggregation
    • Implement NaN-handling functions
    • For all metrics, provide two outputs: nan_handled / not nan_handled (to discuss further)
  • Implement Hausdorff distance with the percentile as an argument
  • Implement binary confusion-matrix-based metrics
  • Report the raw confusion matrix counts
  • Implement the generalised Dice score (GDSC)
  • Implement surface Dice
  • Implement average surface distance
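
A minimal sketch of the Dice score with the epsilon and NaN-handling options listed above; the signature and the aggregation helper are illustrative assumptions, not the final MONAI interface:

```python
# Sketch only: epsilon can be added to numerator and denominator so that an
# empty reference and empty segmentation score 1 instead of 0/0, or the
# undefined case can be reported as NaN and filtered at aggregation time.
import numpy as np


def dice_score(ref, seg, epsilon=0.0):
    """Binary Dice score; returns NaN when both masks are empty and epsilon is 0."""
    ref = np.asarray(ref, dtype=bool)
    seg = np.asarray(seg, dtype=bool)
    intersection = np.logical_and(ref, seg).sum()
    denominator = ref.sum() + seg.sum()
    if denominator == 0 and epsilon == 0:
        return float("nan")  # metric not defined for two empty masks
    return (2.0 * intersection + epsilon) / (denominator + epsilon)


def aggregate(values, nan_policy="omit"):
    """Aggregate per-subject scores; either drop NaNs or propagate them."""
    values = np.asarray(values, dtype=float)
    return np.nanmean(values) if nan_policy == "omit" else np.mean(values)
```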