Evaluation, Reproducibility, Benchmarks Meeting 26
Date: 28th February 2024
- Presentation of results
- Expertise on reporting variability is missing in the community; this gap can be closed by an implementation in MONAI
- Follow-up: Consensus paper on reporting (training and testing separately)
- TODO: we will collect points in our brainstorming document
- Reporting of confidence intervals
- Hierarchical data structure
- Report both variability (standard deviation: are there cases where the model performs really badly?) and confidence intervals (which quantify how precise the estimated mean value is); see the first sketch after these notes
- We should empirically investigate the standard deviation (SD) vs. the standard error of the mean (SEM)
- Future work:
- Power analysis to derive the necessary number of test set samples for a given question; see the second sketch after these notes
- How to validate foundation models (can probably build upon challengeR)
- Lena and Annika will check the Metrics Reloaded consortium for suitable candidates for this project
- Pitfalls in interpreting certain measures (mean, median, SD, etc.)
- Links between this work and Metrics Reloaded?
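
As a rough illustration of the SD vs. SEM / confidence interval distinction noted above, the sketch below computes all three quantities for a set of per-case Dice scores. The scores are randomly generated placeholders, and the bootstrap percentile CI is only one of several possible constructions; this is not an agreed MONAI implementation.

```python
# Minimal sketch: standard deviation (spread across cases) vs. standard error
# of the mean / confidence interval (precision of the estimated mean).
# The per-case Dice scores below are made up for illustration only.
import numpy as np

rng = np.random.default_rng(0)
dice_scores = rng.beta(8, 2, size=50)  # hypothetical per-case Dice scores

mean = dice_scores.mean()
sd = dice_scores.std(ddof=1)           # variability across test cases
sem = sd / np.sqrt(len(dice_scores))   # precision of the estimated mean

# Non-parametric bootstrap 95% confidence interval for the mean
boot_means = [
    rng.choice(dice_scores, size=len(dice_scores), replace=True).mean()
    for _ in range(10_000)
]
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])

print(f"mean={mean:.3f}  SD={sd:.3f}  SEM={sem:.3f}  "
      f"95% CI=({ci_low:.3f}, {ci_high:.3f})")
```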
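
The second sketch shows how a power analysis could derive the necessary number of test cases, assuming the question is framed as a paired, two-sided t-test on per-case Dice differences between two methods. The expected improvement and SD values are invented assumptions, not results from the meeting.

```python
# Minimal sketch of a power analysis for test set size, assuming a paired
# two-sided t-test on per-case Dice differences. The inputs below are
# illustrative assumptions only.
import math
from statsmodels.stats.power import TTestPower

expected_improvement = 0.02   # assumed mean Dice improvement (method A vs. B)
diff_sd = 0.08                # assumed SD of per-case Dice differences
effect_size = expected_improvement / diff_sd  # Cohen's d for paired data

n_cases = TTestPower().solve_power(effect_size=effect_size,
                                   alpha=0.05, power=0.8,
                                   alternative="two-sided")
print(f"Required number of test cases: {math.ceil(n_cases)}")
```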