Evaluation, Reproducibility, Benchmarks Meeting 26
Date: 28th February 2024
- Presentation of results
- Expertise on reporting variability is missing in the community; this gap can be closed by an implementation in MONAI
- Follow-up: Consensus paper on reporting (training and testing separately)
- TODO: we will collect points in our brainstorming document
- Reporting of confidence intervals
- Hierarchical data structure
- Report both variability (standard deviation: are there cases where the model performs really badly?) and confidence intervals (which quantify how precise the estimated mean value is); see the first sketch after these notes
- We should empirically investigate the standard deviation (SD) vs. the standard error of the mean (SEM)
- Future work:
- Power analysis to derive the necessary number of test set samples for a given question; see the second sketch after these notes
- How to validate foundation models (can probably build upon challengeR)
- Lena and Annika will check the Metrics Reloaded consortium for suitable candidates for this project
- Pitfalls in interpreting certain measures (mean, median, SD, etc.)
- Links between this work and Metrics Reloaded?
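
As a rough illustration of the SD vs. SEM / confidence interval distinction noted above, the sketch below computes all three quantities for a set of per-case Dice scores. The scores are randomly generated placeholders, and the bootstrap percentile CI is only one of several possible constructions; this is not an agreed MONAI implementation.

```python
# Minimal sketch: standard deviation (spread across cases) vs. standard error
# of the mean / confidence interval (precision of the estimated mean).
# The per-case Dice scores below are made up for illustration only.
import numpy as np

rng = np.random.default_rng(0)
dice_scores = rng.beta(8, 2, size=50)  # hypothetical per-case Dice scores

mean = dice_scores.mean()
sd = dice_scores.std(ddof=1)           # variability across test cases
sem = sd / np.sqrt(len(dice_scores))   # precision of the estimated mean

# Non-parametric bootstrap 95% confidence interval for the mean
boot_means = [
    rng.choice(dice_scores, size=len(dice_scores), replace=True).mean()
    for _ in range(10_000)
]
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])

print(f"mean={mean:.3f}  SD={sd:.3f}  SEM={sem:.3f}  "
      f"95% CI=({ci_low:.3f}, {ci_high:.3f})")
```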
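
The second sketch shows how a power analysis could derive the necessary number of test cases, assuming the question is framed as a paired, two-sided t-test on per-case Dice differences between two methods. The expected improvement and SD values are invented assumptions, not results from the meeting.

```python
# Minimal sketch of a power analysis for test set size, assuming a paired
# two-sided t-test on per-case Dice differences. The inputs below are
# illustrative assumptions only.
import math
from statsmodels.stats.power import TTestPower

expected_improvement = 0.02   # assumed mean Dice improvement (method A vs. B)
diff_sd = 0.08                # assumed SD of per-case Dice differences
effect_size = expected_improvement / diff_sd  # Cohen's d for paired data

n_cases = TTestPower().solve_power(effect_size=effect_size,
                                   alpha=0.05, power=0.8,
                                   alternative="two-sided")
print(f"Required number of test cases: {math.ceil(n_cases)}")
```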