Ensure videos are allocated into all specified splits #169

ejm714 · 2021-12-11T03:40:51Z

This PR fixes the issue that caused the following error:

RuntimeError: Early stopping conditioned on metric val_macro_f1 which is not available. Pass in or modify your EarlyStopping callback to use any of the following: train_loss

The issue has to do with the split column. When the labels don't provide one, we assign splits randomly, but nothing guarantees that every key in the split_proportions dict (e.g. train and val) appears in the resulting split column. With a small number of videos and the default proportions, all videos can end up assigned to train (see splits.csv). Since there are no val videos, the val_macro_f1 metric that EarlyStopping monitors is never computed.
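To put a rough number on "a small number of videos", here is a minimal sketch that assumes each video's split is drawn independently, with probability proportional to its weight. The proportions train: 3, val: 1, holdout: 1 are an assumption for illustration, and `prob_split_empty` is a hypothetical helper, not part of zamba.

```python
from typing import Dict


def prob_split_empty(n_videos: int, split_proportions: Dict[str, float], split: str) -> float:
    """Probability that `split` gets no videos when each video is drawn independently."""
    total = sum(split_proportions.values())
    p_split = split_proportions[split] / total  # chance a single video lands in `split`
    return (1 - p_split) ** n_videos  # every one of the n draws misses `split`


# Assumed proportions, for illustration only.
proportions = {"train": 3, "val": 1, "holdout": 1}
print(round(prob_split_empty(5, proportions, "val"), 3))  # 0.328
```

Under those assumptions, with 5 videos the val split comes out empty about a third of the time (0.8^5 ≈ 0.33), so hitting this error on a small dataset is not unusual.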

I considered implementing this without a random draw, instead calculating the number of samples that go into each split from the provided proportions. However, rounding makes that tricky to get right. I instead went with a while loop that tries different seeds until it finds a draw that includes every split. If someone specifies split proportions of train: 100, val: 1 and provides only, say, 5 videos, there is a risk of an infinite loop. I'm not sure how important it is to catch that situation, since 1) we don't expect people to train on so few videos and 2) it seems hard to end up in that situation by accident.
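A minimal sketch of a retry loop like the one described, using `random.Random(seed).choices`. The function name, the `max_attempts` cap, and the up-front length check are hypothetical additions for illustration (one way to bound the infinite-loop risk), not necessarily what the PR does.

```python
import random
from typing import Dict, List


def draw_splits_until_complete(
    n_videos: int,
    split_proportions: Dict[str, float],
    seed: int = 0,
    max_attempts: int = 1000,
) -> List[str]:
    """Draw a split per video, retrying with a new seed until every split is used."""
    split_names = list(split_proportions.keys())
    weights = list(split_proportions.values())
    if n_videos < len(split_names):
        # No draw can cover every split, so retrying would loop forever.
        raise ValueError(
            f"Need at least {len(split_names)} videos to fill {split_names}, got {n_videos}."
        )
    for attempt in range(max_attempts):
        rng = random.Random(seed + attempt)  # a different seed on each attempt
        splits = rng.choices(split_names, weights=weights, k=n_videos)
        if set(splits) == set(split_names):
            return splits
    # Hypothetical cap: fail loudly instead of looping indefinitely on skewed proportions.
    raise RuntimeError(f"No draw covered all splits after {max_attempts} attempts.")
```

Because each attempt is seeded from `seed + attempt`, the result is reproducible for a given seed, while the cap turns the pathological train: 100, val: 1 case into an error rather than a hang.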

Bonus fix: fix a bug in the ModelCheckpoint setup so that monitor can be None when there is no early stopping. mode must be either min or max, so I set it to min to match PTL's default; it isn't used when monitor is None.
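A sketch of that choice, written as a hypothetical helper that builds keyword arguments for PyTorch Lightning's `ModelCheckpoint`. `EarlyStoppingConfig` and its defaults are stand-ins for illustration, not zamba's actual config class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EarlyStoppingConfig:
    """Hypothetical stand-in for the early stopping settings in a train config."""

    monitor: str = "val_macro_f1"
    mode: str = "max"


def model_checkpoint_kwargs(early_stopping: Optional[EarlyStoppingConfig]) -> dict:
    """Build keyword arguments for pytorch_lightning's ModelCheckpoint."""
    if early_stopping is None:
        # Nothing to monitor. mode must still be "min" or "max"; "min" matches
        # PTL's default and is ignored because monitor is None.
        return {"monitor": None, "mode": "min"}
    # Checkpoint on the same metric, in the same direction, as early stopping.
    return {"monitor": early_stopping.monitor, "mode": early_stopping.mode}
```

The result could then be passed as `ModelCheckpoint(**model_checkpoint_kwargs(None))`; in PTL 1.x, `monitor=None` keeps only the checkpoint for the last epoch.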

Plus some cleanup: set defaults in DummyTrainConfig to avoid repeated code.

github-actions · 2021-12-11T03:44:36Z

🚀 Deployed on https://deploy-preview-169--silly-keller-664934.netlify.app

codecov · 2021-12-11T04:21:01Z

Codecov Report

Merging #169 (ad38de6) into master (62797d9) will increase coverage by 0.0%.
The diff coverage is 100.0%.

@@          Coverage Diff           @@
##           master    #169   +/-   ##
======================================
  Coverage    85.0%   85.0%           
======================================
  Files          30      30           
  Lines        1843    1851    +8     
======================================
+ Hits         1567    1575    +8     
  Misses        276     276

| Impacted Files | Coverage Δ | |
|---|---|---|
| zamba/models/model_manager.py | `84.1% <ø> (ø)` | |
| zamba/models/config.py | `97.1% <100.0%> (+<0.1%)` | ⬆️ |


ejm714 · 2021-12-14T00:10:31Z

@pjbull ready for another look. This now has the added benefit of allocating splits within each species, which we had wanted but hadn't yet implemented. This should help ensure better training results (in addition to fixing the bug).
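A minimal sketch of allocating splits within each species, loosely following the snippet in the review (seed one video into each expected split, then draw the rest with `random.choices`) and the minimum-examples-per-species check from the later commits. It uses plain (filepath, species) tuples instead of a DataFrame; the function name and error message are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def allocate_splits_within_species(
    labels: List[Tuple[str, str]],
    split_proportions: Dict[str, float],
    seed: int = 0,
) -> Dict[str, str]:
    """Map each filepath to a split, allocating separately within each species."""
    rng = random.Random(seed)
    expected_splits = list(split_proportions.keys())
    weights = list(split_proportions.values())

    files_by_species: Dict[str, List[str]] = defaultdict(list)
    for filepath, species in labels:
        files_by_species[species].append(filepath)

    assignment: Dict[str, str] = {}
    for species in sorted(files_by_species):
        files = list(files_by_species[species])
        if len(files) < len(expected_splits):
            raise ValueError(
                f"Species {species!r} has {len(files)} videos; need at least {len(expected_splits)}."
            )
        rng.shuffle(files)
        # Seed one video into each expected split, then draw the rest by weight.
        splits = expected_splits + rng.choices(
            expected_splits, weights=weights, k=len(files) - len(expected_splits)
        )
        for filepath, split in zip(files, splits):
            # A video listed under several species is overwritten here by the
            # species processed last (the multi-label edge case raised in review).
            assignment[filepath] = split
    return assignment
```

Seeding one video per split is what removes the need for the retry loop: every species is guaranteed to appear in every split, as long as no video belongs to more than one species.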

ejm714 · 2021-12-14T02:02:47Z

@pjbull ready for reals this time!

pjbull · 2021-12-14T02:49:27Z

zamba/models/config.py

+                        list(values["split_proportions"].keys()),
+                        weights=list(values["split_proportions"].values()),
+                        k=len(species_df) - len(expected_splits),
+                    )


This has one weird edge case that is likely rare, so it's just worth filing an issue for:

```
v0.mp4, antelope
v1.mp4, antelope  # v1 assigned test by antelope grouping
v1.mp4, cow       # subsequently v1 assigned train by cow grouping
v2.mp4, antelope
v4.mp4, cow
v5.mp4, cow
# test set is now missing antelope
```
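One way to catch this edge case after allocation is to check, for every split, which species have no video in it. This is a hypothetical helper, not part of zamba, and it assumes every assigned split is one of `expected_splits`.

```python
from typing import Dict, Iterable, List, Set, Tuple


def species_missing_from_splits(
    labels: List[Tuple[str, str]],
    assignment: Dict[str, str],
    expected_splits: Iterable[str],
) -> Dict[str, Set[str]]:
    """Return {split: species with no video in that split}, omitting complete splits."""
    all_species: Set[str] = set()
    present: Dict[str, Set[str]] = {split: set() for split in expected_splits}
    for filepath, species in labels:
        all_species.add(species)
        # Assumes assignment[filepath] is one of expected_splits.
        present[assignment[filepath]].add(species)
    return {
        split: all_species - found
        for split, found in present.items()
        if all_species - found
    }
```

With the rows above and a made-up assignment that reproduces the scenario (antelope grouping puts v0 in train, v1 in test, v2 in val; cow grouping then puts v1 in train, v4 in test, v5 in val), the helper reports `{"test": {"antelope"}}`.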

ejm714 added 2 commits December 11, 2021 03:22

draw until we have all expected splits

8092968

add test

cf53a3a

ejm714 requested a review from pjbull December 11, 2021 03:40

ejm714 added 3 commits December 11, 2021 04:08

fix for no early stopping

9f26f73

test for no early stopping plus cleanup

b5c5eb1

fix linting

76de8b7

pjbull requested changes Dec 13, 2021


zamba/models/config.py (outdated, resolved)

pjbull reviewed Dec 13, 2021


zamba/models/config.py (outdated, resolved)

ejm714 added 2 commits December 13, 2021 15:23

rename

dfd8f4f

code review

38dc8e9

ejm714 added 3 commits December 13, 2021 16:55

fix failing test; proportions change since splitting within species

ad359e7

set a minimum for three examples per species

9ab1f9d

error if not enough species, seed and then randomly allocate

ad38de6

pjbull reviewed Dec 14, 2021


pjbull mentioned this pull request Dec 14, 2021

Automated generation of splits for training can put rare species in wrong group #171

Open

pjbull approved these changes Dec 14, 2021


pjbull merged commit 9e2dad5 into master Dec 14, 2021

pjbull deleted the ensure-splits branch December 14, 2021 02:52
