Method | Pre-train Data & Checkpoint | Extra Label | Backbone | #Frames x Sample Rate | Script & Log & Checkpoint | mAP |
---|---|---|---|---|---|---|
VideoMAE | Kinetics-400 checkpoint | ✗ | ViT-S | 16x4 | script/log/checkpoint | 22.5 |
VideoMAE | Kinetics-400 checkpoint | ✓ | ViT-S | 16x4 | script/log/checkpoint | 28.4 |
VideoMAE | Kinetics-400 checkpoint | ✗ | ViT-B | 16x4 | script/log/checkpoint | 26.7 |
VideoMAE | Kinetics-400 checkpoint | ✓ | ViT-B | 16x4 | script/log/checkpoint | 31.8 |
VideoMAE | Kinetics-400 checkpoint | ✗ | ViT-L | 16x4 | script | 34.3 |
VideoMAE | Kinetics-400 checkpoint | ✓ | ViT-L | 16x4 | script | 37.0 |
VideoMAE | Kinetics-400 checkpoint | ✗ | ViT-H | 16x4 | script | 36.5 |
VideoMAE | Kinetics-400 checkpoint | ✓ | ViT-H | 16x4 | script | 39.5 |
- Extra Label ✗ means only unlabelled data is used during the pre-training phase; ✓ means labelled data is additionally used.