This repository contains the source code of our paper. We provide zone evaluation for MMDetection v2.25.3, YOLOv5, and YOLOv8.
Our paper has been accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI 2024).
Here is a detailed step-by-step tutorial.
```bibtex
@article{zheng2024ZoneEval,
  title={Zone Evaluation: Revealing Spatial Bias in Object Detection},
  author={Zheng, Zhaohui and Chen, Yuming and Hou, Qibin and Li, Xiang and Wang, Ping and Cheng, Ming-Ming},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2024},
  doi={10.1109/TPAMI.2024.3409416}
}
```
A fundamental limitation of object detectors is that they suffer from "spatial bias", and in particular perform less satisfactorily when detecting objects near image borders. For a long time, there has been a lack of effective ways to measure and identify spatial bias, and little is known about where it comes from and to what degree it is present. To this end, we present a new zone evaluation protocol, which extends the traditional evaluation to a more generalized one by measuring the detection performance over zones, yielding a series of Zone Precisions (ZPs). For the first time, we provide numerical results showing that object detectors perform quite unevenly across the zones. Surprisingly, the detector's performance in the 96% border zone of the image does not reach the AP value (Average Precision, commonly regarded as the average detection performance over the entire image zone). To better understand spatial bias, we conduct a series of heuristic experiments. Our investigation rules out two intuitive conjectures about spatial bias: the object scale and the absolute positions of objects barely influence it. We find that the key lies in the human-imperceptible divergence in data patterns between objects in different zones, which eventually forms a visible performance gap between the zones. With these findings, we finally discuss a future direction for object detection, namely the spatial disequilibrium problem, which aims at a balanced detection ability over the entire image zone. By broadly evaluating 10 popular object detectors and 5 detection datasets, we shed light on the spatial bias of object detectors. We hope this work will draw more attention to detection robustness.
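To make the zone terminology concrete, here is a minimal sketch of one way to define the zones. It assumes (this is our reading of the protocol, not code from this repository) that the image of size W x H is split by concentric rectangles r_i = (iW/2n, iH/2n, W - iW/2n, H - iH/2n) with n = 5, that zone z^{i,j} is r_i with r_j removed, and that an object belongs to a zone when its box center lies there. All function names are hypothetical.

```python
from fractions import Fraction


def zone_rect(i, n, width, height):
    """Rectangle r_i = (x1, y1, x2, y2): the image shrunk by i/(2n) of its size on every side."""
    dx, dy = i * width / (2 * n), i * height / (2 * n)
    return (dx, dy, width - dx, height - dy)


def zone_area_fraction(i, j, n=5):
    """Share of the image area covered by zone z^{i,j} = r_i minus r_j, for 0 <= i < j <= n."""
    # r_k has side lengths (1 - k/n) * W and (1 - k/n) * H.
    return (1 - Fraction(i, n)) ** 2 - (1 - Fraction(j, n)) ** 2


def ring_index(box, n, width, height):
    """Index k of the single ring z^{k,k+1} holding the box center (boundary points go inward)."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    # Normalized distance from the center point to the nearest image border, in [0, 0.5].
    d = min(cx / width, 1 - cx / width, cy / height, 1 - cy / height)
    # The point lies inside r_k exactly when d >= k / (2n); clip the image center to ring n-1.
    return min(int(2 * n * d), n - 1)


def in_zone(box, i, j, n=5, width=1.0, height=1.0):
    """True if the box center falls in zone z^{i,j}, i.e. in one of rings i, i+1, ..., j-1."""
    return i <= ring_index(box, n, width, height) < j
```

Under these assumptions, `zone_area_fraction(0, 4)` returns `Fraction(24, 25)`, which is the 96% border zone mentioned above, and `zone_area_fraction(0, 5)` returns 1, consistent with ZP^{0,5} covering the whole image.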
```shell
conda create --name ZoneEval python=3.8 -y
conda activate ZoneEval
conda install pytorch=1.12 cudatoolkit=11.3 torchvision=0.13.0 -c pytorch
pip install mmcv-full==1.6.0 -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.12.0/index.html
git clone https://github.com/Zzh-tju/ZoneEval.git
cd ZoneEval/pycocotools
pip install -e .
# If you encounter a compile error here, try: pip install cython==0.29.36
cd ..
cd mmdetection
pip install -v -e .
```
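After installation, one way to confirm that the pinned versions were actually picked up is to query the installed distributions with Python's standard `importlib.metadata`. This is a helper of our own (`check_environment` is a hypothetical name, not part of ZoneEval); the expected versions are copied from the commands above, and we assume the bundled MMDetection reports itself as `mmdet` 2.25.3. Prefixes are matched because conda may resolve `pytorch=1.12` to a patch release.

```python
from importlib.metadata import PackageNotFoundError, version

# Version pins copied from the install commands; "1.12" also accepts e.g. "1.12.1".
EXPECTED = {
    "torch": "1.12",
    "torchvision": "0.13.0",
    "mmcv-full": "1.6.0",
    "mmdet": "2.25.3",  # assumption: the bundled MMDetection keeps its upstream version
}


def check_environment(expected=EXPECTED):
    """Return {distribution: problem} for every package that is missing or mismatched."""
    problems = {}
    for dist, want in expected.items():
        try:
            got = version(dist)
        except PackageNotFoundError:
            problems[dist] = "not installed"
            continue
        if got != want and not got.startswith(want + "."):
            problems[dist] = f"found {got}, expected {want}"
    return problems


if __name__ == "__main__":
    print(check_environment() or "environment looks fine")
```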
Please refer to Dataset Preparations to prepare the PASCAL VOC 07+12, Face Mask, Fruit, Helmet, and MS COCO datasets.
Zone evaluation is switched on or off in the config file:

```python
model = dict(
    test_cfg=dict(zone_eval=True))  # set to False to evaluate in the conventional way
```
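To evaluate, run:

```shell
# arguments: config file, checkpoint, number of GPUs, then evaluation options
# for VOC and the 3 application datasets
./tools/dist_test.sh configs/sela/your_config_file.py your_model.pth 2 --eval mAP
# for MS COCO
./tools/dist_test.sh configs/sela/your_config_file.py your_model.pth 2 --eval bbox
```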
We currently provide evaluation results for the following object detectors. The pretrained weights can be downloaded from MMDetection or from the detectors' official websites.
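Detector | Network & TS | $\text{ZP}^{0,5}$ | Var | $\text{ZP}^{0,1}$ | $\text{ZP}^{1,2}$ | $\text{ZP}^{2,3}$ | $\text{ZP}^{3,4}$ | $\text{ZP}^{4,5}$ | FPS |
---|---|---|---|---|---|---|---|---|---|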
RetinaNet | R50_1x | 36.5 | 14.8 | 27.3 | 33.3 | 35.5 | 34.5 | 39.2 | 35.4 |
RetinaNet | R50_2x | 37.4 | 16.9 | 27.6 | 34.6 | 35.8 | 35.1 | 40.4 | 35.4 |
Faster R-CNN | R50_1x | 37.4 | 11.8 | 29.3 | 34.2 | 36.1 | 35.0 | 39.9 | 37.5 |
YOLOF | R50_1x | 37.5 | 12.8 | 28.4 | 35.2 | 36.6 | 35.3 | 39.2 | 61.6 |
Sparse R-CNN | R50_1x | 37.9 | 22.8 | 27.8 | 34.7 | 37.1 | 37.1 | 42.6 | 37.8 |
YOLOv5-s | - | 37.4 | 10.5 | 28.8 | 34.9 | 36.9 | 35.1 | 38.4 | 140.0 |
RepPoints | R50_1x | 38.1 | 12.9 | 29.2 | 34.7 | 36.7 | 35.6 | 40.3 | 27.4 |
FCOS | R50_1x | 38.7 | 14.7 | 29.5 | 35.3 | 38.0 | 36.7 | 41.1 | 37.3 |
DETR | R50_150e | 40.1 | 26.9 | 29.8 | 36.2 | 39.8 | 39.1 | 45.7 | 49.9 |
RetinaNet | PVT-s_1x | 40.4 | 19.7 | 30.8 | 36.9 | 39.0 | 37.4 | 44.6 | 20.0 |
Cascade R-CNN | R50_1x | 40.3 | 18.7 | 30.9 | 36.6 | 39.2 | 38.6 | 44.2 | 30.7 |
GFocal | R50_1x | 40.1 | 16.9 | 31.1 | 37.5 | 39.4 | 38.5 | 43.8 | 37.2 |
YOLOv8-s | - | 44.9 | 24.4 | 33.4 | 42.2 | 44.3 | 43.2 | 48.5 | 128.5 |
Cascade Mask R-CNN | R101_3x | 45.4 | 22.4 | 34.7 | 41.6 | 44.3 | 44.4 | 49.1 | 18.7 |
Sparse R-CNN | R50_3x | 45.0 | 21.6 | 35.8 | 41.9 | 43.4 | 44.0 | 50.3 | 32.1 |
YOLOv5-m | - | 45.2 | 12.9 | 36.0 | 42.3 | 44.5 | 43.2 | 46.7 | 104.6 |
Mask R-CNN | Swin-T_3x | 46.0 | 15.4 | 36.8 | 41.7 | 44.1 | 43.5 | 49.0 | 24.3 |
Mask R-CNN | ConvNeXt-T_3x | 46.2 | 17.6 | 36.7 | 41.9 | 44.5 | 43.6 | 49.7 | 22.6 |
Cascade Mask R-CNN | X101-32x8d_3x | 46.1 | 21.1 | 36.1 | 42.0 | 44.8 | 45.9 | 49.9 | 13.5 |
VFNet | R101_2x | 46.2 | 15.6 | 36.7 | 43.0 | 45.0 | 44.5 | 48.8 | 25.9 |
Deformable DETR | R50_50e | 46.1 | 23.2 | 36.3 | 42.6 | 45.6 | 45.1 | 51.2 | 25.9 |
Sparse R-CNN | R101_3x | 46.2 | 21.1 | 36.9 | 42.9 | 44.9 | 44.7 | 51.3 | 25.2 |
GFocal | X101-32x4d_2x | 46.1 | 15.7 | 37.0 | 43.5 | 45.0 | 44.4 | 49.3 | 25.2 |
- 'TS': Training Schedule.
- '$\text{ZP}^{0,5}$': the traditional Average Precision.
- 'Var': the variance of the 5 ZPs ($\text{ZP}^{0,1}$, $\text{ZP}^{1,2}$, ..., $\text{ZP}^{4,5}$).
- 'FPS': measured on a single RTX 3090 GPU with class score threshold = 0.05 and NMS IoU threshold = 0.6. The test resolution is 640 for YOLOv5 and YOLOv8, and [1333, 800] for the others.
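As a consistency check of the 'Var' column, the sketch below recomputes it for the RetinaNet R50_1x row. We assume 'Var' is the population variance (dividing by 5) of the five listed ZPs; this is our reading, not something the table states. The recomputed value is about 14.98 against the reported 14.8, plausibly because the ZPs are printed rounded to one decimal, whereas the sample variance (dividing by 4) would give about 18.73, far from the reported value.

```python
from statistics import pvariance

# Row "RetinaNet | R50_1x" from the table above: ZP^{0,1}, ZP^{1,2}, ..., ZP^{4,5}.
ZONE_PRECISIONS = [27.3, 33.3, 35.5, 34.5, 39.2]
REPORTED_VAR = 14.8


def zp_variance(zps):
    """Population variance (divide by N) of the per-zone precisions (assumed definition)."""
    return pvariance(zps)


var = zp_variance(ZONE_PRECISIONS)
print(f"recomputed Var = {var:.2f}, reported Var = {REPORTED_VAR}")
# prints "recomputed Var = 14.98, reported Var = 14.8"
```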
- If you test DETR-series detectors, you must modify the `simple_test()` function in `mmdet/models/detectors/single_stage.py` as follows:

```python
# outs = self.bbox_head(feat)
outs = self.bbox_head(feat, img_metas)  # if you test DETR series
```
Currently, we do not support zone evaluation for instance segmentation models.