
[PaddlePaddle Hackathon] Task 67 #4297

Closed
wants to merge 13 commits into from


gbstack (Contributor) commented Oct 12, 2021

PR types

New features

PR changes

APIs

Describe

Hi,

This PR implements OHEM (Online Hard Example Mining), as requested in Task #4224.

Setting BBoxHead.bbox_assigner to OHEMBBoxAssigner in the configuration file enables OHEM.
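For readers unfamiliar with OHEM: the idea is to compute the loss for all candidate RoIs in a forward pass and back-propagate only through the hardest ones, i.e. those with the highest loss. Below is a minimal, framework-free sketch of that selection step. The function names select_hard_examples and ohem_loss are hypothetical and are not the API of the OHEMBBoxAssigner added in this PR. The sketch also omits the NMS step that the original OHEM paper applies to drop highly overlapping RoIs before ranking.

```python
from typing import List, Sequence


def select_hard_examples(roi_losses: Sequence[float], num_samples: int) -> List[int]:
    """Return the indices of the `num_samples` RoIs with the highest loss.

    Hypothetical helper for illustration; not part of PaddleDetection.
    """
    if num_samples <= 0:
        return []
    # Rank RoI indices by loss, hardest first. Python's sort stays stable even
    # with reverse=True, so RoIs with equal loss keep their original order.
    ranked = sorted(range(len(roi_losses)), key=lambda i: roi_losses[i], reverse=True)
    return ranked[:num_samples]


def ohem_loss(roi_losses: Sequence[float], num_samples: int) -> float:
    """Average loss over the hard examples only; easy RoIs contribute nothing."""
    hard = select_hard_examples(roi_losses, num_samples)
    if not hard:
        return 0.0
    return sum(roi_losses[i] for i in hard) / len(hard)


if __name__ == "__main__":
    # Per-RoI losses from a (hypothetical) forward pass over 6 proposals.
    losses = [0.25, 1.5, 0.5, 2.0, 0.125, 0.75]
    print(select_hard_examples(losses, 3))  # [3, 1, 5]
    print(ohem_loss(losses, 2))  # 1.75
```

In a real detector, the selected indices would decide which RoIs receive gradient (for example through a 0/1 loss mask), while the remaining easy RoIs are ignored for that iteration.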

The test case is located at ppdet/modeling/tests/test_ohem_bbox_assigner.py.

Thanks,

gbstack (Contributor Author) commented Oct 13, 2021

Hi,
I can't open the TeamCity build details page. Could you send me the TeamCity build log so I can fix the failure?

jerrywgz (Collaborator) commented Oct 13, 2021

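I'll help with the CI issue; judging from the current errors, the failure is unrelated to your code.
Please add an OHEM config based on Faster R-CNN, following https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.2/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml, then train a model to verify its effect. Please also add documentation; you can use https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.2/configs/dcn as a reference.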

gbstack (Contributor Author) commented Oct 13, 2021


OK.
Which dataset should I validate on?

jerrywgz (Collaborator) commented:

Validate on COCO 2017.

gbstack (Contributor Author) commented Oct 14, 2021

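Hi,
The config file (configs/faster_rcnn/faster_rcnn_r34_fpn_ohem_1x_coco.yml) and the documentation (configs/faster_rcnn/README.md) have been updated.

My local GPU is a GTX 1660 Super, so a full training run would take too long; I trained on AI Studio instead. However, when I checked yesterday afternoon, a full training run (12 epochs) with a ResNet-34 backbone would still take 6 days on AI Studio, so I tried 10k iterations with batch size 4 instead.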

Evaluation results:

ohem 10k iter
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.107
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.225
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.088
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.059
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.122
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.134
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.143
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.239
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.248
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.114
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.260
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.319
[10/14 07:58:22] ppdet.engine INFO: Total sample number: 4952, averge FPS: 12.805969137477573

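Checking again between last night and this morning, the estimated time for full training had dropped to under 3 days (probably because fewer people were using AI Studio), so I started another run for 1 epoch, this time with a random seed set (the earlier run did not set one). It is still training; the public version of the AI Studio project can only be generated after training finishes.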

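If it's convenient on your side, could you run a full training? (AI Studio allows at most 16 hours of training per day and stops the job after that, so a full run cannot be completed in one continuous session.)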

gbstack (Contributor Author) commented Oct 14, 2021

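Here is the AI Studio project link:
https://aistudio.baidu.com/aistudio/projectdetail/2467877

Fork version 1 and run the notebook step by step.

Because downloading code from GitHub on AI Studio is very slow, the code is downloaded from Gitee instead (identical to the GitHub version).

Evaluation results after training for one epoch: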

ohem 1epoch(29316 iter), bs4
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.182
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.339
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.180
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.097
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.200
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.230
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.196
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.324
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.338
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.177
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.359
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.429
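As a quick sanity check on the schedules discussed so far, the 29316 iterations per epoch reported above for batch size 4 on COCO 2017 can be used to relate the runs: the earlier 10k-iteration run covers only about a third of an epoch, and the full 12-epoch schedule is roughly 352k iterations. The helper names below are illustrative only.

```python
# Iterations per epoch at batch size 4, as reported in this thread.
ITERS_PER_EPOCH = 29316
BATCH_SIZE = 4


def epochs_covered(iterations: int, iters_per_epoch: int = ITERS_PER_EPOCH) -> float:
    """Number of epochs (possibly fractional) covered by `iterations` steps."""
    return iterations / iters_per_epoch


def schedule_iterations(epochs: int, iters_per_epoch: int = ITERS_PER_EPOCH) -> int:
    """Total optimizer steps for a schedule of `epochs` epochs."""
    return epochs * iters_per_epoch


if __name__ == "__main__":
    print(round(epochs_covered(10_000), 3))  # 0.341: the 10k-iter run is about 1/3 epoch
    print(schedule_iterations(12))  # 351792: full 12-epoch schedule at batch size 4
    print(ITERS_PER_EPOCH * BATCH_SIZE)  # 117264 image slots per epoch
```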

jerrywgz (Collaborator) commented:
You can email our official address [email protected], stating that you are the developer of this PR, and we will provide enough compute-card redemption codes.

gbstack (Contributor Author) commented Oct 15, 2021


OK, thanks.

gbstack (Contributor Author) commented Oct 22, 2021

With OHEM, the full training run gives the following results:

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.377
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.572
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.408
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.209
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.406
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.496
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.317
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.498
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.522
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.325
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.558
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.663

The only differences from the original ResNet-34 Faster R-CNN config are the batch size (4) and a random seed of 1 set in the code.

lyuwenyu (Collaborator) commented:
[image]
Which one did you use? It looks like adding OHEM made the results worse?

gbstack (Contributor Author) commented Oct 26, 2021


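I used the first one, not the ResNet-vd version. Could it be because I increased the batch size without adjusting the learning rate? Training with batch size 1 felt too slow, so I raised the batch size somewhat.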
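One way to check the batch-size hypothesis is the linear scaling rule popularized by Goyal et al. (2017): when the total batch size changes by a factor k, multiply the learning rate by k. It is a heuristic, not a guarantee. The sketch below uses illustrative numbers only (a base learning rate of 0.01 tuned for a total batch of 8); the actual base_lr and GPU count in the reference config should be checked before applying it, since the direction of the adjustment depends on the total batch size, not the per-GPU one.

```python
def scale_lr(base_lr: float, base_total_batch: int, new_total_batch: int) -> float:
    """Linear scaling rule: keep learning_rate / total_batch_size constant.

    Total batch size = per-GPU batch size x number of GPUs.
    """
    if base_total_batch <= 0 or new_total_batch <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * new_total_batch / base_total_batch


if __name__ == "__main__":
    # Hypothetical numbers: an lr tuned for a total batch of 8 (8 GPUs x 1 image),
    # reused for a single-GPU run with batch size 4.
    print(scale_lr(0.01, 8, 4))  # 0.005
```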

lyuwenyu (Collaborator) commented:
So the OHEM strategy doesn't help? Have you run it in mmdet to compare?

gbstack (Contributor Author) commented Nov 16, 2021


I haven't run it in mmdet yet; I'll give it a try.

The issue open-mmlab/mmdetection#5596 reports an mAP drop when training Faster R-CNN with OHEM.

lyuwenyu (Collaborator) commented:

OK.

@paddle-bot paddle-bot bot closed this Feb 6, 2024

paddle-bot bot commented Feb 6, 2024

Automatically closed by Paddle-bot.


3 participants