[PaddlePaddle Hackathon] Task 70: Add DALI GPU processing for PP-YOLO training #4546

Closed
wants to merge 9 commits into from

gbstack
Contributor

@gbstack gbstack commented Nov 11, 2021

PR types

New features

PR changes

APIs

Describe

Hi,

This PR adds DALI processing for PP-YOLO training, as described in Task #4221.

To enable DALI preprocessing, change COCODataSet to DALICOCODataSet in the configuration file configs/datasets/coco_detection.yml.

For example:

TrainDataset:
  !DALICOCODataSet
    image_dir: train2017
    anno_path: annotations/instances_train2017.json
    dataset_dir: /dataset/coco2017
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
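
The switch described above is a one-word change, so it can also be scripted. The following is a minimal sketch, not part of this PR: `enable_dali` is a hypothetical helper, and the `EvalDataset` entry in the sample text is only illustrative. It edits the config as plain text because `!COCODataSet` is a custom YAML tag that a plain YAML loader would not recognize on its own.

```python
import re


def enable_dali(config_text: str) -> str:
    """Swap the TrainDataset class tag from COCODataSet to DALICOCODataSet.

    Hypothetical helper for illustration only. It touches just the tag on the
    line directly under "TrainDataset:", so other datasets stay unchanged.
    """
    pattern = re.compile(r"(^TrainDataset:\s*\n\s*)!COCODataSet\b", re.MULTILINE)
    return pattern.sub(r"\1!DALICOCODataSet", config_text)


# Sample config text; the EvalDataset section is an assumed, illustrative entry.
sample = """\
TrainDataset:
  !COCODataSet
    image_dir: train2017
    anno_path: annotations/instances_train2017.json
    dataset_dir: /dataset/coco2017
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']

EvalDataset:
  !COCODataSet
    image_dir: val2017
"""

converted = enable_dali(sample)
print(converted)
```

Only the training dataset is switched; the evaluation entry keeps its original class.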

The training command is the same as before:

python tools/train.py --config configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml

Thanks,

… removing requirement of all samples shape in one batch need be same)
…RandomFlip, RandomDistort, NormalizeImage and Permute operations
Set use_dali for all batch_transforms.
Apply batch transforms after loading data from DALI pipeline.
Convert image loaded from DALI pipeline to paddle Tensor.
@lyuwenyu
Collaborator

Have you benchmarked the training speed with vs. without DALI?

@gbstack
Contributor Author

gbstack commented Nov 12, 2021

Have you benchmarked the training speed with vs. without DALI?

Sorry, I only just saw your message.

My test environment is as follows:

CPU: Intel i5-7400
GPU: Nvidia 1080 Ti

w/o DALI
bs2  9-10 images/s
bs4  12-13 images/s
bs6  13-14 images/s (CPU 60%; load average: 3.18, 3.26, 3.13)

w/ DALI
bs2  6-7 images/s
bs4  8-9 images/s
bs6  9-10 images/s (CPU 25%; load average: 1.97, 2.96, 2.99)
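
To put the figures above in relative terms, here is a small sketch that compares throughput with and without DALI at each batch size. Taking the midpoint of each reported range is an assumption made for illustration; the original measurements are only given as ranges.

```python
# Reported throughput ranges in images/s, keyed by batch size.
without_dali = {2: (9, 10), 4: (12, 13), 6: (13, 14)}
with_dali = {2: (6, 7), 4: (8, 9), 6: (9, 10)}


def midpoint(value_range):
    """Midpoint of a (low, high) range; an assumed point estimate."""
    low, high = value_range
    return (low + high) / 2


def relative_throughput(batch_size):
    """Throughput with DALI as a fraction of throughput without DALI."""
    return midpoint(with_dali[batch_size]) / midpoint(without_dali[batch_size])


for bs in sorted(without_dali):
    print(f"bs{bs}: DALI reaches {relative_throughput(bs):.0%} of the baseline")
```

Under that assumption, DALI reaches roughly 68%, 68%, and 70% of the baseline throughput at batch sizes 2, 4, and 6, while CPU usage at batch size 6 drops from 60% to 25%.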

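At batch size 6, GPU memory usage with DALI is about 2 GB higher than without it, and increasing the batch size any further runs out of GPU memory.

Based on the numbers above, I estimate that the CPU would be fully loaded once the batch size exceeds 10 (at batch size 6, the GPU is not yet fully loaded).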
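
One way to arrive at an estimate like the one above is a simple linear extrapolation of the CPU usage measured without DALI (60% at batch size 6). The sketch below assumes CPU usage grows in proportion to batch size, which is a simplification and not something the measurements establish; `saturation_batch_size` is a hypothetical helper.

```python
def saturation_batch_size(batch_size: int, cpu_percent: float) -> float:
    """Batch size at which CPU usage would reach 100%.

    Assumes CPU usage is proportional to batch size (a linear model through
    the origin). This is an illustrative simplification only.
    """
    cpu_percent_per_sample = cpu_percent / batch_size
    return 100.0 / cpu_percent_per_sample


# Measured without DALI: 60% CPU usage at batch size 6.
estimate = saturation_batch_size(6, 60.0)
print(estimate)
```

With these inputs the model puts CPU saturation at a batch size of 10, in line with the estimate in the comment.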

@lyuwenyu
Collaborator

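Based on the numbers above, I estimate that the CPU would be fully loaded once the batch size exceeds 10 (at batch size 6, the GPU is not yet fully loaded).

Does this result mean that adding DALI made training slower? 😂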

@gbstack
Contributor Author

gbstack commented Nov 16, 2021

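Based on the numbers above, I estimate that the CPU would be fully loaded once the batch size exceeds 10 (at batch size 6, the GPU is not yet fully loaded).

Does this result mean that adding DALI made training slower? 😂

My point is that DALI may improve speed at larger batch sizes. I'll see whether I can run it on AI Studio, which has 32 GB of GPU memory. With DALI, my local 12 GB of GPU memory only allows a batch size of up to 6.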

@paddle-bot paddle-bot bot closed this Feb 6, 2024

paddle-bot bot commented Feb 6, 2024

Automatically closed by Paddle-bot.
