[Dy2St] Optimize range_block_do performance #69834

Open
wants to merge 4 commits into
base: develop
from

Conversation

SigureMo
Member

@SigureMo SigureMo commented Nov 29, 2024

PR Category

Execute Infrastructure

PR Types

Performance

Description

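The forward/backward split in dynamic-to-static (Dy2St) currently takes a long time on models with large Programs. On a test model with composite operators enabled (17686 forward ops and 11558 backward ops, 29244 ops in total), it takes 34 s.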

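Currently, range_block_do evaluates it != list_offset(block, range[1]) on every loop iteration when checking the exit condition. Because list_offset is re-run for each check, the loop becomes $O(N^2)$, so the larger the model, the slower the split.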

With this fixed, the forward/backward split completes within 100 ms, which is essentially unnoticeable.

This PR also changes the type of range from std::vector<int> to std::pair<size_t, size_t>, which makes the semantics more explicit.
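As a rough illustration of the pattern described above, the following sketch contrasts recomputing the end iterator in the loop condition with computing it once up front. It is a simplified model and not Paddle's actual code: `Block` is assumed here to be a plain `std::list<int>`, `list_offset` is a hypothetical stand-in that walks from the head of the list, and `range_do_recompute_end` / `range_do_cached_end` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <utility>

// Assumption: a block's ops are modeled as a linked list of ints.
using Block = std::list<int>;
// Half-open index range [first, second), mirroring std::pair<size_t, size_t>.
using Range = std::pair<std::size_t, std::size_t>;

// Hypothetical helper: walks `offset` nodes from the head, so one call
// costs O(offset) on a linked list.
inline Block::iterator list_offset(Block& block, std::size_t offset) {
  assert(offset <= block.size());
  return std::next(block.begin(), static_cast<std::ptrdiff_t>(offset));
}

// Quadratic variant: every evaluation of the loop condition calls
// list_offset again, which walks the list from the head.
template <typename Fn>
void range_do_recompute_end(Block& block, Range range, Fn fn) {
  for (auto it = list_offset(block, range.first);
       it != list_offset(block, range.second); ++it) {
    fn(*it);
  }
}

// Linear variant: the end iterator is computed once before the loop.
template <typename Fn>
void range_do_cached_end(Block& block, Range range, Fn fn) {
  auto it = list_offset(block, range.first);
  const auto end = list_offset(block, range.second);
  for (; it != end; ++it) {
    fn(*it);
  }
}
```

Both variants visit the same elements; only the cost of the exit check differs. Caching the end iterator is safe in this sketch as long as the callback does not erase the element at `range.second`, since `std::list` iterators stay valid when other elements are inserted or erased.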

PCard-66972


paddle-bot bot commented Nov 29, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

Copilot wasn't able to review any files in this pull request.

Files not reviewed (1)
  • paddle/fluid/pybind/pir.cc: Language not supported
@SigureMo SigureMo closed this Nov 29, 2024
@SigureMo SigureMo reopened this Nov 29, 2024