[Dy2St] Optimize range_block_do performance #69834
Open
+18
−21
PR Category
Execute Infrastructure
PR Types
Performance
Description
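The forward/backward split in dynamic-to-static (Dy2St) conversion currently takes a long time on some models with large Programs. On a test model with composite operators enabled (17,686 forward ops and 11,558 backward ops, 29,244 ops in total), it takes 34 s.
The current complexity is $O(N^2)$, so the larger the model, the slower the split becomes.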
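On every loop iteration, `range_block_do` re-evaluates `it != list_offset(block, range[1])` to check the exit condition, which is what makes this step $O(N^2)$. With this optimized, the forward/backward split finishes within 100 ms, which is barely noticeable.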
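The effect described above can be illustrated with a minimal sketch. It is not the code from this PR: `Op`, `Block`, and both `range_block_do_*` functions are hypothetical stand-ins, and `list_offset` is assumed to walk a linked list from its head, so each call costs time proportional to the offset. Under that assumption, recomputing the end iterator in the loop condition costs $O(N)$ per iteration, while computing it once keeps the whole traversal linear.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <utility>

// Hypothetical stand-ins for an operation and a block of operations.
using Op = int;
using Block = std::list<Op>;

// Assumed behavior of `list_offset`: walk from the head of the list,
// so one call costs O(offset).
Block::iterator list_offset(Block& block, std::size_t offset) {
  return std::next(block.begin(), static_cast<std::ptrdiff_t>(offset));
}

// Slow variant: the end iterator is recomputed on every exit check,
// so visiting N ops costs O(N^2) list steps in total.
template <typename F>
void range_block_do_slow(Block& block,
                         const std::pair<std::size_t, std::size_t>& range,
                         F fn) {
  for (auto it = list_offset(block, range.first);
       it != list_offset(block, range.second); ++it) {
    fn(*it);
  }
}

// Fast variant: the end iterator is computed once before the loop,
// so the traversal is O(N).
template <typename F>
void range_block_do_fast(Block& block,
                         const std::pair<std::size_t, std::size_t>& range,
                         F fn) {
  auto it = list_offset(block, range.first);
  const auto end = list_offset(block, range.second);
  for (; it != end; ++it) {
    fn(*it);
  }
}
```

Both variants visit the same half-open range `[range.first, range.second)`. Hoisting the end iterator is only safe if `fn` does not insert or erase ops in a way that invalidates it, which this sketch assumes.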
This PR also changes the type of `range` from `std::vector<int>` to `std::pair<size_t, size_t>`, which makes the semantics clearer.

PCard-66972