[Inference] cublaslt gemm algo search code optimization and support load algo caches file generated by offline #66132
Your PR was submitted successfully. Thank you for contributing to the open-source project!
LGTM for flags
 * Example:
 * Note: If True, will apply global search in blaslt.
 */
PHI_DEFINE_EXPORTED_bool(enable_blaslt_global_search,
Please remember to add unit tests in a follow-up PR.
 * Example:
 * Note: If set this flag, will load search configs file generated by offline.
 */
PHI_DEFINE_EXPORTED_string(cublaslt_device_best_config,
Please remember to add unit tests in a follow-up PR.
    false,
    "Whether to use global search in cublaslt gemm.");

/**
There is one `*` too many here.
@@ -1767,6 +1767,39 @@ PHI_DEFINE_EXPORTED_int32(
    -1,
    "Max count of eliminate redundant computation in CSE, for debug usage");

/**
There is one `*` too many here.
    "Whether to load search configs file generated by "
    "offline in cublaslt gemm.");

/**
There is one `*` too many here.
}
if (phi::autotune::AutoTuneStatus::Instance().UseAutoTune() &&
    (!desc->is_cached)) {
  SearchBestAlgo(ctx,
Is there a backup default algo when the search finds nothing?
Not in this PR; it will be added together in the next PR.
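The fallback asked about in this thread is not part of this PR. A minimal sketch of the idea, assuming a cache keyed by (M, K, N) that maps to an algo id: on a miss, return a caller-supplied default so the GEMM can still run. `GemmShape`, `AlgoChoice` and `LookupAlgoOrDefault` are hypothetical names, and the plain `int` algo id stands in for a full cuBLASLt algo configuration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>

// Hypothetical key: the (M, K, N) shape of the GEMM.
using GemmShape = std::tuple<std::int64_t, std::int64_t, std::int64_t>;

// Hypothetical result: which algo to use and whether it came from the cache.
struct AlgoChoice {
  int algo_id;
  bool from_cache;  // false when the default/backup algo was used
};

// Returns the cached algo for `shape`; if the search/cache has no entry,
// falls back to `default_algo_id` instead of failing.
AlgoChoice LookupAlgoOrDefault(const std::map<GemmShape, int>& cache,
                               const GemmShape& shape, int default_algo_id) {
  auto it = cache.find(shape);
  if (it != cache.end()) return {it->second, true};
  return {default_algo_id, false};
}
```

In a real cuBLASLt integration, the default would more likely be the top result of `cublasLtMatmulAlgoGetHeuristic` than a fixed id. The sketch only shows the control flow.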
…oad algo caches file generated by offline (PaddlePaddle#66132) * code optimization * support load config file * update * udpate * update flag decsription * fix * fix ci
PR Category
Inference
PR Types
Improvements
Description
pcard-71500
Optimizes the cuBLASLt GEMM algorithm search code and adds support for loading an algorithm cache file generated offline.
Usage
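The offline configuration file can be generated by the tune_cublaslt_gemm operator and placed at a specified path. Loading the offline configuration (FLAGS_cublaslt_device_best_config) only takes effect when cuBLASLt global search (FLAGS_enable_blaslt_global_search) is also enabled.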
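The dependency between the two flags can be expressed as a single predicate. This is a sketch, assuming the offline file is consulted only when global search is on and the path flag is non-empty. `BlasLtSearchFlags` and `ShouldLoadOfflineConfig` are hypothetical; in Paddle the values come from `FLAGS_enable_blaslt_global_search` and `FLAGS_cublaslt_device_best_config`.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the two flags described above.
struct BlasLtSearchFlags {
  bool enable_global_search = false;  // FLAGS_enable_blaslt_global_search
  std::string device_best_config;     // FLAGS_cublaslt_device_best_config (file path)
};

// The offline config file is only honored when global search is enabled
// AND a file path has been provided.
bool ShouldLoadOfflineConfig(const BlasLtSearchFlags& flags) {
  return flags.enable_global_search && !flags.device_best_config.empty();
}
```

Setting only the path flag therefore has no effect. Both flags must be set.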
cache file format
Format (one entry per line): M,K,N,algoId,swizzle,customOption,tile,splitK_val,reductionScheme,stages,workspaceSize,time
Example: 2048,2048,4096,21,0,0,24,8,3,15,0,0.0701338
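A file in the comma-separated format above can be read with a small standalone parser. This is a sketch, not Paddle's loader. `BlasLtAlgoEntry`, `ParseCacheLine`, `GemmKey` and `LoadCache` are hypothetical names. The policy of skipping malformed lines and keeping the fastest entry when a shape repeats is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical in-memory form of one cache-file line.
struct BlasLtAlgoEntry {
  std::int64_t m = 0, k = 0, n = 0;
  int algo_id = 0, swizzle = 0, custom_option = 0, tile = 0;
  int split_k_val = 0, reduction_scheme = 0, stages = 0;
  std::int64_t workspace_size = 0;
  double time = 0.0;  // measured time, in whatever unit the tuner wrote
};

// Parses "M,K,N,algoId,swizzle,customOption,tile,splitK_val,reductionScheme,
// stages,workspaceSize,time". Returns std::nullopt if there are not exactly
// 12 fields or std::stoll/std::stoi/std::stod reject a field.
std::optional<BlasLtAlgoEntry> ParseCacheLine(const std::string& line) {
  std::vector<std::string> f;
  std::stringstream ss(line);
  std::string field;
  while (std::getline(ss, field, ',')) f.push_back(field);
  if (f.size() != 12) return std::nullopt;
  try {
    BlasLtAlgoEntry e;
    e.m = std::stoll(f[0]);
    e.k = std::stoll(f[1]);
    e.n = std::stoll(f[2]);
    e.algo_id = std::stoi(f[3]);
    e.swizzle = std::stoi(f[4]);
    e.custom_option = std::stoi(f[5]);
    e.tile = std::stoi(f[6]);
    e.split_k_val = std::stoi(f[7]);
    e.reduction_scheme = std::stoi(f[8]);
    e.stages = std::stoi(f[9]);
    e.workspace_size = std::stoll(f[10]);
    e.time = std::stod(f[11]);
    return e;
  } catch (const std::logic_error&) {  // std::invalid_argument or std::out_of_range
    return std::nullopt;
  }
}

using GemmKey = std::tuple<std::int64_t, std::int64_t, std::int64_t>;  // (M, K, N)

// Loads all well-formed lines, skipping malformed ones. If a shape appears
// more than once, the entry with the smaller time is kept (assumed policy).
std::map<GemmKey, BlasLtAlgoEntry> LoadCache(std::istream& in) {
  std::map<GemmKey, BlasLtAlgoEntry> cache;
  std::string line;
  while (std::getline(in, line)) {
    auto e = ParseCacheLine(line);
    if (!e) continue;
    GemmKey key{e->m, e->k, e->n};
    auto it = cache.find(key);
    if (it == cache.end() || e->time < it->second.time) cache[key] = *e;
  }
  return cache;
}
```

For the example line, the parser yields M=2048, K=2048, N=4096, algoId=21, tile=24, splitK_val=8, reductionScheme=3 and stages=15. Turning an entry into an actual cuBLASLt algo, for instance with `cublasLtMatmulAlgoInit` and `cublasLtMatmulAlgoConfigSetAttribute`, is not shown.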
parameter:
TODO