[Inference] cublaslt gemm algo search code optimization and support load algo caches file generated by offline #66132

Merged


@yuanlehome yuanlehome commented Jul 17, 2024

PR Category

Inference

PR Types

Improvements

Description

pcard-71500

Optimizes the cuBLASLt GEMM algo search code and adds support for loading an algo cache file generated offline.

Usage

# When enabled, int8 matmul runs a cuBLASLt global search to find the best configuration
export FLAGS_enable_blaslt_global_search=1
# When set, int8 matmul configurations are loaded from this offline file
export FLAGS_cublaslt_device_best_config=/path/to/file

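The offline configuration file can be generated by the tune_cublaslt_gemm operator and placed at the specified path. Loading the offline file (FLAGS_cublaslt_device_best_config) only takes effect when cuBLASLt global search (FLAGS_enable_blaslt_global_search) is also enabled.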
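The dependency between the two flags can be illustrated with a small sketch. It only inspects environment variables and does not call into Paddle; the helper name `effective_offline_config` and the set of accepted truthy spellings are assumptions made for illustration, not Paddle's actual flag parsing.

```python
import os

# Flag names come from the PR description. The truthy spellings below are an
# assumption for this sketch, not Paddle's documented flag parsing.
_TRUTHY = {"1", "true", "True"}


def effective_offline_config(env):
    """Return the offline config path only if global search is also enabled.

    `env` is any mapping of environment variable names to string values.
    """
    search_on = env.get("FLAGS_enable_blaslt_global_search", "0") in _TRUTHY
    path = env.get("FLAGS_cublaslt_device_best_config", "")
    if search_on and path:
        return path
    # Either global search is off or no file was given: the file is ignored.
    return None


if __name__ == "__main__":
    print(effective_offline_config(dict(os.environ)))
```

Running this after the two `export` lines above should print the configured path; with only FLAGS_cublaslt_device_best_config set it prints `None`, mirroring the rule that the offline file is ignored unless global search is on.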

cache file format

format (each line): M,K,N,algoId,swizzle,customOption,tile,splitK_val,reductionScheme,stages,workspaceSize,time
example: 2048,2048,4096,21,0,0,24,8,3,15,0,0.0701338
parameters:

  • M,K,N: matrix shapes, [M, K] * [K, N] = [M, N]
  • algoId: ID of the algorithm to initialize
  • swizzle: whether CTA swizzling is enabled
  • customOption: custom option value
  • tile: tile ID, used to set the tile size (rows * columns)
  • splitK_val: number of splits along K
  • reductionScheme: reduction scheme used when splitK_val > 1
  • stages: stages ID, used to configure the size and number of shared-memory buffers for staging input elements
  • workspaceSize: required workspace memory size
  • time: run time
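As a rough illustration of the format above, one line of the cache file could be parsed as follows. This is a sketch, not the loader added in this PR: the `GemmAlgoConfig` class, its field names, and `parse_cache_line` are hypothetical, and it assumes the first eleven fields are integers and `time` is a float, as in the example line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GemmAlgoConfig:
    # Field order follows the documented line format.
    m: int
    k: int
    n: int
    algo_id: int
    swizzle: int
    custom_option: int
    tile: int
    split_k_val: int
    reduction_scheme: int
    stages: int
    workspace_size: int
    time: float


def parse_cache_line(line):
    """Parse one comma-separated cache line into a GemmAlgoConfig."""
    fields = [f.strip() for f in line.strip().split(",")]
    if len(fields) != 12:
        raise ValueError(f"expected 12 fields, got {len(fields)}")
    *ints, time = fields
    return GemmAlgoConfig(*(int(v) for v in ints), float(time))


cfg = parse_cache_line("2048,2048,4096,21,0,0,24,8,3,15,0,0.0701338")
print(cfg.m, cfg.k, cfg.n, cfg.algo_id, cfg.split_k_val, cfg.time)
```

Rejecting lines with the wrong field count keeps a truncated or hand-edited file from silently producing a misaligned configuration.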

TODO

  • Fall back to the default algo config when the shape exceeds the largest one in the cache file
  • Add unit tests
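The first TODO item is not implemented in this PR. One possible shape for such a fallback is sketched below, purely as an illustration: `load_cache` and `select_config` are hypothetical names, and the sketch assumes an exact (M, K, N) match, treating any shape missing from the cache as uncovered. The actual follow-up may define "exceeds" differently.

```python
def load_cache(lines):
    """Build a {(M, K, N): params} dict from cache-file lines.

    params holds algoId..workspaceSize as ints followed by time as a float.
    """
    cache = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(",")
        m, k, n = (int(v) for v in fields[:3])
        params = tuple(int(v) for v in fields[3:11]) + (float(fields[11]),)
        cache[(m, k, n)] = params
    return cache


def select_config(cache, shape, default=None):
    """Return the cached params for shape, or `default` if it is not covered."""
    return cache.get(shape, default)


cache = load_cache(["2048,2048,4096,21,0,0,24,8,3,15,0,0.0701338"])
print(select_config(cache, (2048, 2048, 4096)))
print(select_config(cache, (8192, 8192, 8192), default="default algo config"))
```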

paddle-bot bot commented Jul 17, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@yuanlehome yuanlehome changed the title [Inference] cublaslt gemm algo searce code optimization and support load offline algo caches [Inference] cublaslt gemm algo search code optimization and support load algo caches file generated by offline Jul 19, 2024

@phlrain phlrain left a comment

LGTM for flags

* Example:
* Note: If True, will apply global search in blaslt.
*/
PHI_DEFINE_EXPORTED_bool(enable_blaslt_global_search,

Remember to add unit tests in a follow-up PR.

* Example:
* Note: If set this flag, will load search configs file generated by offline.
*/
PHI_DEFINE_EXPORTED_string(cublaslt_device_best_config,

Remember to add unit tests in a follow-up PR.

false,
"Whether to use global search in cublaslt gemm.");

/**

There is an extra * here.

@@ -1767,6 +1767,39 @@ PHI_DEFINE_EXPORTED_int32(
-1,
"Max count of eliminate redundant computation in CSE, for debug usage");

/**

There is an extra * here.

"Whether to load search configs file generated by "
"offline in cublaslt gemm.");

/**

There is an extra * here.

}
if (phi::autotune::AutoTuneStatus::Instance().UseAutoTune() &&
(!desc->is_cached)) {
SearchBestAlgo(ctx,

Is there a default backup algo for when the search finds nothing?

Contributor Author

Not in this PR; it will be added in the next PR.

@yuanlehome yuanlehome merged commit d50af66 into PaddlePaddle:develop Jul 22, 2024
31 checks passed
lixcli pushed a commit to lixcli/Paddle that referenced this pull request Jul 22, 2024
…oad algo caches file generated by offline (PaddlePaddle#66132)

* code optimization

* support load config file

* update

* udpate

* update flag decsription

* fix

* fix ci
zhiqiu pushed a commit to zhiqiu/Paddle that referenced this pull request Jul 22, 2024
inaomIIsfarell pushed a commit to inaomIIsfarell/Paddle that referenced this pull request Jul 31, 2024