📖CUDA-Learn-Notes: 🎉CUDA/C++ notes / tech blog: fp32, fp16/bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot prod, elementwise, softmax, layernorm, rmsnorm, hist, etc. 👉News: Most of my time is now focused on LLM/VLM/Diffusion inference. Please check 📖Awesome-LLM-Inference, 📖Awesome-SD-Inference, and 📖CUDA-Learn-Notes for more details.
![image](https://private-user-images.githubusercontent.com/31974251/352692153-0c5e5125-586f-43fa-8e8b-e2c61c1afbbe.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwMDQ1ODksIm5iZiI6MTczOTAwNDI4OSwicGF0aCI6Ii8zMTk3NDI1MS8zNTI2OTIxNTMtMGM1ZTUxMjUtNTg2Zi00M2ZhLThlOGItZTJjNjFjMWFmYmJlLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA4VDA4NDQ0OVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWYzODllZDc4NjQ5M2E3NjcxZTIyNTdkOWJmMGZmZjZkZDFkZjUxOWRhMDM3MTczOTJkYzViOTQ0YjMxMDY1MDUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.nk39JRwAyRX72LDFSj_xg7WOZcRVNapU_XqtN0jYKwM)
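💡Note: The articles written by these experts are excellent, and I have learned a lot from them. PRs recommending more great articles are welcome!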
- / = not supported yet.
- ✔️ = known to work and already supported.
- ❔ = planned, but not coming soon; possibly a few weeks out.
- workflow: custom CUDA kernel implementation -> Torch Python binding -> run tests.
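
The workflow in the last bullet can be sketched end to end as follows. This is a minimal illustration, not the repo's actual build setup: the kernel name `elementwise_add_f32`, the extension name `elementwise_add_ext`, and the choice of `torch.utils.cpp_extension.load_inline` are all assumptions made for the example. Running it requires PyTorch, a CUDA toolkit, and a GPU.

```python
# Step 1: custom CUDA kernel implementation (hypothetical example kernel).
CUDA_SRC = r"""
#include <torch/extension.h>

__global__ void elementwise_add_f32_kernel(const float* a, const float* b,
                                           float* c, int n) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < n) c[idx] = a[idx] + b[idx];
}

torch::Tensor elementwise_add_f32(torch::Tensor a, torch::Tensor b) {
  TORCH_CHECK(a.is_cuda() && b.is_cuda(), "inputs must be CUDA tensors");
  TORCH_CHECK(a.scalar_type() == torch::kFloat32 &&
              b.scalar_type() == torch::kFloat32, "inputs must be float32");
  TORCH_CHECK(a.is_contiguous() && b.is_contiguous(), "inputs must be contiguous");
  TORCH_CHECK(a.numel() == b.numel(), "inputs must have the same size");
  auto c = torch::empty_like(a);
  const int n = static_cast<int>(a.numel());
  if (n == 0) return c;  // a zero-sized grid is an invalid launch
  const int threads = 256;
  const int blocks = (n + threads - 1) / threads;
  // Simplification: launches on the default stream.
  elementwise_add_f32_kernel<<<blocks, threads>>>(
      a.data_ptr<float>(), b.data_ptr<float>(), c.data_ptr<float>(), n);
  return c;
}
"""

# Declaration seen by the C++ side; load_inline generates the binding from it.
CPP_DECL = "torch::Tensor elementwise_add_f32(torch::Tensor a, torch::Tensor b);"


def build_extension():
    """Step 2: Torch Python binding, compiled just-in-time."""
    from torch.utils.cpp_extension import load_inline

    return load_inline(
        name="elementwise_add_ext",  # hypothetical extension name
        cpp_sources=CPP_DECL,
        cuda_sources=CUDA_SRC,
        functions=["elementwise_add_f32"],
        verbose=False,
    )


def run_test(n=1000):
    """Step 3: run tests by comparing against the PyTorch reference result."""
    import torch

    ext = build_extension()
    a = torch.randn(n, device="cuda")
    b = torch.randn(n, device="cuda")
    out = ext.elementwise_add_f32(a, b)
    torch.testing.assert_close(out, a + b)
    return True


if __name__ == "__main__":
    try:
        import torch
        has_cuda = torch.cuda.is_available()
    except ImportError:
        has_cuda = False
    if has_cuda:
        run_test()
        print("elementwise_add_f32 matches the PyTorch reference")
    else:
        print("skipped: PyTorch with CUDA is not available")
```

Checking the kernel against the equivalent PyTorch op keeps the test independent of the kernel's own logic, and the same pattern extends to the other kernels listed above.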
GNU General Public License v3.0
Feel free to 🌟👆🏻star this repo and submit a PR!