[CUDA] dense_tensorcore/batch_matmul_tensorcore support int8/int4 #8402
Conversation
Thanks for your continued contribution on the tensor core schedules! @wyc-ruiker I'll help review when I have time. P.S. Recently I added a new op
Thanks! But in our ViT network, it looks like we had some performance issues before
Thanks! @wyc-ruiker Overall looks great to me.
Just some nit-pick.
Co-authored-by: Chenfan <[email protected]>
Thanks! @wyc-ruiker
Push again to re-trigger the CI? @wyc-ruiker
…ache#8402)
* add int8/int4 tensorcore for dense/batch_matmul
* fix bug
* fix lint
* Apply suggestions from code review
Co-authored-by: Chenfan <[email protected]>
* fix for reviewer
* fix lint
Co-authored-by: Chenfan <[email protected]>
Let dense_tensorcore and batch_matmul_tensorcore support int8/int4.
Before this PR, the vision transformer (ViT) latency (#7814) on a Tesla T4 was:
vit int4: 4.71 ms
vit int8: 3.48 ms
After this PR:
vit int4: 2.93 ms
vit int8: 2.97 ms
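For context on what the int8 tensor core path computes: the hardware multiplies signed 8-bit operands and accumulates into 32-bit integers (s8 × s8 → s32), which is why the schedule must widen the accumulator. A minimal NumPy reference of that arithmetic contract (an illustrative sketch only, not the TVM schedule code from this PR; `int8_gemm_ref` is a hypothetical helper name):

```python
import numpy as np

def int8_gemm_ref(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Reference for an int8 GEMM with int32 accumulation, mirroring
    the s8 x s8 -> s32 contract of the tensor core int8 path.
    Sketch for illustration; not the actual CUDA schedule."""
    assert a.dtype == np.int8 and b.dtype == np.int8
    # Widen to int32 before the multiply-accumulate so partial
    # products cannot overflow the narrow input type.
    return a.astype(np.int32) @ b.astype(np.int32)

# Example: a 16x16x16 tile, the wmma fragment shape for int8.
rng = np.random.default_rng(0)
a = rng.integers(-128, 128, size=(16, 16), dtype=np.int8)
b = rng.integers(-128, 128, size=(16, 16), dtype=np.int8)
c = int8_gemm_ref(a, b)  # dtype is int32
```

The int4 path follows the same pattern with 4-bit operands and a 32-bit accumulator, just with different fragment shapes.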
@jcf94 @jwfromm @huochaitiantang could you help review this PR?