
[CUDA] dense_tensorcore/batch_matmul_tensorcore support int8/int4 #8402

Merged: 13 commits into apache:main, Jul 9, 2021

Conversation

@wyc-ruiker (Contributor) commented Jul 5, 2021

Let dense_tensorcore and batch_matmul_tensorcore support int8/int4.
Before this PR, the vision transformer (ViT) latency (#7814) on a Tesla T4 was:
vit int4: 4.71 ms
vit int8: 3.48 ms
After this PR:
vit int4: 2.93 ms
vit int8: 2.97 ms

@jcf94 @jwfromm @huochaitiantang could you help review this PR?
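For context on what these schedules compute: an int8 tensor-core dense is an int8 × int8 matmul accumulated in int32. A minimal NumPy reference sketch, assuming nn.dense's (out_dim, in_dim) weight layout; the function name and shapes are illustrative, not the TOPI API:

```python
import numpy as np

def dense_int8_ref(data, weight):
    """Reference semantics for int8 dense: out[i, j] = sum_k data[i, k] * weight[j, k].

    Inputs are int8; the accumulation (and output) dtype is int32, matching
    the tensor-core contract. `weight` is (out_dim, in_dim), i.e. pre-transposed.
    """
    assert data.dtype == np.int8 and weight.dtype == np.int8
    return data.astype(np.int32) @ weight.astype(np.int32).T

rng = np.random.default_rng(0)
data = rng.integers(-127, 128, size=(50, 64), dtype=np.int8)    # (batch, in_dim)
weight = rng.integers(-127, 128, size=(32, 64), dtype=np.int8)  # (out_dim, in_dim)
out = dense_int8_ref(data, weight)
assert out.dtype == np.int32 and out.shape == (50, 32)
```

The int4 variant follows the same contract with a narrower input range; NumPy has no int4 dtype, so only the int8 case is sketched here.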

@wyc-ruiker wyc-ruiker changed the title [CUDA] add int8/int4 tensorcore for dense/batch_matmul [CUDA] dense_tensorcore/batch_matmul_tensorcore support int8/int4 Jul 5, 2021
@jcf94 (Contributor) commented Jul 5, 2021

Thanks for your continued contribution to the tensor core schedules! @wyc-ruiker I'll help review when I have time.

p.s. Recently I added a new op, nn.matmul, which extends nn.dense to allow the data and weight tensors to be in either transposed or non-transposed format. For a model from a framework like TensorFlow, TVM inserts an extra transpose for nn.dense, while nn.matmul can avoid it.
I'm not sure, but it may be beneficial to use it if your model suffers a performance hit from the inserted transpose.
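The transpose-elimination point can be illustrated without TVM: nn.dense fixes the weight layout to (out_dim, in_dim), so a TensorFlow-style (in_dim, out_dim) weight must be materialized through an explicit transpose, while a matmul with transpose flags simply reinterprets the operand. A NumPy sketch of the two semantics (function names are illustrative stand-ins for the Relay ops):

```python
import numpy as np

def dense(data, weight):
    # nn.dense-style semantics: weight is (out_dim, in_dim), i.e. pre-transposed
    return data @ weight.T

def matmul(a, b, transpose_a=False, transpose_b=False):
    # nn.matmul-style semantics: either operand may be flagged as transposed
    if transpose_a:
        a = a.T
    if transpose_b:
        b = b.T
    return a @ b

data = np.arange(6.0).reshape(2, 3)    # (batch, in_dim)
w_tf = np.arange(12.0).reshape(3, 4)   # TF-style (in_dim, out_dim)

# With dense, the importer must insert an explicit transpose of the weight:
out_dense = dense(data, w_tf.T)
# With matmul, the layout is carried by a flag and no transpose is materialized:
out_matmul = matmul(data, w_tf, transpose_b=False)
assert np.array_equal(out_dense, out_matmul)
```

In Relay the flag would be a compile-time attribute, so the schedule can read the weight in its stored layout instead of paying for a separate transpose kernel.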

@wyc-ruiker (Contributor, Author) replied:

> Thanks for your continued contribution to the tensor core schedules! @wyc-ruiker I'll help review when I have time.
>
> p.s. Recently I added a new op, nn.matmul, which extends nn.dense to allow the data and weight tensors to be in either transposed or non-transposed format. For a model from a framework like TensorFlow, TVM inserts an extra transpose for nn.dense, while nn.matmul can avoid it.
> I'm not sure, but it may be beneficial to use it if your model suffers a performance hit from the inserted transpose.

  %1552 = reshape(%1551, newshape=[-1, 64, 50]) /* ty=Tensor[(12, 64, 50), float32] */;
  %1553 = transpose(%1552, axes=[0, 2, 1]) /* ty=Tensor[(12, 50, 64), float32] */;
  %1554 = multiply(%1553, 16f /* ty=float32 */) /* ty=Tensor[(12, 50, 64), float32] */;
  %1555 = round(%1554) /* ty=Tensor[(12, 50, 64), float32] */;
  %1556 = clip(%1555, a_min=-127f, a_max=127f) /* ty=Tensor[(12, 50, 64), float32] */;
  %1557 = cast(%1549, dtype="int8") /* ty=Tensor[(12, 50, 64), int8] */;
  %1558 = cast(%1556, dtype="int8") /* ty=Tensor[(12, 50, 64), int8] */;
  %1559 = nn.batch_matmul(%1557, %1558, meta[relay.attrs.BatchMatmulAttrs][61]) /* ty=Tensor[(12, 50, 50), int32] */;

Thanks! But in our ViT network it looks like the performance issues come before nn.batch_matmul. Looking forward to full transpose support for nn.batch_matmul!
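For readers following the IR snippet above: the pattern is a simple symmetric int8 quantization (scale by 16, round, clip to [-127, 127], cast to int8) feeding an int8 batch_matmul that accumulates in int32. A NumPy sketch of the same arithmetic (the scale of 16 is taken from the printed IR; function names are illustrative):

```python
import numpy as np

def quantize_int8(x, scale=16.0):
    # Mirrors the Relay chain above: multiply -> round -> clip -> cast
    return np.clip(np.round(x * scale), -127, 127).astype(np.int8)

def batch_matmul_int8(a, b):
    # nn.batch_matmul-style semantics: b is (batch, N, K), multiplied as a @ b^T,
    # with int32 accumulation as in the int8 tensor-core schedule
    return a.astype(np.int32) @ b.astype(np.int32).transpose(0, 2, 1)

x = np.random.default_rng(0).standard_normal((12, 50, 64)).astype(np.float32)
y = np.random.default_rng(1).standard_normal((12, 50, 64)).astype(np.float32)
out = batch_matmul_int8(quantize_int8(x), quantize_int8(y))
assert out.dtype == np.int32 and out.shape == (12, 50, 50)
```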

@jcf94 jcf94 self-assigned this Jul 7, 2021
@jcf94 (Contributor) left a comment

Thanks! @wyc-ruiker Overall this looks great to me.

Just a few nit-picks.

python/tvm/topi/cuda/batch_matmul_tensorcore.py (outdated, resolved)
python/tvm/topi/cuda/dense_tensorcore.py (outdated, resolved)
python/tvm/topi/cuda/tensorcore_alter_op.py (3 outdated threads, resolved)
@wyc-ruiker wyc-ruiker requested a review from jcf94 July 7, 2021 12:33
@jcf94 (Contributor) left a comment

Thanks! @wyc-ruiker

@jcf94 (Contributor) commented Jul 9, 2021

Push again to re-trigger the CI? @wyc-ruiker

@jcf94 jcf94 merged commit 0fa4396 into apache:main Jul 9, 2021
@wyc-ruiker wyc-ruiker deleted the vit branch July 9, 2021 11:33
ylc pushed a commit to ylc/tvm that referenced this pull request Sep 29, 2021
…ache#8402)

* add int8/int tensorcore for dense/batch_matmul

* fix bug

* fix lint

* Apply suggestions from code review

Co-authored-by: Chenfan <[email protected]>

* fix for reviewer

* fix lint

Co-authored-by: Chenfan <[email protected]>
zxy844288792 pushed a commit to zxy844288792/tvm that referenced this pull request Mar 4, 2022