Faster sparse_dense on GPUs #6580

tkonolige · 2020-09-28T19:02:11Z

I've written a faster `sparse_dense` for GPUs using TIR. It requires a padded matrix, so I've added a new op, `sparse_dense_padded`. AlterOpLayout should transform `sparse_dense` to `sparse_dense_padded` when possible on the GPU.
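To make the padding requirement concrete, here is a minimal sketch in plain Python. It assumes that "padded" means each row of the sparse matrix is extended with explicit zeros until its stored length is a multiple of some fixed size; the description above does not spell out the exact scheme, and the real op works on TVM tensors rather than Python lists. The names `pad_csr_rows` and `csr_matvec` are hypothetical helpers for illustration, not TVM APIs.

```python
def pad_csr_rows(data, indices, indptr, multiple):
    """Pad every CSR row with explicit zeros so its length is a multiple of `multiple`.

    Assumption: this is one plausible reading of "padded matrix", not the PR's code.
    """
    new_data, new_indices, new_indptr = [], [], [0]
    for row in range(len(indptr) - 1):
        start, end = indptr[row], indptr[row + 1]
        row_data = list(data[start:end])
        row_cols = list(indices[start:end])
        pad = (-len(row_data)) % multiple
        # Explicit zeros do not change the product; column 0 is an arbitrary valid index.
        row_data += [0.0] * pad
        row_cols += [0] * pad
        new_data += row_data
        new_indices += row_cols
        new_indptr.append(len(new_data))
    return new_data, new_indices, new_indptr


def csr_matvec(data, indices, indptr, x):
    """Reference CSR matrix-vector product, used to check that padding is harmless."""
    return [
        sum(data[k] * x[indices[k]] for k in range(indptr[r], indptr[r + 1]))
        for r in range(len(indptr) - 1)
    ]


# A 3x4 matrix with rows of 3, 0 and 1 stored entries.
data, indices, indptr = [1.0, 2.0, 3.0, 4.0], [0, 2, 3, 1], [0, 3, 3, 4]
padded = pad_csr_rows(data, indices, indptr, multiple=2)
x = [1.0, 1.0, 1.0, 1.0]
print(padded[2])                                  # row pointers after padding
print(csr_matvec(*padded, x) == csr_matvec(data, indices, indptr, x))
```

Uniform row lengths of this kind let every group of GPU threads process the same amount of work per row, which is presumably why a dedicated padded op is useful.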

This new `sparse_dense` improves PruneBERT performance from a mean of 155.41 ms to a mean of 7.75 ms. In general, this implementation is faster than cuBLAS dense on matrices with density < 0.05 and is often faster than cuSPARSE for machine learning workloads.
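As a rough illustration of the density figure quoted above, the following sketch computes the fraction of non-zero entries in a matrix and compares it with the 0.05 threshold. The helper names `density` and `likely_benefits_from_sparse` are hypothetical, and the threshold is only the rule of thumb from this description, not a dispatch rule that TVM applies.

```python
def density(matrix):
    """Fraction of entries that are non-zero in a list-of-lists matrix."""
    total = sum(len(row) for row in matrix)
    nonzero = sum(1 for row in matrix for value in row if value != 0)
    return nonzero / total if total else 0.0


def likely_benefits_from_sparse(matrix, threshold=0.05):
    """Hypothetical heuristic: True when density is below the quoted threshold."""
    return density(matrix) < threshold


# A 10x10 matrix with 4 non-zero entries has density 0.04.
weights = [[0.0] * 10 for _ in range(10)]
for i in range(4):
    weights[i][i] = 1.0
print(density(weights), likely_benefits_from_sparse(weights))
```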

ANSHUMAN87 · 2020-09-29T19:21:33Z

@tkonolige: Thanks for the PR! The data looks quite impressive 👍
I was wondering whether we could add some sort of benchmark test case here, tuned to the data you shared?

tkonolige · 2020-09-29T22:24:18Z

@ANSHUMAN87 Right now TVM does not do any testing for performance regressions. The hard part in setting up performance testing is that it varies from run to run and machine to machine.

ANSHUMAN87 · 2020-09-30T05:46:11Z

@tkonolige: I understand your concern. It was just a thought. Even with run-to-run or machine-to-machine differences, the relative reference would be the same. But maybe we don't have to do it as part of this PR :)
I will go through your PR in depth and share my comments, if any. Thanks!
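The "relative reference" idea can be sketched as follows: instead of asserting on absolute timings, which vary by machine, a check compares the candidate against a baseline measured on the same machine in the same run. This is a minimal sketch using only the Python standard library; `median_time`, `relative_speedup`, and `no_regression` are hypothetical names, and nothing like this exists in TVM's test suite according to the discussion above.

```python
import statistics
import timeit


def median_time(fn, repeat=5, number=10):
    """Median per-call time of `fn` in seconds, measured with timeit."""
    return statistics.median(timeit.repeat(fn, repeat=repeat, number=number)) / number


def relative_speedup(baseline_times, candidate_times):
    """Ratio of median baseline time to median candidate time (same units)."""
    return statistics.median(baseline_times) / statistics.median(candidate_times)


def no_regression(baseline_times, candidate_times, min_speedup=1.0):
    """True when the candidate is at least `min_speedup` times as fast as the baseline."""
    return relative_speedup(baseline_times, candidate_times) >= min_speedup


# Using the mean timings reported in this PR (in milliseconds) as single samples.
print(relative_speedup([155.41], [7.75]))
print(no_regression([155.41], [7.75], min_speedup=10.0))
```

A ratio is still noisy, so a real check would need a generous `min_speedup` margin and several repeats; that noise is the difficulty raised in the reply above.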

ghost · 2020-09-30T06:28:16Z

Have you considered your syntax errors?

tkonolige · 2020-09-30T16:09:57Z

@vinx13 @antinucleon @Laurawly @jwfromm @ajtulloch I think this is ready for review.

ANSHUMAN87

Thanks @tkonolige!

electriclilies

LGTM!

merrymercy · 2020-10-10T17:08:19Z

@tkonolige @tqchen This commit fails on the master branch. See the CI: https://github.com/apache/incubator-tvm/commits/master

It introduces a flaky test that blocks two of my PRs.

ANSHUMAN87 · 2020-10-11T05:18:52Z

I think #6658 has resolved the issue.

tkonolige · 2020-10-11T06:43:26Z

@merrymercy I think that was the diagnostics

* Faster sparse_dense on GPUs. This new sparse_dense requires a padded matrix, so a new op `sparse_dense_padded` has been added. AlterOpLayout should transform `sparse_dense` to `sparse_dense_padded` when possible on the gpu.
* formatting
* more formatting
* Check that alteroplayout is defined before using it
* check if FTVMAlterOpLayout exists before using it
* formatting
* restore message passing
* Fix sparse_dense and sparse_dense_padded docs
* Fix old sparse_dense, autotvm and sparse_dense don't play well together
* Remove unused imports
* clarify warp count in cuda_transpose
* Document multidimensional access
* Warn users not to use sparse_dense_padded
* rename nn.sparse_dense_padded to nn.internal.sparse_dense_padded

tkonolige changed the title ~~Faster sparse_dense on GPUs.~~ Faster sparse_dense on GPUs Sep 28, 2020

tkonolige added 4 commits September 29, 2020 11:13

Faster sparse_dense on GPUs.

02a5bed

This new sparse_dense requires a padded matrix, so a new op `sparse_dense_padded` has been added. AlterOpLayout should transform `sparse_dense` to `sparse_dense_padded` when possible on the gpu.

formatting

e9ccbb7

more formatting

b4df663

Check that alteroplayout is defined before using it

b563b1c

tkonolige force-pushed the faster_sparse_dense branch from 74cf118 to b563b1c September 29, 2020 17:13

tkonolige added 2 commits September 29, 2020 12:08

check if FTVMAlterOpLayout exists before using it

13610bc

formatting

0f012d1

restore message passing

7bb3343

vinx13 reviewed Oct 1, 2020

tests/python/topi/python/test_topi_sparse.py

src/relay/op/nn/sparse.cc (outdated)

tkonolige added 3 commits October 1, 2020 10:14

Fix sparse_dense and sparse_dense_padded docs

5c2989c

Fix old sparse_dense, autotvm and sparse_dense don't play well together

cb8068a

Remove unused imports

144bb56

Laurawly reviewed Oct 2, 2020

python/tvm/topi/cuda/sparse.py

clarify warp count in cuda_transpose

e312cce

ANSHUMAN87 reviewed Oct 4, 2020

python/tvm/tir/ir_builder.py

ANSHUMAN87 reviewed Oct 4, 2020

python/tvm/tir/ir_builder.py

ANSHUMAN87 reviewed Oct 8, 2020

python/tvm/tir/ir_builder.py (outdated)

ANSHUMAN87 reviewed Oct 8, 2020

python/tvm/topi/nn/sparse.py

ANSHUMAN87 reviewed Oct 8, 2020

src/relay/op/nn/sparse.cc (outdated)

tkonolige added 3 commits October 8, 2020 10:29

Document multidimensional access

1a051c7

Warn users not to use sparse_dense_padded

f02df33

rename nn.sparse_dense_padded to nn.internal.sparse_dense_padded

fda4e49

ANSHUMAN87 approved these changes Oct 9, 2020

electriclilies approved these changes Oct 9, 2020

jroesch approved these changes Oct 9, 2020

jroesch merged commit 6d0351a into apache:master Oct 9, 2020

junrushao mentioned this pull request Nov 1, 2021

Apache TVM v0.8 Release Note Candidate #9416

Closed
