[Quant Tool] Update QDQ Pad, Slice, Softmax #22676

adrianlizarraga · 2024-10-31T13:26:51Z

### Description

Updates the Python quantization tool:

- Ensures QDQ Pad has equal quantization parameters across input and output for certain Pad configurations.
- Ensures QDQ Slice always has equal quantization parameters across input and output.
- Fixes a bug when Softmax is _excluded_ from quantization.
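For readers unfamiliar with how an operator gets excluded, here is a minimal sketch of the two usual routes in `quantize_static`: leaving the op type out of `op_types_to_quantize`, or naming specific nodes in `nodes_to_exclude`. The helper names, the op-type list, and the node name `softmax_0` are hypothetical and only for illustration; the PR does not show the exact configuration that triggered the bug.

```python
def softmax_exclusion_kwargs(all_op_types, softmax_node_names=()):
    """Build keyword arguments that leave Softmax out of quantization.

    Either entry is enough on its own; both are shown for illustration.
    """
    return {
        # Quantize every listed op type except Softmax.
        "op_types_to_quantize": [t for t in all_op_types if t != "Softmax"],
        # Alternatively, exclude individual nodes by name (hypothetical names).
        "nodes_to_exclude": list(softmax_node_names),
    }


def quantize_without_softmax(model_in, model_out, data_reader, kwargs):
    """Run static QDQ quantization with the exclusion arguments applied."""
    # Imported lazily so the helper above works without onnxruntime installed.
    from onnxruntime.quantization import QuantFormat, quantize_static

    quantize_static(
        model_in,
        model_out,
        data_reader,
        quant_format=QuantFormat.QDQ,
        **kwargs,
    )


kwargs = softmax_exclusion_kwargs(["Conv", "Pad", "Slice", "Softmax"], ["softmax_0"])
print(kwargs)
```

`quantize_without_softmax` is not called here because it needs a real model and a calibration data reader.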

### Motivation and Context

QDQ Pad and Slice have lower latency on QNN EP when their quantization parameters are equal.
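To illustrate why equal parameters matter for Slice, the sketch below compares DQ -> float Slice -> Q against slicing the quantized integers directly. It assumes plain per-tensor affine uint8 quantization; the scale, zero point, and data are made up, and nothing here is taken from the QNN EP implementation.

```python
def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Affine-quantize a float: q = clamp(round(x / scale) + zero_point)."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map a quantized integer back to a float."""
    return (q - zero_point) * scale


scale, zero_point = 0.05, 128          # illustrative input parameters
q_input = [0, 17, 128, 200, 255, 64]   # illustrative uint8 data

# Path A: DQ -> float Slice -> Q, with the SAME parameters on both sides.
floats = [dequantize(q, scale, zero_point) for q in q_input]
path_a = [quantize(x, scale, zero_point) for x in floats[1:5]]

# Path B: slice the integers directly, with no float round trip.
path_b = q_input[1:5]

# Path C: same as path A, but the output uses a DIFFERENT scale.
path_c = [quantize(x, 0.1, zero_point) for x in floats[1:5]]

print(path_a == path_b)  # equal parameters: the integer slice is equivalent
print(path_c == path_b)  # unequal parameters: a requantization step is needed
```

With equal parameters the Q/DQ pair around Slice is an identity on the integer data, so a backend is free to operate on the integers alone; with different parameters it cannot skip the requantization.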

yihonglyu · 2024-11-04T21:39:59Z

> QDQ Pad and Slice have lower latency on QNN EP when their quantization parameters are equal.

Does QNN EP have lower latency when the quantization parameters are equal because it can fuse DQ -> FP32 Pad -> Q into a single INT8 Pad?

adrianlizarraga · 2024-11-05T01:00:12Z

> Does QNN EP have lower latency when the quantization parameters are equal because it can fuse DQ -> FP32 Pad -> Q into a single INT8 Pad?

I imagine this is the case, although I don't know for sure. We only know this from inference latency measurements.
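As a sketch of the equivalence being discussed (not a confirmed description of what QNN EP does), the example shows that DQ -> float Pad -> Q with shared parameters gives the same integers as padding the quantized tensor with a pre-quantized fill value. It also shows one plausible reason the guarantee only holds for certain Pad configurations: a constant pad value outside the input's representable range gets clamped. The 1-D `pad_constant` helper and all numbers are illustrative assumptions.

```python
def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Affine-quantize a float: q = clamp(round(x / scale) + zero_point)."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map a quantized integer back to a float."""
    return (q - zero_point) * scale


def pad_constant(values, before, after, fill):
    """1-D constant-mode padding (hypothetical helper for illustration)."""
    return [fill] * before + list(values) + [fill] * after


scale, zero_point = 0.02, 0    # illustrative shared input/output parameters
q_input = [3, 50, 255, 120]    # illustrative uint8 data
pad_value = 0.0                # float constant, inside the representable range

# Reference path: DQ -> float Pad -> Q, same parameters on both sides.
floats = [dequantize(q, scale, zero_point) for q in q_input]
reference = [quantize(x, scale, zero_point)
             for x in pad_constant(floats, 2, 1, pad_value)]

# Integer path: quantize the fill value once, then pad the integers.
q_fill = quantize(pad_value, scale, zero_point)
fused = pad_constant(q_input, 2, 1, q_fill)

print(reference == fused)

# A fill value outside the representable range [0.0, 5.1] is clamped, so
# sharing the input's parameters would change it.
clamped = dequantize(quantize(-1.0, scale, zero_point), scale, zero_point)
print(clamped)
```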

[Quant tool] Update QDQ Pad and QDQ Slice to quantize output the same as the input. Fix bug when softmax is excluded from QDQ quantization

ad8d487

adrianlizarraga marked this pull request as ready for review October 31, 2024 17:51

adrianlizarraga requested review from yihonglyu and jywu-msft October 31, 2024 18:13

Merge branch 'main' into adrianl/quant-tool-slice-pad-softmax-updates

5434a84

adrianlizarraga requested review from xadupre and fajin-corp November 4, 2024 17:52

yihonglyu reviewed Nov 4, 2024

onnxruntime/python/tools/quantization/base_quantizer.py (resolved)

yihonglyu reviewed Nov 4, 2024

onnxruntime/test/python/quantization/test_op_pad.py (outdated, resolved)

adrianlizarraga added 2 commits November 4, 2024 16:33

Properly test and handle Pad opset 2

4e24125

Add unittest for softmax bug fix (when softmax is excluded)

72766cd

sophies927 added the release:1.20.1 and triage:approved (approved for cherrypicks for release) labels Nov 5, 2024

yihonglyu reviewed Nov 5, 2024

onnxruntime/test/python/quantization/test_op_pad.py (outdated, resolved)

yihonglyu reviewed Nov 5, 2024

onnxruntime/test/python/quantization/test_op_pad.py (outdated, resolved)

adrianlizarraga added 2 commits November 5, 2024 13:42

Add float16 testing for pad unit test

c63df67

Remove unnecessary extra option

343b0da

adrianlizarraga commented Nov 5, 2024

onnxruntime/test/python/quantization/test_op_slice.py (resolved)

yihonglyu reviewed Nov 5, 2024

onnxruntime/python/tools/quantization/operators/pad.py (resolved)

yihonglyu reviewed Nov 5, 2024

onnxruntime/test/python/quantization/test_op_pad.py (outdated, resolved)

Refactor common unittest utility function

6016c95

adrianlizarraga commented Nov 5, 2024

onnxruntime/test/python/quantization/op_test_utils.py (outdated, resolved)

Update onnxruntime/test/python/quantization/op_test_utils.py

8d1562a

yihonglyu reviewed Nov 6, 2024

onnxruntime/test/python/quantization/test_op_pad.py (outdated, resolved)

Address comment for unittest

9b26a07

yihonglyu approved these changes Nov 6, 2024

Merge branch 'main' into adrianl/quant-tool-slice-pad-softmax-updates

055aa86

adrianlizarraga merged commit aa0cf1c into main Nov 6, 2024
89 of 91 checks passed

adrianlizarraga deleted the adrianl/quant-tool-slice-pad-softmax-updates branch November 6, 2024 22:06

sophies927 added the cherry-picked (cherry-picked for a cherrypicks branch) label Nov 18, 2024
