[Quant Tool] Update QDQ Pad, Slice, Softmax #22676

Merged
merged 10 commits into main on Nov 6, 2024

adrianlizarraga
Contributor

Description

Updates the Python quantization tool:

  • Ensures QDQ Pad has equal quantization parameters across input and output for certain Pad configurations.
  • Ensures QDQ Slice always has equal quantization parameters across input and output.
  • Fixes a bug that occurs when Softmax is excluded from quantization.

Motivation and Context

QDQ Pad and Slice have lower latency on QNN EP when their quantization parameters are equal.
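
To illustrate what "equal quantization parameters across input and output" buys for Slice, here is a minimal pure-Python sketch. It is not code from this PR or from the quantization tool; the helper names (`quantize`, `dequantize`, `slice_1d`) and the scale/zero-point values are made up for illustration. It only shows that when the output reuses the input's scale and zero point, DequantizeLinear -> Slice -> QuantizeLinear yields the same integers as slicing the quantized data directly.

```python
def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """uint8-style affine quantization: q = clamp(round(x / scale) + zero_point)."""
    return [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Inverse mapping: x = (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in q_values]


def slice_1d(values, start, end):
    """Stand-in for a 1-D Slice: it only selects elements, it never changes them."""
    return values[start:end]


# Hypothetical quantization parameters and quantized input data.
scale, zero_point = 0.05, 128
q_input = [0, 37, 128, 200, 255]

# QDQ form with the SAME parameters on input and output.
qdq_result = quantize(
    slice_1d(dequantize(q_input, scale, zero_point), 1, 4), scale, zero_point
)

# Slicing the integers directly, with no float round trip.
int_result = slice_1d(q_input, 1, 4)
assert qdq_result == int_result

# With DIFFERENT output parameters the integers must be recomputed,
# so a plain integer Slice would no longer be equivalent.
mismatched = quantize(
    slice_1d(dequantize(q_input, scale, zero_point), 1, 4), 0.1, zero_point
)
assert mismatched != int_result
```

Whether a given execution provider actually exploits this equivalence is a separate question; the sketch only shows that the equivalence holds when the parameters match.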

… as the input. Fix bug when softmax is excluded from QDQ quantization
@adrianlizarraga adrianlizarraga marked this pull request as ready for review October 31, 2024 17:51
@yihonglyu
Contributor

yihonglyu commented Nov 4, 2024

Does QNN EP have lower latency when the quantization parameters are equal because it can fuse DQ -> FP32 Pad -> Q into an INT8 Pad?

@adrianlizarraga
Contributor Author

adrianlizarraga commented Nov 5, 2024

Does QNN EP have lower latency when the quantization parameters are equal because it can fuse DQ -> FP32 Pad -> Q into an INT8 Pad?

I imagine this is the case, although I don't know for sure. We only know this from inference latency measurements.
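
The fusion hypothesis in the question can be sketched in plain Python. This is an illustration under assumptions, not a description of what QNN EP does (the thread itself leaves that unconfirmed) and not the rule the quantization tool uses to pick Pad configurations. The helpers `pad_edge` and `pad_constant` and all numeric values are hypothetical. The sketch shows that, with shared input/output parameters, an edge-mode Pad on integers matches the DQ -> Pad -> Q result, and a constant-mode Pad matches if the pad constant is quantized with the same parameters.

```python
def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """uint8-style affine quantization: q = clamp(round(x / scale) + zero_point)."""
    return [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Inverse mapping: x = (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in q_values]


def pad_edge(values, before, after):
    """1-D edge padding: repeats the first and last elements."""
    return [values[0]] * before + list(values) + [values[-1]] * after


def pad_constant(values, before, after, constant):
    """1-D constant padding with a caller-supplied pad value."""
    return [constant] * before + list(values) + [constant] * after


# Hypothetical shared quantization parameters and quantized input.
scale, zero_point = 0.02, 10
q_input = [3, 10, 250]

# Edge mode only copies existing elements, so integer padding is exact.
edge_qdq = quantize(
    pad_edge(dequantize(q_input, scale, zero_point), 2, 1), scale, zero_point
)
edge_int = pad_edge(q_input, 2, 1)
assert edge_qdq == edge_int

# Constant mode introduces a new value; in the integer domain the pad
# constant has to be quantized too (0.0 maps to the zero point).
q_constant = quantize([0.0], scale, zero_point)[0]
const_qdq = quantize(
    pad_constant(dequantize(q_input, scale, zero_point), 1, 1, 0.0),
    scale,
    zero_point,
)
const_int = pad_constant(q_input, 1, 1, q_constant)
assert const_qdq == const_int
```

The constant-mode case hints at why only some Pad configurations might qualify: the pad value must be representable under the input's quantization parameters for the integer-only form to be equivalent.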

@sophies927 sophies927 added the release:1.20.1 and triage:approved (Approved for cherrypicks for release) labels Nov 5, 2024
@adrianlizarraga adrianlizarraga merged commit aa0cf1c into main Nov 6, 2024
89 of 91 checks passed
@adrianlizarraga adrianlizarraga deleted the adrianl/quant-tool-slice-pad-softmax-updates branch November 6, 2024 22:06
adrianlizarraga added a commit that referenced this pull request Nov 6, 2024
yf711 pushed a commit that referenced this pull request Nov 11, 2024
@sophies927 sophies927 added the cherry-picked (Cherry-picked for a cherrypicks branch) label Nov 18, 2024
ishwar-raut1 pushed a commit to ishwar-raut1/onnxruntime that referenced this pull request Nov 19, 2024
Labels
cherry-picked (Cherry-picked for a cherrypicks branch), release:1.20.1, triage:approved (Approved for cherrypicks for release)
3 participants