[Quant Tool] Update QDQ Pad, Slice, Softmax #22676
… as the input. Fix bug when softmax is excluded from QDQ quantization
Is the reason that QNN EP has lower latency when the quantization parameters are equal because it can fuse
I imagine this is the case, although I don't know for sure. We only know this from inference latency measurements.
### Description
Updates python quantization tool:
- Ensures QDQ Pad has equal quantization parameters across input and output for certain Pad configurations.
- Ensures QDQ Slice always has equal quantization parameters across input and output.
- Fixes bug when Softmax is _excluded_ from quantization.

### Motivation and Context
QDQ Pad and Slice have lower latency on QNN EP when their quantization parameters are equal.
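To illustrate why equal quantization parameters matter for data-movement ops, here is a minimal NumPy sketch (not the quantization tool's code, and not a statement about what QNN EP does internally). It assumes uint8 affine quantization with a hypothetical `scale` and `zero_point`, and uses edge-mode padding as one illustrative Pad configuration; which Pad configurations the tool actually covers is not specified here. When input and output share the same parameters, the DequantizeLinear -> op -> QuantizeLinear sequence gives the same result as applying the op directly to the quantized tensor.

```python
import numpy as np

def quantize(x, scale, zero_point):
    """Affine-quantize a float array to uint8 (round, shift, saturate)."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

def dequantize(q, scale, zero_point):
    """Map uint8 values back to float32."""
    return (q.astype(np.float32) - zero_point) * scale

# Hypothetical quantization parameters shared by the op's input and output.
scale, zero_point = 0.05, 128

rng = np.random.default_rng(0)
q_in = rng.integers(0, 256, size=(4, 6), dtype=np.uint8)

# QDQ Slice: dequantize, slice in float, requantize with the SAME parameters.
slice_qdq = quantize(dequantize(q_in, scale, zero_point)[1:3, 2:5], scale, zero_point)
# Slice applied directly to the quantized data.
slice_direct = q_in[1:3, 2:5]
slice_matches = np.array_equal(slice_qdq, slice_direct)

# QDQ Pad in edge mode (an assumed example configuration): padded values are
# copies of existing elements, so the same equivalence holds.
pad_qdq = quantize(np.pad(dequantize(q_in, scale, zero_point), 1, mode="edge"), scale, zero_point)
pad_direct = np.pad(q_in, 1, mode="edge")
pad_matches = np.array_equal(pad_qdq, pad_direct)

print(slice_matches, pad_matches)
```

If the output used a different scale or zero point, the requantization step would change the values and the op could no longer be treated as a pure copy of quantized data.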