
[Quant Tool] Introduce get_qdq_config() helper to get QDQ configurations #22677

Merged: 7 commits merged into main from adrianl/quant-tool-get-init-qdq-config on Nov 6, 2024

Conversation

@adrianlizarraga (Contributor) commented Oct 31, 2024

Description

Introduces the `get_qdq_config()` function to get a quantization configuration for a full integer QDQ model. This function provides an easier way of specifying commonly used options and sets convenient defaults. Specifically:

  • Instead of requiring the user to pass a dictionary of `extra_options`, the new interface adds function parameters for common settings:
    • All calibrator settings
    • Whether activations/weights are symmetric
    • Whether to keep Relu/Clip or fuse them into Q
    • Minimum real range for quantization
    • Dictionary of tensor quantization overrides (see the sketch after the example below).
  • Automatically scans the input floating-point model and fills out the operator types to quantize. Otherwise, only a limited set of operator types would be quantized by default.
  • Detects whether the input model uses external data. If so, ensures that the generated QDQ model also uses external data.
  • Detects whether the model will use newly introduced quantization types (int4/int16) with an older opset. If so, forces the use of the `com.microsoft` domain for Q/DQ ops, which support all types.
  • Automatically enables the `ForceQuantizeNoInputCheck` extra option to ensure data movement operators (e.g., Transpose) are always quantized.
  • The user can pass a function to indicate which nodes to exclude from quantization (also shown in the sketch below).
  • The user can still pass their own `extra_options` to override any of the above if necessary.

```python
from onnxruntime.quantization import CalibrationMethod, QuantType, get_qdq_config, quantize

# Get QDQ configuration
qdq_config = get_qdq_config(
    float_model,
    data_reader,
    calibrate_method=CalibrationMethod.Percentile,
    calibrate_args={"percentile": 99.98},  # Converted to extra_options
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QInt8,
    per_channel=True,
    nodes_to_exclude=["Mul"],  # Could also be a function. Ex: `lambda model, node: node.op_type == "Softmax"`

    # Other options converted to extra_options:
    min_real_range=0.0001,
    keep_removable_activations=True,
    activation_symmetric=True,
    weight_symmetric=True,
)

# Quantize model
quantize(float_model_path, qdq_model_path, qdq_config)
```
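
As a minimal sketch of the callable `nodes_to_exclude` form and the tensor-override dictionary mentioned above: the `tensor_quant_overrides` parameter name, the override keys, and the tensor name below are assumptions for illustration, not taken from this PR's diff.

```python
from onnxruntime.quantization import QuantType, get_qdq_config, quantize

# Predicate form of nodes_to_exclude: receives the model and a node and
# returns True for nodes that should be left unquantized.
def exclude_node(model, node):
    return node.op_type == "Softmax"

qdq_config = get_qdq_config(
    float_model,
    data_reader,
    nodes_to_exclude=exclude_node,
    # Per-tensor quantization overrides, keyed by tensor name.
    # "layer1.weight" is a hypothetical tensor name; the parameter name
    # and the "quant_type" key are assumptions (see lead-in above).
    tensor_quant_overrides={
        "layer1.weight": [{"quant_type": QuantType.QInt16}],
    },
)

quantize(float_model_path, qdq_model_path, qdq_config)
```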

Motivation and Context

Need a version of `get_qnn_qdq_config()` that is not EP-specific.

@adrianlizarraga marked this pull request as ready for review October 31, 2024 18:10
@sophies927 added the `triage:approved` (Approved for cherrypicks for release) and `release:1.20.1` labels Nov 5, 2024
@fajin-corp (Contributor) previously approved these changes Nov 5, 2024 and left a comment:

:shipit:

@adrianlizarraga changed the title from "[Quant Tool] Introduce get_int_qdq_config() helper to get QDQ configurations" to "[Quant Tool] Introduce get_qdq_config() helper to get QDQ configurations" Nov 5, 2024
@adrianlizarraga merged commit 2c1b17c into main Nov 6, 2024 (91 checks passed)
@adrianlizarraga deleted the adrianl/quant-tool-get-init-qdq-config branch November 6, 2024 18:27
adrianlizarraga added a commit that referenced this pull request Nov 6, 2024
…ons (#22677)

yf711 pushed a commit that referenced this pull request Nov 11, 2024
…ons (#22677)

@sophies927 added the `cherry-picked` (Cherry-picked for a cherrypicks branch) label Nov 18, 2024
ishwar-raut1 pushed a commit to ishwar-raut1/onnxruntime that referenced this pull request Nov 19, 2024
…ons (microsoft#22677)

Labels
`cherry-picked` (Cherry-picked for a cherrypicks branch), `release:1.20.1`, `triage:approved` (Approved for cherrypicks for release)