Fix f8f8bf16_lite quantize op input in `quantize_and_compute` #3667

YUNQIUGUO · 2025-02-07T17:23:58Z

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/745

A minor fix for the trt-llm cudaCoreGemm `cuda_lite` op in the quantize_bench script.

Testing with `--bench_quantize` failed in `quantize_and_compute` because of the arguments passed to `compute`:

...
tree/deeplearning/fbgemm/fbgemm_gpu/experimental/gen_ai/bench/quantize_ops.py", line 797, in quantize_and_compute
    return self.compute(xq, wq, x_scale * w_scale)
TypeError: FP8LiteGemm.compute() missing 1 required positional argument: 'w_scale'
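
The mismatch in the traceback can be reproduced in isolation. What follows is a minimal sketch, not the code from this PR: `BaseOp`, `LiteOp`, `FixedLiteOp` and the toy `quantize` are hypothetical stand-ins, and the fix shown (passing the two scales separately) is an assumption inferred from the error message rather than taken from the diff.

```python
class BaseOp:
    """Hypothetical benchmark op whose compute() takes one fused scale."""

    def quantize(self, x, w):
        # Toy "quantization": divide by a fixed scale and return the scale.
        x_scale, w_scale = 2.0, 4.0
        return x / x_scale, w / w_scale, x_scale, w_scale

    def compute(self, xq, wq, scale):
        return xq * wq * scale

    def quantize_and_compute(self, x, w):
        xq, wq, x_scale, w_scale = self.quantize(x, w)
        # Passes three arguments: fine for BaseOp.compute().
        return self.compute(xq, wq, x_scale * w_scale)


class LiteOp(BaseOp):
    """Hypothetical op whose compute() takes the two scales separately."""

    def compute(self, xq, wq, x_scale, w_scale):
        return xq * wq * (x_scale * w_scale)


class FixedLiteOp(LiteOp):
    """Assumed shape of the fix: match the call to compute()'s signature."""

    def quantize_and_compute(self, x, w):
        xq, wq, x_scale, w_scale = self.quantize(x, w)
        return self.compute(xq, wq, x_scale, w_scale)


if __name__ == "__main__":
    try:
        LiteOp().quantize_and_compute(2.0, 3.0)
    except TypeError as err:
        # Same class of failure as in the traceback above.
        print("inherited call fails:", err)
    print("fixed call returns:", FixedLiteOp().quantize_and_compute(2.0, 3.0))
```

In this sketch the subclass changes the arity of `compute` but inherits a `quantize_and_compute` written for the three-argument form, so only the combined path breaks while calling `quantize` and `compute` separately keeps working.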

Reviewed By: jwfromm

Differential Revision: D69272912

facebook-github-bot · 2025-02-07T17:24:11Z

This pull request was exported from Phabricator. Differential Revision: D69272912

netlify · 2025-02-07T17:24:23Z

✅ Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | `f2f7712` |
| 🔍 Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/67a64abab7f4f40009dd7c1a |
| 😎 Deploy Preview | https://deploy-preview-3667--pytorch-fbgemm-docs.netlify.app |


facebook-github-bot · 2025-02-07T18:02:28Z

This pull request was exported from Phabricator. Differential Revision: D69272912

facebook-github-bot · 2025-02-07T18:02:43Z

This pull request was exported from Phabricator. Differential Revision: D69272912

facebook-github-bot · 2025-02-07T21:56:00Z

This pull request has been merged in a914871.

facebook-github-bot added the cla signed label Feb 7, 2025

facebook-github-bot added the fb-exported label Feb 7, 2025

YUNQIUGUO added 3 commits February 7, 2025 10:01

add mixed precision fp8 fast_gemv_quantized kernel

5ec88a1

Summary: heuristics tuning result for fp8 mixed precision fast_gemv: P1725480254 achieved perf numbers on par/almost identical with `cuda_lite_fp8` (trt-llm gemv kernel). Differential Revision: D69128901

add fp8fp8 fast_gemv_quantized

8ce975e

Differential Revision: D69213404

YUNQIUGUO force-pushed the export-D69272912 branch from f75ca84 to 38b9679 Compare February 7, 2025 18:02

YUNQIUGUO force-pushed the export-D69272912 branch from 38b9679 to f2f7712 Compare February 7, 2025 18:02

facebook-github-bot closed this in a914871 Feb 7, 2025

facebook-github-bot added the Merged label Feb 7, 2025

q10 added category:fix feature:fp8 labels Feb 8, 2025
