Support INT4 Dequant onto GPU for Seq INT TBE look up #3584
Conversation
This pull request was exported from Phabricator. Differential Revision: D68187234
Summary:
X-link: facebookresearch/FBGEMM#670

Seq INT4 -> INT4 STBE lookup is supported in the diff stack: https://www.internalfb.com/diff/D61305978.

This diff supports:
1. Dequantization of the INT4 -> INT4 STBE lookup on CUDA for all float output types.
2. Extending dequantization of the INT4 -> INT4 STBE lookup on CPU to BF16.

The main gap is handling dequantization when the scale and bias of the INT4 quantized tensor are stored at the front of the row. On CPU, we only need to add BF16 dequantization based on the output dtype.

This will let us reduce both the network overhead to the remote embedding server and the D2H data transfer onto the GPU host.

Differential Revision: D68187234
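As context for the layout gap described above, here is a minimal, hypothetical PyTorch sketch of row-wise INT4 dequantization where the FP16 scale and bias sit at the front of each packed row rather than at the end. The function name and exact byte layout are illustrative assumptions, not the actual FBGEMM kernel.

```python
import torch

def dequantize_int4_scale_bias_front(packed: torch.Tensor,
                                     num_cols: int,
                                     out_dtype: torch.dtype = torch.float32) -> torch.Tensor:
    # Hypothetical layout: each uint8 row is
    #   [ fp16 scale (2 bytes) | fp16 bias (2 bytes) | num_cols INT4 values, 2 per byte ]
    # i.e. the scale/bias header is at the FRONT of the row.
    num_rows = packed.shape[0]

    # Reinterpret the leading 4 bytes of every row as two fp16 values.
    header = packed[:, :4].contiguous().view(torch.half)  # shape [num_rows, 2]
    scale = header[:, 0].to(out_dtype).unsqueeze(1)
    bias = header[:, 1].to(out_dtype).unsqueeze(1)

    # Unpack two 4-bit values from each payload byte.
    body = packed[:, 4:]
    low = (body & 0x0F).to(out_dtype)
    high = (body >> 4).to(out_dtype)
    vals = torch.stack([low, high], dim=2).reshape(num_rows, -1)[:, :num_cols]

    return vals * scale + bias

# The same routine covers the CPU/BF16 case by choosing the output dtype, e.g.:
# dequantize_int4_scale_bias_front(packed_rows, num_cols=128, out_dtype=torch.bfloat16)
```

One practical consequence of the front layout is that the INT4 payload starts at a fixed byte offset regardless of row width, which a dequant kernel must account for when indexing into the row.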
Force-pushed from 004b36b to 867b7f7.
Force-pushed from 867b7f7 to cd85b52.
Force-pushed from cd85b52 to 843bddb.
Force-pushed from 843bddb to 7b7f96d.
Force-pushed from 7b7f96d to 74e5a0f.
Force-pushed from 74e5a0f to 553548f.
This pull request has been merged in 4c9da5d.