Tensor Parallelism Support for AffineQuantizedTensor #988

Open · 3 of 6 tasks
jerryzh168 opened this issue Oct 1, 2024 · 3 comments · Fixed by #1003
Labels: good first issue (Good for newcomers)

Comments

@jerryzh168 (Contributor) commented Oct 1, 2024

Recently we landed #939 to support tensor parallelism for int8 weight-only quantization (another example: #785).

Now we can support tensor parallelism for the other types of quantization as well.
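For context, here is a minimal sketch of what tensor parallelism over a quantized weight looks like. This is illustrative, not the exact test setup: it assumes a 2-GPU process group is already initialized (e.g. launched via torchrun), and the module size and sharding choices are placeholders.

```python
# Illustrative sketch (not the exact test setup): shard an int8 weight-only
# quantized Linear across 2 GPUs with DTensor. Assumes torch.distributed is
# already initialized with 2 ranks (e.g. launched via torchrun).
import torch
import torch.nn as nn
from torch.distributed._tensor import DeviceMesh, Shard, distribute_tensor

from torchao.quantization import int8_weight_only, quantize_

model = nn.Sequential(nn.Linear(1024, 1024, device="cuda")).to(torch.bfloat16)
quantize_(model, int8_weight_only())  # weight becomes an AffineQuantizedTensor

mesh = DeviceMesh("cuda", list(range(2)))
# Column-wise (dim-0) sharding of the weight; this distribute_tensor call is
# exactly where missing ops (e.g. aten.slice) on the subclass will surface.
sharded_weight = distribute_tensor(model[0].weight, mesh, [Shard(0)])
model[0].weight = nn.Parameter(sharded_weight, requires_grad=False)
```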

Steps

1. Create test

Since we don't have many tests today, we can optimize for readability for now: copy-paste the test cases into https://github.com/pytorch/ao/blob/main/test/dtypes/test_affine_quantized_tensor_parallel.py instead of inheriting from them.

For new tests, you can follow this example to create your own test case:

```python
class TestFloat8dqTensorAffineQuantizedTensorParallel(TestFloat8dqAffineQuantizedTensorParallel):
    QUANT_METHOD_FN = staticmethod(float8_dynamic_activation_float8_weight)
    QUANT_METHOD_KWARGS = {"granularity": PerTensor()}
    COMMON_DTYPES = [torch.bfloat16, torch.float16, torch.float32]

    @common_utils.parametrize("dtype", COMMON_DTYPES)
    @with_comms
    @unittest.skipIf(not torch.cuda.is_available(), "Need CUDA available")
    def test_tp(self, dtype):
        return self._test_tp(dtype)


class TestFloat8dqRowAffineQuantizedTensorParallel(TestFloat8dqAffineQuantizedTensorParallel):
    QUANT_METHOD_FN = staticmethod(float8_dynamic_activation_float8_weight)
    QUANT_METHOD_KWARGS = {"granularity": PerRow()}
    COMMON_DTYPES = [torch.bfloat16]

    @common_utils.parametrize("dtype", COMMON_DTYPES)
    @with_comms
    @unittest.skipIf(not torch.cuda.is_available(), "Need CUDA available")
    def test_tp(self, dtype):
        return self._test_tp(dtype)
```

2. Run the test

```
python test/dtypes/test_affine_quantized_tensor_parallel.py
```

3. Add support for missing ops until the test passes

We'd expect this mostly to mean adding slicing and similar ops to the corresponding TensorImpl tensor subclass; see the sketch below.
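The sketch below shows the general shape of such a change: a wrapper tensor subclass that routes aten.slice.Tensor to its packed fields so DTensor can shard it. The class and field names (MyQuantImpl, int_data, scale) are hypothetical; this is not torchao's actual TensorImpl code, just the pattern.

```python
import torch

aten = torch.ops.aten


class MyQuantImpl(torch.Tensor):
    """Hypothetical packed-weight subclass (not torchao's actual code)."""

    @staticmethod
    def __new__(cls, int_data, scale):
        # Wrapper subclass: the outer tensor advertises the logical
        # shape/dtype while the real data lives in the inner fields.
        return torch.Tensor._make_wrapper_subclass(
            cls, int_data.shape, dtype=scale.dtype, device=int_data.device
        )

    def __init__(self, int_data, scale):
        self.int_data = int_data  # packed int8 values, one row per output
        self.scale = scale        # per-row dequantization scales

    @classmethod
    def __torch_dispatch__(cls, func, types, args, kwargs=None):
        kwargs = kwargs or {}
        if func is aten.detach.default:
            self = args[0]
            return cls(self.int_data, self.scale)
        if func is aten.slice.Tensor:
            # DTensor shards a weight by slicing it, so route the slice to
            # the packed fields instead of the (data-less) outer tensor.
            self = args[0]
            dim = args[1] if len(args) > 1 else 0
            start = args[2] if len(args) > 2 else None
            end = args[3] if len(args) > 3 else None
            step = args[4] if len(args) > 4 else 1
            int_data = aten.slice.Tensor(self.int_data, dim, start, end, step)
            # Scales vary along dim 0 here, so only slice them for dim-0 slices.
            scale = (
                aten.slice.Tensor(self.scale, 0, start, end, step)
                if dim == 0
                else self.scale
            )
            return cls(int_data, scale)
        raise NotImplementedError(f"MyQuantImpl: {func} not supported yet")
```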

@jerryzh168 jerryzh168 added the good first issue Good for newcomers label Oct 1, 2024
@jainapurva jainapurva linked a pull request Oct 4, 2024 that will close this issue
@melvinebenezer (Contributor)

@jerryzh168 shouldn't this issue be kept open until the other points in the description are fixed?
I have a WIP: #1026

@jainapurva jainapurva reopened this Oct 7, 2024
@p4arth (Contributor) commented Oct 17, 2024

Hi @jerryzh168, I have two questions:

  1. What does fpx mean here, exactly?
  2. For int8 dynamic act + int8 weight, is only writing the tests left?

@jerryzh168 (Contributor, Author)

@p4arth

jerryzh168 added a commit to jerryzh168/ao that referenced this issue Oct 18, 2024
Summary:
Following pytorch#988, we added TP support for int4_weight_only quantization in torchao,
which uses TensorCoreTiledLayout.

Addresses one work item in pytorch#988.

Also clarified docs based on pytorch#386.

Also restructured the tests in test/dtypes/test_affine_quantized_tensor_parallel.py so they don't
depend on torchao/utils.py, reducing the jumps people have to make to understand what is tested.

Test Plan:
python test/dtypes/test_affine_quantized_tensor_parallel.py

jerryzh168 added a commit that referenced this issue Oct 19, 2024
* Add tensor parallelism support for int4_weight_only quantization

* typo