Fix fp8-all-gather buck errors #912

y-sq · 2024-09-20T22:17:33Z

Differential Revision: D63048850

pytorch-bot · 2024-09-20T22:17:36Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/912

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 771675a with merge base 44cdd79:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2024-09-20T22:18:03Z

This pull request was exported from Phabricator. Differential Revision: D63048850

facebook-github-bot · 2024-09-25T07:07:23Z

This pull request was exported from Phabricator. Differential Revision: D63048850

Summary: Pull Request resolved: pytorch#912 Differential Revision: D63048850

facebook-github-bot · 2024-09-25T07:39:00Z

This pull request was exported from Phabricator. Differential Revision: D63048850

Summary: Pull Request resolved: pytorch#912 Differential Revision: D63048850

y-sq · 2024-09-25T18:31:42Z

The test failures are in the int8 tests and should be unrelated to this PR.

=========================== short test summary info ============================
  FAILED test/prototype/test_quantized_training.py::TestQuantizedTraining::test_int8_mixed_precision_training_compile_False_config0 - AssertionError: (0, 0.05118261338021933)
  assert 0.05118261338021933 < 0.003
  
  To execute this test, run the following from the base repo dir:
      python test/prototype/test_quantized_training.py TestQuantizedTraining.test_int8_mixed_precision_training_compile_False_config0
  
  This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
  FAILED test/prototype/test_quantized_training.py::TestQuantizedTraining::test_int8_mixed_precision_training_compile_False_config2 - AssertionError: (0, 0.027038948187457792)
  assert 0.027038948187457792 < 0.003
  
  To execute this test, run the following from the base repo dir:
      python test/prototype/test_quantized_training.py TestQuantizedTraining.test_int8_mixed_precision_training_compile_False_config2
  
  This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
  FAILED test/prototype/test_quantized_training.py::TestQuantizedTraining::test_int8_mixed_precision_training_compile_False_config3 - AssertionError: (2, 0.08557230322896399)
  assert 0.08557230322896399 < 0.003
  
  To execute this test, run the following from the base repo dir:
      python test/prototype/test_quantized_training.py TestQuantizedTraining.test_int8_mixed_precision_training_compile_False_config3
  
  This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
  Traceback (most recent call last):
    File "/home/ec2-user/actions-runner/_work/ao/ao/test-infra/.github/scripts/run_with_env_secrets.py", line 102, in <module>
      main()
  FAILED test/prototype/test_quantized_training.py::TestQuantizedTraining::test_int8_mixed_precision_training_compile_True_config0 - AssertionError: (0, 0.005407431195376575)
  assert 0.005407431195376575 < 0.003
  
  To execute this test, run the following from the base repo dir:
      python test/prototype/test_quantized_training.py TestQuantizedTraining.test_int8_mixed_precision_training_compile_True_config0
  
  This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
  FAILED test/prototype/test_quantized_training.py::TestQuantizedTraining::test_int8_mixed_precision_training_compile_True_config2 - AssertionError: (0, 0.038073588697231514)
  assert 0.038073588697231514 < 0.003
  
  To execute this test, run the following from the base repo dir:
      python test/prototype/test_quantized_training.py TestQuantizedTraining.test_int8_mixed_precision_training_compile_True_config2
  
  This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
  FAILED test/prototype/test_quantized_training.py::TestQuantizedTraining::test_int8_mixed_precision_training_compile_True_config3 - AssertionError: (0, 0.08123108139198207)
  assert 0.08123108139198207 < 0.003
  
  To execute this test, run the following from the base repo dir:
      python test/prototype/test_quantized_training.py TestQuantizedTraining.test_int8_mixed_precision_training_compile_True_config3
  
  This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
  ==== 6 failed, 1763 passed, 379 skipped, 62 warnings in 2256.49s (0:37:36) =====

facebook-github-bot · 2024-09-25T18:45:48Z

This pull request was exported from Phabricator. Differential Revision: D63048850

Summary: Pull Request resolved: pytorch#912 Reviewed By: vkuzo Differential Revision: D63048850

vkuzo · 2024-09-25T18:50:14Z

test/float8/test_fsdp2/test_fsdp2.py

@@ -18,7 +18,7 @@
 from torchao.float8.config import CastConfig, Float8LinearConfig, ScalingType
 from torchao.float8.float8_linear_utils import convert_to_float8_training
 from torchao.float8.fsdp_utils import WeightWithDynamicFloat8CastTensor
-from fsdp2_common import check_parity_bf16_mp, check_parity_no_mp
+from torchao.testing.float8_fsdp2_utils.float8 import check_parity_bf16_mp, check_parity_no_mp


just curious, why is this torchao.testing.float8_fsdp2_utils.float8 instead of torchao.testing.float8_fsdp2_utils?

y-sq: Sorry, it was a typo. I just updated the PR again; it will be "torchao.testing.float8.fsdp2_utils".
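
The thread does not say what the buck error was. One plausible reading of the diff, offered only as an assumption, is that `from fsdp2_common import ...` resolves only when the test file's own directory is on `sys.path` (as when the file is run directly with `python`), whereas a package-qualified import like `torchao.testing.float8.fsdp2_utils` resolves from the package root alone. The sketch below reproduces that difference in a temporary directory. `demo_pkg`, `make_layout`, `importable` and `HELPER_SRC` are hypothetical names; the layout only mimics the old and new locations of the helpers and is not torchao's actual tree.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the shared test helpers.
HELPER_SRC = "def check_parity_no_mp():\n    return 'ok'\n"


def make_layout(root: Path) -> None:
    """Create a hypothetical repo with the helper in both its old and new locations."""
    # New style: the helper lives inside an importable package (demo_pkg is hypothetical).
    pkg_dir = root / "demo_pkg" / "testing" / "float8"
    pkg_dir.mkdir(parents=True)
    for d in (root / "demo_pkg", root / "demo_pkg" / "testing", pkg_dir):
        (d / "__init__.py").write_text("")
    (pkg_dir / "fsdp2_utils.py").write_text(HELPER_SRC)

    # Old style: the helper is a loose module sitting next to the test file.
    test_dir = root / "test" / "float8" / "test_fsdp2"
    test_dir.mkdir(parents=True)
    (test_dir / "fsdp2_common.py").write_text(HELPER_SRC)

    # Make sure the import system sees the freshly written files.
    importlib.invalidate_caches()


def importable(module: str, search_dir: Path) -> bool:
    """Return True if `module` imports when `search_dir` is prepended to sys.path."""
    saved_path = list(sys.path)
    saved_modules = set(sys.modules)
    sys.path.insert(0, str(search_dir))
    try:
        importlib.import_module(module)
        return True
    except ModuleNotFoundError:
        return False
    finally:
        # Restore global import state so each check is independent.
        sys.path[:] = saved_path
        for name in set(sys.modules) - saved_modules:
            del sys.modules[name]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        make_layout(root)
        # Only the repo root on the path: the bare sibling import fails ...
        print(importable("fsdp2_common", root))  # False
        # ... while the package-qualified import still resolves.
        print(importable("demo_pkg.testing.float8.fsdp2_utils", root))  # True
        # The bare import works only if the test's own directory is on the path.
        print(importable("fsdp2_common", root / "test" / "float8" / "test_fsdp2"))  # True
```

Under these assumptions, moving the helpers into the package removes the test's dependence on how the runner sets up `sys.path`, which would explain why the same file could pass under one runner and fail under another.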

Summary: Pull Request resolved: pytorch#912 Reviewed By: vkuzo Differential Revision: D63048850

facebook-github-bot · 2024-09-25T18:53:07Z

This pull request was exported from Phabricator. Differential Revision: D63048850

Summary: Pull Request resolved: pytorch#912 Reviewed By: vkuzo Differential Revision: D63048850

facebook-github-bot · 2024-09-25T19:23:10Z

This pull request was exported from Phabricator. Differential Revision: D63048850

Differential Revision: D63048850 Pull Request resolved: pytorch#912

…th torch.compile (#904)

* [float8] improve eager numerics for dynamic scales
* leave torch.linalg.vector_norm for another PR
* cuda
* remove _data and investigate
* remove _data comment
* upcast to float32 is enough
* explain why float32
* _data parity
* handle sm8.9
* fix transformer unit test
* print if error
* Add tutorial for trainable tensor subclass (#908)
  Summary: The new tutorial provides an example of how to implement a trainable tensor subclass that wraps quantized data. This extends the existing `MyDTypeTensor` with a few necessary steps to ensure proper gradient updates, namely:
  1. Define a differentiable constructor
  2. Define backward pass for ops of interest (e.g. torch.nn.functional.linear)
  3. Handle special ops used by the optimizer (e.g. aten.add, aten.add_)
  Test Plan: python tutorials/developer_api_guide/my_trainable_tensor_subclass.py
* Introducing 1-bit quantization for Llama in torchchat (#910)
  Differential Revision: D63052325
  Pull Request resolved: #911
* Rename Floating point to fp8 (#909)
* [float8] fix typo in bitwise_identical unit test (#918)
* Adding example for quantized tensor + tensor parallelism (#785)
  * [WIP] Adding example for quantized tensor + tensor parallelism
    Summary: This PR adds an example of how quantized tensor subclass can work with DTensor: https://github.com/pytorch/pytorch/blob/main/torch/distributed/_tensor/README.md
    End goal is to rewrite https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/models/llama2.py with normal llama2 implementation and show case with DTensor + AffineQuantizedTensor + torch.compile we can get on par performance with the custom tensor parallel implementation
    Test Plan: torchrun --standalone --nnodes=1 --nproc-per-node=4 tutorials/developer_api_guide/tensor_parallel.py
  * tensor parallel file
  * Use DTensor.from instead of distribute_tensor
  * implementing aten.slice.Tensor (WIP)
  * working
  * some shape fix and use more quant primitive ops
  * Add rowwise test
  * make rowwise sharding work
  * compile still not working yet
  * fake tensor didn't pick up shape changes from transpose
  * backend='eager'
  * change transpose to non-inplace op
  * add error message
  * works now with torch nightly
  * remove print
  * ruff
  * Clean up
  * Fix device id
  ---------
  Co-authored-by: Ke Wen <[email protected]>
* rename cuda mode -> gpu mode (#925)
* Add workaround to recover the perf for quantized vit in torch.compile (#926)
  Add temporary workaround to recover the perf for quantized vit under torch.compile
  Summary: Recently we found a perf drop in quantized vit due to #898 (comment)
  This PR add a temp fix until we figure out the longer term fix. I think ideally we should figure out why the tensor subclass check failed in torch.compile (https://github.com/pytorch/pytorch/blob/e4d294221b140fdbb49a64f297bc60c9fcc2f80e/torch/nn/modules/activation.py#L1286) and fix that
  Test Plan: python tutorials/quantize_vit/run_vit_b_quant.py
* clean up device checks in float8 unit test files (#923)
  Summary: While working on rowwise scaling I noticed that some of the CUDA device capability checks we had in the test files did not make sense, cleaning this up.
  Test Plan: tests pass on my H100
  CI, it should skip less tests now since CI only has CUDA capability 8, 9
* [low-bit optim] Change 8-bit and FP8 optim block size from 2048 to 256 to match new bnb v0.44 (#927)
* Float8 autoquant weight only (#866)
* Fix failing FP6 benchmark (#931)
* Remove two if statements in fp8 padding (#935)
  Reviewed By: vkuzo
  Differential Revision: D63051205
  Pull Request resolved: #935
  Approved by: https://github.com/vkuzo
* [Distributed] Improve sharding example (#937)
  * [Distributed] Improve sharding example
  * Add comment
* Add composable QAT quantizer (#938)
  Summary: This is a utility for users who wish to apply multiple QAT quantizers to their models. In the near future, we expect to add an embedding QAT quantizer that composes with the existing linear QAT quantizers.
  Test Plan: python test/quantization/test_qat.py -k test_composable_qat_quantizer
* resolve conflict with latest main
  Differential Revision: D63048850
  Pull Request resolved: #912
* Add torchchat quantizer
  Differential Revision: D62394341
  Pull Request resolved: #897
* Add compile tests to test suite (#906)
  * Add compile tests to test suite
    Summary: This is a follow up PR addressing #839 (comment)
    We can add more compiler related tests in the future.
    Next
    * refactor a bit to use quantize_ API directly
    * use the test suite in existing API tests
    Test Plan: python torchao/testing/utils.py
  * rename
  * add result check
* Fix up CMakeLists and reorganize some code locations
  Differential Revision: D62711903
  Pull Request resolved: #948
* [float8] all-reduce amax on dp mesh instead of global pg (#933)
  * [float8] all-reduce amax on dp mesh instead of global pg
  * liner
  * improve comments
  * move hp tensor inside if
  * linter
  * linter
  * linter
  * linter
  * linter
* int8 dynamic quant + bsr support (#821)
  This PR, adds in int8 dynamicquant + bsr support.
  Changes:
  * Use i8i8 -> bf16 matmul to maintain accuracy
  * Added a block sparse layout type to AffineQuantizedTensor + check/impl.
  * Cleaned up benchmark.py script and add a single line `benchmark.sh` file for acceleration numbers
  * Updated eval.py and added a single line `evaluate.sh` file for accuracy numbers
  * Lots of lint formatting and README updates
  * torch.compile now working and is correct
* fixing some issues with our support for 70/405B models (#941)
  Summary: download and convert scripts needed to be updated alongside model.py config files
  Test Plan: python generate.py --checkpoint_path ../../../checkpoints/meta-llama/Meta-Llama-3.1-70B/model.pth
* Update INT8 mixed-precision training test to be less flaky (#950)
* Add executorch parallel
  Differential Revision: D62711909
  Pull Request resolved: #953
* test CI
* better comment on why upcasting
* control seed
* move unit test to test_compile
* fix typo
* float64 upcasting after allreduce
* use LinearMMConfig

---------

Co-authored-by: andrewor14 <[email protected]>
Co-authored-by: Vaishnavi Gupta <[email protected]>
Co-authored-by: Apurva Jain <[email protected]>
Co-authored-by: Jerry Zhang <[email protected]>
Co-authored-by: Ke Wen <[email protected]>
Co-authored-by: Mark Saroufim <[email protected]>
Co-authored-by: Vasiliy Kuznetsov <[email protected]>
Co-authored-by: Thien Tran <[email protected]>
Co-authored-by: Tobias van der Werff <[email protected]>
Co-authored-by: Shuqi Yang <[email protected]>
Co-authored-by: Scott Roy <[email protected]>
Co-authored-by: Jesse Cai <[email protected]>
Co-authored-by: HDCharles <[email protected]>

facebook-github-bot added the CLA Signed label Sep 20, 2024

facebook-github-bot added the fb-exported label Sep 20, 2024

y-sq added a commit to y-sq/ao that referenced this pull request Sep 25, 2024

Fix fp8-all-gather buck errors (pytorch#912)

f4745b0

Summary: Pull Request resolved: pytorch#912 Differential Revision: D63048850

y-sq force-pushed the export-D63048850 branch from 09da3c7 to f4745b0 September 25, 2024 07:07

y-sq added a commit to y-sq/ao that referenced this pull request Sep 25, 2024

Fix fp8-all-gather buck errors (pytorch#912)

1f482e3

Summary: Pull Request resolved: pytorch#912 Differential Revision: D63048850

y-sq force-pushed the export-D63048850 branch from f4745b0 to 1f482e3 September 25, 2024 07:39

y-sq added a commit to y-sq/ao that referenced this pull request Sep 25, 2024

Fix fp8-all-gather buck errors (pytorch#912)

ff3d03c

Summary: Pull Request resolved: pytorch#912 Reviewed By: vkuzo Differential Revision: D63048850

y-sq force-pushed the export-D63048850 branch from 1f482e3 to ff3d03c September 25, 2024 18:45

vkuzo reviewed Sep 25, 2024

y-sq added a commit to y-sq/ao that referenced this pull request Sep 25, 2024

Fix fp8-all-gather buck errors (pytorch#912)

cc3c8ea

Summary: Pull Request resolved: pytorch#912 Reviewed By: vkuzo Differential Revision: D63048850

y-sq force-pushed the export-D63048850 branch from ff3d03c to cc3c8ea September 25, 2024 18:53

vkuzo approved these changes Sep 25, 2024

Fix fp8-all-gather buck errors (pytorch#912)

771675a

Summary: Pull Request resolved: pytorch#912 Reviewed By: vkuzo Differential Revision: D63048850

y-sq force-pushed the export-D63048850 branch from cc3c8ea to 771675a September 25, 2024 19:23

facebook-github-bot merged commit b521c9b into pytorch:main Sep 25, 2024
19 checks passed

weifengpy added a commit to weifengpy/ao that referenced this pull request Sep 26, 2024

resolve conflict with latest main

a05a40f

Differential Revision: D63048850 Pull Request resolved: pytorch#912

melvinebenezer pushed a commit to melvinebenezer/ao that referenced this pull request Oct 3, 2024

Fix fp8-all-gather buck errors

7a82623

Differential Revision: D63048850 Pull Request resolved: pytorch#912
