PyTorch 2.3.1 Release, bug fix release

@atalman released this 05 Jun 19:16
· 11028 commits to main since this release
63d5e92

This release is meant to fix the following issues (regressions / silent correctness):

torch.compile:

  • Remove runtime dependency on JAX/XLA when importing torch._dynamo (#124634)
  • Hide the "Plan failed with a cudnnException" warning (#125790)
  • Fix CUDA memory leak (#124238) (#120756)

Distributed:

  • Fix the format_utils executable, which was running as a no-op (#123407)
  • Fix regression with device_mesh in 2.3.0 during initialization causing memory spikes (#124780)
  • Fix crash of FSDP + DTensor with ShardingStrategy.SHARD_GRAD_OP (#123617)
  • Fix failure with distributed checkpointing + FSDP when no forward/backward pass has been run yet (#121544) (#127069)
  • Fix error with distributed checkpointing + FSDP when use_orig_params=False is combined with activation checkpointing (#124698) (#126935)
  • Fix set_model_state_dict errors on compiled module with non-persistent buffer with distributed checkpointing (#125336) (#125337)

MPS:

  • Fix data corruption when copying large (>4GiB) tensors (#124635)
  • Fix Tensor.abs() for complex (#125662)

Packaging:

Other:

  • Fix DeepSpeed transformer extension build on ROCm (#121030)
  • Fix kernel crash on tensor.dtype.to_complex() after ~100 calls in an IPython kernel (#125154)

Release tracker #125425 contains all relevant pull requests related to this release as well as links to related issues.
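
To check whether an installed build is new enough to include the fixes listed above, a version comparison along these lines may help. This is a minimal sketch, not part of the release: the helper names `parse_release` and `includes_2_3_1_fixes` are hypothetical, and it assumes the version string starts with `major.minor.patch` (for example `2.3.1+cu121`). It does not distinguish release candidates or nightlies from final releases.

```python
import re

# Assumption: the fixes in these notes are present in 2.3.1 and later.
MIN_FIXED = (2, 3, 1)


def parse_release(version: str) -> tuple:
    """Extract (major, minor, patch) from a string such as '2.3.1+cu121'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def includes_2_3_1_fixes(version: str) -> bool:
    """Return True if the numeric release is at least 2.3.1.

    Suffixes such as '+cu121', 'rc1', or '.dev20240601' are ignored, so a
    pre-release of 2.3.1 is treated the same as the final release.
    """
    return parse_release(version) >= MIN_FIXED


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        print(torch.__version__, includes_2_3_1_fixes(torch.__version__))
```

Run as a script, this prints the installed `torch.__version__` together with the result of the check. The helpers can also be called directly on any version string.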