v0.17.0: PyTorch 2.0 support, Process Control Enhancements, TPU pod support and FP8 mixed precision training
PyTorch 2.0 support
This release fully supports the upcoming PyTorch 2.0 release. You can choose whether to use `torch.compile` and customize its options through `accelerate config` or via a `TorchDynamoPlugin`, as shown in the short sketch below.
- update support for torch dynamo compile by @pacman100 in #1150
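A minimal sketch of how this can look in user code (hedged: it assumes the `dynamo_backend` argument on `Accelerator`; the equivalent setting can also be chosen interactively with `accelerate config`):

```python
import torch
from accelerate import Accelerator

# Minimal sketch: request torch.compile via the "inductor" dynamo backend.
accelerator = Accelerator(dynamo_backend="inductor")

# A toy model/optimizer just to show the usual prepare() flow.
model = torch.nn.Linear(8, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model, optimizer = accelerator.prepare(model, optimizer)
```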
Process Control Enhancements
This release adds a new `PartialState`, which contains most of the capabilities of the `AcceleratorState` but is designed to be used directly by the user to assist with any process-control mechanisms around it. With this, users also no longer need to wrap calls in `if accelerator.state.is_main_process` when utilizing classes such as the Tracking API, as these will now automatically use only the main process for their work by default. See the sketch after the PR reference below.
- Refactor process executors to be in AcceleratorState by @muellerzr in #1039
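A minimal sketch of using `PartialState` for basic process control (the prints are illustrative only):

```python
from accelerate import PartialState

state = PartialState()

# Work that should happen only on the main process, e.g. logging or saving.
if state.is_main_process:
    print(f"Running with {state.num_processes} process(es)")

# Work that runs on every process, followed by a synchronization barrier.
print(f"Hello from process {state.process_index}")
state.wait_for_everyone()
```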
TPU Pod Support (Experimental)
Launching from TPU pods is now supported; please see this issue for more information.
- Introduce TPU Pod launching to `accelerate launch` by @muellerzr in #1049
FP8 mixed precision training (Experimental)
This release adds experimental support for FP8 mixed precision training, which requires the transformer-engine library as well as a Hopper GPU (or higher).
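A hedged sketch of how this is expected to be enabled (assuming the `mixed_precision` argument on `Accelerator`; the exact knobs may change while the feature is experimental):

```python
from accelerate import Accelerator

# Experimental sketch: requires the transformer-engine library
# and a Hopper-class (or newer) GPU.
accelerator = Accelerator(mixed_precision="fp8")
```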
What's new?
- v0.17.0.dev0 by @sgugger (direct commit on main)
- Deepspeed param check by @dhar174 in #1015
- enabling `mps` device by default and removing related config by @pacman100 in #1030
- fix: links to gradient synchronization by @prassanna-ravishankar in #1035
- do not scale gradient in bf16 mode by @kashif in #1036
- Pass keywords arguments of backward function deeper to DeepSpeed by @DistinctVision in #1037
- Add daily slack notifier for nightlies by @muellerzr in #1042
- Make sure direct parameters are properly set on device by @sgugger in #1043
- Add `cpu_offload_with_hook` by @sgugger in #1045
- Update quality tools to 2023 by @sgugger in #1046
- Load tensors directly on device by @sgugger in #1028
- Fix cpu_offload_with_hook code snippet by @pcuenca in #1047
- Use create_task by @muellerzr in #1052
- Fix args by adding in the defaults by @muellerzr in #1053
- deepspeed `hidden_size` auto value default fixes by @pacman100 in #1060
- Introduce PartialState by @muellerzr in #1055
- Flag for deprecation by @muellerzr in #1061
- Try with this by @muellerzr in #1062
- Update integrations by @muellerzr in #1063
- Swap utils over to use PartialState by @muellerzr in #1065
- update fsdp docs and removing deepspeed version pinning by @pacman100 in #1059
- Fix/implement process-execution decorators on the Accelerator by @muellerzr in #1070
- Refactor state and make `PartialState` first class citizen by @muellerzr in #1071
- Add error if passed --config_file does not exist by @muellerzr in #1074
- SageMaker image_uri is now optional by @ in #1077
- Allow custom SageMaker Estimator arguments by @ in #1080
- Fix tpu_cluster arg by @muellerzr in #1081
- Update complete_cv_example.py by @fcossio in #1082
- Added SageMaker local mode config section by @ in #1084
- Fix config by @muellerzr in #1090
- adds missing "lfs" in pull by @CSchoel in #1091
- add multi_cpu support to reduce by @alex-hh in #1094
- Update README.md by @BM-K in #1100
- Tracker rewrite and lazy process checker by @muellerzr in #1079
- Update performance.mdx by @fcossio in #1107
- Attempt to unwrap tracker. by @pcuenca in #1109
- TensorBoardTracker: wrong arg def by @stas00 in #1111
- Actually raise if exception by @muellerzr in #1124
- Add test for ops and fix reduce by @muellerzr in #1122
- Deep merge SageMaker `additional_args`, allowing more flexible configuration and `env` variable support by @dbpprt in #1113
- Move dynamo.optimize to the end of model preparation by @ymwangg in #1128
- Refactor `launch` for greater extensibility by @Yard1 in #1123
- [Big model loading] Correct GPU only loading by @patrickvonplaten in #1121
- Add tee and role to launch by @muellerzr in #1132
- Expand warning and grab all GPUs available by default by @muellerzr in #1134
- Fix multinode with GPU ids when each node has 1 by @muellerzr in #1127
- deepspeed dataloader prepare fix by @pacman100 in #1126
- fix ds dist init kwargs issue by @pacman100 in #1138
- fix lr scheduler issue by @pacman100 in #1140
- fsdp bf16 enable autocast by @pacman100 in #1125
- Fix notebook_launcher by @muellerzr in #1141
- fix partial state by @pacman100 in #1144
- FSDP enhancements and fixes by @pacman100 in #1145
- Fixed typos in notebook by @SamuelLarkin in #1146
- Include a note in the gradient synchronization docs on "what can go wrong" and show the timings by @muellerzr in #1153
- [Safetensors] Relax missing metadata constraint by @patrickvonplaten in #1151
- Solve arrow keys being environment dependent for accelerate config by @p1atdev (direct commit on main)
- Load custom state to cpu by @Guangxuan-Xiao in #1156
- 📝 add a couple more trackers to the docs by @nateraw in #1158
- Let GradientState know active dataloaders and reset the remainder by @muellerzr in #1162
- Attempt to fix import error when PyTorch is built without the `torch.distributed` module by @mfuntowicz in #1108
- [`Accelerator`] Fix issue with 8bit models by @younesbelkada in #1155
- Document skip_first_batches in the checkpoint usage guides by @muellerzr in #1164
- Fix what files get deleted through `total_limit` by @muellerzr in #1165
- Remove outdated command directions and use in tests by @muellerzr in #1166
Significant community contributions
The following contributors have made significant changes to the library over the last release: