Support skip_first_batches for XLA #2966

yitongh · 2024-07-29T12:21:52Z

What does this PR do?

Currently, resuming training with the resume_from_checkpoint feature of the Transformers Trainer raises an error on XLA, because skip_first_batches does not support XLA's MpDeviceLoaderWrapper. This PR adds that support.
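For illustration only, here is a minimal sketch of the unwrap, skip, rewrap idea that supporting a wrapped loader suggests. It deliberately uses stand-in classes rather than the real torch_xla or accelerate types: FakeDeviceLoaderWrapper, SkipLoader, and skip_first_batches_sketch are hypothetical names invented for this example, and the actual change in this PR may be structured differently.

```python
from itertools import islice


class FakeDeviceLoaderWrapper:
    """Hypothetical stand-in for an XLA device-loader wrapper (not the real class)."""

    def __init__(self, dataloader, device):
        self.dataloader = dataloader  # the wrapped, ordinary loader
        self.device = device

    def __iter__(self):
        # A real wrapper would move batches to the device; here they pass through.
        return iter(self.dataloader)


class SkipLoader:
    """Hypothetical helper: iterates over a loader but drops the first num_batches."""

    def __init__(self, dataloader, num_batches):
        self.dataloader = dataloader
        self.num_batches = num_batches

    def __iter__(self):
        return islice(iter(self.dataloader), self.num_batches, None)


def skip_first_batches_sketch(dataloader, num_batches=0):
    # Unwrap: remember the device and work on the inner loader.
    is_wrapped = isinstance(dataloader, FakeDeviceLoaderWrapper)
    if is_wrapped:
        device = dataloader.device
        dataloader = dataloader.dataloader

    skipped = SkipLoader(dataloader, num_batches)

    # Rewrap so the caller gets back the same kind of object it passed in.
    if is_wrapped:
        skipped = FakeDeviceLoaderWrapper(skipped, device)
    return skipped


# Usage: resume after two already-consumed batches.
batches = [[0, 1], [2, 3], [4, 5], [6, 7]]
wrapped = FakeDeviceLoaderWrapper(batches, device="xla:0")
resumed = skip_first_batches_sketch(wrapped, num_batches=2)
remaining = list(resumed)
```

The point of rewrapping is that code downstream of the skip (here, anything that expects a device attribute) keeps working after a resume.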

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@muellerzr

muellerzr

Thanks! Overall this looks fine, just one suggestion :)

muellerzr · 2024-07-29T16:28:58Z

src/accelerate/data_loader.py

@@ -1083,6 +1088,12 @@ def skip_first_batches(dataloader, num_batches=0):
    """
    Creates a `torch.utils.data.DataLoader` that will efficiently skip the first `num_batches`.
    """
+    is_xla_dataloader = False
+    if is_torch_xla_available() and isinstance(dataloader, MpDeviceLoaderWrapper):


muellerzr · 2024-07-29

At this point I believe we can use PartialState().distributed_type == DistributedType.XLA

yitongh · 2024-07-30

OK, I changed it to AcceleratorState because this class is already imported and it matches the usage in the prepare_data_loader method. I believe the distributed_type is shared between these two states.

muellerzr · 2024-07-31

PartialState is safer and better for these types of situations.

yitongh · 2024-08-01

Thank you for the correction. I have changed it to PartialState. I'm not very familiar with these states :)

HuggingFaceDocBuilderDev · 2024-07-29T16:31:39Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

muellerzr

Thanks for enabling!

yitongh · 2024-08-07T11:20:40Z

Hi @muellerzr, can you merge this PR? The test failure seems unrelated to this PR.

muellerzr · 2024-08-08T12:55:40Z

@yitongh yes indeed!

Fix skip_first_batches for XLA

1d2dc00

muellerzr reviewed Jul 29, 2024


yitongh mentioned this pull request Jul 30, 2024

Support save/load ckpt for XLA FSDP huggingface/transformers#32311

Merged

5 tasks

yitongh added 2 commits July 30, 2024 11:36

Use state to check XLA

5688e5e

Change to PartialState

f468964

muellerzr approved these changes Aug 1, 2024


muellerzr merged commit 79ca85c into huggingface:main Aug 8, 2024
23 of 25 checks passed

yitongh deleted the fix_xla_skip_data branch August 9, 2024 01:54
