Add MultiGPU support for DPR Training via DDP #619

Merged: 7 commits, Nov 12, 2020

Conversation

@tholor (Member) commented on Nov 9, 2020

In order to enable larger batch sizes for DPR training, we need multi-GPU support.
Let's use DistributedDataParallel, as it's the more performant and scalable option.
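
For anyone following along, a minimal sketch of the kind of DDP setup this relies on. The Linear layer and random dataset below are stand-ins for illustration, not FARM's actual bi-encoder or data silo:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# One process per GPU, e.g. launched with
# `python -m torch.distributed.launch --nproc_per_node=2 train.py`
dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(768, 768).to(local_rank)  # stand-in for the DPR bi-encoder
model = DDP(model, device_ids=[local_rank])

# DistributedSampler gives every rank a disjoint shard of the data, so the
# effective batch size grows to per_gpu_batch_size * world_size.
dataset = TensorDataset(torch.randn(1024, 768))   # stand-in for the featurized training set
sampler = DistributedSampler(dataset)
loader = DataLoader(dataset, batch_size=16, sampler=sampler)
```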

  • gather tensors for loss with in-batch negatives (see the sketch after this list)
  • verify eval is only running on rank 0
  • adjust vocab size check for DDP
  • verify distribution of dataset into batches
  • infer/pass distributed_world_size in prediction head
  • fix nonzero() deprecation warning
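
To make the first two checkboxes concrete: each rank only sees its local batch, so without a gather step the pool of in-batch negatives would shrink to the per-GPU batch size. Below is a rough sketch of the idea using torch.distributed.all_gather directly (the PR itself goes through an all_gather_list helper, see Future work) and assuming one positive passage per query so that the positives sit on the diagonal:

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F

def global_in_batch_negatives_loss(query_vecs, passage_vecs):
    """Gather embeddings from all ranks, then compute the in-batch negatives
    NLL over the global batch. Sketch only, not FARM's actual prediction head."""
    world_size = dist.get_world_size()
    rank = dist.get_rank()

    q_list = [torch.zeros_like(query_vecs) for _ in range(world_size)]
    p_list = [torch.zeros_like(passage_vecs) for _ in range(world_size)]
    dist.all_gather(q_list, query_vecs)
    dist.all_gather(p_list, passage_vecs)

    # all_gather does not backpropagate through the gathered copies, so keep
    # this rank's original tensors (which carry grad) in its own slot.
    q_list[rank] = query_vecs
    p_list[rank] = passage_vecs
    all_q = torch.cat(q_list, dim=0)
    all_p = torch.cat(p_list, dim=0)

    scores = torch.matmul(all_q, all_p.transpose(0, 1))          # (N, N) similarity matrix
    labels = torch.arange(scores.size(0), device=scores.device)  # positive passage on the diagonal
    return F.nll_loss(F.log_softmax(scores, dim=1), labels)

# The second checkbox is a guard along these lines, so only one process evaluates:
# if dist.get_rank() == 0:
#     evaluator.eval(model)
```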

Future work

  • refactor all_gather_list to torch's standard all_gather()
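
For context on that item: as far as I understand, all_gather_list serializes arbitrary Python objects and exchanges the bytes, whereas torch.distributed.all_gather exchanges raw tensors and requires the same shape on every rank. A hedged sketch (not code from this PR) of how uneven batch sizes could be handled by padding before the gather:

```python
import torch
import torch.distributed as dist

def all_gather_varsize(t, max_rows):
    """Gather a 2D tensor whose first dimension may differ across ranks by
    padding it to max_rows first. Sketch only, not FARM's all_gather_list."""
    world_size = dist.get_world_size()

    # Exchange the true row counts so the padding can be trimmed off afterwards.
    n = torch.tensor([t.size(0)], device=t.device)
    sizes = [torch.zeros_like(n) for _ in range(world_size)]
    dist.all_gather(sizes, n)

    padded = torch.zeros(max_rows, t.size(1), device=t.device, dtype=t.dtype)
    padded[: t.size(0)] = t
    gathered = [torch.zeros_like(padded) for _ in range(world_size)]
    dist.all_gather(gathered, padded)

    return torch.cat([g[: int(s.item())] for g, s in zip(gathered, sizes)], dim=0)
```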

@tholor changed the title from "WIP Add MultiGPU support for DPR Training via DDP" to "Add MultiGPU support for DPR Training via DDP" on Nov 10, 2020
@Timoeller (Contributor) left a comment

Ok, let's merge this PR now and improve it later on.

Review comment on examples/dpr_encoder.py (outdated, resolved)