How does ZeRO DDP work with Tensor Parallelism? #2142
Unanswered
heya5
asked this question in
Community | Q&A
Replies: 1 comment
-
Yes, you are right!
-
I think TP has no memory redundancy for model data, so I'm confused about how ZeRO DDP works with TP.
In your GPT example,
Does this mean that we use 4 GPUs split into two groups of 2, with each group doing tensor parallelism, and then we do ZeRO DDP across those two groups?
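
To make the layout in the question concrete, here is a minimal sketch of how 4 ranks could be split into tensor-parallel (TP) and data-parallel (DP) groups. The helper `build_groups` is hypothetical and not part of the library, and the assumption that consecutive ranks share a TP group is a common convention rather than something confirmed in this thread.

```python
def build_groups(world_size: int, tp_degree: int):
    """Return (tp_groups, dp_groups) as lists of rank lists.

    Hypothetical helper for illustration only; the rank layout is an assumption.
    """
    if world_size % tp_degree != 0:
        raise ValueError("world_size must be divisible by tp_degree")
    dp_degree = world_size // tp_degree

    # Assumed layout: consecutive ranks form one TP group and
    # jointly hold one full copy of the model, split into shards.
    tp_groups = [
        list(range(i * tp_degree, (i + 1) * tp_degree))
        for i in range(dp_degree)
    ]

    # Ranks at the same position inside their TP group hold the same
    # shard, so together they form a DP group.
    dp_groups = [list(range(j, world_size, tp_degree)) for j in range(tp_degree)]
    return tp_groups, dp_groups


tp_groups, dp_groups = build_groups(world_size=4, tp_degree=2)
print("TP groups:", tp_groups)
print("DP groups:", dp_groups)
```

Under this layout there is no redundancy inside a TP group, but each shard is still replicated across the ranks of its DP group (for example ranks 0 and 2). That replication across groups is the redundancy ZeRO partitions away.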