It says TP=1 even though I set TP=2 #2668
Unanswered
yurishin929 asked this question in Community | Q&A
Hi,
I ran the script below with TPDEGREE set to 2 in run_gemini.sh, and I can see the log line
+ export TPDEGREE=2
but I also see
INFO colossalai - colossalai - INFO: Distributed environment is initialized, **data parallel size: 4, pipeline parallel size: 1, tensor parallel size: 1**
Do I need to set something more, or does tensor parallelism work even though it says data parallel size: 4 / tensor parallel size: 1? Thank you.
https://github.com/hpcaitech/ColossalAI/blob/main/examples/language/gpt/gemini/run_gemini.sh
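As a first sanity check, it is worth confirming that the exported variable actually reaches the training process. A minimal sketch of such a check, assuming run_gemini.sh forwards TPDEGREE to the training script as a --tp_degree argument (the flag name is an assumption to verify against the linked script):

```python
# Hypothetical sanity check: place near the top of the training script to
# confirm that `export TPDEGREE=2` reaches the process and is forwarded
# as a --tp_degree flag (flag name assumed from the linked example).
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument("--tp_degree", type=int, default=1)
args, _ = parser.parse_known_args()

print("TPDEGREE env var:", os.environ.get("TPDEGREE"))  # expect "2"
print("--tp_degree flag:", args.tp_degree)              # expect 2
```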
Additionally: when I set TP=2, a ring all-reduce kernel shows up in the TensorBoard trace, but with TP=1 there are only AllGather and ReduceScatter kernels and no ring all-reduce.
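For reference, those kernels line up with how the collectives are used: tensor parallelism sums partial activations with an all-reduce, while ZeRO/Gemini-style sharding is serviced by all-gather (to reassemble parameter shards) and reduce-scatter (to distribute gradient reductions). A minimal torch.distributed sketch of the three operations, as an illustration only rather than the example's actual code (run with torchrun on at least two GPUs):

```python
# Minimal illustration of the collectives behind the observed kernels.
# Launch with: torchrun --nproc_per_node=2 collectives_demo.py
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")
rank = dist.get_rank()
world = dist.get_world_size()
device = torch.device(f"cuda:{rank}")

# TP-style collective: partial results are summed across ranks,
# which NCCL typically executes as a ring all-reduce kernel.
x = torch.ones(4, device=device)
dist.all_reduce(x)  # each element now equals the world size

# ZeRO/Gemini-style collectives: parameter shards are all-gathered
# before use, and gradients are reduce-scattered afterwards,
# with no all-reduce kernel involved.
shard = torch.full((4,), float(rank), device=device)
gathered = [torch.empty_like(shard) for _ in range(world)]
dist.all_gather(gathered, shard)

grad_chunks = [torch.ones(4, device=device) for _ in range(world)]
out = torch.empty(4, device=device)
dist.reduce_scatter(out, grad_chunks)

dist.destroy_process_group()
```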
Replies: 2 comments
-
This is my run_gemini.sh
I'm using colossalai 0.2.3, an NVIDIA TITAN Xp, and CUDA 11.6.
-
Hi @yurishin929 The INFO output describes the distributed communication environment. It does not affect your parallelism unless you are using models from Titans. In short, this is normal and you do not need to worry about it.
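To see why the INFO line can report tensor parallel size: 1 while the Gemini example still parallelizes tensors: that line is printed when the global distributed context is initialized, whereas the example builds its own tensor-parallel process group afterwards. A sketch of that pattern, assuming the colossalai 0.2.x colossalai.tensor.ProcessGroup API (the tp_degree argument and tp_world_size() method are assumptions to verify against the example script):

```python
# Sketch, assuming the colossalai 0.2.x tensor API (ProcessGroup with a
# tp_degree argument and a tp_world_size() method); verify against the
# linked Gemini example before relying on it.
import colossalai
from colossalai.tensor import ProcessGroup

# This launch prints the INFO line, which reports the *global* context
# (tensor parallel size: 1 unless a parallel config is passed here).
colossalai.launch_from_torch(config={})

# The Gemini example configures TP separately, via its own process group:
tp_pg = ProcessGroup(tp_degree=2)
print("TP world size:", tp_pg.tp_world_size())  # expect 2 when TPDEGREE=2
```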