I am studying the chronicles_prequel, and the last table in the "Trying with ZeRO_STAGE=0/1" chapter shows that the higher TFLOPS is achieved with ZeRO_STAGE=1.

ZeRO stage 1 should reduce memory usage, but why does it also improve performance when all other parameters are the same?
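For context on the memory side, here is a rough sketch of per-GPU model-state memory using the accounting from the ZeRO paper: 16-bit parameters and gradients at 2 bytes each, plus K = 12 bytes per parameter of fp32 optimizer state for mixed-precision Adam. Stage 1 shards only that optimizer state across the DP ranks. The assumptions are that the 181B parameters split evenly over TP × PP GPUs and that activations, communication buffers and fragmentation are ignored, so the result is not expected to match the Mem column. `model_state_bytes` is a hypothetical helper written for this illustration.

```python
# Rough per-GPU model-state memory (ZeRO paper accounting):
#   2 bytes 16-bit params + 2 bytes 16-bit grads + K bytes fp32 optimizer state.
# Assumption: params are split evenly over TP * PP GPUs; activations,
# communication buffers and fragmentation are NOT included.

def model_state_bytes(total_params: float, tp: int, pp: int, dp: int,
                      zero_stage: int, k: int = 12) -> float:
    psi = total_params / (tp * pp)             # params held by one GPU
    if zero_stage == 0:
        return (2 + 2 + k) * psi               # everything replicated across DP
    if zero_stage == 1:
        return (2 + 2) * psi + k * psi / dp    # optimizer state sharded over DP
    raise ValueError("only stages 0 and 1 are modeled here")


for zs in (0, 1):
    gb = model_state_bytes(181e9, tp=8, pp=12, dp=4, zero_stage=zs) / 1e9
    print(f"ZS={zs}: ~{gb:.1f} GB of model states per GPU")
```

With DP=4 this gives about 30.2 GB for stage 0 and 13.2 GB for stage 1. That illustrates why stage 1 saves memory but says nothing by itself about speed.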
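| Nodes | Size | ZS | DP | TP | PP | MBS | GBS | Mem | Sec/it | TFLOPs | Notes |
|------:|-----:|---:|---:|---:|---:|----:|-----:|----:|-------:|-------:|-------|
| 48 | 181B | 1 | 4 | 8 | 12 | 2 | 2048 | 37GB | 120.29 | 134.02 | 02-21 |
| 48 | 181B | 0 | 4 | 8 | 12 | 2 | 2048 | 72GB | 137.34 | 113.02 | 02-21 |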
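To make the size of the difference concrete, the snippet below computes the relative change from ZS=0 to ZS=1 using only the numbers in the two rows above. `pct_change` is just a helper for this illustration. Because the model and GBS are identical, samples/sec scales with 1 / (sec/it).

```python
# Values copied from the two table rows above
# (48 nodes, 181B, DP=4, TP=8, PP=12, MBS=2, GBS=2048).
runs = {
    0: {"mem_gb": 72, "sec_per_it": 137.34, "tflops": 113.02},
    1: {"mem_gb": 37, "sec_per_it": 120.29, "tflops": 134.02},
}


def pct_change(new: float, old: float) -> float:
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


base, zero1 = runs[0], runs[1]
print(f"memory:      {pct_change(zero1['mem_gb'], base['mem_gb']):+.1f}%")
print(f"sec/it:      {pct_change(zero1['sec_per_it'], base['sec_per_it']):+.1f}%")
print(f"TFLOPs:      {pct_change(zero1['tflops'], base['tflops']):+.1f}%")
# Same model and GBS, so throughput (samples/sec) is proportional to 1 / (sec/it).
speedup = base["sec_per_it"] / zero1["sec_per_it"]
print(f"samples/sec: {pct_change(speedup, 1.0):+.1f}%")
```

This gives roughly -48.6% memory, -12.4% sec/it, +18.6% TFLOPs and +14.2% samples/sec. The TFLOPs gain is somewhat larger than the throughput gain implied by sec/it alone. How the TFLOPs column was computed is not stated here.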