[megatron_gpt2] checkpoint v3 #13508
Conversation
@@ -80,6 +80,20 @@ def convert_megatron_checkpoint(args, input_state_dict, config):
    # The converted output model.
    output_state_dict = {}

    nargs = input_state_dict["args"]
    # from pprint import pprint
    # pprint(vars(nargs))
The debug prints are essential for quick troubleshooting when converting, so they are a feature rather than a left-over. I hope it's OK if they remain there. Thank you!
Can we use the logger then and set it at a debug level?
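A minimal sketch of that suggestion, assuming the standard `logging` module; the helper name `dump_megatron_args` is hypothetical, not existing code in the script:

import logging

logger = logging.getLogger(__name__)

def dump_megatron_args(input_state_dict):
    # Hypothetical helper: silent by default, emits only when the level is lowered,
    # e.g. after logging.basicConfig(level=logging.DEBUG)
    nargs = input_state_dict["args"]
    logger.debug("megatron checkpoint args: %s", vars(nargs))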
We don't know ahead of time which parts will need to be debugged, hence everything is prepared for the developer in the various places, and a single on/off logger is usually too much information.
The DeepSpeed framework has a ton of these; they are sometimes commented out, but most of the time they use a helper debug-print util with a `force` flag that is `False` by default, so the call stays in the code but does nothing unless the developer flips the flag to `True`:
print_rank0("some debug/trace", force=False)
I'd be happy to adopt this approach in `transformers` if it resonates better, but then we need to create such a convenience function (in another PR perhaps).
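For context, a minimal sketch of what such a helper could look like; this is a hypothetical implementation, and DeepSpeed's actual util differs in details:

import torch.distributed as dist

def print_rank0(message, force=False):
    # No-op unless the developer flips force=True at the call site.
    if not force:
        return
    # Only rank 0 prints when running under torch.distributed; otherwise just print.
    if not dist.is_available() or not dist.is_initialized() or dist.get_rank() == 0:
        print(message)

# Left in the code permanently; costs nothing until force=True is set while debugging.
print_rank0("some debug/trace", force=False)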
Also, this is similar to:
# keep for quick debug
# print(" ".join([f"\nPYTHONPATH={self.src_dir_str}"] + cmd)); die
execute_subprocess_async(cmd, env=self.get_env())
which I have in various extended tests and use quite often, but only when I need to. I want the process to abort at that point, hence the `die` trigger (borrowed from Perl, where it's actually a built-in function; it works for Python as well since it's just a random undefined symbol there and raises a NameError). The commented-out code saves a ton of time when debugging these tests.
We will get back to this subject matter and revisit the code once we agree on what the best approach is.
LGTM, thanks for fixing!
The BigScience project generates Megatron-Deepspeed checkpoints en masse, so we need better v3 checkpoint support for megatron_gpt2 (also working on a direct Megatron-Deepspeed-to-HF conversion script).
This PR:
- uses `gelu` instead of `gelu_new`, which is what the current Megatron-LM uses: https://github.com/NVIDIA/Megatron-LM/blob/3860e995269df61d234ed910d4756e104e1ab844/megatron/model/language_model.py#L140-L141
The previous script happened to work because `max_position_embeddings` happened to be equal to `hidden_size`.
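For illustration, a minimal sketch of deriving the HF config from the args stored in the Megatron checkpoint rather than guessing; the Megatron arg names (`padded_vocab_size`, `max_position_embeddings`, etc.) and the exact mapping are assumptions, not the converter's verbatim code:

from transformers import GPT2Config

def config_from_megatron_args(input_state_dict):
    # Megatron-LM checkpoints store the training argparse.Namespace under "args".
    nargs = input_state_dict["args"]
    return GPT2Config(
        vocab_size=nargs.padded_vocab_size,
        n_positions=nargs.max_position_embeddings,  # not hidden_size, even if the two happen to match
        n_embd=nargs.hidden_size,
        n_layer=nargs.num_layers,
        n_head=nargs.num_attention_heads,
        activation_function="gelu",  # exact erf-based gelu, not the gelu_new tanh approximation
    )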
@LysandreJik, @sgugger