[megatron gpt checkpoint conversion] causal mask requires pos_embed dimension #13735
This is a follow-up to #13508, where I tried to fix the wrong side of the bug :(. This one is hopefully the correct one.
The causal mask uses the positional embedding dimension (`seqlen`), not `n_emb` (`hidden_size`) as it was originally coded. The original code happened to work only because the original meg-gpt2 model had the same `n_emb` and `seqlen` size.
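For illustration, here is a minimal sketch of the distinction, assuming GPT-2-style config names (`n_positions`, `n_embd`); this is not the exact conversion-script code:

```python
import torch

# Minimal sketch: the causal mask buffer must be sized by the maximum
# sequence length (the positional embedding dimension), not by the hidden
# size. Both were 1024 in the original Megatron GPT-2 345M checkpoint,
# which is why the old code appeared to work.
seqlen = 1024  # max position embeddings (config.n_positions)
n_embd = 1024  # hidden size (config.n_embd); equal to seqlen only by coincidence

# Lower-triangular causal mask shaped (1, 1, seqlen, seqlen), in the style
# of GPT-2's `bias` buffer.
causal_mask = torch.tril(
    torch.ones((seqlen, seqlen), dtype=torch.uint8)
).view(1, 1, seqlen, seqlen)
```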
I re-tested that the original `megatron_lm_345m/release/mp_rank_00/model_optim_rng.pt` checkpoint still produces the same converted output.

@sgugger, @LysandreJik