Fix pre-norm weight conversion for nmt #1723
Closed
For the nmt model, if it uses the pre-norm architecture, it will have an extra final layer norm at the top of the encoder/decoder. (Ref: fairseq encoder init, forward and fairseq decoder init, forward.) We don't need to modify the rest of the script, since fairseq applies the original per-layer `final_layer_norm` weights before the FFN, despite the name `final_layer_norm` suggesting otherwise. (Ref: https://github.com/facebookresearch/fairseq/blob/main/fairseq/modules/transformer_layer.py#L212)
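
For reference, a minimal sketch of what the added handling could look like when mapping a fairseq checkpoint. The `convert_top_level_layer_norm` helper and the output key names are hypothetical, not the actual conversion script; fairseq's own keys (`encoder.layer_norm.*`, `decoder.layer_norm.*`) exist only when the checkpoint was trained with pre-norm (`--encoder-normalize-before` / `--decoder-normalize-before`):

```python
# A minimal sketch, not the actual conversion script: the helper name and
# the key layout on the converted side ("encoder/layer_norm/...") are
# hypothetical and only illustrate the pre-norm special case.

def convert_top_level_layer_norm(fairseq_state: dict, converted: dict) -> None:
    """Copy the encoder/decoder top-level layer norm that fairseq creates
    only when *_normalize_before (pre-norm) is enabled."""
    for prefix in ("encoder", "decoder"):
        weight_key = f"{prefix}.layer_norm.weight"
        bias_key = f"{prefix}.layer_norm.bias"
        # Pre-norm checkpoints contain these keys; post-norm ones do not,
        # so post-norm conversion is left untouched.
        if weight_key in fairseq_state:
            converted[f"{prefix}/layer_norm/gamma"] = fairseq_state[weight_key]
            converted[f"{prefix}/layer_norm/beta"] = fairseq_state[bias_key]
```

The per-layer `final_layer_norm` entries need no such branch: their weights map one-to-one regardless of pre-/post-norm, because only the point at which fairseq applies them changes.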