
Fix pre-norm weight conversion for nmt #1723

Closed
Pzzzzz5142 wants to merge 1 commit

Conversation

@Pzzzzz5142 (Contributor) commented Jun 4, 2024

For the NMT model, if it uses the pre-norm architecture, it will have a final layer norm. (Ref: fairseq encoder init, forward and fairseq decoder init, forward.)

We don't need to modify the rest of the script, since fairseq applies the original final_layer_norm before the FFN even though it is named final_layer_norm. (Ref: https://github.com/facebookresearch/fairseq/blob/main/fairseq/modules/transformer_layer.py#L212)
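
The gist of the change can be sketched as follows. This is a minimal illustration rather than the actual conversion script; the helper name, the destination key names (`final_layernorm.*`), and the way the config flag is accessed are assumptions:

```python
def convert_final_layer_norm(fairseq_state, prefix, normalize_before):
    """Conditionally map fairseq's top-level layer_norm weights.

    With pre-norm (normalize_before=True), fairseq's encoder/decoder create an
    extra `layer_norm` module applied after the last layer, so its weights must
    be exported; with post-norm the module does not exist and nothing is copied.
    The destination key names ("final_layernorm.*") are hypothetical.
    """
    converted = {}
    if normalize_before:
        converted["final_layernorm.weight"] = fairseq_state[f"{prefix}.layer_norm.weight"]
        converted["final_layernorm.bias"] = fairseq_state[f"{prefix}.layer_norm.bias"]
    return converted


# Example usage, assuming a fairseq NMT checkpoint loaded with torch.load:
# ckpt = torch.load("checkpoint_best.pt", map_location="cpu")
# extra = convert_final_layer_norm(ckpt["model"], "encoder",
#                                  ckpt["cfg"]["model"].encoder_normalize_before)
```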

@nv-guomingz (Collaborator) commented:

Hi @Pzzzzz5142, thanks for your contribution; we'll evaluate your changes internally first.

@nv-guomingz added the triaged (Issue has been triaged by maintainers) and Investigating labels on Jun 5, 2024
@nv-guomingz (Collaborator) commented:

Hi @Pzzzzz5142, thanks for your contribution. This PR has been merged and will be upstreamed to the main branch next week.

@nv-guomingz closed this on Jun 6, 2024
@Pzzzzz5142 deleted the dev-pzzzzz-fix-pre-norm branch on June 6, 2024 at 11:52
@kaiyux mentioned this pull request on Jun 11, 2024
Labels: Investigating, Merged, triaged (Issue has been triaged by maintainers)