ProtT5 model training loss function #164
Comments
I would always recommend using mean, as you want the loss to be independent of the number of tokens you have in your batch.
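To make the batch-size dependence concrete, here is a small self-contained sketch (the vocabulary size and tensor shapes are invented for illustration, this is not the ProtT5 training code): with `reduction="mean"` the loss is unchanged when the same predictions are repeated in a 4x larger batch, while `reduction="sum"` grows 4x.

```python
import torch
import torch.nn as nn

vocab_size = 128
logits_small = torch.randn(10, vocab_size)          # 10 tokens in the batch
targets_small = torch.randint(0, vocab_size, (10,))
logits_large = logits_small.repeat(4, 1)             # same predictions, 40 tokens
targets_large = targets_small.repeat(4)

mean_loss = nn.CrossEntropyLoss(reduction="mean")
sum_loss = nn.CrossEntropyLoss(reduction="sum")

# Mean reduction: identical values regardless of batch size.
print(mean_loss(logits_small, targets_small), mean_loss(logits_large, targets_large))
# Sum reduction: the second value is 4x the first.
print(sum_loss(logits_small, targets_small), sum_loss(logits_large, targets_large))
```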
Thank you for the insights, @mheinzinger. If I want to compute the perplexity of a protein sequence with the ProtT5 model, how do I do it? Since it uses an MLM objective, I think the following will not work. Could you please share some insights?

```python
inputs = tokenizer(sequence, return_tensors="pt")["input_ids"]
# Get outputs
outputs = model(**inputs, labels=labels)
# Compute perplexity
perplexity = torch.exp(loss).item()
```
Indeed, the above won't work, especially as perplexity is ill-defined for models trained via MLM.
Thank you so much for the tips, @mheinzinger. I found the "mask one token at a time" approach in the link you shared. Will try that.
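For reference, a minimal sketch of the "mask one token at a time" pseudo-perplexity recipe. To keep it simple it uses the encoder-only masked LM ProtBert (`Rostlab/prot_bert`) via Hugging Face transformers rather than ProtT5; the model choice and sequence are assumptions for illustration, not the authors' reference code. The same idea carries over to ProtT5, where the masked sequence goes through the encoder and the decoder scores the masked position.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert", do_lower_case=False)
model = BertForMaskedLM.from_pretrained("Rostlab/prot_bert")
model.eval()

sequence = "M K T A Y I A K Q R"  # residues separated by spaces, as the Rostlab models expect
input_ids = tokenizer(sequence, return_tensors="pt")["input_ids"][0]

nlls = []
with torch.no_grad():
    # Skip [CLS] (position 0) and [SEP] (last position); mask each residue in turn.
    for pos in range(1, input_ids.size(0) - 1):
        masked = input_ids.clone()
        true_id = masked[pos].item()
        masked[pos] = tokenizer.mask_token_id
        logits = model(masked.unsqueeze(0)).logits[0, pos]
        log_probs = torch.log_softmax(logits, dim=-1)
        nlls.append(-log_probs[true_id])

# Pseudo-perplexity: exp of the average negative log-likelihood over residues.
pseudo_perplexity = torch.exp(torch.stack(nlls).mean()).item()
print(pseudo_perplexity)
```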
Just in case: we now have an example for continuing pre-training.
Dear Authors,
Thank you very much for the great work. I got a question and would appreciate your insights.
For ProtT5 training, since the model predicts the full sequence rather than just the masked tokens, what is the loss function? Is it torch CrossEntropyLoss with reduction="sum", or CrossEntropyLoss with reduction="mean"?