
ProtT5 model training loss function #164

Open
Alex2975 opened this issue Dec 11, 2024 · 5 comments

@Alex2975

Dear Authors,

Thank you very much for the great work. I got a question and would appreciate your insights.

For ProtT5 training, the model predicts the full sequence, not just the masked tokens. What is the loss function used for ProtT5 training: torch CrossEntropyLoss with reduction="sum", or torch CrossEntropyLoss with reduction="mean"?

@mheinzinger
Collaborator

I would always recommend using mean, as you want the loss to be independent of the number of tokens in your batch.
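
For illustration, here is a minimal sketch (dummy logits, not the actual ProtT5 training code) of how the two reductions behave with torch.nn.CrossEntropyLoss:

import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size = 32                                  # illustrative only; ProtT5's real vocabulary is small (amino acids + special tokens)
logits = torch.randn(10, vocab_size)             # decoder logits, flattened over batch and sequence length
targets = torch.randint(0, vocab_size, (10,))    # ground-truth token ids

loss_mean = nn.CrossEntropyLoss(reduction="mean")(logits, targets)  # per-token average, independent of batch/sequence size
loss_sum = nn.CrossEntropyLoss(reduction="sum")(logits, targets)    # scales with the number of tokens in the batch

print(loss_mean.item(), loss_sum.item())
assert torch.allclose(loss_sum, loss_mean * 10)  # sum = mean * number_of_tokens

As far as I know, Hugging Face's T5ForConditionalGeneration also applies CrossEntropyLoss with the default reduction="mean" over the label tokens when you pass labels.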

@Alex2975
Author

Alex2975 commented Jan 3, 2025

Thank you for the insights, @mheinzinger. If I want to compute the perplexity of a protein sequence with the ProtT5 model, how do I do it? Since it uses an MLM-style objective, I think the following will not work. Could you please share some insights?

import torch

input_ids = tokenizer(sequence, return_tensors="pt")["input_ids"]
labels = tokenizer(sequence, return_tensors="pt")["input_ids"]

# Get outputs (feed the full sequence as both input and label)
outputs = model(input_ids=input_ids, labels=labels)
loss = outputs.loss  # mean cross-entropy over the label tokens

# Compute perplexity
perplexity = torch.exp(loss).item()
print(f"Perplexity: {perplexity}")

@mheinzinger
Collaborator

Indeed, the above won't work, especially since perplexity is ill-defined for models trained via MLM.
I guess the best you can get is the pseudo-perplexity, where you mask one token at a time, reconstruct it, compute the loss against the ground truth, and repeat over the full sequence before averaging and taking the exponent.
We have some implementation for step-wise masking of ProtT5 here; maybe this helps.
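
As a rough sketch of that pseudo-perplexity loop (this assumes the Hugging Face checkpoint Rostlab/prot_t5_xl_uniref50 and single-token masking with the <extra_id_0> sentinel; the linked implementation may differ in its details):

import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = "Rostlab/prot_t5_xl_uniref50"       # assumed checkpoint; swap in the one you actually use
tokenizer = T5Tokenizer.from_pretrained(model_name, do_lower_case=False)
model = T5ForConditionalGeneration.from_pretrained(model_name).to(device).eval()

sequence = "MKTAYIAKQR"                          # toy example sequence
residues = list(sequence)

nlls = []
with torch.no_grad():
    for i in range(len(residues)):
        # Mask one residue at a time with the first sentinel token.
        masked = residues.copy()
        masked[i] = "<extra_id_0>"
        src = " ".join(masked)                   # ProtT5 expects space-separated residues
        tgt = "<extra_id_0> " + residues[i]      # decoder target: sentinel followed by the true residue

        input_ids = tokenizer(src, return_tensors="pt").input_ids.to(device)
        labels = tokenizer(tgt, return_tensors="pt").input_ids.to(device)

        # outputs.loss is the mean cross-entropy over the label tokens for this masked position.
        nlls.append(model(input_ids=input_ids, labels=labels).loss)

pseudo_perplexity = torch.exp(torch.stack(nlls).mean()).item()
print(f"Pseudo-perplexity: {pseudo_perplexity:.3f}")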

@Alex2975
Author

Alex2975 commented Jan 7, 2025

Thank you so much for the tips, @mheinzinger. I found the "mask one token at a time" implementation in the link you shared. Will try that.

@mheinzinger
Collaborator

Just in case: we now have an example for continuing pre-training.
