Replies: 1 comment

It is answered in #86.
Hi,

I am trying to use the ESM3 sequence logits for a downstream analysis.

When I run a forward pass on a single sequence, the shape of out.sequence_logits (see code below) is 1 x input length x 64. The tokenizer implies the last dimension should be 33, but when I look into the code, the sequence output head is indeed set to dimension 64 (https://github.com/evolutionaryscale/esm/blob/0774600af03d724e8244d577c415e10617f018fe/esm/models/esm3.py#L160C9-L160C57).

Is it the case that only the first 33 entries are meaningful (as the tokenizer's vocabulary would suggest)?

Am I missing something, or is there a better way to get the sequence logits?

Thanks!
Code:

import torch
from huggingface_hub import login
from esm.models.esm3 import ESM3
from esm.sdk.api import ESM3InferenceClient
from esm.tokenization.sequence_tokenizer import EsmSequenceTokenizer

login()

# Load the open-weights ESM3 model
model: ESM3InferenceClient = ESM3.from_pretrained("esm3_sm_open_v1").to("cpu")  # or "cuda"

tokenizer = EsmSequenceTokenizer()
prompt = "DQATSLRILNNGHAFNVEFDDSQDKAOO"

# encode() adds BOS/EOS tokens, so the 28-residue prompt becomes 30 tokens
enc_prompt = tokenizer.encode(prompt)
input_ids = torch.tensor(enc_prompt, dtype=torch.int64).unsqueeze(0)

out = model(sequence_tokens=input_ids)
out.sequence_logits.shape  # result: torch.Size([1, 30, 64])
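
In case it clarifies what I'm after, here is a minimal sketch of my intended downstream use, under the assumption (which I have not confirmed) that the entries beyond the tokenizer's 33-token vocabulary are just padding:

# Assumption: logits[..., 33:64] are padding and can be dropped.
vocab_size = len(tokenizer)  # 33 for EsmSequenceTokenizer
logits = out.sequence_logits[..., :vocab_size]  # shape: [1, 30, 33]

# Per-position probabilities over the real vocabulary
probs = torch.softmax(logits, dim=-1)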