What are the eos_token_id and bos_token_id #279

Open
leekum2018 opened this issue Apr 5, 2023 · 37 comments
Comments

@leekum2018

leekum2018 commented Apr 5, 2023

In generate.py, bos_token_id = 1 and eos_token_id = 2:
model.config.bos_token_id = 1
model.config.eos_token_id = 2

However, in finetune.py the tokenizer is loaded directly from the official LLaMA checkpoint, where bos_token_id=0 and eos_token_id=0.
How should I understand this discrepancy? Thank you!
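
A quick way to see which ids a checkpoint's tokenizer actually reports (a minimal sketch; the model path below is a placeholder):

from transformers import LlamaTokenizer

# Minimal sketch: print the special-token ids a checkpoint's tokenizer reports,
# so the generate.py values (bos=1, eos=2) can be compared against what the
# base model actually ships with. The path is a placeholder.
tokenizer = LlamaTokenizer.from_pretrained("path-or-repo-of-your-base-model")
print("bos:", tokenizer.bos_token_id,
      "eos:", tokenizer.eos_token_id,
      "unk:", tokenizer.unk_token_id,
      "pad:", tokenizer.pad_token_id)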

@archwolf118

Same question. Does fine-tuning need the same configuration?

@HillZhang1999

Same question. I fine-tuned an alpaca-lora model using the author's code and found that it generates an <unk> instead of an <eos> at the end of the response, which causes some problems.

@Qubitium

Qubitium commented Apr 7, 2023

This is a huge issue. The https://huggingface.co/decapoda-research/llama-Xb-hf HF models have incorrect bos/eos token id mappings compared with the original Meta LLaMA. Now that lots of people are using them to build new models, the end result is bad.

Transformers head now fixes this issue, but it broke backward compatibility. You can pass use_fast=False to use the old LlamaTokenizer code path; transformers head defaults to LlamaTokenizerFast.
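
For example, a minimal sketch of forcing the slow tokenizer (the repo name is only a placeholder):

from transformers import AutoTokenizer

# Sketch: use_fast=False selects the slow, SentencePiece-based LlamaTokenizer
# instead of LlamaTokenizerFast. The repo name is a placeholder.
tokenizer = AutoTokenizer.from_pretrained("yahma/llama-7b-hf", use_fast=False)
print(type(tokenizer).__name__)  # expected: LlamaTokenizer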

@Qubitium

Qubitium commented Apr 7, 2023

Everyone needs to check out transformers head, use the latest export-to-HF script on the original Facebook weights, and use that as the base for future training with transformers[head]. Everyone needs to stop using the decapoda models; the more their broken tokenizer mapping is used for training, the more issues it will cause.

huggingface/transformers#22402

@Qubitium

Qubitium commented Apr 7, 2023

For reference, the following is the token mapping generated by the transformers[head] conversion from the LLaMA weights:

{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0,
  "transformers_version": "4.28.0.dev0"
}

If the model you downloaded or are referencing has a tokenizer that does not match the above, don't use it; just throw it away.
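
A rough sanity check along these lines (a sketch; the repo name is a placeholder):

from transformers import AutoConfig, AutoTokenizer

# Sketch: verify a downloaded checkpoint against the expected LLaMA ids
# (bos=1, eos=2, pad/unk=0). The repo name is a placeholder.
repo = "yahma/llama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(repo, use_fast=False)
config = AutoConfig.from_pretrained(repo)

assert tokenizer.bos_token_id == 1 and config.bos_token_id == 1
assert tokenizer.eos_token_id == 2 and config.eos_token_id == 2
print("bos/eos ids match the expected mapping")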

@leekum2018
Author

@diegomontoya Thanks for your prompt reply, which resolves my confusion. I have another question: according to finetune.py, each training sequence has an EOS token appended during preprocessing, so models trained on this data should tend to generate sentences ending with an [EOS]. However, when I use the checkpoint provided in this repo to generate, the sentences end with [EOS] [BOS] instead of a single [EOS]. Is that normal?
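
For context, the preprocessing step being discussed looks roughly like this (a simplified sketch of the tokenize helper in finetune.py; cutoff_len and add_eos_token follow that script's parameter names):

# Simplified sketch of finetune.py's preprocessing: the EOS id is appended to
# every training sequence that was not truncated at cutoff_len.
def tokenize(prompt, tokenizer, cutoff_len=256, add_eos_token=True):
    result = tokenizer(
        prompt,
        truncation=True,
        max_length=cutoff_len,
        padding=False,
        return_tensors=None,
    )
    if (
        add_eos_token
        and len(result["input_ids"]) < cutoff_len
        and result["input_ids"][-1] != tokenizer.eos_token_id
    ):
        result["input_ids"].append(tokenizer.eos_token_id)
        result["attention_mask"].append(1)
    result["labels"] = result["input_ids"].copy()
    return result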

@gururise
Contributor

gururise commented Apr 8, 2023

For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:

7B - https://huggingface.co/yahma/llama-7b-hf
13B - https://huggingface.co/yahma/llama-13b-hf

@ehartford

Everyone needs to check out transformers head, use the latest export-to-HF script on the original Facebook weights, and use that as the base for future training with transformers[head]. Everyone needs to stop using the decapoda models; the more their broken tokenizer mapping is used for training, the more issues it will cause.

huggingface/transformers#22402

Is decapoda aware? They might be willing to update their models.

@alisyzhu

For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:

7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf

For the 13B model, the decapoda_research upload is 38 GB, while the model here is about 26 GB. Could you tell me what the difference between them is, please?

@gururise
Contributor

gururise commented Apr 11, 2023

For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:
7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf

For the 13B model, the decapoda_research upload is 38 GB, while the model here is about 26 GB. Could you tell me what the difference between them is, please?

Interesting observation.
Meta AI's original LLaMA 13B model weights are 26 GB in size.
I don't know why the decapoda_research 13B model is 38 GB.
yahma/llama-13b-hf was converted using the latest transformers git and matches the original 26 GB size for the model weights.

@louisoutin

@diegomontoya Thanks for your prompt reply, which resolves my confusion. I have another question: according to finetune.py, each training sequence has an EOS token appended during preprocessing, so models trained on this data should tend to generate sentences ending with an [EOS]. However, when I use the checkpoint provided in this repo to generate, the sentences end with [EOS] [BOS] instead of a single [EOS]. Is that normal?

I also got the same behavior after fine-tuning on my end. Has anybody found a workaround?

@louisoutin

For me, doing:

model.config.pad_token_id = tokenizer.pad_token_id = 0  # same as unk token id
model.config.bos_token_id = tokenizer.bos_token_id = 1
model.config.eos_token_id = tokenizer.eos_token_id = 2
  • and then saving the tokenizer to disk with tokenizer.save_pretrained(path) and loading it back from that file helped resolve the issue with the unk token being generated instead of the eos token (a full sketch follows below).
    (But now I'm having the same issue as diegomontoya mentioned.)
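
Put together as a self-contained sketch (the base repo and save path are placeholders):

from transformers import LlamaForCausalLM, LlamaTokenizer

# Sketch of the workaround above; the base repo and save path are placeholders.
base = "path-or-repo-of-your-base-model"
tokenizer = LlamaTokenizer.from_pretrained(base)
model = LlamaForCausalLM.from_pretrained(base)

model.config.pad_token_id = tokenizer.pad_token_id = 0  # same as unk
model.config.bos_token_id = tokenizer.bos_token_id = 1
model.config.eos_token_id = tokenizer.eos_token_id = 2

# Persist the corrected tokenizer files and reload them so the fixed ids are
# what finetune.py / generate.py actually read.
tokenizer.save_pretrained("./fixed-tokenizer")
tokenizer = LlamaTokenizer.from_pretrained("./fixed-tokenizer")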

@NoahVl

NoahVl commented Apr 15, 2023

For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:

7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf

Thank you very much! This solved a very annoying inference bug related to the tokenizer's padding token that would sometimes show up. If I changed the padding token, it would just show up in another batch after a while. For people who might land on this page via Google, this is the error I used to (only sometimes) get:

../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [4,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [5,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [6,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [7,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [8,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
...
  File ".../lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Thanks to your uploaded models the issue somehow got fixed!
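
That assertion usually means some input id fell outside the embedding table (for example, a pad id beyond the vocabulary). A minimal pre-flight check, sketched here as an illustration:

import torch

# Sketch: the indexSelect assertion above typically fires when an input id is
# negative or >= the model's vocab size. Check batches before the forward pass.
def check_token_ids(input_ids: torch.Tensor, vocab_size: int) -> None:
    bad = input_ids[(input_ids < 0) | (input_ids >= vocab_size)]
    if bad.numel() > 0:
        raise ValueError(
            f"out-of-range token ids {bad.unique().tolist()} (vocab_size={vocab_size})"
        )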

@teknium1

Any updates on this? Is everything good now? Can we fix old models by changing the tokenizer config, or not?

@Qubitium

@teknium1 You need to retrain on the fixed/updated base HF models. Anything trained using the old transformers code on the decapoda models is bound to break. You can hack your way around the differing token ids, but I wouldn't recommend it.

@yzxyzh

yzxyzh commented Apr 17, 2023

@teknium1 You need to retrain on the fixed/updated base HF models. Anything trained using the old transformers code on the decapoda models is bound to break. You can hack your way around the differing token ids, but I wouldn't recommend it.

Does this mean that I have to download a new llama-hf model and retrain, or can I just use the old one with the newest transformers code and LlamaTokenizer?

@teknium1

@teknium1 You need to retrain on the fixed/updated base HF models. Anything trained using the old transformers code on the decapoda models is bound to break. You can hack your way around the differing token ids, but I wouldn't recommend it.

Does this mean that I have to download a new llama-hf model and retrain, or can I just use the old one with the newest transformers code and LlamaTokenizer?

I think it means either train on a LLaMA model converted to HF format recently, or do the conversion yourself with the latest transformers. Unfortunately, the best fine-tuned models right now are all based on the old format. The only thing I can do at the moment is revert to an older transformers commit to work around it.

@HZQ950419

For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:

7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf

Hi @gururise, is it possible to upload llama-30B and llama-65B as well? Thanks!

@USBhost

USBhost commented Apr 25, 2023

I would like to report that all of Neko's tokenizers are current and match https://huggingface.co/oobabooga/llama-tokenizer. Also, if you want me to update anything in the future, just ping me here or on Neko.

@ehartford

@USBhost Your contributions are appreciated!

@jploski

jploski commented May 2, 2023

For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:

7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf

Unfortunately, unlike the decapoda-research/llama-7b-hf model the new yahma/llama-7b-hf does not load in a free Google Colab notebook (using Tesla T4 GPU). It just aborts with "^C" during the loading checkpoint shards stage (which can be demonstrated using test.py from #364). I suspect that it runs out of RAM because of the shard size and the Python process gets killed. Would it be possible for you to (re)publish this model split into several smaller shards (or is there some simple procedure to split it after downloading)?
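
If re-publishing is not an option, a downloaded checkpoint can in principle be re-sharded locally with save_pretrained (a sketch; the repo and output path are placeholders, and this step itself still needs enough memory to load the weights once):

from transformers import LlamaForCausalLM

# Sketch: re-save a checkpoint in smaller shards so low-RAM environments can
# load it shard by shard. Repo and output path are placeholders.
model = LlamaForCausalLM.from_pretrained("yahma/llama-7b-hf", torch_dtype="auto")
model.save_pretrained("./llama-7b-hf-resharded", max_shard_size="1GB")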

@teknium1

teknium1 commented May 2, 2023

For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:
7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf

Unfortunately, unlike the decapoda-research/llama-7b-hf model the new yahma/llama-7b-hf does not load in a free Google Colab notebook (using Tesla T4 GPU). It just aborts with "^C" during the loading checkpoint shards stage (which can be demonstrated using test.py from #364). I suspect that it runs out of RAM because of the shard size and the Python process gets killed. Would it be possible for you to (re)publish this model split into several smaller shards (or is there some simple procedure to split it after downloading)?

Try Neko or Elinas' repos

@jploski

jploski commented May 2, 2023

For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:
7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf

Unfortunately, unlike the decapoda-research/llama-7b-hf model the new yahma/llama-7b-hf does not load in a free Google Colab notebook (using Tesla T4 GPU). It just aborts with "^C" during the loading checkpoint shards stage (which can be demonstrated using test.py from #364). I suspect that it runs out of RAM because of the shard size and the Python process gets killed. Would it be possible for you to (re)publish this model split into several smaller shards (or is there some simple procedure to split it after downloading)?

Try Neko or Elinas' repos

elinas/llama-7b-hf-transformers-4.29 and Neko-Institute-of-Science/LLaMA-7B-HF both suffer from the same problem. They also both use the same two-big-shards config, which confirms my suspicion that it is the cause (I can also see the RAM peaking and the process aborting when the 12.68 GB limit is hit; I'm talking about system RAM, not GPU RAM here).

So to sum it up, it would be nice to have a test configuration which can execute in the free Google Colab notebook - which I know is technically possible because decapoda-research/llama-7b-hf can be trained there (although the training produces wrong results).

(I also tried Kaggle, but there it fails because of the 20GB disk space limit.)

@jploski

jploski commented May 4, 2023

So to sum it up, it would be nice to have a test configuration which can execute in the free Google Colab notebook - which I know is technically possible because decapoda-research/llama-7b-hf can be trained there (although the training produces wrong results).

I uploaded jploski/llama-7b-hf, which allows just this. It uses 34 checkpoint shards, but is otherwise identical to yahma/llama-7b-hf. (And the results of test.py from #364 are ok when the final LoRA weights from it are fed to generate.py.)

@Opdoop

Opdoop commented May 6, 2023

For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:

7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf

Do we have alpaca-lora weights based on these new models?

@Opdoop

Opdoop commented May 6, 2023

Hi @gururise. Thanks for sharing the model! I guess these two lora weights are based on new llama models, am I right?

7B - https://huggingface.co/yahma/alpaca-7b-lora
13B - https://huggingface.co/yahma/alpaca-13b-lora

@gururise
Contributor

Hi @gururise. Thanks for sharing the model! I guess these two lora weights are based on new llama models, am I right?

7B - https://huggingface.co/yahma/alpaca-7b-lora 13B - https://huggingface.co/yahma/alpaca-13b-lora

Yes, they are both based on the new llama models.

@nevercast

I used alpaca-lora to fine-tune on top of openlm-research's OpenLLaMA model. Now I'm getting lots of unk tokens in my output. What's weird is that I swear it didn't do this earlier; perhaps I reinstalled the dependencies and that affected it?

Can someone please help me understand what actually changed in the tokens? Which token ids changed, and which are "correct"? And if anyone knows whether openlm's model uses the "correct" tokenizer, that would also help me a ton. Appreciated.

@Kong-Aobo

Kong-Aobo commented Jun 18, 2023

There is still something wrong. I replaced decapoda-research/llama-7b-hf with yahma/llama-7b-hf and found that its tokenizer has no pad_token or pad_token_id. Its special tokens are as follows: <unk> 0, <bos> 1, <eos> 2. So what on earth are the special tokens and their ids in the original LLaMA? Am I misunderstanding something?
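
The original LLaMA tokenizer defines no pad token at all; the convention used earlier in this thread (a sketch, not an official recommendation) is to reuse id 0, the unk token, for padding:

from transformers import LlamaTokenizer

# Sketch: LLaMA ships without a pad token; alpaca-lora-style code reuses id 0
# (the unk token) for padding. The repo name is a placeholder.
tokenizer = LlamaTokenizer.from_pretrained("yahma/llama-7b-hf")
tokenizer.pad_token_id = 0        # unk; bos stays 1, eos stays 2
tokenizer.padding_side = "left"   # pad on the left for batched generation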

@ehartford

ehartford commented Jun 18, 2023 via email

@sciftlikci

sciftlikci commented Jun 27, 2023

@nevercast I think your issue stems from setting the pad token equal to the unk token, which leads to generating unk tokens more frequently if the fine-tuning hasn't been done properly.

Can someone explain why this method was chosen despite HF staff using pad token = eos token in various places? Is there any empirical validation behind it?
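
For reference, the alternative convention being asked about looks like this (a sketch only; which choice is better is exactly the open question here):

from transformers import LlamaTokenizer

# Sketch of the pad_token = eos_token convention seen in many HF examples.
# The repo name is a placeholder. Note: with this choice, the data collator /
# loss masking must not mask out the genuine eos at the end of each sequence.
tokenizer = LlamaTokenizer.from_pretrained("yahma/llama-7b-hf")
tokenizer.pad_token = tokenizer.eos_token  # pad_token_id becomes 2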

@heroes999

heroes999 commented Jul 7, 2023

For reference, the following is the token mapping generated by the transformers[head] conversion from the LLaMA weights:

{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0,
  "transformers_version": "4.28.0.dev0"
}

If the model you downloaded or are referencing has a tokenizer that does not match the above, don't use it; just throw it away.

https://huggingface.co/decapoda-research/llama-7b-hf/blob/main/generation_config.json
Oops, I encounter token issues when batch_size > 1. I think the bos_token_id and eos_token_id are incorrect in the decapoda tokenizer_config.json.

@heroes999

heroes999 commented Jul 7, 2023

For me, doing:

model.config.pad_token_id = tokenizer.pad_token_id = 0  # same as unk token id
model.config.bos_token_id = tokenizer.bos_token_id = 1
model.config.eos_token_id = tokenizer.eos_token_id = 2
  • and then saving the tokenizer to disk with tokenizer.save_pretrained(path) and loading it back from that file helped resolve the issue with the unk token being generated instead of the eos token.
    (But now I'm having the same issue as diegomontoya mentioned.)

@louisoutin Oh, I wish I had seen your post earlier. I spent the past several days working around this and arrived at a similar solution to yours. Any other unexpected side effects?

@heroes999

For me, doing:

model.config.pad_token_id = tokenizer.pad_token_id = 0  # same as unk token id
model.config.bos_token_id = tokenizer.bos_token_id = 1
model.config.eos_token_id = tokenizer.eos_token_id = 2
  • and then saving the tokenizer to disk with tokenizer.save_pretrained(path) and loading it back from that file helped resolve the issue with the unk token being generated instead of the eos token.
    (But now I'm having the same issue as diegomontoya mentioned.)

@louisoutin Oh, I wish I had seen your post earlier. I spent the past several days working around this and arrived at a similar solution to yours. Any other unexpected side effects?

I'm not sure whether the decapoda LLaMA 7B was trained with (pad=0, bos=1, eos=2).

@thedaffodil

Unfortunately, unlike the decapoda-research/llama-7b-hf model the new yahma/llama-7b-hf does not load in a free Google Colab notebook (using Tesla T4 GPU). It just aborts with "^C" during the loading checkpoint shards stage (which can be demonstrated using test.py from #364). I suspect that it runs out of RAM because of the shard size and the Python process gets killed. Would it be possible for you to (re)publish this model split into several smaller shards (or is there some simple procedure to split it after downloading)?

Were you able to find a solution for this?

@jploski

jploski commented Aug 8, 2023

Unfortunately, unlike the decapoda-research/llama-7b-hf model the new yahma/llama-7b-hf does not load in a free Google Colab notebook (using Tesla T4 GPU). It just aborts with "^C" during the loading checkpoint shards stage (which can be demonstrated using test.py from #364). I suspect that it runs out of RAM because of the shard size and the Python process gets killed. Would it be possible for you to (re)publish this model split into several smaller shards (or is there some simple procedure to split it after downloading)?

Were you able to find a solution for this?

You can use https://huggingface.co/jploski/llama-7b-hf instead of yahma/llama-7b-hf

@thedaffodil

Unfortunately, unlike the decapoda-research/llama-7b-hf model the new yahma/llama-7b-hf does not load in a free Google Colab notebook (using Tesla T4 GPU). It just aborts with "^C" during the loading checkpoint shards stage (which can be demonstrated using test.py from #364). I suspect that it runs out of RAM because of the shard size and the Python process gets killed. Would it be possible for you to (re)publish this model split into several smaller shards (or is there some simple procedure to split it after downloading)?

Were you able to find a solution for this?

You can use https://huggingface.co/jploski/llama-7b-hf instead of yahma/llama-7b-hf

I've just used https://huggingface.co/jploski/llama-7b-hf for the following code:

from open_flamingo import create_model_and_transforms

llama_path = '/content/llama-7b-hf'
model, image_processor, tokenizer = create_model_and_transforms(
    clip_vision_encoder_path="ViT-L-14",
    clip_vision_encoder_pretrained="openai",
    lang_encoder_path=llama_path,
    tokenizer_path=llama_path,
    cross_attn_every_n_layers=4,
)

but it didn't work.
