What are the eos_token_id and bos_token_id #279

Open
leekum2018 opened this issue Apr 5, 2023 · 37 comments
Comments

@leekum2018

leekum2018 commented Apr 5, 2023

In generate.py, bos_token_id = 1 and eos_token_id = 2:
model.config.bos_token_id = 1
model.config.eos_token_id = 2

However, in finetune.py the tokenizer is loaded directly from the official LLaMA checkpoint, where bos_token_id=0 and eos_token_id=0.
How should I understand this discrepancy? Thank you!
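
A quick way to see which ids a checkpoint's tokenizer actually reports (a minimal sketch; the model path below is a placeholder):

from transformers import LlamaTokenizer

# Minimal sketch: print the special-token ids a checkpoint's tokenizer reports,
# so the generate.py values (bos=1, eos=2) can be compared against what the
# base model actually ships with. The path is a placeholder.
tokenizer = LlamaTokenizer.from_pretrained("path-or-repo-of-your-base-model")
print("bos:", tokenizer.bos_token_id,
      "eos:", tokenizer.eos_token_id,
      "unk:", tokenizer.unk_token_id,
      "pad:", tokenizer.pad_token_id)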

@archwolf118

Same question. Does fine-tuning need the same configuration?

@HillZhang1999

Same question. I fine-tuned an alpaca-lora model using the author's code and found that it generates an <unk> instead of an <eos> at the end of the response, which causes some problems.

@Qubitium

Qubitium commented Apr 7, 2023

This is a huge issue. The https://huggingface.co/decapoda-research/llama-Xb-hf HF models have incorrect bos/eos token id mappings compared with the original Meta LLaMA. Now that lots of people are using them to build new models, the end result is bad.

Transformers head now fixes this issue, but it broke backward compatibility. You can pass use_fast=False to use the old LlamaTokenizer code path; transformers head defaults to LlamaTokenizerFast.
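
For example, a minimal sketch of forcing the slow tokenizer (the repo name is only a placeholder):

from transformers import AutoTokenizer

# Sketch: use_fast=False selects the slow, SentencePiece-based LlamaTokenizer
# instead of LlamaTokenizerFast. The repo name is a placeholder.
tokenizer = AutoTokenizer.from_pretrained("yahma/llama-7b-hf", use_fast=False)
print(type(tokenizer).__name__)  # expected: LlamaTokenizer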

@Qubitium

Qubitium commented Apr 7, 2023

Everyone needs to check out transformers head, use the latest export-to-HF script on the original Facebook weights, and use that as the base for future training with transformers[head]. Everyone needs to stop using the decapoda models; the more their broken tokenizer mapping is used for training, the more issues it will cause.

huggingface/transformers#22402

@Qubitium

Qubitium commented Apr 7, 2023

For reference, the following is the token mapping generated by the transformers[head] conversion from the LLaMA weights:

{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0,
  "transformers_version": "4.28.0.dev0"
}

If the model you downloaded or are referencing has a tokenizer that does not match the above, don't use it; just throw it away.
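
A rough sanity check along these lines (a sketch; the repo name is a placeholder):

from transformers import AutoConfig, AutoTokenizer

# Sketch: verify a downloaded checkpoint against the expected LLaMA ids
# (bos=1, eos=2, pad/unk=0). The repo name is a placeholder.
repo = "yahma/llama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(repo, use_fast=False)
config = AutoConfig.from_pretrained(repo)

assert tokenizer.bos_token_id == 1 and config.bos_token_id == 1
assert tokenizer.eos_token_id == 2 and config.eos_token_id == 2
print("bos/eos ids match the expected mapping")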

@leekum2018
Author

@diegomontoya Thanks for your prompt reply, which resolves my confusion. I have another question: according to finetune.py, each training sequence has an EOS token appended during preprocessing, so models trained on this data should tend to generate sentences ending with an [EOS]. However, when I use the checkpoint provided in this repo to generate, the sentences end with [EOS] [BOS] instead of a single [EOS]. Is that normal?
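
For context, the preprocessing step being discussed looks roughly like this (a simplified sketch of the tokenize helper in finetune.py; cutoff_len and add_eos_token follow that script's parameter names):

# Simplified sketch of finetune.py's preprocessing: the EOS id is appended to
# every training sequence that was not truncated at cutoff_len.
def tokenize(prompt, tokenizer, cutoff_len=256, add_eos_token=True):
    result = tokenizer(
        prompt,
        truncation=True,
        max_length=cutoff_len,
        padding=False,
        return_tensors=None,
    )
    if (
        add_eos_token
        and len(result["input_ids"]) < cutoff_len
        and result["input_ids"][-1] != tokenizer.eos_token_id
    ):
        result["input_ids"].append(tokenizer.eos_token_id)
        result["attention_mask"].append(1)
    result["labels"] = result["input_ids"].copy()
    return result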

@gururise
Contributor

gururise commented Apr 8, 2023

For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:

7B - https://huggingface.co/yahma/llama-7b-hf
13B - https://huggingface.co/yahma/llama-13b-hf

@ehartford

Everyone needs to check out transformers head, use the latest export-to-HF script on the original Facebook weights, and use that as the base for future training with transformers[head]. Everyone needs to stop using the decapoda models; the more their broken tokenizer mapping is used for training, the more issues it will cause.

huggingface/transformers#22402

Is decapoda aware? They might be willing to update their models.

@alisyzhu

For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:

7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf

For the 13B model, the decapoda_research upload is 38 GB, while the model here is about 26 GB. Could you tell me what the difference between them is, please?

@gururise
Contributor

gururise commented Apr 11, 2023

For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:
7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf

For the 13B model, the decapoda_research upload is 38 GB, while the model here is about 26 GB. Could you tell me what the difference between them is, please?

Interesting observation.
Meta AI's original LLaMA 13B model weights are 26 GB in size.
I don't know why the decapoda_research 13B model is 38 GB.
yahma/llama-13b-hf was converted using the latest transformers git and matches the original 26 GB size for the model weights.

@louisoutin

@diegomontoya Thanks for your prompt reply, which resolves my confusion. I have another question: according to finetune.py, each training sequence has an EOS token appended during preprocessing, so models trained on this data should tend to generate sentences ending with an [EOS]. However, when I use the checkpoint provided in this repo to generate, the sentences end with [EOS] [BOS] instead of a single [EOS]. Is that normal?

I also got the same behavior after fine-tuning on my end. Has anybody found a workaround?

@louisoutin

For me, doing:

model.config.pad_token_id = tokenizer.pad_token_id = 0  # same as unk token id
model.config.bos_token_id = tokenizer.bos_token_id = 1
model.config.eos_token_id = tokenizer.eos_token_id = 2
  • and then saving the tokenizer to disk with tokenizer.save_pretrained(path) and loading it back from that file helped resolve the issue with the unk token being generated instead of the eos token (a full sketch follows below).
    (But now I'm having the same issue as diegomontoya mentioned.)
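
Put together as a self-contained sketch (the base repo and save path are placeholders):

from transformers import LlamaForCausalLM, LlamaTokenizer

# Sketch of the workaround above; the base repo and save path are placeholders.
base = "path-or-repo-of-your-base-model"
tokenizer = LlamaTokenizer.from_pretrained(base)
model = LlamaForCausalLM.from_pretrained(base)

model.config.pad_token_id = tokenizer.pad_token_id = 0  # same as unk
model.config.bos_token_id = tokenizer.bos_token_id = 1
model.config.eos_token_id = tokenizer.eos_token_id = 2

# Persist the corrected tokenizer files and reload them so the fixed ids are
# what finetune.py / generate.py actually read.
tokenizer.save_pretrained("./fixed-tokenizer")
tokenizer = LlamaTokenizer.from_pretrained("./fixed-tokenizer")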

@NoahVl

NoahVl commented Apr 15, 2023

For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:

7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf

Thank you very much! This solved a very annoying inference bug related to the tokenizer's padding token that would sometimes show up. If I changed the padding token, it would just show up in another batch after a while. For people who might land on this page via Google, this is the error I used to (only sometimes) get:

../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [4,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [5,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [6,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [7,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [8,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
...
  File ".../lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Thanks to your uploaded models the issue somehow got fixed!
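
That assertion usually means some input id fell outside the embedding table (for example, a pad id beyond the vocabulary). A minimal pre-flight check, sketched here as an illustration:

import torch

# Sketch: the indexSelect assertion above typically fires when an input id is
# negative or >= the model's vocab size. Check batches before the forward pass.
def check_token_ids(input_ids: torch.Tensor, vocab_size: int) -> None:
    bad = input_ids[(input_ids < 0) | (input_ids >= vocab_size)]
    if bad.numel() > 0:
        raise ValueError(
            f"out-of-range token ids {bad.unique().tolist()} (vocab_size={vocab_size})"
        )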

@teknium1

Any updates on this? Is everything good now? Can we fix old models by changing the tokenizer config, or not?

@Qubitium

@teknium1 You need to retrain on the fixed/updated base HF models. Anything trained using the old transformers code on the decapoda models is bound to break. You can hack your way around the differing token ids, but I wouldn't recommend it.

@yzxyzh

yzxyzh commented Apr 17, 2023

@teknium1 You need to retrain on the fixed/updated base HF models. Anything trained using the old transformers code on the decapoda models is bound to break. You can hack your way around the differing token ids, but I wouldn't recommend it.

Does this mean that I have to download a new llama-hf model and retrain, or can I just use the old one with the newest transformers code and LlamaTokenizer?

@teknium1

@teknium1 You need to retrain on the fixed/updated base HF models. Anything trained using the old transformers code on the decapoda models is bound to break. You can hack your way around the differing token ids, but I wouldn't recommend it.

Does this mean that I have to download a new llama-hf model and retrain, or can I just use the old one with the newest transformers code and LlamaTokenizer?

I think it means either train on a LLaMA model converted to HF format recently, or do the conversion yourself with the latest transformers. Unfortunately, the best fine-tuned models right now are all based on the old format. The only thing I can do at the moment is revert to an older transformers commit to work around it.

@HZQ950419

For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:

7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf

Hi @gururise, is it possible to upload llama-30B and llama-65B as well? Thanks!

@USBhost

USBhost commented Apr 25, 2023

I would like to report that all of Neko's tokenizers are current and match https://huggingface.co/oobabooga/llama-tokenizer. Also, if you want me to update anything in the future, just ping me here or on Neko.

@ehartford

@USBhost Your contributions are appreciated!

@jploski

jploski commented May 2, 2023

For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:

7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf

Unfortunately, unlike the decapoda-research/llama-7b-hf model the new yahma/llama-7b-hf does not load in a free Google Colab notebook (using Tesla T4 GPU). It just aborts with "^C" during the loading checkpoint shards stage (which can be demonstrated using test.py from #364). I suspect that it runs out of RAM because of the shard size and the Python process gets killed. Would it be possible for you to (re)publish this model split into several smaller shards (or is there some simple procedure to split it after downloading)?
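
If re-publishing is not an option, a downloaded checkpoint can in principle be re-sharded locally with save_pretrained (a sketch; the repo and output path are placeholders, and this step itself still needs enough memory to load the weights once):

from transformers import LlamaForCausalLM

# Sketch: re-save a checkpoint in smaller shards so low-RAM environments can
# load it shard by shard. Repo and output path are placeholders.
model = LlamaForCausalLM.from_pretrained("yahma/llama-7b-hf", torch_dtype="auto")
model.save_pretrained("./llama-7b-hf-resharded", max_shard_size="1GB")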

@teknium1

teknium1 commented May 2, 2023

For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:
7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf

Unfortunately, unlike the decapoda-research/llama-7b-hf model the new yahma/llama-7b-hf does not load in a free Google Colab notebook (using Tesla T4 GPU). It just aborts with "^C" during the loading checkpoint shards stage (which can be demonstrated using test.py from #364). I suspect that it runs out of RAM because of the shard size and the Python process gets killed. Would it be possible for you to (re)publish this model split into several smaller shards (or is there some simple procedure to split it after downloading)?

Try Neko or Elinas' repos

@jploski

jploski commented May 2, 2023

For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:
7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf

Unfortunately, unlike the decapoda-research/llama-7b-hf model the new yahma/llama-7b-hf does not load in a free Google Colab notebook (using Tesla T4 GPU). It just aborts with "^C" during the loading checkpoint shards stage (which can be demonstrated using test.py from #364). I suspect that it runs out of RAM because of the shard size and the Python process gets killed. Would it be possible for you to (re)publish this model split into several smaller shards (or is there some simple procedure to split it after downloading)?

Try Neko or Elinas' repos

elinas/llama-7b-hf-transformers-4.29 and Neko-Institute-of-Science/LLaMA-7B-HF both suffer from the same problem. They also both use the same two-big-shards config, which confirms my suspicion that it is the cause (I can also see the RAM peaking and the process aborting when the 12.68 GB limit is hit; I'm talking about system RAM, not GPU RAM here).

So to sum it up, it would be nice to have a test configuration which can execute in the free Google Colab notebook - which I know is technically possible because decapoda-research/llama-7b-hf can be trained there (although the training produces wrong results).

(I also tried Kaggle, but there it fails because of the 20GB disk space limit.)

@jploski

jploski commented May 4, 2023

So to sum it up, it would be nice to have a test configuration which can execute in the free Google Colab notebook - which I know is technically possible because decapoda-research/llama-7b-hf can be trained there (although the training produces wrong results).

I uploaded jploski/llama-7b-hf, which allows just this. It uses 34 checkpoint shards, but is otherwise identical to yahma/llama-7b-hf. (And the results of test.py from #364 are ok when the final LoRA weights from it are fed to generate.py.)

@Opdoop

Opdoop commented May 6, 2023

For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:

7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf

Do we have alpaca-lora weights based on these new models?

@Opdoop

Opdoop commented May 6, 2023

Hi @gururise. Thanks for sharing the model! I guess these two lora weights are based on new llama models, am I right?

7B - https://huggingface.co/yahma/alpaca-7b-lora
13B - https://huggingface.co/yahma/alpaca-13b-lora

@gururise
Contributor

Hi @gururise. Thanks for sharing the model! I guess these two lora weights are based on new llama models, am I right?

7B - https://huggingface.co/yahma/alpaca-7b-lora 13B - https://huggingface.co/yahma/alpaca-13b-lora

Yes, they are both based on the new llama models.

@nevercast

I used alpaca-lora to fine-tune on top of openlm-research's OpenLLaMA model. Now I'm getting lots of unk tokens in my output. What's weird is that I swear it didn't do this earlier; perhaps I reinstalled the dependencies and that affected it?

Can someone please help me understand what actually changed in the tokens? Which token ids changed, and which are "correct"? And if anyone knows whether openlm's model uses the "correct" tokenizer, that would also help me a ton. Appreciated.

@Kong-Aobo

Kong-Aobo commented Jun 18, 2023

There is still something wrong. I replaced decapoda-research/llama-7b-hf with yahma/llama-7b-hf and found that its tokenizer has no pad_token or pad_token_id. Its special tokens are as follows: <unk> 0, <bos> 1, <eos> 2. So what on earth are the special tokens and their ids in the original LLaMA? Am I misunderstanding something?
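
The original LLaMA tokenizer defines no pad token at all; the convention used earlier in this thread (a sketch, not an official recommendation) is to reuse id 0, the unk token, for padding:

from transformers import LlamaTokenizer

# Sketch: LLaMA ships without a pad token; alpaca-lora-style code reuses id 0
# (the unk token) for padding. The repo name is a placeholder.
tokenizer = LlamaTokenizer.from_pretrained("yahma/llama-7b-hf")
tokenizer.pad_token_id = 0        # unk; bos stays 1, eos stays 2
tokenizer.padding_side = "left"   # pad on the left for batched generation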

@ehartford

ehartford commented Jun 18, 2023 via email

@sciftlikci

sciftlikci commented Jun 27, 2023

@nevercast I think your issue stems from setting the pad token equal to the unk token, which leads to generating unk tokens more frequently if the fine-tuning hasn't been done properly.

Can someone explain why this method was chosen despite HF staff using pad token = eos token in various places? Is there any empirical validation behind it?
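
For reference, the alternative convention being asked about looks like this (a sketch only; which choice is better is exactly the open question here):

from transformers import LlamaTokenizer

# Sketch of the pad_token = eos_token convention seen in many HF examples.
# The repo name is a placeholder. Note: with this choice, the data collator /
# loss masking must not mask out the genuine eos at the end of each sequence.
tokenizer = LlamaTokenizer.from_pretrained("yahma/llama-7b-hf")
tokenizer.pad_token = tokenizer.eos_token  # pad_token_id becomes 2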

@heroes999

heroes999 commented Jul 7, 2023

For reference, the following is the token mapping generated by the transformers[head] conversion from the LLaMA weights:

{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0,
  "transformers_version": "4.28.0.dev0"
}

If the model you downloaded or are referencing has a tokenizer that does not match the above, don't use it; just throw it away.

https://huggingface.co/decapoda-research/llama-7b-hf/blob/main/generation_config.json
Oops, I encounter token issues when batch_size > 1. I think the bos_token_id and eos_token_id are incorrect in the decapoda tokenizer_config.json.

@heroes999

heroes999 commented Jul 7, 2023

For me, doing:

model.config.pad_token_id = tokenizer.pad_token_id = 0  # same as unk token id
model.config.bos_token_id = tokenizer.bos_token_id = 1
model.config.eos_token_id = tokenizer.eos_token_id = 2
  • and then saving the tokenizer to disk with tokenizer.save_pretrained(path) and loading it back from that file helped resolve the issue with the unk token being generated instead of the eos token.
    (But now I'm having the same issue as diegomontoya mentioned.)

@louisoutin Oh, I wish I had seen your post earlier. I spent the past several days working around this and arrived at a similar solution to yours. Any other unexpected side effects?

@heroes999

For me, doing:

model.config.pad_token_id = tokenizer.pad_token_id = 0  # same as unk token id
model.config.bos_token_id = tokenizer.bos_token_id = 1
model.config.eos_token_id = tokenizer.eos_token_id = 2
  • and then saving the tokenizer to disk with tokenizer.save_pretrained(path) and loading it back from that file helped resolve the issue with the unk token being generated instead of the eos token.
    (But now I'm having the same issue as diegomontoya mentioned.)

@louisoutin Oh, I wish I had seen your post earlier. I spent the past several days working around this and arrived at a similar solution to yours. Any other unexpected side effects?

I'm not sure whether the decapoda LLaMA 7B was trained with (pad=0, bos=1, eos=2).

@thedaffodil

Unfortunately, unlike the decapoda-research/llama-7b-hf model the new yahma/llama-7b-hf does not load in a free Google Colab notebook (using Tesla T4 GPU). It just aborts with "^C" during the loading checkpoint shards stage (which can be demonstrated using test.py from #364). I suspect that it runs out of RAM because of the shard size and the Python process gets killed. Would it be possible for you to (re)publish this model split into several smaller shards (or is there some simple procedure to split it after downloading)?

Were you able to find a solution for this?

@jploski

jploski commented Aug 8, 2023

Unfortunately, unlike the decapoda-research/llama-7b-hf model the new yahma/llama-7b-hf does not load in a free Google Colab notebook (using Tesla T4 GPU). It just aborts with "^C" during the loading checkpoint shards stage (which can be demonstrated using test.py from #364). I suspect that it runs out of RAM because of the shard size and the Python process gets killed. Would it be possible for you to (re)publish this model split into several smaller shards (or is there some simple procedure to split it after downloading)?

Were you able to find a solution for this?

You can use https://huggingface.co/jploski/llama-7b-hf instead of yahma/llama-7b-hf

@thedaffodil

Unfortunately, unlike the decapoda-research/llama-7b-hf model the new yahma/llama-7b-hf does not load in a free Google Colab notebook (using Tesla T4 GPU). It just aborts with "^C" during the loading checkpoint shards stage (which can be demonstrated using test.py from #364). I suspect that it runs out of RAM because of the shard size and the Python process gets killed. Would it be possible for you to (re)publish this model split into several smaller shards (or is there some simple procedure to split it after downloading)?

Were you able to find a solution for this?

You can use https://huggingface.co/jploski/llama-7b-hf instead of yahma/llama-7b-hf

I've just used https://huggingface.co/jploski/llama-7b-hf for the following code:

from open_flamingo import create_model_and_transforms

llama_path = '/content/llama-7b-hf'
model, image_processor, tokenizer = create_model_and_transforms(
    clip_vision_encoder_path="ViT-L-14",
    clip_vision_encoder_pretrained="openai",
    lang_encoder_path=llama_path,
    tokenizer_path=llama_path,
    cross_attn_every_n_layers=4,
)

but it didn't work.
