Custom fine-tuned DeepSeek coder model unable to be quantized to Fp16 #5234
Comments
Reads like a broken tokenizer file?
Thanks for your response. However, where do I find the vocab file in that Hugging Face repo? I assume you meant the vocab.json file?
Between the tokenizer and vocab files, I'm not sure which ones are actually used.
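For reference, one quick way to see which tokenizer files a checkpoint actually ships, and whether transformers loads it as a fast tokenizer backed by tokenizer.json, is a small script like the sketch below. The model directory name is only a placeholder, not something from this issue.

```python
# Sketch: inspect which tokenizer files a local HF checkpoint ships and how
# transformers loads them. The model path is an assumed placeholder.
from pathlib import Path
from transformers import AutoTokenizer

model_dir = Path("./deepseek-coder-7b-instruct")  # assumed local checkpoint path

# Files the converter may look at, depending on the vocab type chosen
for name in ("tokenizer.json", "tokenizer_config.json", "vocab.json",
             "merges.txt", "tokenizer.model"):
    print(f"{name}: {'present' if (model_dir / name).exists() else 'missing'}")

tok = AutoTokenizer.from_pretrained(model_dir)
print("fast tokenizer (tokenizer.json backed):", tok.is_fast)
print("tokenizer length:", len(tok))
```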
The files are not broken. This is an issue for other people as well. In fact, you don't have to quantize a custom DeepSeek model to get this error; if you just quantize the original 7b model, it will throw this error too.
Same story with the latest set of DeepSeek Math models: `python convert.py deepseek-math-7b-rl --vocab-type hfft --pad-vocab` and `python convert.py deepseek-math-7b-rl --vocab-type bpe --pad-vocab`
Any insights, @jackshiwl?
Can confirm this issue... although it converts the model using:
Hi all, I am not investigating this issue anymore; I am using another model. Hope someone can fix this / look into this. @cmp-nct
It seems there was a change recently that pins `bpe` to vocab.json. From the HF docs, it looks like any compatible `PreTrainedTokenizer` that transformers supports could be represented by tokenizer.json: https://huggingface.co/docs/transformers/en/fast_tokenizers
3 weeks ago (b2213), convert.py output:
Current mainline convert.py output:
Result running latest main:
Result running main from 3 weeks ago:
We still have our mismatch, but the type is `bpe` rather than `spm`, and it also produces text as expected (no garbage) rather than a segfault. Edit: I had another moment, so I tried just copying tokenizer.json to vocab.json and setting --vocab-type to bpe.
I confirmed that both b2213's and the current main's convert.py, if you do the above, generate an f32 with an identical sha256 hash.
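If it helps anyone else, here is roughly what that workaround looks like scripted end to end. It simply mirrors the copy-then-convert steps described above; the model directory is taken from the DeepSeek Math example earlier in the thread and is just an assumed local path.

```python
# Sketch of the workaround described above: copy tokenizer.json to vocab.json,
# then run convert.py with --vocab-type bpe. Paths are placeholders.
import shutil
import subprocess
from pathlib import Path

model_dir = Path("deepseek-math-7b-rl")  # assumed local model directory

# Give the bpe vocab loader the file it expects to find.
shutil.copyfile(model_dir / "tokenizer.json", model_dir / "vocab.json")

# Same invocation as earlier in the thread, just driven from Python.
subprocess.run(
    ["python", "convert.py", str(model_dir), "--vocab-type", "bpe", "--pad-vocab"],
    check=True,
)
```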
There's a PR from the DeepSeek team about this. Basically, their tokenizer needs to be supported in llama.cpp for this to work.
@Nold360 Yeah, I got the same error. Did you find any way to solve it? Thanks. It cannot quantize with:
This issue was closed because it has been inactive for 14 days since being marked as stale.
Hi,
I am trying to quantize my custom fine-tuned deepseek-7b instruct model, and I am unable to do so. I followed the document:
but it produces this error:
I cannot seem to find similar errors in the GitHub issues. Any insight into this would be greatly appreciated.
One can replicate this experiment by quantizing a deepseek 7b instruct coder model.
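One thing worth checking up front when reproducing is whether the tokenizer and the model config disagree on vocab size, since that kind of mismatch is what the --pad-vocab flag discussed above is meant to work around. A rough sketch, with the checkpoint path as an assumed placeholder:

```python
# Sketch: check for a tokenizer / model vocab-size mismatch before converting.
# The checkpoint path is a placeholder for the fine-tuned model directory.
from transformers import AutoConfig, AutoTokenizer

model_dir = "./deepseek-coder-7b-instruct-finetuned"  # assumed path

config = AutoConfig.from_pretrained(model_dir)
tok = AutoTokenizer.from_pretrained(model_dir)

print("config.vocab_size:", config.vocab_size)
print("len(tokenizer):   ", len(tok))
if len(tok) != config.vocab_size:
    print("Mismatch: conversion will likely complain unless --pad-vocab is used.")
```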