
./quantize with the Q4_K_M parameter fails with "Unsupported tensor size encountered" error #2143

Closed
stoneLee81 opened this issue Jul 8, 2023 · 2 comments

@stoneLee81

When I use the ./quantize command with the Q4_K_M parameter to quantize a model file, the following error is reported:

main: quantizing './zh-models/plus_13B/ggml-model-f16.bin' to './zh-models/plus_13B/ggml-model-q4_K_M.bin' as Q4_K_M
llama.cpp: loading model from ./zh-models/plus_13B/ggml-model-f16.bin
llama.cpp: saving model to ./zh-models/plus_13B/ggml-model-q4_K_M.bin
========================= Tensor sizes 5120 x 49954 are not divisible by 256
This is required to be able to use k-quants for now!
========================================================================================
llama_model_quantize: failed to quantize: Unsupported tensor size encountered

Can you tell me what might be wrong?

Environment:

macOS Ventura 13.4.1

@KerfuffleV2
Collaborator

The Chinese models use a non-standard vocabulary size. The size of some of the model's tensors is based on the vocabulary size, so the model isn't compatible with k-quants, which currently use a 256-element block size.
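
You can see the mismatch directly from the dimensions in the error above: 5120 is a multiple of 256, but 49954 is not, so that tensor can't be split into 256-element blocks. A quick check:

# remainder when dividing the vocabulary dimension by the k-quants block size
echo $((49954 % 256))   # prints 34, i.e. not evenly divisible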

You can potentially try to compile llama.cpp with LLAMA_QKK_64=1, which makes k-quants use a 64-element block size. However, this negates at least part of the benefit of k-quants by increasing both overhead and, apparently, perplexity. See: #2001
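
Roughly, the rebuild would look something like this (a sketch, assuming the standard Makefile build; the CMake equivalent should be -DLLAMA_QKK_64=ON):

# rebuild with 64-element k-quant blocks, then retry the same quantization
make clean
LLAMA_QKK_64=1 make
./quantize ./zh-models/plus_13B/ggml-model-f16.bin ./zh-models/plus_13B/ggml-model-q4_K_M.bin Q4_K_M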

Another downside is that you'll only be able to use those models with a version of llama.cpp that was compiled with the flag I mentioned. Personally, I'm not sure I would bother, because the tradeoffs are significant. You should still be able to quantize with the non-k-quants formats (e.g. q4_0, q4_1, q5_0, q5_1).
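
For example, something along these lines should work with the same files from your log (q4_0 is just one choice; the other non-k-quant types are used the same way):

./quantize ./zh-models/plus_13B/ggml-model-f16.bin ./zh-models/plus_13B/ggml-model-q4_0.bin q4_0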

@LostRuins
Collaborator

This is fixed after #2148
