
./quantize with the Q4_K_M parameter fails with "Unsupported tensor size encountered" error #2143

Closed
stoneLee81 opened this issue Jul 8, 2023 · 2 comments

@stoneLee81

When I use the ./quantize command with the Q4_K_M parameter to quantize a model file, the following error is reported:

main: quantizing './zh-models/plus_13B/ggml-model-f16.bin' to './zh-models/plus_13B/ggml-model-q4_K_M.bin' as Q4_K_M
llama.cpp: loading model from ./zh-models/plus_13B/ggml-model-f16.bin
llama.cpp: saving model to ./zh-models/plus_13B/ggml-model-q4_K_M.bin
========================= Tensor sizes 5120 x 49954 are not divisible by 256
This is required to be able to use k-quants for now!
========================================================================================
llama_model_quantize: failed to quantize: Unsupported tensor size encountered

Can you tell me what might be wrong?

Environment:

macOS Ventura 13.4.1

@KerfuffleV2
Collaborator

The Chinese models use a non-standard vocabulary size. The size of some of the model's tensors is based on the vocabulary size, so the model isn't compatible with k-quants, which currently use a 256-element block size.
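
You can see the mismatch directly from the dimensions in the error above: 5120 is a multiple of 256, but 49954 is not, so that tensor can't be split into 256-element blocks. A quick check:

# remainder when dividing the vocabulary dimension by the k-quants block size
echo $((49954 % 256))   # prints 34, i.e. not evenly divisible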

You can potentially try to compile llama.cpp with LLAMA_QKK_64=1, which makes k-quants use a 64-element block size. However, this negates at least part of the benefit of k-quants by increasing both overhead and, apparently, perplexity. See: #2001
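
Roughly, the rebuild would look something like this (a sketch, assuming the standard Makefile build; the CMake equivalent should be -DLLAMA_QKK_64=ON):

# rebuild with 64-element k-quant blocks, then retry the same quantization
make clean
LLAMA_QKK_64=1 make
./quantize ./zh-models/plus_13B/ggml-model-f16.bin ./zh-models/plus_13B/ggml-model-q4_K_M.bin Q4_K_M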

Another downside is that you'll only be able to use those models with a version of llama.cpp that was compiled with the flag I mentioned. Personally, I'm not sure I would bother, because the tradeoffs are significant. You should still be able to quantize with the non-k-quants formats (e.g. q4_0, q4_1, q5_0, q5_1).
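
For example, something along these lines should work with the same files from your log (q4_0 is just one choice; the other non-k-quant types are used the same way):

./quantize ./zh-models/plus_13B/ggml-model-f16.bin ./zh-models/plus_13B/ggml-model-q4_0.bin q4_0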

@LostRuins
Collaborator

This is fixed after #2148
