When I use the ./quantize command with the Q4_K_M parameter to convert a model file, an error is reported:
main: quantizing './zh-models/plus_13B/ggml-model-f16.bin' to './zh-models/plus_13B/ggml-model-q4_K_M.bin' as Q4_K_M
llama.cpp: loading model from ./zh-models/plus_13B/ggml-model-f16.bin
llama.cpp: saving model to ./zh-models/plus_13B/ggml-model-q4_K_M.bin
========================= Tensor sizes 5120 x 49954 are not divisible by 256
This is required to be able to use k-quants for now!
========================================================================================
llama_model_quantize: failed to quantize: Unsupported tensor size encountered
Can you tell me what might be wrong?
env
macOS Ventura 13.4.1
The Chinese models use a non-standard vocabulary size. The sizes of some of the model's tensors depend on the vocabulary size, so the model isn't compatible with k-quants, which use a 256-element block size.
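The failing check is just divisibility: the row length of the affected tensors equals the vocabulary size, and k-quants require it to be a multiple of the block size. A minimal illustration, using the numbers from the error message above:

```python
# Tensor dimensions from the error message: 5120 x 49954
hidden_size = 5120   # divisible by 256
vocab_size = 49954   # the non-standard vocabulary size

QK_K = 256  # default k-quants block size in llama.cpp
print(hidden_size % QK_K)  # → 0
print(vocab_size % QK_K)   # → 34, non-zero, so Q4_K_M refuses the tensor
```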
You can potentially try compiling llama.cpp with LLAMA_QKK_64=1, which makes k-quants use a 64-element block size. However, this negates at least part of the value of k-quants by increasing both overhead and (apparently) perplexity. See: #2001
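If you want to try that route, the flag is passed at build time; a sketch, assuming a Makefile-based build run from the llama.cpp source tree:

```shell
# Rebuild with 64-element k-quants blocks (sketch). A full clean first,
# so every object file is recompiled with the new block size.
make clean
LLAMA_QKK_64=1 make
```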
Another downside is that you'll only be able to use those models with a version of llama.cpp that was compiled with that flag. Personally, I'm not sure I would bother, because the tradeoffs are significant. You should be able to quantize with the non-k-quants quantization types (e.g. q4_0, q4_1, q5_0, q5_1).
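For that fallback, the same CLI works with a classic quantization type; a sketch reusing the paths from the log above:

```shell
# Quantize with a non-k-quants type instead (sketch): same input file,
# only the output name and the type argument change.
./quantize ./zh-models/plus_13B/ggml-model-f16.bin \
           ./zh-models/plus_13B/ggml-model-q4_0.bin q4_0
```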