Something might be wrong with either llama.cpp or the Llama 3 GGUFs #6914
Comments
Likely related to BPE token merging behavior: #6809
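For context, a simplified illustration of the digit-handling behavior only (not the full Llama 3 pre-tokenizer pattern): runs of digits are expected to be split into groups of at most three characters before any BPE merges are applied, so a pre-tokenization/merge bug can easily change how a number like 3333 gets tokenized.

```python
# Minimal sketch of the digit-splitting rule only; the real Llama 3 pre-tokenizer
# regex is much larger. Requires the third-party "regex" package (pip install regex),
# since the standard-library "re" module does not support \p{N}.
import regex

digit_split = regex.compile(r"\p{N}{1,3}")

print(digit_split.findall("3333+777"))  # ['333', '3', '777']
print(digit_split.findall("1234567"))   # ['123', '456', '7']
```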
Same on my side: fresh Bartowski conversion from https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/tree/main, Q8_0 quant.
I ran these tests as well and can confirm your findings. Exllama v2 also did better in the test I mentioned in the discussion yesterday.
Fixed by #6920. Log with the PR:
[log output omitted]
@Jeximo here are some other tests you can try, taken from the JS library sample reference:
[test cases omitted]
Note regarding the test cases above: I wrote those test cases for LLaMA 1 tokenization and updated the "expected results" for LLaMA 3 (with the exception of the last case, which is copied from the official repo). So those tests are definitely better than nothing, but they are not optimal test cases for uncovering potential bugs in LLaMA 3 tokenization. It would be good to find some adversarial inputs specific to LLaMA 3 tokenization and update the tests accordingly.
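One way to hunt for such adversarial inputs is to tokenize candidate strings with both the GGUF and the reference Hugging Face tokenizer and flag any disagreement. A minimal sketch, assuming llama-cpp-python and transformers are installed; the GGUF path and the candidate strings below are illustrative, not an official test set:

```python
from llama_cpp import Llama
from transformers import AutoTokenizer

GGUF_PATH = "Meta-Llama-3-8B-Instruct-Q8_0.gguf"   # hypothetical local file
HF_REPO = "meta-llama/Meta-Llama-3-8B-Instruct"    # reference tokenizer

gguf = Llama(model_path=GGUF_PATH, vocab_only=True)  # vocab is enough for tokenizing
ref = AutoTokenizer.from_pretrained(HF_REPO)

candidates = [
    "What is 3333+777?",
    "3333+777=",
    "1000000 0.5 -273.15",
    "año résumé 😀😀😀",
    "    indented\n\ttabbed",
]

for s in candidates:
    a = gguf.tokenize(s.encode("utf-8"), add_bos=False)
    b = ref.encode(s, add_special_tokens=False)
    status = "OK  " if a == b else "DIFF"
    print(f"{status} {s!r}")
    if a != b:
        print(f"  gguf: {a}")
        print(f"  hf:   {b}")
```

Digit-heavy strings, mixed scripts, emoji runs, and leading/trailing whitespace tend to be good places to look for pre-tokenization differences.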
So I just tested the current llama3 models and, despite the changes to support Llama 3 in the latest ollama release, the official llama3 model that is uploaded still gives the wrong result.
This is with llama3:70b-instruct-q4_K_M 5338d7c58d8d.
@nkeilar what is “the official one”? If you mean ollama: ollama isn't official, and they won't have regenerated their GGUFs yet. Wherever you're looking, when was the file generated? The changes require the GGUFs to be regenerated; it isn't just a change to llama.cpp. I haven't tested the changes myself, but the discussion around them sounded convincing to me.
Someone here apparently had good luck with a new GGUF of the 8B model.
@Jeximo is that a 3-bit quantization? 🤔 Regardless, it looks correct.
Fixed in #6920.
Try this query: "What is 3333+777?"
Yes, yes, LLMs are bad at math. That's not what I'm getting at. Someone mentioned this on Reddit, and I have to agree that I'm seeing weird stuff too.
Let's get a baseline. Here is what meta.ai yields:
This is likely running on Llama 3 70B.
Here is what Groq yields:
and at 8B:
Now, here's where things get weird. Using Open WebUI on top of Ollama, let's use llama.cpp to run the GGUFs of Llama 3.
First, 8B at fp16:
Then 8B at Q8_0:
Then 70B at Q4_0:
I think the problem should be clear. All of the non-llama.cpp instances that were not using GGUFs did the math problem correctly. All of the llama.cpp instances got the problem wrong in exactly the same way. This issue is extremely repeatable on both ends. I have never seen the cloud instances make this mistake, and I have never seen the llama.cpp instances not make this exact mistake of adding an extra digit to the problem and then getting it wrong.
To me, it appears that something is degrading the accuracy of Llama 3 when run under llama.cpp.
Any ideas of what's going wrong here?
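In case it helps anyone reproduce this outside of Open WebUI/Ollama, here is a minimal sketch using llama-cpp-python directly; the GGUF path is an assumption, and temperature 0 keeps the output deterministic. Printing the prompt's token ids also shows whether the digits are being tokenized the way the reference tokenizer intends. For reference, the correct answer is 3333 + 777 = 4110.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="Meta-Llama-3-8B-Instruct-Q8_0.gguf",  # hypothetical local file
    n_ctx=2048,
)

prompt = "What is 3333+777?"  # correct answer: 4110

# Show the token ids the model actually sees; if the GGUF's tokenizer splits
# "3333" differently from the reference tokenizer, the model is effectively
# being asked a different question.
print(llm.tokenize(prompt.encode("utf-8"), add_bos=False))

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": prompt}],
    temperature=0.0,
)
print(out["choices"][0]["message"]["content"])
```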