Something might be wrong with either llama.cpp or the Llama 3 GGUFs #6914

Closed
coder543 opened this issue Apr 25, 2024 · 14 comments

@coder543

coder543 commented Apr 25, 2024

Try this query: "What is 3333+777?"

Yes, yes, LLMs are bad at math. That's not what I'm getting at. Someone mentioned this on Reddit, and I have to agree that I'm seeing weird stuff too.

Let's get a baseline. Here is what meta.ai yields:

[screenshot: meta.ai gives the correct answer]

This is likely running on Llama 3 70B.

Here is what Groq yields:

[screenshot: Groq 70B gives the correct answer]

and at 8B:

[screenshot: Groq 8B gives the correct answer]

Now, here's where things get weird. Using Open WebUI on top of Ollama (which runs llama.cpp under the hood), let's run the Llama 3 GGUFs.

First, 8B at fp16:

[screenshot: wrong answer, with an extra digit added to the problem]

Then 8B at Q8_0:

[screenshot: same wrong answer]

Then 70B at Q4_0:

[screenshot: same wrong answer]

I think the problem should be clear. All of the non-llama.cpp instances that were not using GGUFs did the math problem correctly. All of the llama.cpp instances got the problem wrong in exactly the same way. This issue is extremely repeatable on both ends. I have never seen the cloud instances make this mistake, and I have never seen the llama.cpp instances not make this exact mistake of adding an extra digit to the problem and then getting it wrong.

To me, it appears that something is degrading the accuracy of Llama 3 when run under llama.cpp.

Any ideas of what's going wrong here?
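For anyone who wants to reproduce this outside of Open WebUI, here is a minimal sketch against Ollama's local REST API. It assumes Ollama is serving on the default localhost:11434 and that a Llama 3 tag backed by one of these GGUFs has been pulled; the exact tag name below is an assumption, so substitute whichever one you use.

# Minimal reproduction sketch against Ollama's local REST API.
# Assumes Ollama is running on the default port and `llama3:8b` has been pulled;
# the model tag is an assumption, substitute your own GGUF-backed tag.
import json
import urllib.request

payload = json.dumps({
    "model": "llama3:8b",
    "prompt": "What is 3333+777?",
    "options": {"temperature": 0},  # greedy decoding to keep the repro deterministic
    "stream": False,
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])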

@Jeximo
Contributor

Jeximo commented Apr 25, 2024

I reproduced this with ./main:

./main -m ~/Meta-Llama-3-8B-Instruct-IQ3_M.gguf -s 7 -i --color --penalize-nl -e --temp 0 -r "<|eot_id|>" --in-prefix "\n<|start_header_id|>user<|end_header_id|>\n\n" --in-suffix "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n" -p "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant.<|eot_id|>\n<|start_header_id|>user<|end_header_id|>\n\nHi!<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>\n\n"

I think the result is related to the way the llama.cpp tokenizer chunks 3333+777, see here:

buffer: '3333+777?'

input tokens: [ ... '33':1644, '33':1644, '+':10, '777':15831, ... ]

It works as expected with 3,333 + 777 (comma added):
[screenshot: correct answer once the comma is added]

4333+777 (no comma) also worked as expected, so it's not a GGUF problem.
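As far as I can tell, the chunking comes from the pre-tokenizer: Llama 3 uses a GPT-4-style regex that splits runs of digits into groups of at most three before any BPE merging, and llama.cpp was not applying that split at the time. A minimal sketch of just the digit rule, using a simplified pattern for illustration rather than the full Llama 3 pre-tokenizer regex:

# Simplified illustration of the digit-chunking rule in the Llama 3 pre-tokenizer.
# The real pattern is much longer; only the \p{N}{1,3} behaviour matters here.
import regex  # third-party `regex` module, which supports \p{...} classes

pattern = regex.compile(r"\p{L}+|\p{N}{1,3}|[^\s\p{L}\p{N}]+|\s+")

print(pattern.findall("3333+777?"))    # ['333', '3', '+', '777', '?']
print(pattern.findall("3,333 + 777"))  # ['3', ',', '333', ' ', '+', ' ', '777']

The first output matches the '333', '3', '+', '777' split shown in the log from #6920 further down the thread; the second shows why adding the comma sidesteps the problem even without the fix.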

@LostRuins
Collaborator

likely related to BPE token merging behavior #6809

@itodorovic

Same on my side with a fresh Bartowski conversion (Q8_0 quant) from https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/tree/main:

[screenshot: same wrong answer]

@Dampfinchen

I did these tests as well and can confirm your findings. Exllama v2 also did better in my own test, which I mentioned in the discussion yesterday.

@Jeximo
Contributor

Jeximo commented Apr 26, 2024

Fixed by #6920. Log with the PR applied:

buffer: '3333+777?'

input tokens: [ ..., '333':8765, '3':18, '+':10, '777':15831, ...]

[screenshot: correct answer with the PR applied]

@LostRuins
Collaborator

@Jeximo some other tests you can try, taken from the JS library sample reference:

// Simple test case
test("grabbed",                           [59312, 2788])

// Naive implementation produces inconsistent tokenization for " grabbed" 
test(" grabbed",                          [30418])

// Naive implementation (in LLAMA 1 TOKENIZER) uses incorrect merge order for multiple consecutive space merges
test("           grabbed",                [1881, 30418])

// Linebreak and tab handling
test("\n",                                [198])
test(" \n",                               [720])
test("	tabs				out here",   [3324, 3518, 573, 14294, 1618])

// Equal prio merges are performed left-to-right 
test("ax\n####\nboo",                     [710, 198, 71050, 34093])

// UTF-8 multipoint character that should be found in vocabulary
test('镇',                                [104643])

// UTF-8 multipoint character that should NOT be found in vocabulary
test('🦙',                               [9468, 99, 247])

// Consecutive UTF-8 multipoint characters that are NOT found in the vocabulary and use different numbers of bytes
test('🦙Ꙋ',                             [9468, 99, 247, 166, 247, 232])
test('Ꙋ🦙',                             [166, 247, 232, 9468, 99, 247])

// Test input from official LLaMA 3 library
test('This is a test sentence.',         [2028, 374, 264, 1296, 11914, 13])
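A rough harness for cross-checking a few of these against the Hugging Face tokenizer for Llama 3. This is a sketch only: the meta-llama/Meta-Llama-3-8B-Instruct model id and access to that gated repo are assumptions, the expected ids are simply the ones quoted above, and the remaining cases can be pasted in the same way as long as whitespace is copied exactly.

# Rough cross-check of a few of the cases above against the Hugging Face
# tokenizer. Assumes access to the gated meta-llama/Meta-Llama-3-8B-Instruct
# repo; any copy of Llama 3's tokenizer.json should behave the same way.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

CASES = [
    ("grabbed", [59312, 2788]),
    (" grabbed", [30418]),
    ("\n", [198]),
    (" \n", [720]),
    ("镇", [104643]),
    ("🦙", [9468, 99, 247]),
    ("This is a test sentence.", [2028, 374, 264, 1296, 11914, 13]),
    # ... add the remaining cases from the list above, copying whitespace exactly
]

for text, expected in CASES:
    got = tok.encode(text, add_special_tokens=False)  # raw BPE ids, no BOS token
    print("ok  " if got == expected else "FAIL", repr(text), expected, got)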

@belladoreai

belladoreai commented Apr 27, 2024

Note regarding the test cases above: I wrote those test cases for LLaMA 1 tokenization and updated the "expected results" for LLaMA 3 (with the exception of the last case, which is copied from the official repo). So those tests are definitely better than nothing, but they are not "optimal test cases" for uncovering potential bugs in LLaMA 3 tokenization. It would be good to find some adversarial inputs specific to LLaMA 3 tokenization and update the tests accordingly.

@nkeilar

nkeilar commented Apr 30, 2024

I just tested the current llama3 models, and despite the changes to support Llama 3 in the latest Ollama release, the official llama3 model that is uploaded still gives the wrong result:

What is 3333 + 777? 

Let me calculate that for you!

33,333 + 777 = 34,110

This is with llama3:70b-instruct-q4_K_M (5338d7c58d8d).

@coder543
Author

@nkeilar what is “the official one”? If you mean ollama… ollama isn’t official, and also they won’t have regenerated their GGUFs yet. Wherever you’re looking, when was the file generated? The changes made require the GGUFs to be generated again. It isn’t just a change to llama.cpp.

I haven’t tried to test the changes, but the discussion around the changes sounded convincing to me.
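One way to tell whether a given GGUF has already been regenerated since the fix is to look for the pre-tokenizer metadata that the updated converter writes. A sketch using llama.cpp's gguf Python package; the key name "tokenizer.ggml.pre" is my understanding of what #6920 adds, so treat it as an assumption and check the PR if it doesn't match:

# Sketch: check whether a GGUF file carries the pre-tokenizer metadata written
# by the updated converter. Assumes `pip install gguf` (llama.cpp's gguf-py)
# and that the relevant key is "tokenizer.ggml.pre"; both are assumptions here.
import sys
from gguf import GGUFReader

reader = GGUFReader(sys.argv[1])  # path to the .gguf file
if "tokenizer.ggml.pre" in reader.fields:
    print("pre-tokenizer metadata present: file was converted after the fix")
else:
    print("no pre-tokenizer metadata: file likely needs to be re-converted")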

@coder543
Author

Someone here apparently had good luck with a new GGUF of the 8B model.

@Jeximo
Contributor

Jeximo commented Apr 30, 2024

llama.cpp + 8B IQ3_M:

[screenshot: correct answer]

@coder543
Author

@Jeximo is that a 3-bit quantization? 🤔

regardless, looks correct.

@ddh0
Contributor

ddh0 commented Apr 30, 2024

After pulling the latest llama.cpp repo and the latest Llama 3 8B Instruct HF repo, then converting and quantizing, I can confirm that the issue is indeed fixed:
[screenshot: Llama 3 working]

@Galunid
Collaborator

Galunid commented May 11, 2024

fixed in #6920

@Galunid Galunid closed this as completed May 11, 2024