tiktoken.encoding_for_model fails for o1 #367

Open

Somerandomguy10111 opened this issue Jan 11, 2025 · 0 comments

Comments

@Somerandomguy10111
When you attempt to retrieve an encoding with the code below, you are met with the following error message: KeyError: 'Could not automatically map o1 to a tokeniser. Please use tiktoken.get_encoding to explicitly get the tokeniser you expect.'

import tiktoken
encoding = tiktoken.encoding_for_model('o1') 

I ran this using tiktoken==0.8.0, i.e. the latest release. I'm not sure whether this is intended behaviour; I know that o1 is just an alias, but I think it would make sense to support aliases.
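As a workaround, the error message itself points at tiktoken.get_encoding. A minimal sketch, assuming the bare o1 alias should use the o200k_base encoding (the same one the o1- prefix maps to, see below), with the explicit encoding used only when the automatic lookup fails; the helper name is purely illustrative, not part of tiktoken:

import tiktoken

def encoding_for_model_with_fallback(model: str, fallback: str = "o200k_base"):
    # Hypothetical helper: try the automatic mapping first, then fall back
    # to an explicitly named encoding for bare aliases like 'o1'.
    try:
        return tiktoken.encoding_for_model(model)
    except KeyError:
        return tiktoken.get_encoding(fallback)

encoding = encoding_for_model_with_fallback('o1')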

The reason this fails is that the MODEL_PREFIX_TO_ENCODING dictionary, which maps model prefixes to encodings, contains the entry

"o1-": "o200k_base",

So tiktoken.encoding_for_model('o1-') and complete o1 model names work without issue. Either the dash could be dropped from the prefix entry, or o1 could be added to the MODEL_TO_ENCODING dictionary explicitly.
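A sketch of the second option, assuming the dictionaries live in tiktoken/model.py as in the current source (only the added line would change):

MODEL_TO_ENCODING: dict[str, str] = {
    # ... existing entries ...
    "o1": "o200k_base",  # proposed: map the bare alias explicitly
}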
