tiktoken.encoding_for_model fails for o1 #367

Open

Somerandomguy10111 opened this issue Jan 11, 2025 · 0 comments

Comments

@Somerandomguy10111
When you attempt to retrieve an encoding with the code below, you are met with the following error message: KeyError: 'Could not automatically map o1 to a tokeniser. Please use tiktoken.get_encoding to explicitly get the tokeniser you expect.'

import tiktoken
encoding = tiktoken.encoding_for_model('o1') 

I ran this using tiktoken==0.8.0, i.e. the latest release. I'm not sure whether this is intended behaviour; I know that o1 is just an alias, but I think it would make sense to support aliases.
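As a workaround, the error message itself points at tiktoken.get_encoding. A minimal sketch, assuming the bare o1 alias should use the o200k_base encoding (the same one the o1- prefix maps to, see below), with the explicit encoding used only when the automatic lookup fails; the helper name is purely illustrative, not part of tiktoken:

import tiktoken

def encoding_for_model_with_fallback(model: str, fallback: str = "o200k_base"):
    # Hypothetical helper: try the automatic mapping first, then fall back
    # to an explicitly named encoding for bare aliases like 'o1'.
    try:
        return tiktoken.encoding_for_model(model)
    except KeyError:
        return tiktoken.get_encoding(fallback)

encoding = encoding_for_model_with_fallback('o1')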

The reason this fails is that the MODEL_PREFIX_TO_ENCODING dictionary, which maps model prefixes to encodings, contains the entry

"o1-": "o200k_base",

So tiktoken.encoding_for_model('o1-') and complete o1 model names work without issue. Either the dash could be dropped from the prefix entry, or o1 could be added to the MODEL_TO_ENCODING dictionary explicitly.
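A sketch of the second option, assuming the dictionaries live in tiktoken/model.py as in the current source (only the added line would change):

MODEL_TO_ENCODING: dict[str, str] = {
    # ... existing entries ...
    "o1": "o200k_base",  # proposed: map the bare alias explicitly
}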
