GPT-4o has a basic bug: junk text found in the newest token vocabulary, and in testing GPT-4o talks nonsense and hallucinates #297

Closed
alexhmyang opened this issue May 18, 2024 · 3 comments

Comments

@alexhmyang

[Five WeChat screenshots taken on 2024-05-17]

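For example, one junk token in the vocabulary is "微信公众号天天中彩票" (roughly "WeChat official account: win the lottery every day"). If you enter "微信公众号天天中彩票 是什么意思" ("what does 微信公众号天天中彩票 mean?") on the official GPT-4o site, it starts talking nonsense. For instance, it answered: ["微信娱乐代理" ("WeChat entertainment agent") may be a WeChat activity or group involving adult content. "成人视频" ("adult video") refers to services that may contain adult videos or live-streamed content.] As you can see, the answer has nothing to do with the question.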
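The report does not say how the junk entry was found. One way to look for similar entries is to scan the `o200k_base` encoding (the one tiktoken uses for GPT-4o) for long tokens made only of Chinese characters. The sketch below is an illustration, not the reporter's method. The helper names `is_cjk`, `long_cjk_tokens` and `o200k_vocab`, the 5-character threshold, and the restriction to the CJK Unified Ideographs block (U+4E00 to U+9FFF) are all assumptions.

```python
from typing import Iterable, Iterator, List, Tuple


def is_cjk(ch: str) -> bool:
    # Assumption: only the CJK Unified Ideographs block counts as "Chinese" here.
    return "\u4e00" <= ch <= "\u9fff"


def long_cjk_tokens(
    vocab: Iterable[Tuple[int, bytes]], min_chars: int = 5
) -> List[Tuple[int, str]]:
    """Return (token_id, text) for tokens that are runs of >= min_chars CJK characters,
    longest first."""
    results = []
    for token_id, raw in vocab:
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            # Byte-level tokens can hold partial UTF-8 sequences; skip them.
            continue
        stripped = text.strip()  # ignore a leading/trailing space in the token
        if len(stripped) >= min_chars and all(is_cjk(c) for c in stripped):
            results.append((token_id, stripped))
    results.sort(key=lambda item: len(item[1]), reverse=True)
    return results


def o200k_vocab() -> Iterator[Tuple[int, bytes]]:
    """Yield (token_id, bytes) for every id in tiktoken's o200k_base encoding."""
    # Imported lazily: needs `pip install tiktoken`, and the first call downloads the encoding.
    import tiktoken

    enc = tiktoken.get_encoding("o200k_base")
    for token_id in range(enc.n_vocab):
        try:
            yield token_id, enc.decode_single_token_bytes(token_id)
        except KeyError:
            # Some ids (e.g. between the regular and special tokens) map to nothing.
            continue
```

With tiktoken installed, `long_cjk_tokens(o200k_vocab())[:50]` lists the longest candidates. To check the example from this report directly, inspect `tiktoken.get_encoding("o200k_base").encode("微信公众号天天中彩票")`: if the phrase really is one vocabulary entry, the result should be a single token id.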

@hauntsaninja
Collaborator

I believe the folks who chose the vocabulary for GPT-4o are now aware of this, maybe they'll ship a patch to the vocab or GPT-4o or both.

This is known as the solidgoldmagikarp problem, named after a similarly problematic token from the GPT-2 vocabulary.

hauntsaninja closed this as not planned on May 18, 2024
@echo-valor

How did you get its Chinese vocabulary?

@alexhmyang
Author

> I believe the folks who chose the vocabulary for GPT-4o are now aware of this, maybe they'll ship a patch to the vocab or GPT-4o or both.
>
> This is known as the solidgoldmagikarp problem, named after a similarly problematic token from the GPT-2 vocabulary.

I hope they notice this issue and fix it. I think this is also relevant to the OpenAI safety team.

3 participants