Corpus classification #64

Open
zz00t opened this issue Feb 12, 2025 · 1 comment

Comments

@zz00t

zz00t commented Feb 12, 2025

Is there a directory or some similar listing of corpus categories for the corresponding data?

@esbatmop
Owner

esbatmop commented Feb 17, 2025

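1. A small amount of categorized data is available on Hugging Face.
2. "To keep providing dataset updates and downloads over the long term, and to avoid copyright disputes as far as possible, this dataset does not provide an index or classification of the data inside the archives."
3. All data inside the archives has been cleaned into 7 corpus formats: https://wiki.mnbvc.org/doku.php/%E7%8E%B0%E6%9C%89%E8%AF%AD%E6%96%99%E6%A0%BC%E5%BC%8F
4. After an archive is extracted, every subdirectory carries a suffix in its directory name that describes the category of the corpus inside that directory.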
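
Following point 4, one way to see the categories is simply to list the subdirectory names of an extracted archive and read the suffixes. The sketch below is only an illustration: the function name `list_corpus_subdirs` and the example path are hypothetical, and because the thread does not specify the exact suffix format, the code prints the names as-is rather than trying to parse them.

```python
from pathlib import Path


def list_corpus_subdirs(root):
    """Return the sorted relative paths of every subdirectory under root.

    The category information is expected to be readable from the directory
    names themselves; no assumption is made about the suffix format.
    """
    root_path = Path(root)
    return sorted(
        p.relative_to(root_path).as_posix()
        for p in root_path.rglob("*")
        if p.is_dir()
    )


# Example usage (the path is a placeholder for wherever an archive was extracted):
# for name in list_corpus_subdirs("path/to/extracted_archive"):
#     print(name)
```

Only directories are listed, so the output stays short even when an archive contains many data files.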
