GitHub - ymcui/cmrc2018: A Span-Extraction Dataset for Chinese Machine Reading Comprehension (CMRC 2018)

Chinese | English

This repository contains the data for the Second Evaluation Workshop on Chinese Machine Reading Comprehension (CMRC 2018). We will present our paper at EMNLP 2019.

Title: A Span-Extraction Dataset for Chinese Machine Reading Comprehension
Authors: Yiming Cui, Ting Liu, Wanxiang Che, Li Xiao, Zhipeng Chen, Wentao Ma, Shijin Wang, Guoping Hu
Link: https://www.aclweb.org/anthology/D19-1600/
Venue: EMNLP-IJCNLP 2019

Open Challenge Leaderboard (New!)

Keep track of the latest state-of-the-art systems on the CMRC 2018 dataset.
https://ymcui.github.io/cmrc2018/

CMRC 2018 Public Datasets

Please download the CMRC 2018 public datasets via the following CodaLab Worksheet.
https://worksheets.codalab.org/worksheets/0x92a80d2fab4b4f79a2b4064f7ddca9ce

Submission Guidelines

If you would like to test your model on the hidden test and challenge sets, please follow the instructions on how to submit your model via the CodaLab worksheet.
https://worksheets.codalab.org/worksheets/0x96f61ee5e9914aee8b54bd11e66ec647/

**Note that the test set on CLUE is NOT the complete test set. If you wish to evaluate your model OFFICIALLY on CMRC 2018, you should follow the guidelines here.**

Quick Load Through 🤗datasets

You can also access this dataset as part of the HuggingFace datasets library as follows:

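# Install the library (notebook syntax; from a shell, drop the leading "!")
!pip install datasets
from datasets import load_dataset
# Download and load the CMRC 2018 dataset
dataset = load_dataset('cmrc2018')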

More details on the options and usage of this library can be found in the nlp repository at https://github.com/huggingface/nlp
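Because CMRC 2018 is a span-extraction dataset, each answer should be recoverable by slicing the passage at a character offset. The sketch below illustrates this on a single record. It assumes the SQuAD-style field names `context`, `question`, and `answers` (with parallel `text` and `answer_start` lists); these names are an assumption about the loaded data, so verify them against what `load_dataset` actually returns. The sample record and the helper `extract_spans` are hypothetical and made up for illustration; they are not taken from the dataset or this repository.

```python
from typing import Dict, List


def extract_spans(record: Dict) -> List[str]:
    """Recover each answer by slicing the context at its character offset."""
    context = record["context"]
    answers = record["answers"]
    spans = []
    for text, start in zip(answers["text"], answers["answer_start"]):
        # A span-extraction answer is a contiguous substring of the context.
        spans.append(context[start:start + len(text)])
    return spans


# Hypothetical record for illustration only (NOT from CMRC 2018).
# Field names are assumed to follow the SQuAD-style layout.
sample = {
    "id": "EXAMPLE_0",
    "context": "哈尔滨工业大学位于黑龙江省哈尔滨市。",
    "question": "哈尔滨工业大学位于哪里?",
    "answers": {"text": ["黑龙江省哈尔滨市"], "answer_start": [9]},
}

spans = extract_spans(sample)
print(spans)
```

If the assumed layout holds, the same helper could be applied to a loaded record (for example `dataset["train"][0]`) as a quick sanity check that every sliced span matches its stored answer text.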

Reference

If you wish to use our data in your research, please cite:

@inproceedings{cui-emnlp2019-cmrc2018,
    title = "A Span-Extraction Dataset for {C}hinese Machine Reading Comprehension",
    author = "Cui, Yiming  and
      Liu, Ting  and
      Che, Wanxiang  and
      Xiao, Li  and
      Chen, Zhipeng  and
      Ma, Wentao  and
      Wang, Shijin  and
      Hu, Guoping",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D19-1600",
    doi = "10.18653/v1/D19-1600",
    pages = "5886--5891",
}

International Standard Language Resource Number (ISLRN)

ISLRN: 013-662-947-043-2

http://www.islrn.org/resources/resources_info/7952/

Official HFL WeChat Account

Follow Joint Laboratory of HIT and iFLYTEK Research (HFL) on WeChat.

Contact us

Please submit an issue.
