TODO
    all (41859) -> minus15 (29487)
                |
                -> plus15 -> train (9372)
                          |
                          -> dev (1500)
                          |
                          -> test (1500)
Download the latest version of words.hk data from the download page. Then run:
    gzip -d all-*.csv.gz
    python extract.py
    python split_train_dev_test.py
    python split_15.py
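
The split sizes listed above (9372 train, 1500 dev, 1500 test) add up to 12372, which is also the difference between `all` (41859) and `minus15` (29487). As a rough illustration of how a fixed-size dev/test split like this could be produced, here is a minimal sketch. It is not the actual content of `split_train_dev_test.py`: the function signature, the seed, and the placeholder data are all assumptions made for the example.

```python
import random


def split_train_dev_test(items, dev_size=1500, test_size=1500, seed=0):
    """Shuffle items deterministically, then cut off fixed-size dev and test sets.

    Hypothetical helper for illustration only; the real script may differ.
    """
    if dev_size + test_size > len(items):
        raise ValueError("not enough items for the requested dev/test sizes")
    shuffled = list(items)                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    dev = shuffled[:dev_size]
    test = shuffled[dev_size:dev_size + test_size]
    train = shuffled[dev_size + test_size:]  # everything left over is training data
    return train, dev, test


# Placeholder data with the same size as the plus15 subset (9372 + 1500 + 1500).
plus15 = [f"sentence-{i}" for i in range(12372)]
train, dev, test = split_train_dev_test(plus15)
print(len(train), len(dev), len(test))  # 9372 1500 1500
```

Fixing the dev and test sizes and letting the training set absorb the remainder keeps evaluation sets comparable if the upstream data grows in a later download.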
- Hong Kong Cantonese Corpus (HKCanCor)
- Department of Linguistics, The University of Hong Kong (香港大學語言學系)
- Ms. 林璃蝶
- Can Cheng
- 昭源字體 Chiron Fonts
- Dr. 劉擇明