Chinese Word Segmentation

Task

Chinese word segmentation is the task of splitting Chinese text (a sequence of Chinese characters) into words.

Example:

'上海浦东开发与建设同步' → ['上海', '浦东', '开发', '与', '建设', '同步']

Systems

♠ marks systems that use character unigrams as input. ♣ marks systems that use character bigrams as input.

  • Ma et al. (2018): BiLSTM-CRF + hyper-params search♠♣
  • Yang et al. (2017): Transition-based + Beam-search + Rich pretrain♠♣
  • Zhou et al. (2017): Greedy Search + word context♠
  • Chen et al. (2017): BiLSTM-CRF + adv. loss♠♣
  • Cai et al. (2017): Greedy Search + Span representation♠
  • Kurita et al. (2017): Transition-based + Joint model♠
  • Liu et al. (2016): neural semi-CRF♠
  • Cai and Zhao (2016): Greedy Search♠
  • Chen et al. (2015a): Gated Recursive NN♠♣
  • Chen et al. (2015b): BiLSTM-CRF♠♣
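The unigram and bigram inputs marked with ♠ and ♣ can be illustrated with a minimal sketch. The function names, the choice to pair each character with its right neighbor, and the `</s>` padding symbol are assumptions made for illustration; the listed systems each build their input features in their own way.

```python
def char_unigrams(text):
    """Return one unigram per character of the input string."""
    return list(text)


def char_bigrams(text, pad="</s>"):
    """Return one bigram per character: (character, right neighbor).

    The last character is paired with a padding symbol. Both the pairing
    direction and the padding symbol are illustrative assumptions.
    """
    padded = list(text) + [pad]
    return [(padded[i], padded[i + 1]) for i in range(len(text))]


if __name__ == "__main__":
    sentence = "上海浦东"
    print(char_unigrams(sentence))
    print(char_bigrams(sentence))
```

Both functions return one feature per character position, so the two sequences can be aligned and fed to a model side by side.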

Evaluation

Metrics

F1-score
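As a rough sketch of how a word-level F1-score can be computed for segmentation: each word is mapped to its character span, and a predicted word counts as correct only if the same span appears in the gold segmentation. The helper names `to_spans` and `segmentation_f1` are hypothetical, and the sketch scores a single sentence; the figures reported below are corpus-level percentages, and the official scoring procedure used by each paper may differ in detail.

```python
def to_spans(words):
    """Map a list of words to a set of (start, end) character offsets."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_f1(gold, pred):
    """Word-level F1 between a gold and a predicted segmentation.

    Both inputs are lists of words covering the same character sequence.
    """
    gold_spans = to_spans(gold)
    pred_spans = to_spans(pred)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = ["上海", "浦东", "开发", "与", "建设", "同步"]
    # Hypothetical system output that wrongly merges "与" and "建设".
    pred = ["上海", "浦东", "开发", "与建设", "同步"]
    print(round(segmentation_f1(gold, pred), 4))  # 4 of 5 predicted and 4 of 6 gold words match: 8/11
```

Comparing spans rather than word strings keeps repeated words at different positions from being confused with each other.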

Dataset

Chinese Treebank 6

| Model | F1 | Paper / Source | Code |
| --- | --- | --- | --- |
| Ma et al. (2018) | 96.7 | State-of-the-art Chinese Word Segmentation with Bi-LSTMs | |
| Yang et al. (2017) | 96.2 | Neural Word Segmentation with Rich Pretraining | |
| Zhou et al. (2017) | 96.2 | Word-Context Character Embeddings for Chinese Word Segmentation | |
| Chen et al. (2017) | 96.2 | Adversarial Multi-Criteria Learning for Chinese Word Segmentation | Github |
| Liu et al. (2016) | 95.5 | Exploring Segment Representations for Neural Segmentation Models | Github |
| Chen et al. (2015b) | 96.0 | Long Short-Term Memory Neural Networks for Chinese Word Segmentation | Github |

Chinese Treebank 7

| Model | F1 | Paper / Source | Code |
| --- | --- | --- | --- |
| Ma et al. (2018) | 96.6 | State-of-the-art Chinese Word Segmentation with Bi-LSTMs | |
| Kurita et al. (2017) | 96.2 | Neural Joint Model for Transition-based Chinese Syntactic Analysis | |

AS

| Model | F1 | Paper / Source | Code |
| --- | --- | --- | --- |
| Ma et al. (2018) | 96.2 | State-of-the-art Chinese Word Segmentation with Bi-LSTMs | |
| Yang et al. (2017) | 95.7 | Neural Word Segmentation with Rich Pretraining | |
| Cai et al. (2017) | 95.3 | Fast and Accurate Neural Word Segmentation for Chinese | Github |
| Chen et al. (2017) | 94.8 | Adversarial Multi-Criteria Learning for Chinese Word Segmentation | Github |

CityU

| Model | F1 | Paper / Source | Code |
| --- | --- | --- | --- |
| Ma et al. (2018) | 97.2 | State-of-the-art Chinese Word Segmentation with Bi-LSTMs | |
| Cai et al. (2017) | 95.6 | Fast and Accurate Neural Word Segmentation for Chinese | Github |
| Chen et al. (2017) | 95.6 | Adversarial Multi-Criteria Learning for Chinese Word Segmentation | Github |

PKU

| Model | F1 | Paper / Source | Code |
| --- | --- | --- | --- |
| Ma et al. (2018) | 96.1 | State-of-the-art Chinese Word Segmentation with Bi-LSTMs | |
| Cai et al. (2017) | 95.8 | Fast and Accurate Neural Word Segmentation for Chinese | Github |
| Chen et al. (2017) | 94.3 | Adversarial Multi-Criteria Learning for Chinese Word Segmentation | Github |
| Liu et al. (2016) | 95.7 | Exploring Segment Representations for Neural Segmentation Models | Github |
| Cai and Zhao (2016) | 95.7 | Neural Word Segmentation Learning for Chinese | Github |

MSR

| Model | F1 | Paper / Source | Code |
| --- | --- | --- | --- |
| Ma et al. (2018) | 98.1 | State-of-the-art Chinese Word Segmentation with Bi-LSTMs | |
| Cai et al. (2017) | 97.1 | Fast and Accurate Neural Word Segmentation for Chinese | Github |
| Chen et al. (2017) | 96.0 | Adversarial Multi-Criteria Learning for Chinese Word Segmentation | Github |
| Liu et al. (2016) | 97.6 | Exploring Segment Representations for Neural Segmentation Models | Github |
| Cai and Zhao (2016) | 96.4 | Neural Word Segmentation Learning for Chinese | Github |
