This repository has been archived by the owner on May 10, 2023. It is now read-only.

Cleanup rules for Thai #324

Closed
bact opened this issue Aug 30, 2020 · 1 comment · Fixed by #325
Labels
enhancement New feature or request

Comments

@bact
Contributor

bact commented Aug 30, 2020

  • remove
    • periods and spaces at the beginning and end of a sentence
    • orphan periods
    • zero-width characters (these occur in some Thai texts, introduced by the input method or by some processing)
  • add
    • a space before and after Maiyamok, ?, and !
  • condense
    • repeated characters into a single character: spaces, Maiyamok
  • normalize
    • Sara E + Sara E -> Sara Ae
    • Nikhahit + (Tone marks +) Sara Aa -> (Tone marks +) Sara Am
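A rough sketch of how these rules could be chained in Python, assuming plain regular expressions applied to one sentence at a time. The function name `clean_thai`, the rule order, the exact set of zero-width code points, and the reading of "orphan period" as a period with whitespace on both sides are all assumptions for illustration. This is not the project's actual implementation.

```python
import re

# Thai code points used by the rules (Unicode Thai block U+0E00..U+0E7F).
SARA_E = "\u0e40"    # เ
SARA_AE = "\u0e41"   # แ
SARA_AA = "\u0e32"   # า
SARA_AM = "\u0e33"   # ำ
NIKHAHIT = "\u0e4d"  # ํ
MAIYAMOK = "\u0e46"  # ๆ
TONE_MARKS = "\u0e48\u0e49\u0e4a\u0e4b"  # mai ek, mai tho, mai tri, mai chattawa

# Assumption: the set of invisible characters treated as "zero-width".
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")
# Sara E typed twice in place of Sara Ae.
DOUBLE_SARA_E = re.compile(SARA_E + SARA_E)
# Nikhahit, an optional tone mark, then Sara Aa (the tone mark is captured).
NIKHAHIT_SARA_AA = re.compile(NIKHAHIT + "([" + TONE_MARKS + "]?)" + SARA_AA)
# Two or more Maiyamok, possibly separated by whitespace.
REPEATED_MAIYAMOK = re.compile(MAIYAMOK + r"(?:\s*" + MAIYAMOK + ")+")
# Maiyamok, ? or ! together with any whitespace around it.
SPACED_PUNCT = re.compile(r"\s*([" + MAIYAMOK + r"?!])\s*")
# Assumption: an "orphan" period is one with whitespace on both sides.
ORPHAN_PERIOD = re.compile(r"(?<=\s)\.(?=\s)")
# Any run of whitespace (this also turns newlines into a single space).
WHITESPACE_RUN = re.compile(r"\s+")


def clean_thai(sentence: str) -> str:
    """Hypothetical cleanup pass applying the rules listed in this issue."""
    # Remove zero-width characters first so they cannot split a pair below.
    s = ZERO_WIDTH.sub("", sentence)
    # Normalize: Sara E + Sara E -> Sara Ae.
    s = DOUBLE_SARA_E.sub(SARA_AE, s)
    # Normalize: Nikhahit + (tone mark +) Sara Aa -> (tone mark +) Sara Am.
    s = NIKHAHIT_SARA_AA.sub(lambda m: m.group(1) + SARA_AM, s)
    # Condense repeated Maiyamok into one.
    s = REPEATED_MAIYAMOK.sub(MAIYAMOK, s)
    # Add exactly one space before and after Maiyamok, ? and !.
    s = SPACED_PUNCT.sub(r" \1 ", s)
    # Remove orphan periods.
    s = ORPHAN_PERIOD.sub("", s)
    # Condense repeated spaces into one.
    s = WHITESPACE_RUN.sub(" ", s)
    # Remove leading and trailing spaces and periods.
    return s.strip(" .")


if __name__ == "__main__":
    print(clean_thai("\u0e40\u0e40\u0e01 \u0e14\u0e35\u0e46\u0e46 ."))  # prints "แก ดี ๆ"
```

Zero-width characters are removed before normalization so that a hidden U+200B between two Sara E characters does not block the Sara Ae rule. Read literally, the trailing-period rule also drops the final period of an abbreviation such as พ.ศ. when it ends the sentence.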

--

See the guidelines from the Office of the Royal Society

See also validation rules #318

@bact changed the title from "Add cleanup for Thai" to "Cleanup rules for Thai" on Aug 30, 2020
@bact
Contributor Author

bact commented Aug 30, 2020
