Welsh preprocessing + one change to common.py #63
JRMeyer commented Jan 29, 2019
- Added a whitespace-collapsing line to common.py, which replaces every run of whitespace with a single space. This should also catch tabs and non-standard whitespace characters.
- Added a check to the Welsh pre-processor that verifies a sentence contains only standard Welsh letters.
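A minimal sketch of what the whitespace-collapsing line could look like, assuming a regex-based approach; the helper name `collapse_whitespace` is hypothetical and the actual line in common.py may differ. In Python 3, `\s` on a `str` pattern matches Unicode whitespace, so tabs, newlines, and characters such as the no-break space (U+00A0) are all collapsed.

```python
import re

# Hypothetical helper illustrating the common.py change; not the actual code.
# On str patterns, \s matches Unicode whitespace: spaces, tabs, newlines,
# and other characters such as U+00A0 (no-break space).
_WHITESPACE_RUN = re.compile(r"\s+")


def collapse_whitespace(sentence: str) -> str:
    """Replace every run of whitespace with a single ASCII space."""
    return _WHITESPACE_RUN.sub(" ", sentence)


if __name__ == "__main__":
    print(repr(collapse_whitespace("Bore\tda,\u00a0  byd")))  # → 'Bore da, byd'
```

Note that this sketch does not trim leading or trailing whitespace; a `.strip()` call could be added if the pipeline expects that.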
@kdavis-mozilla, there's going to be a whole slew of languages coming your way, and the pre-processing logic is the same for each. I found (thanks to Francesco from Localization) the CLDR GitHub repository, which contains alphabets for many languages: https://github.com/unicode-cldr/cldr-misc-full/tree/master/main. Right now, the LANG.py scripts just check whether a given sentence uses "purely" the CLDR alphabet or contains other characters. This should be a good template for people getting started on their own language.
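The "purely the alphabet" check can be sketched as a per-character membership test. This is an illustration under stated assumptions: the letter set below is a hand-written approximation of the Welsh alphabet rather than the CLDR exemplar data, and `uses_only_alphabet` and the punctuation whitelist are hypothetical names, not taken from the LANG.py scripts.

```python
import unicodedata

# Illustrative approximation of Welsh letters; a real LANG.py would load the
# exemplar characters from the CLDR data instead of hard-coding them.
# Normalizing to NFC makes sure accented letters are single code points.
WELSH_LETTERS = set(unicodedata.normalize(
    "NFC", "abcdefghijlmnoprstuwy" "áàâäéèêëíìîïóòôöúùûüẃẁŵẅýỳŷÿ"))

# Hypothetical whitelist of non-letter characters that are still accepted.
ALLOWED_EXTRA = set(" '-.,!?")


def uses_only_alphabet(sentence, letters=WELSH_LETTERS, extra=ALLOWED_EXTRA):
    """Return True if every character is an allowed letter or symbol."""
    # NFC so precomposed and combining forms compare equal; lower() so
    # capitals are checked against the lowercase alphabet.
    text = unicodedata.normalize("NFC", sentence).lower()
    return all(ch in letters or ch in extra for ch in text)


if __name__ == "__main__":
    print(uses_only_alphabet("Mae'r ŵyn yn y cae."))  # → True
    print(uses_only_alphabet("Zebra"))               # → False ('z' not in set)
```

Because Welsh digraphs such as "ll", "ch", and "dd" are built from single letters already in the set, a per-character test is enough for this kind of filter; it does not validate spelling. Passing a different `letters` set is what would make the same logic reusable across languages.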
There are some issues that should be addressed here, and in all the other PRs too.
Added more info on the issues
LGTM