Incorrect language mapping for character 'ë' #66

Closed
sandernugterenedia opened this issue Jul 20, 2020 · 1 comment
Labels
bug Something isn't working
Milestone

Comments

@sandernugterenedia

For example, the text "Deze patiënt is al een tijd ziek" is detected as sq (Albanian), while "Deze patient is al een tijd ziek" (the same text without the "ë" character) is correctly identified as nl (Dutch). The same happens with longer texts: if the text contains one or more "ë" characters, it is classified as sq, and once those characters are replaced with "e" it is correctly detected as nl.

@pemistahl
Owner

pemistahl commented Jul 20, 2020

When I implemented this, I thought that the letter ë was not part of the main Dutch orthography. But after reading this Wikipedia article, it seems you are right, so I consider this a bug.

Thank you for letting me know about this. I will correct it soon.
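
To illustrate the kind of mapping problem discussed here, below is a minimal Python sketch. It is not Lingua's actual code or data: the tables `CHAR_TO_LANGUAGES_BEFORE` and `CHAR_TO_LANGUAGES_AFTER`, the function `candidate_languages`, and the three-language set `ALL` are all hypothetical names made up for this example. The assumption is simply that a character listed for only one language rules out every other language as soon as it appears in the text.

```python
# Hypothetical character-to-language tables (illustration only, not Lingua's data).
# "Before": ë is assumed to be listed for Albanian only.
CHAR_TO_LANGUAGES_BEFORE = {"ë": {"sq"}}
# "After": ë is assumed to be listed for both Albanian and Dutch.
CHAR_TO_LANGUAGES_AFTER = {"ë": {"sq", "nl"}}

# A tiny, made-up set of supported languages (ISO 639-1 codes).
ALL = {"nl", "sq", "en"}


def candidate_languages(text, char_table, all_languages):
    """Narrow the candidate set using every character found in char_table."""
    candidates = set(all_languages)
    for ch in text.lower():
        if ch in char_table:
            # Keep only the languages that are listed for this character.
            candidates &= char_table[ch]
    return candidates


text = "Deze patiënt is al een tijd ziek"
before = candidate_languages(text, CHAR_TO_LANGUAGES_BEFORE, ALL)
after = candidate_languages(text, CHAR_TO_LANGUAGES_AFTER, ALL)

print(sorted(before))  # ['sq']
print(sorted(after))   # ['nl', 'sq']
```

With the first table, Dutch is eliminated before any statistical comparison could take place, which would match the reported behaviour. With the second table, both languages stay in the running and the rest of the text can decide between them.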

@pemistahl pemistahl added the bug Something isn't working label Jul 20, 2020
@pemistahl pemistahl changed the title from "Dutch text with diacritic character ë are identified with language code "sq" instead of "nl"" to "Incorrect language mapping for character 'ë'" Jul 20, 2020
@pemistahl pemistahl added this to the Lingua 1.0.2 milestone Jul 23, 2020