For example, the text "Deze patiënt is al een tijd ziek" is detected as sq (Albanian), while "Deze patient is al een tijd ziek" (the same text without the "ë" character) is correctly identified as nl (Dutch). This also seems to happen for longer texts: if the text contains one or more "ë" characters it is classified as sq, and when these characters are replaced with "e" it is correctly detected as nl.
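To make the comparison above easy to repeat on other inputs, here is a minimal sketch that builds the "without ë" variant of a text using only the Python standard library. The helper name `strip_diacritics` is made up for this illustration, and the sketch deliberately does not call any language detector; feeding both strings to the detector is left to the reader.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining marks removed, e.g. 'ë' becomes 'e'."""
    # NFD splits 'ë' into the base letter 'e' plus U+0308 (combining diaeresis).
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


original = "Deze patiënt is al een tijd ziek"
variant = strip_diacritics(original)
```

Note that this removes every diacritic, not only the diaeresis on "e", so it is meant for reproducing the problem rather than as a fix.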
When I implemented this, I assumed that the letter ë was not part of the main Dutch orthography. After reading this Wikipedia article, though, you seem to be right, so I consider this a bug.
Thank you for letting me know about this. I will correct it soon.
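To illustrate how a single character mapping can cause this kind of misclassification, here is a hypothetical sketch. It is not the library's actual code or data: the rule tables, the `candidate_languages` function and the three-language set are all assumptions made up for this example. The idea is that a character mapped to only one language eliminates every other candidate, so mapping "ë" to Albanian alone rules out Dutch as soon as one "ë" appears.

```python
import unicodedata

# Hypothetical rule tables: each character maps to the languages whose
# orthography is assumed to contain it. Not taken from any real library.
RULES_BUGGY = {"ë": {"sq"}}        # 'ë' treated as unique to Albanian
RULES_FIXED = {"ë": {"sq", "nl"}}  # assumption: Dutch is added to the mapping

ALL_LANGUAGES = {"nl", "sq", "en"}  # tiny made-up candidate set


def candidate_languages(text, rules, languages):
    """Return the languages still possible after applying character rules."""
    remaining = set(languages)
    # Normalize so a decomposed 'e' + diaeresis still matches the rule key.
    for ch in unicodedata.normalize("NFC", text).lower():
        if ch in rules:
            remaining &= rules[ch]
    return remaining


text = "Deze patiënt is al een tijd ziek"
buggy = candidate_languages(text, RULES_BUGGY, ALL_LANGUAGES)
fixed = candidate_languages(text, RULES_FIXED, ALL_LANGUAGES)
```

With the buggy table only "sq" survives, which matches the behavior reported above. With the corrected table both "nl" and "sq" remain, and a real detector would then need some further step (for example a statistical model) to choose between them.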
pemistahl changed the title on Jul 20, 2020
from: Dutch text with diacritic character ë are identified with language code "sq" instead of "nl"
to: Incorrect language mapping for character 'ë'