Incorrect language mapping for character 'ë' #66

sandernugterenedia · 2020-07-20T12:15:25Z

For example, for the text "Deze patiënt is al een tijd ziek" the detected language is sq (Albanian), while for "Deze patient is al een tijd ziek" (the same text without the "ë" character) the language is correctly identified as nl (Dutch). The same seems to happen for longer texts: if the text contains one or more "ë" characters it is classified as sq, and when these characters are replaced with "e" it is correctly detected as nl.


pemistahl · 2020-07-20T13:32:43Z

When I implemented this, I assumed that the letter ë was not part of the main Dutch orthography. But after reading this Wikipedia article, you seem to be right. I consider this a bug then.

Thank you for letting me know about this. I will correct it soon.
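
To illustrate why a single character can decide the result, here is a minimal sketch of a character-to-language rule filter. It is an assumption about the general shape of such a rule, not Lingua's actual code or data: the names `BUGGY_RULES`, `FIXED_RULES`, and `candidates_from_rules` are hypothetical, and the three-language set is chosen only for the example.

```python
# Hypothetical rule tables: each character is treated as evidence for a set
# of languages. Illustration only; this is not Lingua's real mapping.
BUGGY_RULES = {"ë": {"sq"}}        # 'ë' assumed to occur only in Albanian
FIXED_RULES = {"ë": {"sq", "nl"}}  # 'ë' also accepted as a Dutch character


def candidates_from_rules(text, rules, all_languages):
    """Return the languages still possible after applying the character rules."""
    candidates = set(all_languages)
    for ch in text.lower():
        if ch in rules:
            # A rule character narrows the candidates to its mapped languages.
            candidates &= rules[ch]
    return candidates


LANGS = {"nl", "sq", "en"}
with_diaeresis = "Deze patiënt is al een tijd ziek"
without_diaeresis = "Deze patient is al een tijd ziek"

# With the buggy table, one 'ë' leaves Albanian as the only candidate.
print(sorted(candidates_from_rules(with_diaeresis, BUGGY_RULES, LANGS)))
# With the fixed table, Dutch stays in the running for a later scoring step.
print(sorted(candidates_from_rules(with_diaeresis, FIXED_RULES, LANGS)))
# Without 'ë', no rule fires and every language remains a candidate.
print(sorted(candidates_from_rules(without_diaeresis, BUGGY_RULES, LANGS)))
```

In this sketch the rule step only narrows the candidate set. Choosing between the remaining languages would be left to whatever statistical scoring follows, which is not modelled here.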

pemistahl added the bug label Jul 20, 2020

pemistahl changed the title ~~Dutch text with diacritic character ë are identified with language code "sq" instead of "nl"~~ Incorrect language mapping for character 'ë' Jul 20, 2020

pemistahl added this to the Lingua 1.0.2 milestone Jul 23, 2020

pemistahl added a commit that referenced this issue Jul 23, 2020

Fix wrong language mapping for character 'ë' (#66)

4230747

pemistahl added a commit that referenced this issue Jul 23, 2020

Update plots and accuracy table (#66)

bb260f3

pemistahl closed this as completed Jul 23, 2020
