Fix iconv /w locale and UTF-8 input charset #9149
I'm not sure that "Zlutouck'y kun\n" is wrong. Apparently, both "Žluťoučký" and "Žluťoučky" are Czech words but with different meanings (please correct me if I'm wrong), and as such the apostrophe may be added deliberately to be able to distinguish both. Anyhow, I don't think this really depends on the operating system, but rather on the |
I am a native Czech speaker 😃 and But I opened this issue not because of this difference, although it is wrong, but because on regular Linux with glibc the output depends on the locale. As long as the input encoding is UTF-8 and the target encoding is ASCII (so the result should contain only 7-bit ASCII characters), the output must not depend on the locale; see the first example:
|
I don't think this is a PHP issue, and as such we likely can't do anything about it. The locale issue is likely something that would need to be changed in glibc (or we could drop supporting anything but libiconv, but that may cause major headaches for users and distro managers). The transliteration of |
So even the |
I'm not 100% sure about that. One would need to debug, or at least check the glibc |
I am closing this PR as I cannot solve it, but the results are very strange, which makes iconv almost unusable in production as long as a specific locale cannot be guaranteed. |
Can someone please help me understand what is going on with iconv when a locale is set?
demo: https://3v4l.org/RmIZA
on Linux outputs:
on Windows:
on Alpine OS/musl libc:
The input is always in UTF-8, so the result should be independent of the locale. The expected output
Zlutoucky kun
consists of 7-bit ASCII characters only, so the locale should not affect the result. But for some reason it does. On Windows/Alpine OS/FreeBSD the result is independent of the locale, but always wrong. It seems that iconv "parses" the input differently (and wrongly) depending on the locale, even when the input encoding is specified as UTF-8.