Work around bug in `enc2utf8` / `translateCharUTF8` on Windows #287

patperry · 2017-11-08T20:03:14Z

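R's handling of native text is buggy on Windows. Specifically, R marks all Windows-1252 text as Latin-1. This causes problems when converting strings marked "latin1" to "UTF-8": bytes in the range 0x80 to 0x9F are translated to the control characters U+0080 to U+009F instead of the characters Windows-1252 assigns to them. For example, in the input string "You don‘t get “your” money’s worth", the curly quotes are the Windows-1252 bytes 0x91 to 0x94, and they come out as U+0091 to U+0094.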
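The mismatch can be reproduced outside of R. The following is a minimal Python sketch, not R's actual code path: it encodes the example string as Windows-1252 and then decodes the same bytes twice, once as Latin-1 (what the "latin1" mark implies) and once as Windows-1252 (what the bytes really are). The variable names are made up for illustration.

```python
# The example string, written with escapes so the source file stays ASCII:
# U+2018/U+2019 are the single curly quotes, U+201C/U+201D the double ones.
text = "You don\u2018t get \u201cyour\u201d money\u2019s worth"

# Native Windows-1252 bytes, as R would hold them on a Windows-1252 system.
raw = text.encode("cp1252")

# Decoding as Latin-1 maps each byte 0x80..0x9F to the C1 control character
# with the same number, which is the behaviour described in this issue.
wrong = raw.decode("latin-1")

# Decoding as Windows-1252 recovers the original punctuation.
right = raw.decode("cp1252")

print([hex(ord(c)) for c in wrong if ord(c) > 0x7F])
print(right == text)
```

Both decodes succeed without an error, which is why the problem tends to surface only later, when the control characters are displayed or compared.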

More context: https://stat.ethz.ch/pipermail/r-devel/2017-September/074908.html

You can work around this bug by interpreting CE_LATIN1 as Windows-1252 on Windows. Feel free to copy code from https://github.com/patperry/r-utf8/blob/master/src/util.c#L59
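As a rough illustration of that workaround (a sketch in Python, not the code in r-utf8's `util.c` or in stringi), the function below treats bytes marked as Latin-1 as Windows-1252 when running on Windows. The function name and the `on_windows` parameter are hypothetical. Five bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) are unassigned in Python's `cp1252` codec; falling back to Latin-1 for those is an assumption made here, not something the issue specifies.

```python
import sys


def latin1_marked_to_utf8(raw: bytes, on_windows: bool = sys.platform == "win32") -> bytes:
    """Convert bytes marked as Latin-1 to UTF-8 (hypothetical helper).

    On Windows the bytes are interpreted as Windows-1252 instead, since
    that is what "latin1"-marked native text usually contains there.
    """
    if not on_windows:
        # Elsewhere, trust the mark: Latin-1 maps byte N to code point N.
        return raw.decode("latin-1").encode("utf-8")

    chars = []
    for byte in raw:
        one = bytes([byte])
        try:
            chars.append(one.decode("cp1252"))
        except UnicodeDecodeError:
            # Assumption: bytes unassigned in cp1252 keep their Latin-1 meaning.
            chars.append(one.decode("latin-1"))
    return "".join(chars).encode("utf-8")
```

Passing `on_windows` explicitly makes the behaviour testable on any platform, for example `latin1_marked_to_utf8(b"\x93your\x94", on_windows=True)`.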

gagolews · 2017-11-10T10:20:07Z

Hi, I've already worked around that in #270

patperry · 2017-11-10T12:23:19Z

great to hear! sorry for not checking the devel version before posting

gagolews · 2017-11-10T12:24:04Z

no worries!
I'll be filing a CRAN update of stringi today

gagolews closed this as completed Nov 10, 2017
