parsing_json.php improvements #75

Alcaro · 2017-11-27T00:40:37Z

(Apologies if this isn't the right place; I couldn't find a better one.)

Just found http://seriot.ch/parsing_json.php. Great writeup; it's surprising how something so seemingly simple can go wrong in so many ways. I found a few possible improvements:

i_string_iso_latin_1.json | ["E9"]
n_string_invalid_utf-8.json | ["FF"]

As of #30, both are i_ tests (parsers are free to accept or reject them).
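Both files come down to the same problem: a lone byte that is not valid UTF-8. 0xE9 is the lead byte of a three-byte sequence, but here it is followed by a closing quote instead of continuation bytes, and 0xFF never occurs in UTF-8 at all. A minimal Python sketch of how a strict decoder rejects both while a lenient one substitutes U+FFFD; the byte strings are reconstructed from the hex above, assuming each file is exactly ["<byte>"], and strict_decode is a hypothetical helper.

```python
# Reconstructed file contents (assumption: each file is exactly ["<byte>"]).
samples = {
    "i_string_iso_latin_1.json": b'["\xe9"]',
    "n_string_invalid_utf-8.json": b'["\xff"]',
}


def strict_decode(data: bytes):
    """Hypothetical helper: return the text, or None if it is not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


for name, data in samples.items():
    strict = strict_decode(data)
    # A lenient decoder replaces each invalid byte with U+FFFD instead of failing.
    lenient = data.decode("utf-8", errors="replace")
    print(f"{name}: strict={strict!r} lenient={lenient!r}")
```

Since the same decoding failure underlies both files, it makes sense for them to share a category.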

["\uD800\uD800"] makes some parsers go nuts. R jsonlite yields ["\U00010000"], while Ruby parser yields ["F0908080"]. I still don't get where this value comes from.

Overeager decoding of surrogate pairs. \uD800\uDC00 should yield \U00010000. I guess that parser ignores the top 6 bits of the supposed low surrogate and keeps only its low 10 bits, which are zero for both \uD800 and \uDC00. F0908080 is \U00010000 in UTF-8, so the Ruby parser makes the same mistake.
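To make the failure mode concrete, here is a minimal Python sketch contrasting correct surrogate-pair decoding with a hypothetical "overeager" decoder that masks the second code unit to its low 10 bits without checking that it really is a low surrogate. Both function names are illustrative; this is a guess at the bug, not code taken from jsonlite or the Ruby parser.

```python
def decode_pair_strict(hi: int, lo: int) -> int:
    """Combine a UTF-16 surrogate pair, rejecting anything that is not one."""
    if not (0xD800 <= hi <= 0xDBFF):
        raise ValueError(f"{hi:#06x} is not a high surrogate")
    if not (0xDC00 <= lo <= 0xDFFF):
        raise ValueError(f"{lo:#06x} is not a low surrogate")
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)


def decode_pair_overeager(hi: int, lo: int) -> int:
    """Hypothetical buggy decoder: pair decoding is triggered by a valid high
    surrogate, but the next unit is masked to its low 10 bits, so the top
    6 bits that mark it as a low surrogate are never checked."""
    if not (0xD800 <= hi <= 0xDBFF):
        raise ValueError(f"{hi:#06x} is not a high surrogate")
    return 0x10000 + ((hi - 0xD800) << 10) + (lo & 0x3FF)


good = decode_pair_strict(0xD800, 0xDC00)     # a real pair
bad = decode_pair_overeager(0xD800, 0xD800)   # should have been an error
print(hex(good), hex(bad))                    # 0x10000 0x10000
print(chr(bad).encode("utf-8").hex().upper())  # F0908080
```

The strict version raises ValueError for (0xD800, 0xD800), which is the behavior the test file expects a parser to at least not silently "fix" into U+10000.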
