Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix json_encode for unicode(emoji) characters #856

Merged
merged 3 commits into from
Jun 12, 2019
Merged

Fix json_encode for unicode(emoji) characters #856

merged 3 commits into from
Jun 12, 2019

Conversation

najdanovicivan
Copy link
Contributor

Fixes json_encode errors by substituting invalid UTF8 characters. Issue #855

@ezimuel
Copy link
Contributor

ezimuel commented Apr 18, 2019

Unfortunately the JSON_INVALID_UTF8_SUBSTITUTE constant has been introduced by PHP 7.2.
Right now, we are supporting PHP 7.0+ that means we cannot use this constant as you suggested.
Can you change the code to support this feature only for PHP 7.2+?
Moreover, can you provide a unit test?
If you are not familiar with unit testing let me know, I can point you in the right direction.
Thanks!

If PHP version is < 7.2 JSON_INVALID_UTF8_SUBSTITUTE is not defined.

In such case it can be defined to 0 so that it does not have any effect but the code can still be executed without crashing
@najdanovicivan
Copy link
Contributor Author

najdanovicivan commented Apr 30, 2019

I've added define to 0 for JSON_INVALID_UTF8_SUBSTITUTE if it's not already defined. This makes it work properly with PHP < 7.2 as with 0 value it does not have any effect.

I don't see how it will be possible to create those unit test as passing the emoji in the data will always fail without JSON_INVALID_UTF8_SUBSTITUTE on 7.0 and 7.1 with error code 5

This is not a complete solution for the issue. The right way to do it is to emulate the JSON_INVALID_UTF8_SUBSTITUTE on older PHP versions. I took a look at PHP sources to see what it actually does with the data. In php_json_escape_string method each time invalid UTF8 character is found it will be replaced with U+FFFD character.

Emulating this in older PHP version will require iterating through complete data array and checking each string value manually and replacing problematic characters in it. I don't know if it's a good solution doing it that way and how much it will impact the performance.

IMO this feature should be only available for PHP 7.2+ using JSON_INVALID_UTF8_SUBSTITUTE and if older version of PHP is used all string values should manually be checked for invalid UTF8 characters before passing them to the elasticsearch library. Otherwise all string data in the array will have to be checked by the emulation method even if there is no need for it due to not having UTF8 strings in the data.

@najdanovicivan
Copy link
Contributor Author

@ezimuel Any update on this one ?

@ezimuel ezimuel merged commit 2fab9f1 into elastic:master Jun 12, 2019
@ezimuel
Copy link
Contributor

ezimuel commented Jun 12, 2019

@najdanovicivan thanks for your contribution! I'll release it with the new major version 7.0.0 soon.

@j13k
Copy link

j13k commented Aug 9, 2019

@ezimuel, I'm not sure if the branching in this project will allow it, but if this fix could make it into 6.x as well it would be very helpful!

@ezimuel
Copy link
Contributor

ezimuel commented Aug 9, 2019

@j13k yes, I can backport it but you have to consider that this fix will works only using PHP 7.2+.

@j13k
Copy link

j13k commented Aug 9, 2019

@j13k yes, I can backport it but you have to consider that this fix will works only using PHP 7.2+.

I think that's fair, and the "define" shim should provide BC for earlier versions.

BTW, looking at the PHP supported versions page, pre-7.2 won't receive security updates past December, so folks really should be working on upgrading their codebases anyway!

@najdanovicivan najdanovicivan deleted the json-encode-patch branch August 19, 2019 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants