-
Notifications
You must be signed in to change notification settings - Fork 974
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix json_encode for unicode(emoji) characters #856
Conversation
Unfortunately the |
If PHP version is < 7.2 JSON_INVALID_UTF8_SUBSTITUTE is not defined. In such case it can be defined to 0 so that it does not have any effect but the code can still be executed without crashing
I've added define to 0 for JSON_INVALID_UTF8_SUBSTITUTE if it's not already defined. This makes it work properly with PHP < 7.2 as with 0 value it does not have any effect. I don't see how it will be possible to create those unit test as passing the emoji in the data will always fail without JSON_INVALID_UTF8_SUBSTITUTE on 7.0 and 7.1 with error code 5 This is not a complete solution for the issue. The right way to do it is to emulate the JSON_INVALID_UTF8_SUBSTITUTE on older PHP versions. I took a look at PHP sources to see what it actually does with the data. In php_json_escape_string method each time invalid UTF8 character is found it will be replaced with U+FFFD character. Emulating this in older PHP version will require iterating through complete data array and checking each string value manually and replacing problematic characters in it. I don't know if it's a good solution doing it that way and how much it will impact the performance. IMO this feature should be only available for PHP 7.2+ using JSON_INVALID_UTF8_SUBSTITUTE and if older version of PHP is used all string values should manually be checked for invalid UTF8 characters before passing them to the elasticsearch library. Otherwise all string data in the array will have to be checked by the emulation method even if there is no need for it due to not having UTF8 strings in the data. |
@ezimuel Any update on this one ? |
@najdanovicivan thanks for your contribution! I'll release it with the new major version 7.0.0 soon. |
@ezimuel, I'm not sure if the branching in this project will allow it, but if this fix could make it into 6.x as well it would be very helpful! |
@j13k yes, I can backport it but you have to consider that this fix will works only using PHP 7.2+. |
I think that's fair, and the "define" shim should provide BC for earlier versions. BTW, looking at the PHP supported versions page, pre-7.2 won't receive security updates past December, so folks really should be working on upgrading their codebases anyway! |
Fixes json_encode errors by substituting invalid UTF8 characters. Issue #855