
ValueError: mb_detect_encoding(): Argument #2 ($encodings) contains invalid encoding "UTF8" #15824

Closed
nicolas-grekas opened this issue Sep 10, 2024 · 6 comments · Fixed by #15829


@nicolas-grekas
Contributor

Description

The following code: https://3v4l.org/3cnGD/rfc#vgit.master

<?php
echo mb_detect_encoding('abc', 'UTF8, ASCII');

Resulted in this output:

Fatal error: Uncaught ValueError: mb_detect_encoding(): Argument #2 ($encodings) contains invalid encoding "UTF8"

But I expected this output instead:

UTF-8

This is a regression, see https://3v4l.org/3cnGD

And this works if the list is given as an array: https://3v4l.org/lVva0/rfc#vgit.master

PHP Version

PHP 8.4

Operating System

No response

@cmb69
Member

cmb69 commented Sep 10, 2024

Not sure when or why this has been changed, but using "UTF-8" is still supported (https://3v4l.org/8HA6Q/rfc#output).

@youkidearitai
Contributor

Hmm. It seems the name is not matched in mbfl_name2encoding_ex.

mbfl_name2encoding_ex (name=0x7ff545cce8 "UTF8, ASCII", name_len=4) at /home/tekimen/src/php-src/ext/mbstring/libmbfl/mbfl/mbfl_encoding.c:338
338			if (key < sizeof(mbfl_encoding_ptr_list_after_hashing) / sizeof(mbfl_encoding_ptr_list_after_hashing[0])) {
(gdb) n
339				int8_t offset = mbfl_encoding_ptr_list_after_hashing[key];
(gdb) n
340				if (offset >= 0) {
(gdb) n
341					encoding = mbfl_encoding_ptr_list + offset;
(gdb) n
342					if (strncasecmp((*encoding)->name, name, name_len) == 0) {
(gdb) p offset
$28 = 45 '-'
(gdb) p encoding
$29 = (const mbfl_encoding * const *) 0x5556c95f78 <mbfl_encoding_ptr_list+360>
(gdb) p encoding->name
$30 = 0x5556a1f828 "ISO-8859-1"
(gdb) bt
#0  mbfl_name2encoding_ex (name=0x7ff545cce8 "UTF8, ASCII", name_len=4) at /home/tekimen/src/php-src/ext/mbstring/libmbfl/mbfl/mbfl_encoding.c:342
#1  0x00000055558f8464 in php_mb_parse_encoding_list (value=0x7ff545cce8 "UTF8, ASCII", value_length=11, return_list=0x7fffffd5e8, return_size=0x7fffffd5e0, persistent=false, arg_num=2)
    at /home/tekimen/src/php-src/ext/mbstring/mbstring.c:350
#2  0x00000055559077ac in zif_mb_detect_encoding (execute_data=0x7ff5413080, return_value=0x7fffffd688) at /home/tekimen/src/php-src/ext/mbstring/mbstring.c:3428
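The backtrace shows that frame #0 receives the same address as the full list ("UTF8, ASCII", value_length=11) together with name_len=4. In other words, the token "UTF8" is a pointer into the original string plus a length, and it is not NUL-terminated after its fourth byte. The sketch below mimics that behavior to make it concrete. It is a minimal illustration under that assumption, not the actual php_mb_parse_encoding_list code (whose trimming rules may differ), and the names token and split_encoding_list are hypothetical.

```c
#include <stddef.h>

/* Hypothetical token type: a view into the caller's buffer.
 * It is NOT NUL-terminated at ptr[len]. */
typedef struct {
    const char *ptr;
    size_t len;
} token;

/* Split a comma-separated list such as "UTF8, ASCII" into tokens, trimming
 * spaces and tabs around each entry. Nothing is copied: every token points
 * into `list`, so reading past ptr[len - 1] runs into ", ASCII" and beyond.
 * Returns the number of tokens stored (at most `max`). */
static size_t split_encoding_list(const char *list, size_t list_len,
                                  token *tokens, size_t max)
{
    size_t count = 0;
    size_t start = 0;

    for (size_t i = 0; i <= list_len && count < max; i++) {
        /* A token ends at a comma or at the end of the input. */
        if (i == list_len || list[i] == ',') {
            size_t b = start, e = i;
            while (b < e && (list[b] == ' ' || list[b] == '\t')) b++;
            while (e > b && (list[e - 1] == ' ' || list[e - 1] == '\t')) e--;
            if (e > b) { /* skip empty entries */
                tokens[count].ptr = list + b;
                tokens[count].len = e - b;
                count++;
            }
            start = i + 1;
        }
    }
    return count;
}
```

For "UTF8, ASCII" (length 11), the first token's pointer is the start of the whole string and its length is 4, which matches name=0x7ff545cce8 "UTF8, ASCII", name_len=4 in frame #0.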

@cmb69
Member

cmb69 commented Sep 10, 2024

According to git bisect, 3ad422e is the culprit. Maybe @nielsdos can have a look at this issue.

@youkidearitai
Contributor

UTF8 is an alias of UTF-8. Originally, it was matched by the following loop:

/* search aliases */
for (encoding = mbfl_encoding_ptr_list; *encoding; encoding++) {
    if ((*encoding)->aliases) {
        for (const char **alias = (*encoding)->aliases; *alias; alias++) {
            if (strcasecmp(*alias, name) == 0) {
                return *encoding;
            }
        }
    }
}

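However, strcasecmp cannot match it here: name points into the whole string "UTF8, ASCII" and is not NUL-terminated after "UTF8", so strcasecmp compares the alias against "UTF8, ASCII" and ignores name_len.
This is as far as I understand it.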
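To make the failure concrete, here is a minimal sketch of the pre-fix alias search over a hypothetical two-entry table. The table, the encoding type and find_by_alias_unbounded are illustrative stand-ins, not libmbfl's real data structures. Because strcasecmp keeps reading until it reaches a NUL byte, a name that points into "UTF8, ASCII" is compared as the full string and never equals the alias.

```c
#include <stddef.h>
#include <strings.h>

/* Hypothetical miniature encoding table; the real libmbfl table is far larger. */
typedef struct {
    const char *name;
    const char *const *aliases; /* NULL-terminated list, or NULL */
} encoding;

static const char *const utf8_aliases[] = { "utf8", NULL };
static const encoding encodings[] = {
    { "UTF-8", utf8_aliases },
    { "ASCII", NULL },
};

/* Alias search in the style of the pre-fix loop: strcasecmp has no length
 * parameter, so it compares until it hits a NUL byte in `name`. */
static const encoding *find_by_alias_unbounded(const char *name)
{
    for (size_t i = 0; i < sizeof encodings / sizeof encodings[0]; i++) {
        if (encodings[i].aliases) {
            for (const char *const *alias = encodings[i].aliases; *alias; alias++) {
                if (strcasecmp(*alias, name) == 0) {
                    return &encodings[i];
                }
            }
        }
    }
    return NULL;
}
```

With this sketch, find_by_alias_unbounded("UTF8") finds the UTF-8 entry, but passing a pointer to "UTF8, ASCII" returns NULL, because strcasecmp sees ',' where the alias has its terminating NUL.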

@nielsdos
Member

nielsdos commented Sep 10, 2024

The problem is simply that the loops below should use strncasecmp to take into account the length. Will fix this evening.
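A minimal sketch of the length-aware search suggested here, again over a hypothetical table rather than libmbfl's. Switching to strncasecmp alone is not quite enough: strncasecmp(alias, name, name_len) compares only name_len bytes, so a shorter token such as "UTF" (length 3) would match the alias "utf8" as a prefix. The sketch therefore also requires the alias length to equal name_len. The names find_by_alias and encoding are illustrative assumptions.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical miniature encoding table; the real libmbfl table is far larger. */
typedef struct {
    const char *name;
    const char *const *aliases; /* NULL-terminated list, or NULL */
} encoding;

static const char *const utf8_aliases[] = { "utf8", NULL };
static const encoding encodings[] = {
    { "UTF-8", utf8_aliases },
    { "ASCII", NULL },
};

/* Length-aware alias search: `name` is only guaranteed to be valid for
 * `name_len` bytes and need not be NUL-terminated. The strlen check rejects
 * aliases that merely start with the token (e.g. "UTF" vs "utf8"). */
static const encoding *find_by_alias(const char *name, size_t name_len)
{
    for (size_t i = 0; i < sizeof encodings / sizeof encodings[0]; i++) {
        if (!encodings[i].aliases) {
            continue;
        }
        for (const char *const *alias = encodings[i].aliases; *alias; alias++) {
            if (strlen(*alias) == name_len &&
                strncasecmp(*alias, name, name_len) == 0) {
                return &encodings[i];
            }
        }
    }
    return NULL;
}
```

Here find_by_alias("UTF8, ASCII", 4) returns the UTF-8 entry, while find_by_alias("UTF", 3) and find_by_alias("UTF8X", 5) both return NULL.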

youkidearitai added a commit to youkidearitai/php-src that referenced this issue Sep 10, 2024
Changed strcasecmp to strncasecmp.
However, strncasecmp takes the number of bytes to compare as its third parameter.
Hence, added a strlen check to the imap and alias lookups.
youkidearitai added a commit to youkidearitai/php-src that referenced this issue Sep 10, 2024
Changed strcasecmp to strncasecmp.
However, strncasecmp takes the number of bytes to compare as its third parameter.
Hence, added a strlen check to the mime and alias lookups.
@youkidearitai
Contributor

@nielsdos I tried creating a pull request. Feel free to comment. Thank you.
#15829

@nielsdos nielsdos linked a pull request Sep 10, 2024 that will close this issue
youkidearitai added a commit to youkidearitai/php-src that referenced this issue Sep 10, 2024
Changed strcasecmp to strncasecmp.
However, strncasecmp takes the number of bytes to compare as its third parameter.
Hence, added a length check to the mime and alias lookups.

Co-authored-by: Niels Dossche <[email protected]>
youkidearitai added a commit that referenced this issue Sep 11, 2024
Changed strcasecmp to strncasecmp.
However, strncasecmp takes the number of bytes to compare as its third parameter.
Hence, added a length check to the mime and alias lookups.

Co-authored-by: Niels Dossche <[email protected]>
youkidearitai added a commit to youkidearitai/php-src that referenced this issue Sep 11, 2024
cmb69 pushed a commit that referenced this issue Sep 11, 2024
4 participants