Fix GH-15824 mb_detect_encoding() invalid "UTF8" (#15829)
Replace strcasecmp with strncasecmp when looking up an encoding by name.
Because strncasecmp compares only the number of bytes given as its third
argument, also check that the MIME name or alias is exactly name_len bytes long.

Co-authored-by: Niels Dossche <[email protected]>
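
The comparison described in the commit message can be sketched in isolation as follows. This is a minimal illustration, not the libmbfl code: `name_matches` is a hypothetical helper, and it assumes `name` contains no NUL byte within its first `name_len` bytes. It shows why both steps are needed when `name` is a slice of a longer list such as "UTF8, ASCII".

```c
#include <stddef.h>
#include <strings.h> /* strcasecmp, strncasecmp (POSIX) */

/* Hypothetical helper, not part of libmbfl.
 * Returns 1 when the NUL-terminated `candidate` equals the first
 * `name_len` bytes of `name`, ignoring ASCII case; otherwise 0.
 * `name` does not have to be NUL-terminated at name_len.
 * Assumption: no NUL byte occurs inside name[0 .. name_len-1]. */
static int name_matches(const char *candidate, const char *name, size_t name_len)
{
    /* Step 1: the first name_len bytes must agree case-insensitively. */
    if (strncasecmp(candidate, name, name_len) != 0) {
        return 0;
    }
    /* Step 2: candidate must end exactly here; without this check a
     * longer candidate such as "UTF8X" would match the prefix "UTF8". */
    return candidate[name_len] == '\0';
}
```

With name pointing at "UTF8, ASCII" and name_len set to 4, strcasecmp against the alias "UTF8" reports a mismatch because it reads past the fourth byte, while the two-step check above accepts it.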
youkidearitai and nielsdos authored Sep 11, 2024
1 parent db54576 commit dc5f3b9
Showing 2 changed files with 39 additions and 2 deletions.
4 changes: 2 additions & 2 deletions ext/mbstring/libmbfl/mbfl/mbfl_encoding.c
@@ -349,7 +349,7 @@ const mbfl_encoding *mbfl_name2encoding_ex(const char *name, size_t name_len)
/* search MIME charset name */
for (encoding = mbfl_encoding_ptr_list; *encoding; encoding++) {
if ((*encoding)->mime_name) {
- if (strcasecmp((*encoding)->mime_name, name) == 0) {
+ if (strncasecmp((*encoding)->mime_name, name, name_len) == 0 && (*encoding)->mime_name[name_len] == '\0') {
return *encoding;
}
}
@@ -359,7 +359,7 @@ const mbfl_encoding *mbfl_name2encoding_ex(const char *name, size_t name_len)
for (encoding = mbfl_encoding_ptr_list; *encoding; encoding++) {
if ((*encoding)->aliases) {
for (const char **alias = (*encoding)->aliases; *alias; alias++) {
- if (strcasecmp(*alias, name) == 0) {
+ if (strncasecmp(name, *alias, name_len) == 0 && (*alias)[name_len] == '\0') {
return *encoding;
}
}
37 changes: 37 additions & 0 deletions ext/mbstring/tests/gh15824.phpt
@@ -0,0 +1,37 @@
--TEST--
GH-15824 (ValueError: mb_detect_encoding(): Argument #2 ($encodings) contains invalid encoding "UTF8")
--EXTENSIONS--
mbstring
--FILE--
<?php
echo "== alias name ==\n";
var_dump(mb_detect_encoding('abc', 'UTF8, ASCII'));
var_dump(mb_detect_encoding('こんにちは', 'UTF8, ASCII'));
var_dump(mb_detect_encoding('こんにちは', 'ASCII, UTF8'));
var_dump(mb_detect_encoding("\xC5", 'US-ASCII, LATIN4'));
var_dump(mb_detect_encoding("\xC5", 'US-ASCII, cyrillic'));
/* 0x9D is not located in CP1254 */
var_dump(mb_detect_encoding("\x9D", 'US-ASCII, CP1254, cyrillic'));
var_dump(mb_detect_encoding("\x9D", 'US-ASCII, CP1254, cyrillic', false));

echo "== mime name ==\n";
var_dump(mb_detect_encoding('abc', 'ANSI_X3.4-1968, ISO-8859-1'));
var_dump(mb_detect_encoding('abc', 'CP50220, ANSI_X3.4-1968'));
/* last comma is not mistake, intentionally */
var_dump(mb_detect_encoding(bin2hex('1b24422422242424262428242a1b2842'), 'CP50220, ANSI_X3.4-1968,', false));
var_dump(mb_detect_encoding('😄', 'US-ASCII, UTF-8-Mobile#KDDI-B, UTF-8'));
?>
--EXPECT--
== alias name ==
string(5) "UTF-8"
string(5) "UTF-8"
string(5) "UTF-8"
string(10) "ISO-8859-4"
string(10) "ISO-8859-5"
string(10) "ISO-8859-5"
string(10) "ISO-8859-5"
== mime name ==
string(5) "ASCII"
string(7) "CP50220"
string(7) "CP50220"
string(19) "UTF-8-Mobile#KDDI-B"

2 comments on commit dc5f3b9

@nielsdos
Member

@youkidearitai You forgot to add a NEWS entry ;)

@youkidearitai
Contributor Author

Thanks! Created PR #15835.