Commit
Fix ${v//""/str} and multibyte in ${v//~(E)/str} (re: ceae1e4)
Koichi Nakashima (@ko1nksm) reports:

> $ ksh -c 'v=あいう; echo "${v//""/-}"'
> -あ-�-�-い-�-�-う-�-�-
>
> $ ksh -c 'v=あいう; echo "${v//""/-}"' | od -tx1
> 0000000 2d e3 81 82 2d 81 2d 82 2d e3 81 84 2d 81 2d 84
> 0000020 2d e3 81 86 2d 81 2d 86 2d 0a
> 0000032

There are two problems here. The first is that the glob pattern in the
expansion is empty (after removing the double quotes). Since an empty
glob pattern matches nothing, the output string should simply be the
value of v. However, since the referenced commit, the empty pattern is
changed to ~(E)^, an (anchored) empty ERE. Unlike an empty glob
pattern, an empty regular expression matches everything, so a '-'
should be inserted at the start, at the end, and between each
character.

Which brings us to problem two: the lack of multibyte processing in
that scenario, which causes corrupted output.

As of this commit, things work correctly:

    $ ksh -c 'v=あいう; echo "${v//""/-}"'
    あいう
    $ ksh -c 'v=あいう; echo "${v//~(E)/-}"'
    -あ-い-う-

src/cmd/ksh93/sh/macro.c: varsub():
- Only convert an empty pattern to ~(E)^ if the operator character
  is '#'. This should not be done for '/' or '//'. This fixes the
  first problem.
- The case of ${v//~(E)/str} brings us to some code that
  special-cases the behaviour for nmatch && match[1]==0 to avoid an
  infinite loop. Make that code multibyte-aware by calculating the
  size of each character in bytes using mbsize, and advancing by
  that number instead of 1. (In the case of an invalid multibyte
  sequence with mbsize returning -1, default to advancing by one
  byte as before.) This fixes the second problem.

Resolves: #813
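The multibyte-aware advance described for varsub() can be illustrated with a minimal standalone sketch. This is not the code from macro.c: utf8_size() is a hypothetical stand-in for mbsize that assumes UTF-8 input, and insert_between_chars() is a hypothetical helper that mimics what an empty-ERE replacement produces. The point is only the loop step: advance by the character's byte length, and fall back to one byte when the sequence is invalid.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for mbsize (assumes UTF-8): returns the byte length
 * of the character starting at s, or -1 for an invalid or truncated sequence.
 */
static int utf8_size(const char *s)
{
    unsigned char c = (unsigned char)s[0];
    int n, i;

    if (c < 0x80)
        return 1;
    else if ((c & 0xE0) == 0xC0)
        n = 2;
    else if ((c & 0xF0) == 0xE0)
        n = 3;
    else if ((c & 0xF8) == 0xF0)
        n = 4;
    else
        return -1;              /* stray continuation or invalid lead byte */
    for (i = 1; i < n; i++)
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            return -1;          /* also catches a terminating NUL */
    return n;
}

/*
 * Hypothetical helper: write sep at the start, at the end, and between each
 * character of s into out. Returns the number of bytes written (excluding
 * the NUL), or (size_t)-1 if out is too small.
 */
size_t insert_between_chars(const char *s, const char *sep,
                            char *out, size_t outsz)
{
    size_t seplen, used = 0;

    assert(s && sep && out);
    seplen = strlen(sep);
    for (;;) {
        int n;

        if (used + seplen >= outsz)
            return (size_t)-1;
        memcpy(out + used, sep, seplen);
        used += seplen;
        if (!*s)
            break;
        n = utf8_size(s);
        if (n < 1)
            n = 1;              /* invalid sequence: advance one byte */
        if (used + (size_t)n >= outsz)
            return (size_t)-1;
        memcpy(out + used, s, (size_t)n);
        used += (size_t)n;
        s += n;                 /* step over the whole character, not 1 byte */
    }
    out[used] = '\0';
    return used;
}
```

Calling insert_between_chars() on the UTF-8 bytes of あいう with "-" as the separator yields the 13-byte string -あ-い-う-, whereas always advancing by one byte would put a separator inside each three-byte character, as in the reported od dump.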