ksh93: "$*" joins positional parameters on the first byte of $IFS instead of first character #13

Closed
stephane-chazelas opened this issue Apr 27, 2016 · 1 comment
@stephane-chazelas

$ ksh -c 'IFS=é; set : :; echo "$*"' | hd
00000000  3a c3 3a 0a                                       |:.:.|
00000004
$ echo é | hd
00000000  c3 a9 0a                                          |...|
00000003
$ locale charmap
UTF-8

Expected :é: (3a c3 a9 3a 0a)

POSIX says it must be the first character, not the first byte: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_05_02

bash and zsh use the first character.
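
A quick way to compare shells on this point is to wrap the reproducer in a small probe. The sketch below assumes a UTF-8 locale (check `locale charmap`) and a UTF-8 encoded script; `join_star` is a hypothetical helper name, not something any shell provides.

```shell
# join_star IFSVALUE [ARG...]: hypothetical helper, not part of any shell.
# The ( ... ) body runs in a subshell, so the caller's IFS and positional
# parameters are left untouched.
join_star() (
    IFS=$1
    shift
    printf '%s\n' "$*"
)

# With an ASCII IFS every POSIX shell should agree: only the first
# character of IFS is used as the joiner.
join_star ',;' a b c        # prints a,b,c

# The probe itself. The result depends on the shell and the locale.
if [ "$(join_star 'é' : :)" = ':é:' ]; then
    echo 'joined on the first character of IFS'
else
    echo 'joined on something else (probably the first byte)'
fi
```

In a C/POSIX locale, joining on a single byte is expected, so a failing probe there says nothing about this bug.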

dannyweldon added the bug label Mar 5, 2017
@McDutchie
Contributor

McDutchie commented Oct 9, 2017

On the release version (2012-08-01), expanding "$*" while IFS contains a UTF-8 character corrupts ksh93's shell-quoting mechanism, so that anything that produces output suitable for re-entry into the shell has the final quote missing -- even after IFS is restored.

This appears to be fixed on the current beta (2014-12-24), at least as compiled on Linux.

On the release version, the behaviour is as follows:

$ ksh -c 'i=$IFS; IFS=é; set : :; echo "$*"; IFS=$i; trap "echo end" EXIT; trap'
:?:
trap -- 'echo end EXIT
end

Note the missing quote after trap -- 'echo end. You get the same with the output of export -p, alias, etc.

Even using a subshell doesn't avoid the corruption (yay non-forking subshells). However, setting and then unsetting LC_ALL=C seems to be an effective workaround:

$ ksh -c 'IFS=é; set : :; echo "$*"; trap "echo end" EXIT; LC_ALL=C; unset LC_ALL; trap'
:?:
trap -- 'echo end' EXIT
end

McDutchie added a commit to modernish/modernish that referenced this issue Oct 9, 2017
libexec/modernish/cap/BUG_MULTIBIFS.t:
- Added. We're on a UTF-8 locale and the shell supports UTF-8
  characters in general (i.e. we don't have BUG_MULTIBYTE) --
  however, using multibyte characters as IFS field delimiters still
  doesn't work. For example, "$*" joins positional parameters on
  the first byte of $IFS instead of the first character.
  Found on ksh93 and mksh.
  Ref.: att/ast#13
  (On ksh93, only "$*" is affected; on mksh, multibyte IFS
  characters don't work in any context. I'm not bothering with
  separate bug tests; if multibyte IFS characters are broken for
  "$*" they shouldn't be used at all.)

README.md:
- document it
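
On a shell with BUG_MULTIBIFS, one way to follow the advice above is to join arguments without going through IFS at all. A minimal sketch, assuming a hypothetical helper `join_with` (it is not part of modernish or of any shell):

```shell
# join_with SEP [ARG...]: hypothetical helper. Joins the ARGs with SEP by
# plain string concatenation, so neither IFS nor "$*" is involved and the
# separator may be any number of bytes long.
join_with() {
    _jw_sep=$1
    shift
    _jw_out=
    _jw_first=y
    for _jw_arg in "$@"; do
        if [ -n "$_jw_first" ]; then
            _jw_out=$_jw_arg
            _jw_first=
        else
            _jw_out=$_jw_out$_jw_sep$_jw_arg
        fi
    done
    printf '%s\n' "$_jw_out"
}

join_with 'é' : :       # prints :é:
join_with ', ' a b c    # prints a, b, c
```

Because the separator is copied byte for byte, this behaves the same in any locale, at the cost of a loop instead of a single expansion.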
etscrivner added a commit to etscrivner/ast that referenced this issue Jul 28, 2018
Closes att#13. Previously, the `varsub` method used for the macro expansion of
`$param`, `${param}`, and `${param op word}` would incorrectly expand the
internal field separator (IFS) if it was a multibyte character. This was due to
truncation based on the incorrect assumption that the IFS would never be larger
than a single byte.

This change fixes this issue by carefully tracking the number of bytes that
should be persisted in the IFS case and ensuring that all bytes are written
during expansion and substitution.
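
To make the truncation concrete, the sketch below reproduces both byte sequences from the original report without relying on any shell's "$*" implementation. It only illustrates the byte arithmetic and is not the code in the fix; `hex_bytes` is a hypothetical helper.

```shell
# hex_bytes STRING: hypothetical helper; prints STRING's bytes as hex digits.
hex_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

sep=$(printf '\303\251')    # é (U+00E9) in UTF-8: two bytes, c3 a9

# Truncated separator: only the first byte is written between the fields.
first_byte=$(printf '%s' "$sep" | dd bs=1 count=1 2>/dev/null)
hex_bytes ":${first_byte}:"    # prints 3ac33a

# Complete separator: every byte of the first character is written.
hex_bytes ":${sep}:"           # prints 3ac3a93a
```

The first result matches the `3a c3 3a` dump in the report, the second the expected `3a c3 a9 3a`.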
JohnoKing added a commit to JohnoKing/ksh that referenced this issue Jul 25, 2020
This commit fixes BUG_MULTIBIFS, which had two bug reports in the ksh2020 branch.
The modernish regression test suite now only reports eight test failures.

src/cmd/ksh93/sh/macro.c:
- Backport Eric Scrivner's fix for multibyte IFS characters (slightly modified for
  compatibility with C89). Explanation from att#737:

  Previously, the varsub method used for the macro expansion of $param, ${param},
  and ${param op word} would incorrectly expand the internal field separator (IFS)
  if it was a multibyte character. This was due to truncation based on the
  incorrect assumption that the IFS would never be larger than a single byte.

  This change fixes this issue by carefully tracking the number of bytes that
  should be persisted in the IFS case and ensuring that all bytes are written
  during expansion and substitution.

  Bug report: att#13

- Fixed another bug that caused multibyte characters with the same initial byte
  to be treated as the same character by the IFS. This bug was occurring because
  the first byte of a multibyte character wasn't being written to the stack when
  the IFS delimiter had the same initial byte:

  $ IFS=£
  $ v='§'
  $ set -- $v
  $ v="${1-}"
  $ echo "$v" | hd # The first byte should be c2, but it isn't due to the bug
  00000000  a7 0a                                             |..|
  00000002

  Bug report: att#1372

src/cmd/ksh93/tests/variables.sh:
- Add (reworked) regression tests from ksh2020 for the multibyte IFS bugs.
- Add a regression test for att#1372 based on the reproducer.
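
The reproducer for att#1372 works because £ and § share their first UTF-8 byte. A small sketch that shows this, with `lead_byte` as a hypothetical helper; it says nothing about how macro.c itself compares characters:

```shell
# lead_byte STRING: hypothetical helper; prints the first byte of STRING in hex.
lead_byte() {
    printf '%s' "$1" | od -An -tx1 | awk '{ print $1; exit }'
}

pound=$(printf '\302\243')      # £ (U+00A3) in UTF-8: c2 a3
section=$(printf '\302\247')    # § (U+00A7) in UTF-8: c2 a7

lead_byte "$pound"      # prints c2
lead_byte "$section"    # prints c2

# A comparison of the complete strings still tells the two apart.
if [ "$pound" = "$section" ]; then
    echo same
else
    echo different      # this branch runs
fi
```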
McDutchie pushed a commit to ksh93/ksh that referenced this issue Jul 25, 2020
Add support for multibyte characters to $IFS

This commit fixes BUG_MULTIBIFS, which had two bug reports in the ksh2020 branch.

src/cmd/ksh93/sh/macro.c:
- Backport Eric Scrivner's fix for multibyte IFS characters (slightly modified
  for compatibility with C89). Explanation from att#737:

  Previously, the varsub method used for the macro expansion of $param, ${param},
  and ${param op word} would incorrectly expand the internal field separator (IFS)
  if it was a multibyte character. This was due to truncation based on the
  incorrect assumption that the IFS would never be larger than a single byte.

  This change fixes this issue by carefully tracking the number of bytes that
  should be persisted in the IFS case and ensuring that all bytes are written
  during expansion and substitution.

  Bug report: att#13

- Fixed another bug that caused multibyte characters with the same initial byte
  to be treated as the same character by the IFS. This bug was occurring because
  the first byte of a multibyte character wasn't being written to the stack when
  the IFS delimiter had the same initial byte:

  $ IFS=£
  $ v='§'
  $ set -- $v
  $ v="${1-}"
  $ echo "$v" | hd # The first byte should be c2, but it isn't due to the bug
  00000000  a7 0a                                             |..|
  00000002

  Bug report: att#1372

src/cmd/ksh93/tests/variables.sh:
- Add (reworked) regression tests from ksh2020 for the multibyte IFS bugs.
- Add a regression test for att#1372 based on the reproducer.
JohnoKing added a commit to JohnoKing/ksh that referenced this issue Jul 27, 2020
The following is quoted from Marcin Cieślak [*]:
When running under FreeBSD /bin/sh (and not ksh) we get spurious
file named '=' created in the root. This is because the "checksh"
function runs /bin/sh -c '(( .sh.version >= 20111111 ))' which
produces a "=" file with /bin/sh as a side effect.

This bug was reported in att#13, but was closed in error. I
was still getting the "=" file to generate on FreeBSD.

bin/package,
src/cmd/INIT/package.sh:
- Fix the creation of a spurious '=' file by making sure /bin/sh
  has support for (( ... )) arithmetic.

.gitignore:
- Remove the '=' file entry since it no longer has a purpose.

[*]: https://bsd.network/@saper/103196289917156347
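
For context: a shell without the (( ... )) arithmetic command reads `(( .sh.version >= 20111111 ))` as nested subshells running a command named `.sh.version`, with `>` redirecting its output to a file called `=`. The sketch below shows one way to probe for (( ... )) support without that side effect. It is an illustration only and not necessarily what the commit does; `supports_arith_cmd` is a hypothetical helper.

```shell
# supports_arith_cmd SHELL: hypothetical helper; succeeds if SHELL understands
# the (( ... )) arithmetic command. The probe deliberately contains no '>',
# so a shell that parses (( as two nested subshells cannot create a file.
supports_arith_cmd() {
    "$1" -c '(( 1 == 1 ))' >/dev/null 2>&1
}

if supports_arith_cmd /bin/sh; then
    echo '/bin/sh understands (( ... ))'
else
    echo '/bin/sh lacks (( ... )); skip the .sh.version comparison'
fi
```

Only after a probe like this succeeds would it be safe to run a comparison that contains `>=`.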