Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for multibyte characters to $IFS #92

Merged
merged 3 commits into from
Jul 25, 2020

Conversation

JohnoKing
Copy link

This pull request fixes BUG_MULTIBIFS, which had two bug reports in the ksh2020 branch. The modernish regression test suite now only reports eight test failures.

  • Backport Eric Scrivner's fix for multibyte IFS characters (slightly modified for compatibility with C89). Explanation from Fix expansion of multibyte IFS characters att/ast#737:

    Previously, the varsub method used for the macro expansion of $param, ${param}, and ${param op word} would incorrectly expand the internal field separator (IFS) if it was a multibyte character. This was due to truncation based on the incorrect assumption that the IFS would never be larger than a single byte.

    This change fixes this issue by carefully tracking the number of bytes that should be persisted in the IFS case and ensuring that all bytes are written during expansion and substitution.

    Bug report: ksh93: "$*" joins positional parameters on the first byte of $IFS instead of first character att/ast#13

  • Fixed another bug that caused multibyte characters with the same initial byte to be treated as the same character by the IFS. This bug was occurring because the initial byte wasn't being written to the stack when the IFS delimiter and multibyte character had the same initial byte:

    $ IFS=£
    $ v='§'
    $ set -- $v
    $ v="${1-}"
    $ echo "$v" | hd # The first byte should be c2, but it isn't due to the bug
    00000000  a7 0a                                             |..|
    00000002

    The code in question was skipping past the initial byte with continue:

    ksh/src/cmd/ksh93/sh/macro.c

    Lines 2403 to 2406 in 8c16f38

    if(n==S_MBYTE)
    {
    if(sh_strchr(mp->ifsp,cp-1)<0)
    continue;

    This problem is fixed by putting the initial byte on the stack with sfputc before the continue.

    Bug report: IFS: UTF-8 support is incomplete att/ast#1372

This commit fixes BUG_MULTIBIFS, which had two bug reports in the ksh2020 branch.
The modernish regression test suite now only reports eight test failures.

src/cmd/ksh93/sh/macro.c:
- Backport Eric Scrivner's fix for multibyte IFS characters (slightly modified
  for compatibility with C89). Explanation from att#737:

  Previously, the varsub method used for the macro expansion of $param, ${param},
  and ${param op word} would incorrectly expand the internal field separator (IFS)
  if it was a multibyte character. This was due to truncation based on the
  incorrect assumption that the IFS would never be larger than a single byte.

  This change fixes this issue by carefully tracking the number of bytes that
  should be persisted in the IFS case and ensuring that all bytes are written
  during expansion and substitution.

  Bug report: att#13

- Fixed another bug that caused multibyte characters with the same initial byte
  to be treated as the same character by the IFS. This bug was occurring because
  the first byte of a multibyte character wasn't being written to the stack when
  the IFS delimiter had the same initial byte:

  $ IFS=£
  $ v='§'
  $ set -- $v
  $ v="${1-}"
  $ echo "$v" | hd # The first byte should be c2, but it isn't due to the bug
  00000000  a7 0a                                             |..|
  00000002

  Bug report: att#1372

src/cmd/ksh93/tests/variables.sh:
- Add (reworked) regression tests from ksh2020 for the multibyte IFS bugs.
- Add a regression test for att#1372 based on the reproducer.
@McDutchie
Copy link

McDutchie commented Jul 25, 2020

Did you figure out the fix for field splitting (the second fix)? Nice find!

Re:

The modernish regression test suite now only reports eight test failures.

I would hope those are xfails, not actual test failures…

@McDutchie McDutchie merged commit 8b5f11d into ksh93:master Jul 25, 2020
@JohnoKing
Copy link
Author

JohnoKing commented Jul 25, 2020

I would hope those are xfails, not actual test failures…

No need to worry, the test failures are just xfails for bugs like BUG_BRACQUOT.

@JohnoKing JohnoKing deleted the fix-multibyte-ifs branch July 27, 2020 09:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants