Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The letter 'ç' is no longer upper-cased by .toUpperCase from 7.1.0 #9785

Closed
Yomguithereal opened this issue Nov 24, 2016 · 16 comments
Closed
Assignees
Labels
i18n-api Issues and PRs related to the i18n implementation. question Issues that look for answers.

Comments

@Yomguithereal
Copy link

Yomguithereal commented Nov 24, 2016

  • Version: 7.1.0
  • Platform: OSX

Hello @nodejs,

I stumbled upon what seems to be a bug in node.js v7.1.0.

In node v6.9.1 & v7.0.0, we have the following:

'suçais'.toUpperCase()
>>> 'SUÇAIS'

while in node v7.1.0

'suçais'.toUpperCase()
>>> 'SUçAIS'

I noticed this because it made a CI build fail (ref) and broke some of the phonetic algorithms of the Talisman library.

Sorry if this is normal and I did not read the changelogs carefully enough.

Have a good day.

@bnoordhuis bnoordhuis added i18n-api Issues and PRs related to the i18n implementation. question Issues that look for answers. labels Nov 24, 2016
@bnoordhuis
Copy link
Member

Does it work when you install the full-icu package? V8 delegates more to ICU in v7.x and the release binaries only include en_US because of size constraints.

@Yomguithereal
Copy link
Author

Yomguithereal commented Nov 24, 2016

Hello @bnoordhuis, I tried the following:

git clone [email protected]:Yomguithereal/talisman.git
cd talisman
git checkout 003d44a3aaa81155331aaa23bc7708fa489428d6
npm install
npm install full-icu
npm test

I still have the same problem. Did I do it wrong?

@paulgirard
Copy link

I could replicate this bug on Linux Mint :

$uname -a
Linux ### 3.19.0-32-generic #37~14.04.1-Ubuntu SMP Thu Oct 22 09:41:40 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

@bnoordhuis
Copy link
Member

cc @nodejs/intl

@Yomguithereal
Copy link
Author

I tried to build nodejs up to the 0f871e1 commit and uppercasing works as expected. This seems to confirm that the introduction of small-icu in the 3d1766f commit is the source of the problem, as @bnoordhuis said.

@helio-frota
Copy link
Contributor

confirmed here too

[hf@archT440 ~]$ node -v
v7.2.0
[hf@archT440 ~]$ node
> 'aço'.toUpperCase()
'AçO'
> 

@srl295
Copy link
Member

srl295 commented Nov 28, 2016

It's not that commit - it's ICU 58.1. Applying ICU 57 on top of current HEAD also gives the right result. Or rather, some interaction between node/v8 and ICU 58. Because ICU's own toUpper works correctly for v58.1 here.

@srl295
Copy link
Member

srl295 commented Nov 28, 2016

Also configure --without-intl produces Ç

@jungshik
Copy link

The latest v8 (ToT) does not use el-Upper transliterator any more and just uses ICU's uppercase API. I'll test with ToT v8.

@srl295
Copy link
Member

srl295 commented Nov 28, 2016

Seems to be something wrong in v8's optimizations under Runtime_StringToUpperCaseI18N. Commenting out the optimizations produces the correct result. Thanks @jungshik - I may float a patch that just disables these lines for now, to be replaced when v8 is refreshed.

a couple possible workarounds

  • compile: configure --without-intl
  • runtime: node --no_icu_case_mapping

@jungshik
Copy link

I can reproduce the bug with the latest ToT (without reading the bug much, I thought it's about Greek) when --icu_case_mapping flag is on. So, there's obviously a bug. Thank you for the report.

@srl295
Copy link
Member

srl295 commented Nov 28, 2016

@jungshik It's Bug synergy!!

@jungshik
Copy link

There was a typo in ToLatin1Upper() ( E7 should be F7) in runtime-i18n.cc

@jungshik
Copy link

@Yomguithereal
Copy link
Author

Thanks @srl295 and @jungshik. Tell me if I can do anything to help.

@jungshik
Copy link

It's fixed in v8 ToT: https://bugs.chromium.org/p/v8/issues/detail?id=5681&desc=2
Thanks again for the report.

srl295 added a commit to srl295/node that referenced this issue Dec 30, 2016
Original commit message:
  Fix the uppercasing of U+00E7(ç) and U+00F7(÷)
  Due to a typo in runtime-i18n.js, 'ç'(U+00E7) was not uppercased
  while '÷'(U+00F7) was incorrectly uppercased to '×'(U+00D7).

  Add a comprehensive test for Latin-1 supplemental block
  (U+00A0 ~ U+00FF). (they're special-cased for speed-up and
  needs to have a test for the range.).

  TEST=intl/general/case-mapping
  BUG=v8:5681

  Review-Url: https://codereview.chromium.org/2533033003
  Cr-Commit-Position: refs/heads/master@{nodejs#41331}

PR-URL: nodejs#9828
Fixes: nodejs#9785
srl295 added a commit to srl295/node that referenced this issue Dec 30, 2016
* add test for ç/Ç in various locales

Reviewed-by: TBD
PR-URL: nodejs#9828
Fixes: nodejs#9785
MylesBorins pushed a commit that referenced this issue Jan 6, 2017
* add test for ç/Ç in various locales

PR-URL: #9828
Fixes: #9785
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Ali Ijaz Sheikh <[email protected]>
Reviewed-By: Myles Borins <[email protected]>
italoacasas pushed a commit to italoacasas/node that referenced this issue Jan 18, 2017
* add test for ç/Ç in various locales

PR-URL: nodejs#9828
Fixes: nodejs#9785
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Ali Ijaz Sheikh <[email protected]>
Reviewed-By: Myles Borins <[email protected]>
italoacasas pushed a commit to italoacasas/node that referenced this issue Jan 19, 2017
* add test for ç/Ç in various locales

PR-URL: nodejs#9828
Fixes: nodejs#9785
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Ali Ijaz Sheikh <[email protected]>
Reviewed-By: Myles Borins <[email protected]>
targos pushed a commit that referenced this issue Jan 28, 2017
Original commit message:
  Fix the uppercasing of U+00E7(ç) and U+00F7(÷)
  Due to a typo in runtime-i18n.js, 'ç'(U+00E7) was not uppercased
  while '÷'(U+00F7) was incorrectly uppercased to '×'(U+00D7).

  Add a comprehensive test for Latin-1 supplemental block
  (U+00A0 ~ U+00FF). (they're special-cased for speed-up and
  needs to have a test for the range.).

  TEST=intl/general/case-mapping
  BUG=v8:5681

  Review-Url: https://codereview.chromium.org/2533033003
  Cr-Commit-Position: refs/heads/master@{#41331}

PR-URL: #9828
Fixes: #9785
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Ali Ijaz Sheikh <[email protected]>
Reviewed-By: Myles Borins <[email protected]>
targos pushed a commit that referenced this issue Jan 28, 2017
* add test for ç/Ç in various locales

PR-URL: #9828
Fixes: #9785
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Ali Ijaz Sheikh <[email protected]>
Reviewed-By: Myles Borins <[email protected]>
italoacasas pushed a commit to italoacasas/node that referenced this issue Jan 30, 2017
Original commit message:
  Fix the uppercasing of U+00E7(ç) and U+00F7(÷)
  Due to a typo in runtime-i18n.js, 'ç'(U+00E7) was not uppercased
  while '÷'(U+00F7) was incorrectly uppercased to '×'(U+00D7).

  Add a comprehensive test for Latin-1 supplemental block
  (U+00A0 ~ U+00FF). (they're special-cased for speed-up and
  needs to have a test for the range.).

  TEST=intl/general/case-mapping
  BUG=v8:5681

  Review-Url: https://codereview.chromium.org/2533033003
  Cr-Commit-Position: refs/heads/master@{nodejs#41331}

PR-URL: nodejs#9828
Fixes: nodejs#9785
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Ali Ijaz Sheikh <[email protected]>
Reviewed-By: Myles Borins <[email protected]>
italoacasas pushed a commit to italoacasas/node that referenced this issue Jan 30, 2017
* add test for ç/Ç in various locales

PR-URL: nodejs#9828
Fixes: nodejs#9785
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Ali Ijaz Sheikh <[email protected]>
Reviewed-By: Myles Borins <[email protected]>
italoacasas pushed a commit to italoacasas/node that referenced this issue Jan 30, 2017
Original commit message:
  Fix the uppercasing of U+00E7(ç) and U+00F7(÷)
  Due to a typo in runtime-i18n.js, 'ç'(U+00E7) was not uppercased
  while '÷'(U+00F7) was incorrectly uppercased to '×'(U+00D7).

  Add a comprehensive test for Latin-1 supplemental block
  (U+00A0 ~ U+00FF). (they're special-cased for speed-up and
  needs to have a test for the range.).

  TEST=intl/general/case-mapping
  BUG=v8:5681

  Review-Url: https://codereview.chromium.org/2533033003
  Cr-Commit-Position: refs/heads/master@{nodejs#41331}

PR-URL: nodejs#9828
Fixes: nodejs#9785
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Ali Ijaz Sheikh <[email protected]>
Reviewed-By: Myles Borins <[email protected]>
italoacasas pushed a commit to italoacasas/node that referenced this issue Jan 30, 2017
* add test for ç/Ç in various locales

PR-URL: nodejs#9828
Fixes: nodejs#9785
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Ali Ijaz Sheikh <[email protected]>
Reviewed-By: Myles Borins <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
i18n-api Issues and PRs related to the i18n implementation. question Issues that look for answers.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants