Should we map the code if the type is region? #81
ICU internally already defaults to replacing deprecated region codes, and it's not possible to disable this behaviour: https://github.com/unicode-org/icu/blob/f917c43cf153bfca7ffd60fc1cdcbb32360967ce/icu4c/source/common/locresdata.cpp#L97-L102 CLDR also doesn't provide localised names for deprecated region codes, e.g. there's no entry for "SU" to localise it to "Soviet Union".

Furthermore, the linked ICU source code shows some issues when we don't properly canonicalise upfront before calling into ICU:

    d8> new Intl.DisplayNames("en", {type:"region", style:"narrow"}).of("GB")
    "UK"
    d8> new Intl.DisplayNames("en", {type:"region", style:"narrow"}).of("UK")
    "United Kingdom"

whereas in SpiderMonkey, with explicit canonicalisation:

    js> new Intl.DisplayNames("en", {type:"region", style:"narrow"}).of("GB")
    "UK"
    js> new Intl.DisplayNames("en", {type:"region", style:"narrow"}).of("UK")
    "UK"

(ICU only handles the string

If we don't want to use ad-hoc steps to canonicalise a standalone region subtag, we could prepend the language subtag "und" to get a proper Unicode BCP 47 locale identifier and then canonicalise that. For example, when the region subtag is "SU", we prepend "und" to get "und-SU", canonicalise "und-SU" to get "und-RU", and then extract the region subtag from "und-RU" to get "RU".

Apart from that, canonicalisation will also help implementations call into ICU properly, because ICU expects at least case-canonicalised inputs. (Too lazy to properly report this bug. 😄)

    d8> new Intl.DisplayNames("en", {type:"region", style:"narrow"}).of("su")
    "su"
    d8> new Intl.DisplayNames("en", {type:"region", style:"narrow"}).of("SU")
    "Russia"
@anba OK, that makes sense. Any suggestion on how we should spec out such a mapping process?
I dug into this a little more. I do not think ICU performs such a mapping for all codes, as @anba said. I think there is some mapping there, but not the mapping in UTS 35. I believe ICU does not map the following
(Hmm, does this further strengthen my argument to perform an explicit canonicalisation before calling into ICU, because then we don't need to rely on some hard-coded values in ICU? 😄)
Here are the meeting notes from our discussion at the 2020-07-09 ECMA-402 meeting:
FYT: Anba suggested that we canonicalize the code, not just in terms of casing but in terms of aliases. The tricky part is canonicalizing the region code and the script code. There isn't a mapping defined for this in UTS 35. If we do this, we need some way to spec it out clearly. Do we want to perform this additional mapping or just the casing change?
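To make the two options from the meeting notes concrete, here is a hedged JavaScript sketch; `caseOnlyRegion` and `aliasRegion` are hypothetical names. Case canonicalisation of a region subtag only upper-cases alphabetic codes (three-digit codes are unchanged), while alias canonicalisation additionally replaces deprecated codes, which this sketch delegates to the `Intl.Locale` constructor on an "und-" identifier.

```javascript
// Option A (hypothetical helper): casing change only.
function caseOnlyRegion(region) {
  if (!/^(?:[A-Za-z]{2}|[0-9]{3})$/.test(region)) {
    throw new RangeError(`Invalid region subtag: ${region}`);
  }
  // Alphabetic region subtags are canonically upper case;
  // toUpperCase() leaves three-digit UN M.49 codes untouched.
  return region.toUpperCase();
}

// Option B (hypothetical helper): casing plus alias replacement.
// The Intl.Locale constructor canonicalises its input, which includes
// replacing deprecated region subtags with their preferred value.
function aliasRegion(region) {
  return new Intl.Locale(`und-${caseOnlyRegion(region)}`).region;
}

for (const code of ["su", "gb", "419"]) {
  console.log(code, caseOnlyRegion(code), aliasRegion(code));
}
```

The two options only diverge for deprecated codes: "su" becomes "SU" under option A but "RU" under option B.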
Nope, ICU doesn't support it: #81 (comment)

For cases like the "UK" one outlined in #81 (comment), I'll probably keep complete canonicalisation in SpiderMonkey, even if the spec only requires case canonicalisation, but return only the case-normalised code if no localised name is present. For example, "su" will still return "Russia" in SpiderMonkey, but if there's no localised name for "RU", the case-normalised "SU" will be returned (instead of "RU").
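The behaviour described above (full canonicalisation, but falling back to the case-normalised input when no localised name exists) can be approximated in userland JavaScript. This is a sketch, not SpiderMonkey's actual implementation; `regionDisplayName` is a hypothetical name, and it relies on the `fallback: "none"` option of `Intl.DisplayNames`, which makes `of()` return `undefined` when there is no localised name.

```javascript
// Hypothetical helper approximating the fallback behaviour described above.
function regionDisplayName(locale, code, style = "long") {
  if (!/^(?:[A-Za-z]{2}|[0-9]{3})$/.test(code)) {
    throw new RangeError(`Invalid region subtag: ${code}`);
  }
  // Case-normalised input, e.g. "su" -> "SU".
  const caseNormalized = code.toUpperCase();
  // Fully canonicalised code, e.g. "SU" -> "RU" (via "und-SU" -> "und-RU").
  const canonical = new Intl.Locale(`und-${caseNormalized}`).region;
  const names = new Intl.DisplayNames(locale, {
    type: "region",
    style,
    fallback: "none", // return undefined instead of echoing the code
  });
  // Prefer the localised name of the canonical code; otherwise return
  // the case-normalised input rather than the canonical code.
  return names.of(canonical) ?? caseNormalized;
}

console.log(regionDisplayName("en", "su")); // "Russia"
```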
Yes, exactly that.

Discussion from 2020-09-10:
This proposal is now at stage 4 as of the September 2020 TC39 meeting. If you still feel a need to map the code when the type is region, please file a new issue in the v2 repo. I am closing this issue now.
In #77 (comment), @anba suggested:
"Region (and script) subtags should also get canonicalised to replace outdated subtags with their preferred value."
This issue tracks only the region part, since the issue with scripts is different.
I have a concern about this (canonicalizing the region code): there is no predefined process in UTS 35 for it. The process for the region subtag within unicode_language_id, stated in https://unicode-org.github.io/cldr/ldml/tr35.html#Canonical_Unicode_Locale_Identifiers, depends on the language code (and the script code, if present) whenever the replacement attribute of a territoryAlias lists multiple territories.
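A hedged JavaScript sketch of why the region replacement depends on the language: when a territoryAlias replacement lists several regions, UTS 35 uses the most likely region for the language (and script, if present) if it appears in the list, and otherwise the first listed region. The `TERRITORY_ALIAS` table is a hypothetical, hand-copied excerpt covering only "SU" (verify it against CLDR's supplementalMetadata before relying on it); likely subtags come from `Intl.Locale.prototype.maximize()`.

```javascript
// Hypothetical excerpt of CLDR territoryAlias data: "SU" maps to
// several successor regions, listed in CLDR's order.
const TERRITORY_ALIAS = {
  SU: ["RU", "AM", "AZ", "BY", "EE", "GE", "KZ", "KG",
       "LV", "LT", "MD", "TJ", "TM", "UA", "UZ"],
};

// Pick the replacement for `region` in the context of `language`
// (and optional `script`), following the multi-region rule.
function replaceRegion(language, script, region) {
  const candidates = TERRITORY_ALIAS[region];
  if (candidates === undefined) return region; // not a known alias
  if (candidates.length === 1) return candidates[0];
  // Most likely region for the language (+ script), e.g. "hy" -> "AM".
  const tag = script ? `${language}-${script}` : language;
  const likely = new Intl.Locale(tag).maximize().region;
  return candidates.includes(likely) ? likely : candidates[0];
}

console.log(replaceRegion("hy", undefined, "SU"));  // "AM"
console.log(replaceRegion("ru", undefined, "SU"));  // "RU"
console.log(replaceRegion("und", "Armn", "SU"));    // "AM"
console.log(replaceRegion("und", undefined, "SU")); // "RU" ("und" -> US, not listed)
```

The same deprecated code yields different regions depending on the language, so a standalone region subtag has no single canonical replacement until some language (for example "und") is assumed.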