Differences between RFC 6067 and UTR 35 Unicode extension canonicalization #43

anba · 2018-05-21T17:54:14Z

When comparing https://tools.ietf.org/html/rfc6067#section-2.1.1 against https://www.unicode.org/reports/tr35/#u_Extension, UTR 35 contains the following two additional canonicalization steps:

All keys and types use the canonical form (from the name attribute; see Section 3.6.4 U Extension Data Files).
Type value "true" is removed.

The current proposal only implements canonicalization per RFC 6067. We should document the difference between RFC 6067 and UTR 35.

I'm not sure we want to apply the canonicalization steps from UTR 35: removing "true" type values is easy, but the requirement to replace deprecated keys and types needs more thought. For example, when the time zone (tz) Unicode extension key is canonicalized, we may want to ensure the result is consistent with 6.4.2 CanonicalizeTimeZoneName.
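
To make the two extra UTR 35 steps concrete, here is a minimal sketch in Python. It is not the spec algorithm: the function name `canonicalize_u_extension` and the `TYPE_ALIASES` table are hypothetical, the alias entries are a tiny assumed subset of the CLDR data, and attributes are simply kept in input order. It lowercases the extension, sorts keywords by key, replaces aliased types, and drops a "true" type value.

```python
# Tiny illustrative subset of type aliases; the real data lives in the CLDR
# bcp47 data files. These two entries are assumptions made for this example.
TYPE_ALIASES = {
    ("ca", "islamicc"): "islamic-civil",
    ("ks", "primary"): "level1",
}


def canonicalize_u_extension(ext: str) -> str:
    """Canonicalize the body of a -u- extension, e.g. 'kn-true-ca-islamicc'."""
    attributes = []   # subtags that appear before the first key
    keywords = {}     # key -> list of type subtags
    current = None    # type subtag list of the key being read

    for subtag in ext.lower().split("-"):
        if len(subtag) == 2:
            # Two-letter subtags are keys. If a key repeats, its later
            # occurrence is collected into a throwaway list and ignored.
            current = []
            keywords.setdefault(subtag, current)
        elif current is None:
            attributes.append(subtag)
        else:
            current.append(subtag)

    parts = list(attributes)
    for key in sorted(keywords):
        value = "-".join(keywords[key])
        # Extra UTR 35 step 1: use the canonical form of the type.
        value = TYPE_ALIASES.get((key, value), value)
        parts.append(key)
        # Extra UTR 35 step 2: a type value of "true" is removed.
        if value and value != "true":
            parts.append(value)
    return "-".join(parts)


if __name__ == "__main__":
    print(canonicalize_u_extension("kn-true-ca-islamicc"))
```

The "true" removal is a one-line check, while the alias step depends entirely on how complete and current the lookup table is, which is where the open question about time zone data comes in.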

littledan · 2018-06-11T12:26:28Z

It's curious that these don't quite match. Let's discuss the mismatches at the next Intl meeting for more context.

srl295 · 2018-06-15T17:39:02Z

@yumaoka @aphillips ?

yumaoka · 2018-06-15T18:36:49Z

I agree that 'Type value "true" is removed' is a mismatch. I'm not 100% sure the first point, the one related to deprecation, is a deviation, but I think it should be clearly stated somewhere.

littledan · 2018-06-24T23:02:26Z

@yumaoka For the mismatch over whether "true" is removed, which alternative should ECMA-402 choose?

aphillips · 2018-06-25T21:34:52Z

I think ECMA-402 should remove true: it makes the tag longer and adds no value. The RFC was written before that behavior was defined.

littledan · 2018-07-14T16:58:47Z

OK, from @aphillips and @yumaoka's comments, it sounds like we should just reference UTS 35.

> For example when the time zone tz Unicode extension key is canonicalized, we may want to ensure the result is consistent with 6.4.2 CanonicalizeTimeZoneName.

I'm not sure, but you might be referring here to possible slight differences between the time zone normalization in the tz database and in CLDR (I don't know about these differences, but IIRC you've filed bugs about this sort of thing in the past). If so, I'd imagine that these edge-case database differences don't matter much, and it should be OK to use the CLDR database in a practical implementation. I'd be interested in any other thoughts on that topic.

Previously, the algorithm was spelled out directly, based on a reading of RFC 6067. It turns out that the RFC is a bit out of date, and a more current algorithm is found in UTS #35. Rather than copying that algorithm here, this patch simply provides a normative reference from one standard to another. Closes #43

This was referenced Jul 14, 2018

Normative: Cite UTS #35 for canonicalizing Unicode extension tags #48

Merged

Normative: Represent keywords in resolvedOptions().locale #37

Closed

littledan closed this as completed in #48 Jul 28, 2018

jswalden mentioned this issue Nov 16, 2019

Update references to match current UTS 35 spec #77

Closed
