This repository has been archived by the owner on Jan 25, 2022. It is now read-only.

Differences between RFC 6067 and UTR 35 Unicode extension canonicalization #43

Closed
anba opened this issue May 21, 2018 · 6 comments

Comments

@anba
Contributor

anba commented May 21, 2018

When comparing https://tools.ietf.org/html/rfc6067#section-2.1.1 against https://www.unicode.org/reports/tr35/#u_Extension, UTR 35 contains the following two additional canonicalization steps:

- Deprecated keys and types are replaced.
- Type value "true" is removed.

The current proposal only implements canonicalization per RFC 6067. We should document the difference between RFC 6067 and UTR 35.

I'm not sure we want to apply the canonicalization steps from UTR 35: removing "true" type values is easy, but the requirement to replace deprecated keys and types needs more thought. For example, when the time zone tz Unicode extension key is canonicalized, we may want to ensure the result is consistent with 6.4.2 CanonicalizeTimeZoneName.
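To make the two extra steps concrete, here is a minimal Python sketch of what a UTR 35-style canonicalization of a `u` extension could look like. It is an illustration under stated assumptions, not the algorithm from either specification: the function name `canonicalize_u_extension` and the `DEPRECATED_TYPES` table are hypothetical, the table holds a single sample entry rather than the full CLDR alias data, attributes are left in their original order, and time zone (`tz`) values are not aligned with CanonicalizeTimeZoneName.

```python
# Illustrative sample only (assumption): a real implementation would load
# the complete deprecated key/type data from CLDR instead of hard-coding it.
DEPRECATED_TYPES = {("ca", "islamicc"): "islamic-civil"}


def canonicalize_u_extension(ext: str) -> str:
    """Canonicalize a Unicode extension such as 'u-kn-true-ca-islamicc'."""
    subtags = ext.lower().split("-")
    if subtags[0] != "u":
        raise ValueError("expected a 'u' extension")

    attributes = []   # subtags that appear before the first key
    keywords = {}     # key -> list of type subtags
    current = None    # type list of the key being parsed
    for subtag in subtags[1:]:
        if len(subtag) == 2:
            # A two-character subtag starts a new keyword. If the key was
            # already seen, its types go into a throwaway list so that the
            # first occurrence wins.
            current = []
            keywords.setdefault(subtag, current)
        elif current is None:
            attributes.append(subtag)
        else:
            current.append(subtag)

    # Attributes are kept in their original order in this sketch.
    parts = ["u"] + attributes
    for key in sorted(keywords):  # keywords are ordered by key
        value = "-".join(keywords[key])
        # Extra step 1: replace deprecated types (sample table only).
        value = DEPRECATED_TYPES.get((key, value), value)
        parts.append(key)
        # Extra step 2: drop the type value "true".
        if value and value != "true":
            parts.append(value)
    return "-".join(parts)
```

With this sketch, `canonicalize_u_extension("u-kn-true-ca-islamicc")` returns `"u-ca-islamic-civil-kn"`, whereas sorting by key alone would keep the tag as `u-ca-islamicc-kn-true`.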

@littledan
Member

It's curious that these don't quite match. Let's discuss the mismatches at the next Intl meeting for more context.

@srl295
Member

srl295 commented Jun 15, 2018

@yumaoka @aphillips ?

@yumaoka

yumaoka commented Jun 15, 2018

I agree that 'Type value "true" is removed.' is a mismatch. I'm not 100% sure that the first point, the one about deprecation, is a deviation, but I think it should be clearly stated somewhere.

@littledan
Member

@yumaoka For the mismatch about whether "true" is removed, which alternative should ECMA-402 select?

@aphillips

I think ECMA-402 should remove true: it makes the tag longer and adds no value. The RFC was written before that behavior was defined.

@littledan
Member

OK, from @aphillips and @yumaoka's comments, it sounds like we should just reference UTS 35.

> For example, when the time zone tz Unicode extension key is canonicalized, we may want to ensure the result is consistent with 6.4.2 CanonicalizeTimeZoneName.

I'm not sure, but here you might be referring to possible slight differences between time zone normalization in the tz database and in CLDR (I don't know what these differences are, but IIRC you've filed bugs about this sort of thing in the past). If so, I'd imagine that these edge-case database differences don't matter much, and it should be OK to use the CLDR database in a practical implementation. I'd be interested in any other thoughts on that topic.

littledan added a commit that referenced this issue Jul 14, 2018
Previously, the algorithm was defined in terms of direct algorithms,
based on a reading of RFC 6067. It turns out that RFC is a bit out
of date, and a more current algorithm is found in UTS #35. Rather
than copying that algorithm here, this patch simply provides a
normative reference from one standard to another.

Closes #43
littledan added a commit that referenced this issue Jul 28, 2018