This repository has been archived by the owner on Jan 25, 2022. It is now read-only.

Ensure 'calendar', 'collation', and 'numberingSystem' are valid Unicode extension types #23

Merged
merged 1 commit into from
Apr 14, 2018

Conversation

anba
Contributor

@anba anba commented Apr 12, 2018

At a minimum, we want to restrict the Unicode extension options to match the syntax [(3*8alphanum) *("-" (3*8alphanum))], which is the syntax for Unicode extension type values per http://www.unicode.org/reports/tr35/#Unicode_locale_identifier. The [(3*8alphanum) *("-" (3*8alphanum))] notation was used because we already refer to the RFC 5646 ABNF in other parts of this proposal.
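As a rough illustration of what that syntax accepts, here is a minimal sketch that translates the ABNF into a regular expression. The names `SUBTAG`, `OPTIONAL_TYPE`, and `matchesOptionalType` are hypothetical and do not come from the patch; the sketch assumes `alphanum` means ASCII letters and digits, as in RFC 5646, and reads the outer `[ ]` as ABNF's "optional" marker.

```typescript
// Hypothetical helper names; none of these appear in the spec text.
// alphanum = ALPHA / DIGIT (RFC 5646), so one subtag is 3 to 8 ASCII alphanumerics.
const SUBTAG = "[A-Za-z0-9]{3,8}";

// [(3*8alphanum) *("-" (3*8alphanum))]
// The outer [ ] makes the whole sequence optional, so the empty string matches too.
const OPTIONAL_TYPE = new RegExp(`^(?:${SUBTAG}(?:-${SUBTAG})*)?$`);

function matchesOptionalType(value: string): boolean {
  return OPTIONAL_TYPE.test(value);
}

console.log(matchesOptionalType("gregory"));       // single subtag
console.log(matchesOptionalType("islamic-civil")); // two subtags joined by "-"
console.log(matchesOptionalType(""));              // accepted only because of the outer [ ]
console.log(matchesOptionalType("ab"));            // too short for a type subtag
```

Note that this only checks the shape of the value. Whether a given type is actually registered for a key (the question tracked in #19) is a separate check.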

In #19, we may or may not further restrict the possible values for 'calendar', 'collation', and 'numberingSystem', but this is the minimum restriction we need to apply in any case.

@littledan
Member

I'm not sure if we might want to iterate on this editorially (this is the first time we include BNF directly in the spec text), but these are the semantics that I was hoping for. Thanks for writing this patch.

@littledan
Member

@anba What was the reasoning for including the [] around the grammar here? It looks to me like the ABNF definition in the link you provided is 3*8alphanum *(sep 3*8alphanum), without the brackets.

(Not sure what I meant by "(this is the first time we include BNF directly in the spec text)" since @anba referenced earlier usages.)

@anba
Contributor Author

anba commented May 16, 2018

I've made the type optional to match its usage in complete language tags, where absent type subtags are valid; cf. keyword = key (sep type)? in http://www.unicode.org/reports/tr35/#Unicode_locale_identifier and https://tools.ietf.org/html/rfc6067#section-2.1:

A 'keyword' is a sequence of subtags consisting of a 'key' subtag,
followed by zero or more 'type' subtags (so a 'key' might appear
alone and not be accompanied by a 'type' subtag).

But I don't have a strong opinion about allowing empty strings as type values. We could probably just as well disallow them and require always using the long form, i.e. "true". (IIRC the empty string type is simply a shorthand form for "true" and nothing more.)
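To make the keyword = key (sep type)? rule concrete, here is a small sketch that splits the keyword part of a -u- extension into keys and their optional types. `Keyword` and `parseKeywords` are hypothetical names used only for illustration. The sketch assumes the input is just the keyword portion (for example the text after "en-u-"), takes key = alphanum alpha from UTS 35, and deliberately ignores attributes and any registry lookup.

```typescript
// Hypothetical types and function, for illustration only.
interface Keyword {
  key: string;
  type: string; // "" when the key appears alone, with no type subtags
}

function parseKeywords(extension: string): Keyword[] {
  const keywords: Keyword[] = [];
  for (const subtag of extension.toLowerCase().split("-")) {
    if (/^[a-z0-9][a-z]$/.test(subtag)) {
      // A two-character key starts a new keyword; its type may stay empty.
      keywords.push({ key: subtag, type: "" });
    } else if (/^[a-z0-9]{3,8}$/.test(subtag) && keywords.length > 0) {
      // A 3 to 8 character subtag extends the type of the most recent key.
      const current = keywords[keywords.length - 1];
      current.type = current.type === "" ? subtag : `${current.type}-${subtag}`;
    } else {
      // Simplification: attributes before the first key are rejected here.
      throw new RangeError(`Unexpected subtag: ${subtag}`);
    }
  }
  return keywords;
}

console.log(parseKeywords("ca-gregory-nu-latn-kn"));
```

In this sketch the trailing "kn" comes back with an empty type, which is the key-without-type case that the quoted RFC 6067 text permits.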

@littledan
Member

Is something like en-u-ca-true or en-u-ca alone valid? I thought some of these only work semantically with a particular type provided.

@anba
Contributor Author

anba commented May 21, 2018

Hmm, I guess it depends on how "valid" is defined. For example, using the definition from https://www.unicode.org/reports/tr35/#Unicode_Locale_Extension_Data_Files and https://www.unicode.org/reports/tr35/#Old_Locale_Extension_Syntax, "en-u-ca-true" is not valid, just as "en-u-ca-mycal" is not valid, because neither "true" nor "mycal" is a registered type for the Unicode extension key "ca".

The special handling for "true" seems to be defined in https://www.unicode.org/reports/tr35/#Key_And_Type_Definitions_, which doesn't restrict it to only certain Unicode extension keys.

But when going through the relevant extension keys for Intl.Locale again, it seems like allowing empty strings is not really useful. So right now I'd lean towards no longer allowing empty strings.

@littledan
Member

We discussed this question in the May 2018 Intl meeting. Since these options don't have reasonable defaults, we decided to throw when a value is not provided.
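A minimal sketch of that outcome, under a few stated assumptions: `ExtensionOptions` and `validateExtensionOptions` are hypothetical names, "a value is not provided" is read as the empty-string case from this thread, an option left out entirely (`undefined`) is assumed to be ignored, and `RangeError` is assumed as the error type because it is the usual choice for invalid option values in Intl APIs.

```typescript
// Hypothetical names; the actual spec text expresses this as abstract steps.
type ExtensionOptions = {
  calendar?: string;
  collation?: string;
  numberingSystem?: string;
};

// Unbracketed form of the grammar: at least one 3 to 8 character alphanumeric
// subtag is required, so the empty string no longer matches.
const TYPE_SEQUENCE = /^[A-Za-z0-9]{3,8}(?:-[A-Za-z0-9]{3,8})*$/;

function validateExtensionOptions(options: ExtensionOptions): void {
  const names = ["calendar", "collation", "numberingSystem"] as const;
  for (const name of names) {
    const value = options[name];
    // Assumption: an omitted option is fine; a present but malformed one throws.
    if (value !== undefined && !TYPE_SEQUENCE.test(value)) {
      throw new RangeError(`Invalid ${name}: "${value}"`);
    }
  }
}

validateExtensionOptions({ calendar: "gregory", numberingSystem: "latn" }); // passes
try {
  validateExtensionOptions({ calendar: "" });
} catch (e) {
  console.log(e instanceof RangeError); // the empty string is rejected
}
```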
