Normative: remove tables of Unicode property values and aliases #2649
This is great! 🥳
What’s the timeline on getting the accepted proposal reflected “on paper” in the Unicode Standard? We don’t necessarily need to wait for that to happen (given the consensus) but it’d be useful to know.
spec.html
Outdated
<emu-note> | ||
<p>For example, `Xpeo` and `Old_Persian` are valid `Script_Extensions` values, but `xpeo` and `Old Persian` aren't.</p> | ||
</emu-note> | ||
<emu-note> | ||
<p>This algorithm differs from <a href="https://unicode.org/reports/tr44/#Matching_Symbolic">the matching rules for symbolic values listed in UAX44</a>: case, <emu-xref href="#sec-white-space">white space</emu-xref>, U+002D (HYPHEN-MINUS), and U+005F (LOW LINE) are not ignored, and the `Is` prefix is not supported.</p> | ||
</emu-note> | ||
<emu-note> | ||
<p>The spellings of entries in these tables (including casing) were chosen to match the first occurrence of each property in the files <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyAliases.txt"><code>PropertyAliases.txt</code></a> and <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt"><code>PropertyValueAliases.txt</code></a> in the Unicode Character Database at the time each entry was added to this specification. However, because the precise spellings in those files are not guaranteed to be stable, implementations are required to follow this table rather than those files.</p> | ||
<p>The spellings of entries in these tables (including casing) were chosen to match the first occurrence of each property in the file <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyAliases.txt"><code>PropertyAliases.txt</code></a> in the Unicode Character Database at the time each entry was added to this specification. However, because the precise spellings in those files are not guaranteed to be stable, implementations are required to follow this table rather than those files.</p> |
Is this note even needed now?
Maybe. ES still needs a list of properties, since it only supports an explicit subset.
<p>The spellings of entries in these tables (including casing) were chosen to match the first occurrence of each property in the file <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyAliases.txt"><code>PropertyAliases.txt</code></a> in the Unicode Character Database at the time each entry was added to this specification. However, because the precise spellings in those files are not guaranteed to be stable, implementations are required to follow this table rather than those files.</p> | |
<p>The spellings of entries in these tables (including casing) were chosen to match the historically first occurrence of each property in the file <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyAliases.txt"><code>PropertyAliases.txt</code></a> in the Unicode Character Database at the time each entry was added to this specification. Note that the precise spellings in those files are guaranteed to be stable. Additional aliases might be added in future versions of Unicode.</p> |
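For readers following the notes in this hunk, the strict matching they describe can be observed from script. This is only a minimal sketch, assuming an engine that implements ES2018 Unicode property escapes; `compilesWithUnicodeFlag` and `strictMatchingResults` are hypothetical names introduced here for illustration and are not part of the spec text.

```javascript
// Returns true when `source` compiles as a Unicode-mode regular expression.
// Hypothetical helper for illustration only.
function compilesWithUnicodeFlag(source) {
  try {
    new RegExp(source, "u");
    return true;
  } catch (err) {
    // Invalid property names or values surface as an early SyntaxError.
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const strictMatchingResults = {
  // Exact spellings from the alias files are accepted.
  shortAlias: compilesWithUnicodeFlag("\\p{Script_Extensions=Xpeo}"),
  longAlias: compilesWithUnicodeFlag("\\p{Script_Extensions=Old_Persian}"),
  // The loose matching of UAX44 is not applied: case matters,
  // white space is not interchangeable with U+005F (LOW LINE),
  // and the `Is` prefix is not recognized.
  wrongCase: compilesWithUnicodeFlag("\\p{Script_Extensions=xpeo}"),
  spaceForUnderscore: compilesWithUnicodeFlag("\\p{Script_Extensions=Old Persian}"),
  isPrefix: compilesWithUnicodeFlag("\\p{IsAlphabetic}"),
};

console.log(strictMatchingResults);
```

Only the first two entries should come out `true` in a conforming engine.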
Not sure if we need all the "Unicode property" clarification or if we could just use "property" in most places. I guess the same goes for existing usage of "Unicode code point". The qualifier mostly seems unnecessary.
Not sure. @markusicu could probably answer. I was just going to let this PR sit until it happened, unless the committee asks for it to be merged sooner when I ask for consensus.
The new policy has been approved by the Unicode Technical Committee and by the Unicode executive officers.
spec.html
Outdated
@@ -34530,13 +34530,13 @@ <h1>Static Semantics: Early Errors</h1> | |||
It is a Syntax Error if the List of Unicode code points that is SourceText of |UnicodePropertyName| is not identical to a List of Unicode code points that is a Unicode property name or property alias listed in the “Property name and aliases” column of <emu-xref href="#table-nonbinary-unicode-properties"></emu-xref>. | |||
</li> | |||
<li> | |||
It is a Syntax Error if the List of Unicode code points that is SourceText of |UnicodePropertyValue| is not identical to a List of Unicode code points that is a value or value alias for the Unicode property or property alias given by SourceText of |UnicodePropertyName| listed in the “Property value and aliases” column of the corresponding tables <emu-xref href="#table-unicode-general-category-values"></emu-xref> or <emu-xref href="#table-unicode-script-values"></emu-xref>. | |||
It is a Syntax Error if the List of Unicode code points that is SourceText of |UnicodePropertyValue| is not identical to a property value or property value alias for the Unicode property or property alias given by SourceText of |UnicodePropertyName| listed in <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt"><code>PropertyValueAliases.txt</code></a>. |
A UnicodePropertyValue should only match a Unicode property value alias, right? Not also a Unicode property alias?
Thus
It is a Syntax Error if the List of Unicode code points that is SourceText of |UnicodePropertyValue| is not identical to a property value or property value alias for the Unicode property or property alias given by SourceText of |UnicodePropertyName| listed in <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt"><code>PropertyValueAliases.txt</code></a>. | |
It is a Syntax Error if the List of Unicode code points that is SourceText of |UnicodePropertyValue| is not identical to a property value alias for the Unicode property or property alias given by SourceText of |UnicodePropertyName| listed in <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt"><code>PropertyValueAliases.txt</code></a>. |
?
I'm confused, where does it say Unicode property alias?
Hm, sorry, I think I misread "a property value or property value alias" for "a property alias or property value alias".
You are right, it just says "value or value alias". So not wrong, just redundant: In Unicode parlance, all of these are "aliases". The "value" is the logical thing, and the "aliases" are the symbolic strings for the thing. Thus I think my suggestion is useful despite my brain fart :-}
Example: https://www.unicode.org/reports/tr44/#Property_Value_Aliases
In PropertyValueAliases.txt, the first field contains the abbreviated alias for a Unicode property, the second field specifies an abbreviated symbolic name for a value of that property, and the third field specifies the long symbolic name for that value of that property. These are the preferred aliases. Additional aliases for some property values may be specified in the fourth or subsequent fields.
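The field layout quoted above can be sketched as a small parser. This is an illustration under stated assumptions, not how any engine actually loads the UCD: the sample rows are a short excerpt written in the file's layout, and `parseValueAliasLine`, `buildValueAliasIndex`, and `valueAliasIndex` are hypothetical names. The sketch also ignores the `ccc` rows, whose second field is numeric.

```javascript
// Illustrative excerpt in the PropertyValueAliases.txt layout:
// property alias ; short value alias ; long value alias [; further aliases]
const sampleValueAliasLines = [
  "# General_Category (gc)",
  "gc ; Lu ; Uppercase_Letter",
  "gc ; P  ; Punctuation ; punct   # Pc | Pd | Ps | Pe | Pi | Pf | Po",
  "sc ; Xpeo ; Old_Persian",
  "",
];

// Parses one line; returns null for blank lines and comment-only lines.
function parseValueAliasLine(line) {
  const content = line.split("#")[0].trim(); // drop trailing comments
  if (content === "") return null;
  const [property, ...aliases] = content.split(";").map((field) => field.trim());
  return { property, aliases };
}

// Builds a Map from property alias to the Set of all value aliases for it.
function buildValueAliasIndex(lines) {
  const index = new Map();
  for (const line of lines) {
    const record = parseValueAliasLine(line);
    if (record === null) continue;
    if (!index.has(record.property)) index.set(record.property, new Set());
    for (const alias of record.aliases) index.get(record.property).add(alias);
  }
  return index;
}

const valueAliasIndex = buildValueAliasIndex(sampleValueAliasLines);
console.log(valueAliasIndex.get("gc").has("punct"));       // fourth-field alias
console.log(valueAliasIndex.get("sc").has("old_persian")); // case-sensitive lookup
```

Treating every field after the first as an alias matches the point made in this thread: the short name, the long name, and any additional names are all just aliases for the same logical value.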
spec.html
Outdated
</li> | ||
</ul> | ||
<emu-grammar>UnicodePropertyValueExpression :: LoneUnicodePropertyNameOrValue</emu-grammar> | ||
<ul> | ||
<li> | ||
It is a Syntax Error if the List of Unicode code points that is SourceText of |LoneUnicodePropertyNameOrValue| is not identical to a List of Unicode code points that is a Unicode general category or general category alias listed in the “Property value and aliases” column of <emu-xref href="#table-unicode-general-category-values"></emu-xref>, nor a binary property or binary property alias listed in the “Property name and aliases” column of <emu-xref href="#table-binary-unicode-properties"></emu-xref>. | ||
It is a Syntax Error if the List of Unicode code points that is SourceText of |LoneUnicodePropertyNameOrValue| is not identical to a Unicode property value or property value alias for the General_Category (gc) property listed in <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt"><code>PropertyValueAliases.txt</code></a>, nor a binary property or binary property alias listed in the “Property name and aliases” column of <emu-xref href="#table-binary-unicode-properties"></emu-xref>. |
ditto?
It is a Syntax Error if the List of Unicode code points that is SourceText of |LoneUnicodePropertyNameOrValue| is not identical to a Unicode property value or property value alias for the General_Category (gc) property listed in <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt"><code>PropertyValueAliases.txt</code></a>, nor a binary property or binary property alias listed in the “Property name and aliases” column of <emu-xref href="#table-binary-unicode-properties"></emu-xref>. | |
It is a Syntax Error if the List of Unicode code points that is SourceText of |LoneUnicodePropertyNameOrValue| is not identical to a Unicode property value alias for the General_Category (gc) property listed in <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt"><code>PropertyValueAliases.txt</code></a>, nor a binary property name or binary property alias listed in the “Property name and aliases” column of <emu-xref href="#table-binary-unicode-properties"></emu-xref>. |
spec.html
Outdated
@@ -35656,7 +35656,7 @@ <h1>Runtime Semantics: CompileToCharSet</h1> | |||
<emu-grammar>UnicodePropertyValueExpression :: LoneUnicodePropertyNameOrValue</emu-grammar> | |||
<emu-alg> | |||
1. Let _s_ be SourceText of |LoneUnicodePropertyNameOrValue|. | |||
1. If ! UnicodeMatchPropertyValue(`General_Category`, _s_) is identical to a List of Unicode code points that is the name of a Unicode general category or general category alias listed in the “Property value and aliases” column of <emu-xref href="#table-unicode-general-category-values"></emu-xref>, then | |||
1. If ! UnicodeMatchPropertyValue(`General_Category`, _s_) is identical to a Unicode property value or property value alias for the General_Category (gc) property listed in <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt"><code>PropertyValueAliases.txt</code></a>, then |
1. If ! UnicodeMatchPropertyValue(`General_Category`, _s_) is identical to a Unicode property value or property value alias for the General_Category (gc) property listed in <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt"><code>PropertyValueAliases.txt</code></a>, then | |
1. If ! UnicodeMatchPropertyValue(`General_Category`, _s_) is identical to a Unicode property value alias for the General_Category (gc) property listed in <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt"><code>PropertyValueAliases.txt</code></a>, then |
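The effect of this step can be checked from script: a lone `General_Category` value alias should select the same characters as the explicit `name=value` forms. The following is a minimal sketch, assuming an engine with ES2018 property escapes; the variable names and `classifyUppercase` are hypothetical.

```javascript
// Three spellings that should compile to the same CharSet.
const loneForm = /^\p{Lu}$/u;
const explicitShort = /^\p{gc=Lu}$/u;
const explicitLong = /^\p{General_Category=Uppercase_Letter}$/u;

// Returns the match result of each spelling for a single character.
function classifyUppercase(ch) {
  return [loneForm.test(ch), explicitShort.test(ch), explicitLong.test(ch)];
}

console.log(classifyUppercase("A")); // all three agree on an uppercase letter
console.log(classifyUppercase("a")); // all three agree on a lowercase letter
```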
spec.html
Outdated
1. Let _value_ be the canonical property value of _v_ as given in the “Canonical property value” column of the corresponding row. | ||
1. Return the List of Unicode code points _value_. | ||
</emu-alg> | ||
<p>Implementations must support the Unicode property value names and aliases listed in <emu-xref href="#table-unicode-general-category-values"></emu-xref> and <emu-xref href="#table-unicode-script-values"></emu-xref>. To ensure interoperability, implementations must not support any other property value names or aliases.</p> | ||
<p>Implementations must support the Unicode property values and property value aliases listed in <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt"><code>PropertyValueAliases.txt</code></a> for the properties listed in <emu-xref href="#table-nonbinary-unicode-properties"></emu-xref>. To ensure interoperability, implementations must not support any other property values or property value aliases.</p> |
You are pointing to the "latest" Unicode Character Database, which makes this spec "evergreen". However, as a "must support", should you provide a little leniency for recent versions, or the versions as of the release of the ES implementation?
We already normatively refer to the latest version of the Unicode standard (for things like whitespace, ID_Start, etc). From the spec,
Additionally, ECMAScript 2017 mandated always using the latest version of the Unicode standard.
For context, this was the relevant PR: #620
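Since the set of accepted values tracks whichever Unicode version an engine ships, code that depends on a particular value may want to probe for it at runtime. A hedged sketch follows: `supportedScriptValues` is a hypothetical helper, and it assumes the candidate strings contain only letters, digits, and U+005F (LOW LINE).

```javascript
// Filters a list of candidate Script value aliases down to those
// the current engine accepts in a property escape.
function supportedScriptValues(candidates) {
  return candidates.filter((value) => {
    try {
      new RegExp(`\\p{Script=${value}}`, "u");
      return true;
    } catch (err) {
      // Unknown values are rejected with an early SyntaxError.
      if (err instanceof SyntaxError) return false;
      throw err;
    }
  });
}

// "Not_A_Real_Script" is a made-up value used only to show the rejection path.
const probeResult = supportedScriptValues(["Latin", "Xpeo", "Not_A_Real_Script"]);
console.log(probeResult);
```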
spec.html
Outdated
@@ -34530,13 +34530,13 @@ <h1>Static Semantics: Early Errors</h1> | |||
It is a Syntax Error if the List of Unicode code points that is SourceText of |UnicodePropertyName| is not identical to a List of Unicode code points that is a Unicode property name or property alias listed in the “Property name and aliases” column of <emu-xref href="#table-nonbinary-unicode-properties"></emu-xref>. |
Note: You could simplify the tables of properties: List only one name for each property, no further aliases, and just point to PropertyAliases.txt for aliases.
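One way to picture this suggestion: the spec-side table keeps a single name per supported property, and aliases are resolved from data in the `PropertyAliases.txt` layout. This is only a sketch under assumptions. The sample rows are an illustrative excerpt, the choice of the long name as the one listed in the table is an assumption, and all identifiers below are hypothetical.

```javascript
// Assumed spec-side table: one name per supported non-binary property.
const supportedNonBinaryProperties = new Set([
  "General_Category",
  "Script",
  "Script_Extensions",
]);

// Illustrative excerpt in the PropertyAliases.txt layout:
// short alias ; long name [; further aliases]
const samplePropertyAliasLines = [
  "gc  ; General_Category",
  "sc  ; Script",
  "scx ; Script_Extensions",
  "blk ; Block",
];

// Maps every alias on a row to that row's long name (second field).
function buildCanonicalNameLookup(lines) {
  const lookup = new Map();
  for (const line of lines) {
    const content = line.split("#")[0].trim();
    if (content === "") continue;
    const aliases = content.split(";").map((field) => field.trim());
    const canonical = aliases[1];
    for (const alias of aliases) lookup.set(alias, canonical);
  }
  return lookup;
}

// Returns the listed name for a supported property, or null otherwise.
function matchSupportedProperty(name, lookup) {
  const canonical = lookup.get(name);
  if (canonical === undefined) return null;
  return supportedNonBinaryProperties.has(canonical) ? canonical : null;
}

const propertyLookup = buildCanonicalNameLookup(samplePropertyAliasLines);
console.log(matchSupportedProperty("scx", propertyLookup)); // supported via alias
console.log(matchSupportedProperty("blk", propertyLookup)); // in the file, not in the subset
```

The separate `supportedNonBinaryProperties` set reflects the earlier point in this conversation that ES still needs its own list, since it supports only an explicit subset of properties.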
The updated policy has been published today: Property value aliases, once defined in PropertyValueAliases.txt, will never be removed, nor will their precise spelling be changed.
3a55d88 to 5f86037
@bakkot rebased and addressed comment
e7496c3 to 3c29c06
Following the acceptance of L2/22-029, Proposal to guarantee stability of spelling of property names, values, and aliases in UCD, we no longer need to keep these tables.