Regular Expressions Patterns
The syntax of is modified and extended as follows. These changes introduce ambiguities that are broken by the ordering of grammar productions and by contextual information. When parsing using the following grammar, each alternative is considered only if previous production alternatives do not match.
- This alternative pattern grammar and semantics only changes the syntax and semantics of BMP patterns. The following grammar extensions include productions parameterized with the [U] parameter. However, none of these extensions change the syntax of Unicode patterns recognized when parsing with the [U] parameter present on the goal symbol.
+ This alternative pattern grammar and semantics change only the syntax and semantics of BMP patterns. The following grammar extensions include productions parameterized with the [UnicodeMode] parameter. However, none of these extensions change the syntax of Unicode patterns recognized when parsing with the [UnicodeMode] parameter present on the goal symbol.
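The ordered fallback between alternatives can be observed from JavaScript by comparing a BMP pattern (no `u` flag) with a Unicode pattern (`u` flag). The following is a minimal TypeScript sketch. It assumes a host engine that implements this annex, as web browsers and Node.js do. The helper `compiles` and every other identifier in the block are hypothetical and exist only for illustration.

```typescript
// Hypothetical helper (not part of the specification): reports whether
// `source` is accepted by the RegExp constructor with the given flags.
function compiles(source: string, flags = ""): boolean {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// `{2}` after an atom matches the Quantifier alternative, which is tried first.
const braced = new RegExp("^a{2}$");

// `{2` is not a complete Quantifier, so parsing falls back to a later
// alternative and `{` is read as an ExtendedPatternCharacter.
const literal = new RegExp("^a{2$");

console.log(braced.test("aa"));    // true
console.log(literal.test("a{2"));  // true

// With the u flag the extended grammar does not apply, so this is rejected.
console.log(compiles("a{2", "u")); // false
```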
Syntax
- Term[U, N] ::
- [+U] Assertion[+U, ?N]
- [+U] Atom[+U, ?N] Quantifier
- [+U] Atom[+U, ?N]
- [~U] QuantifiableAssertion[?N] Quantifier
- [~U] Assertion[~U, ?N]
- [~U] ExtendedAtom[?N] Quantifier
- [~U] ExtendedAtom[?N]
-
- Assertion[U, N] ::
+ Term[UnicodeMode, N] ::
+ [+UnicodeMode] Assertion[+UnicodeMode, ?N]
+ [+UnicodeMode] Atom[+UnicodeMode, ?N] Quantifier
+ [+UnicodeMode] Atom[+UnicodeMode, ?N]
+ [~UnicodeMode] QuantifiableAssertion[?N] Quantifier
+ [~UnicodeMode] Assertion[~UnicodeMode, ?N]
+ [~UnicodeMode] ExtendedAtom[?N] Quantifier
+ [~UnicodeMode] ExtendedAtom[?N]
+
+ Assertion[UnicodeMode, N] ::
`^`
`$`
`\` `b`
`\` `B`
- [+U] `(` `?` `=` Disjunction[+U, ?N] `)`
- [+U] `(` `?` `!` Disjunction[+U, ?N] `)`
- [~U] QuantifiableAssertion[?N]
- `(` `?` `<=` Disjunction[?U, ?N] `)`
- `(` `?` `<!` Disjunction[?U, ?N] `)`
+ [+UnicodeMode] `(` `?` `=` Disjunction[+UnicodeMode, ?N] `)`
+ [+UnicodeMode] `(` `?` `!` Disjunction[+UnicodeMode, ?N] `)`
+ [~UnicodeMode] QuantifiableAssertion[?N]
+ `(` `?` `<=` Disjunction[?UnicodeMode, ?N] `)`
+ `(` `?` `<!` Disjunction[?UnicodeMode, ?N] `)`
QuantifiableAssertion[N] ::
- `(` `?` `=` Disjunction[~U, ?N] `)`
- `(` `?` `!` Disjunction[~U, ?N] `)`
+ `(` `?` `=` Disjunction[~UnicodeMode, ?N] `)`
+ `(` `?` `!` Disjunction[~UnicodeMode, ?N] `)`
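Because only lookaheads are QuantifiableAssertions, and only in BMP patterns, quantifier acceptance differs between lookaheads, lookbehinds, and Unicode patterns. A TypeScript sketch, assuming an engine that implements this annex. `compiles` and `quantifiedLookahead` are hypothetical names introduced only for the example.

```typescript
// Hypothetical helper (not part of the specification): true if the
// RegExp constructor accepts `source` with the given flags.
function compiles(source: string, flags = ""): boolean {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// Without the u flag, a lookahead is a QuantifiableAssertion.
console.log(compiles("(?=a)*b"));      // true
console.log(compiles("(?!a)+b"));      // true

// The lookahead is still zero-width; `*` lets it repeat zero times here.
const quantifiedLookahead = new RegExp("^(?=a)*b$");
console.log(quantifiedLookahead.test("b")); // true

// Lookbehinds are never QuantifiableAssertions, even without the u flag.
console.log(compiles("(?<=a)*b"));     // false

// With the u flag, Term has no QuantifiableAssertion alternative.
console.log(compiles("(?=a)*b", "u")); // false
```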
ExtendedAtom[N] ::
`.`
- `\` AtomEscape[~U, ?N]
+ `\` AtomEscape[~UnicodeMode, ?N]
`\` [lookahead == `c`]
- CharacterClass[~U]
- `(` Disjunction[~U, ?N] `)`
- `(` `?` `:` Disjunction[~U, ?N] `)`
+ CharacterClass[~UnicodeMode]
+ `(` Disjunction[~UnicodeMode, ?N] `)`
+ `(` `?` `:` Disjunction[~UnicodeMode, ?N] `)`
InvalidBracedQuantifier
ExtendedPatternCharacter
@@ -45590,49 +45590,49 @@ Syntax
ExtendedPatternCharacter ::
SourceCharacter but not one of `^` `$` `\` `.` `*` `+` `?` `(` `)` `[` `|`
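The ExtendedAtom and ExtendedPatternCharacter productions can be checked the same way. In the sketch below, `]` and `}` match themselves, and a `\` not followed by a usable `c` ControlLetter matches a literal backslash. The block is TypeScript and assumes an engine that implements this annex. The `compiles` helper is hypothetical. The early error for InvalidBracedQuantifier is not shown in this section and is assumed from the rest of the annex.

```typescript
// Hypothetical helper (not part of the specification).
function compiles(source: string, flags = ""): boolean {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// `]` and `}` are ExtendedPatternCharacters, so they match themselves.
console.log(new RegExp("^]$").test("]"));     // true
console.log(new RegExp("^}$").test("}"));     // true

// `\c` with no ControlLetter after it: the `\` is a literal backslash,
// and the `c` is then an ordinary pattern character.
console.log(new RegExp("^\\c$").test("\\c")); // true

// `{1}` with nothing before it matches InvalidBracedQuantifier, an error.
console.log(compiles("{1}"));                 // false

// None of these extensions apply when the u flag is present.
console.log(compiles("]", "u"));              // false
console.log(compiles("}", "u"));              // false
console.log(compiles("\\c", "u"));            // false
```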
- AtomEscape[U, N] ::
- [+U] DecimalEscape
- [~U] DecimalEscape [> but only if the CapturingGroupNumber of |DecimalEscape| is ≤ _NcapturingParens_]
- CharacterClassEscape[?U]
- CharacterEscape[?U, ?N]
- [+N] `k` GroupName[?U]
+ AtomEscape[UnicodeMode, N] ::
+ [+UnicodeMode] DecimalEscape
+ [~UnicodeMode] DecimalEscape [> but only if the CapturingGroupNumber of |DecimalEscape| is ≤ _NcapturingParens_]
+ CharacterClassEscape[?UnicodeMode]
+ CharacterEscape[?UnicodeMode, ?N]
+ [+N] `k` GroupName[?UnicodeMode]
- CharacterEscape[U, N] ::
+ CharacterEscape[UnicodeMode, N] ::
ControlEscape
`c` ControlLetter
`0` [lookahead ∉ DecimalDigit]
HexEscapeSequence
- RegExpUnicodeEscapeSequence[?U]
- [~U] LegacyOctalEscapeSequence
- IdentityEscape[?U, ?N]
+ RegExpUnicodeEscapeSequence[?UnicodeMode]
+ [~UnicodeMode] LegacyOctalEscapeSequence
+ IdentityEscape[?UnicodeMode, ?N]
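The NcapturingParens condition on DecimalEscape decides whether `\1` is a backreference or falls through to LegacyOctalEscapeSequence. A TypeScript sketch, assuming an engine that implements this annex. `compiles` is a hypothetical helper.

```typescript
// Hypothetical helper (not part of the specification).
function compiles(source: string, flags = ""): boolean {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// With one capturing group, `\1` is a DecimalEscape (a backreference).
console.log(new RegExp("^(a)\\1$").test("aa"));  // true

// With no capturing groups, `\1` exceeds NcapturingParens, so it is parsed
// as a LegacyOctalEscapeSequence denoting U+0001.
console.log(new RegExp("^\\1$").test("\u0001")); // true

// Octal 101 is 65, the code unit for "A".
console.log(new RegExp("^\\101$").test("A"));    // true

// In a Unicode pattern there is no legacy octal fallback, so a
// DecimalEscape with no matching group is rejected.
console.log(compiles("\\1", "u"));               // false
```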
- IdentityEscape[U, N] ::
- [+U] SyntaxCharacter
- [+U] `/`
- [~U] SourceCharacterIdentityEscape[?N]
+ IdentityEscape[UnicodeMode, N] ::
+ [+UnicodeMode] SyntaxCharacter
+ [+UnicodeMode] `/`
+ [~UnicodeMode] SourceCharacterIdentityEscape[?N]
SourceCharacterIdentityEscape[N] ::
[~N] SourceCharacter but not `c`
[+N] SourceCharacter but not one of `c` or `k`
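IdentityEscape is much more permissive in BMP patterns. The [N] parameter only removes `k` from SourceCharacterIdentityEscape. The sketch below is TypeScript and assumes an engine that implements this annex. It also assumes that [N] is present exactly when the pattern contains a GroupName. `compiles` is a hypothetical helper.

```typescript
// Hypothetical helper (not part of the specification).
function compiles(source: string, flags = ""): boolean {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// Without the u flag, `\a` is an IdentityEscape for "a".
console.log(new RegExp("^\\a$").test("a"));             // true

// With no named groups, `\k` is an IdentityEscape for "k".
console.log(new RegExp("^\\k$").test("k"));             // true

// With a named group present, `\k` must start a named backreference.
console.log(compiles("(?<x>a)\\k"));                    // false
console.log(new RegExp("^(?<x>a)\\k<x>$").test("aa")); // true

// With the u flag, only SyntaxCharacters and `/` may be identity-escaped.
console.log(compiles("\\a", "u"));                      // false
console.log(compiles("\\/", "u"));                      // true
```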
- ClassAtomNoDash[U, N] ::
+ ClassAtomNoDash[UnicodeMode, N] ::
SourceCharacter but not one of `\` or `]` or `-`
- `\` ClassEscape[?U, ?N]
+ `\` ClassEscape[?UnicodeMode, ?N]
`\` [lookahead == `c`]
- ClassEscape[U, N] ::
+ ClassEscape[UnicodeMode, N] ::
`b`
- [+U] `-`
- [~U] `c` ClassControlLetter
- CharacterClassEscape[?U]
- CharacterEscape[?U, ?N]
+ [+UnicodeMode] `-`
+ [~UnicodeMode] `c` ClassControlLetter
+ CharacterClassEscape[?UnicodeMode]
+ CharacterEscape[?UnicodeMode, ?N]
ClassControlLetter ::
DecimalDigit
`_`
- When the same left hand sides occurs with both [+U] and [\~U] guards it is to control the disambiguation priority.
+ When the same left-hand side occurs with both [+UnicodeMode] and [\~UnicodeMode] guards, the guards control the disambiguation priority.
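The ClassEscape and ClassControlLetter productions can also be exercised directly. The sketch assumes that the CharacterValue of `\c` followed by a ClassControlLetter is the code point modulo 32, which is how the rest of this annex defines it. It is TypeScript, assumes an engine that implements this annex, and uses a hypothetical `compiles` helper.

```typescript
// Hypothetical helper (not part of the specification).
function compiles(source: string, flags = ""): boolean {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// Inside a class (no u flag), `\c` may be followed by a digit or `_`.
// "1" is U+0031, and 0x31 % 32 === 0x11.
console.log(new RegExp("^[\\c1]$").test("\u0011"));  // true
// "_" is U+005F, and 0x5F % 32 === 0x1F.
console.log(new RegExp("^[\\c_]$").test("\u001F"));  // true

// `\b` inside a class is the backspace character in both modes.
console.log(new RegExp("^[\\b]$").test("\b"));       // true
console.log(new RegExp("^[\\b]$", "u").test("\b"));  // true

// With the u flag, ClassControlLetter is unavailable, but `\-` is allowed.
console.log(compiles("[\\c1]", "u"));                // false
console.log(new RegExp("^[\\-]$", "u").test("-"));   // true
```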
@@ -45648,7 +45648,7 @@ Static Semantics: Early Errors
NonemptyClassRanges :: ClassAtom `-` ClassAtom ClassRanges
-
- It is a Syntax Error if IsCharacterClass of the first |ClassAtom| is *true* or IsCharacterClass of the second |ClassAtom| is *true* and this production has a [U] parameter.
+ It is a Syntax Error if this production has a [UnicodeMode] parameter and either IsCharacterClass of the first |ClassAtom| is *true* or IsCharacterClass of the second |ClassAtom| is *true*.
-
It is a Syntax Error if IsCharacterClass of the first |ClassAtom| is *false* and IsCharacterClass of the second |ClassAtom| is *false* and the CharacterValue of the first |ClassAtom| is larger than the CharacterValue of the second |ClassAtom|.
@@ -45657,7 +45657,7 @@
Static Semantics: Early Errors
NonemptyClassRangesNoDash :: ClassAtomNoDash `-` ClassAtom ClassRanges
-
- It is a Syntax Error if IsCharacterClass of |ClassAtomNoDash| is *true* or IsCharacterClass of |ClassAtom| is *true* and this production has a [U] parameter.
+ It is a Syntax Error if this production has a [UnicodeMode] parameter and either IsCharacterClass of |ClassAtomNoDash| is *true* or IsCharacterClass of |ClassAtom| is *true*.
-
It is a Syntax Error if IsCharacterClass of |ClassAtomNoDash| is *false* and IsCharacterClass of |ClassAtom| is *false* and the CharacterValue of |ClassAtomNoDash| is larger than the CharacterValue of |ClassAtom|.
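Together, these two rule pairs mean a class escape beside `-` is an error only in Unicode patterns. An out-of-order range is an error in both kinds of pattern. A TypeScript sketch, assuming an engine that implements this annex. In BMP patterns, the `-` in such a range is assumed to be treated as a literal, which follows the annex's runtime semantics. `compiles` and `wordDashA` are hypothetical names.

```typescript
// Hypothetical helper (not part of the specification).
function compiles(source: string, flags = ""): boolean {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// NonemptyClassRanges: `\w-a` is accepted and matches a word character, "-", or "a".
const wordDashA = new RegExp("^[\\w-a]$");
console.log(wordDashA.test("-"));       // true
console.log(wordDashA.test("Z"));       // true

// NonemptyClassRangesNoDash: `b-\d` after the first atom is also accepted.
console.log(compiles("[ab-\\d]"));      // true

// With the u flag, both forms hit the [UnicodeMode] Syntax Error.
console.log(compiles("[\\w-a]", "u"));  // false
console.log(compiles("[ab-\\d]", "u")); // false

// A range whose first CharacterValue is larger is an error in either mode.
console.log(compiles("[z-a]"));         // false
console.log(compiles("[z-a]", "u"));    // false
```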