## Description

This PR changes the `HIDDEN_PREFIX` of `ModuleSource` from the non-conforming `$h\u200D_` zero-width joiner (`ZWJ`) notation to the conforming `$h\u034F_` combining grapheme joiner (`CGJ`) notation.

A future PR may address further changes to a `$\u034F`-prefixed and `\u034F$`-suffixed format, as suggested by @michaelfig in discussions.

### Motivation

This change was prompted by a parsing error encountered when using `rollup`, which was traced back to the `$h\u200D_`-prefixed identifier in an `endoScript` bundle. More importantly, it is also motivated by the subsequent discovery that `rollup`'s implementation was actually conforming to the ECMAScript Specification when it threw this error.

To elaborate, while runtimes today accept the special identifier notation currently introduced by the `ModuleSource` rewrites, the current `$h\u200D_` zero-width joiner (`ZWJ`) notation does not conform to the specifications defined in the [ECMAScript Lexical Grammar](https://tc39.es/ecma262/#sec-names-and-keywords).

In essence, the specifications entail that the character sequence for [Identifier Names](https://tc39.es/ecma262/#sec-identifier-names), once unescaped, is expected to match the `/^[$_\p{ID_Start}][$_\p{ID_Continue}]*$/u` pattern, aside from the additional `#` character prefix required in the case of private fields.
As such, one can test this in the console by evaluating the following:

```js
Object.fromEntries(
  [String.raw`$h\u200D_`, String.raw`$h\u034F_`].map(id => [
    id,
    // JSON.parse turns the raw escape sequence into the actual character
    /^[$_\p{ID_Start}][$_\p{ID_Continue}]*$/u.test(JSON.parse(`"${id}"`)),
  ]),
);
```

The above would yield the following object in a runtime where the Unicode escape sequences are retained:

```js
{$h\u200D_: false, $h\u034F_: true}
```

Digging deeper into the Unicode Standard, it seems that the zero-width joiner (`ZWJ`) may indeed be used in a conforming notation per the [Emoji Profile in Annex #31 of the Unicode Standard](http://www.unicode.org/reports/tr31/#Emoji_Profile); however, this is not applicable for this purpose as it would require the use of emojis.

At this point, my suggestion to use the combining grapheme joiner (`CGJ`) instead is best articulated with this excerpt borrowed from its Wikipedia entry:

> However, in contrast to the zero-width joiner and similar characters, the `CGJ` does not affect whether the two letters are rendered separately or as a ligature or cursively joined—the default behavior for this is determined by the font.[^1]

[^1]: https://en.wikipedia.org/wiki/Combining_grapheme_joiner

The Wikipedia article offers additional nuances about the differences, while the [Proposal for addition of COMBINING GRAPHEME JOINER](https://www.unicode.org/L2/L2000/00274-N2236-grapheme-joiner.htm) offers the necessary context about its intent.

It is fair to note that there are already many uses of the zero-width joiner (`ZWJ`) in the wild, and in fact there are currently `test262` tests for its occurrence. That said, unless those uses conform to the ECMAScript Specification and the Unicode Standard, they will limit code portability and adoption by users who may end up confused by failures similar to the one encountered with `rollup`.
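To pinpoint which code point, if any, makes a name fail the pattern used in the console check earlier, a minimal sketch along the following lines may help. `offendingCodePoints` is a hypothetical helper introduced here for illustration and is not part of `ModuleSource`. It only applies the pattern as quoted in this description, not the full ECMAScript grammar, and its results follow the Unicode data built into the runtime.

```javascript
// Hypothetical helper (not part of ModuleSource): lists the code points in
// `name` that fall outside the pattern quoted in this description.
const START = /^[$_\p{ID_Start}]$/u;
const CONTINUE = /^[$_\p{ID_Continue}]$/u;

const offendingCodePoints = name =>
  [...name] // spreading iterates by code point, not by UTF-16 unit
    .map((ch, index) => ({ ch, index }))
    .filter(({ ch, index }) => !(index === 0 ? START : CONTINUE).test(ch))
    .map(({ ch, index }) => {
      const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
      return `U+${hex} at index ${index}`;
    });

console.log(offendingCodePoints('$h\u034F_')); // CGJ variant → []
console.log(offendingCodePoints('$h\u200D_')); // ZWJ variant; depends on the runtime's Unicode data
console.log(offendingCodePoints('1abc')); // → [ 'U+0031 at index 0' ]
```

Reporting code points rather than a single boolean makes it easier to tell an invisible character apart from an ordinary typo when a tool rejects an identifier.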
Ultimately, even with the reasonable recommendation to exercise caution when bundling `ses` and related sources, which are best bundled with `bundleSource` instead, those sources may still need to be parsed by tools like `rollup` for other purposes that remain consistent with the expectation that they are handled safely.

### Approach

#### Substituting the invisible joiner character

A search across the monorepo for `(?:\u200d|\\u200d)_` yields only 3 files of interest:

- `packages/module-source/TESTS.md`
- `packages/module-source/src/hidden.js`
- `packages/module-source/test/module-source.test.js`

While making changes to these 3 files, a distinction is made between matching `\$h\\u200d_` and `\$h\u200d_`, where the replacements are `$h\\u034f_` and `$h\u034f_` respectively, along with their `$c` equivalents.

The same search across the monorepo yields another 978 files that are not of interest, found in:

- `packages/test262-runner/test262/test/language/expressions/class/elements`
- `packages/test262-runner/test262/test/language/statements/class/elements`

All those files remain unchanged.

#### Ensuring generic wording is used

For testing and other purposes where descriptive phrases refer to the use of `ZWJ`, `CGJ` or other characters for this same intent, the phrase *"invisible joiner character"* is suggested.

### Security Considerations

**Does not apply to my knowledge**

### Scaling Considerations

**Does not apply to my knowledge**

### Documentation Considerations

**Does not apply to my knowledge**

### Testing Considerations

**See**: #2436 (comment)

### Compatibility Considerations

While the changes do not affect compatibility when the generated code is evaluated at runtime, there may be compatibility concerns with tools that have been specifically designed to work with the current notation.

### Upgrade Considerations

**Does not apply to my knowledge**