-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Created a branded type for identifier-escaped strings #16915
Conversation
Then flowed it throughout the compiler, finding and fixing a handful of bugs relating to underscore-prefixed identifiers in the process. Includes a test for two cases noticed - diagnostics from conflicting symbols from export *'s, and enum with underscore prefixed member emit.
cc @sandersn and @andy-ms - you're both assigned to some of the issues this PR fixes. |
src/compiler/binder.ts
Outdated
@@ -147,7 +147,7 @@ namespace ts { | |||
options = opts; | |||
languageVersion = getEmitScriptTarget(options); | |||
inStrictMode = bindInStrictMode(file, opts); | |||
classifiableNames = createMap<string>(); | |||
classifiableNames = createMap<EscapedIdentifier>() as EscapedIdentifierMap<EscapedIdentifier>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use createEscapedMap
as this occurs many times.
src/compiler/types.ts
Outdated
@@ -3005,7 +3005,24 @@ namespace ts { | |||
isRestParameter?: boolean; | |||
} | |||
|
|||
export type SymbolTable = Map<Symbol>; | |||
export type EscapedIdentifier = (string & { __escapedIdentifier: void }) | (void & { __escapedIdentifier: void }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the union for?
Also, would name this EscapedString
since it's not an escaped version of Identifier
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmmm, the name's actually a little hard, since a plain escaped string I would expect to be a string where "
has been replaced with \"
to escape it (which we do! elsewhere!). I called it EscapedIdentifier
at first since Identifier was the only documented place where we stored strings escaped this way - except that was by no means true. From identifiers, it leaked into symbol names (which was desirable, based on use), and from there, even string literals were (sometimes and inconsistently) escaped to be compatible with symbol names. All the
inconsistency is gone now, but the name definitely no longer makes sense. What about UnderscoreEscapedString
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, re: the shape:
The shape of this brand is also rather unique compared to others we've used. Instead of just an intersection of a string and an object, it is that union-ed with an intersection of void and an object. This makes it wholly incompatible with a normal string (which is good, it cannot be misused on assignment or on usage), while still being comparable with a normal string via === (also good) and castable from a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you move that to a doc comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UnderscoreEscapedString
is a bit verbose for such a common type. EscapedName
might be better.
src/compiler/binder.ts
Outdated
const name = getNameOfDeclaration(node); | ||
if (name) { | ||
if (isAmbientModule(node)) { | ||
return isGlobalScopeAugmentation(<ModuleDeclaration>node) ? "__global" : `"${(<LiteralExpression>name).text}"`; | ||
const moduleName = name.kind === SyntaxKind.Identifier ? unescapeIdentifier((<Identifier>name).text) : (<LiteralExpression>name).text; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If isAmbientModule
, the name should always be a string literal.
src/compiler/binder.ts
Outdated
} | ||
if (name.kind === SyntaxKind.ComputedPropertyName) { | ||
const nameExpression = (<ComputedPropertyName>name).expression; | ||
// treat computed property names where expression is string/numeric literal as just string/numeric literal | ||
if (isStringOrNumericLiteral(nameExpression)) { | ||
return nameExpression.text; | ||
return nameExpression.text && escapeIdentifier(nameExpression.text); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
text
shouldn't be optional on a literal, right?
src/compiler/binder.ts
Outdated
@@ -1028,7 +1029,7 @@ namespace ts { | |||
function bindBreakOrContinueStatement(node: BreakOrContinueStatement): void { | |||
bind(node.label); | |||
if (node.label) { | |||
const activeLabel = findActiveLabel(node.label.text); | |||
const activeLabel = findActiveLabel(unescapeIdentifier(node.label.text)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be better to just always use an escaped name in an ActiveLabel
.
src/services/navigateTo.ts
Outdated
@@ -85,11 +85,13 @@ namespace ts.NavigateTo { | |||
|
|||
function getTextOfIdentifierOrLiteral(node: Node) { | |||
if (node) { | |||
if (node.kind === SyntaxKind.Identifier || | |||
node.kind === SyntaxKind.StringLiteral || | |||
if (node.kind === SyntaxKind.Identifier) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern has occurred a few times, could use a helper function.
src/services/navigationBar.ts
Outdated
@@ -580,12 +580,12 @@ namespace ts.NavigationBar { | |||
// Otherwise, we need to aggregate each identifier to build up the qualified name. | |||
const result: string[] = []; | |||
|
|||
result.push(moduleDeclaration.name.text); | |||
result.push(moduleDeclaration.name.kind === SyntaxKind.Identifier ? unescapeIdentifier(moduleDeclaration.name.text) : moduleDeclaration.name.text); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern has occurred a few times, could use a helper function.
src/services/services.ts
Outdated
@@ -609,11 +609,13 @@ namespace ts { | |||
|
|||
function getTextOfIdentifierOrLiteral(node: Node) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, here is a helper function!
src/services/services.ts
Outdated
@@ -2111,7 +2113,7 @@ namespace ts { | |||
node.parent.kind === SyntaxKind.ExternalModuleReference || | |||
isArgumentOfElementAccessExpression(node) || | |||
isLiteralComputedPropertyDeclarationName(node)) { | |||
setNameTable((<LiteralExpression>node).text, node); | |||
setNameTable((<LiteralExpression>node).text as EscapedIdentifier, node); // Literal expressions are escaped |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parseLiteralLikeNode
doesn't seem to escape?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct! When I started it did (as it does right now), and that leaked escaped strings to wholly too many places. This probably surfaces as a bug.
src/services/utilities.ts
Outdated
@@ -1272,7 +1272,7 @@ namespace ts { | |||
// If this is an export or import specifier it could have been renamed using the 'as' syntax. | |||
// If so we want to search for whatever is under the cursor. | |||
if (isImportOrExportSpecifierName(location) || isStringOrNumericLiteral(location) && location.parent.kind === SyntaxKind.ComputedPropertyName) { | |||
return location.text; | |||
return location.kind === SyntaxKind.Identifier ? unescapeIdentifier((location as Identifier).text) : (location as LiteralLikeNode).text; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a helper function
|
@andy-ms
Would |
I think if it's a case where a string literal just happens to not need escaping (e.g.
Don't know about |
|
@andy-ms I think I've responded to all your feedback and I've implemented the list of internal symbol names as part of the brand type declaration (which, IMO, is almost more useful as a source of documentation than anything else). |
src/compiler/parser.ts
Outdated
@@ -1227,7 +1226,9 @@ namespace ts { | |||
|
|||
function parsePropertyNameWorker(allowComputedPropertyNames: boolean): PropertyName { | |||
if (token() === SyntaxKind.StringLiteral || token() === SyntaxKind.NumericLiteral) { | |||
return <StringLiteral | NumericLiteral>parseLiteralNode(/*internName*/ true); | |||
const node = <StringLiteral | NumericLiteral>parseLiteralNode(); | |||
internIdentifier(node.text); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internIdentifier
won't have the desired effect if used as a statement; it returns the interned string, but it won't set node.text
to it. I like the way it was before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change it, but according to my knowledge of strings in V8 (which, admitted, is gleaned from stackoverflow answers), reassignment of the value from the map isn't required to make the runtime intern the string.
It does this:
let identifier = identifiers.get(text);
if (identifier === undefined) {
identifiers.set(text, identifier = text);
}
return identifier;
Its primary use (now that it is no longer handling escaping things) in our code is setting up the identifiers
list which is used in the emitter and declaration emitter for knowing what identifiers it needs to avoid to generate names. Secondarily, this will cause the string to be interned in V8, since it's used as a property name (or map key in ES6) - a reassignment of the version from the map should not be required (since the same string primitive that's passed in is what gets interned (thus is already assigned to the object) or is already interned if already in the map).
TBQH I feel like the function should be renamed, since interning is not its primary use. Could probably reduce the space used by it a little bit by making it a Map<true>
, too.
src/compiler/parser.ts
Outdated
@@ -4098,7 +4099,7 @@ namespace ts { | |||
indexedAccess.argumentExpression = allowInAnd(parseExpression); | |||
if (indexedAccess.argumentExpression.kind === SyntaxKind.StringLiteral || indexedAccess.argumentExpression.kind === SyntaxKind.NumericLiteral) { | |||
const literal = <LiteralExpression>indexedAccess.argumentExpression; | |||
literal.text = internIdentifier(literal.text); | |||
internIdentifier(literal.text); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same, don't use this as a statement
src/compiler/parser.ts
Outdated
@@ -5624,7 +5625,8 @@ namespace ts { | |||
node.flags |= NodeFlags.GlobalAugmentation; | |||
} | |||
else { | |||
node.name = <StringLiteral>parseLiteralNode(/*internName*/ true); | |||
node.name = <StringLiteral>parseLiteralNode(); | |||
internIdentifier(node.name.text); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same
src/compiler/types.ts
Outdated
@@ -3005,7 +3005,42 @@ namespace ts { | |||
isRestParameter?: boolean; | |||
} | |||
|
|||
export type SymbolTable = Map<Symbol>; | |||
export type InternalSymbolName = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be better as a real string enum. Also, could you add the comment to the code about why (void & { __escapedIdentifier: void })
is present?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First round of comments. Still looking.
src/compiler/types.ts
Outdated
export type UnderscoreEscapedString = (string & { __escapedIdentifier: void }) | (void & { __escapedIdentifier: void }) | InternalSymbolName; | ||
|
||
/** EscapedStringMap based on ES6 Map interface. */ | ||
export interface UnderscoreEscapedMap<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stupid question, but why can't this just be type EscapedNameMap<T> = Map<T, EscapedName>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We originally didn't make the key of Map
generic, so it's hardcoded as string
. Although maybe a better way to do this would be to define ESMap<K, V>
and Map<V> = ESMap<string, V>
and UnderscoreEscapedMap<V> = ESMap<UnderscoreEscapedString, V>
.
src/compiler/utilities.ts
Outdated
switch (name.kind) { | ||
case SyntaxKind.Identifier: | ||
return (<Identifier>name).text; | ||
case SyntaxKind.StringLiteral: | ||
case SyntaxKind.NumericLiteral: | ||
return (<LiteralExpression>name).text; | ||
return (<LiteralExpression>name).text as UnderscoreEscapedString; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why isn't text
always UnderscoreEscapedString
, at least for LiteralExpression
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, this might just be an unsafe cast. Should it be escaped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right, this is probably unsafe, given where this is used. I'll add the escapes, rather than casts.
src/compiler/utilities.ts
Outdated
export function isIntrinsicJsxName(name: UnderscoreEscapedString | string) { | ||
// An escaped identifier had a leading underscore prior to being escaped, which would return true | ||
// The escape adds an extra underscore which does not change the result | ||
const ch = (name as string).substr(0, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aside: How does this code tell us that a string is an intrinsic JSX name? Its first character is lowercase? notAnInstrinsic
seems like an obvious counterexample.
Actually, this is probably a fine predicate, just one that shouldn't be promoted to utilities.ts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was already here? It's used within the checker and the jsx transform. Also, a JSX intrinsic is just any tag that starts with a lowercase letter (ie, body
), indicating that its type needs to be looked up globally rather than locally, right?
src/compiler/utilities.ts
Outdated
export function unescapeIdentifier(identifier: string): string { | ||
return identifier.length >= 3 && identifier.charCodeAt(0) === CharacterCodes._ && identifier.charCodeAt(1) === CharacterCodes._ && identifier.charCodeAt(2) === CharacterCodes._ ? identifier.substr(1) : identifier; | ||
export function unescapeLeadingUnderscores(identifier: UnderscoreEscapedString): string { | ||
return (identifier as string).length >= 3 && (identifier as string).charCodeAt(0) === CharacterCodes._ && (identifier as string).charCodeAt(1) === CharacterCodes._ && (identifier as string).charCodeAt(2) === CharacterCodes._ ? (identifier as string).substr(1) : identifier as string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
formatting: could you create a temp here to make the line shorter? (Maybe name it id
. :)
@@ -1203,7 +1202,7 @@ namespace ts { | |||
if (token() !== SyntaxKind.Identifier) { | |||
node.originalKeywordKind = token(); | |||
} | |||
node.text = internIdentifier(scanner.getTokenValue()); | |||
node.text = escapeLeadingUnderscores(internIdentifier(scanner.getTokenValue())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internIdentifier used to escape before interning. Now the code interns, then escapes. Is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is correct for how we use the identifiers
map the internIdentifier
function uses (namely, looking for collisions in generated identifiers).
@@ -0,0 +1,60 @@ | |||
tests/cases/compiler/index.tsx(33,1): error TS2308: Module "./b" has already exported a member named '__foo'. Consider explicitly re-exporting to resolve the ambiguity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't expect a test named doubleUnderscoreSymbolsAndEmit to have errors. Is this expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I wanted the js output of the enum (which was bugged) and this diagnostic (which used to have an extra underscore) - I could split it into two tests if you'd prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just change the name to omit 'Symbols'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too late - already split this file into about 4 unique tests which test different things with double underscore prefixes. 🐱
src/compiler/utilities.ts
Outdated
@@ -3974,8 +4009,8 @@ namespace ts { | |||
* @param identifier The escaped identifier text. | |||
* @returns The unescaped identifier text. | |||
*/ | |||
export function unescapeIdentifier(identifier: string): string { | |||
return identifier.length >= 3 && identifier.charCodeAt(0) === CharacterCodes._ && identifier.charCodeAt(1) === CharacterCodes._ && identifier.charCodeAt(2) === CharacterCodes._ ? identifier.substr(1) : identifier; | |||
export function unescapeLeadingUnderscores(identifier: UnderscoreEscapedString): string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should definitely be documented as breaking change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add an alias to the old name with an @deprecated
jsdoc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f15d2ef
to
9424f14
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two comments:
- It's really scary that you fixed tens of bugs, especially in the language service, and didn't see any changes in the tests.
- I don't have a good understanding of which type to use when I'm writing new code in the checker or binder.
To fix (2), please add to the PR summary. And maybe to the wiki or to https://sandersn.github.io/manual/Typescript-compiler-implementation.html for future reference.
To fix (1), we'd need to write tons of tests for code that's now checked by the compiler anyway. So it's probably not needed.
src/services/completions.ts
Outdated
@@ -1505,7 +1506,8 @@ namespace ts.Completions { | |||
// TODO(jfreeman): Account for computed property name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These comments look old and probably out of date. Can you update them? (Just remove Jason's name from the TODO if it still applies, I guess.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed Jason's name, but I think computed property names might still not be handled specially - if a computed property is returned from getNameOfDeclaration
, getEscapedTextOfIdentifierOrLiteral
will quietly return undefined
.
src/compiler/types.ts
Outdated
| "export=" // Export assignment symbol | ||
| "default"; // Default export symbol (technically not wholly internal, but included here for usability) | ||
|
||
export type UnderscoreEscapedString = (string & { __escapedIdentifier: void }) | (void & { __escapedIdentifier: void }) | InternalSymbolName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A short name is __String
. But it's pretty (too?) cute.
src/compiler/binder.ts
Outdated
@@ -258,7 +259,7 @@ namespace ts { | |||
case SyntaxKind.ExportDeclaration: | |||
return "__export"; | |||
case SyntaxKind.ExportAssignment: | |||
return (<ExportAssignment>node).isExportEquals ? "export=" : "default"; | |||
return ((<ExportAssignment>node).isExportEquals ? "export=" : "default"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like some extra parens got left in after their associated casts were removed.
Then flowed it throughout the compiler, finding and fixing a handful of bugs relating to underscore-prefixed identifiers in the process. Includes a test for two (new) cases noticed - diagnostics from conflicting symbols from export *'s, and enums with underscore prefixed member emit.
I mentioned this while talking among the team last week. This can be done in place of/alongside #16868.
Additionally, this fixes #15334, #14268, #11902, and #3268. (Each should also now exist as a test case in this PR, too)
The shape of this brand is also rather unique compared to others we've used. Instead of just an intersection of a string and an object, it is that union-ed with an intersection of void and an object. This makes it wholly incompatible with a normal string (which is good, it cannot be misused on assignment or on usage), while still being comparable with a normal string via
===
(also good) and castable from a string.Summary: