Always add "groups" to regexp-result to make it easier to speed-up the common case? #34
I believe we discussed this possibility at some point, and decided not to do it in favor of keeping the semantics entirely the same when there are no groups. Implementers, would this change affect the performance of your engines? cc @jgruber @bterlson @msaboff @tschneidereit
Setting aside performance reasons, I would strongly urge inclusion of Even if it was Please don't make this an unnecessary hazard.
@getify Could you mention a use case where you'd really want to read a group of a particular name without knowing what the RegExp is?
3 use cases:
Avoiding prototype-chain lookups seems sensible to me, and this is indeed what both V8 and JSC implement. I’m in favor of this change. WDYT, @schuay @hashseed @msaboff @tschneidereit @bterlson?
So it would be an own property on regexes with groups, and there'd be a prototype fallback that returned what, an empty object? If so, would it be the same one every time, would it be frozen, etc.?
As a developer, I think it'd be nice if there were a clean way to tell that no named groups were matched (other than checking for Object.keys() being empty). If Edit: Unfortunately,
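For reference, under the semantics that were eventually specified, a pattern without named groups yields `groups === undefined`, while a pattern with named groups gets one key per group name, set to `undefined` when that group did not participate. An empty `Object.keys()` result is therefore not a reliable "nothing matched" signal. A minimal sketch, assuming an ES2018+ engine; `participatingGroups` is a hypothetical helper, not a built-in:

```javascript
// Hypothetical helper: names of named groups that actually captured something.
// Assumes ES2018+ semantics: match.groups is undefined when the pattern has no named
// groups; otherwise it has one key per named group, undefined if that group did not participate.
function participatingGroups(match) {
  if (match === null || match.groups === undefined) {
    return [];
  }
  return Object.keys(match.groups).filter((name) => match.groups[name] !== undefined);
}

const alt = /(?<a>x)|(?<b>y)/.exec("y");
console.log(Object.keys(alt.groups).join(","));         // a,b (both names are present)
console.log(participatingGroups(alt).join(","));        // b
console.log(participatingGroups(/y/.exec("y")).length); // 0 (no named groups)
```

A group that matched the empty string is counted as participating, since its value is `""` rather than `undefined`.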
I agree that avoiding the prototype chain lookups makes sense, yes. I don't have a particularly strong opinion on whether the value of the property should be @jandem, please speak up if for whatever reason this wouldn't be good for SpiderMonkey.
@tschneidereit if it's an empty null-prototype object, should it be frozen and always the same object, or newly created every time?
I don't have a particularly strong opinion on this. Reusing the same object would certainly be more performant, but I don't know how much it'd matter, or whether there might be downsides I'm not immediately seeing.
I'm fine with this spec change. Concerning @getify's suggestion of a The suggestion to always provide
For @getify's concern, we discussed this earlier in #12. Would the use cases you mention be satisfied by the proposal for the For @anba's suggestion, it seems reasonable to me. I didn't understand the fast-path correctness issue on first reading; that does add some motivation, though at the same time it's not completely unsolvable: in theory, the RegExp fast path could be skipped when a To clarify with respect to @mathiasbynens's comment, @anba's proposal does not describe V8's current implementation. Actually, this is an edge case where V8 doesn't currently have consistent behavior. You have to somehow trick V8 into taking the slow path to get the spec's behavior if you want groups higher in the prototype chain to show up, as in the following transcript:
I would've hoped that we could only look at the
FWIW, in V8's fast path for
prints However, in the slow path, we look up the prototype chain as per spec, and print Due to this, I think it makes sense to either
I slightly lean towards (2) because (1) affects the behavior of existing regexps that do not use named capture group syntax.
(2) is interesting. It'd be new ground: I don't know of another place in the spec where we do an own-property lookup. Should a getter work there?
Option 3:
Nice find! I'd be OK with both (1) and (2) as proposed above. (1) would give us the additional advantage of not having two different RegExpResult shapes (one with, one without the groups property). Note that with (1), RegExp.p.exec could still be overridden and return result objects of arbitrary shape, leading to a possible prototype lookup later on. @littledan there are one or two spots that check HasOwnProperty before calling Get, e.g. https://tc39.github.io/ecma262/#sec-function.prototype.bind.
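The difference between (1) and (2) can be modeled in plain JavaScript. This is only a sketch: `readGroupsOption1` and `readGroupsOption2` are hypothetical names that mimic how a consumer such as `RegExp.prototype[@@replace]` would read `groups`; they are not engine or spec code. The sketch also shows the caveat about overridden `exec`: if a result lacks an own `groups`, a plain Get still falls through to the prototype chain.

```javascript
// Option (1): the built-in exec always defines an own "groups" property (undefined when the
// pattern has no named groups), so an ordinary Get finds it without walking the prototype chain.
function readGroupsOption1(result) {
  return result.groups;
}

// Option (2): only an own "groups" property counts. A HasOwnProperty test guards the Get,
// so a "groups" inherited from Array.prototype or Object.prototype is ignored.
function readGroupsOption2(result) {
  return Object.prototype.hasOwnProperty.call(result, "groups") ? result.groups : undefined;
}

// A result-like array, as an overridden exec might return, with no own "groups".
const custom = ["a"];
custom.index = 0;
custom.input = "a";

Object.prototype.groups = { foo: "bar" }; // plant a "groups" higher up the chain
console.log(readGroupsOption1(custom).foo); // bar (inherited lookup)
console.log(readGroupsOption2(custom));     // undefined (own property only)
delete Object.prototype.groups;             // undo the pollution
```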
It already does. But that's a separate topic (prototype lookup for properties on the groups object vs. the 'groups' property on the RegExpResult).
@schuay with option 1, what value would you expect when there are no named captures in the regex?
@littledan the problem, as I mentioned in my first post of this thread, is that the In practice, my prediction is that this will mean the suboptimal solutions (like Of course, that also means future optimization hurdles for engines, as those kinds of hacks are harder to detect statically. I'm basically just saying that it's a hazard/footgun we can reasonably predict, so designing it into the language is unfortunate when we could tweak just a little bit and avoid/minimize it. Regarding the "breaking change" claims... can someone provide any real example/use case where code will change/break if 'groups' shows up on all regex result objects (not the regexp objects themselves)? And specifically, if the worry is that code (libraries, etc.) annotates regexp results with extra properties and may collide with this new "groups" property, how is that concern unique to existing code and NOT a hazard if named-group regexes start being passed around into those utilities? It seems like the hazard would be universal (and quite low). I'm failing to imagine any such legitimate "breaking" case, but I provided 3 real, non-imagined counter cases above.
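To make the hazard concrete: with the semantics that were adopted, `groups` is `undefined` for a regex without named groups, so reading a named group from a regex the code did not author can throw a `TypeError`. A sketch assuming an ES2020+ engine, using optional chaining (which postdates this thread) as one way a caller can guard the read; `readGroup` is a hypothetical helper:

```javascript
// Hypothetical helper: read a named group from a regex the caller did not write.
// Assumes ES2018+ named groups (match.groups is undefined when the pattern has none)
// and ES2020 optional chaining.
function readGroup(re, input, name) {
  const match = re.exec(input);
  // Yields undefined instead of throwing when there is no match
  // or when the pattern has no named groups at all.
  return match?.groups?.[name];
}

console.log(readGroup(/(?<year>\d{4})/, "in 2019", "year")); // 2019
console.log(readGroup(/\d{4}/, "in 2019", "year"));          // undefined

// The unguarded read is the hazard: groups is undefined for this pattern.
const noNames = /\d{4}/.exec("in 2019");
try {
  noNames.groups.year;
} catch (e) {
  console.log(e instanceof TypeError); // true
}
```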
@getify I see the concern with @schuay Oh, I see your point about HasOwnProperty/Get; I was picturing somehow using GetOwnProperty. The two are observably different, and I don't think anyone does the latter pattern, but yes, we could do that. Does anyone have a preference between options 1 and 2, then? If no one has any preference at all, I'll do 1 since it seems very marginally simpler.
@littledan FWIW I am also skeptical of designing things on the assumption that That said, I agree that it would be weird to design this specific API to avoid non-ObjectCoercible values to make it easier for programmers to not worry about whether or not the object exists before trying to read properties of it, since the vast majority of the APIs in JavaScript and the web platform were not designed that way, including notably
I understand. I wish But I also think we shouldn't design one feature contingent on another, entirely unrelated proposal feature working perfectly and shipping cleanly. It's hard to predict/correlate unrelated paths through this process.
I indeed would consider it very sketchy (and discouraged) if someone designed an API where, based on run-time conditions/inputs, an object on the public API was either present or not present. Imagine for a moment that jQuery had done something like this with their default Ajax settings... like instead of their Think how many jQuery users could potentially trip over trying to make assignments like And my sentiment here is a hundredfold stronger when we're talking about built-in language stuff, which can never change. jQuery could theoretically realize this mistake later and deprecate/fix it. JS can't. What if
@littledan FWIW, the spec doesn't tend to use
Can you give any examples from JS or the web platform, as precedent, where an object is either present or absent on some API/namespace, depending on how the API is used (or other runtime conditions)? Not return values from functions (
@getify something could primarily be conditional if it's the return value from a function (including a getter); but sure, arrow functions lack
I'm not sure I understand why this distinction is important. Given that we don't have proper support for returning multiple values and therefore get by with returning objects with multiple properties, a property being present or absent on the single value returned from a function amounts to the same thing as the function returning multiple values, one of which may be present or absent. That is to say, I don't really understand why In any case: the
Values that might be either null or something else are so core to the web platform that WebIDL's type system has nullable types built in for that use case. If you want to see some places where they're used, search the HTML spec for a question mark. If it's in an IDL block, that's a case that might be null or might be an object or other value, depending on runtime conditions.
If the RegExp does not have named groups, set a groups property of the result to undefined. This change is made to facilitate optimizing RegExp implementations, which often take a "fast path" for built-in, unmodified RegExps to avoid the overhead of the full subclassable semantics. If the "groups" property may be read from higher up on the prototype chain, the fast path for RegExp.prototype.replace would become more complex or slower; alternatively, more conditions about the environment might need to be checked as a precondition for applying the fast path. An alternate approach would be to read only an own groups property, based on a HasOwnProperty test followed by a Get. I don't see big advantages or disadvantages to that approach vs. this one, and I'd be fine to revisit this patch if more differentiating factors are raised before Stage 4. Closes #34
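A quick way to observe the change described in this commit, assuming a spec-compliant ES2018+ engine: the result always has an own `groups` property, so a `groups` planted on `Object.prototype` is never consulted when `String.prototype.replace` is called with a RegExp. The prototype pollution here is purely for demonstration and is undone at the end.

```javascript
// Assumes a spec-compliant engine that implements the change described in this commit.
const plain = /a/.exec("xax");
console.log(Object.prototype.hasOwnProperty.call(plain, "groups")); // true
console.log(plain.groups);                                          // undefined

// Plant a "groups" on Object.prototype; replace() reads the result's own (undefined)
// "groups" instead, so "$<foo>" is left as-is rather than expanding to "bar".
Object.prototype.groups = { foo: "bar" };
console.log("xax".replace(/a/, "[$<foo>]")); // x[$<foo>]x
delete Object.prototype.groups;
```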
https://bugs.webkit.org/show_bug.cgi?id=204067
Patch by Alexey Shvayka on 2019-11-12
Reviewed by Ross Kirsling.
JSTests:
* test262/expectations.yaml: Mark 4 test cases as passing.
Source/JavaScriptCore:
After RegExp named capture groups were initially implemented in JSC, the spec was changed to unconditionally create "groups" property. (tc39/proposal-regexp-named-groups#34)
This patch implements the change (that was shipped by V8), reducing number of structures we use for RegExpMatchesArray, and also sets [[Prototype]] of "groups" object to `null`. (step 24 of https://tc39.es/ecma262/#sec-regexpbuiltinexec)
* dfg/DFGAbstractInterpreterInlines.h:
(JSC::DFG::AbstractInterpreter<AbstractStateType>::executeEffects):
* dfg/DFGStrengthReductionPhase.cpp:
(JSC::DFG::StrengthReductionPhase::handleNode):
* runtime/JSGlobalObject.cpp:
(JSC::JSGlobalObject::init):
(JSC::JSGlobalObject::fireWatchpointAndMakeAllArrayStructuresSlowPut):
(JSC::JSGlobalObject::visitChildren):
* runtime/JSGlobalObject.h:
(JSC::JSGlobalObject::regExpMatchesArrayStructure const):
(JSC::JSGlobalObject::regExpMatchesArrayWithGroupsStructure const): Deleted.
* runtime/RegExpMatchesArray.cpp:
(JSC::createStructureImpl):
(JSC::createRegExpMatchesArrayWithGroupsStructure): Deleted.
(JSC::createRegExpMatchesArrayWithGroupsSlowPutStructure): Deleted.
* runtime/RegExpMatchesArray.h:
(JSC::createRegExpMatchesArray):
* runtime/StringPrototype.cpp:
(JSC::replaceUsingRegExpSearch):
Canonical link: https://commits.webkit.org/217426@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@252374 268f45cc-cd09-0410-ab3c-d52691b4dbfc
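The null `[[Prototype]]` of the `groups` object mentioned in this commit is observable from script. A small sketch, assuming a spec-compliant engine: inherited names such as `toString` resolve to `undefined` on the `groups` object, so a `$<toString>` reference in a replacement template expands to the empty string instead of to a stringified function.

```javascript
// Assumes a spec-compliant engine, where "groups" is created with a null [[Prototype]].
const date = /(?<year>\d{4})-(?<month>\d{2})/.exec("2019-11");
console.log(Object.getPrototypeOf(date.groups) === null); // true
console.log(date.groups.year, date.groups.month);          // 2019 11

// Inherited names are not visible through the groups object...
console.log(date.groups.toString); // undefined
// ...so a replacement template referring to one expands to the empty string.
console.log("xax".replace(/(?<y>a)/, "[$<toString>]")); // x[]x
```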
I wonder if the following change for RegExpBuiltinExec will make it easier to improve the performance of GetSubstitution for the common case when only normal RegExp objects are used in `RegExp.prototype[@@replace]`. With the current semantics, implementations always need to look up `"groups"` on `%ArrayPrototype%` and `%ObjectPrototype%` before using an optimized fast path. For example, this test case prints `"--bar--"` with the current spec proposal.
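The lookup path described here can be modeled from script. This is a hypothetical JavaScript model, not engine code: under the pre-change semantics, a match result without an own `groups` property resolves `groups` through `%ArrayPrototype%` and then `%ObjectPrototype%`, so a fast path was only safe while no object on that chain defined `groups`. `prototypeChainDefinesGroups` is an illustrative name.

```javascript
// Hypothetical model of the pre-change precondition: a result without an own "groups"
// would resolve it via %ArrayPrototype% and then %ObjectPrototype%, so a fast path
// was only safe if no object on that chain defined "groups".
function prototypeChainDefinesGroups(result) {
  for (let proto = Object.getPrototypeOf(result); proto !== null; proto = Object.getPrototypeOf(proto)) {
    if (Object.prototype.hasOwnProperty.call(proto, "groups")) {
      return true;
    }
  }
  return false;
}

const result = /a/.exec("a");
console.log(Object.getPrototypeOf(result) === Array.prototype);          // true
console.log(Object.getPrototypeOf(Array.prototype) === Object.prototype); // true
console.log(prototypeChainDefinesGroups(result));                         // false

Object.prototype.groups = { foo: "bar" }; // plant a "groups" higher up the chain
console.log(prototypeChainDefinesGroups(result)); // true
delete Object.prototype.groups;
```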