Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

querystring: fix empty pairs handling #11234

Closed

Conversation

mscdex
Copy link
Contributor

@mscdex mscdex commented Feb 8, 2017

This is a follow up to #10967, which is (currently) a semver-major.

This PR fixes outstanding issues stemming from the aforementioned PR and includes the tests from #11171. FWIW the original issue for all of this was that empty query string key/value pairs before the end of the string were not being ignored like an empty key/value at the end of the string would be.

I've also included some improvements to the parser in general, along with a couple of new benchmarks to highlight some of those improvements. Here are the results for benchmark/querystring/querystring-parse.js:

                                                                  improvement confidence      p.value
 querystring/querystring-parse.js n=1000000 type="altspaces"          84.35 %        *** 1.191823e-35
 querystring/querystring-parse.js n=1000000 type="encodefake"          2.96 %          * 4.610320e-02
 querystring/querystring-parse.js n=1000000 type="encodelast"          4.78 %         ** 7.673085e-03
 querystring/querystring-parse.js n=1000000 type="encodemany"          2.83 %            5.099655e-02
 querystring/querystring-parse.js n=1000000 type="manyblankpairs"    192.66 %        *** 4.461978e-48
 querystring/querystring-parse.js n=1000000 type="manypairs"          -0.14 %            9.307657e-01
 querystring/querystring-parse.js n=1000000 type="multicharsep"        1.65 %            2.795571e-01
 querystring/querystring-parse.js n=1000000 type="multivalue"          1.40 %            3.885362e-01
 querystring/querystring-parse.js n=1000000 type="multivaluemany"      1.82 %            4.116370e-01
 querystring/querystring-parse.js n=1000000 type="noencode"           -0.45 %            8.223732e-01

CI: https://ci.nodejs.org/job/node-test-pull-request/6279/

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)
  • querystring

@mscdex mscdex added dont-land-on-v4.x querystring Issues and PRs related to the built-in querystring module. labels Feb 8, 2017
This commit fixes handling of empty pairs that occur before the end
of the query string so that they are also ignored.

Additionally, some optimizations have been made, including:

* Avoid unnecessary code execution where possible
* Use a lookup table when checking for hex characters
* Avoid forced decoding when '+' characters are encountered and we
are using the default decoder

Fixes: nodejs#10454
@mscdex mscdex force-pushed the querystring-fix-empty-pairs-handling branch from 71973b3 to 7b72507 Compare February 8, 2017 08:41
encodeCheck = 1;
continue;
} else if (encodeCheck > 0) {
// eslint-disable-next-line no-extra-boolean-cast
Copy link
Contributor Author

@mscdex mscdex Feb 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If anyone is curious, this was added because there seems to be a positive difference when explicitly converting to a boolean in the conditional vs. implicitly converting the numeric value (or undefined) to a boolean. eslint doesn't know that isHexTable contains non-booleans, so I explicitly silence these "errors."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it better to use isHexTable[code] > 0 as a more readable option? The performance seems to be quite similar

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's debatable. I think the explicit conversion is easier to reason about than remembering how undefined values compare with numeric values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, in this case I'd propose isHexTable[code] === 1 as an alternative

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If anyone is curious, this was added because there seems to be a positive difference when explicitly converting to a boolean in the conditional vs. implicitly converting the numeric value (or undefined) to a boolean.

Could you clarify what you mean by the "positive difference"?

Copy link
Contributor Author

@mscdex mscdex Feb 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you clarify what you mean by the "positive difference"?

An increase in performance?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's interesting. AFAIK if (foo) has exactly the same behavior as if (!!foo), so I'm not sure why the performance would be different.

If there really is a difference, then I propose adding a comment explaining that.

// timed loop. Instead, just execute the function a "sufficient" number of
// times before the timed loop to ensure the function is optimized just once.
if (type === 'multicharsep') {
for (i = 0; i < n; i += 1)
Copy link
Member

@joyeecheung joyeecheung Feb 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be n? Refs: v8/src/runtime-profiler.cc, this might change but 1e3 should probabaly be safe.

// Number of times a function has to be seen on the stack before it is
// compiled for baseline.
static const int kProfilerTicksBeforeBaseline = 1;
// Number of times a function has to be seen on the stack before it is
// optimized.
static const int kProfilerTicksBeforeOptimization = 2;
// If the function optimization was disabled due to high deoptimization count,
// but the function is hot and has been seen on the stack this number of times,
// then we try to reenable optimization for this function.
static const int kProfilerTicksBeforeReenablingOptimization = 250;
// If a function does not have enough type info (according to
// FLAG_type_info_threshold), but has seen a huge number of ticks,
// optimize it as it is.
static const int kTicksWhenNotEnoughTypeInfo = 100;
// We only have one byte to store the number of ticks.
STATIC_ASSERT(kProfilerTicksBeforeOptimization < 256);
STATIC_ASSERT(kProfilerTicksBeforeReenablingOptimization < 256);
STATIC_ASSERT(kTicksWhenNotEnoughTypeInfo < 256);

Copy link
Contributor Author

@mscdex mscdex Feb 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally I used n / 2, but I seemed to get better (less variable) results with n, especially considering the default value of n. Either way, n should be enough no matter what, so I kept it.

@stevenvachon
Copy link

Please merge asap.

@mscdex
Copy link
Contributor Author

mscdex commented Feb 9, 2017

@stevenvachon Other @nodejs/collaborators need to sign off on this first.

@stevenvachon
Copy link

Hate to nag, but how about now? I'd really like to play with this in a nightly build.

jasnell pushed a commit that referenced this pull request Feb 13, 2017
This commit fixes handling of empty pairs that occur before the end
of the query string so that they are also ignored.

Additionally, some optimizations have been made, including:

* Avoid unnecessary code execution where possible
* Use a lookup table when checking for hex characters
* Avoid forced decoding when '+' characters are encountered and we
are using the default decoder

Fixes: #10454
PR-URL: #11234
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Nicu Micleușanu <[email protected]>
jasnell pushed a commit that referenced this pull request Feb 13, 2017
PR-URL: #11234
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Nicu Micleușanu <[email protected]>
@jasnell
Copy link
Member

jasnell commented Feb 13, 2017

Landed in ff785fd...8bcc122

@jasnell jasnell closed this Feb 13, 2017
@mscdex mscdex deleted the querystring-fix-empty-pairs-handling branch February 14, 2017 01:14
@TimothyGu TimothyGu added the whatwg-url Issues and PRs related to the WHATWG URL implementation. label Feb 14, 2017
TimothyGu pushed a commit that referenced this pull request Feb 17, 2017
This specific bug was fixed by #11234. This commit turns on the tests
accordingly.

PR-URL: #11369
Ref: #11234
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Timothy Gu <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
@stevenvachon
Copy link

This has a "dont-land-on-v7.x" label, yet it appears to be working in v7.6

@mscdex
Copy link
Contributor Author

mscdex commented Feb 28, 2017

@stevenvachon This didn't land in v7.x. Can you explain what you mean by "working?" For example, querystring.parse('a&&b') still returns { a: '', '': '', b: '' } in v7.6.0, however master returns { a: '', b: '' } (due to this PR being merged in master).

I should also point out that querystring.parse('a&b&') has always resulted in { a: '', b: '' }. This PR was for fixing the case when an empty pair does not occur at the end of the string.

@stevenvachon
Copy link

stevenvachon commented Feb 28, 2017

This issue was referenced in #11171 as the solution. My comment there, containing a test case that was failing on 7.5.x is now passing on 7.6

@mscdex
Copy link
Contributor Author

mscdex commented Feb 28, 2017

@stevenvachon With the test case you're referencing I get the same result on node v0.10.41, v4.0.0, v6.5.0, v7.5.0, v7.6.0, and master:

> querystring.parse('a=&a=value&a=')
{ a: [ '', 'value', '' ] }

@stevenvachon
Copy link

Shit.. I dunno what's going on then. My original issue is not reproducible with 7.5, which was available days before posting.

@mscdex
Copy link
Contributor Author

mscdex commented Feb 28, 2017

@stevenvachon Ah ok, so that was originally using WHATWG URL, which does its parsing in a specific way and its implementation has changed often since node v7.0.0. FWIW though I tried node v7.4.0, v7.5.0, and v7.6.0 and got the same expected result you showed in your original issue, namely:

> const url = new (require("url").URL)("http://www.domain.com:123/dir/file.html?var=&var=value&var=#hash");
undefined
> console.log( Array.from(url.searchParams) );
[ [ 'var', '' ], [ 'var', 'value' ], [ 'var', '' ] ]
undefined

@stevenvachon
Copy link

Thank you. Sorry for wasting space in here. I'll have to update a number of test suites as I'd built around this supposed issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
querystring Issues and PRs related to the built-in querystring module. whatwg-url Issues and PRs related to the WHATWG URL implementation.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants