Refresh Request.accept functionality #2687

ahopkins · 2023-02-16T21:59:40Z

This follows on the heels of #2663 and #2668 and builds off the work that @Tronic did in the latter of the two PRs. It closely tracks his changes, with some additions to make the pattern more compatible with the existing implementation. The following are the changes from the implementation in #2668:

Rename Matched >> Accept
Accept.__eq__ operator also checks q value
Add all comparison operators to Accept
Add match to Accept
MediaType parts to have wildcard
MediaType to match using str or MediaType as input

The main breaking change from main is that the in operator is no longer equivalent to match. Also, the params that control wildcard handling in match have been simplified to a single flag.

codecov · 2023-02-16T22:04:53Z

Codecov Report

Base: 88.603% // Head: 88.567% // Decreases project coverage by -0.036% ⚠️

Coverage data is based on head (9f77909) compared to base (6f5303e).
Patch coverage: 91.304% of modified lines in pull request are covered.

Additional details and impacted files

@@              Coverage Diff              @@
##              main     #2687       +/-   ##
=============================================
- Coverage   88.603%   88.567%   -0.036%     
=============================================
  Files           87        87               
  Lines         6853      6849        -4     
  Branches      1171      1176        +5     
=============================================
- Hits          6072      6066        -6     
+ Misses         539       538        -1     
- Partials       242       245        +3

| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| sanic/headers.py | `96.097% <90.804%> (-0.075%)` | ⬇️ |
| sanic/errorpages.py | `97.938% <100.000%> (ø)` | |
| sanic/request.py | `94.776% <100.000%> (+0.248%)` | ⬆️ |
| sanic/app.py | `89.342% <0.000%> (-0.711%)` | ⬇️ |
| sanic/server/websockets/impl.py | `37.788% <0.000%> (+0.230%)` | ⬆️ |

Tronic · 2023-02-16T22:24:48Z

Had a brief look at the code but not properly yet, commenting based on the summary you made:

Rename Matched >> Accept

OK. The original implementation had something else named Accept, so I kept the match object named differently to avoid confusion.

Accept.__eq__ operator also checks q value

Add all comparison operators to Accept

These are problematic. I suggest not making any of these behave like strings, to avoid possible confusion, especially with non-standard comparisons.

Add match to Accept

What is this needed for? This feature is already bloated, would rather keep it smaller (as stated already in #2200).

MediaType parts to have wildcard

Keeping the parts as separate custom types is bad for performance. I believe this was the primary reason why my PR is twice as fast.

MediaType to match using str or MediaType as input

I will look at that more closely later. My implementation intentionally kept the header and what it is matched against asymmetric in some cases, not pretending that MediaType (of the header) and MIME str (the match argument) are the same. Note that it also allows matching against a parameter, but a parameter on the header does not cause the match to fail.
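One way to read the asymmetry described above, as a minimal standalone sketch rather than the code from either PR: parameters given in the match argument must be present on the header item, while extra parameters on the header item are ignored. The helper names `parse_item` and `header_matches` are hypothetical, and wildcards are left out to keep the example short.

```python
def parse_item(value: str):
    """Split "type/subtype;k=v" into (mime, params), dropping any q value."""
    mime, *rest = (part.strip() for part in value.split(";"))
    params = dict(part.split("=", 1) for part in rest if "=" in part)
    params.pop("q", None)  # q is a weight, not a media type parameter
    return mime, params


def header_matches(header_item: str, wanted: str) -> bool:
    """Hypothetical asymmetric match of one header item against one argument."""
    h_mime, h_params = parse_item(header_item)
    w_mime, w_params = parse_item(wanted)
    if h_mime != w_mime:
        return False
    # Every parameter the app asks for must be on the header item;
    # parameters that only the header carries do not block the match.
    return all(h_params.get(key) == val for key, val in w_params.items())
```

Under this reading, a header item of text/plain;format=flowed matches the argument text/plain, but a header item of text/plain does not match the argument text/plain;format=flowed.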

ahopkins · 2023-02-19T11:20:53Z

I have run some benchmarks on the various implementations.

| branch | s / 100,000 parses | RFC compliant |
| --- | --- | --- |
| `main` (current implementation) | 1.587453 | ✔️ |
| #2663 (my first PR) | 1.583373 | ✔️ |
| #2668 (@Tronic PR) | 0.927016 | ❌ |
| #2687 (my second PR) | 1.105352 | ✔️ |
| #2687 (my second PR, using only `str`) | 1.081902 | ✔️ |

So changing from MediaTypePart (a subclass of str) to plain str saves about 0.02345 s over 100,000 runs, or an average of roughly 0.2345 μs per run. I am happy to make this change as I am not tied to MediaTypePart. It only served the purpose of abstracting away == '*'.

Testing further, switching to a cached key for sorting comes in at 1.064842 s per 100,000 runs, a saving of about 0.405 μs per run compared with the #2687 figure above. These numbers are certainly not meant to be statistically significant, as they are very sensitive to background noise on my machine. I ran each benchmark multiple times and tried to normalize the results to pick out trends. At best the numbers are a guideline, not an exact rule.
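For anyone wanting to reproduce this kind of comparison, here is a minimal timeit sketch. It is not the benchmark script used for the table above: the `MediaTypePart` class here is a hypothetical stand-in for the one in the PR, and the two parse functions only split a single MIME string, so absolute numbers will differ from the table.

```python
import timeit


class MediaTypePart(str):
    """Hypothetical str subclass that hides the == "*" check."""

    @property
    def is_wildcard(self) -> bool:
        return self == "*"


def parse_with_subclass(mime: str):
    # Wrap both parts in the custom type, as the first version of the PR did
    type_, subtype = mime.split("/", 1)
    return MediaTypePart(type_), MediaTypePart(subtype)


def parse_with_str(mime: str):
    # Keep both parts as plain str
    type_, subtype = mime.split("/", 1)
    return type_, subtype


def bench(func, number: int = 100_000) -> float:
    """Total seconds for `number` calls of func("text/html")."""
    return timeit.timeit(lambda: func("text/html"), number=number)


if __name__ == "__main__":
    for func in (parse_with_subclass, parse_with_str):
        print(f"{func.__name__}: {bench(func):.6f} s / 100,000 parses")
```

Running it several times and comparing trends, rather than single readings, is the safer way to interpret the output.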

Tronic · 2023-02-19T19:01:51Z

Good changes. I still suggest leaving out the comparison operators of match objects (btw, I do find the name Accept of that still a bit confusing). The intended use case with my PR is checking which argument matched:

if request.accept.match("application/json", "text/html") == "text/html":
    # respond HTML
else:
    # respond JSON

Granted, there are good alternative ways to do that e.g. by accessing the mime property, and many possible semantics for equality comparison of the match objects. Still, having multiple match objects compare by their quality is certainly stretching that and it is hard to imagine real use for that in applications, when the match function already performs selection of best format.

Also, the PR still mixes comparisons by plain q vs. by the sort key, and should be consistent one way or the other in all its operations.

ahopkins · 2023-02-19T20:02:08Z

I do find the name Accept of that still a bit confusing

OK

Also, the PR still mixes comparisons by plain q vs. by the sort key, and should be consistent one way or the other in all its operations.

This is certainly a question for open debate.

Consider the following change in AcceptList.match:

# from ...
        a = sorted(
            (-acc.q, i, j, mime, acc)
            for j, acc in enumerate(self)
            if accept_wildcards or not acc.has_wildcard
            for i, mime in enumerate(mimes)
            if acc.match(mime)
        )

# to ...
        a = sorted(
            (*acc.key, i, j, mime, acc)
            for j, acc in enumerate(self)
            if accept_wildcards or not acc.has_wildcard
            for i, mime in enumerate(mimes)
            if acc.match(mime)
        )

What would you expect the result of this to be?

accept = parse_accept("text/*, text/plain, text/plain;format=flowed, */*")
accept.match("text/csv", "text/plain")

Should it be "try and give me text/csv if possible" or "give me text/csv if explicitly allowed, otherwise fall back to text/plain"?

I think it would be more intuitive if the result was text/csv, meaning in this case we only care about qvalues because we want to provide equal weight to wildcards and explicit. @Tronic?

What if we allow both? This edge case certainly needs to be documented though no matter how we handle it.

request.accept.match(..., rfc_priority=True)  # better name? 
# ... prefer_non_wildcard=True or prefer_explicit=True

I honestly am not sure we need to go that far. I think for this use case sort by just q value makes sense. This is a different use case than the sorting pattern in the RFC.
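To make the two orderings concrete, here is a self-contained sketch that does not use the Sanic code. `Entry`, `parse_accept`, and `best_match` are simplified stand-ins, and the `key` property assumes that "more specific" means fewer wildcards and more parameters, which may not be exactly how the PR defines it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    type: str
    subtype: str
    q: float = 1.0
    params: tuple = ()

    @property
    def key(self):
        # Assumed specificity: non-wildcard parts plus number of parameters
        specificity = (self.type != "*") + (self.subtype != "*") + len(self.params)
        return (-self.q, -specificity)

    def match(self, mime: str) -> bool:
        type_, _, subtype = mime.partition("/")
        return self.type in ("*", type_) and self.subtype in ("*", subtype)


def parse_accept(header: str):
    entries = []
    for item in header.split(","):
        mime, *raw_params = (part.strip() for part in item.split(";"))
        type_, _, subtype = mime.partition("/")
        q, params = 1.0, []
        for raw in raw_params:
            name, _, value = raw.partition("=")
            if name.strip() == "q":
                q = float(value)
            else:
                params.append((name.strip(), value.strip()))
        entries.append(Entry(type_, subtype, q, tuple(params)))
    return entries


def best_match(entries, *mimes, by_key: bool = False):
    # i is the app's preference order, j is the header order
    candidates = sorted(
        ((*acc.key, i, j, mime) if by_key else (-acc.q, i, j, mime))
        for j, acc in enumerate(entries)
        for i, mime in enumerate(mimes)
        if acc.match(mime)
    )
    return candidates[0][-1] if candidates else None


accept = parse_accept("text/*, text/plain, text/plain;format=flowed, */*")
print(best_match(accept, "text/csv", "text/plain"))               # text/csv
print(best_match(accept, "text/csv", "text/plain", by_key=True))  # text/plain
```

With q alone, the app's first choice wins because every entry has q=1. With the specificity-aware key, the explicit text/plain entries outrank the wildcards, so the second argument is returned.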

Tronic

I guess we've had enough review of this. In principle all looks good to me and it certainly is an improvement over what we had. If you have any finishing touches, feel free to do them but this is LGTM as is or with changes.

Tronic · 2023-02-26T01:24:22Z

Moved documentation from other PR (may not be fully up to date but archiving here anyway):

Accept header

Sanic has had a helper for parsing the Accept header in HTTP requests since version 21.9, PR #2200. This feature was never documented, but it provides two methods for matching: "text/html" in request.accept and request.accept.match("text/html"). Both are identical and always produce True if the header includes the wildcard */*, which is included by all clients by default. The latter method has optional kwargs to skip wildcard matches to make it more useful in such cases.

Additionally, request.accept is a list that apps can traverse to do their own matching against each item of the header, or print its repr() for debugging purposes. The items themselves are derived from str, but with equality and comparisons implemented by q values only, thus item == "text/html" does not do what one might expect.

This PR rewrites the entire handling, changing its semantics and inner function, making it behave in a more practical, less surprising way, and with additional and removed functionality. A default value of accept: */* is used if the header is missing, as is required by the RFC. Conversion str(request.accept) reformats the parsed and sorted header as an Accept header value.

The match function can now take multiple MIME types as arguments and return the best match based on both the client's and the app's preferences. It returns a Matched object that is truthy, like the earlier bool return value, but also tells which argument matched which header item. This is mostly compatible with the old version, but the two kwargs are replaced by one that has different semantics: it skips any wildcard entries on the header, but still allows the application to provide wildcard types that will match non-wildcard header items.
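The shape of that return value can be sketched as follows. This is an illustrative standalone version, not Sanic's implementation: the `Matched` class and `match` function here are simplified stand-ins that ignore q values and parameters, and only the `accept_wildcards` flag name is taken from the code quoted earlier in this thread.

```python
from typing import Optional


class Matched:
    """Hypothetical result: which argument matched which header item."""

    def __init__(self, mime: str, header: Optional[str]):
        self.mime = mime
        self.header = header

    def __bool__(self) -> bool:
        # Truthy only when some header item matched
        return self.header is not None

    def __repr__(self) -> str:
        return f"<Matched mime={self.mime!r} header={self.header!r}>"


def _part_ok(a: str, b: str) -> bool:
    return a == b or a == "*" or b == "*"


def match(header_value: Optional[str], *mimes: str, accept_wildcards: bool = True):
    # A missing header is treated as "*/*"
    items = [
        part.split(";")[0].strip()
        for part in (header_value or "*/*").split(",")
    ]
    for mime in mimes:
        m_type, _, m_sub = mime.partition("/")
        for item in items:
            if not accept_wildcards and "*" in item:
                continue  # skip wildcard entries on the header only
            h_type, _, h_sub = item.partition("/")
            if _part_ok(h_type, m_type) and _part_ok(h_sub, m_sub):
                return Matched(mime, item)
    return Matched("", None)
```

For example, `match("text/html, */*", "application/json")` is truthy through the `*/*` entry, while the same call with `accept_wildcards=False` is falsy. An app-side wildcard such as `match("text/html", "text/*", accept_wildcards=False)` still matches the non-wildcard header item.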

The in matching uses items' equality comparison and behaves mostly identically to that against the request.headers["accept"] string. The item equality comparison is now by literal MIME alone, i.e. wildcards only match identical wildcards.
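The difference between literal equality and wildcard matching can be shown with a small standalone sketch. The `MediaType` class below is a hypothetical reduction, not the class from sanic/headers.py. It only demonstrates why `in`, which relies on `==`, no longer behaves like `match`.

```python
class MediaType:
    """Hypothetical header item: == is literal, match() understands wildcards."""

    def __init__(self, mime: str):
        self.mime = mime
        self.type, _, self.subtype = mime.partition("/")

    def __eq__(self, other):
        # Literal comparison: "text/*" only equals "text/*"
        if isinstance(other, str):
            return self.mime == other
        if isinstance(other, MediaType):
            return self.mime == other.mime
        return NotImplemented

    def __hash__(self):
        return hash(self.mime)

    def match(self, mime: str) -> bool:
        type_, _, subtype = mime.partition("/")
        return self.type in ("*", type_) and self.subtype in ("*", subtype)


accept = [MediaType("text/*"), MediaType("*/*")]

print("text/html" in accept)                       # False: no literal entry
print("*/*" in accept)                             # True: identical wildcard
print(any(m.match("text/html") for m in accept))   # True: wildcard match
```

Code that previously relied on `"text/html" in request.accept` being true for wildcard headers would need to switch to match under these semantics.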

A deprecation warning will be needed for version 22.12 LTS users who would be affected by these changes, with alternatives that can work with old and new versions.

ahopkins added 3 commits February 16, 2023 22:59

Blend new pattern with backwards compat

18a84dd

Blend new pattern with backwards compat

284d82e

Update errorpage to not use in

b255145

ahopkins requested a review from a team as a code owner February 16, 2023 21:59

Add check for typing

7d84be3

Remove commented out code

7237fdb

Tronic reviewed Feb 16, 2023

sanic/headers.py Outdated

Tronic reviewed Feb 16, 2023

tests/test_headers.py

Tronic reviewed Feb 17, 2023

sanic/headers.py Outdated

Tronic reviewed Feb 17, 2023

sanic/headers.py Outdated

ahopkins added 2 commits February 19, 2023 13:38

String only media type parts

6eeee63

Typing fixes

276fab8

ahopkins added 2 commits February 19, 2023 21:57

Rename Accept to Matched

843b3fe

Add back negative

5418bd0

ahopkins added 2 commits February 19, 2023 22:04

Cleanup tests

338e2b5

Fix error page test

9f77909

ahopkins requested a review from Tronic February 20, 2023 21:21

Tronic reviewed Feb 21, 2023

sanic/headers.py

Tronic approved these changes Feb 21, 2023

ahopkins added this pull request to the merge queue Feb 21, 2023

ahopkins removed this pull request from the merge queue due to the queue being cleared Feb 21, 2023

ahopkins merged commit d238995 into main Feb 21, 2023

ahopkins deleted the accept-updates branch February 21, 2023 06:22

Tronic mentioned this pull request Feb 26, 2023

Error page rendering format selection #2668

Merged
