Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refresh Request.accept functionality #2687

Merged
merged 11 commits into from
Feb 21, 2023
Merged

Refresh Request.accept functionality #2687

merged 11 commits into from
Feb 21, 2023

Conversation

ahopkins
Copy link
Member

@ahopkins ahopkins commented Feb 16, 2023

This falls on the heels of #2663 and #2668 and builds of the work that @Tronic did in the latter of the two PRs. This closely tracks his changes with some additions to make the pattern more compatible with the existing implementation. The following are the changes from the implementation in #2668:

  • Rename Matched >> Accept
  • Accept.__eq__ operator also checks q value
  • Add all comparison operators to Accept
  • Add match to Accept
  • MediaType parts to have wildcard
  • MediaType to match using str or MediaType as input

The main breaking changes from main are that the in operator is not longer equivalent to match. Also, the params to use wildcards has been simplified in match to a single flag.

@ahopkins ahopkins requested a review from a team as a code owner February 16, 2023 21:59
@codecov
Copy link

codecov bot commented Feb 16, 2023

Codecov Report

Base: 88.603% // Head: 88.567% // Decreases project coverage by -0.036% ⚠️

Coverage data is based on head (9f77909) compared to base (6f5303e).
Patch coverage: 91.304% of modified lines in pull request are covered.

Additional details and impacted files
@@              Coverage Diff              @@
##              main     #2687       +/-   ##
=============================================
- Coverage   88.603%   88.567%   -0.036%     
=============================================
  Files           87        87               
  Lines         6853      6849        -4     
  Branches      1171      1176        +5     
=============================================
- Hits          6072      6066        -6     
+ Misses         539       538        -1     
- Partials       242       245        +3     
Impacted Files Coverage Δ
sanic/headers.py 96.097% <90.804%> (-0.075%) ⬇️
sanic/errorpages.py 97.938% <100.000%> (ø)
sanic/request.py 94.776% <100.000%> (+0.248%) ⬆️
sanic/app.py 89.342% <0.000%> (-0.711%) ⬇️
sanic/server/websockets/impl.py 37.788% <0.000%> (+0.230%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@Tronic
Copy link
Member

Tronic commented Feb 16, 2023

Had a brief look at the code but not properly yet, commenting based on the summary you made:

  • Rename Matched >> Accept

OK. The original implementation had something else named Accept, so I kept the match object named differently to avoid confusion.

  • Accept.__eq__ operator also checks q value
  • Add all comparison operators to Accept

These are problematic. I suggest not making any of the things behave like strings to avoid possible confusion especially with non-standard comparisons

  • Add match to Accept

What is this needed for? This feature is already bloated, would rather keep it smaller (as stated already in #2200).

  • MediaType parts to have wildcard

Keeping the parts separate custom types is bad for performance. I believe this was the primary reason why my PR is twice faster.

  • MediaType to match using str or MediaType as input

I will look at that more closer later. My implementation intentionally kept header and what they are matched against asymmetric in some cases, not pretending that MediaType (of header) and MIME str (match argument) are the same. Note that it also allows matching against a parameter but a parameter on header does not cause match to fail.

sanic/headers.py Outdated Show resolved Hide resolved
sanic/headers.py Outdated Show resolved Hide resolved
sanic/headers.py Outdated Show resolved Hide resolved
@ahopkins
Copy link
Member Author

I have run some benchmarks on the various implementations.

branch s / 100,000 parses RFC compliant
main (current implementation) 1.587453 ✔️
#2663 (my first PR) 1.583373 ✔️
#2668 (@Tronic PR) 0.927016
#2687 (my second PR) 1.105352 ✔️
#2687 (my second PR, using only str) 1.081902 ✔️

So changing from MediaTypePart (a subclass of str to plain str saves 0.02344999999999997 over 100,000 runs, or an average of 0.234499μs per run. I am happy to make this change as I am not tied to MediaTypePart. It only served the purpose of abstracting away == '*'.

Testing further, going to a cached key for sorting is 1.064842/100k runs or .405099μs. It should be noted that these are certainly not meant to be statistically significant as they are super sensitive to background noise on my machine. I run these multiple times and attempt to normalize to gather trends. So at best the numbers are a guideline and not an exact rule.

@Tronic
Copy link
Member

Tronic commented Feb 19, 2023

Good changes. I still suggest leaving out the comparison operators of match objects (btw, I do find the name Accept of that still a bit confusing). The intended use case with my PR is checking which argument matched:

if request.accept.match("application/json", "text/html") == "text/html":
    # respond HTML
else:
    # respond JSON

Granted, there are good alternative ways to do that e.g. by accessing the mime property, and many possible semantics for equality comparison of the match objects. Still, having multiple match objects compare by their quality is certainly stretching that and it is hard to imagine real use for that in applications, when the match function already performs selection of best format.

Also, the PR still mixes comparisons by plain q vs. by the sort key, and should be consistent one way or the other in all its operations.

@ahopkins
Copy link
Member Author

ahopkins commented Feb 19, 2023

I do find the name Accept of that still a bit confusing

OK

Also, the PR still mixes comparisons by plain q vs. by the sort key, and should be consistent one way or the other in all its operations.

This is certainly a question for open debate.

Consider the following change in AcceptList.match:

# from ...
        a = sorted(
            (-acc.q, i, j, mime, acc)
            for j, acc in enumerate(self)
            if accept_wildcards or not acc.has_wildcard
            for i, mime in enumerate(mimes)
            if acc.match(mime)
        )

# to ...
        a = sorted(
            (*acc.key, i, j, mime, acc)
            for j, acc in enumerate(self)
            if accept_wildcards or not acc.has_wildcard
            for i, mime in enumerate(mimes)
            if acc.match(mime)
        )

What would you expect the result of this to be?

accept = parse_accept("text/*, text/plain, text/plain;format=flowed, */*")
accept.match("text/csv", "text/plain")

Should it be "try and give me text/csv if possible" or "give me text/csv if explicitly allowed, otherwise fallback to text/plain"?

I think it would be more intuitive if the result was text/csv, meaning in this case we only care about qvalues because we want to provide equal weight to wildcards and explicit. @Tronic?

What if we allow both? This edge case certainly needs to be documented though no matter how we handle it.

request.accept.match(..., rfc_priority=True)  # better name? 
# ... prefer_non_wildcard=True or prefer_explicit=True

I honestly am not sure we need to go that far. I think for this use case sort by just q value makes sense. This is a different use case than the sorting pattern in the RFC.

@ahopkins ahopkins requested a review from Tronic February 20, 2023 21:21
Copy link
Member

@Tronic Tronic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we've had enough review of this. In principle all looks good to me and it certainly is an improvement over what we had. If you have any finishing touches, feel free to do them but this is LGTM as is or with changes.

@ahopkins ahopkins added this pull request to the merge queue Feb 21, 2023
@ahopkins ahopkins removed this pull request from the merge queue due to the queue being cleared Feb 21, 2023
@ahopkins ahopkins merged commit d238995 into main Feb 21, 2023
@ahopkins ahopkins deleted the accept-updates branch February 21, 2023 06:22
@Tronic
Copy link
Member

Tronic commented Feb 26, 2023

Moved documentation from other PR (may not be fully up to date but archiving here anyway):

Accept header

Sanic has had a helper for parsing the Accept header in HTTP requests since version 21.9, PR #2200. This feature was never documented, but it provides two methods for matching: "text/html" in request.accept and request.accept.match("text/html"). Both are identical and always produce True if the header includes the wildcard */*, which is included by all clients by default. The latter method has optional kwargs to skip wildcard matches to make it more useful in such cases.

Additionally, request.accept is a list that apps can traverse to do their own matching against each item of the header, or print its repr() for debugging purposes. The items themselves are derived from str, but with equality and comparisons implemented by q values only, thus item == "text/html" does not do what one might expect.

This PR rewrites the entire handling, changing its semantics and inner function, making it behave in a more practical, less surprising way, and with additional and removed functionality. A default value of accept: */* is used if the header is missing, as is required by the RFC. Conversion str(request.accept) reformats the parsed and sorted header as an Accept header value.

The match function can now take multiple MIME types as arguments and return the best match based on both the client's and the app's preferences. It returns a Matched object that it is truthy alike the earlier bool return value, but tells which argument matched which header item. This is mostly compatible with the old version but the two kwargs are replaced by one that has different semantics: it skips any wildcard entries on the header, but still allows the application to provide wildcard types that will match non-wildcard header items.

The in matching uses items' equality comparison and behaves mostly identically to that against the request.headers["accept"] string. The item equality comparison is now by literal MIME alone, i.e. wildcards only match identical wildcards.

A deprecation warning will be needed for version 22.12 LTS users who would be affected by these changes, with alternatives that can work with old and new versions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants