Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add convenience methods for cookie creation and deletion #2706

Merged
merged 53 commits into from
Mar 21, 2023
Merged

Conversation

ahopkins
Copy link
Member

@ahopkins ahopkins commented Mar 5, 2023

Background

For years I have been meaning to add this PR ...


Creating and deleting cookies in Sanic requires a rather bizarre construct because of the implementation as a modified dictionary object.

@app.route("/cookie")
async def test(request):
    response = text("There's a cookie up in this response")
    response.cookies["test"] = "It worked!"
    response.cookies["test"]["domain"] = ".yummy-yummy-cookie.com"
    response.cookies["test"]["httponly"] = True
    return response

That just plain looks weird and is not intuitive.

Solution

This PR aims to fix this by making these objects feel more Pythonic with methods, and descriptors.

@app.route("/cookie")
async def test(request):
    response = text("There's a cookie up in this response")
	cookie = response.cookies.add_cookie("test", "It worked!", domain=".yummy-yummy-cookie.com")
    cookie.httponly = True
    return response

As shown, you can either add properties in the convenience method, or to the returned Cookie object. This is fully backwards compat.

The other thing this PR addresses is a similar method for deletions:

@app.route("/cookie")
async def test(request):
    response = text("Time to eat some cookies muahaha")

    # This cookie will be set to expire in 0 seconds
    response.cookies.delete_coolie("kill_me")

    return response

It also adds a couple more methods:

cookies.get_cookie("foo")
cookies.has_cookie("foo")

Backwards compat

This attempts to be backwards compatible with a move towards deprecating the overloaded dict style implementation. We will continue to support __getitem__ and __setitem__ after we remove the rest of the dict support with an ongoing notice to move to the new style.

Changes

This introduces secure=True and samesite="lax" as defaults. It also adds support for a missing cookie property (partitioned=True), and adds explicit support and checking for adding __Host- and __Secure- style prefixes.

Additionally, this adds property style accessors similar to headers:

# equivalent of request.cookies.get("session-token", "")
request.cookies.session_token

TODO

  • tests

sanic/cookies.py Outdated Show resolved Hide resolved
sanic/cookies.py Outdated Show resolved Hide resolved
@Tronic
Copy link
Member

Tronic commented Mar 14, 2023

I noticed that the escaping of cookie values is based on a non-standard method of octal escapes used in Python's SimpleCookie and also flask/werkzeug. This is fine, but the code from SimpleCookie used in Sanic apparently has a bug as it operates on Unicode codepoints while it looks like it was intended to operate on bytes (as werkzeug does). As a result, it escapes all of 128-255 but doesn't touch other Unicode code points. Werkzeug has fixed this bug and escapes everything that is not ASCII.

However, since browsers are happy with UTF-8 as is, we actually only need to escape a few ASCII characters. This is accomplished by using this mapping instead of the current code:

_Translator = {ch: f"\\{ch:03o}" for ch in bytes(range(32)) + b'";\\\x7F'}

The above only escapes ASCII control characters, punctuation that cannot occur in cookie value even when quoted, and the backslash character that is used in this escaping scheme. Latin range appears in document.cookie without any mangling, as do emoji and other more exotic Unicode. Sanic already decodes request cookie header using SimpleCookie such that all valid Unicode characters (incl. ASCII and extended control characters) are restored as they were.

It is remains recommended that applications use URL encoding (%-escaped UTF-8) for their textual cookie values or base64 for binary data, avoiding any potential incompatibilities with this custom scheme.

@ahopkins
Copy link
Member Author

I spent a lot of time debating various strategies for the cookies property. Balancing all considerations, I am settling on the implementation because:

  • it is performant
  • it is more accurate than existing implementation when served multiple cookies with the same value
  • it creates a familiar pattern to request.args, request.files, etc
  • it is fully backwards compatible

Here is some of the thought process and benchmarking:


I was heading in the direction of providing a regular (single value) dict as Request.cookies with an option that echos Request.get_form, Request.get_args, and Request.get_query_args.

It would look something like this:

class Request:
    ...
    cookies = property(get_cookies)

    def get_cookies(self, first_value_only=True):
        cookie = self.headers.getone("cookie", "")
        return parse_cookie(cookie, first_value_only=first_value_only)

This would yield:

>>> print(request.cookies["foo"])
bar
>>> print(request.cookies.get("foo"))
bar
>>> print(request.get_cookies(False).getlist("foo"))
['bar', 'some other stuff too']

With this in mind, I did some benchmarking on various solutions.

Before showing the benchmarked implementations, here is a common function among the results:

def _extract(token):
    name, __, value = token.partition("=")
    name = name.strip()
    value = value.strip()

    if not name or _COOKIE_NAME_RESERVED_CHARS.search(name):
        return None

    if len(value) > 2 and value[0] == '"' and value[-1] == '"':
        value = http_cookies._unquote(value)

    return name, value

First, was a test using walrus (which we cannot use in 3.7) so that the _extract method would always return a value. This enables the ability to have a comprehension style generator. The alternative (which benchmarked about the same) is to raise an exception in _extract instead of return None. This would allow us to have Python 3.7 support, but means failing on bad cookies instead of ignoring them.

Result

200000 loops, best of 5: 1.44 usec per loop
def parse_cookie(raw: str, first_value_only: bool = True):
    tokens = raw.split(";")
    if first_value_only:
        return dict(e for token in reversed(tokens) if (e := _extract(token)))

    cookies: Dict[str, List] = {}

    for token in tokens:
        name, value = _extract(token)
        if name in cookies:
            cookies[name].append(value)
        else:
            cookies[name] = [value]

    return cookies

The next implementation tried does a similar thing, except it keeps everything in the forloop. This performed consistently better than the previous (either with or without the walrus)

Result

200000 loops, best of 5: 1.23 usec per loop
def parse_cookie(raw: str, first_value_only: bool = True):
    tokens = raw.split(";")
    cookies: Dict[str, List] = {}

    for token in tokens:
        e = _extract(token)
        if not e:
            continue
        name, value = e
        if first_value_only:
            if name in cookies:
                continue
            cookies[name] = value
        elif name in cookies:
            cookies[name].append(value)
        else:
            cookies[name] = [value]

    return cookies

Using MultiDict is the simplest looking solution, but the slowest. It is shown here with the walrus version for 3.8+. I had similar experience with the raise style that is 3.7 compat.

Result

200000 loops, best of 5: 2 usec per loop
def _extract(token):
    name, __, value = token.partition("=")
    name = name.strip()
    value = value.strip()

    if not name or _COOKIE_NAME_RESERVED_CHARS.search(name):
        return None

    if len(value) > 2 and value[0] == '"' and value[-1] == '"':
        value = http_cookies._unquote(value)

    return name, value

def parse_cookie_md(raw: str):
    tokens = raw.split(";")
    return MultiDict(e for token in tokens if (e := _extract(token)))

The final test I did took the idea of the forloop only and put the _extract into it. I also removed the first_value_only argument completely. We can handle this compat layer as shown in the PR. It makes a consistent API to other request properties.

No surprises, removing the function call and additional conditional check is the most performant.

Result

200000 loops, best of 5: 1.08 usec per loop
def parse_cookie(raw: str):
    cookies: Dict[str, List] = {}

    for token in raw.split(";"):
        name, __, value = token.partition("=")
        name = name.strip()
        value = value.strip()

        if not name:
            continue

        if _COOKIE_NAME_RESERVED_CHARS.search(name):
            continue

        if len(value) > 2 and value[0] == '"' and value[-1] == '"':
            value = http_cookies._unquote(value)

        if name in cookies:
            cookies[name].append(value)
        else:
            cookies[name] = [value]

    return cookies

Conclusion: There is not much more optimization worth squeezing out at this time. Could it be better? Perhaps. But for the purpose of this PR it is good enough and already a huge improvement over the existing implementation.

  • It has a more consistent API
  • It more correctly handles edge cases
  • Is clearly faster (1.08 usec per loop v 6.33 nsec per loop for SimpleCookie)

@ahopkins ahopkins marked this pull request as ready for review March 14, 2023 22:03
@ahopkins ahopkins requested a review from a team as a code owner March 14, 2023 22:03
@codecov
Copy link

codecov bot commented Mar 15, 2023

Codecov Report

Patch coverage: 96.907% and project coverage change: +0.108 🎉

Comparison is base (61aa16f) 88.788% compared to head (3a8d4e1) 88.896%.

❗ Current head 3a8d4e1 differs from pull request most recent head 3f0c8cb. Consider uploading reports for the commit 3f0c8cb to get more accurate results

Additional details and impacted files
@@              Coverage Diff              @@
##              main     #2706       +/-   ##
=============================================
+ Coverage   88.788%   88.896%   +0.108%     
=============================================
  Files           87        92        +5     
  Lines         6868      7007      +139     
  Branches      1179      1195       +16     
=============================================
+ Hits          6098      6229      +131     
- Misses         530       533        +3     
- Partials       240       245        +5     
Impacted Files Coverage Δ
sanic/log.py 100.000% <ø> (ø)
sanic/response/types.py 84.302% <50.000%> (-0.404%) ⬇️
sanic/cookies/request.py 95.918% <95.918%> (ø)
sanic/cookies/response.py 96.666% <96.666%> (ø)
sanic/request/form.py 98.245% <98.245%> (ø)
sanic/cookies/__init__.py 100.000% <100.000%> (ø)
sanic/request/__init__.py 100.000% <100.000%> (ø)
sanic/request/parameters.py 100.000% <100.000%> (ø)
sanic/request/types.py 94.219% <100.000%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@ahopkins ahopkins requested a review from Tronic March 19, 2023 13:40
sanic/cookies/response.py Outdated Show resolved Hide resolved
tests/test_cookies.py Outdated Show resolved Hide resolved
@Tronic
Copy link
Member

Tronic commented Mar 20, 2023

I think I looked through all the code. Still need to do some actual test runs prior to approval. Will do that later today.

@Tronic
Copy link
Member

Tronic commented Mar 20, 2023

I am getting slow item access on CookieRequestParameters (p):

>>> %timeit p = CookieRequestParameters(parse_cookie("foo=123; bar=xxx"))
1.01 µs ± 3.25 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

>>> %timeit p.foo
1.58 µs ± 5.06 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

>>> %timeit p["foo"]
2.74 µs ± 4.75 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

>>> %timeit dict.__getitem__(p, "foo")
35.2 ns ± 0.123 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

This appears to be mainly because of try-except catching multiple times before it looks up the bare cookie. Using __Host-foo instead it gets considerably faster, although could still be better:

>>> %timeit p.foo
796 ns ± 3.9 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

Can we optimize this a bit more? It is not critical for req/s but it feels like we could get some fairly easy improvement for request cookie access time.

@Tronic
Copy link
Member

Tronic commented Mar 20, 2023

Adding cookies is also slow:

>>> %timeit CookieJar(MultiDict()).add_cookie("foo", 123, host_prefix=True)
4.29 µs ± 15.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Out of that about 3.9 µs is due to add_cookie; construction is fast. Looks like it is the Cookie class that is the culprit:

>>> %timeit Cookie("foo", 123, host_prefix=True)
3.63 µs ± 27.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

@Tronic
Copy link
Member

Tronic commented Mar 20, 2023

Other than that one change request, I didn't find other functional issues. It would be nice if something could be done about performance here and now, but we can certainly get back to that in 23.6 too.

@ahopkins
Copy link
Member Author

ahopkins commented Mar 20, 2023

Can we optimize this a bit more? It is not critical for req/s but it feels like we could get some fairly easy improvement for request cookie access time.

Yes. Thanks for taking the time to put this together. Stay tuned.

@ahopkins
Copy link
Member Author

I was about to speedup the response cookies (Cookie constructor) by using update for most of the values. This was about a 2x improvement for me.

Response cookies might be a bit more tricky. Removing the try/except pattern was only a marginal (and perhaps not even statistically significant) improvement. It might need a different strategy, or, we come up with a different plan completely. But, I really do like that we abstract away all of the prefixes. That seems like a good feature add that I do not want to give up. I will try and play with it some more, but if there's no obvious 2x style improvement like for the request cookies, I would be inclined to leave it and come back in another round.

@ahopkins
Copy link
Member Author

ahopkins commented Mar 20, 2023

I did some more tests on a couple alternatives. One that uses get() in series, and the other that does a in check. Both performed worse than try/except. Perhaps we can add a get_raw or some other method that would bypass the prefix checks. I am currently of the opinion that we leave this and iterate on after this.

@ahopkins ahopkins requested a review from a team March 20, 2023 21:41
@ahopkins ahopkins merged commit 1a63b9b into main Mar 21, 2023
@ahopkins ahopkins deleted the cookie-method branch March 21, 2023 09:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants