Add convenience methods for cookie creation and deletion #2706
I noticed that the escaping of cookie values is based on a non-standard method of octal escapes used in Python's SimpleCookie and also in flask/werkzeug. This is fine, but the code from SimpleCookie used in Sanic apparently has a bug: it operates on Unicode code points, while it looks like it was intended to operate on bytes (as werkzeug does). As a result, it escapes all of 128-255 but doesn't touch other Unicode code points. Werkzeug has fixed this bug and escapes everything that is not ASCII. However, since browsers are happy with UTF-8 as is, we actually only need to escape a few ASCII characters. This is accomplished by using this mapping instead of the current code:
_Translator = {ch: f"\\{ch:03o}" for ch in bytes(range(32)) + b'";\\\x7F'}
The above only escapes ASCII control characters, punctuation that cannot occur in a cookie value even when quoted, and the backslash character that is used in this escaping scheme. Latin range appears in
It remains recommended that applications use URL encoding (%-escaped UTF-8) for their textual cookie values or base64 for binary data, avoiding any potential incompatibilities with this custom scheme.
|
I spent a lot of time debating various strategies for the
Here is some of the thought process and benchmarking: I was heading in the direction of providing a regular (single value) dict as
It would look something like this:
class Request:
    ...
    def get_cookies(self, first_value_only=True):
        cookie = self.headers.getone("cookie", "")
        return parse_cookie(cookie, first_value_only=first_value_only)

    # Defined after get_cookies so that the name exists when property() runs.
    cookies = property(get_cookies)
This would yield:
>>> print(request.cookies["foo"])
bar
>>> print(request.cookies.get("foo"))
bar
>>> print(request.get_cookies(False).getlist("foo"))
['bar', 'some other stuff too']
With this in mind, I did some benchmarking on various solutions. Before showing the benchmarked implementations, here is a common function among the results:
def _extract(token):
    name, __, value = token.partition("=")
    name = name.strip()
    value = value.strip()
    if not name or _COOKIE_NAME_RESERVED_CHARS.search(name):
        return None
    if len(value) > 2 and value[0] == '"' and value[-1] == '"':
        value = http_cookies._unquote(value)
    return name, value
First was a test using the walrus operator (which we cannot use in 3.7) so that the
Result
def parse_cookie(raw: str, first_value_only: bool = True):
    tokens = raw.split(";")
    if first_value_only:
        # Reversed so that the first occurrence of a name wins in the dict.
        return dict(e for token in reversed(tokens) if (e := _extract(token)))
    cookies: Dict[str, List] = {}
    for token in tokens:
        # _extract returns None for invalid tokens; skip them before unpacking.
        if not (e := _extract(token)):
            continue
        name, value = e
        if name in cookies:
            cookies[name].append(value)
        else:
            cookies[name] = [value]
    return cookies
The next implementation does a similar thing, except that it keeps everything in the for loop. This performed consistently better than the previous one (either with or without the walrus).
Result
def parse_cookie(raw: str, first_value_only: bool = True):
    tokens = raw.split(";")
    cookies: Dict[str, List] = {}
    for token in tokens:
        e = _extract(token)
        if not e:
            continue
        name, value = e
        if first_value_only:
            # Keep only the first value seen for each name.
            if name in cookies:
                continue
            cookies[name] = value
        elif name in cookies:
            cookies[name].append(value)
        else:
            cookies[name] = [value]
    return cookies
Using
Result
def _extract(token):
    name, __, value = token.partition("=")
    name = name.strip()
    value = value.strip()
    if not name or _COOKIE_NAME_RESERVED_CHARS.search(name):
        return None
    if len(value) > 2 and value[0] == '"' and value[-1] == '"':
        value = http_cookies._unquote(value)
    return name, value

def parse_cookie_md(raw: str):
    tokens = raw.split(";")
    return MultiDict(e for token in tokens if (e := _extract(token)))
The final test I did took the idea of the for loop only and put the
No surprises: removing the function call and the additional conditional check is the most performant.
Result
def parse_cookie(raw: str):
    cookies: Dict[str, List] = {}
    for token in raw.split(";"):
        name, __, value = token.partition("=")
        name = name.strip()
        value = value.strip()
        if not name:
            continue
        if _COOKIE_NAME_RESERVED_CHARS.search(name):
            continue
        if len(value) > 2 and value[0] == '"' and value[-1] == '"':
            value = http_cookies._unquote(value)
        if name in cookies:
            cookies[name].append(value)
        else:
            cookies[name] = [value]
    return cookies
Conclusion: There is not much more optimization worth squeezing out at this time. Could it be better? Perhaps. But for the purpose of this PR it is good enough and already a huge improvement over the existing implementation.
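For anyone who wants to try the final variant in isolation, here is a self-contained sketch of it. The regular expression is an assumption: a simplified stand-in for `_COOKIE_NAME_RESERVED_CHARS`, whose real definition is not shown in this thread. `first_values` is a hypothetical helper that shows how the single-value view could be derived from the multi-value result. Note that `http.cookies._unquote` is a private standard-library function.

```python
import re
import timeit
from http import cookies as http_cookies
from typing import Dict, List

# Assumption: simplified stand-in for the real reserved-character pattern.
_COOKIE_NAME_RESERVED_CHARS = re.compile(r'[\x00-\x1F\x7F()<>@,;:\\"/\[\]?={} ]')


def parse_cookie(raw: str) -> Dict[str, List[str]]:
    """Parse a Cookie header into a name -> list-of-values mapping."""
    cookies: Dict[str, List[str]] = {}
    for token in raw.split(";"):
        name, _, value = token.partition("=")
        name = name.strip()
        value = value.strip()
        # Skip tokens with an empty or invalid name.
        if not name or _COOKIE_NAME_RESERVED_CHARS.search(name):
            continue
        # Undo the octal escaping for double-quoted values.
        if len(value) > 2 and value[0] == '"' and value[-1] == '"':
            value = http_cookies._unquote(value)
        cookies.setdefault(name, []).append(value)
    return cookies


def first_values(parsed: Dict[str, List[str]]) -> Dict[str, str]:
    """Hypothetical helper: keep only the first value seen for each name."""
    return {name: values[0] for name, values in parsed.items()}


raw = "foo=bar; foo=some other stuff too"
parsed = parse_cookie(raw)
print(parsed)                # {'foo': ['bar', 'some other stuff too']}
print(first_values(parsed))  # {'foo': 'bar'}

# Rough timing outside IPython; absolute numbers depend on the machine.
print(timeit.timeit(lambda: parse_cookie(raw), number=100_000))
```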
|
Codecov Report
Patch coverage:
Additional details and impacted files
@@ Coverage Diff @@
## main #2706 +/- ##
=============================================
+ Coverage 88.788% 88.896% +0.108%
=============================================
Files 87 92 +5
Lines 6868 7007 +139
Branches 1179 1195 +16
=============================================
+ Hits 6098 6229 +131
- Misses 530 533 +3
- Partials 240 245 +5
|
I think I looked through all the code. Still need to do some actual test runs prior to approval. Will do that later today. |
I am getting slow item access on CookieRequestParameters (p):
>>> %timeit p = CookieRequestParameters(parse_cookie("foo=123; bar=xxx"))
1.01 µs ± 3.25 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
>>> %timeit p.foo
1.58 µs ± 5.06 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
>>> %timeit p["foo"]
2.74 µs ± 4.75 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
>>> %timeit dict.__getitem__(p, "foo")
35.2 ns ± 0.123 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
This appears to be mainly because of
>>> %timeit p.foo
796 ns ± 3.9 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Can we optimize this a bit more? It is not critical for req/s but it feels like we could get some fairly easy improvement for request cookie access time.
|
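The cause named in the comment above did not survive in this thread, so the following is not a diagnosis. It is only a sketch of how lookup strategies on a dict subclass can be compared with the standard timeit module outside IPython. Both classes are hypothetical and are not Sanic's CookieRequestParameters; they assume a lookup that tries `__Host-` and `__Secure-` prefixed names before the plain name, once with try/except and once with a sentinel-based dict.get.

```python
import timeit

_MISSING = object()  # sentinel, so that stored None values are not mistaken for a miss
_PREFIXES = ("__Host-", "__Secure-")


class TryExceptParams(dict):
    """Hypothetical: probe prefixed names first, using exceptions for misses."""

    def __getitem__(self, key):
        for prefix in _PREFIXES:
            try:
                return dict.__getitem__(self, prefix + key)
            except KeyError:
                pass
        return dict.__getitem__(self, key)


class SentinelParams(dict):
    """Hypothetical: same lookup order, but misses never raise internally."""

    def __getitem__(self, key):
        for candidate in (_PREFIXES[0] + key, _PREFIXES[1] + key, key):
            value = dict.get(self, candidate, _MISSING)
            if value is not _MISSING:
                return value
        raise KeyError(key)


data = {"foo": "123", "bar": "xxx"}
for cls in (TryExceptParams, SentinelParams):
    p = cls(data)
    seconds = timeit.timeit(lambda: p["foo"], number=100_000)
    print(f"{cls.__name__}: {seconds / 100_000 * 1e9:.0f} ns per lookup")
```

No expected timings are given because they vary by machine and Python version; the point is only that the two strategies return the same values and can be measured side by side.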
Adding cookies is also slow:
>>> %timeit CookieJar(MultiDict()).add_cookie("foo", 123, host_prefix=True)
4.29 µs ± 15.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Out of that about 3.9 µs is due to
>>> %timeit Cookie("foo", 123, host_prefix=True)
3.63 µs ± 27.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
|
Other than that one change request, I didn't find other functional issues. It would be nice if something could be done about performance here and now, but we can certainly get back to that in 23.6 too. |
Yes. Thanks for taking the time to put this together. Stay tuned. |
I was about to speed up the response cookies ( Response cookies might be a bit more tricky. Removing the try/except pattern was only a marginal (and perhaps not even statistically significant) improvement. It might need a different strategy, or we come up with a different plan completely. But I really do like that we abstract away all of the prefixes. That seems like a good feature add that I do not want to give up. I will try and play with it some more, but if there's no obvious 2x style improvement like for the request cookies, I would be inclined to leave it and come back in another round. |
I did some more tests on a couple alternatives. One that uses |
Background
For years I have been meaning to add this PR ...
Creating and deleting cookies in Sanic requires a rather bizarre construct because of the implementation as a modified dictionary object.
That just plain looks weird and is not intuitive.
Solution
This PR aims to fix this by making these objects feel more Pythonic with methods and descriptors.
As shown, you can either add properties in the convenience method, or to the returned Cookie object. This is fully backwards compatible.
The other thing this PR addresses is a similar method for deletions:
It also adds a couple more methods:
Backwards compat
This attempts to be backwards compatible with a move towards deprecating the overloaded dict style implementation. We will continue to support __getitem__ and __setitem__ after we remove the rest of the dict support, with an ongoing notice to move to the new style.
Changes
This introduces secure=True and samesite="lax" as defaults. It also adds support for a missing cookie property (partitioned=True), and adds explicit support and checking for adding __Host- and __Secure- style prefixes.
Additionally, this adds property style accessors similar to headers:
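The defaults and prefix checks described under Changes can be pictured with a small standalone sketch. This is not the PR's implementation: `make_set_cookie` and all of its parameters (including `secure_prefix`) are hypothetical names chosen for illustration. The rules it enforces are the commonly documented ones for cookie name prefixes: a `__Host-` cookie needs Secure, Path=/ and no Domain, and a `__Secure-` cookie needs Secure.

```python
from typing import Optional


def make_set_cookie(
    name: str,
    value: str,
    *,
    path: str = "/",
    domain: Optional[str] = None,
    secure: bool = True,               # default described in this PR
    samesite: Optional[str] = "lax",   # default described in this PR
    partitioned: bool = False,
    host_prefix: bool = False,
    secure_prefix: bool = False,       # hypothetical parameter name
) -> str:
    """Hypothetical sketch: build a Set-Cookie value with prefix checks."""
    if host_prefix and secure_prefix:
        raise ValueError("Choose either host_prefix or secure_prefix, not both")
    if host_prefix:
        if not secure or path != "/" or domain is not None:
            raise ValueError("__Host- cookies need Secure, Path=/ and no Domain")
        name = "__Host-" + name
    elif secure_prefix:
        if not secure:
            raise ValueError("__Secure- cookies need Secure")
        name = "__Secure-" + name

    parts = [f"{name}={value}", f"Path={path}"]
    if domain:
        parts.append(f"Domain={domain}")
    if samesite:
        parts.append(f"SameSite={samesite.capitalize()}")
    if secure:
        parts.append("Secure")
    if partitioned:
        parts.append("Partitioned")
    return "; ".join(parts)


print(make_set_cookie("foo", "123", host_prefix=True))
# __Host-foo=123; Path=/; SameSite=Lax; Secure
```

Keeping the prefix as a flag rather than part of the name means callers never concatenate `__Host-` by hand, and an invalid combination fails at creation time instead of being silently rejected by the browser.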
TODO