Decode headers as UTF-8 also in ASGI #2606
Codecov Report
Coverage diff, main vs. #2606:

            main      #2606     +/-
Coverage    88.617%   88.539%   -0.079%
Files       87        87
Lines       6844      6841      -3
Branches    1178      1176      -2
Hits        6065      6057      -8
Misses      537       539       +2
Partials    242       245       +3

... and 5 files with indirect coverage changes
On hold until v23
This should probably be continued. It needs some practical testing with ASGI servers to see how they behave, but the likely outcome is that we need to use UTF-8 on that side as well, the same as with the integrated server. Two points of interest:
I can help run those tests once I get my other PRs sorted out, but @ChihweiLHBird feel free to do your own tests if you can. If it looks like UTF-8 is going to work identically with ASGI as it does with the Sanic server, we are all good to implement UTF-8 on ASGI as well.
@Tronic Sure, I can try to test it.
@ChihweiLHBird Could you get this finished? We had a discussion and want to have this in the upcoming 23.3 release.
@Tronic Sorry, I was busy the last few days. I am on it right now.
@Tronic I just did some manual testing, and for the test cases I tried there was no issue with header decoding. With UTF-8 decoding, it successfully decodes emojis and other UTF-8 characters, but it fails to decode some Latin-1 byte sequences that are not valid UTF-8, like Tried
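For readers following along, the behaviour described above can be reproduced with plain `bytes.decode` calls. This is only a minimal sketch: the header line and the `try_decode` helper are made up for illustration, and it does not go through Sanic or any ASGI server.

```python
from typing import Optional


def try_decode(raw: bytes, encoding: str) -> Optional[str]:
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# Hypothetical header lines, encoded two different ways.
raw_utf8 = "x-greeting: héllo 👋".encode("utf-8")
raw_latin1 = "x-greeting: héllo".encode("latin-1")

# UTF-8 bytes decode cleanly as UTF-8, emoji included.
utf8_ok = try_decode(raw_utf8, "utf-8")

# Latin-1 bytes for "é" (0xE9) are not a valid UTF-8 sequence here, so this fails.
latin1_as_utf8 = try_decode(raw_latin1, "utf-8")

# Latin-1 accepts every byte value, so this never fails, but the text is garbled.
utf8_as_latin1 = try_decode(raw_utf8, "latin-1")
```

The last case is the reason a Latin-1 fallback hides problems rather than reporting them: decoding always succeeds, even when the result is not what the client sent.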
Could you also This will require some tests that reach those errors.
Note on further changes (not necessarily in this PR):
After these changes, all three modes should behave more similarly to each other and be stricter about RFC compliance.
PR #2710 now implements charsets in URL handling, so that should not be addressed here. But do implement error handling for headers as instructed above.
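As a rough idea of what header error handling on the ASGI side could look like, here is a sketch that assumes headers arrive as the ASGI spec defines them, an iterable of `(bytes, bytes)` pairs. `HeaderDecodeError` and `decode_asgi_headers` are hypothetical names for illustration, not Sanic APIs; a real implementation would presumably map the failure to a 400 response.

```python
from typing import Iterable, List, Tuple


class HeaderDecodeError(ValueError):
    """Hypothetical error type standing in for a 'bad request' response."""


def decode_asgi_headers(
    raw_headers: Iterable[Tuple[bytes, bytes]],
) -> List[Tuple[str, str]]:
    """Decode ASGI-style header pairs as UTF-8, failing loudly on invalid bytes."""
    decoded = []
    for name, value in raw_headers:
        try:
            decoded.append((name.decode("utf-8"), value.decode("utf-8")))
        except UnicodeDecodeError as exc:
            # Report which header failed instead of letting the raw error escape.
            raise HeaderDecodeError(f"Header is not valid UTF-8: {name!r}") from exc
    return decoded
```

A test that reaches the error path only needs one header whose value contains a byte sequence that is invalid UTF-8, such as a lone `b"\xe9"`.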
It will stay laxer with header names. The entire header is decoded at once because that is faster than decoding each field separately, and thus we cannot check that they are ASCII (without a performance penalty).
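To illustrate the trade-off, here is a simplified sketch, not the actual parser in Sanic. `parse_block` decodes the whole header block with one call and so accepts non-ASCII header names, while `parse_block_strict` decodes every name and value separately so that names can be held to ASCII. Both function names and the sample header block are assumptions made for the example.

```python
from typing import List, Tuple


def parse_block(raw: bytes) -> List[Tuple[str, str]]:
    """Decode the whole header block in one call, then split it into fields."""
    text = raw.decode("utf-8")  # single decode: names are not checked for ASCII
    headers = []
    for line in text.split("\r\n"):
        if not line:
            continue
        name, _, value = line.partition(":")
        headers.append((name.strip().lower(), value.strip()))
    return headers


def parse_block_strict(raw: bytes) -> List[Tuple[str, str]]:
    """Split first, then decode each name as ASCII and each value as UTF-8."""
    headers = []
    for line in raw.split(b"\r\n"):
        if not line:
            continue
        name, _, value = line.partition(b":")
        # Two decode calls per header: stricter, but more work per request.
        headers.append((name.decode("ascii").strip().lower(), value.decode("utf-8").strip()))
    return headers


sample = "X-Naïve: ok\r\nX-Emoji: 👋\r\n".encode("utf-8")
lax_result = parse_block(sample)
```

With this sample, the lax variant returns both headers, whereas the strict variant raises `UnicodeDecodeError` on the non-ASCII name.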
04940ef to da9eb50
Good work :)
@ahopkins May I have your review?
Closes #2604