Headers are unnecessarily encoded to ByteString if there is no Transfer-Encoding: chunked header #42579
Comments
@nodejs/http |
Hello. You are correct in saying that when TE is not present, the headers are sent together with the data, but no encoding is specified and Node defaults to UTF-8. However, the problem is not in how Node processes the headers but in the value itself. No spec mandates (at least as far as I could find) that non-UTF-8 values should be converted to latin1/binary for the client to then reinterpret. My suggestion is to encode them using RFC8187 (which you already have in your code when I have filed a PR that makes this explicit in the docs. |
For RFC8187:
@ShogunPanda we should use |
I see that PHP-based servers work this way: they encode headers (in fact, it's only needed for non- It's not something new. Why can't I do the same with node.js?
However, it should work even without this. |
So, it feels like a bug. Clients expect headers as is — as Is it not a bug? |
Yes, |
Here's the thing. The client expects ASCII only. Not binary. The big difference is that technically RFC7230 only allows US-ASCII (so from 0 to 127) while we're using latin1 (0 to 255). The client might not understand the extension at all.
In my opinion supporting that is a deviation from the spec.
That's another thing. Once again, I'd like to remark that RFC8187 is the newest standard available (2018), built on top of older RFCs. Consider this to be the correct case and any other client support hackish, since it's not regulated anywhere. |
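For reference, the RFC8187 approach suggested in this thread could look roughly like the sketch below. The helper names `encodeRFC8187` and `contentDisposition` are made up for illustration and are not part of Node's API. The sketch assumes UTF-8 as the charset and no language tag, and the input must not contain lone surrogates (otherwise `encodeURIComponent` throws).

```javascript
// Percent-encode a value for use as an RFC 8187 ext-value (charset UTF-8).
// Hypothetical helper, not a Node.js API.
function encodeRFC8187(value) {
  return encodeURIComponent(value)
    // encodeURIComponent leaves ' ( ) * unescaped, but RFC 8187 does not
    // allow them unescaped in the value, so escape them explicitly.
    .replace(/['()*]/g, (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase());
}

// Build a Content-Disposition value that contains only ASCII characters.
function contentDisposition(filename) {
  return `attachment; filename*=UTF-8''${encodeRFC8187(filename)}`;
}

console.log(contentDisposition("Rock & roll 音楽 («🎵🎶»).txt"));
```

The resulting value is plain ASCII, so it does not depend on how the header bytes are written to the socket.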
PR-URL: nodejs#42624 Fixes: nodejs#42579 Co-authored-by: Antoine du Hamel <[email protected]> Reviewed-By: Matteo Collina <[email protected]> Reviewed-By: Antoine du Hamel <[email protected]>
Version
v17.5.0
Platform
Windows 10
Subsystem
http
What steps will reproduce the bug?
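Run this http server, then download the file from it twice:
first with the default headers, to get the file with the Rock & roll 音楽 («🎵🎶»).txt name;
then with http://localhost:8000/?te=0, to get the file with the same Rock & roll 音楽 («🎵🎶»).txt name (in this case the "Transfer-Encoding" header will be removed).
How often does it reproduce? Is there a required condition?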
Always.
What is the expected behavior?
Both files are downloaded with the Rock & roll 音楽 («🎵🎶»).txt name.
What do you see instead?
The first file has the correct name: Rock & roll 音楽 («🎵🎶»).txt.
The second one has the wrong name: Rock & roll é_³æ¥½ («ð__µð__¶Â»).txt
Additional information
TL;DR
If there is a "Transfer-Encoding: chunked" header (exactly chunked), setHeader works properly: it sets the input header (a ByteString) as is.
(Note: "Transfer-Encoding: chunked" is set by default.)
In any other case it additionally (and unnecessarily) encodes the header to ByteString.
So the header is encoded twice, which is wrong.
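To make "encoded twice" concrete, here is a small sketch that only models the bytes with `Buffer`. It does not call into Node's http internals. It assumes, as this report describes, that the header string ends up serialized as UTF-8 instead of Latin 1 when the chunked header is absent. `toByteString` is a local helper, not a built-in.

```javascript
const name = "Rock & roll 音楽 («🎵🎶»).txt";

// UTF-8 bytes of the string, stored one byte per character (Latin 1).
const toByteString = (s) => Buffer.from(s, "utf8").toString("latin1");

const header = toByteString(name);

// Expected path: the ByteString is written as Latin 1,
// so the wire carries exactly the UTF-8 bytes of the name.
const wireOk = Buffer.from(header, "latin1");

// Reported path (assumption): the ByteString is written as UTF-8,
// so every byte >= 0x80 is expanded into two bytes.
const wireBad = Buffer.from(header, "utf8");

console.log(wireOk.toString("utf8") === name);    // true
console.log(wireBad.toString("utf8") === header); // true: the client sees the mojibake
console.log(wireBad.length > wireOk.length);      // true
```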
Additional info
Most HTTP headers contain only ASCII characters. But when you need to put a string that contains non-ASCII* character(s) into a header (for example, "Content-Disposition" or any custom header), you can't just pass it as is to setHeader.
For example:
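A minimal way to see the rejection without starting a server is `http.validateHeaderValue()`, which the Node.js docs describe as performing the same low-level validation that `res.setHeader(name, value)` does. The `check` wrapper is only an illustrative helper.

```javascript
const http = require("http");

const name = "Rock & roll 音楽 («🎵🎶»).txt";

// Returns "ok" if the value passes validation, otherwise the error code.
function check(headerValue) {
  try {
    http.validateHeaderValue("Content-Disposition", headerValue);
    return "ok";
  } catch (err) {
    return err.code;
  }
}

// The raw string contains characters above U+00FF, so it is rejected.
console.log(check(`attachment; filename="${name}"`));   // ERR_INVALID_CHAR
// An ASCII-only value passes.
console.log(check('attachment; filename="plain.txt"')); // ok
```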
An HTTP header is a binary string (ByteString): UTF-8 bytes within a String object.
*There is no problem with headers that contain only ASCII characters, since the ASCII charset is a subset of both the UTF-8 and Latin 1 encodings, so toByteString(ASCIIString) === ASCIIString.
To get a ByteString from a USVString, you just need to take the UTF-8 bytes of the input string and then represent them in the Latin 1 (ISO-8859-1) encoding.
For example, in Node.js:
*To be honest, the entire quote of [ByteString](https://webidl.spec.whatwg.org/#idl-ByteString):
As far as I can see, a browser can also detect when the string is "just" 8859-1 rather than UTF-8 bytes encoded in 8859-1.
So both "Content-Disposition" headers are "valid"**:
The result in "both"** cases is a file with the "¡«£»÷ÿ.png"** name, even though "¡«£»÷ÿ.png" !== toByteString("¡«£»÷ÿ.png").
UPDATE:
**Using non-UTF-8 bytes ("some other 8-bit-per-code-unit encoding") in a ByteString is browser/OS language dependent!
For example, in Firefox with a non-English language, using "¡«£»÷ÿ.png" as is (without toByteString()) results in the ������.png filename instead of ¡«£»÷ÿ.png.
In Chrome it will be Ў«Ј»чя.png for Cyrillic.
So I think it (using 8859-1 in the "usual way") should be strongly discouraged.
Headers should always be a ByteString with only UTF-8 bytes represented as 8859-1 (Latin 1).
Problem
The problem is that I can't correctly set a header that is a ByteString (UTF-8 bytes in Latin 1) when the original string contains non-ASCII characters, the way other servers do.
The problem appears only when the Transfer-Encoding: chunked header (which is present by default) is removed (or changed).
In this case setHeader encodes the header to a binary string.
That is unnecessary, since it's already a ByteString.
It's not possible to pass a USVString to setHeader, since in that case it will throw a TypeError [ERR_INVALID_CHAR]: Invalid character in header content error.
So the header is encoded to "binary" twice, and browsers download the file with the wrong filename: Rock & roll é_³æ¥½ («ð__µð__¶Â»).txt instead of Rock & roll 音楽 («🎵🎶»).txt.
You can open the demo server with the Transfer-Encoding: chunked header disabled (http://localhost:8000/?te=0) and check it:
The header is encoded twice!
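The demo server itself is not reproduced in this copy of the issue. The following is only a rough sketch of what such a server might look like, not the original code. `createDemoServer` and `toByteString` are hypothetical helpers; the `?te=0` switch and port 8000 come from the URL above. It assumes that calling `res.write()` before `res.end()` keeps the default chunked encoding (no Content-Length is computed up front) and that `res.removeHeader("Transfer-Encoding")` disables it. The behavior was reported on v17.5.0 and has not been checked on other versions here.

```javascript
const http = require("http");

const name = "Rock & roll 音楽 («🎵🎶»).txt";

// UTF-8 bytes of the string, stored one byte per character (Latin 1).
function toByteString(str) {
  return Buffer.from(str, "utf8").toString("latin1");
}

// Hypothetical helper: builds the server but does not start listening.
function createDemoServer() {
  return http.createServer((req, res) => {
    const { searchParams } = new URL(req.url, "http://localhost:8000");

    res.setHeader("Content-Type", "text/plain; charset=utf-8");
    res.setHeader(
      "Content-Disposition",
      `attachment; filename="${toByteString(name)}"`
    );

    if (searchParams.get("te") === "0") {
      // Drop the default "Transfer-Encoding: chunked" header.
      res.removeHeader("Transfer-Encoding");
    }

    // A string body written with write() (assumption: this is the path
    // on which the header and the body are sent together).
    res.write("Hello");
    res.end();
  });
}
```

To try it, call `createDemoServer().listen(8000, "localhost")` and compare the file names offered for http://localhost:8000/ and http://localhost:8000/?te=0.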
Examples
A lot of forums encode headers this way for attached files (XenForo and vBulletin, for example).
The real life examples:
Oh, wait, it requires an account. If you don't have one (or don't want to create one), just use my demo server.
Anyway, just look at the screenshots below.
In the browser console you can verify that the headers are ByteString:
As a bonus, here is an example of a Java server made with ServerSocket which also works properly:
Main.java