moved: Body parsing bug due to special characters/encoding? #7

Open
tj opened this issue Apr 29, 2011 · 2 comments

Comments

@tj
Owner

tj commented Apr 29, 2011

I was working on a bookmarklet that, among other things, form-posts the title of whatever page you're on to my server running Express, and I'm seeing Connect's body parser choke on some pages from Amazon.

Here's a super simple test case:

https://gist.github.com/947895

Run that website locally, drag the bookmarklet to your toolbar, and click it on any of the provided Amazon links. You should see an error message like this one:

URIError: URI malformed
    at decodeURIComponent (native)
    at /usr/local/lib/node/.npm/qs/0.1.0/package/lib/querystring.js:28:18
    at Array.reduce (native)
    at /usr/local/lib/node/.npm/qs/0.1.0/package/lib/querystring.js:27:6
    at IncomingMessage.<anonymous> (/usr/local/lib/node/.npm/connect/1.3.0/package/lib/middleware/bodyParser.js:74:15)
    at IncomingMessage.emit (events.js:61:17)
    at HTTPParser.onMessageComplete (http.js:132:23)
    at Socket.ondata (http.js:1007:22)
    at Socket._onReadable (net.js:677:27)
    at IOWatcher.onReadable [as callback] (net.js:177:10)

This happens on Amazon pages where the title has special characters, like é or ü. You can change the title of an Amazon page (e.g. by setting document.title in the console) to just é, for example, and it will cause the bug.

I've done some investigating and can give you some more info, but at a high level, it seems that the browser in this case encodes the form differently than encodeURIComponent() does, which causes decodeURIComponent() — used by Connect's body parser — to choke.

For example, calling encodeURIComponent() on that é yields %C3%A9 everywhere, but what the server receives in the form body from these Amazon pages is %E9. Attempting to decodeURIComponent() on %E9 causes this error.
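The mismatch can be reproduced in Node without a browser. This is only a minimal sketch, assuming the body really arrives as `%E9` as described; `tryDecode` is a hypothetical helper for illustration, not part of Connect or qs.

```javascript
// "\u00e9" is é. encodeURIComponent() always emits the UTF-8 bytes C3 A9.
const utf8Encoded = encodeURIComponent('\u00e9'); // "%C3%A9"

// What the server reportedly receives from the Amazon pages: the single
// byte E9, which is é in ISO-8859-1 but an incomplete sequence in UTF-8.
const latin1Encoded = '%E9';

// Hypothetical helper: report the outcome instead of letting the error escape.
function tryDecode(input) {
  try {
    return { ok: true, value: decodeURIComponent(input) };
  } catch (err) {
    return { ok: false, error: err.name };
  }
}

const utf8Result = tryDecode(utf8Encoded);     // decodes back to é
const latin1Result = tryDecode(latin1Encoded); // fails with a URIError

console.log(utf8Result);
console.log(latin1Result);
```

The second call fails because `E9` announces a three-byte UTF-8 sequence and the two continuation bytes never arrive.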

I tried making a sample page for this, but the form post matched encodeURIComponent(). I'm guessing the behavior on Amazon is related to encoding, but I haven't been able to confirm, maybe because Express sends a Content-Type header that specifies utf-8.

All said, it seems that Connect's body parser shouldn't break on these encodings. Hope this info helps. Thanks!
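One way a parser could avoid breaking is to catch the `URIError` per value and keep the raw text. The sketch below is not Connect's or qs's actual code: `safeDecode` and `parseForm` are hypothetical names, and the parser only handles flat `key=value` pairs.

```javascript
// Hypothetical helper: decode one form component without ever throwing
// a URIError. Undecodable input is returned as-is (still percent-encoded).
function safeDecode(value) {
  // Form bodies encode spaces as "+", so restore them before decoding.
  const spaced = value.replace(/\+/g, ' ');
  try {
    return decodeURIComponent(spaced);
  } catch (err) {
    if (err instanceof URIError) return spaced;
    throw err; // anything else is unexpected, so let it surface
  }
}

// Hypothetical minimal parser for flat urlencoded bodies (no nesting).
function parseForm(body) {
  const result = {};
  for (const pair of body.split('&')) {
    if (!pair) continue;
    const eq = pair.indexOf('=');
    const key = eq === -1 ? pair : pair.slice(0, eq);
    const val = eq === -1 ? '' : pair.slice(eq + 1);
    result[safeDecode(key)] = safeDecode(val);
  }
  return result;
}

const parsed = parseForm('title=%E9&ok=%C3%A9+x');
console.log(parsed);
```

With this approach the request survives and the application sees the literal `%E9`, so it can decide how to handle a value in an unknown charset.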

@tj
Owner Author

tj commented Apr 29, 2011

^ moved from senchalabs/connect

@hokaccha

I ran into a similar case: when a form is POSTed as Shift_JIS, decodeURIComponent cannot decode the body.

That's because decodeURIComponent only handles UTF-8. Other charsets need their own decoding function.

For example, this is a Shift_JIS decoder library:
http://lightbox.on.coocan.jp/ecl_new.txt

How about a change like this?
hokaccha/connect@1f7c870
hokaccha@8c0d514

Then:

var express = require('express');
express.bodyParser.qs.decoder = UnescapeSJIS;
...

However, I couldn't find an ISO-8859-1 decoder.
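For what it's worth, an ISO-8859-1 decoder is short to write by hand, because that charset maps each byte 0x00-0xFF straight to the code point with the same number. A rough sketch follows; `decodeLatin1Component` and `decodeWithFallback` are hypothetical names, and whether either could be assigned to the `decoder` hook above depends on that patch.

```javascript
// Hypothetical helper: percent-decode a form component as ISO-8859-1.
// Each %XX byte becomes the character with code point XX.
function decodeLatin1Component(value) {
  return value
    .replace(/\+/g, ' ') // form bodies encode spaces as "+"
    .replace(/%([0-9A-Fa-f]{2})/g, (match, hex) =>
      String.fromCharCode(parseInt(hex, 16)));
}

// Hypothetical helper: try UTF-8 first, fall back to ISO-8859-1 only when
// the input is not valid UTF-8. This is a heuristic, not charset detection.
function decodeWithFallback(value) {
  try {
    return decodeURIComponent(value.replace(/\+/g, ' '));
  } catch (err) {
    if (err instanceof URIError) return decodeLatin1Component(value);
    throw err;
  }
}

console.log(decodeWithFallback('%C3%A9'));         // valid UTF-8 path
console.log(decodeWithFallback('caf%E9+cr%E8me')); // ISO-8859-1 fallback path
```

Two caveats: browsers generally treat a page labelled ISO-8859-1 as windows-1252, so bytes 0x80-0x9F would need a lookup table to match them exactly, and a Latin-1 byte sequence that happens to be valid UTF-8 will take the UTF-8 path and decode to the wrong characters.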
