Support unicode characters in authentication header #212

nikos · 2014-04-22T12:08:04Z

If a unicode character (for example the German umlaut ö, whose UTF-8 encoding starts with the byte 0xc3) is part of the authentication header, an error is thrown:

    $ http -a test:654ö21 example.com

    http: error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 20: ordinal not in range(128)

I am using Python 2.7.3 on a plain Ubuntu 12.04.4 LTS system.
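For readers on Python 3, the same class of failure can be reproduced with a small sketch. It assumes the credentials reach the program as UTF-8 bytes (as a UTF-8 terminal would deliver them); `raw_credentials` and `try_decode` are hypothetical names for illustration, not part of HTTPie.

```python
# Assumption: the shell hands over the credentials as UTF-8 encoded bytes.
raw_credentials = "test:654ö21".encode("utf-8")


def try_decode(data, encoding):
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc}"


# The ASCII codec rejects byte 0xc3, the first byte of the UTF-8 form of ö.
print(try_decode(raw_credentials, "ascii"))
# Decoding with the encoding that actually produced the bytes succeeds.
print(try_decode(raw_credentials, "utf-8"))
```

The byte position in the message differs from the one in the report because only the credentials are decoded here, not the whole command line.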

sigmavirus24 · 2014-04-22T13:18:24Z

There was a similar issue raised against requests (https://github.com/kennethreitz/requests/issues/1926) 2 months ago. The important part of it is @Lukasa's comment. In short: RFC 2616 only allows for characters in the Latin-1 encoding. If you want to pass unicode characters as part of a header value, there are two options:

1. You yourself turn the unicode into a string of octets.
2. httpie is modified to do 1. for you.

In short, this is not actually a bug in the implementation as we are being 100% compliant with the RFC.

sigmavirus24 · 2014-04-22T15:46:33Z

Actually, there's a third option that httpie can consider: the requests-toolbelt is considering adding functionality to handle this for users of requests.

If anyone's interested in contributing to this effort, please continue the discussion over there.

jkbrzt · 2014-04-24T10:31:17Z

Hm, and what about simply using UTF8? That seems to work for Opera.

http://stackoverflow.com/questions/702629/utf-8-characters-mangled-in-http-basic-auth-username

// Btw, thank you @sigmavirus24 for so often providing useful upstream context for HTTPie issues. It's very helpful 👍

Lukasa · 2014-04-24T10:34:53Z

Opera does it, but no-one else does. From the same question:

IE uses the default codepage.
Mozilla uses only the lower byte of character codepoints, which has the effect of encoding to ISO-8859-1 and mangling the non-8859-1 characters irretrievably... except when doing XMLHttpRequests, in which case it uses UTF-8
Safari and Chrome encode to ISO-8859-1, and fail to send the authorization header at all when a non-8859-1 character is used.
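The behaviours quoted above can be approximated in a few lines of Python 3. This is only a sketch of the described encodings, not a reproduction of any browser's code; the function names are made up for illustration.

```python
def opera_style(text):
    """Encode the credentials as UTF-8."""
    return text.encode("utf-8")


def mozilla_style(text):
    """Keep only the low byte of each code point (lossy outside ISO-8859-1)."""
    return bytes(ord(ch) & 0xFF for ch in text)


def safari_chrome_style(text):
    """Encode as ISO-8859-1; return None to model 'no header is sent'."""
    try:
        return text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return None


# ö exists in ISO-8859-1, the euro sign does not.
for password in ("abcö2", "abc€2"):
    print(
        repr(password),
        opera_style(password),
        mozilla_style(password),
        safari_chrome_style(password),
    )
```

For ö the low-byte and ISO-8859-1 strategies agree, which is why the differences only show up for characters outside that range.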

The real fix here was pointed out in IRC, which is this draft RFC coming out of the HTTPbis. When this draft becomes a standard, I'll happily implement support for it in requests.

jkbrzt · 2014-04-24T10:46:44Z

@Lukasa I see. It looks like the best solution (for HTTPie anyway) would be to fail with an informative message in case of non-ascii characters in basic auth credentials.

@nikos is there another HTTP client (CLI, web browser) which allows you to log in with these credentials?
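A minimal sketch of the "fail with an informative message" idea, in Python 3. The function name and the wording of the message are hypothetical; this is not what HTTPie ended up shipping.

```python
def check_basic_auth_credentials(username, password):
    """Raise ValueError if either credential contains a non-ASCII character."""
    for label, value in (("username", username), ("password", password)):
        try:
            value.encode("ascii")
        except UnicodeEncodeError as exc:
            # exc.start is the index of the first character that failed.
            bad = value[exc.start]
            raise ValueError(
                f"basic auth {label} contains non-ASCII character {bad!r}; "
                "its encoding on the wire is not standardised"
            ) from None


check_basic_auth_credentials("test", "654321")  # passes silently
try:
    check_basic_auth_credentials("test", "654ö21")
except ValueError as error:
    print(f"error: {error}")
```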

sigmavirus24 · 2014-04-24T13:36:04Z

@jkbr I agree.

There is another user-agent that allows you to use UTF-8 (as can be discovered in the requests issue I linked): cURL. The problem as I see it is that if you just read the introduction to the draft RFC that @Lukasa linked, this is not really universally supported behaviour.

cURL does the following:

    $ curl -u'foobar:abcö2' https://httpbin.org/get
    {
      "url": "http://httpbin.org/get",
      "headers": {
        "User-Agent": "curl/7.30.0",
        "Accept": "*/*",
        "Authorization": "Basic Zm9vYmFyOmFiY8O2Mg==",
        "Connection": "close",
        "X-Request-Id": "48556e34-492b-4d58-b164-37cc8f9eb6e7",
        "Host": "httpbin.org"
      },
      "origin": "173.229.2.112",
      "args": {}
    }

If you decode the parameter of the Basic authorization header (using Python's base64 library), you get: foobar:abc\xc3\xb62. If you use *nix's base64 command-line util, you get the original string back.
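The decoding step can be checked with the standard library. The sketch below uses Python 3, where the decoded value is a `bytes` object; the header value is the one from the cURL output above, and the variable names are only illustrative.

```python
import base64

header_value = "Basic Zm9vYmFyOmFiY8O2Mg=="

# Split the scheme ("Basic") from the base64 token.
scheme, _, token = header_value.partition(" ")

# base64 only yields octets; it says nothing about the text encoding.
raw = base64.b64decode(token)
print(raw)

# Interpreting the octets as UTF-8 recovers what was typed on the command line.
print(raw.decode("utf-8"))
```

A UTF-8 terminal performs that last interpretation implicitly, which would explain why the command-line util appears to return the original string.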

jkbrzt · 2014-04-24T14:22:20Z

@sigmavirus24 It looks like using UTF-8 & printing a warning message is the most pragmatic way to go. HTTPie users are likely to have previously used cURL.

sigmavirus24 · 2014-04-24T14:39:25Z

@jkbr I'm afraid that likely will not work:

    >>> auth
    ('foobar', 'abc\xc3\xb62')
    >>> ('%s:%s' % auth).encode('latin1')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)

That's roughly what requests does when you pass it the auth tuple. If you want to support this, you may have to construct the header yourself:

    >>> auth
    ('foobar', 'abc\xc3\xb62')
    >>> base64.b64encode('%s:%s' % auth)
    'Zm9vYmFyOmFiY8O2Mg=='
Given the vagueness of the specification around the basic authentication header, I wonder if the username/password actually have to be latin-1 encoded before they are base64 encoded. I'll have to research this. We may be able to relax this constraint in requests if so.
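For reference, a Python 3 sketch of constructing the header by hand, with the text encoding as an explicit parameter. `basic_auth_header` is a hypothetical helper, not a requests or HTTPie API; the resulting string would be passed as an ordinary `Authorization` header.

```python
import base64


def basic_auth_header(username, password, encoding="utf-8"):
    """Build a Basic Authorization header value using the given text encoding."""
    raw = f"{username}:{password}".encode(encoding)
    # The base64 alphabet is pure ASCII, so this decode cannot fail.
    return "Basic " + base64.b64encode(raw).decode("ascii")


auth = ("foobar", "abcö2")
print(basic_auth_header(*auth, encoding="utf-8"))
print(basic_auth_header(*auth, encoding="latin-1"))
```

With UTF-8 the value matches the one cURL sent above; with Latin-1 the token differs, so a server only accepts whichever form matches its own assumption.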

Lukasa · 2014-04-24T14:59:57Z

@sigmavirus24 We've already covered this in this discussion repeatedly: the specification provides no guidance as to text encoding because it was written by Americans at a time when text encoding was not a concern. The only thing that's safe is latin1, because that's the only text encoding ever mentioned with respect to headers in HTTP.

There is no "have to" here. Requests could absolutely decide to use UTF-8 if we wanted to, but I guarantee we'd break someone's running code: their webserver assumes it will be getting ISO 8859-1, and it would suddenly start getting multibyte sequences from UTF-8.
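The breakage is easy to illustrate with a sketch. It assumes a hypothetical server that decodes credentials as ISO 8859-1 and compares them with a stored password; none of these names come from requests.

```python
stored_password = "abcö2"


def server_sees(wire_bytes, assumed_encoding="iso-8859-1"):
    """Decode credential bytes the way a Latin-1-assuming server would."""
    return wire_bytes.decode(assumed_encoding)


# A client that encodes as ISO 8859-1 matches the stored password.
latin1_result = server_sees(stored_password.encode("iso-8859-1"))
print(latin1_result == stored_password)

# A client that switches to UTF-8 sends two bytes for ö; the server reads
# them as two separate Latin-1 characters and the comparison fails.
utf8_result = server_sees(stored_password.encode("utf-8"))
print(repr(utf8_result), utf8_result == stored_password)
```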

Requests has made a choice and I'm pretty happy with it at the moment. Users such as httpie should absolutely feel free to override that choice so long as they're equally aware that they could break currently running code. =)

sigmavirus24 · 2014-04-24T15:04:08Z

@jkbr looks like you have your solution above then ;)

* Immediately convert all args from `bytes` to `str`.
* Added `Environment.stdin_encoding` and `Environment.stdout_encoding`
* Allow unicode characters in HTTP headers and basic auth credentials by encoding them using UTF8 instead of latin1 (#212).

jkbrzt · 2014-04-26T13:48:57Z

It turns out ö is actually part of latin1 and this particular error was a bug in HTTPie. It has been fixed and in addition to that, headers are now UTF8-encoded.

jkbrzt added the bug label Apr 22, 2014

sigmavirus24 mentioned this issue Apr 22, 2014

Request for Comments: Implement RFC 5987 requests/toolbelt#27

Closed

jkbrzt closed this as completed Apr 26, 2014

jkbrzt mentioned this issue Feb 5, 2015

py34 test failure: KeyError: 'Authorization' error in TestSession.test_session_unicode #282

Closed

keyan mentioned this issue Mar 9, 2015

send_file fails when filename contains unicode symbols pallets/flask#1286

Closed

kevin-brown mentioned this issue May 12, 2015

Invalid Unicode byte in Authorization token raises an DjangoUnicodeDecodeError encode/django-rest-framework#2928

Closed

santiagobasulto mentioned this issue May 10, 2018

Fixed #20147 -- Added request.headers. django/django#9925

Closed
