This repository has been archived by the owner on Jan 13, 2021. It is now read-only.

Implement HTTP/1.1 support. #75

Closed
Lukasa opened this issue Aug 29, 2014 · 12 comments

Comments

@Lukasa
Member

Lukasa commented Aug 29, 2014

Over time it has become increasingly clear that httplib/http.client are a liability to requests/urllib3. Given that we'll need a HTTP/1.1 stack and that httplib is a liability, we should look into writing a new one.

This issue is a long-term goal, but is open to track the desired features from such a rewrite. This should be a list of mistakes that httplib has made that we should not make.

Initial list:

  • Configurable file-like-object streaming size. httplib streams file objects much slower than cURL does because it uses quite small chunks. We should make this faster.
  • Support for 1XX responses.
  • Better API.
  • Support for not ruining file-objects when connections fall apart.
  • No support for HTTP/1.0 or earlier.
  • A clean mechanism for Upgrade:, such that the socket is provided in a known-good state along with information about the connection.
  • Better separation of socket and parser logic.
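A minimal sketch of the first item, a configurable streaming size, for illustration only. The function name `stream_file`, its `send` callback, and the 64 KiB default are assumptions, not hyper's or httplib's API:

```python
import io


def stream_file(fileobj, send, chunk_size=64 * 1024):
    """Read fileobj in chunk_size pieces and hand each piece to send().

    `send` is any callable that accepts bytes (for example, a socket's
    sendall method). Returns the total number of bytes passed to send().
    """
    total = 0
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            # An empty read means the file object is exhausted.
            break
        send(chunk)
        total += len(chunk)
    return total


# Usage: collect the chunks in a list instead of writing to a socket.
sent = []
stream_file(io.BytesIO(b"x" * 150000), sent.append, chunk_size=65536)
```

Letting the caller choose `chunk_size` is the point: larger chunks mean fewer read/send calls per upload.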

/cc @sigmavirus24 @shazow for more ideas.

@dimaqq

dimaqq commented Aug 29, 2014

IIRC requests uses urllib3 for that very purpose: for example, httplib.HTTPConnection reads the response header one byte at a time, while requests (via urllib3) reads 8K at a time.

In other words, why not use urllib3?

@Lukasa
Member Author

Lukasa commented Aug 29, 2014

@dimaqq You have the abstraction layer backwards.

The hierarchy is supposed to be: requests -> urllib3 -> hyper. urllib3 should build on top of us, not the other way around.

Note that because of its reliance on httplib urllib3 is subject to all the limitations I mentioned above.

@dstufft

dstufft commented Aug 29, 2014

Here's a thing: httplib is poorly factored. There is basically zero reason why an HTTP library should have its connection management/socket code entwined with its HTTP parser.

@piotr-dobrogost

Linking with urllib3/urllib3/issues/58 as closely related.

@Lukasa
Member Author

Lukasa commented Feb 18, 2015

Ok, here's a proposed basic design principle.

HTTP/1.1 can be thought of as a special case of HTTP/2, with the following limitations:

  • Max concurrent streams forced to 1.
  • Header frames 'compress' into linewise output.
  • No frame headers.

This means we can conceptually implement HTTP/1.1 by having a special-case frame renderer. That allows the middle and top layers to be protocol-version agnostic, thinking in terms of streams, while the bottom layer simply changes how the data is rendered out. There are some requirements at the higher level to enforce certain behaviours (max concurrent streams etc.), but this represents probably the cleanest way to support both versions in the codebase, while allowing for transparent protocol version change.
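A rough sketch of what such a special-case renderer could look like, assuming the upper layers hand down an HTTP/2-style header list with pseudo-headers. The function name `render_http11_request` is hypothetical and this is not hyper's actual implementation:

```python
def render_http11_request(headers, body=b""):
    """Render an HTTP/2-style header list as HTTP/1.1 linewise output.

    `headers` is a list of (name, value) string pairs that may include the
    HTTP/2 pseudo-headers :method, :path and :authority.
    """
    pseudo = {name: value for name, value in headers if name.startswith(":")}
    # The pseudo-headers collapse into the HTTP/1.1 request line.
    lines = ["%s %s HTTP/1.1" % (pseudo[":method"], pseudo[":path"])]
    # :authority plays the role of the Host header in HTTP/1.1.
    if ":authority" in pseudo:
        lines.append("host: %s" % pseudo[":authority"])
    for name, value in headers:
        if not name.startswith(":"):
            lines.append("%s: %s" % (name, value))
    # Without frame headers, the body length has to be signalled explicitly.
    names = {name.lower() for name, _ in headers}
    if body and "content-length" not in names:
        lines.append("content-length: %d" % len(body))
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


# Usage: the same header list an HTTP/2 stream would send.
wire = render_http11_request(
    [(":method", "GET"), (":path", "/"), (":authority", "example.com")]
)
```

The layers above only ever see a header list and a body; swapping this function for an HTTP/2 frame serializer would be the only version-specific step.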

@Lukasa
Member Author

Lukasa commented Mar 1, 2015

@shazow Do you have any thoughts about what an ideal httplib replacement API would be?

@shazow

shazow commented Mar 1, 2015

Aside from how you build/call requests, one big pain point of httplib is the lack of clear, granular state for a given connection/request at any given time, and poorly structured errors.

But yea, I agree with @dstufft, it would be best if there was a way to give some socket-like object and be like "ok treat this as HTTP v1.1, make request X to here" then have it let go of the object once the request is done.

@Lukasa
Member Author

Lukasa commented Mar 1, 2015

What's the rationale for having the socket object not be owned by some kind of 'HTTP connection' object?

@shazow

shazow commented Mar 1, 2015

It can be owned by whatever if it makes sense for it to be in that specific context, but if you're managing your own sockets (e.g. urllib3) then all you want is something that knows the protocol you want to speak (http v1.1 or whatever).

The layers you have:

  1. Socket setup and configuration (tls wrapping, etc)
  2. Connection pooling
  3. Retrying/timeouts/error handling
  4. Request building and response reading
  5. HTTP protocol

If in order to use 5 you need to relinquish 1 (and maybe others), then it makes the stuff in between very hard.

Also bonus points if it works with things other than sockets, for testing and other novel use cases.

@Lukasa
Member Author

Lukasa commented Mar 1, 2015

Ok, so here's some notes.

Socket setup and configuration needs to be HTTP specific because of HTTP/2. Socket setup for TLS connections determines which type of HTTP can be spoken over that connection ahead of time. 1 and 5 are therefore tightly matched. All hyper's connection objects should be able to take socket objects provided from elsewhere, but they will always prefer to create them themselves.

I feel like the right abstraction here is for higher layers to manage connections, not sockets. A connection in this case is the local end of a HTTP state machine and its underlying transport (whatever that is). The state machine itself should know relatively little about the underlying transport: at most, it should believe it has send and recv methods (additional abstraction layers can be inserted if necessary).

What matters here, I think, is that hyper have a very clear semantic of what the transport layer should be for any connection (in terms of its API). Currently, that is fairly well defined: it needs a method called send, a method called recv, and a method called readline. This looks sockety, but is actually trivially defined in terms of other things (file wrappers, in-memory buffers, etc.), with the trickiest part being readline, which is really necessary so that the upper layers don't have to buffer data in order to work out where the hell a header line ends.
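To illustrate how little the transport needs, here is a sketch of an in-memory object exposing only `send`, `recv`, and `readline`, plus a header reader built on top of it. The names `BufferTransport` and `read_header_block` are hypothetical, not part of hyper:

```python
import io


class BufferTransport:
    """In-memory stand-in for a socket: send, recv and readline only."""

    def __init__(self, incoming=b""):
        self._incoming = io.BytesIO(incoming)
        self.sent = bytearray()

    def send(self, data):
        # Record outgoing bytes instead of writing them to a network.
        self.sent += data
        return len(data)

    def recv(self, amt):
        return self._incoming.read(amt)

    def readline(self):
        # Returns bytes up to and including the next b"\n".
        return self._incoming.readline()


def read_header_block(transport):
    """Read a status line and headers using only transport.readline()."""
    status_line = transport.readline().rstrip(b"\r\n")
    headers = []
    while True:
        line = transport.readline()
        if line in (b"\r\n", b"\n", b""):
            # A blank line (or end of data) ends the header block.
            break
        name, _, value = line.partition(b":")
        headers.append((name.strip().lower(), value.strip()))
    return status_line, headers


# Usage: parse a canned response without opening a socket.
transport = BufferTransport(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nhi")
status, headers = read_header_block(transport)
body = transport.recv(2)
```

Because `readline` lives in the transport, the parser never has to buffer partial lines itself, which is the motivation given above.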

This would allow libraries like urllib3 to override the socket if necessary. However, once a socket has been handed to a hyper connection it really does need to own it from that point onward, because there is state inextricably tied up with it. What we should do is signal unambiguously whether a connection can safely be re-used or not, to make it easier to pool connections.
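One possible shape for that reuse signal, as a sketch only. The function `connection_reusable` and its arguments are assumptions made for illustration; the only protocol fact relied on is that HTTP/1.1 connections are persistent unless a `Connection: close` header says otherwise:

```python
def connection_reusable(response_headers, body_fully_read):
    """Return True if an HTTP/1.1 connection may go back into a pool.

    `response_headers` is a list of (name, value) string pairs;
    `body_fully_read` says whether the whole response body was consumed.
    """
    if not body_fully_read:
        # Unread body bytes would be mistaken for the next response.
        return False
    for name, value in response_headers:
        if name.lower() == "connection":
            tokens = [token.strip().lower() for token in value.split(",")]
            if "close" in tokens:
                return False
    return True


# Usage: a pool would check this before keeping the connection.
keep = connection_reusable([("Content-Length", "2")], body_fully_read=True)
```

A pooling layer such as urllib3 could then consult a single boolean rather than inspecting the socket's state itself.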

@Lukasa
Member Author

Lukasa commented Mar 10, 2015

Alright, see #92.

@Lukasa
Member Author

Lukasa commented Apr 3, 2015

Merged! \o/

Lukasa closed this as completed on Apr 3, 2015