Connection not closed when request is cancelled #253

nbraem · 2015-01-21T22:40:26Z

When making a lot of concurrent client requests to the same host using connection pooling, TCPConnector will create a lot of connections. When cancelling these client requests, not all connections are closed.

This is not easy to reproduce. I have a long running server process, and I can see the open sockets increasing over time.

Here's a gist that reproduces the conditions. When the code prints DONE, leave the process running and look at the open sockets (e.g. lsof -p | grep http). After the keepalive_timeout of 30 seconds, most connections are closed, but occasionally (1 in 5 runs for me), there is a dangling socket. You may need to change the sleep timeout for cancellation depending on how fast your connection is:
https://gist.github.com/nbraem/0e5178288ffd94372062

I have a fix, I'll send a pull request shortly.

fafhrd91 · 2015-01-21T22:50:57Z

Try adding "yield from r.release()" after this line: "r = yield from request('get', 'http://yahoo.com', connector=connector)"

fafhrd91 · 2015-01-21T22:51:43Z

Here is the proper make_request() function:

    @asyncio.coroutine
    def make_request():
        r = yield from request('get', 'http://yahoo.com', connector=connector)
        yield from r.release()
        finished.append(r)

nbraem · 2015-01-21T23:02:31Z

Same thing.

nbraem · 2015-01-22T09:38:50Z

Ran the fix overnight, here are graphs of the open sockets before and after the fix.

Before:

After:

kxepal · 2015-01-22T16:01:07Z

Cannot reproduce the issue. Well, actually, I can create a connection leak with a high request rate, when connections are opened much faster than the OS is able to clean them up. Using yield from r.release() doesn't close the connection, but turns it into the TIME_WAIT state, where it remains for some time until your OS finally closes it. That's OK. By explicitly calling r.close() you terminate the connection completely, pushing it to the CLOSED state. The proposed fix doesn't change the whole picture for me: nothing leaks.

P.S. aiohttp 0.14.1

nbraem · 2015-01-22T18:54:55Z

Ok, I'll see if I can make a new gist that can reproduce it, that resembles my setup more closely.

nbraem · 2015-01-22T20:01:06Z

Ok, here's a gist that more closely reproduces my problem. Using this code, I can reproduce it every time:
https://gist.github.com/nbraem/05972f50d22d63869796

It creates an origin server and a proxy server. The proxy server will create many connections to the origin server. When it blows up, sockets are left open.

I hammer the proxy server like so:

ab -n 10000000 -c 1000 http://127.0.0.1:1234/

Thanks a lot for trying to help me, I really appreciate it!

fafhrd91 · 2015-01-22T20:08:09Z

I think this is related to keep-alive. By default TCPConnector uses keepalive_timeout=30; try reducing it to 1.

nbraem · 2015-01-22T20:17:44Z

I'm calling response.read() first and then response.release().

Just tried keepalive_timeout=1, same result though. It cleans up some sockets, but not all of them.

fafhrd91 · 2015-01-22T20:20:41Z

If keepalive_timeout helps, then it is a different problem: you open too many connections, and aiohttp does not release connections fast enough. You may need to write a custom TCPConnector that opens only a limited number of connections, or you can recycle the TCPConnector after some number of processed requests.
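
A minimal sketch of the "limited number of connections" idea, assuming a plain asyncio.Semaphore around each request rather than a custom TCPConnector. The names fetch, run_limited and max_open are hypothetical, the request is simulated with a sleep, and the code uses modern async/await syntax instead of the yield from style seen elsewhere in this thread:

```python
import asyncio


async def fetch(i, limit, state):
    # Hypothetical stand-in for a client request; no real network I/O happens.
    async with limit:  # at most max_open coroutines get past this point at once
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)  # pretend to wait for the response
        state["active"] -= 1
    return i


async def run_limited(total=50, max_open=5):
    limit = asyncio.Semaphore(max_open)
    state = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(fetch(i, limit, state) for i in range(total))
    )
    return results, state["peak"]


results, peak = asyncio.run(run_limited())
```

The semaphore only bounds how many requests are in flight at the same time; it does not by itself close idle connections, so it complements the keepalive cleanup rather than replacing it.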

nbraem · 2015-01-22T20:36:54Z

No, that's not it. Once the connection between the client (apachebench in this case) and the aiohttp server is lost, the aiohttp server will cancel the current request (throws a CancelledError in the handle_request coroutine), which causes the BaseConnector to lose references to some Connection objects. The underlying sockets of those Connection objects are never closed. I have a fix for that exact case. So if you have a look, you'll see how this can happen: nbraem@73016c4
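
To illustrate how the cancellation surfaces inside a handler, here is a small sketch. It is not the fix from the commit above: FakeConnection and this handle_request are hypothetical stand-ins, and the point is only that a CancelledError is raised at the await where the coroutine is suspended, so any cleanup has to happen on that path:

```python
import asyncio


class FakeConnection:
    """Hypothetical stand-in for a pooled connection; it only tracks a flag."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


async def handle_request(conn):
    try:
        await asyncio.sleep(10)  # pretend to wait on the upstream server
    except asyncio.CancelledError:
        conn.close()  # close explicitly so no socket is left dangling
        raise  # re-raise so the caller still sees the cancellation


async def main():
    conn = FakeConnection()
    task = asyncio.ensure_future(handle_request(conn))
    await asyncio.sleep(0)  # let the handler start and reach its await
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return conn.closed


closed_after_cancel = asyncio.run(main())
```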

This is not so far fetched, because connections to an aiohttp server can break and the server might also create a lot of connections itself, e.g. to a database or to a cache.

You are totally right that TCPConnector should have a configurable maximum of open connections per host, which would also solve my problem. I was planning on also doing that. I think it would be a great addition to aiohttp, because of the database connection use case.

fafhrd91 · 2015-01-22T20:42:53Z

I'm not sure why removing this line, "self._conns.pop(key, None)", helps.

nbraem · 2015-01-22T20:54:11Z

I have a second commit in the pull request which is better: nbraem@5dd86d0

In the case I'm describing, self._conns[key] will contain an array of hundreds of open connections to the same host. But if one of them satisfies this condition:

if should_close or (reader.output and not reader.output.at_eof()):

Then the entire array of connections is popped, and the cleanup task will never be able to close the sockets (transports) when the keepalive_timeout hits. So you've lost the reference to the sockets, and they're never closed.
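
The effect can be modelled with a toy pool. This is a simplified sketch and not aiohttp's actual code: FakeTransport, release_buggy, release_fixed and cleanup are hypothetical names, and cleanup stands in for the keepalive cleanup task:

```python
class FakeTransport:
    """Hypothetical stand-in for a socket transport."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def release_buggy(conns, key, transport, should_close):
    if should_close:
        conns.pop(key, None)  # drops every idle transport stored under key
        transport.close()
    else:
        conns.setdefault(key, []).append(transport)


def release_fixed(conns, key, transport, should_close):
    if should_close:
        transport.close()  # close only the transport being released
    else:
        conns.setdefault(key, []).append(transport)


def cleanup(conns):
    # Stand-in for the keepalive cleanup: it can only close what it can see.
    for transports in conns.values():
        for transport in transports:
            transport.close()
    conns.clear()


def leaked_after(release):
    key = ("example.invalid", 80)
    idle = [FakeTransport() for _ in range(3)]
    conns = {key: list(idle)}
    release(conns, key, FakeTransport(), should_close=True)
    cleanup(conns)
    return sum(1 for transport in idle if not transport.closed)


buggy_leaks = leaked_after(release_buggy)
fixed_leaks = leaked_after(release_fixed)
```

In the buggy variant the three idle transports are no longer reachable from the pool when cleanup runs, so they stay open; in the fixed variant cleanup still finds and closes them.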

fafhrd91 · 2015-01-22T20:57:16Z

Ok, that's better


connection leak fix for #253

asvetlov · 2015-01-23T07:18:35Z

Grr. I would like to have at least one unit test for any bugfix, especially for one as subtle as this issue.

I've done it in d5e4d6a but please keep our test suite strict.

asvetlov · 2015-01-23T07:27:13Z

Oops, sorry @fafhrd91
I've missed your 54c5423

Issues #253 and #254 implemented the `_conns` key eviction logic in the function that actually **adds** items to `_conns`. Issue #406 tweaked this logic even more, making early eviction of reusable items in the pool possible. Here we put the key eviction logic where it belongs: in the method that **removes** items from the `_conns` pool.
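
A rough sketch of that arrangement, assuming a plain dict of lists as the pool. The functions put and take are hypothetical and do not mirror aiohttp's real methods; the key is evicted only by the function that removes items, once its list becomes empty:

```python
def put(conns, key, item):
    # Adding never evicts anything; it only appends to the key's list.
    conns.setdefault(key, []).append(item)


def take(conns, key):
    # Removing is the only place where a key can disappear from the pool.
    items = conns.get(key)
    if not items:
        return None
    item = items.pop()
    if not items:
        del conns[key]  # evict the key once its last item has been taken
    return item


pool = {}
put(pool, "host-a", "conn-1")
put(pool, "host-a", "conn-2")
first = take(pool, "host-a")
key_kept = "host-a" in pool
second = take(pool, "host-a")
key_evicted = "host-a" not in pool
third = take(pool, "host-a")
```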

lock · 2019-10-29T20:01:53Z

This thread has been automatically locked since there has not been
any recent activity after it was closed. Please open a new issue for
related bugs.

If you feel like there are important points made in this discussion,
please include those excerpts in that new issue.

nbraem mentioned this issue Jan 21, 2015

connection leak fix for #253 #254

Merged

fafhrd91 closed this as completed in #254 Jan 22, 2015

fafhrd91 added a commit that referenced this issue Jan 22, 2015

Merge pull request #254 from nbraem/master

07cd477

connection leak fix for #253

asvetlov added a commit that referenced this issue Jan 23, 2015

Clarify issue #253 in the code, add tests

d5e4d6a

This was referenced Dec 3, 2018

client: GET request continues after cancelling the task #3426

Closed

Cannot properly close ws connection in one-sided way. #3443

Closed

lock bot added the outdated label Oct 29, 2019

lock bot locked as resolved and limited conversation to collaborators Oct 29, 2019
