
Flow control support for large transfers #19

Closed
gholt opened this issue Mar 22, 2014 · 13 comments

@gholt
Contributor

gholt commented Mar 22, 2014

Do you folks have a mailing list or IRC channel for aiohttp discussions? Or is creating Issues good enough?

My current thoughts lie in the aiohttp.client area. The HttpResponse class looks like it needs some love with respect to large responses: it seems it would load a 5 GB download entirely into memory if read() is called. Using content.read() directly is certainly okay, but I wonder if you already have plans to improve this area?

On the flip side, what about sending large requests? I'm pretty new to all of this (asyncio and aiohttp) so forgive me if this is obvious stuff to you folks. I've tried setting data to a file-like object from the standard open(path) but it just hangs for me. If I pass open(path).read() instead it works, but of course that's bad if it's a huge file.

Just to give some context, I'm working on an SDK to work as a client against OpenStack Swift / Rackspace Cloud Files, something I'm quite familiar with ;) as I've been working on that project for years now, but only in the Python 2.6 and 2.7 realm. This is my first real attempt with Python 3, specifically 3.4.

@fafhrd91
Member

Hey Greg,

I think it's too early to set up a mailing list for aiohttp; I don't want to create another dead mailing list.
Regarding big responses: we are using aiohttp for AWS S3 integration and it works quite well. Here is an example of how to download a big file:

def coro():
    resp = yield from aiohttp.request('get', 'http://...')

    with open('f.txt', 'wb') as f:
        while True:
            try:
                chunk = yield from resp.content.read()
                f.write(chunk)
            except aiohttp.EofStream:
                # the stream is exhausted; stop reading
                break

    resp.close()

Large requests are also supported; you can pass a generator as the data parameter:

def coro():
    resp = yield from aiohttp.request(
        'post', 'http://...', data=send_data(fname))
    yield from resp


def send_data(fname):
    with open(fname, 'rb') as f:
        while True:
            chunk = f.read(1024)
            if not chunk:
                break

            yield chunk

You can also chain a GET request with a POST request:

def coro():
    get_resp = yield from aiohttp.request('get', 'http://...')

    resp = yield from aiohttp.request(
        'post', 'http://...', data=send_data(get_resp))
    yield from resp


def send_data(resp):
    while True:
        try:
            chunk = yield from resp.content.read()
            yield chunk
        except aiohttp.EofStream:
            break

I think the main problem right now is documentation.

@fafhrd91
Member

I created a separate issue for the file object problem: #20

@gholt
Contributor Author

gholt commented Mar 24, 2014

Oh cool, thanks for the info. I had figured out the read side but hadn't caught on to the send side; makes sense though.

@gholt gholt closed this as completed Mar 24, 2014
@gholt gholt reopened this Mar 30, 2014
@gholt
Contributor Author

gholt commented Mar 30, 2014

I tried this technique and it was still gobbling up memory. I wrote a quick test script so maybe you can tell me where I've gone horribly wrong:

from asyncio import coroutine, get_event_loop
from aiohttp import EofStream, request


def send_data():
    with open('big1Gfile', 'rb') as fp:
        chunk = fp.read(65536)
        while chunk:
            yield chunk
            chunk = fp.read(65536)


@coroutine
def func():
    response = yield from request(
        'PUT',
        'https://host/path',
        headers={'x-auth-token': 'blah'},
        data=send_data(),
        chunked=True)
    try:
        while True:
            chunk = yield from response.content.read()
            print(chunk)
    except EofStream:
        pass
    response.close()


get_event_loop().run_until_complete(func())

@gholt
Contributor Author

gholt commented Mar 30, 2014

I get similarly large memory usage when not using chunked transfer encoding as well.

When I GET a large file the memory usage is fine, but I wonder if that's just because I can write the response to disk much faster than the network can deliver it. In other words, if I were chaining the GET into a PUT to another, much slower host, would the memory usage balloon?

@gholt
Contributor Author

gholt commented Mar 30, 2014

Ah yeah, I verified that a big GET with a slow reader of the response also uses a lot of memory. I've got to be doing something wrong, but I'm not sure how to tell aiohttp how large its buffers may be:

from asyncio import coroutine, get_event_loop, sleep
from aiohttp import EofStream, request


@coroutine
def func():
    response = yield from request(
        'GET',
        'https://host/big1Gfile',
        headers={'x-auth-token': 'blah'})
    try:
        while True:
            chunk = yield from response.content.read()
            yield from sleep(1)
    except EofStream:
        pass
    response.close()


get_event_loop().run_until_complete(func())

@fafhrd91
Member

Greg,

You've pointed to real problems. The client part has to implement a flow-control subsystem.
asyncio already has a flow-control system, but the aiohttp integration will take some time.
I'll try to come up with something this week.
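
To give a rough idea of what asyncio already provides: the protocol below is only an illustrative sketch of the standard pause_writing/resume_writing hooks and write-buffer water marks (the byte limits are made-up numbers), not the actual aiohttp integration:

import asyncio

class SketchProtocol(asyncio.Protocol):
    # illustration of asyncio's built-in write flow control hooks

    def connection_made(self, transport):
        self.transport = transport
        self.paused = False
        # made-up water marks: pause above 64 KiB buffered, resume below 16 KiB
        transport.set_write_buffer_limits(high=65536, low=16384)

    def pause_writing(self):
        # called by the transport when its write buffer passes the high mark
        self.paused = True

    def resume_writing(self):
        # called once the buffer drains below the low mark
        self.paused = False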

@gholt
Contributor Author

gholt commented Mar 31, 2014

Ah okay. No rush at all as I'm not trying to use this in a production environment or anything yet. I had read a bit about flow control and it seems like something you'd want to take your time on to get just right. Thanks for all your work!

@gholt gholt changed the title from "General discussion question" to "Flow control support for large transfers" Mar 31, 2014
@fafhrd91
Member

I've just committed write flow control; could you try the latest master for the PUT request?
Read flow control is a bit trickier; I need more time for it.

@gholt
Contributor Author

gholt commented Apr 1, 2014

Yes, looks good on the write end of things. I uploaded a 1 GB file and my process never exceeded 21 MB of resident memory. It figures that read flow control is more difficult and yet less commonly an issue, hah.

I was hitting my request timeout: I had set it to 60 thinking that was the allowed period of no activity before expiring, but I now see it's the overall time of the request, so I set it back to None. I don't suppose there's a no-activity timeout? I wonder if that's something I should build into my send_data generator somehow?
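
Something along these lines is what I have in mind, purely as a sketch (the 60-second idle limit and the chunk size are arbitrary). Note it can only detect a stall after the consumer finally asks for the next chunk; it can't interrupt a consumer that never comes back:

import time

def send_data(fname, idle_timeout=60):
    with open(fname, 'rb') as fp:
        while True:
            chunk = fp.read(65536)
            if not chunk:
                break
            before = time.monotonic()
            yield chunk
            # we only resume here once the next chunk is requested
            if time.monotonic() - before > idle_timeout:
                raise RuntimeError(
                    'no activity for more than %s seconds' % idle_timeout)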

@fafhrd91
Member

fafhrd91 commented Apr 1, 2014

I've just committed read flow control. Read flow control is useful server-side, for example if you use a third-party HTTP service.

The timeout covers sending the request and receiving all of the headers; after that you can use asyncio.wait_for to read the response body.
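
For example, roughly like this (just a sketch; the 60-second per-read limit is arbitrary):

import asyncio
import aiohttp

@asyncio.coroutine
def fetch(url):
    resp = yield from aiohttp.request('get', url)
    try:
        while True:
            # bound each body read instead of the whole request
            chunk = yield from asyncio.wait_for(resp.content.read(), 60)
            # ... process chunk ...
    except aiohttp.EofStream:
        pass
    except asyncio.TimeoutError:
        print('no data received for 60 seconds, giving up')
    finally:
        resp.close()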

@gholt
Contributor Author

gholt commented Apr 2, 2014

Seems to be working like a charm. Thank you much!

@gholt gholt closed this as completed Apr 2, 2014
@lock

lock bot commented Oct 29, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

If you feel there are important points made in this discussion, please include those excerpts in the new issue.

@lock lock bot added the outdated label Oct 29, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 29, 2019