
Flow control support for large transfers #19

Closed
gholt opened this issue Mar 22, 2014 · 13 comments

@gholt
Contributor

gholt commented Mar 22, 2014

Do you folks have a mailing list or IRC channel for aiohttp discussions? Or is creating Issues good enough?

My current thoughts lie in the aiohttp.client area. The HttpResponse class looks like it needs some love with respect to large responses: it seems it would load a 5 GB download entirely into memory if read() is called. Using content.read() directly is certainly okay, but I wonder if you already have plans to improve this area?

On the flip side, what about sending large requests? I'm pretty new to all of this (asyncio and aiohttp) so forgive me if this is obvious stuff to you folks. I've tried setting data to a file-like object from the standard open(path) but it just hangs for me. If I pass open(path).read() instead it works, but of course that's bad if it's a huge file.

Just to give some context, I'm working on an SDK to work as a client against OpenStack Swift / Rackspace Cloud Files, something I'm quite familiar with ;) as I've been working on that project for years now, but only in the Python 2.6 and 2.7 realm. This is my first real attempt with Python 3, specifically 3.4.

@fafhrd91
Member

Hey Greg,

I think it's too early to set up a mailing list for aiohttp; I don't want to create another dead mailing list.
Regarding big responses: we are using aiohttp for AWS S3 integration and it works quite well. Here is an example of how to download a big file:

def coro():
    resp = yield from aiohttp.request('get', 'http://...')

    with open('f.txt', 'wb') as f:
        while True:
            try:
                chunk = yield from resp.content.read()
                f.write(chunk)
            except aiohttp.EofStream:
                # the stream is exhausted; stop reading
                break

    resp.close()

Large requests are also supported; you can pass a generator as the data parameter:

def coro():
    resp = yield from aiohttp.request(
        'post', 'http://...', data=send_data(fname))
    yield from resp


def send_data(fname):
    with open(fname, 'rb') as f:
        while True:
            chunk = f.read(1024)
            if not chunk:
                break

            yield chunk

You can also chain a GET request with a POST request:

def coro():
    get_resp = yield from aiohttp.request('get', 'http://...')

    resp = yield from aiohttp.request(
        'post', 'http://...', data=send_data(get_resp))
    yield from resp


def send_data(resp):
    while True:
        try:
            chunk = yield from resp.content.read()
            yield chunk
        except aiohttp.EofStream:
            break

I think the main problem right now is documentation.

@fafhrd91
Member

I created a separate issue for the file object problem: #20

@gholt
Contributor Author

gholt commented Mar 24, 2014

Oh cool, thanks for the info. I had figured out the read side but hadn't caught on to the send side; makes sense though.

@gholt gholt closed this as completed Mar 24, 2014
@gholt gholt reopened this Mar 30, 2014
@gholt
Contributor Author

gholt commented Mar 30, 2014

I tried this technique and it was still gobbling up memory. I wrote a quick test script so maybe you can tell me where I've gone horribly wrong:

from asyncio import coroutine, get_event_loop
from aiohttp import EofStream, request


def send_data():
    with open('big1Gfile', 'rb') as fp:
        chunk = fp.read(65536)
        while chunk:
            yield chunk
            chunk = fp.read(65536)


@coroutine
def func():
    response = yield from request(
        'PUT',
        'https://host/path',
        headers={'x-auth-token': 'blah'},
        data=send_data(),
        chunked=True)
    try:
        while True:
            chunk = yield from response.content.read()
            print(chunk)
    except EofStream:
        pass
    response.close()


get_event_loop().run_until_complete(func())

@gholt
Contributor Author

gholt commented Mar 30, 2014

I get similarly large memory usage when not using chunked transfer encoding as well.

When I GET a large file the memory usage is fine, but I wonder if that's just because I can write the response to disk much faster than the network can deliver it. In other words, if I were chaining the GET into a PUT to another, much slower host, would the memory usage balloon?

@gholt
Contributor Author

gholt commented Mar 30, 2014

Ah yeah, I verified that a big GET with a slow reader of the response also uses a lot of memory. I've got to be doing something wrong, but I'm not sure how to tell aiohttp how large its buffers may be:

from asyncio import coroutine, get_event_loop, sleep
from aiohttp import EofStream, request


@coroutine
def func():
    response = yield from request(
        'GET',
        'https://host/big1Gfile',
        headers={'x-auth-token': 'blah'})
    try:
        while True:
            chunk = yield from response.content.read()
            yield from sleep(1)
    except EofStream:
        pass
    response.close()


get_event_loop().run_until_complete(func())

@fafhrd91
Member

Greg,

You've pointed to real problems. The client part has to implement a flow-control subsystem.
asyncio already has a flow-control system, but the aiohttp integration will take some time.
I'll try to come up with something this week.
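
To give a rough idea of what asyncio already provides: the protocol below is only an illustrative sketch of the standard pause_writing/resume_writing hooks and write-buffer water marks (the byte limits are made-up numbers), not the actual aiohttp integration:

import asyncio

class SketchProtocol(asyncio.Protocol):
    # illustration of asyncio's built-in write flow control hooks

    def connection_made(self, transport):
        self.transport = transport
        self.paused = False
        # made-up water marks: pause above 64 KiB buffered, resume below 16 KiB
        transport.set_write_buffer_limits(high=65536, low=16384)

    def pause_writing(self):
        # called by the transport when its write buffer passes the high mark
        self.paused = True

    def resume_writing(self):
        # called once the buffer drains below the low mark
        self.paused = False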

@gholt
Contributor Author

gholt commented Mar 31, 2014

Ah okay. No rush at all as I'm not trying to use this in a production environment or anything yet. I had read a bit about flow control and it seems like something you'd want to take your time on to get just right. Thanks for all your work!

@gholt gholt changed the title from "General discussion question" to "Flow control support for large transfers" Mar 31, 2014
@fafhrd91
Member

I've just committed write flow control; could you try the latest master for the PUT request?
Read flow control is a bit trickier; I need more time for it.

@gholt
Contributor Author

gholt commented Apr 1, 2014

Yes, looks good on the write end of things. I uploaded a 1 GB file and my process never exceeded 21 MB of resident memory. It figures that read flow control is more difficult and yet less commonly an issue, hah.

I was hitting my request timeout: I had set it to 60 thinking that was the allowed period of no activity before expiring, but I now see it's the overall time of the request, so I set it back to None. I don't suppose there's a no-activity timeout? I wonder if that's something I should build into my send_data generator somehow?
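
Something along these lines is what I have in mind, purely as a sketch (the 60-second idle limit and the chunk size are arbitrary). Note it can only detect a stall after the consumer finally asks for the next chunk; it can't interrupt a consumer that never comes back:

import time

def send_data(fname, idle_timeout=60):
    with open(fname, 'rb') as fp:
        while True:
            chunk = fp.read(65536)
            if not chunk:
                break
            before = time.monotonic()
            yield chunk
            # we only resume here once the next chunk is requested
            if time.monotonic() - before > idle_timeout:
                raise RuntimeError(
                    'no activity for more than %s seconds' % idle_timeout)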

@fafhrd91
Member

fafhrd91 commented Apr 1, 2014

I've just committed read flow control. Read flow control is useful server-side, for example if you use a third-party HTTP service.

The timeout covers sending the request and receiving all of the headers; after that you can use asyncio.wait_for to read the response body.
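
For example, roughly like this (just a sketch; the 60-second per-read limit is arbitrary):

import asyncio
import aiohttp

@asyncio.coroutine
def fetch(url):
    resp = yield from aiohttp.request('get', url)
    try:
        while True:
            # bound each body read instead of the whole request
            chunk = yield from asyncio.wait_for(resp.content.read(), 60)
            # ... process chunk ...
    except aiohttp.EofStream:
        pass
    except asyncio.TimeoutError:
        print('no data received for 60 seconds, giving up')
    finally:
        resp.close()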

@gholt
Contributor Author

gholt commented Apr 2, 2014

Seems to be working like a charm. Thank you much!

@gholt gholt closed this as completed Apr 2, 2014
@lock

lock bot commented Oct 29, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

If you feel there are important points made in this discussion, please include those excerpts in the new issue.

@lock lock bot added the outdated label Oct 29, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 29, 2019