
Flow control implementation problems #314

Closed · fafhrd91 opened this issue Apr 6, 2015 · 24 comments

@fafhrd91 (Member) commented Apr 6, 2015

The current implementation cannot really control flow; I think the flow control code needs to be part of the FlowControlXXX classes.

@kxepal do you use FlowControlStreamReader or FlowControlDataQueue in your projects?
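
For context, the idea under discussion looks roughly like this: the reader itself pauses the transport when its internal buffer exceeds a limit and resumes it once the buffer drains. This is a minimal sketch, not aiohttp's actual implementation; the class and attribute names are illustrative.

import asyncio

class FlowControlReader:
    """Illustrative only: names and the limit value are assumptions."""

    def __init__(self, transport, limit=2 ** 16):
        self._transport = transport
        self._limit = limit              # high-water mark, in bytes
        self._size = 0                   # bytes currently buffered
        self._paused = False
        self._queue = asyncio.Queue()

    def feed_data(self, data):
        self._size += len(data)
        self._queue.put_nowait(data)
        # Back-pressure: stop reading from the socket once we buffer too much.
        if not self._paused and self._size > self._limit:
            self._transport.pause_reading()
            self._paused = True

    @asyncio.coroutine
    def read(self):
        data = yield from self._queue.get()
        self._size -= len(data)
        # Resume reading once the buffer has drained below the limit.
        if self._paused and self._size < self._limit:
            self._transport.resume_reading()
            self._paused = False
        return data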

@kxepal (Member) commented Apr 6, 2015

I actually use FlowControlStreamReader for non-chunked responses and FlowControlChunksQueue for chunked ones. The latter is based on FlowControlDataQueue with a few tweaks.

@kxepal (Member) commented Apr 6, 2015

If you want to refactor them, go ahead. I'll help with testing.

@fafhrd91 (Member, Author) commented Apr 8, 2015

I pushed a new flow control implementation in d026a469b74cd0879ef1cab16eb54fde7fcd8be1.

Interestingly, it shows a significant performance increase for IO-heavy tasks.

Here are the autobahn test suite results for aiohttp 0.15.1:

Case 9.1.1  21 ms
Case 9.1.2  86 ms
Case 9.1.3  316 ms
Case 9.1.4  1400 ms
Case 9.1.5  2731 ms
Case 9.1.6  6054 ms

Results for master:

Case 9.1.1  4 ms
Case 9.1.2  8 ms
Case 9.1.3  32 ms
Case 9.1.4  114 ms
Case 9.1.5  248 ms
Case 9.1.6  479 ms

@asvetlov @kxepal we need to run some tests.

@kxepal I removed FlowControlChunksQueue, do you need it?

@fafhrd91 (Member, Author) commented Apr 8, 2015

@GMLudo could you run your tests for master?

@kxepal (Member) commented Apr 8, 2015

> @kxepal I removed FlowControlChunksQueue, do you need it?

Well, yes (: it was the only way to read chunked responses chunk-by-chunk without pain. Let me check, however, whether FlowControlDataQueue is now able to return EOF_MARKER instead of raising EofStream.
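
A rough sketch of the read loop this contract enables (EOF_MARKER is shown here as a local stand-in sentinel, and the queue's read() coroutine is assumed, so this is illustrative rather than aiohttp's API):

import asyncio

EOF_MARKER = object()  # stand-in for aiohttp's end-of-stream sentinel

@asyncio.coroutine
def read_chunks(queue, handle):
    # Drain the queue chunk-by-chunk; a sentinel return value ends the
    # loop instead of an EofStream exception unwinding it.
    while True:
        chunk = yield from queue.read()
        if chunk is EOF_MARKER:
            break
        handle(chunk)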

@kxepal (Member) commented Apr 8, 2015

Results look great! Nice work!

@ludovic-gasc (Contributor):

@fafhrd91 I'm preparing to go to PyCon US; I'll do that as soon as possible.

@fafhrd91 (Member, Author) commented Apr 8, 2015

@kxepal FlowControlChunksQueue is back

@kxepal (Member) commented Apr 8, 2015

@fafhrd91 thank you! Everything seems to work perfectly so far.

@ludovic-gasc (Contributor):

@fafhrd91 I'm now at PyCon US; I've just run a quick benchmark.
I installed Cython and the latest aiohttp master branch with the command pip install git+https://github.com/KeepSafe/aiohttp.git

On the pure JSON test (without DB interactions), it's clearly better; we get roughly 15% better performance:

Previous aiohttp version:

lg@steroids:~$ wrk -t8 -c256 -d1m http://127.0.0.1:8008/json
Running 1m test @ http://127.0.0.1:8008/json
  8 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    19.35ms   11.16ms 177.98ms   68.48%
    Req/Sec     1.72k   350.93     3.04k    68.77%
  809602 requests in 1.00m, 149.79MB read
Requests/sec:  13493.25
Transfer/sec:      2.50MB

Latest version:

lg@steroids:~$ wrk -t8 -c256 -d1m http://127.0.0.1:8008/json
Running 1m test @ http://127.0.0.1:8008/json
  8 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    16.75ms    8.63ms 242.11ms   72.95%
    Req/Sec     1.97k   265.30     3.08k    71.99%
  930089 requests in 1.00m, 176.51MB read
Requests/sec:  15504.98
Transfer/sec:      2.94MB

Nevertheless, with the pgsql benchmark I get almost the same numbers, but I think the bottleneck isn't in aiohttp.

Thanks for this improvement; when you release a new version of aiohttp, I'll update the FrameworkBenchmarks repository to use it.

@fafhrd91 (Member, Author):

@GMLudo could you run the tests again? I added a bunch of optimizations in 91f63992dbaabc8bf89cf17b100d1d15ad37322c.

But make some changes to the server code first:

  • disable keep-alive, but add kernel-level keep-alive settings
  • disable the access log
  • make sure the server code runs on multiple processes

What is interesting is the 10x difference with sync frameworks; on my local machine I see only about a 30% difference.

@ludovic-gasc (Contributor):

> disable keep-alive, but add kernel-level keep-alive settings

If I understand your remark correctly, to my knowledge TCP keep-alive and HTTP keep-alive are two different things. The web seems to agree with me: http://stackoverflow.com/questions/9334401/http-keep-alive-and-tcp-keep-alive
Nevertheless, could you tell me which kernel settings I need to change?

> disable the access log

Ok, like you did in #330.

> make sure the server code runs on multiple processes

Yes, I use API-Hour for that: https://github.com/TechEmpower/FrameworkBenchmarks/blob/master/frameworks/Python/API-Hour/hello/etc/hello/api_hour/gunicorn_conf.py#L6

> what is interesting is the 10x difference with sync frameworks

I don't understand this part of the sentence; how did you arrive at a 10x difference?

> on my local machine I see only about a 30% difference

In my experience, micro-benchmark results != macro-benchmark results; I've seen plenty of examples of this, like asyncio versus aiouv. Theoretically, aiouv should be much faster than the standard asyncio loop, but in the end it made no big difference in a macro-benchmark. But maybe I've missed something or made a mistake.

@fafhrd91 (Member, Author):

  1. default kernel settings should be fine for benchmarks; the most important thing is to disable keep-alive in aiohttp (see the sketch after this list)
  2. this should be None:
    https://github.com/TechEmpower/FrameworkBenchmarks/blob/master/frameworks/Python/API-Hour/hello/hello/__init__.py#L43
  3. this should be 0:
    https://github.com/TechEmpower/FrameworkBenchmarks/blob/master/frameworks/Python/API-Hour/hello/hello/__init__.py#L42

Regarding 10x, here is the text from your email to the asyncio mailing list:

WSGI:

$ wrk -t8 -c256 -d1m http://192.168.2.100:8080/json
Running 1m test @ http://192.168.2.100:8080/json
  8 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.81ms    2.24ms  32.56ms   99.04%
    Req/Sec    20.49k     3.09k   52.56k    81.39%
  9300719 requests in 1.00m, 1.59GB read
Requests/sec: 155019.04
Transfer/sec:     27.05MB

API-Hour + aiohttp.web:

$ wrk -t8 -c256 -d1m http://192.168.2.100:8008/json
Running 1m test @ http://192.168.2.100:8008/json
  8 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    18.36ms   11.36ms 117.66ms   67.44%
    Req/Sec     1.79k   238.97     2.65k    74.02%
  854843 requests in 1.00m, 158.16MB read
Requests/sec:  14248.79
Transfer/sec:      2.64MB

@ludovic-gasc (Contributor):

  1. It's very interesting, because it's counter-intuitive: the cost of reopening a socket for each HTTP request is lower than the cost of keeping the socket open. It's the opposite for sync frameworks. Maybe we have a bottleneck in the keep-alive mechanism of aiohttp.
  2. Ok
  3. Ok
  4. It's really awesome! Ok, now I understand your enthusiasm. Nevertheless, I really need to sleep right now, I'm still jetlagged. The complete setup to reproduce the exact same environment takes some time to avoid perturbations; I'll do that as soon as possible tomorrow.
    Don't worry, it's my top priority for tomorrow.

@ludovic-gasc (Contributor):

BTW, thanks a lot for these optimizations.

@fafhrd91 (Member, Author):

> It's very interesting, because it's counter-intuitive: the cost of reopening a socket for each HTTP request is lower than the cost of keeping the socket open. It's the opposite for sync frameworks. Maybe we have a bottleneck in the keep-alive mechanism of aiohttp.

By turning off keep-alive you just turn off aiohttp's keep-alive subsystem. aiohttp does not close the connection if the TCP socket option is set, so it simply relies on the system.

@fafhrd91 (Member, Author):

In general, asyncio timers are very expensive.
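
To make that cost concrete, here is an illustrative micro-benchmark (mine, not from the thread): every kept-alive connection schedules a timeout handle, and merely scheduling and cancelling many such handles is measurable work for the event loop.

import asyncio
import time

loop = asyncio.get_event_loop()

start = time.monotonic()
# Schedule and cancel 100k timers, roughly what a busy keep-alive server
# does over the lifetime of many requests.
handles = [loop.call_later(60.0, lambda: None) for _ in range(100000)]
for handle in handles:
    handle.cancel()
print('100k timers scheduled and cancelled in %.3f s'
      % (time.monotonic() - start))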

@ludovic-gasc (Contributor):

I've just benchmarked under the same conditions as the previous benchmark. It's really better (almost 2x), but not as big an improvement as you reported:

$ wrk -t8 -c256 -d1m http://192.168.2.100:8008/json
Running 1m test @ http://192.168.2.100:8008/json
  8 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    10.38ms    9.39ms 161.06ms   85.96%
    Req/Sec     3.22k   519.56     5.74k    75.93%
  1526538 requests in 1.00m, 289.71MB read
Requests/sec:  25443.75
Transfer/sec:      4.83MB

Maybe I've missed something; here are the changes I made:

@@ -39,8 +39,8 @@ class Container(api_hour.Container):
     def make_servers(self):
         return [self.servers['http'].make_handler(logger=self.worker.log,
                                                   debug=self.worker.cfg.debug,
-                                                  keep_alive=self.worker.cfg.keepalive,
-                                                  access_log=self.worker.log.access_log,
+                                                  keep_alive=0,
+                                                  access_log=None,
                                                   access_log_format=self.worker.cfg.access_log_format),
                 servers.yocto_http.YoctoHttpJson,
                 servers.yocto_http.YoctoHttpText]

pip freeze result:

-e [email protected]:Eyepea/aiohttp.git@1e5f7134a42a4b9321332a2eeea8aadfb3faf946#egg=aiohttp-master

Any clue how to track down the difference between your setup and mine?
Do you use two different computers for the client and server parts?

@fafhrd91 (Member, Author):

Here is my aiohttp script:

import asyncio
import json
from aiohttp import web

def index(req):
    return web.Response(
        body=json.dumps({'message': 'Hello, World!'}).encode('utf-8'))
    # the unreachable 'yield from ()' turns index into a generator, so
    # asyncio treats it as a coroutine; the response travels via the
    # return value
    yield from ()


@asyncio.coroutine
def init(loop):
    app = web.Application(loop=loop)
    app.router.add_route('GET', '/json', index)

    # keep_alive=0 disables aiohttp's keep-alive subsystem, which is the
    # point of this benchmark
    handler = app.make_handler(keep_alive=0)
    srv = yield from loop.create_server(handler, '0.0.0.0', 8080)
    print("Server started at http://127.0.0.1:8080")
    return srv, handler


loop = asyncio.get_event_loop()
srv, handler = loop.run_until_complete(init(loop))

def main():
    try:
        loop.run_forever()
    except KeyboardInterrupt:
        # drain open connections before exiting
        loop.run_until_complete(handler.finish_connections())

main()

I run it in one process. I use Pyramid for testing WSGI and explicitly set the number of gunicorn workers to 1.

@ludovic-gasc (Contributor):

Ok, it makes sense.
In fact, you are certainly using the sync worker in gunicorn; the async meinheld worker gives more performance.

Moreover, Pyramid is really slow compared to a raw WSGI implementation; you can check the preliminary round 10 results of the TechEmpower benchmarks (WARNING: this URL is normally semi-private, because they haven't finished validating all results for all frameworks. Don't share this URL). Nevertheless, for the JSON test with Python the results seem to be correct: http://www.techempower.com/benchmarks/previews/round10/#section=data-r10&hw=peak&test=json&l=1kw

It isn't always a valid shortcut, but at least here, if I apply a proportional rule between your results, my results, and the TechEmpower results, we get more or less the same ratio.

Thank you for these improvements; I'll integrate them into the TechEmpower benchmarks when the new release of aiohttp is available.

@fafhrd91 (Member, Author):

Ok, that makes sense.

@fafhrd91 (Member, Author):

I guess we are done with optimizations :)

@ludovic-gasc (Contributor):

@fafhrd91 Yes, indeed ;-)

@lock (bot) commented Oct 29, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

If you feel there are important points made in this discussion, please include those excerpts in that new issue.

lock bot added the outdated label Oct 29, 2019
lock bot locked as resolved and limited conversation to collaborators Oct 29, 2019