
Flow control implementation problems #314

Closed · fafhrd91 opened this issue Apr 6, 2015 · 24 comments

@fafhrd91 (Member) commented Apr 6, 2015

The current implementation cannot really control flow; I think the flow control code needs to be part of the FlowControlXXX classes.

@kxepal do you use FlowControlStreamReader or FlowControlDataQueue in your projects?
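
For context, the idea under discussion looks roughly like this: the reader itself pauses the transport when its internal buffer exceeds a limit and resumes it once the buffer drains. This is a minimal sketch, not aiohttp's actual implementation; the class and attribute names are illustrative.

import asyncio

class FlowControlReader:
    """Illustrative only: names and the limit value are assumptions."""

    def __init__(self, transport, limit=2 ** 16):
        self._transport = transport
        self._limit = limit              # high-water mark, in bytes
        self._size = 0                   # bytes currently buffered
        self._paused = False
        self._queue = asyncio.Queue()

    def feed_data(self, data):
        self._size += len(data)
        self._queue.put_nowait(data)
        # Back-pressure: stop reading from the socket once we buffer too much.
        if not self._paused and self._size > self._limit:
            self._transport.pause_reading()
            self._paused = True

    @asyncio.coroutine
    def read(self):
        data = yield from self._queue.get()
        self._size -= len(data)
        # Resume reading once the buffer has drained below the limit.
        if self._paused and self._size < self._limit:
            self._transport.resume_reading()
            self._paused = False
        return data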

@kxepal (Member) commented Apr 6, 2015

I actually use FlowControlStreamReader for non-chunked responses and FlowControlChunksQueue for chunked ones. The latter is based on FlowControlDataQueue with a few tweaks.

@kxepal (Member) commented Apr 6, 2015

If you want to refactor them, go ahead. I'll help with testing.

@fafhrd91 (Member, Author) commented Apr 8, 2015

I pushed a new flow control implementation in d026a469b74cd0879ef1cab16eb54fde7fcd8be1.

Interestingly, it shows a significant performance increase for IO-heavy tasks.

Here are the autobahn test suite results for aiohttp 0.15.1:

Case 9.1.1  21 ms
Case 9.1.2  86 ms
Case 9.1.3  316 ms
Case 9.1.4  1400 ms
Case 9.1.5  2731 ms
Case 9.1.6  6054 ms

Results for master:

Case 9.1.1  4 ms
Case 9.1.2  8 ms
Case 9.1.3  32 ms
Case 9.1.4  114 ms
Case 9.1.5  248 ms
Case 9.1.6  479 ms

@asvetlov @kxepal we need to run some tests.

@kxepal I removed FlowControlChunksQueue, do you need it?

@fafhrd91 (Member, Author) commented Apr 8, 2015

@GMLudo could you run your tests for master?

@kxepal (Member) commented Apr 8, 2015

> @kxepal I removed FlowControlChunksQueue, do you need it?

Well, yes (: it was the only way to read chunked responses chunk-by-chunk without pain. Let me check, however, whether FlowControlDataQueue is now able to return EOF_MARKER instead of raising EofStream.
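
A rough sketch of the read loop this contract enables (EOF_MARKER is shown here as a local stand-in sentinel, and the queue's read() coroutine is assumed, so this is illustrative rather than aiohttp's API):

import asyncio

EOF_MARKER = object()  # stand-in for aiohttp's end-of-stream sentinel

@asyncio.coroutine
def read_chunks(queue, handle):
    # Drain the queue chunk-by-chunk; a sentinel return value ends the
    # loop instead of an EofStream exception unwinding it.
    while True:
        chunk = yield from queue.read()
        if chunk is EOF_MARKER:
            break
        handle(chunk)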

@kxepal (Member) commented Apr 8, 2015

Results look great! Nice work!

@ludovic-gasc (Contributor):

@fafhrd91 I'm preparing to go to PyCon US; I'll do that as soon as possible.

@fafhrd91 (Member, Author) commented Apr 8, 2015

@kxepal FlowControlChunksQueue is back

@kxepal (Member) commented Apr 8, 2015

@fafhrd91 thank you! Everything seems to work perfectly so far.

@ludovic-gasc (Contributor):

@fafhrd91 I'm now at PyCon US; I've just run a quick benchmark.
I installed Cython and the latest aiohttp master branch with the command pip install git+https://github.com/KeepSafe/aiohttp.git

On the pure JSON test (without DB interactions), it's clearly better; we get roughly 15% better performance:

Previous aiohttp version:

lg@steroids:~$ wrk -t8 -c256 -d1m http://127.0.0.1:8008/json
Running 1m test @ http://127.0.0.1:8008/json
  8 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    19.35ms   11.16ms 177.98ms   68.48%
    Req/Sec     1.72k   350.93     3.04k    68.77%
  809602 requests in 1.00m, 149.79MB read
Requests/sec:  13493.25
Transfer/sec:      2.50MB

Latest version:

lg@steroids:~$ wrk -t8 -c256 -d1m http://127.0.0.1:8008/json
Running 1m test @ http://127.0.0.1:8008/json
  8 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    16.75ms    8.63ms 242.11ms   72.95%
    Req/Sec     1.97k   265.30     3.08k    71.99%
  930089 requests in 1.00m, 176.51MB read
Requests/sec:  15504.98
Transfer/sec:      2.94MB

Nevertheless, with the pgsql benchmark I get almost the same numbers, but I think the bottleneck isn't in aiohttp.

Thanks for this improvement; when you release a new version of aiohttp, I'll update the FrameworkBenchmarks repository to use it.

@fafhrd91 (Member, Author):

@GMLudo could you run the tests again? I added a bunch of optimizations in 91f63992dbaabc8bf89cf17b100d1d15ad37322c.

But make some changes to the server code first:

  • disable keep-alive, but add kernel-level keep-alive settings
  • disable the access log
  • make sure the server code runs on multiple processes

What is interesting is the 10x difference with sync frameworks; on my local machine I see only about a 30% difference.

@ludovic-gasc (Contributor):

> disable keep-alive, but add kernel-level keep-alive settings

If I understand your remark correctly, to my knowledge TCP keep-alive and HTTP keep-alive are two different things. The web seems to agree with me: http://stackoverflow.com/questions/9334401/http-keep-alive-and-tcp-keep-alive
Nevertheless, could you tell me which kernel settings I need to change?

> disable the access log

Ok, like you did in #330.

> make sure the server code runs on multiple processes

Yes, I use API-Hour for that: https://github.com/TechEmpower/FrameworkBenchmarks/blob/master/frameworks/Python/API-Hour/hello/etc/hello/api_hour/gunicorn_conf.py#L6

> what is interesting is the 10x difference with sync frameworks

I don't understand this part of the sentence; how did you arrive at a 10x difference?

> on my local machine I see only about a 30% difference

In my experience, micro-benchmark results != macro-benchmark results; I've seen plenty of examples of this, like asyncio versus aiouv. Theoretically, aiouv should be much faster than the standard asyncio loop, but in the end it made no big difference in a macro-benchmark. But maybe I've missed something or made a mistake.

@fafhrd91 (Member, Author):

  1. default kernel settings should be fine for benchmarks; the most important thing is to disable keep-alive in aiohttp (see the sketch after this list)
  2. this should be None:
    https://github.com/TechEmpower/FrameworkBenchmarks/blob/master/frameworks/Python/API-Hour/hello/hello/__init__.py#L43
  3. this should be 0:
    https://github.com/TechEmpower/FrameworkBenchmarks/blob/master/frameworks/Python/API-Hour/hello/hello/__init__.py#L42

Regarding 10x, here is the text from your email to the asyncio mailing list:

WSGI:

$ wrk -t8 -c256 -d1m http://192.168.2.100:8080/json
Running 1m test @ http://192.168.2.100:8080/json
  8 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.81ms    2.24ms  32.56ms   99.04%
    Req/Sec    20.49k     3.09k   52.56k    81.39%
  9300719 requests in 1.00m, 1.59GB read
Requests/sec: 155019.04
Transfer/sec:     27.05MB

API-Hour + aiohttp.web:

$ wrk -t8 -c256 -d1m http://192.168.2.100:8008/json
Running 1m test @ http://192.168.2.100:8008/json
  8 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    18.36ms   11.36ms 117.66ms   67.44%
    Req/Sec     1.79k   238.97     2.65k    74.02%
  854843 requests in 1.00m, 158.16MB read
Requests/sec:  14248.79
Transfer/sec:      2.64MB

@ludovic-gasc (Contributor):

  1. It's very interesting, because it's counter-intuitive: the cost of reopening a socket for each HTTP request is lower than the cost of keeping the socket open. It's the opposite for sync frameworks. Maybe we have a bottleneck in the keep-alive mechanism of aiohttp.
  2. Ok
  3. Ok
  4. It's really awesome! Ok, now I understand your enthusiasm. Nevertheless, I really need to sleep right now, I'm still jetlagged. The complete setup to reproduce the exact same environment takes some time to avoid perturbations; I'll do that as soon as possible tomorrow.
    Don't worry, it's my top priority for tomorrow.

@ludovic-gasc (Contributor):

BTW, thanks a lot for these optimizations.

@fafhrd91 (Member, Author):

> It's very interesting, because it's counter-intuitive: the cost of reopening a socket for each HTTP request is lower than the cost of keeping the socket open. It's the opposite for sync frameworks. Maybe we have a bottleneck in the keep-alive mechanism of aiohttp.

By turning off keep-alive you just turn off aiohttp's keep-alive subsystem. aiohttp does not close the connection if the TCP socket option is set, so it simply relies on the system.

@fafhrd91 (Member, Author):

In general, asyncio timers are very expensive.
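
To make that cost concrete, here is an illustrative micro-benchmark (mine, not from the thread): every kept-alive connection schedules a timeout handle, and merely scheduling and cancelling many such handles is measurable work for the event loop.

import asyncio
import time

loop = asyncio.get_event_loop()

start = time.monotonic()
# Schedule and cancel 100k timers, roughly what a busy keep-alive server
# does over the lifetime of many requests.
handles = [loop.call_later(60.0, lambda: None) for _ in range(100000)]
for handle in handles:
    handle.cancel()
print('100k timers scheduled and cancelled in %.3f s'
      % (time.monotonic() - start))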

@ludovic-gasc (Contributor):

I've just benchmarked under the same conditions as the previous benchmark. It's really better (almost 2x), but not as big an improvement as you reported:

$ wrk -t8 -c256 -d1m http://192.168.2.100:8008/json
Running 1m test @ http://192.168.2.100:8008/json
  8 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    10.38ms    9.39ms 161.06ms   85.96%
    Req/Sec     3.22k   519.56     5.74k    75.93%
  1526538 requests in 1.00m, 289.71MB read
Requests/sec:  25443.75
Transfer/sec:      4.83MB

Maybe I've missed something; here are the changes I made:

@@ -39,8 +39,8 @@ class Container(api_hour.Container):
     def make_servers(self):
         return [self.servers['http'].make_handler(logger=self.worker.log,
                                                   debug=self.worker.cfg.debug,
-                                                  keep_alive=self.worker.cfg.keepalive,
-                                                  access_log=self.worker.log.access_log,
+                                                  keep_alive=0,
+                                                  access_log=None,
                                                   access_log_format=self.worker.cfg.access_log_format),
                 servers.yocto_http.YoctoHttpJson,
                 servers.yocto_http.YoctoHttpText]

pip freeze result:

-e [email protected]:Eyepea/aiohttp.git@1e5f7134a42a4b9321332a2eeea8aadfb3faf946#egg=aiohttp-master

Any clue how to track down the difference between your setup and mine?
Do you use two different computers for the client and server parts?

@fafhrd91 (Member, Author):

Here is my aiohttp script:

import asyncio
import json
from aiohttp import web

def index(req):
    return web.Response(
        body=json.dumps({'message': 'Hello, World!'}).encode('utf-8'))
    # the unreachable 'yield from ()' turns index into a generator, so
    # asyncio treats it as a coroutine; the response travels via the
    # return value
    yield from ()


@asyncio.coroutine
def init(loop):
    app = web.Application(loop=loop)
    app.router.add_route('GET', '/json', index)

    # keep_alive=0 disables aiohttp's keep-alive subsystem, which is the
    # point of this benchmark
    handler = app.make_handler(keep_alive=0)
    srv = yield from loop.create_server(handler, '0.0.0.0', 8080)
    print("Server started at http://127.0.0.1:8080")
    return srv, handler


loop = asyncio.get_event_loop()
srv, handler = loop.run_until_complete(init(loop))

def main():
    try:
        loop.run_forever()
    except KeyboardInterrupt:
        # drain open connections before exiting
        loop.run_until_complete(handler.finish_connections())

main()

I run it in one process. I use Pyramid for testing WSGI and explicitly set the number of gunicorn workers to 1.

@ludovic-gasc (Contributor):

Ok, it makes sense.
In fact, you are certainly using the sync worker in gunicorn; the async meinheld worker gives more performance.

Moreover, Pyramid is really slow compared to a raw WSGI implementation; you can check the preliminary round 10 results of the TechEmpower benchmarks (WARNING: this URL is normally semi-private, because they haven't finished validating all results for all frameworks. Don't share this URL). Nevertheless, for the JSON test with Python the results seem to be correct: http://www.techempower.com/benchmarks/previews/round10/#section=data-r10&hw=peak&test=json&l=1kw

It isn't always a valid shortcut, but at least here, if I apply a proportional rule between your results, my results, and the TechEmpower results, we get more or less the same ratio.

Thank you for these improvements; I'll integrate them into the TechEmpower benchmarks when the new release of aiohttp is available.

@fafhrd91 (Member, Author):

Ok, that makes sense.

@fafhrd91 (Member, Author):

I guess we are done with optimizations :)

@ludovic-gasc (Contributor):

@fafhrd91 Yes, indeed ;-)

@lock (bot) commented Oct 29, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

If you feel there are important points made in this discussion, please include those excerpts in that new issue.

lock bot added the outdated label Oct 29, 2019
lock bot locked as resolved and limited conversation to collaborators Oct 29, 2019