Slow HTTP read performance on AWS Lambda #709
Comments
The fast version does not run correctly because of a missing `requests` import. Even after fixing that, however, I still could not reproduce the problem.
Thanks for pointing out the typo; I'll fix it in my original post. I tested locally on Python 3.9 to rule that out:
I also double-checked on Lambda:
fast
slow
As you can see, the difference is negligible locally but huge on Lambda. I'm not sure what is going on there; possibly something to do with memory availability?
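One cheap way to test the memory hypothesis is to log what the function actually gets at runtime, then rerun the slow version at a few memory settings. Lambda allocates CPU in proportion to configured memory, so a low setting can throttle CPU-bound Python work too, not just RAM. A minimal sketch, assuming the reserved `AWS_LAMBDA_FUNCTION_MEMORY_SIZE` environment variable (the handler's `context.memory_limit_in_mb` reports the same value); the helper name `runtime_resources` is hypothetical:

```python
import os
from typing import Optional


def runtime_resources() -> dict:
    """Report the memory and CPU count visible to the current process.

    On Lambda, AWS_LAMBDA_FUNCTION_MEMORY_SIZE holds the configured memory in MB.
    Outside Lambda the variable is normally unset, so None is reported.
    """
    raw: Optional[str] = os.environ.get("AWS_LAMBDA_FUNCTION_MEMORY_SIZE")
    return {
        "memory_mb": int(raw) if raw is not None else None,
        # os.cpu_count() may itself return None if the count is undeterminable.
        "cpu_count": os.cpu_count(),
    }


if __name__ == "__main__":
    # Log once per invocation, then compare timings across memory settings.
    print(runtime_resources())
```

If the slow version speeds up sharply as memory (and therefore CPU share) increases, that points at CPU-bound work in the read path rather than the network.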
I don't have much experience with Lambda, so it's difficult for me to comment. It's odd that the slow version is still more than twice as slow locally, though... Are you able to investigate why there is such a difference? There is a small chance that this difference is what's causing the huge slowdown on Lambda. The way I would approach this is:
I can reproduce this with S3 and HTTPS requests as well. With Python requests:
~10 MB/s (I believe this is because I set the chunk size to 10 MB). With smart_open:
~0.093 MB/s. I could try chunking as above, but I wouldn't expect a slowdown of this order of magnitude.
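To check whether chunk size alone could plausibly explain a gap this large, it helps to count how many `read()` calls a copy loop makes for a given chunk size. This sketch wraps a binary stream in a hypothetical `CountingReader` and copies it with `shutil.copyfileobj`, whose `length` argument sets the chunk size. It only shows call counts; whether each call is expensive enough on Lambda to matter is still the open question.

```python
import io
import shutil
from typing import BinaryIO


class CountingReader:
    """Hypothetical wrapper that counts read() calls made on a binary stream."""

    def __init__(self, raw: BinaryIO) -> None:
        self._raw = raw
        self.calls = 0

    def read(self, size: int = -1) -> bytes:
        self.calls += 1
        return self._raw.read(size)


def count_reads(payload: bytes, chunk_size: int) -> int:
    """Copy payload in chunk_size pieces and return the number of read() calls."""
    src = CountingReader(io.BytesIO(payload))
    dst = io.BytesIO()
    shutil.copyfileobj(src, dst, length=chunk_size)
    if dst.getvalue() != payload:
        raise RuntimeError("copy was not lossless")
    return src.calls


if __name__ == "__main__":
    payload = b"x" * (10 * 1024 * 1024)  # 10 MiB of dummy data
    for size in (10 * 1024 * 1024, 64 * 1024, 1024):
        print(f"chunk={size:>9} bytes -> {count_reads(payload, size)} read() calls")
```

If the slow path turns out to issue many small reads (each potentially a separate trip into the HTTP layer), that would be consistent with the chunk-size explanation; if it doesn't, the cause is elsewhere.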
Are you able to profile the code to work out where the time-consuming part is? Since you're using smart_open for the upload in both cases, it seems the download is the slow part. If so, we can probably eliminate the upload component altogether and look for the problem in the download component. Also, make sure compression isn't causing the slowdown: by default, smart_open uses the file extension to handle compression transparently.
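A minimal way to follow that suggestion is to profile the download on its own, reading and discarding the data so the upload is out of the picture. The sketch uses the standard-library `cProfile` and `pstats` modules; `drain` and `profile_download` are hypothetical helpers. In the real test, `src` would be the smart_open HTTP reader, opened with compression disabled to rule that out as well (I believe recent smart_open releases accept `compression='disable'` in `smart_open.open`, while older ones used `ignore_ext=True`; check the docs for the installed version).

```python
import cProfile
import io
import pstats
from typing import BinaryIO


def drain(src: BinaryIO, chunk_size: int = 1024 * 1024) -> int:
    """Read src to the end, discarding the data; return the number of bytes read."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    return total


def profile_download(src: BinaryIO, chunk_size: int = 1024 * 1024, top: int = 15) -> int:
    """Profile drain() and print the functions with the highest cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        total = drain(src, chunk_size)
    finally:
        profiler.disable()
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(top)
    return total


if __name__ == "__main__":
    # In-memory stand-in; on Lambda, pass the smart_open HTTP reader instead.
    print(profile_download(io.BytesIO(b"\0" * (5 * 1024 * 1024))))
```

Sorting by cumulative time should make it obvious whether the time goes into socket reads, decompression, or per-chunk Python overhead.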
Problem description
I am trying to stream data from a website over HTTP directly into S3 with smart_open, running inside an AWS Lambda function. Testing has shown that the HTTP read with smart_open is about an order of magnitude slower than the same function using requests directly, so the examples focus on that to keep reproduction simple.
Tests on a local machine do not show the same discrepancy.
I may well be doing this wrong, as I couldn't find an example of how to do this, but I'm happy to contribute one if someone can put me right.
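For reference, here is a minimal sketch of one common way to stream HTTP into S3; it is not the reporter's code. `requests` handles the download with `stream=True`, smart_open handles the S3 upload, and the body is copied between them in large chunks. `copy_stream`, `http_to_s3`, the ~10 MB chunk size, and whatever URL or S3 URI you pass in are illustrative assumptions. The third-party imports sit inside `http_to_s3` so the copy logic can be tried locally without them.

```python
import io
import shutil
from typing import BinaryIO

CHUNK_SIZE = 10 * 1024 * 1024  # ~10 MB, matching the chunk size mentioned above


def copy_stream(src: BinaryIO, dst: BinaryIO, chunk_size: int = CHUNK_SIZE) -> None:
    """Copy src to dst in fixed-size chunks without loading it all into memory."""
    shutil.copyfileobj(src, dst, length=chunk_size)


def http_to_s3(url: str, s3_uri: str, chunk_size: int = CHUNK_SIZE) -> None:
    """Stream an HTTP response body straight into an S3 object."""
    import requests  # third-party; imported here so copy_stream works without it
    import smart_open

    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        # resp.raw returns the body as sent; ask urllib3 to undo gzip/deflate encoding.
        resp.raw.decode_content = True
        # smart_open infers compression from the key's extension (e.g. ".gz"),
        # so pick the key name accordingly.
        with smart_open.open(s3_uri, "wb") as fout:
            copy_stream(resp.raw, fout, chunk_size)


if __name__ == "__main__":
    # Local smoke test of the copy logic with in-memory streams.
    buf_in, buf_out = io.BytesIO(b"payload" * 1000), io.BytesIO()
    copy_stream(buf_in, buf_out, chunk_size=1024)
    print(len(buf_out.getvalue()))
```

Keeping the download in `requests` and using smart_open only for the write side mirrors the fast variant described in this thread, which sidesteps the slow HTTP read path while it is being investigated.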
Steps/code to reproduce the problem
Fast version: ~5 seconds
Slow version: ~170 seconds
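To make the ~5 s versus ~170 s comparison easy to repeat in different environments, a small harness can run each variant several times and report the median, which is less sensitive to one slow run (such as a Lambda cold start). `fast_version` and `slow_version` below are hypothetical placeholders for the two reproduction snippets; the harness itself only uses the standard library.

```python
import statistics
import time
from typing import Callable, Dict


def median_seconds(fn: Callable[[], object], repeats: int = 3) -> float:
    """Run fn `repeats` times and return the median wall-clock time in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def compare(variants: Dict[str, Callable[[], object]], repeats: int = 3) -> Dict[str, float]:
    """Time each named variant and print it relative to the fastest one."""
    results = {name: median_seconds(fn, repeats) for name, fn in variants.items()}
    fastest = min(results.values())
    for name, secs in sorted(results.items(), key=lambda kv: kv[1]):
        ratio = secs / fastest if fastest > 0 else float("inf")
        print(f"{name:>10}: {secs:8.3f} s  ({ratio:.1f}x)")
    return results


if __name__ == "__main__":
    # Placeholders: swap in the real fast/slow reproduction functions.
    def fast_version() -> None:
        time.sleep(0.01)

    def slow_version() -> None:
        time.sleep(0.05)

    compare({"fast": fast_version, "slow": slow_version})
```

With the timings reported above, the slow path is roughly 170 / 5 = 34 times slower on Lambda, so even a single repeat per variant should show the gap clearly.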
Versions