sync and cp hangs with a large amount of files #657
Comments
I get the exact same problem using Python 3.3.3 on Ubuntu 10.04 LTS on a bare-metal server when syncing files from S3 to local disk. A double Ctrl-C cancels the command, after which I can re-run the sync and get another batch of files. Sometimes the sync completes, sometimes not. aws-cli 1.2.13 |
Hello daveadams |
I'm going to try to repro this issue. I want to make sure I'm getting as similar an environment as possible. If I'm doing the math right, a 100GB total size across 80k files works out to about 1.3MB for the average file size? Is the file transfer |
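The average-size estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes the upper-bound figures from the report (100GB across 80k files), and the variable and function names are illustrative.

```python
# Average file size for the workload in the report (assumed: 100 GB over 80k files).
TOTAL_BYTES_DECIMAL = 100 * 10**9   # 100 GB in decimal units
TOTAL_BYTES_BINARY = 100 * 2**30    # 100 GiB in binary units
FILE_COUNT = 80_000


def average_size(total_bytes: int, file_count: int, unit: int) -> float:
    """Return the mean file size expressed in multiples of `unit` bytes."""
    return total_bytes / file_count / unit


avg_decimal = average_size(TOTAL_BYTES_DECIMAL, FILE_COUNT, 10**6)
avg_binary = average_size(TOTAL_BYTES_BINARY, FILE_COUNT, 2**20)
print(f"{avg_decimal:.2f} MB, {avg_binary:.2f} MiB")
```

Either way of counting gives roughly 1.3MB per file, so the workload is dominated by many small objects rather than a few large ones.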
Hi Jamesls, and thanks for the help. I'm currently generating the storage using this script, and I will try it on different types of vanilla instances to see if I can reproduce it on other m2.4xlarge instances. Please see the attached script
|
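The attached script is not reproduced in this thread. As a stand-in, here is a minimal sketch of how a comparable dataset of many small files could be generated locally before uploading it. It is not the reporter's script: the function name, directory layout, and default sizes are all assumptions made for illustration.

```python
import os
import tempfile
from pathlib import Path


def generate_test_files(root, file_count, file_size, files_per_dir=1000):
    """Create `file_count` files of `file_size` random bytes under `root`.

    Files are spread over subdirectories so that no single directory
    grows too large. Returns the number of files written.
    """
    root = Path(root)
    created = 0
    for index in range(file_count):
        subdir = root / f"dir{index // files_per_dir:05d}"
        subdir.mkdir(parents=True, exist_ok=True)
        (subdir / f"file{index:07d}.bin").write_bytes(os.urandom(file_size))
        created += 1
    return created


if __name__ == "__main__":
    # Small demo only; a repro at the scale discussed here would need
    # something like file_count=80_000 and file_size=1_300_000.
    with tempfile.TemporaryDirectory() as tmp:
        count = generate_test_files(tmp, file_count=10, file_size=1024)
        print(f"wrote {count} files under {tmp}")
```

Random content is used so that the files are not compressible, which keeps the transfer volume close to the nominal size.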
Thanks I'll give your script a shot and report back what I find. |
I can confirm the issue. Investigating why it's hanging. |
What instance type did you use? Which virtualisation? |
On an m2.4xlarge. I believe I have a fix for this here, but I want to run more tests before I'm certain. |
Everything works great after installing aws cli from github. Great work!!!! Thanks |
I still see this issue. I downloaded the aws build from the installer just a couple of days ago. Thoughts on what could be wrong? |
Hi @vishwasg1974 |
I am using awscli version 1.3.23. My transfer is hanging after it says finished 945 of 946. Any suggestions? |
Hi, aws cli got upgraded to 1.4.x. We transfer a couple of TB daily from/to s3 with the cli. |
Hello
Whenever I use aws s3 cp/sync, the process hangs after some time. There are no errors or warnings; it just hangs forever. Here are some remarks about the use case and observations:
1- s3 cp/sync is executed to process a large number of files (2K-80K files), with a total size of roughly 10GB-100GB
2- I'm using aws-cli/1.2.13 Python/2.6.6 Linux/2.6.32-279.1.1.el6.x86_64
3- The command is executed on an m2.4xlarge instance with CentOS 6.5
When the process starts, the IO gets checked: wait time is over 99%, and the load average reaches 44 after some time.
4 hours later, the resources go back to normal, with a load average of almost 0 and a wait time of almost 0%.
The other interesting remark is that it always stops just short of the end. For example, with a set of files, let us call it x, it will always hang at:
"Completed 2712 of 2714 part(s) with 1 file(s) remaining"
Now trying a different set of files y, it will always have the last line:
"Completed 39411 of 39417 part(s) with 3 file(s) remaining"
No matter how many times I repeat the test, it will always stop at one of the given lines (depending on the file dataset). This is a consistent failure with a large number of files.
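To compare runs, the final progress line can be parsed to see how many parts and files were still outstanding when the process stalled. The sketch below assumes the progress format is exactly the one quoted above; other aws-cli versions may print something different, and `parse_progress` is a hypothetical helper, not part of aws-cli.

```python
import re
from typing import NamedTuple, Optional


class Progress(NamedTuple):
    completed: int        # parts reported as completed
    total: int            # total parts in the transfer
    files_remaining: int  # files not yet finished


# Pattern taken from the progress lines quoted in this report.
_PROGRESS_RE = re.compile(
    r"Completed (\d+) of (\d+) part\(s\) with (\d+) file\(s\) remaining"
)


def parse_progress(line: str) -> Optional[Progress]:
    """Return the counters from a progress line, or None if it does not match."""
    match = _PROGRESS_RE.search(line)
    if match is None:
        return None
    return Progress(*(int(group) for group in match.groups()))


if __name__ == "__main__":
    last = parse_progress('"Completed 2712 of 2714 part(s) with 1 file(s) remaining"')
    if last is not None:
        print(f"{last.total - last.completed} part(s) outstanding, "
              f"{last.files_remaining} file(s) remaining")
```

For the two datasets above this gives 2 outstanding parts for set x and 6 for set y, which makes it easy to confirm that repeated runs stall at the same point.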
If I'm downloading just one huge file, let's say 100GB, then everything is fine.
Any ideas?
Thanks
Update1:
The bug does not appear when I use an instance of type c3.2xlarge with the exact same configuration to fire up the instance. The problem is still reproducible on m2.4xlarge.
The main difference between the two instances is that the m2.4xlarge is using a PARAVIRT AMI, while the c3.2xlarge is using an HVM AMI.