Listing contents of large s3 folders is slow #140
Comments
Fixed thanks to your PR :)
@ziprjoe @danielfrg First of all, many thanks for your precious work! 😄
Hey, this should just be a matter of catching the exception and ignoring it. In cases where there is no `.s3keep` file, there isn't a way to show the last update time, so a dummy date will be displayed.
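For illustration, a minimal sketch of that fallback; the helper name, the `.s3keep` lookup, and the placeholder date are assumptions, not the library's actual code:

```python
from datetime import datetime, timezone

# Hypothetical placeholder timestamp shown when no .s3keep marker exists.
DUMMY_DATE = datetime(1970, 1, 1, tzinfo=timezone.utc)

def last_modified(fs, dir_path):
    """Return the .s3keep timestamp for a directory, or a dummy date."""
    try:
        return fs.info(f"{dir_path}/.s3keep")["LastModified"]
    except FileNotFoundError:
        # No .s3keep placeholder: nothing to read a timestamp from,
        # so surface a dummy date instead of failing the listing.
        return DUMMY_DATE
```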
@ziprjoe @danielfrg Firstly, I'd like to express my gratitude for your excellent work on this library. It has been incredibly useful for my use case of connecting S3 with Jhub, compared to the alternatives. However, I've encountered an issue when using s3contents to connect to an S3 bucket with pre-existing directories: these directories aren't displayed in the UI unless I manually add a `.s3keep` file to each one. Once I do this, the issue is resolved. I'm wondering if you are aware of the cause of this problem, and if there's a way to use s3contents with a bucket that has pre-existing directories without having to manually add `.s3keep` files to each directory. Thank you for your time and attention!
Hi @ziprjoe. I think there are newer ways to handle directories in S3 that do not require the placeholder files. I have not tested them, and to be honest I am not using this lib anymore. I try to keep it updated, but since I am not using it, it is behind on needed features and I don't expect I will be able to add new ones in the near future. I basically just handle new releases from contributors at this point.
I handle that with a script called in a postStart lifecycle hook:

```bash
#!/bin/bash
file=$HOME/.dir.txt

# Save the S3 directory tree: list every object, strip the leading
# date/time/size columns, and keep one entry per directory.
aws s3 ls --recursive s3://<bucket> | cut -c32- | xargs -d '\n' -n 1 dirname | uniq > "$file"

# Upload an empty .s3keep placeholder into each directory.
touch .s3keep
while IFS= read -r folder; do
    aws s3 cp .s3keep "s3://<bucket>/$folder/.s3keep"
done < "$file"
```
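For comparison, a rough Python equivalent using s3fs; the bucket name is a placeholder, and like the shell version it only marks prefixes that directly contain at least one object:

```python
import s3fs

fs = s3fs.S3FileSystem()

# fs.find() returns full keys like "my-bucket/a/b/file.txt";
# collect the unique directory prefixes.
dirs = {key.rsplit("/", 1)[0] for key in fs.find("my-bucket")}

for d in sorted(dirs):
    # Create an empty .s3keep placeholder so the directory shows up.
    fs.touch(f"{d}/.s3keep")
```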
Hey,
Thanks for your work on this library; I've been using it for a while and it's really nice.
Recently I ran into some issues with long load times for large S3 folders. I believe this is the result of repeated synchronous calls to the abstract `lstat` method. I have done some testing and found that making these calls with asyncio, using the `s3fs._info` method instead, really speeds things up (roughly 20x faster on large folders).

I'm currently using a fork I made with these changes, and it works great. I opened a PR for you to consider: #139
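A minimal sketch of the async pattern described above (not the actual PR code; the function name and paths are placeholders):

```python
import asyncio
import s3fs

async def stat_many(paths):
    # Async s3fs instance; the underscore methods (_info, _ls, ...) are
    # the coroutine variants of their synchronous counterparts.
    fs = s3fs.S3FileSystem(asynchronous=True)
    session = await fs.set_session()
    try:
        # Fire all metadata lookups concurrently instead of one by one.
        return await asyncio.gather(*(fs._info(p) for p in paths))
    finally:
        await session.close()

# Example: fetch metadata for many keys in one event loop run.
# infos = asyncio.run(stat_many(["my-bucket/a.txt", "my-bucket/b.txt"]))
```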
I use this library quite a bit, and would be happy to put in the work to get this change merged.
Thanks again!
Joe