When retrieving data from S3, account for chunking #1085

Merged: 5 commits, May 15, 2023


horsto
Contributor

@horsto horsto commented May 12, 2023

There seems to be an issue with retrieving larger ("external") files from S3-compatible buckets. I documented the error in #1083 and also raised it as an issue in the minio-py repository: minio/minio-py#1280.

The problem is that the minio API can return a file in chunks, so .data cannot be relied on directly. Instead, iterating over .stream() and concatenating the resulting bytes seems to work in all cases. This behavior is already implemented correctly in fget():

for d in data.stream(1 << 16):
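To make the stream-and-concatenate idea concrete, here is a minimal sketch. It assumes only that the response object exposes a stream(amt) method yielding bytes, as in the fget() loop above. The names read_all and FakeResponse are hypothetical and exist only for illustration; FakeResponse stands in for the real response so the snippet runs without a bucket.

```python
import io


def read_all(response, chunk_size=1 << 16):
    """Concatenate every chunk yielded by response.stream()."""
    return b"".join(response.stream(chunk_size))


class FakeResponse:
    """Hypothetical stand-in for the object returned by get_object()."""

    def __init__(self, payload):
        self._buffer = io.BytesIO(payload)

    def stream(self, amt):
        # Yield the payload in pieces of at most `amt` bytes.
        while chunk := self._buffer.read(amt):
            yield chunk


payload = bytes(range(256)) * 1024  # 256 KiB, spans several 64 KiB chunks
assert read_all(FakeResponse(payload)) == payload
```

Joining the chunks once at the end avoids repeatedly copying a growing bytes object, which matters for the larger files this PR is about.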

@horsto
Contributor Author

horsto commented May 12, 2023

This PR replaces the use of .data with .stream() in the get() method for S3 (external) objects. This accommodates chunked retrieval of larger files, as opposed to assuming that the whole file is loaded at once.
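As a rough sketch of what such a get() could look like (not the actual DataJoint code), the helper below reads the body via stream() and releases the connection afterwards. The name get_bytes and the stub classes are hypothetical. The only assumptions about the client are a get_object(bucket, name) call whose result offers stream(), close(), and release_conn(), which is how the minio-py documentation describes its response object.

```python
import io


def get_bytes(client, bucket, name, chunk_size=1 << 16):
    """Fetch an object as bytes, reading the body chunk by chunk."""
    response = client.get_object(bucket, name)
    try:
        return b"".join(response.stream(chunk_size))
    finally:
        # Return the connection to the pool even if reading fails.
        response.close()
        response.release_conn()


class StubResponse:
    """Hypothetical response used only to exercise get_bytes()."""

    def __init__(self, payload):
        self._buffer = io.BytesIO(payload)
        self.closed = False
        self.released = False

    def stream(self, amt):
        while chunk := self._buffer.read(amt):
            yield chunk

    def close(self):
        self.closed = True

    def release_conn(self):
        self.released = True


class StubClient:
    """Hypothetical client holding objects in a dict keyed by (bucket, name)."""

    def __init__(self, objects):
        self._objects = objects
        self.last_response = None

    def get_object(self, bucket, name):
        self.last_response = StubResponse(self._objects[(bucket, name)])
        return self.last_response
```

With a real client, the same helper would be called as get_bytes(client, bucket, name); the try/finally ensures cleanup regardless of how many chunks were read.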

@dimitri-yatsenko
Member

@horsto Would you merge PR horsto#1?

pull from upstream and update changelog
@horsto
Contributor Author

horsto commented May 15, 2023

Done!

@dimitri-yatsenko dimitri-yatsenko merged commit 97e34a6 into datajoint:master May 15, 2023
3 participants