Connection broken: OSError(22, 'Invalid argument') when downloading larg(er) files #1280

Closed
horsto opened this issue May 11, 2023 · 5 comments

Comments


horsto commented May 11, 2023

Hi, I am using the minio Python library (7.1.13) to retrieve objects from a Linode (Akamai) object store (us-east-1.linodeobjects.com). This works fine for smaller files (I encounter no issues at all), but larger files run into OSErrors like:

File ~/miniconda3/envs/octo_code/lib/python3.10/site-packages/datajoint/s3.py:71, in Folder.get(self, name)
     69 logger.debug("get: {}:{}".format(self.bucket, name))
     70 try:
---> 71     return self.client.get_object(self.bucket, str(name)).data
     72 except minio.error.S3Error as e:
     73     if e.code == "NoSuchKey":

File ~/miniconda3/envs/octo_code/lib/python3.10/site-packages/urllib3/response.py:306, in HTTPResponse.data(self)
    303     return self._body
    305 if self._fp:
--> 306     return self.read(cache_content=True)

File ~/miniconda3/envs/octo_code/lib/python3.10/site-packages/urllib3/response.py:566, in HTTPResponse.read(self, amt, decode_content, cache_content)
    563 flush_decoder = False
    564 fp_closed = getattr(self._fp, "closed", False)
--> 566 with self._error_catcher():
    567     data = self._fp_read(amt) if not fp_closed else b""
    568     if amt is None:

File ~/miniconda3/envs/octo_code/lib/python3.10/contextlib.py:153, in _GeneratorContextManager.__exit__(self, typ, value, traceback)
    151     value = typ()
    152 try:
--> 153     self.gen.throw(typ, value, traceback)
    154 except StopIteration as exc:
    155     # Suppress StopIteration *unless* it's the same exception that
    156     # was passed to throw().  This prevents a StopIteration
    157     # raised inside the "with" statement from being suppressed.
    158     return exc is not value

File ~/miniconda3/envs/octo_code/lib/python3.10/site-packages/urllib3/response.py:461, in HTTPResponse._error_catcher(self)
    457     raise ReadTimeoutError(self._pool, None, "Read timed out.")
    459 except (HTTPException, SocketError) as e:
    460     # This includes IncompleteRead.
--> 461     raise ProtocolError("Connection broken: %r" % e, e)
    463 # If no exception is thrown, we should avoid cleaning up
    464 # unnecessarily.
    465 clean_exit = True

ProtocolError: ("Connection broken: OSError(22, 'Invalid argument')", OSError(22, 'Invalid argument'))

This error occurs immediately when trying to access or download the file (no delay), and it seems to be limited to larger files. I do not know where the size cutoff is on my side.
The client connection is established via another library (https://github.com/datajoint/datajoint-python/blob/master/datajoint/s3.py)
Any ideas?

@balamurugana
Member

Line 71 of your traceback, return self.client.get_object(self.bucket, str(name)).data, cannot be done this way.

get_object() returns a urllib3 response. You need to read the data from it in chunks, iterating until EOF.


horsto commented May 12, 2023

Thanks, is there a code example for that?

@balamurugana
Member

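# Get data of an object (client is an existing minio.Minio instance).
response = None
try:
    response = client.get_object("my-bucket", "my-object")
    # Read data from the response in 16 KiB chunks until EOF.
    while True:
        data = response.read(16 * 1024)
        if not data:
            break
        print(data)
finally:
    # Clean up only if get_object() actually returned a response.
    if response is not None:
        response.close()
        response.release_conn()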


horsto commented May 12, 2023

Thanks!
I tried to replace self.client.get_object(self.bucket, str(name)).data with

with self.client.get_object(self.bucket, str(name)) as result:
    data = [d for d in result.stream()]
return b"".join(data)

(This does return the correct data for small and large files ...)

Is the .stream way of doing things here acceptable?

@balamurugana
Member

As get_object() returns a urllib3 HTTPResponse, it is up to the user how to use it. However, with respect to your code snippet, you are loading the entire object into memory. If the object is too large, your program will crash due to OOM.
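
One way to avoid holding the whole object in memory is to write each chunk to disk as it arrives. The following is only a minimal sketch, not code from minio or datajoint: the helper name save_response_to_file and the 1 MiB default chunk size are assumptions made for illustration. It relies only on the stream(), close() and release_conn() methods of the urllib3 response that get_object() returns.

```python
def save_response_to_file(response, path, chunk_size=1024 * 1024):
    """Copy a urllib3-style response to `path` chunk by chunk.

    Hypothetical helper for illustration. Returns the number of bytes written.
    Only one chunk (at most `chunk_size` bytes) is held in memory at a time.
    """
    written = 0
    try:
        with open(path, "wb") as fh:
            # stream() yields successive chunks until the body is exhausted.
            for chunk in response.stream(chunk_size):
                fh.write(chunk)
                written += len(chunk)
    finally:
        # Always close the response and return the connection to the pool.
        response.close()
        response.release_conn()
    return written
```

It could be called as save_response_to_file(client.get_object("my-bucket", "my-object"), "my-object.bin"), where the bucket, object and file names are placeholders. If the goal is simply a local copy of the object, the client's fget_object() method may be the simpler option.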
