S3 Improvements #10780

Open
mmattel opened this issue Dec 17, 2024 · 1 comment
@mmattel
Contributor

mmattel commented Dec 17, 2024

References: #10775 (S3 storage truncated to 250000MB per object without error)

We have some issues to solve:

  • envvar STORAGE_USERS_S3NG_PUT_OBJECT_PART_SIZE
    The envvar text needs an update (I will take care of this once the v7 branches have been published.)
    Note that I am already working on an S3 description update in the admin docs to reflect the referenced issue.

    • We only print a default part size of 0, which is wrong because the underlying library uses 16MB.
    • It is unclear whether the setting accepts human-readable values or only a plain number of bytes
      (support for human-readable values would be great).
  • We need to check what happens on the ocis side when the part quantity limit is exceeded during an upload. Does it get any kind of error? Can we catch it, and how should we treat it?

    • S3 treats the 10,000th part as the last one and assembles the final file.
    • Does ocis continue to send parts? How do we handle that? S3 seems to have no error case for "no more parts allowed"; see Error Code: NoSuchUpload.
    • We should consider a test where we define a small part size (5MB is the smallest) and do an S3 upload that reaches the parts quantity limit of 10,000 (Amazon S3 multipart upload limits).
  • Suggest adding two new envvars (instead of hardcoding the values):

    • STORAGE_USERS_S3NG_PARTS_MAX_QUANTITY --> defaults to 10000
      (vendor specific and subject to change)
    • STORAGE_USERS_S3NG_MAX_OBJECT_SIZE --> defaults to 5TiB
      (vendor specific and subject to change)

    With that, we can pre-calculate whether an S3 upload will fail and act accordingly before even trying it (see the Go sketch right below this list):
    s3_upload_will_fail =
        (STORAGE_USERS_S3NG_PARTS_MAX_QUANTITY * STORAGE_USERS_S3NG_PUT_OBJECT_PART_SIZE / FileSize < 1)
        or (FileSize > STORAGE_USERS_S3NG_MAX_OBJECT_SIZE)
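
A minimal Go sketch of that pre-check. The two proposed envvars do not exist yet, all values are assumed to be plain numbers (bytes/parts), and the 16MB fallback mirrors the library default mentioned above:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envUint64 reads an envvar as a plain number, falling back to a default
// when the variable is unset or not parseable.
func envUint64(name string, fallback uint64) uint64 {
	if v, err := strconv.ParseUint(os.Getenv(name), 10, 64); err == nil {
		return v
	}
	return fallback
}

// s3UploadWillFail is the pre-calculation above, written without the division:
// maxParts*partSize/fileSize < 1 is the same as fileSize > maxParts*partSize.
func s3UploadWillFail(fileSize uint64) bool {
	maxParts := envUint64("STORAGE_USERS_S3NG_PARTS_MAX_QUANTITY", 10000)          // proposed, not yet existing
	partSize := envUint64("STORAGE_USERS_S3NG_PUT_OBJECT_PART_SIZE", 16*1024*1024) // library default 16MB
	maxObject := envUint64("STORAGE_USERS_S3NG_MAX_OBJECT_SIZE", 5<<40)            // proposed, 5 TiB
	return fileSize > maxParts*partSize || fileSize > maxObject
}

func main() {
	// 200 GiB exceeds 10000 parts * 16MB (~156 GiB), so this upload would fail.
	fmt.Println(s3UploadWillFail(200 * 1024 * 1024 * 1024))
}
```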

@kobergj @individual-it
@jvillafanez @micbar

@jvillafanez
Member

envvar STORAGE_USERS_S3NG_PUT_OBJECT_PART_SIZE
The envvar text needs an update (I will take care of this once the v7 branches have been published.)
Note that I am already working on an S3 description update in the admin docs to reflect the referenced issue.
We only print a default part size of 0, which is wrong because the underlying library uses 16MB.
It is unclear whether the setting accepts human-readable values or only a plain number of bytes
(support for human-readable values would be great).

If the part size doesn't have a value (or it's 0), the library will use 16MB as part size. This is the library's decision.
We can set our own default value of 25MB (for example) and forward that to the library.

As far as I know, the value is set as the exact number of bytes, so for 25MB, "26214400" bytes must be set. The value is forwarded to the library without any conversion.

I think this is mostly a product decision. We can document something like "if no value is set or it is 0, we'll use the library's default value (16MB)". We can also set our own default value, as said above. As for the size conversion, the value is currently forwarded as-is, but we could convert human-readable values to bytes if needed.
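
To illustrate the conversion option, here is a minimal sketch with a hypothetical helper (not part of oCIS) that accepts either a plain byte count or a human-readable value. The resulting number of bytes would then be forwarded to the library as today, with 0 still meaning "use the library's 16MB default":

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSize is a hypothetical helper: it accepts a plain byte count ("26214400")
// or a human-readable value ("25MB", interpreted as a binary multiple) and
// returns the number of bytes to forward to the library.
func parseSize(s string) (uint64, error) {
	units := map[string]uint64{"KB": 1 << 10, "MB": 1 << 20, "GB": 1 << 30}
	s = strings.TrimSpace(strings.ToUpper(s))
	for suffix, factor := range units {
		if strings.HasSuffix(s, suffix) {
			n, err := strconv.ParseUint(strings.TrimSuffix(s, suffix), 10, 64)
			if err != nil {
				return 0, err
			}
			return n * factor, nil
		}
	}
	return strconv.ParseUint(s, 10, 64) // plain byte count, including 0
}

func main() {
	// "25MB" yields 26214400, matching the byte value mentioned above;
	// "0" stays 0 and would let the library fall back to its 16MB default.
	for _, v := range []string{"26214400", "25MB", "0"} {
		n, err := parseSize(v)
		fmt.Println(v, "->", n, err)
	}
}
```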

We need to check what happens on the ocis side when the part quantity limit is exceeded during an upload. Does it get any kind of error? Can we catch it, and how should we treat it?
S3 treats the 10,000th part as the last one and assembles the final file.
Does ocis continue to send parts? How do we handle that? S3 seems to have no error case for "no more parts allowed"; see Error Code: NoSuchUpload.
We should consider a test where we define a small part size (5MB is the smallest) and do an S3 upload that reaches the parts quantity limit of 10,000 (Amazon S3 multipart upload limits).

This is fully handled by the library: oCIS basically sends the file with some options (concurrent uploads, threads, part size, ...) and the library does its magic.
For this particular scenario, the library stops sending parts after the 10,000th and doesn't return any error. There could be other scenarios (there is a config setting to disable concurrent uploads, for example) in which the library returns an error under similar conditions (not tested at the moment).

From our side, I think the best option is to document that there is an (additional?) upload limit based on the configured part size. We'd also need to ensure that, if the upload is bigger than the limit, the request fails as soon as possible, preferably before we start reading and transferring the file.

Suggest adding two new envvars (instead of hardcoding the values):

STORAGE_USERS_S3NG_PARTS_MAX_QUANTITY --> defaults to 10000
(vendor specific and subject to change)
STORAGE_USERS_S3NG_MAX_OBJECT_SIZE --> defaults to 5TiB
(vendor specific and subject to change)

I don't think the envvars are a good idea. I don't see a good use case for reducing the maximum other than testing, and increasing the maximum would be useless because the limitation is on the server (S3) side. We can use constants in order to perform some calculations instead (if we want to reject files that are too big, for example).
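
A minimal sketch of that idea, combining the constants with the "fail as early as possible" point above. It assumes the total upload size is announced up front (e.g. via Content-Length) and uses hypothetical names throughout; it is not how oCIS currently wires its upload handlers:

```go
package main

import (
	"fmt"
	"net/http"
)

// Vendor-specific S3 limits kept as constants rather than envvars, as
// suggested above; the values follow the AWS multipart upload limits.
const (
	s3MaxParts      uint64 = 10000
	s3MaxObjectSize uint64 = 5 << 40 // 5 TiB
)

// rejectOversizedUpload is a hypothetical middleware that fails the request
// before any body data is read, based solely on the announced size.
func rejectOversizedUpload(partSize uint64, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.ContentLength > 0 {
			size := uint64(r.ContentLength)
			if size > s3MaxParts*partSize || size > s3MaxObjectSize {
				http.Error(w, "upload exceeds the S3 multipart limits", http.StatusRequestEntityTooLarge)
				return
			}
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	upload := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "upload accepted")
	})
	// With the 16MB default part size the effective cap is roughly 156 GiB.
	http.Handle("/upload", rejectOversizedUpload(16<<20, upload))
	_ = http.ListenAndServe(":8080", nil)
}
```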

We should have a list of "officially supported" S3 servers, which will likely have those limits. If we want to support an S3 server that doesn't have those limits, we should either document that our client will still follow the regular limits for the new server, or reduce the limits globally in order to accommodate the new server. Another option could be to detect the server and apply the corresponding limit, but it might be complex.
