Use Multipart upload API to upload files larger than 5 GB #95

Closed
3 tasks done
ptirador opened this issue Oct 13, 2020 · 2 comments · Fixed by #129
Labels
hacktoberfest Pre-selected issues for Hacktoberfest help wanted Extra attention is needed
Milestone

Comments

@ptirador
Contributor

ptirador commented Oct 13, 2020

Task Description

The multipart upload API is designed to improve the upload experience for larger objects. It lets you upload an object in parts, and these parts can be uploaded independently, in any order, and in parallel. You can use a multipart upload for objects from 5 MB to 5 TB in size. A single PUT request is limited to 5 GB, so files larger than that have to go through this API.
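To make the size limits concrete, here is a minimal sketch of how a file could be split into part ranges. It assumes the S3 limits of a 5 MiB minimum part size (except for the last part) and at most 10,000 parts per upload. `PartPlanSketch`, `PartRange`, `choosePartSize` and `plan` are hypothetical names for illustration only, not part of the AWS SDK or of this project.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits an object of known size into part ranges. */
final class PartPlanSketch {

    // Assumed S3 limits (not stated in this issue): 5 MiB minimum part size
    // for every part except the last, and at most 10,000 parts per upload.
    static final long MIN_PART_SIZE = 5L * 1024 * 1024;
    static final int MAX_PARTS = 10_000;

    /** One part: 1-based part number, start offset in the object, length in bytes. */
    static final class PartRange {
        final int partNumber;
        final long offset;
        final long length;

        PartRange(int partNumber, long offset, long length) {
            this.partNumber = partNumber;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Raises the preferred part size until both assumed limits are satisfied. */
    static long choosePartSize(long objectSize, long preferredPartSize) {
        long partSize = Math.max(preferredPartSize, MIN_PART_SIZE);
        // Smallest part size that keeps the part count within MAX_PARTS (ceiling division).
        long minForPartCount = (objectSize + MAX_PARTS - 1) / MAX_PARTS;
        return Math.max(partSize, minForPartCount);
    }

    /** Returns consecutive, non-overlapping ranges that cover the whole object. */
    static List<PartRange> plan(long objectSize, long preferredPartSize) {
        if (objectSize <= 0) {
            throw new IllegalArgumentException("objectSize must be positive");
        }
        long partSize = choosePartSize(objectSize, preferredPartSize);
        List<PartRange> parts = new ArrayList<>();
        int partNumber = 1;
        for (long offset = 0; offset < objectSize; offset += partSize) {
            long length = Math.min(partSize, objectSize - offset);
            parts.add(new PartRange(partNumber++, offset, length));
        }
        return parts;
    }
}
```

With these assumptions, a 6 GiB file and a preferred part size of 8 MiB would be planned as 768 parts, each of which could be sent as a separate part upload.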

Tasks

The following tasks will need to be carried out:

  • First, create a multipart upload with CreateMultipartUploadRequest and get the upload ID.
  • Upload each part of the object with UploadPartRequest, recording a CompletedPart for every part uploaded.
  • Finally, call the completeMultipartUpload operation, passing the CompletedMultipartUpload, to tell S3 to merge all uploaded parts and finish the multipart operation.
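The three steps above can be sketched with a small in-memory model. This is deliberately not AWS SDK code: `MultipartFlowSketch` and its methods are hypothetical stand-ins that mirror the create / upload part / complete sequence, and the tag returned by `uploadPart` only imitates the ETag that a real part upload would return. The point it illustrates is that parts may arrive in any order and are merged by part number on completion.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

/** Hypothetical in-memory stand-in for the three multipart calls; not the AWS SDK. */
final class MultipartFlowSketch {

    // Parts received so far, keyed by upload id and then by 1-based part number.
    private final Map<String, Map<Integer, byte[]>> pending = new HashMap<>();

    /** Step 1: start an upload and return its upload id. */
    String createMultipartUpload() {
        String uploadId = UUID.randomUUID().toString();
        pending.put(uploadId, new HashMap<>());
        return uploadId;
    }

    /** Step 2: store one part and return a tag (stand-in for the ETag). */
    String uploadPart(String uploadId, int partNumber, byte[] data) {
        Map<Integer, byte[]> parts = pending.get(uploadId);
        if (parts == null) {
            throw new IllegalStateException("unknown upload id");
        }
        parts.put(partNumber, data.clone());
        return tagOf(data);
    }

    /** Step 3: verify the listed parts and merge them in ascending part-number order. */
    byte[] completeMultipartUpload(String uploadId, Map<Integer, String> completedParts) {
        Map<Integer, byte[]> parts = pending.remove(uploadId);
        if (parts == null) {
            throw new IllegalStateException("unknown upload id");
        }
        ByteArrayOutputStream merged = new ByteArrayOutputStream();
        for (Map.Entry<Integer, String> entry : new TreeMap<>(completedParts).entrySet()) {
            byte[] data = parts.get(entry.getKey());
            if (data == null || !tagOf(data).equals(entry.getValue())) {
                throw new IllegalStateException("missing or mismatched part " + entry.getKey());
            }
            merged.write(data, 0, data.length);
        }
        return merged.toByteArray();
    }

    private static String tagOf(byte[] data) {
        return Integer.toHexString(Arrays.hashCode(data));
    }

    /** Uploads three parts out of order and returns the merged result as text. */
    static String demo() {
        MultipartFlowSketch store = new MultipartFlowSketch();
        String uploadId = store.createMultipartUpload();
        Map<Integer, String> completed = new HashMap<>();
        completed.put(3, store.uploadPart(uploadId, 3, "world".getBytes(StandardCharsets.UTF_8)));
        completed.put(1, store.uploadPart(uploadId, 1, "hello ".getBytes(StandardCharsets.UTF_8)));
        completed.put(2, store.uploadPart(uploadId, 2, "multipart ".getBytes(StandardCharsets.UTF_8)));
        return new String(store.completeMultipartUpload(uploadId, completed), StandardCharsets.UTF_8);
    }
}
```

In a real implementation the three methods would be replaced by the corresponding S3 client calls, and the map of part numbers to tags would become the list of CompletedPart entries.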

Task Relationships

This task:

Useful Links

Help

@carlspring carlspring added hacktoberfest Pre-selected issues for Hacktoberfest help wanted Extra attention is needed labels Oct 13, 2020
@ptirador ptirador changed the title Use Multipart upload to upload files larger than 5 GB. Use Multipart upload API to upload files larger than 5 GB. Oct 30, 2020
@carlspring carlspring changed the title Use Multipart upload API to upload files larger than 5 GB. Use Multipart upload API to upload files larger than 5 GB Nov 23, 2020
@ptirador
Contributor Author

ptirador commented Dec 5, 2020

There are two kinds of API for performing this multipart operation: the high-level API and the low-level API.
We should use the low-level API when we need to pause and resume multipart uploads, vary part sizes during the upload, or do not know the size of the upload data in advance. When we don't have these requirements, we should use the high-level API.

If we upload files to the S3 bucket using TransferManager (the high-level API), which is easier to integrate, we can upload either an InputStream or a File, and we can enable multipart upload by configuring MultipartUploadThreshold on the TransferManager client. However, this client requires us to provide the content length of the file in the PutObjectRequest before uploading it to the S3 bucket. When dealing with an InputStream of a large file, we might not get the content-length in the HTTP response, so in that case we cannot use TransferManager (the high-level API).
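As a rough illustration of why the low-level approach copes with an unknown content length, the sketch below cuts an InputStream into fixed-size parts without ever asking for the total size. `StreamPartReaderSketch`, `readPart`, `partSizes` and `demoSizes` are hypothetical names, and the tiny part size in the demo is only for readability; a real uploader would use a part size of at least 5 MB and send each buffer as the next numbered part.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: cuts a stream of unknown length into fixed-size parts. */
final class StreamPartReaderSketch {

    /**
     * Reads up to partSize bytes. Returns a shorter array for the final part,
     * or null once the stream is exhausted.
     */
    static byte[] readPart(InputStream in, int partSize) throws IOException {
        if (partSize <= 0) {
            throw new IllegalArgumentException("partSize must be positive");
        }
        byte[] buffer = new byte[partSize];
        int filled = 0;
        while (filled < partSize) {
            int n = in.read(buffer, filled, partSize - filled);
            if (n < 0) {
                break; // end of stream
            }
            filled += n;
        }
        if (filled == 0) {
            return null;
        }
        return filled == partSize ? buffer : Arrays.copyOf(buffer, filled);
    }

    /** Consumes the stream part by part and records the size of each part. */
    static List<Integer> partSizes(InputStream in, int partSize) throws IOException {
        List<Integer> sizes = new ArrayList<>();
        byte[] part;
        while ((part = readPart(in, partSize)) != null) {
            // A real uploader would send `part` as the next numbered part here.
            sizes.add(part.length);
        }
        return sizes;
    }

    /** Demo over an in-memory stream; the total size is never passed to partSizes. */
    static List<Integer> demoSizes(int totalBytes, int partSize) {
        try (InputStream in = new ByteArrayInputStream(new byte[totalBytes])) {
            return partSizes(in, partSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Only one part is held in memory at a time, which is the property that matters when the stream is large and its length is not known up front.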

So, in summary, I think the low-level API is more flexible and manageable. It may not be as easy to integrate, but in the long run it is worth it.

cc @carlspring

@carlspring
Owner

carlspring commented Dec 5, 2020

Hi,

Thanks for having a preliminary look, @ptirador! I don't know if you've seen the existing pull requests in the Upplication upstream. Is there anything there we could use as inspiration?

@elerch , @markjschreiber,

Do you guys have any suggestions and recommendations?

2 participants