Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

documentation: concurrent writes for non-S3 backends #2556

Closed
inigohidalgo opened this issue May 31, 2024 · 9 comments · Fixed by #2558
Closed

documentation: concurrent writes for non-S3 backends #2556

inigohidalgo opened this issue May 31, 2024 · 9 comments · Fixed by #2558
Labels
enhancement New feature or request

Comments

@inigohidalgo
Copy link
Contributor

Currently the documentation has a section that details a locking mechanism is needed for S3 to enable concurrent writing.

https://delta-io.github.io/delta-rs/usage/writing/writing-to-s3-with-locking-provider/

There are various mentions of concurrency throughout the docs, but having read through the docs a while ago I was left with the impression that "concurrent writing is only supported on S3, and you need a DynamoDB locking mechanism" when, in reality, concurrent writing is supported by default on (at least one) backends #2069 without needing that locking provider.

This is probably just my own lack of understanding of the delta protocol, but I think it would be good to make this clearer in the documentation, that concurrency is supported by default, and only S3 needs the locking mechanism.

@inigohidalgo inigohidalgo added the enhancement New feature or request label May 31, 2024
@ion-elgreco
Copy link
Collaborator

@inigohidalgo feel free to open a PR to make a change to the docs, contributions are always welcome! :)

@inigohidalgo
Copy link
Contributor Author

Sounds good. Is it safe to assume that all the backends listed here other than AWS support this by default?

I will probably just reword the write_deltalake docstring as that is what initially tripped me up, and add a small note at the start of this page https://delta-io.github.io/delta-rs/usage/writing/writing-to-s3-with-locking-provider/

@wjones127
Copy link
Collaborator

Is it safe to assume that all the backends listed here other than AWS support this by default?

Yup. Though IIRC, Minio, Cloudflare, and other S3-compatible stores will have the same issue, even though we actually could enable it for some of them.

@ion-elgreco
Copy link
Collaborator

@wjones127 Cloudflare R2 actually supports copy if not exists with custom headers, which we are able to pass through. But that's the only exception for an S3 implementation afaik

@wjones127
Copy link
Collaborator

wjones127 commented May 31, 2024

Cloudflare R2 actually supports copy if not exists with custom headers, which we are able to pass through

Ah cool. I know R2 and Minio support using custom headers, but didn't know we had already implemented the proper pass through for R2. Do we have support for Minio as well then?

@ion-elgreco
Copy link
Collaborator

Cloudflare R2 actually supports copy if not exists with custom headers, which we are able to pass through

Ah cool. I know R2 and Minio support using custom headers, but didn't know we had already implemented the proper pass through for R2. Do we have support for Minio as well then?

Hmm I didn't know minio supported custom headers, then I guess it should work as well (but can't confirm)

ion-elgreco pushed a commit that referenced this issue Jun 1, 2024
- It was unclear to me that concurrent writing was available by default
for non-S3 backends, so I am making the language clearer.
- I have also added an extra section showing that R2 and maybe MinIO can
enable concurrent writing
- Fixed a couple of unrelated formatting issues in the page I edited

closes #2556 

#2069 also had the same confusion
@rkunnamp
Copy link

rkunnamp commented Jul 6, 2024

Cloudflare R2 actually supports copy if not exists with custom headers, which we are able to pass through

Ah cool. I know R2 and Minio support using custom headers, but didn't know we had already implemented the proper pass through for R2. Do we have support for Minio as well then?

@wjones127 Do you know if there is any header that could be used for minio, for something similar to cf-copy-destination-if-none-match of R2 ?

for reference : the full list of headers supported by minio is here https://github.com/minio/minio/blob/cf371da346199ba39cbbdfbce37d7cfa0b594b85/internal/http/headers.go#L87

@ion-elgreco
Copy link
Collaborator

Cloudflare R2 actually supports copy if not exists with custom headers, which we are able to pass through

Ah cool. I know R2 and Minio support using custom headers, but didn't know we had already implemented the proper pass through for R2. Do we have support for Minio as well then?

@wjones127 Do you know if there is any header that could be used for minio, for something similar to cf-copy-destination-if-none-match of R2 ?

for reference : the full list of headers supported by minio is here https://github.com/minio/minio/blob/cf371da346199ba39cbbdfbce37d7cfa0b594b85/internal/http/headers.go#L87

    AmzCopySourceIfNoneMatch = "x-amz-copy-source-if-none-match" 

This one looks to me what you are looking for

@rkunnamp
Copy link

rkunnamp commented Jul 6, 2024

@ion-elgreco 'x-amz-copy-source-if-none-match' is supported by amazon s3 as well . If that could be used , then probably we do not need a locking provider for amazon s3 as well. So I am assuming that 'x-amz-copy-source-if-none-match' could not be used for getting around the locking requirement

as for R2 , R2 doc (https://developers.cloudflare.com/r2/api/s3/extensions/) reads

CopyObject already supports conditions that relate to the source object through the x-amz-copy-source-if-... headers as part of our compliance with the S3 API. In addition to this, R2 supports an R2 specific set of headers that allow the CopyObject operation to be conditional on the target object:cf-copy-destination-if-match
cf-copy-destination-if-none-match
cf-copy-destination-if-modified-since
cf-copy-destination-if-unmodified-since

Looking at minio code (https://github.com/minio/minio/blob/cf371da346199ba39cbbdfbce37d7cfa0b594b85/cmd/object-handlers-common.go), the copy object handler seems to be supporting only the headers supported by s3

However put object handler (whose code is in the same file linked above) seems to be supporting 'If-None-Match' . I was wondering if that can be used to get around the locking requirements of minio

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants