Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

basic implementation #2

Merged
merged 1 commit into from
Jan 7, 2022
Merged

basic implementation #2

merged 1 commit into from
Jan 7, 2022

Conversation

seddonm1
Copy link
Collaborator

@seddonm1 seddonm1 commented Jan 5, 2022

ObjectStore implementation for the Amazon S3 API

this takes the feedback from the gist: https://gist.github.com/seddonm1/2fb5a6892989fe7bf246022a7bd586ee and adds tests against the standard test suite.

CI could be set up this way.

Comment on lines +172 to +174
self.api_call_attempt_timeout_seconds,
self.access_key.clone(),
self.secret_key.clone(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered passing in the already provisioned client here instead of recreating a new client in sync_chunk_reader?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that I was getting a deadlock if I did that as I think the AWS SDK (which is async) is bound to the runtime it is created in. Because we have to wrap the async functions with a new runtime to make them execute in the sync_chunk_reader they need a new client created each time.

Once chunk_reader is implemented (async) we can simplify the code significantly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha got it, it's unfortunate that they don't provide sync version and only supports tokio. But looks like upstream is working on abstracting out the runtime bit. Could you add a comment in the code to mention the reason for this workaround? I think this is something that needs to be refactored in the future.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's good enough, thank you 👍

@houqp houqp requested review from matthewmturner and alamb January 6, 2022 00:52
@matthewmturner
Copy link
Collaborator

Good for me. I was able to test locally and run sql queries.

This gives us great baseline for now adding CI, which ill work on, and creating new issues.

@matthewmturner
Copy link
Collaborator

@alamb do you think youll have the chance to check this out today? I was hoping to just use this as a baseline for subsequent improvements which could be discussed in issues / other PRs.

@alamb
Copy link

alamb commented Jan 6, 2022

@alamb do you think youll have the chance to check this out today? I was hoping to just use this as a baseline for subsequent improvements which could be discussed in issues / other PRs.

@matthewmturner I wasn't really planning on reviewing this one (was focusing on getting arrow 7.0.0 ready to go and other stuff for IOx). Do you want me to review it (or are you waiting on someone to merge the PR)?

If you plan to work heavily in this repo, perhaps @houqp would be cool with giving you owner rights (we brought it outside the arrow / apache landscape to lower the bar for getting people access I think)

@matthewmturner
Copy link
Collaborator

@alamb was just pinging you as @houqp had requested review from you. I'm definitely happy to run with this I've already merged a couple commits.

@houqp anything in particular you wanted to be reviewed?

@alamb
Copy link

alamb commented Jan 6, 2022

I'll try and check it out over the next few days -- but please don't wait for me to merge 🚀 !

@matthewmturner
Copy link
Collaborator

I'll try and check it out over the next few days -- but please don't wait for me to merge 🚀 !

thank you @alamb 🙏

@houqp
Copy link
Member

houqp commented Jan 7, 2022

I have already given @matthewmturner and @seddonm1 owner access to this repo. Also tagged Andrew just in case he has any blocker comments. I think the repo right now is in a highly experimental state, so we can merge suboptimal solutions and iterate from there.

@matthewmturner matthewmturner merged commit 39caa37 into datafusion-contrib:main Jan 7, 2022
let bucket = self.bucket.clone();
let key = self.file.path.clone();

// once the async chunk file readers have been implemented this complexity can be removed
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't wait for async chunk file readers 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants