-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
basic implementation #2
Conversation
self.api_call_attempt_timeout_seconds, | ||
self.access_key.clone(), | ||
self.secret_key.clone(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered passing in the already provisioned client here instead of recreating a new client in sync_chunk_reader
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that I was getting a deadlock if I did that as I think the AWS SDK (which is async) is bound to the runtime it is created in. Because we have to wrap the async
functions with a new runtime to make them execute in the sync_chunk_reader
they need a new client created each time.
Once chunk_reader
is implemented (async
) we can simplify the code significantly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha got it, it's unfortunate that they don't provide sync version and only supports tokio. But looks like upstream is working on abstracting out the runtime bit. Could you add a comment in the code to mention the reason for this workaround? I think this is something that needs to be refactored in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have this comment https://github.com/datafusion-contrib/datafusion-objectstore-s3/pull/2/files#diff-b1a35a68f14e696205874893c07fd24fdb88882b47c23cc0e0c80a30c7d53759R242 but can add more if needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's good enough, thank you 👍
Good for me. I was able to test locally and run sql queries. This gives us great baseline for now adding CI, which ill work on, and creating new issues. |
@alamb do you think youll have the chance to check this out today? I was hoping to just use this as a baseline for subsequent improvements which could be discussed in issues / other PRs. |
@matthewmturner I wasn't really planning on reviewing this one (was focusing on getting arrow 7.0.0 ready to go and other stuff for IOx). Do you want me to review it (or are you waiting on someone to merge the PR)? If you plan to work heavily in this repo, perhaps @houqp would be cool with giving you owner rights (we brought it outside the arrow / apache landscape to lower the bar for getting people access I think) |
I'll try and check it out over the next few days -- but please don't wait for me to merge 🚀 ! |
thank you @alamb 🙏 |
I have already given @matthewmturner and @seddonm1 owner access to this repo. Also tagged Andrew just in case he has any blocker comments. I think the repo right now is in a highly experimental state, so we can merge suboptimal solutions and iterate from there. |
let bucket = self.bucket.clone(); | ||
let key = self.file.path.clone(); | ||
|
||
// once the async chunk file readers have been implemented this complexity can be removed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't wait for async chunk file readers 👍
ObjectStore implementation for the Amazon S3 API
this takes the feedback from the gist: https://gist.github.com/seddonm1/2fb5a6892989fe7bf246022a7bd586ee and adds tests against the standard test suite.
CI could be set up this way.