Cannot open a deltatable in S3 using AWS_PROFILE based credentials from a local machine #855

Closed
wouove opened this issue Sep 29, 2022 · 4 comments · Fixed by #986
Labels: bug (Something isn't working)

Comments


wouove commented Sep 29, 2022

Environment

Delta-rs version: 0.6.1

Binding: Python (using 3.10.3)

Environment:

  • Cloud provider: AWS
  • OS: MacOs
  • Other: AWS authentication via SSO.

Bug

What happened:
While trying to open a Delta Lake table from S3, I got the following error:

deltalake.PyDeltaTableError: Failed to load checkpoint: Failed to read checkpoint content: Generic S3 error: response error "request error", after 0 retries: error sending request for url (http://169.254.169.254/latest/api/token): error trying to connect: tcp connect error: Operation timed out (os error 60)

I realise this bug is a lot like the one reported in #854. Still, I'd like to report it separately, as the hardware and authentication flow are different here: the timeout above comes from the client falling back to the EC2 instance metadata endpoint (169.254.169.254), which is unreachable from a local machine.

What you expected to happen:
Credentials should be picked up from the AWS profile selected by the environment variable `AWS_PROFILE="your-profile"`; the credentials for that profile are populated once an AWS session is started through SSO.
I expected the code to use this profile to obtain AWS credentials. Any boto3 call that interacts with AWS APIs from Python resolves credentials this way, so I expected the script to finish and simply print the latest version of the table.
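
For comparison, a minimal boto3 call that resolves credentials from that same profile (the bucket and key are placeholders, matching the snippet below):

import boto3

# boto3 resolves credentials from the profile named in AWS_PROFILE
# (including SSO-cached credentials) without any explicit keys.
s3 = boto3.client("s3")
response = s3.list_objects_v2(Bucket="bucket", Prefix="key", MaxKeys=1)
print(response["KeyCount"])  # succeeds, so the profile credentials are valid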

How to reproduce it:
Code (run with `AWS_PROFILE` set in the environment):

from deltalake import DeltaTable
uri = "s3://bucket/key"
storage_options = {}  # intentionally empty: credentials should come from the AWS profile
dt = DeltaTable(uri, storage_options=storage_options)
print(dt.version())

More details:
The only workaround is the one from issue #854, which is somewhat ugly: passing the credentials in explicitly.

from deltalake import DeltaTable
uri = "s3://bucket/key"
storage_options = {}
storage_options["AWS_REGION"] = "your region"
storage_options["AWS_ACCESS_KEY_ID"] = "your key"
storage_options["AWS_SECRET_ACCESS_KEY"] = "your secret key"
storage_options["AWS_SESSION_TOKEN"] = "your token"
dt = DeltaTable(uri, storage_options=storage_options)
print(dt.version())
wouove added the bug label Sep 29, 2022
wjones127 (Collaborator) commented Sep 30, 2022

The underlying object store implementation does not plan on supporting this: it doesn't depend on the official AWS crates and doesn't want to re-implement something that complex, which is reasonable (apache/arrow-rs#2178).

But perhaps for the Python bindings, we could have optional integration with boto3?

Implementation would be something like: if you don't pass in storage_options, but are using S3, get credentials from boto3:

from boto3 import Session

# Resolve credentials from the ambient chain (env vars, profile, SSO cache).
session = Session()
credentials = session.get_credentials()

(source)
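
A rough sketch of how that could be wired up on the Python side (the helper name and the call site are hypothetical, not existing deltalake code):

from boto3 import Session

def _storage_options_from_boto3() -> dict:
    # Hypothetical helper: build storage_options from the ambient boto3
    # credential chain (env vars, AWS_PROFILE, SSO cache, ...).
    creds = Session().get_credentials().get_frozen_credentials()
    options = {
        "AWS_ACCESS_KEY_ID": creds.access_key,
        "AWS_SECRET_ACCESS_KEY": creds.secret_key,
    }
    if creds.token:  # session/SSO credentials also carry a token
        options["AWS_SESSION_TOKEN"] = creds.token
    return options

# Sketched call site: fall back to boto3 only when the user passed no
# storage_options for an s3:// table URI.
# if table_uri.startswith("s3://") and not storage_options:
#     storage_options = _storage_options_from_boto3()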

Does that seem reasonable?

houqp (Member) commented Sep 30, 2022

I really think we should have a native SSO implementation, since it's such a common development workflow. Doing it in Python through boto3 is a reasonable short-term workaround for Python users, so I think we should do that. It would be even better if we could do this in the Rust core.

We have had many AWS auth issues reported since we switched to objectstore-rs; looks like its auth implementation is a bit broken :*(

@wouove have you tried downgrading to version 0.5.x?

wouove (Author) commented Sep 30, 2022

Hi all,

Thank you for your quick replies :)
@wjones127 Your proposed solution indeed works as well. Let me implement that; it is a bit cleaner than what I used. This way, it can use my AWS profile.

from deltalake import DeltaTable
from boto3 import Session

# Resolve credentials from the active AWS profile (e.g. an SSO session).
session = Session()
credentials = session.get_credentials()
current_credentials = credentials.get_frozen_credentials()

uri = "s3://bucket/key"
storage_options = {
    "AWS_ACCESS_KEY_ID": current_credentials.access_key,
    "AWS_SECRET_ACCESS_KEY": current_credentials.secret_key,
    "AWS_SESSION_TOKEN": current_credentials.token,
}
dt = DeltaTable(uri, storage_options=storage_options)
print(dt.version())

@houqp Downgrading to 0.5.x did not work; I got `ModuleNotFoundError: No module named 'typing_extensions'`.

tustvold commented

Proposed upstream workaround - apache/arrow-rs#2891

To be clear, I would still recommend aws-vault over using this feature, but it may serve as an optional escape valve.

fvaleye added a commit that referenced this issue Dec 9, 2022
# Description
- Add the support of `AWS_PROFILE` environment variable for AWS S3
- Bump version of `object_store` to `0.5.2` 

# Related Issue(s)
- relates to apache/arrow-rs#3229
- relates to apache/arrow-rs#2891
- closes #855
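
With that change in place, the original snippet should need nothing but the profile name in the environment. A minimal sketch of the intended usage (the profile name is a placeholder), assuming the variable is read when the table is opened:

import os
from deltalake import DeltaTable

# With the AWS_PROFILE support added by the fix, no keys or tokens
# need to be passed through storage_options.
os.environ["AWS_PROFILE"] = "your-profile"  # placeholder profile name

dt = DeltaTable("s3://bucket/key")
print(dt.version())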