Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: harmonize and simplify storage configuration #1052

Merged
merged 29 commits into from
Jan 11, 2023

Conversation

roeap
Copy link
Collaborator

@roeap roeap commented Jan 5, 2023

Description

Recently we moved some of our storage configuration via a property bag upstream to the object_store crate. This allows us to simplify our configuration handling here and make S3 configuration consistent with azure and gcp.

I think as a follow up it would be great to migrate dynamodb_lock to using the official SDKs as well, and then see what we still need form the s3 storage options.

Related Issue(s)

closes #999

Documentation

@roeap roeap changed the title refactor: harmonize and simplify storage configuration feat: harmonize and simplify storage configuration Jan 5, 2023
@github-actions github-actions bot added the python label Jan 5, 2023
@github-actions github-actions bot removed the python label Jan 7, 2023
@github-actions github-actions bot added the python label Jan 9, 2023
@@ -31,7 +31,8 @@ def test_read_files(s3_localstack):
@pytest.mark.s3
@pytest.mark.integration
@pytest.mark.timeout(timeout=15, method="thread")
def test_s3_authenticated_read_write(s3_localstack_creds):
def test_s3_authenticated_read_write(s3_localstack_creds, monkeypatch):
monkeypatch.setenv("AWS_DEFAULT_REGION", "us-east-1")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously we were using the rusoto logic from our S3 storage options, which look in the envornment, then a profile file, and finally just defaults to us-east-1 as a region.

While slightly convenient, i felt this is a bit too much logic for us, and hoping, that the dynamo_lock crate will eventually also move to the official aws sdk, I also did not want to use the rusoto helper.

If this goes too much against the expectations of users, we can of course add some login for this.

The setting could of course also be passed with the storage options :)

Comment on lines -187 to -192
# Fails without any creds
with pytest.raises(PyDeltaTableError):
anon_storage_options = {
key: value for key, value in azurite_creds.items() if "ACCOUNT" not in key
}
write_deltalake(table_path, sample_data, storage_options=anon_storage_options)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not entirely sure what we did before, but now AZURE_STORAGE_USE_EMULATOR will pick up the default account / key if none are specified - so its hard to create a failing state with the emulator now.

@roeap roeap marked this pull request as ready for review January 9, 2023 23:22
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Jan 9, 2023
rust/src/builder.rs Outdated Show resolved Hide resolved
roeap and others added 2 commits January 10, 2023 01:26
Copy link
Collaborator

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good cleanup, but have some questions on the purpose of StorageLocation.

rust/src/storage/config.rs Outdated Show resolved Hide resolved
}

/// A parsed URL identifying a storage location
/// for more information on the supported expressions
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this meant to be here?

}
}

/// A parsed URL identifying a storage location
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we describe the difference between StorageLocation, object_store::path::Path, and url::Url? I admit I am a bit confused.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all we are getting are some extra methods, what if we defined a trait that was implemented for Url and Path? Would that be sufficient?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StorageLocation is more or less just a intermediate solution. We had StorageUrl and DeltaStorageConfig. My plan was (and is :)) to get rid of both. Especially since we moved the pre-fixing logic upstream into the PrefixObjectStore, we can get rid of that logic here as well.

The main reason I did not get rid of it in this PR yet is mainly that changes we getting too large and I had some concerns about breaking serialization. SO for now I just got rid of one (and for some reason renamed the other).

More generally speaking though I am thinking about how to most efficiently support absolute file paths in the delta log. In the most general case the log and files could reside in completely different locations (file systems) and I guess there would not even be a guarantee that all files use the same store. Not sure though if this is more fo an esoteric case, or if it is worth the effort to support scenarios like this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I see. If we are planning on removing later, that sounds good to me. 👍 Thanks for the explanation.

@wjones127 wjones127 merged commit 3c99d97 into delta-io:main Jan 11, 2023
chitralverma pushed a commit to chitralverma/delta-rs that referenced this pull request Mar 17, 2023
# Description
Recently we moved some of our storage configuration via a property bag
upstream to the object_store crate. This allows us to simplify our
configuration handling here and make S3 configuration consistent with
azure and gcp.

I think as a follow up it would be great to migrate dynamodb_lock to
using the official SDKs as well, and then see what we still need form
the s3 storage options.

# Related Issue(s)

closes delta-io#999

# Documentation

<!---
Share links to useful documentation
--->

Co-authored-by: Will Jones <[email protected]>
@roeap roeap deleted the config branch April 7, 2023 21:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
delta-rs-crate documentation Improvements or additions to documentation python rust
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide case-insensitive storage options in backend
2 participants