Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: typed commit info #1207

Merged
merged 5 commits into from
Mar 7, 2023
Merged

feat: typed commit info #1207

merged 5 commits into from
Mar 7, 2023

Conversation

roeap
Copy link
Collaborator

@roeap roeap commented Mar 4, 2023

Description

Another PR on the road to #632 - keeping it a draft, as it is based on #1206

While the commitInfo action is defined as completely optional, spark and delta-rs write at the very least interesting, but often also quite helpful information into the commit info. To make it easier to work with and centralize some conventions, we introduce a CommitInfo struct, that exposes some of the fields at the top level. Additionally we harmonize a bit between spark and delta-rs conventions.

Related Issue(s)

part of #632

Documentation

@github-actions github-actions bot added binding/rust Issues for the Rust crate rust labels Mar 4, 2023
@roeap roeap marked this pull request as ready for review March 6, 2023 11:29
Copy link
Collaborator

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a good improvement. It seems unfortunate the format of commitInfo isn't standardized in the protocol.

Comment on lines 452 to 454
/// Version number the commit corresponds to
#[serde(skip_serializing_if = "Option::is_none")]
pub version: Option<DeltaDataTypeVersion>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this redundant?

Copy link
Collaborator Author

@roeap roeap Mar 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is, and nobody actually writes it to the commit info in the log. Initially I was following the spark implementation for the optimistic commits quite closely, where this field is being used. Not sure if we still do though. I'll remove it for now, and bring it back in case its needed in the follow up PR...

pub version: Option<DeltaDataTypeVersion>,
/// Timestamp in millis when the commit was created
#[serde(skip_serializing_if = "Option::is_none")]
pub timestamp: Option<DeltaDataTypeTimestamp>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an aside, it would be cool if we swapped out DeltaDataTypeTimestamp for a type whose Debug / Display implementation is a formatted timestamp rather than an integer.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely, we may also be able to implement custom serde, so it is parsed as a datetime.

rust/src/action/mod.rs Outdated Show resolved Hide resolved
rust/src/action/mod.rs Outdated Show resolved Hide resolved
rust/src/action/mod.rs Outdated Show resolved Hide resolved
rust/src/action/mod.rs Outdated Show resolved Hide resolved
rust/src/action/mod.rs Outdated Show resolved Hide resolved
rust/src/action/mod.rs Outdated Show resolved Hide resolved
pub(crate) async fn prepare_commit(
storage: &dyn ObjectStore,
operation: &DeltaOperation,
actions: &mut Vec<Action>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this changed to a &mut instead of just an owned value?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I extracted this from the optimistic commit PR where we are using this in a loop and need access to the actions afterwards. That said, having this mutable in that scenario is likely not the best idea, and we need it only to add the commit info. Changed is so we just take a borrowed value now.

@roeap roeap merged commit 1d1a08c into delta-io:main Mar 7, 2023
@roeap roeap deleted the commit-info branch March 7, 2023 15:49
chitralverma pushed a commit to chitralverma/delta-rs that referenced this pull request Mar 17, 2023
# Description

Another PR on the road to delta-io#632 - ~~keeping it a draft, as it is based on
delta-io#1206~~

While the `commitInfo` action is defined as completely optional, spark
and delta-rs write at the very least interesting, but often also quite
helpful information into the commit info. To make it easier to work with
and centralize some conventions, we introduce a `CommitInfo` struct,
that exposes some of the fields at the top level. Additionally we
harmonize a bit between spark and delta-rs conventions.

# Related Issue(s)

part of delta-io#632 

# Documentation

<!---
Share links to useful documentation
--->

---------

Co-authored-by: Will Jones <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/rust Issues for the Rust crate rust
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants