Separating the log from the state #141

Closed
dispanser opened this issue Mar 20, 2021 · 7 comments
Labels
binding/rust Issues for the Rust crate enhancement New feature or request

Comments

@dispanser
Contributor

In the current architecture, the concepts of the delta table and the delta log are available in one single abstraction, DeltaTable. To update to the newest state, one can call update() and the table state follows accordingly.

However, I can see several use cases, mostly around stream processing, that are not actually interested in the current table state, but instead in the stream of changes. For such a use case, subscribing to a stream of actions would probably provide the better abstraction.

Ultimately, DeltaTable is just a representation of the aggregated state of the log up to a specific commit, so it would just be another subscriber to the delta log changes.

I wonder if it would make sense to have a first-class citizen, DeltaLog, exposing commits as a stream of batches of actions.
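To make the idea concrete, here is a minimal sketch of how such a split could look. Everything in it is hypothetical and not the existing delta-rs API: the `DeltaLog` trait, the `commits_after` method, the simplified `Action` enum, and the `InMemoryLog` and `TableState` types are invented for illustration. The point is only that the table state can be expressed as a fold over commits handed out by a log that knows nothing about state.

```rust
use std::collections::BTreeSet;

/// Simplified stand-in for a log action (hypothetical; the real
/// protocol has more action types and more fields).
#[derive(Debug, Clone, PartialEq)]
pub enum Action {
    Add { path: String },
    Remove { path: String },
}

/// One commit: a version number plus the batch of actions it contains.
#[derive(Debug, Clone, PartialEq)]
pub struct Commit {
    pub version: u64,
    pub actions: Vec<Action>,
}

/// Hypothetical log abstraction: it knows nothing about table state and
/// only hands out the commits newer than a given version.
pub trait DeltaLog {
    fn commits_after(&self, version: Option<u64>) -> Vec<Commit>;
}

/// In-memory log, used for illustration only.
pub struct InMemoryLog {
    pub commits: Vec<Commit>,
}

impl DeltaLog for InMemoryLog {
    fn commits_after(&self, version: Option<u64>) -> Vec<Commit> {
        self.commits
            .iter()
            .filter(|c| version.map_or(true, |v| c.version > v))
            .cloned()
            .collect()
    }
}

/// Table state as "just another subscriber": the set of live files
/// obtained by applying commits in order.
#[derive(Debug, Default)]
pub struct TableState {
    pub version: Option<u64>,
    pub files: BTreeSet<String>,
}

impl TableState {
    /// Apply a single commit to the aggregated state.
    pub fn apply(&mut self, commit: &Commit) {
        for action in &commit.actions {
            match action {
                Action::Add { path } => {
                    self.files.insert(path.clone());
                }
                Action::Remove { path } => {
                    self.files.remove(path);
                }
            }
        }
        self.version = Some(commit.version);
    }

    /// Pull everything newer than the current version and fold it in.
    pub fn update(&mut self, log: &impl DeltaLog) {
        for commit in log.commits_after(self.version) {
            self.apply(&commit);
        }
    }
}
```

A stream-processing consumer would call `commits_after` directly and never build a `TableState`, while `TableState::update` plays the role that `update()` has on `DeltaTable` today.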

@houqp
Member

houqp commented Mar 20, 2021

I think that's a good idea for the read-only streaming use case :) The log stream abstraction can expose both pull-based and push-based interfaces. Then the existing DeltaTable implementation can be refactored into a pull-based consumer. Read-only streaming consumers can use the push interface to react to new log entries in near real time.

@houqp houqp added binding/rust Issues for the Rust crate enhancement New feature or request labels Mar 20, 2021
@dispanser
Contributor Author

A pull-based interface would probably be good enough as a start. With a push-based interface, someone has to poll the underlying file system anyway (assuming something like inotify is not practical for cloud-based storage backends), and I would leave that to the consumer.
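A rough sketch of what "leave the polling to the consumer" could mean in practice. The names `CommitStore`, `MemStore`, and `LogCursor` are made up for this example, and actions are plain strings to keep it short. The cursor only remembers the next version to read; how often `poll` is called is entirely up to the consumer.

```rust
use std::collections::BTreeMap;

/// Hypothetical storage view: returns the raw actions of a commit, or
/// `None` when that version has not been written yet.
pub trait CommitStore {
    fn read_commit(&self, version: u64) -> Option<Vec<String>>;
}

/// In-memory stand-in for a storage backend (illustration only).
#[derive(Default)]
pub struct MemStore {
    pub commits: BTreeMap<u64, Vec<String>>,
}

impl CommitStore for MemStore {
    fn read_commit(&self, version: u64) -> Option<Vec<String>> {
        self.commits.get(&version).cloned()
    }
}

/// Cursor owned by the consumer; remembers the next version to read.
#[derive(Default)]
pub struct LogCursor {
    next_version: u64,
}

impl LogCursor {
    /// Pull every commit that has appeared since the last call.
    /// Returns an empty vector when there is nothing new.
    pub fn poll(&mut self, store: &impl CommitStore) -> Vec<(u64, Vec<String>)> {
        let mut batches = Vec::new();
        // Read consecutive versions until the next one does not exist yet.
        while let Some(actions) = store.read_commit(self.next_version) {
            batches.push((self.next_version, actions));
            self.next_version += 1;
        }
        batches
    }
}
```

The consumer can call `poll` in a timer loop, after finishing a batch, or on demand; the log side needs no scheduling logic of its own.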

I'll play with this idea a bit to see if it leads anywhere.

@houqp
Member

houqp commented Mar 22, 2021

Just to make sure we are talking about pull/push at the same abstraction level: in my first comment, I was referring to pull and push interfaces at the application level. For example, a pull interface (or API, if you prefer) means the application code needs to explicitly call the update method to pull the latest changes from the transaction log at its own pace, while a push interface would let the application register callbacks or set up an event loop to process new log entries as they come in.

When it comes to actually implementing the push interface, we also have the choice of leveraging push or pull semantics at the storage level. For example, for the local file system backend on Linux, we could leverage inotify to achieve a real-time push implementation. For backends like S3, we would need to actually pull new values from the latest_version S3 object to simulate the push. There is also the option of using S3 event notifications to achieve real-time push with S3, but that's another topic ;)
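To illustrate the two levels, here is a small sketch of a push interface at the application level that is driven by pulling at the storage level. All names (`VersionSource`, `SharedVersion`, `LogWatcher`, `on_commit`, `tick`) are hypothetical and chosen for this example only. It assumes versions start at 0 and are contiguous, and it delivers only version numbers rather than full commits.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical storage probe: reports the newest committed version, if any.
pub trait VersionSource {
    fn latest_version(&self) -> Option<u64>;
}

/// Stand-in for a backend: a version counter that can be bumped from
/// outside (illustration only).
#[derive(Clone, Default)]
pub struct SharedVersion(pub Rc<Cell<Option<u64>>>);

impl VersionSource for SharedVersion {
    fn latest_version(&self) -> Option<u64> {
        self.0.get()
    }
}

/// Push-style facade: the application registers callbacks, and the
/// watcher invokes them for each version it has not delivered yet.
pub struct LogWatcher<S: VersionSource> {
    source: S,
    delivered: Option<u64>,
    callbacks: Vec<Box<dyn FnMut(u64)>>,
}

impl<S: VersionSource> LogWatcher<S> {
    pub fn new(source: S) -> Self {
        Self { source, delivered: None, callbacks: Vec::new() }
    }

    /// Register a callback that is invoked once per new version.
    pub fn on_commit(&mut self, callback: impl FnMut(u64) + 'static) {
        self.callbacks.push(Box::new(callback));
    }

    /// One polling step at the storage level.
    /// Returns how many versions were delivered to the callbacks.
    pub fn tick(&mut self) -> usize {
        let latest = match self.source.latest_version() {
            Some(v) => v,
            None => return 0,
        };
        let start = self.delivered.map_or(0, |v| v + 1);
        if latest < start {
            return 0; // nothing new since the last tick
        }
        for version in start..=latest {
            for callback in self.callbacks.iter_mut() {
                (*callback)(version);
            }
        }
        self.delivered = Some(latest);
        (latest - start + 1) as usize
    }
}
```

From the application's point of view this is push: it registers a callback and reacts to versions as they arrive. Something still has to call `tick`, for example a timer loop for an S3-like backend; with a notification mechanism such as inotify, the notification handler could call it instead.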

@nfx
Contributor

nfx commented May 24, 2021

@dispanser Azure Databricks has an "autoloader" feature that resembles inotify behavior. Did you already try that?

@nfx
Contributor

nfx commented May 24, 2021

@houqp there's autoloader on aws as well :)

@roeap
Collaborator

roeap commented Sep 7, 2022

related #661

@rtyler
Member

rtyler commented Jan 6, 2024

I think the LogStore work that's been done recently addresses this, so I am going to close this out.

@rtyler rtyler closed this as completed Jan 6, 2024
5 participants