Experiment/dependencies graph #364

Merged

Conversation

zaychenko-sergei
Contributor

No description provided.

Removed dependency query from DatasetRepository.
Integrated dependencies graph into GraphQL queries for upstream/downstream links.
Integrated dependencies graph into dataset deletion.
Reacting to `DatasetCreated` events.
@zaychenko-sergei zaychenko-sergei changed the base branch from master to experiment/dataset-update-flow December 13, 2023 21:52
@zaychenko-sergei zaychenko-sergei marked this pull request as ready for review December 14, 2023 15:06
@zaychenko-sergei zaychenko-sergei merged commit 56cfbd6 into experiment/dataset-update-flow Dec 14, 2023
4 of 5 checks passed
@zaychenko-sergei zaychenko-sergei deleted the experiment/dependencies-graph branch December 14, 2023 15:08
zaychenko-sergei added a commit that referenced this pull request Dec 14, 2023
* Simplest startup job to initialize dependencies graph.
* Removed dependency query from DatasetRepository.
* Integrated dependencies graph into GraphQL queries for upstream/downstream links.
* Integrated dependencies graph into dataset deletion.
* Reacting to `DatasetCreated` events.
* Implemented reaction of dependencies graph to changes in dataset inputs
* Implemented lazy vs eager dependencies initialization
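The lazy vs eager initialization mentioned in the list above could look roughly like the following. This is a minimal sketch, not the code from this PR: `DependencySource`, `DependencyGraph`, `new_lazy`, `new_eager` and the string-based `DatasetId` are hypothetical names, and a plain `HashMap` stands in for the real graph storage.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical dataset identifier; the real project has its own ID type.
pub type DatasetId = String;

/// Assumed source of (dataset, inputs) pairs, e.g. a scan over a repository.
pub trait DependencySource {
    fn list_dependencies(&self) -> Vec<(DatasetId, Vec<DatasetId>)>;
}

/// Convenience source for experiments: a fixed list of dependencies.
impl DependencySource for Vec<(DatasetId, Vec<DatasetId>)> {
    fn list_dependencies(&self) -> Vec<(DatasetId, Vec<DatasetId>)> {
        self.clone()
    }
}

pub struct DependencyGraph<S: DependencySource> {
    source: S,
    // `None` until the graph has been built from the source.
    downstream: Option<HashMap<DatasetId, HashSet<DatasetId>>>,
}

impl<S: DependencySource> DependencyGraph<S> {
    /// Lazy: nothing is scanned until the first query.
    pub fn new_lazy(source: S) -> Self {
        Self { source, downstream: None }
    }

    /// Eager: the scan happens up front, e.g. in a startup job.
    pub fn new_eager(source: S) -> Self {
        let mut graph = Self::new_lazy(source);
        graph.ensure_initialized();
        graph
    }

    pub fn is_initialized(&self) -> bool {
        self.downstream.is_some()
    }

    fn ensure_initialized(&mut self) {
        if self.downstream.is_some() {
            return;
        }
        let mut map: HashMap<DatasetId, HashSet<DatasetId>> = HashMap::new();
        for (dataset, inputs) in self.source.list_dependencies() {
            for input in inputs {
                // Edge direction: input (upstream) -> dataset (downstream).
                map.entry(input).or_default().insert(dataset.clone());
            }
        }
        self.downstream = Some(map);
    }

    /// Direct downstream datasets of `id`, sorted for stable output.
    pub fn downstream_of(&mut self, id: &str) -> Vec<DatasetId> {
        self.ensure_initialized();
        let mut result: Vec<DatasetId> = self
            .downstream
            .as_ref()
            .and_then(|m| m.get(id))
            .map(|s| s.iter().cloned().collect())
            .unwrap_or_default();
        result.sort();
        result
    }
}
```

With the lazy constructor the first call to `downstream_of` pays the scanning cost; with the eager one the cost moves to startup.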
zaychenko-sergei added a commit that referenced this pull request Dec 18, 2023
DatasetUpdateFlow => UpdateSchedule.
Added Update aggregate, representing a single instance of update process for a given dataset.

Formatter fix

Review: renamings

Review: externalized time source for event-sourcing aggregates
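An externalized time source for an event-sourced aggregate might be sketched as follows. All names here (`TimeSource`, `FakeTimeSource`, `Update`, `UpdateEvent`) and the use of plain `u64` seconds are assumptions for illustration, not the types from this change.

```rust
use std::cell::Cell;

/// Hypothetical clock abstraction; timestamps are plain seconds for brevity.
pub trait TimeSource {
    fn now(&self) -> u64;
}

/// Test clock that only moves when told to.
pub struct FakeTimeSource {
    current: Cell<u64>,
}

impl FakeTimeSource {
    pub fn new(start: u64) -> Self {
        Self { current: Cell::new(start) }
    }

    pub fn advance(&self, secs: u64) {
        self.current.set(self.current.get() + secs);
    }
}

impl TimeSource for FakeTimeSource {
    fn now(&self) -> u64 {
        self.current.get()
    }
}

#[derive(Debug, Clone, PartialEq)]
pub enum UpdateEvent {
    Initiated { at: u64 },
    Cancelled { at: u64 },
}

/// Illustrative aggregate: it records events and never reads a system clock.
#[derive(Default)]
pub struct Update {
    events: Vec<UpdateEvent>,
}

impl Update {
    pub fn initiate(&mut self, clock: &dyn TimeSource) {
        self.events.push(UpdateEvent::Initiated { at: clock.now() });
    }

    pub fn cancel(&mut self, clock: &dyn TimeSource) {
        self.events.push(UpdateEvent::Cancelled { at: clock.now() });
    }

    pub fn events(&self) -> &[UpdateEvent] {
        &self.events
    }
}
```

Passing the clock in keeps event timestamps deterministic in tests, since the aggregate itself never asks the operating system for the time.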

Review: compacted task events in update flow

Review: added update cancellation (before tasks scheduled)

Review: accepted suggestions to name delay reasons as start conditions, and secondary triggers as simply adding triggers

Merge corrections

In-memory implementation of repositories

Drafted update scheduler service.
ES: support optional aggregate loads.

Sketched `UpdateService` without tasks scheduling yet

Scheduler steps:
 - schedule update task on manual trigger
 - read all active auto-schedules at the beginning of the run process
 - react to schedule change events: update table of active schedules
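
The three scheduler steps above can be sketched as a small in-memory structure. This is an illustration under assumptions: `Scheduler`, `ScheduleEvent` and their methods are hypothetical names, and a schedule is reduced to a period in seconds.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical dataset identifier.
pub type DatasetId = String;

/// Assumed shape of a schedule change event.
#[derive(Debug, Clone, PartialEq)]
pub enum ScheduleEvent {
    Set { dataset: DatasetId, every_secs: u64 },
    Removed { dataset: DatasetId },
}

pub struct Scheduler {
    // Table of active auto-schedules: dataset -> period in seconds.
    active: HashMap<DatasetId, u64>,
    // Update tasks waiting to be executed.
    queue: VecDeque<DatasetId>,
}

impl Scheduler {
    /// Reads all active auto-schedules once, at the beginning of the run.
    pub fn start(initial: Vec<(DatasetId, u64)>) -> Self {
        Self {
            active: initial.into_iter().collect(),
            queue: VecDeque::new(),
        }
    }

    /// A manual trigger enqueues an update task regardless of any schedule.
    pub fn trigger_manual(&mut self, dataset: &str) {
        self.queue.push_back(dataset.to_string());
    }

    /// Schedule change events keep the table of active schedules current.
    pub fn on_schedule_event(&mut self, event: ScheduleEvent) {
        match event {
            ScheduleEvent::Set { dataset, every_secs } => {
                self.active.insert(dataset, every_secs);
            }
            ScheduleEvent::Removed { dataset } => {
                self.active.remove(&dataset);
            }
        }
    }

    pub fn period_of(&self, dataset: &str) -> Option<u64> {
        self.active.get(dataset).copied()
    }

    pub fn next_task(&mut self) -> Option<DatasetId> {
        self.queue.pop_front()
    }
}
```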

Separate set in each in-memory event repository: quick return of query objects

Drafted Time Wheel concept

Connected time wheel and update service: initial scheduling and run loop

Drafted in-memory dependency graph service (based on petgraph library).
Scheduling downstream datasets when dataset update completes, respecting throttling period logic
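
One possible reading of the downstream scheduling with throttling is sketched below. The commit uses the petgraph library; to stay dependency-free this sketch substitutes a plain `HashMap` adjacency list. The throttling rule shown (a downstream dataset starts no earlier than its last run plus a fixed period) is an assumption, as are the names `DownstreamPlanner`, `record_run` and `on_update_completed`.

```rust
use std::collections::HashMap;

/// Hypothetical dataset identifier.
pub type DatasetId = String;

pub struct DownstreamPlanner {
    // upstream -> direct downstream datasets (stand-in for a petgraph graph)
    downstream: HashMap<DatasetId, Vec<DatasetId>>,
    // dataset -> time of its last update, in seconds
    last_run: HashMap<DatasetId, u64>,
    // Assumed rule: at most one update per dataset within this period.
    throttle_secs: u64,
}

impl DownstreamPlanner {
    pub fn new(throttle_secs: u64) -> Self {
        Self {
            downstream: HashMap::new(),
            last_run: HashMap::new(),
            throttle_secs,
        }
    }

    pub fn add_dependency(&mut self, upstream: &str, downstream: &str) {
        self.downstream
            .entry(upstream.to_string())
            .or_default()
            .push(downstream.to_string());
    }

    pub fn record_run(&mut self, dataset: &str, at: u64) {
        self.last_run.insert(dataset.to_string(), at);
    }

    /// For each direct downstream dataset, returns the earliest start time
    /// that respects the throttling period.
    pub fn on_update_completed(&self, upstream: &str, now: u64) -> Vec<(DatasetId, u64)> {
        self.downstream
            .get(upstream)
            .into_iter()
            .flatten()
            .map(|d| {
                let earliest = match self.last_run.get(d) {
                    Some(last) => now.max(last + self.throttle_secs),
                    None => now,
                };
                (d.clone(), earliest)
            })
            .collect()
    }
}
```

A dataset that ran recently is pushed back to the end of its throttling window, while one that has never run can start immediately.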

Enqueue next auto-polling root dataset update when current update succeeds

Prototyped EventBus + added 1st demo link for schedule modification event

Minor event structure fixes

Simplified UpdateSchedule events

Connected task finish and dataset removal events.
Large DI changes in existing tests to support EventBus dependency

Concurrent execution of event handlers in dispatcher

Converted event bus handlers to traits. Registering handlers in the catalog.

Merge corrections

Review: renamed DatasetDeleted event

Review: avoid excessive events cloning

Shifted down 'get_queries' to update schedule's event store only

Async event handler combiner now collects all handlers results, before reducing the error for reporting
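
The "collect everything, then reduce" behaviour can be illustrated with a synchronous simplification. The combiner in the commit is async; the function name `dispatch_all`, the `String` error type and the joining of messages with `"; "` are all assumptions made for this sketch.

```rust
/// Runs every handler, collects all results, and only then reduces them
/// into a single outcome. A failing handler does not stop the others.
pub fn dispatch_all<E>(
    event: &E,
    handlers: &[Box<dyn Fn(&E) -> Result<(), String>>],
) -> Result<(), String> {
    // Step 1: invoke all handlers and keep every result.
    let results: Vec<Result<(), String>> = handlers.iter().map(|h| h(event)).collect();

    // Step 2: reduce the collected results into one error for reporting.
    let errors: Vec<String> = results.into_iter().filter_map(Result::err).collect();
    if errors.is_empty() {
        Ok(())
    } else {
        Err(errors.join("; "))
    }
}
```

The alternative, returning on the first error, would leave later handlers unnotified of the event.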

Resolved basic code review notes

Added `get_last_update` by dataset operation

Formatter fixed

dill 0.8 - replaced `builder_for` with `Component::builder()`

Integrated `dill::interface` feature and removed many explicit binds

Review: renamed task event classes, enum for dataset events

Review: reworked relevance of update schedule, using statuses instead (active, paused, stopped)

Review: allow update schedules to be re-added after dataset reincarnation with the same ID

Review: a few TODOs on performance improvements

Review: reimplemented TimeWheel using binary heap
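
A time wheel backed by a binary heap might be sketched like this, using `std::collections::BinaryHeap` with `Reverse` to get min-heap ordering. The method names (`schedule`, `nearest`, `take_due`) and the `u64` timestamps and flow ids are assumptions, not the actual `TimeWheel` API.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Min-heap of (activation time, flow id) pairs.
#[derive(Default)]
pub struct TimeWheel {
    heap: BinaryHeap<Reverse<(u64, u64)>>,
}

impl TimeWheel {
    /// Plans `flow_id` for activation at time `at`.
    pub fn schedule(&mut self, at: u64, flow_id: u64) {
        self.heap.push(Reverse((at, flow_id)));
    }

    /// Activation time of the nearest planned entry, if any.
    pub fn nearest(&self) -> Option<u64> {
        self.heap.peek().map(|Reverse((at, _))| *at)
    }

    /// Removes and returns all flow ids whose activation time is <= `now`,
    /// in order of activation time.
    pub fn take_due(&mut self, now: u64) -> Vec<u64> {
        let mut due = Vec::new();
        while let Some(Reverse((at, id))) = self.heap.peek().copied() {
            if at > now {
                break;
            }
            self.heap.pop();
            due.push(id);
        }
        due
    }
}
```

Insertion and removal of the nearest entry are both logarithmic in the number of planned entries, and peeking at the nearest activation time is constant.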

Review: removed `pause` and `resume` methods in `UpdateSchedule` aggregate, use `set_schedule` only

Renamed update schedules => update configurations

Separated schedules and start conditions in update configurations

Generalized dataset flow configurations

System vs Dataset flow configurations

Not very smart, but a model of System and Dataset flows.
Scheduling service largely not implemented yet.

Generic-based flow events, state

Refactored flow configurations aggregate to generic events/state similarly

Code reuse approach for flow/flow-config aggregates based on trait extensions

Attempts to generalize flow configuration services (at least traits).
Folder reorganization in interface and in-mem crate.

Implemented generic in-memory event stores and integrated them into all current aggregates

Implemented SystemFlow in-memory repository

Implemented in-memory Flow service for all kinds of flows

Decomposing Flow service: extracted ActiveConfigsState

Decomposing Flow service: extracted PendingFlowsState

Compacted DatasetFlow & SystemFlow into Flow

Similarly compacted FlowConfiguration aggregate

Simplifications in FlowService

Review: 'flow-system' and 'flow-system-inmem' are final names

Review: improved enum all-value iteration methods in flow types

Review: specific => of_type

Review: removed duplicate OwnedDatasetFlowKey

Review: killed redundant feature flags

Review: tracing without formatting

Moved `DependencyGraphService` to core domain

Review: removed redundant field

Experiment/dependencies graph (#364)

* Simplest startup job to initialize dependencies graph.
* Removed dependency query from DatasetRepository.
* Integrated dependencies graph into GraphQL queries for upstream/downstream links.
* Integrated dependencies graph into dataset deletion.
* Reacting to `DatasetCreated` events.
* Implemented reaction of dependencies graph to changes in dataset inputs
* Implemented lazy vs eager dependencies initialization

Merge corrections

Test fix

Review: minor renamings
zaychenko-sergei added a commit that referenced this pull request Dec 20, 2023
zaychenko-sergei added a commit that referenced this pull request Dec 22, 2023