[RFC] Declarative Automation and the future of Auto-materialize Policies #22811

OwenKephart · 2024-07-02T16:00:44Z

OwenKephart
Jul 2, 2024
Maintainer

Introduction

In Dagster 1.8, we are releasing the successor to the AutoMaterializePolicy system, which we’re calling “Declarative Automation”.

We have several goals with these changes:

Flexibility and customization: Setting up assets to materialize in response to specific conditions should be simple and intuitive. Users should not have to “settle” for @schedule or @sensor to work around limitations of the asset-focused API.
Explainability and transparency: Users can understand what’s happening with the system, both from a high level overview as well as at a granular per-asset level.
Controllable blast radius: Users can be confident that they can control the feature and understand its failure conditions.
Testability: Users can write unit tests and confirm their automations function as expected under a variety of circumstances.

Docs for Declarative Automation can be found here.

Context

Over the past year and a half, we’ve seen many users adopt the AutoMaterializePolicy abstraction, and find success with an asset-based orchestration model. However, there have been some persistent themes in the feedback regarding the current system.

“It’s challenging to customize”

When the system was first released, there were only two policies available to the user: AutoMaterializePolicy.eager() and AutoMaterializePolicy.lazy(). As more users adopted the system, it was clear that these options were too coarse-grained to capture many common use cases.

The advent of the AutoMaterializeRule system formalized some of the logic internal to the evaluation system, and exposed it to the user. This helped provide some levers for customization, but we’ve learned that this is fundamentally insufficient for achieving our vision for this product area.

Flexibility and customization: The rules-based system itself is impossible to configure for certain use cases, and generally is not composable. Any user with a use case that even slightly diverges from the well-traveled paths is stuck unless we add a new parameter to a rule, or a new rule entirely. The core primitives don’t stack well with each other, meaning each new rule that is added is highly specialized and has limited scope. To provide users with a truly customizable system, it was necessary to rethink the core abstractions.
Explainability and transparency: Building off of the above points, at a core level, the rules are forced to do much more than their names imply, causing great confusion and uncertainty. Under the hood, a rule such as AutoMaterializeRule.materialize_on_missing() needs to contain complex logic for handling state transitions to ensure that (e.g.) an asset doesn’t continually get requested if the previously requested run failed. This level of complexity makes it nearly impossible to build an intuition as to how the system is functioning, as it essentially “hides” critical inputs that the system is using to make its decisions.

“It’s challenging to operate at scale”

As another quick history lesson, the original AutoMaterializePolicy.eager() had no rate-limiting behavior at all. This meant that if you were to add a new partitioned asset with an eager policy to your code location, AMP would attempt to launch a run for every single partition of the asset (in essence, a “surprise backfill”). This particular issue was resolved with the addition of the max_materializations_per_minute parameter (which defaults to 1), but illustrates the general category of problem where a seemingly small change can result in a huge impact.

At a high level, users need to be able to be confident that their changes will have a defined and limited scope, and we believe we can do significantly better here.

Controllable blast radius:
- Because all policies are (by default) evaluated by a single centralized daemon, any error or slowdown in one part of the asset graph can grind all computation to a halt across an entire deployment. This means that disparate teams cannot take ownership of their own sections of the graph — there’s no way to independently fix, pause, or modify the way your assets are evaluated without impacting all other users. Users need some way of isolating independent evaluations at an operational level.
- Tying in with some of the points above, it needs to be possible to craft policies in ways that make strong guarantees about the situations where an asset can be materialized. The ability to put a blanket rate limit on an asset is helpful towards this goal, but is not sufficient.
Testability: At a certain level of complexity, it’s essential to be able to simulate how a policy will react to certain scenarios. Complicated dependency structures and partition mapping schemes make it hard to predict how a system of interconnected policies might react to specific stimuli, and the best way of ensuring correct behavior is through unit testing. We view testing as critical to building trust in an automation system.

Introducing: Declarative Automation

Declarative Automation is the term we are using to describe the new suite of interfaces we’ve designed to address the weaknesses of the Auto-materialization system. This term is intentionally general — we plan on expanding this system over time to support automating things such as asset observations and asset checks in addition to materializations.

There will be no breaking changes to AutoMaterializePolicy made in the Dagster 1.8 release and all existing code will continue to function, but the AutoMaterializePolicy and AutoMaterializeRule interfaces will be marked as deprecated. We will continue to support these APIs until at least Q1 2025, and are open to feedback on timeline.

However, we believe Declarative Automation will provide a vastly superior experience to current-day APIs, and will generally provide a superset of the capabilities offered today.

Automation Conditions

The core primitive of Declarative Automation is the AutomationCondition, which encodes a particular state that an asset may be in.

Conditions can be combined together using a variety of operators to build more complex expressions, allowing you to precisely describe the conditions under which an asset ought to be materialized.

Similarly to AMPs, you’ll be able to attach a condition to an asset as follows:

@asset(automation_condition=AutomationCondition.eager())
def my_asset(): ...

In common cases, you will not need to manually mix and match conditions, and can instead use one of the three “out-of-the-box” policies:

AutomationCondition.eager()

This policy is intended to be a drop-in replacement for the current-day AutoMaterializePolicy.eager(), and replicates its behavior with a couple key exceptions.

First, it is purely forward-looking. This means that it only reacts to events which happen after the policy is added to the asset, and will (by default) only materialize the latest partition of a time-partitioned asset. This handles the “surprise backfill” problem more elegantly than the max_materializations_per_minute parameter of today’s world.

Secondly, it will drop the skip_on_parent_outdated rule. This is a fairly complex bit of logic which prevents materializations in cases where ancestors are in an “unsynced” state. This requires recursing up the entire asset graph, which means that events happening far upstream can prevent assets from materializing in a timely manner. This was one of the more common rules for users to manually disable. In some sense, the main purpose of this rule was to prevent materializations when we knew a parent was going to get materialized again in the near future (under the assumption that all assets in the graph were on the “eager” policy). This functionality will be replaced by logic that prevents materializing the asset if any of its parents are currently in progress.

The net result of these changes is a policy which is simpler to understand, and aligns more closely with what we’ve observed people to expect the behavior to be.
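
To illustrate the replacement for skip_on_parent_outdated, the new check can be pictured as a predicate over the states of an asset's direct parents only, with no recursion up the graph. The sketch below is plain Python rather than Dagster code: `ParentState` and `eager_should_request` are hypothetical names, and it deliberately ignores everything else eager() evaluates (partitions, missing state, and so on).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ParentState:
    """Hypothetical snapshot of one direct parent at evaluation time."""

    newly_updated: bool
    in_progress: bool


def eager_should_request(parents: Sequence[ParentState]) -> bool:
    # react to a fresh update from any direct parent...
    any_updated = any(p.newly_updated for p in parents)
    # ...but hold off while any direct parent is still being materialized
    any_in_progress = any(p.in_progress for p in parents)
    return any_updated and not any_in_progress


# one parent just updated, the other is still running: wait
upstream_still_running = [ParentState(True, False), ParentState(False, True)]
# one parent just updated, nothing is running: request a materialization
upstream_settled = [ParentState(True, False), ParentState(False, False)]
```

Only direct parents are inspected here, which is the point of the change: an event far upstream cannot hold the asset back unless it shows up as an in-progress parent.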

AutomationCondition.on_cron("@daily")

One of the more common things we’ve observed users doing with the AutoMaterializeRule system is to create a sort of “distributed cron schedule”. Let’s take the following example:

Here, if we took the naive approach, and simply materialized the upstream assets exactly on the hour, and the downstream asset exactly every 3 hours, then the downstream asset would get kicked off before its parents had time to complete, and so would perpetually be executing on old data.

This issue can be solved by attempting to materialize each asset once per cron schedule tick, but only after all of its parent assets have updated since that tick. This lets you independently set cron schedules on individual assets without needing to worry about the specific cadences of its upstreams.
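
To make that rule concrete, here is a minimal, framework-free sketch of the decision in plain Python. It is not how Dagster implements it: `latest_tick` and `should_materialize` are hypothetical names, and fixed-interval ticks counted from midnight stand in for real cron parsing.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence


def latest_tick(now: datetime, every: timedelta) -> datetime:
    """Most recent tick at or before `now`, with ticks spaced `every` apart from midnight."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed_ticks = (now - midnight) // every
    return midnight + elapsed_ticks * every


def should_materialize(
    now: datetime,
    every: timedelta,
    last_requested: Optional[datetime],
    parent_updates: Sequence[Optional[datetime]],
) -> bool:
    tick = latest_tick(now, every)
    # a tick has passed since the last time this asset was requested
    tick_pending = last_requested is None or last_requested < tick
    # every parent has updated at or after that tick
    parents_ready = all(ts is not None and ts >= tick for ts in parent_updates)
    return tick_pending and parents_ready


three_hours = timedelta(hours=3)
day = datetime(2024, 7, 2)

# 12:05: the 12:00 tick has passed, but one hourly parent last updated at 11:01
early = should_materialize(
    day.replace(hour=12, minute=5),
    three_hours,
    last_requested=day.replace(hour=9, minute=20),
    parent_updates=[day.replace(hour=12, minute=2), day.replace(hour=11, minute=1)],
)
# 12:10: both parents have now updated since the 12:00 tick
ready = should_materialize(
    day.replace(hour=12, minute=10),
    three_hours,
    last_requested=day.replace(hour=9, minute=20),
    parent_updates=[day.replace(hour=12, minute=2), day.replace(hour=12, minute=4)],
)
# 12:30: the asset was already requested at 12:10 for this tick
done = should_materialize(
    day.replace(hour=12, minute=30),
    three_hours,
    last_requested=day.replace(hour=12, minute=10),
    parent_updates=[day.replace(hour=12, minute=2), day.replace(hour=12, minute=4)],
)
```

In this toy run `early` is False, `ready` is True, and `done` is False: the downstream asset waits for its parents, fires once, and then stays quiet until the next tick.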

In the past, this would look like:

cron_policy = AutoMaterializePolicy.eager().with_rules(
    AutoMaterializeRule.materialize_on_cron("@daily"),
    AutoMaterializeRule.skip_on_not_all_parents_updated_since_cron("@daily"),
).without_rules(
    AutoMaterializeRule.materialize_on_parent_updated(),
)

@asset(auto_materialize_policy=cron_policy)
def my_asset(): ...

We are now elevating this as a first-class use case with its own dedicated static constructor:

@asset(automation_condition=AutomationCondition.on_cron("@daily"))
def my_asset(): ...

AutomationCondition.any_downstream_conditions()

This is intended to serve as a replacement for and generalization of the existing AutoMaterializePolicy.lazy().

One of the core benefits of the original freshness-based scheduling system was the ability to materialize certain assets only when they are needed to satisfy downstream policies, rather than requiring you to set an explicit policy on each individual asset.

However, the way to achieve this with an AutoMaterializePolicy is highly coupled with the (now-deprecated) FreshnessPolicy system. The idea was that you could simply define a target freshness and the system would automatically do what it needed to do to adhere to that requirement. While nice in theory, the fundamental issue is that there is an infinite spectrum of ways to satisfy any given FreshnessPolicy, with different tradeoffs between reducing the number of executions and reducing the likelihood of missing that freshness guarantee (for example, materializing the asset continuously would do a great job of meeting freshness requirements but is obviously not a desirable solution).

With that in mind, we decided to decouple these systems. Instead, users can express constraints on the required frequency of updates purely through AutomationConditions, and use AutomationCondition.any_downstream_conditions() on upstream assets to automatically “inherit” the condition(s) of downstream assets.

Take the following example:

In this case, we’ve defined that the downstream assets should run on some specific cadences (every three hours / daily), but the upstream assets only exist in order to enable these downstreams.

Rather than needing to explicitly figure out which assets need to update at which frequency to enable those downstreams, you can simply give those assets an any_downstream_conditions() automation condition, allowing those requirements to propagate upwards through the graph.

This helps reduce unnecessary computation for these assets, as they’ll exclusively be executed in cases where some downstream with an explicit policy needs to execute.
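
One way to picture this upward propagation is as a walk over the asset graph that collects the explicit conditions found downstream. The sketch below is a plain-Python toy, not Dagster's implementation: the asset names, the `DOWNSTREAMS` and `EXPLICIT` tables, and the choice to stop the walk at the first asset with an explicit condition are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set

# hypothetical graph: asset -> its direct downstream assets
DOWNSTREAMS: Dict[str, List[str]] = {
    "raw_events": ["cleaned_events"],
    "cleaned_events": ["three_hourly_report", "daily_summary"],
    "three_hourly_report": [],
    "daily_summary": [],
}

# explicit cadence per asset; None stands for any_downstream_conditions()
EXPLICIT: Dict[str, Optional[str]] = {
    "raw_events": None,
    "cleaned_events": None,
    "three_hourly_report": "0 */3 * * *",
    "daily_summary": "@daily",
}


def inherited_conditions(asset: str) -> Set[str]:
    """Collect the conditions an asset must honor, walking downstream."""
    explicit = EXPLICIT[asset]
    if explicit is not None:
        # assets with their own condition do not inherit anything further
        return {explicit}
    inherited: Set[str] = set()
    for child in DOWNSTREAMS[asset]:
        inherited |= inherited_conditions(child)
    return inherited


root_conditions = inherited_conditions("raw_events")
```

Here `root_conditions` ends up holding both downstream cadences, so the upstream assets run only when one of those cadences calls for it.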

Customizing Conditions

While the above policies can handle many use cases, more specialized needs will always crop up. The core of the evaluation engine starts with simple conditions representing basic properties or statuses of an asset, for example:

AutomationCondition.missing(): True if the asset has never been materialized
AutomationCondition.in_progress(): True if there is an in-progress run targeting this asset
AutomationCondition.in_latest_time_window(<timedelta>): True for any time-window-partition of the asset from within the last (e.g.) 12 hours

These conditions may be composed together using the standard boolean operators, e.g.:

AutomationCondition.missing() & ~AutomationCondition.in_progress(): True if this asset has never been materialized and is not part of an in-progress run
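
For readers curious how this style of composition can work, the following is a small stand-alone sketch using Python operator overloading. It is a toy model, not Dagster's AutomationCondition: the `Condition` class, its `check` callable, and the dictionary-based asset state are hypothetical, and the labels make no attempt to handle operator precedence.

```python
from dataclasses import dataclass
from typing import Callable, Dict

AssetState = Dict[str, bool]


@dataclass(frozen=True)
class Condition:
    """Toy condition: a label plus a predicate over a simple asset state."""

    label: str
    check: Callable[[AssetState], bool]

    def __and__(self, other: "Condition") -> "Condition":
        return Condition(
            f"{self.label} & {other.label}",
            lambda state: self.check(state) and other.check(state),
        )

    def __or__(self, other: "Condition") -> "Condition":
        return Condition(
            f"{self.label} | {other.label}",
            lambda state: self.check(state) or other.check(state),
        )

    def __invert__(self) -> "Condition":
        return Condition(f"~{self.label}", lambda state: not self.check(state))


missing = Condition("missing", lambda state: state["missing"])
in_progress = Condition("in_progress", lambda state: state["in_progress"])

# mirrors the expression above: missing and not part of an in-progress run
combined = missing & ~in_progress
```

Because every operator returns another `Condition`, the result can be combined again, which is the property that lets small building blocks stack into larger expressions.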

A history of these evaluations can be viewed in the UI, giving you detailed information on exactly which sub-conditions were true on any given evaluation:

More complex operators also exist. For example:

AutomationCondition.any_deps_match(<condition>): True if any dependencies of this asset match an arbitrary condition
AutomationCondition.<condition>.since(<condition>): True if the first condition has become true since the second condition became true.
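
The since operator is stateful, so it can help to trace it over a few evaluations. The sketch below is a plain-Python toy with a hypothetical `since` function that takes one (trigger, reset) pair per evaluation. Letting the reset win when both fire on the same evaluation is an assumption of this sketch, not something stated in this RFC.

```python
from typing import Iterable, List, Tuple


def since(ticks: Iterable[Tuple[bool, bool]]) -> List[bool]:
    """Trace `trigger.since(reset)` over successive evaluations.

    Each tick is a (trigger, reset) pair. The result turns True once the
    trigger fires and stays True until the reset fires.
    """
    state = False
    history: List[bool] = []
    for trigger, reset in ticks:
        if reset:
            # assumption: reset takes precedence over trigger
            state = False
        elif trigger:
            state = True
        history.append(state)
    return history


# trigger = "a cron tick passed", reset = "the asset was newly requested"
trace = since(
    [
        (False, False),  # nothing has happened yet
        (True, False),   # cron tick passes
        (False, False),  # still waiting to be requested
        (False, True),   # asset is requested
        (False, False),  # quiet until the next tick
    ]
)
```

The trace comes out as [False, True, True, False, False]: the condition stays true between the cron tick and the request, then clears.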

Example:

Let’s bring back the cron-based schedule policy from above:

cron_policy = AutoMaterializePolicy.eager().with_rules(
    AutoMaterializeRule.materialize_on_cron("@daily"),
    AutoMaterializeRule.skip_on_not_all_parents_updated_since_cron("@daily"),
).without_rules(
    AutoMaterializeRule.materialize_on_parent_updated(),
)

There’s a lot going on here, but let’s re-implement this from scratch using the new AutomationCondition APIs. At a high level, we just want a condition that materializes an asset when the following things are both true:

A cron tick has passed since the last time the asset was requested (AutoMaterializeRule.materialize_on_cron)
All of the asset’s parents have been materialized since the last cron tick (AutoMaterializeRule.skip_on_not_all_parents_updated_since_cron)

This can be implemented with AutomationConditions as follows:

from dagster import AutomationCondition


def get_cron_policy(cron_schedule: str) -> AutomationCondition:
    # detect ticks of the provided cron schedule
    cron_tick_passed = AutomationCondition.cron_tick_passed(cron_schedule)

    return (
        # cron tick has passed since the last time this was requested
        cron_tick_passed.since(AutomationCondition.newly_requested())
        # all deps have updated since the last cron tick
        & AutomationCondition.all_deps_match(
            AutomationCondition.newly_updated().since(cron_tick_passed)
        )
    )

In the rules-based system, you needed bespoke rules to separately handle each of these two cases, but here you can use simple components (cron_tick_passed, newly_updated) and use generic operators to combine them together.

We see this composability as a massive win in terms of flexibility. The combinatorial possibilities of this expression-based system are massive, and each individual condition added to the suite will automatically benefit from this framework. For example, all_deps_match and any_deps_match support .allow() and .ignore() methods, meaning you can target these conditions at specific parents:

# look for updates from any parent except "foo"
AutomationCondition.any_deps_match(AutomationCondition.newly_updated()).ignore(
    AssetSelection.keys("foo"),
)

These building blocks are significantly simpler and more powerful than the analogs in the AutoMaterializeRule system.

Unit Testing

As the flexibility of this feature increases, so does the need for building confidence that the condition you’ve created does what you expect. Dagster will provide unit-testing APIs to allow you to validate that your policy reacts as expected to various events:

from dagster import AutomationCondition, DagsterInstance, asset, evaluate_automation_conditions


def test_policy_blank_slate() -> None:
    @asset(automation_condition=AutomationCondition.eager())
    def a() -> None:
        return

    instance = DagsterInstance.ephemeral()

    # on the first tick, materialize because a is missing
    result = evaluate_automation_conditions(defs=[a], instance=instance)
    assert result.total_requested == 1

    # don't fire another materialization request on the next tick
    result = evaluate_automation_conditions(defs=[a], instance=instance, cursor=result.cursor)
    assert result.total_requested == 0

While this is somewhat bare-bones at the moment, we plan on continuing to expand this interface to make common testing patterns as ergonomic as possible, and would love your feedback on how you see yourself using this.

AutomationCondition Sensors

The centralized Daemon approach is risky (a failure or slowdown when evaluating any asset in the global asset graph impacts all other assets), and inflexible (it is impossible to target different behaviors at different sets of assets).

In 1.8, we'll be defaulting to a sensor-based approach. Each asset with an AutomationCondition defined will be handled by an AutomationConditionSensorDefinition, rather than a centralized Daemon. By default, a single AutomationConditionSensorDefinition is created per code location, and will target all assets within that code location.

However, in cases where a single code location handles multiple disparate concerns, it can be useful to fully isolate the operation of different sets of assets. To do so, you’ll be able to explicitly define multiple AutomationConditionSensorDefinition objects:

from dagster import Definitions, AutomationConditionSensorDefinition, AssetSelection

defs = Definitions(
    ...,
    sensors=[
        AutomationConditionSensorDefinition(
            "group_foo_automation",
            target=AssetSelection.groups("foo"),
        ),
        AutomationConditionSensorDefinition(
            "group_bar_automation",
            target=AssetSelection.groups("bar"),
        ),
    ],
)

These can be viewed in the "Sensors" tab in the UI like any other sensor:

What's Next...

The following features are not currently available, but are on the roadmap!

User-Defined Conditions

While we aim to satisfy the large majority of use cases with pre-built conditions, automation decisions fundamentally must be able to be based on arbitrary user code. This goes beyond mixing and matching a fixed set of “atoms” that are provided by the framework, as you sometimes need to be able to arbitrarily interact with external APIs or organization-specific business logic.

We are working to define the precise user API, but at a minimum, you will be able to define custom AutomationCondition objects which compose with the built-in primitives. We’re interested in feedback here regarding how you would use this feature!

Mass-applying Conditions

We plan to make it easy to apply arbitrary properties (i.e. not just AutomationConditions) to large sets of assets. More details will be provided at a later date, but at a high level this would look something like the following:

defs = Definitions(assets=[]).map_asset_specs(
    lambda spec: spec.with_automation_condition(AutomationCondition.eager()),
)

Better Run Batching

The AMP system often ends up needing to perform “shadow backfills”, which separate out a single semantic intent into a set of independent runs. It might “know” that it needs to update an asset and all of its downstreams (particularly when using an eager() policy), but is unable to launch all of those assets in a single run.

The core reason behind this is that Dagster does not currently support creating runs which target assets with different PartitionsDefinitions in a first-class way. We are working on changing this, which will in turn allow the Declarative Automation system to neatly combine together runs into larger (and more logical) batches. In essence, AMP will be capable of emitting backfills, but these backfills will be more of a first-class object than they are in their current form.

More details will follow in a separate GitHub Discussion!

Call to Action

We’d love to hear your thoughts! We know that this is a large set of changes to a widely-used system, and are committed to making the migration path as smooth as possible. We strongly believe that the end result will be a significantly more stable and flexible system.

If you have concerns, questions, use cases you want addressed, or just generally have feedback regarding these changes, don’t hesitate to comment here, or reach out to us on Slack.

geoHeil · 2024-07-03T03:39:20Z

geoHeil
Jul 3, 2024

Sounds great. I wonder how this will perhaps support dbt and database views out of the box.

I.e., materialize the views only when the view definition changes, but refresh downstream assets easily according to their AMP.

8 replies

OwenKephart Jul 5, 2024
Maintainer Author

That being said, generic view support is certainly on our radar, it's just a pretty tricky problem to solve.

hello-world-bfree Aug 1, 2024

As far as dbt is concerned, I think it might be simpler to handle directly within the dbt integration rather than fit it within a more generic solution.

With how Dagster injects the appropriate models in the dbt CLI selection parameter rather than acting as a pass-through, we could use the manifest to identify views, check their current code version against their previous code version, and if it's changed include them in the select, if not, don't. This would allow users to write straightforward dbt build/run select commands and not have to worry when a lot of the models are views and are getting materialized/rebuilt every day.

I pushed up this PR that implements the above. All testing works as expected! I know this would be absolutely huge for my org.

axellpadilla Aug 2, 2024

I support this. If possible, a preliminary dbt-only integration could help clarify what is needed before a more generalized approach that includes every asset. Any ideas?

PS: What if I load the assets filtered by whether they are views or not? I can add a tag. It looks like Dagster already identifies code changes, so how can I generalize the code change detection in the asset factory to implement a simple if for each asset?

OwenKephart Aug 7, 2024
Maintainer Author

Hi @hello-world-bfree and @axellpadilla, thanks for the feedback here.

I left some more detailed thoughts on the PR itself, but my overall sense is that it is hard to solve this problem in a satisfactory way (meaning a way that we'd feel confident endorsing as part of the public API, not that it wouldn't be generally useful) without framework-level support. Happy to continue that conversation either here or on the PR!

axellpadilla Aug 9, 2024

Hi @OwenKephart, I think just adding a tag for assets whose code_version changed would provide a needed filter (assets that changed version, in the assets view) and would also make this easy to control in the asset factory. I think this tagging could also help with a much-needed filter for failed assets. I tried to find a way to check a boolean "code changed" flag in the factory to conditionally execute the CLI command, but couldn't. Is there any way to use this information without a framework change?

C0DK · 2024-07-03T05:48:01Z

C0DK
Jul 3, 2024

Quick note, the photos are inaccessible to me.

Other than that, it sounds super duper awesome! The composability seems to extend further and be in the spirit of all the greatness Dagster is already known for.

I am looking forward to playing around with these changes - and seeing if it solves some of the pains we've had with AutoMaterialization + Freshness (especially regarding partitioned assets).

I really love the ability to create multiple sensors for different subsets of assets, minimizing the reach of potential bugs.

1 reply

C0DK Jul 3, 2024

Images now work

Daniel-Vetter-Coverwhale · 2024-07-03T14:57:15Z

Daniel-Vetter-Coverwhale
Jul 3, 2024

I would really love it if in the new system an automaterialize sensor was created for each asset, or we at least got the ability to turn off auto-materialization for individual assets/asset chains. There are a few feature requests/issues around this that dive deeper into it - #22073, #15504, #18133

Setting those sensors up separately is a bit of a hassle, and I would think has problems with asset DAGs that cross code locations. If instead each asset can have auto-materialization turned on or off at the asset level, perhaps with a toggle in the automation tab on the definition, then you can avoid those concerns I believe.

5 replies

OwenKephart Jul 3, 2024
Maintainer Author

Hi @Daniel-Vetter-Coverwhale, thanks for that feedback -- this is definitely one of the longer-standing requests for this feature. The main issues with setting up a separate sensor per-asset are:

It, well, creates a lot of sensors which can be somewhat hard to manage in a sensible way via the UI
It's not possible for larger runs to be strung together, i.e. if you have A -> B -> C and they're all eager, you'd ideally like all of these assets to get kicked off in a single run just for comprehensibility. However if these assets all operate within their own independent sensors, those sensors each can only launch runs targeting their specific assets.

However, we've definitely considered the ability to turn off evaluation of specific assets within a sensor. The main issue here is just the complexity of having multiple different interacting levers that can impact if an individual asset is getting evaluated or not. At the end of the day, this is more of a UX problem than a fundamental technical limitation, and we've done some exploration on how to present this in a sensible way but have no hard plan at the moment.

You mention turning off automation for asset chains in addition to individual assets. This seems reasonable at a glance but can you elaborate a bit more on the types of situations you see yourself doing that, and how you would want to interact with the UI in order to achieve that result?

Daniel-Vetter-Coverwhale Jul 3, 2024

For the sensor management if you just make them special sensors and then only have those special sensors show up on the asset definition page (and not in the sensor page) in the automation tab that doesn't seem overwhelming to me. It seems from this RFC that they are already going to be a special kind of sensor, so that seems feasible to me.

I don't know that I would personally. I kind of like my assets to be run in separate jobs, or at least I don't have any particular attachment to them running in the same job especially as I use the k8s job executor by default in most of my code locations. I really like thinking about assets individually, and letting dagster and auto-materialization policies take care of the rest. I understand that there could definitely be cost-savings involved if the assets are launched in the same run in other execution environments though.
If it's for viewing purposes, dagster knows which assets are downstream and presumably why they were kicked off based on these auto-materialization policies, so the UI could group together runs that were chained within X amount of time, or were kicked off by whatever event. Something like I start from this run then follow it through to the next run that was kicked off by the materialization or observation of assets in this run. I think this sort of thing would be nice in general, though I know that the UX for it might be difficult.

So I think I wasn't very clear about the chains thing. I think one of the nicest things about turning off individual assets is that it automatically breaks a chain (or cuts a chain as per this comment - #15504 (comment)). So if asset A is messed up, I can turn it off and not worry that it's tainting downstream assets. The asset dependencies are not known a priori, or even in full by me or anyone else, but Dagster already has that information.

OwenKephart Jul 5, 2024
Maintainer Author

I think that all seems reasonable. I think we'd need the AutomationSensors to show up on the "Sensors" page (otherwise it might be quite hard to track them all down), but having an embedded toggle to turn that specific asset's sensor off within the Asset Details page makes a lot of sense to me, and something we can look into as we make some changes to that page.

We have also done some thinking on an "Asset Timeline" view, which would ideally be structured to give a high level overview of all asset executions in a way that would make it feasible to understand dependencies (regardless of Declarative Automation's involvement or not).

The bit about chains makes sense! So you'd still be toggling a single asset at a time, no need for a fancier toggling interface.

Thanks again for your feedback!

CSRessel Aug 7, 2024

I appreciate your reasoning on the complexity of this, and the value of the run batching (as well as performance) is the reason why I also would not want an individual sensor per-asset. It is definitely valuable to me to have the automation group many downstreams into one run by default.

However, I definitely want to add another voice that being able to disable automation on a per-asset basis would be a valuable feature!

OwenKephart Aug 7, 2024
Maintainer Author

Heard! This feature is definitely still on the table

erinov1 · 2024-07-09T18:31:13Z

erinov1
Jul 9, 2024

Are there any plans to revisit support for irregular non-cron schedules + time partitioning (that are known in advance) in light of this? Like on a list of business hour datetimes provided in a file or emitted by some function.

1 reply

OwenKephart Jul 10, 2024
Maintainer Author

Hi @erinov1! Our intention is for those sorts of use cases (which require custom code) to be handled by user-defined conditions. I touch on it briefly in the RFC, but we want to settle on an API for users to define conditions that invoke arbitrary code. These would be of the same type as something like AutomationCondition.missing(), and would compose with the other conditions in the same way.

mudravrik · 2024-07-23T15:08:27Z

mudravrik
Jul 23, 2024

Great changes!

I might have missed it, but do you have any plans for customizing the list of parents monitored by conditions?
My use case for this doesn't sound so rare, but maybe we are doing something wrong in the overall design.
Anyway, the use case looks like this:

We have a stable "dimension" asset A, which rarely gets updated
We have two "facts" assets, B and C, with daily updates
We want to apply an AM policy to asset D, which directly depends on all three: A, B, and C

Obvious "schedule" for D is to wait for both B and C to be updated, but do not wait for A, since it won't be updated anytime soon.
Right now, we can go two ways:

apply plain eager() and effectively materialize D twice, once after B and once after C. It works, but is obviously not ideal when D involves heavy computation.
apply eager().with_rules(AutoMaterializeRule.skip_on_not_all_parents_updated()), which will never run D, since A is never updated. Not a good way, actually :)

Maybe anything in User-Defined Conditions will help us? :)

4 replies

OwenKephart Jul 23, 2024
Maintainer Author

Hi @mudravrik -- the AutomationCondition is able to handle this, although you need to rebuild things up a bit "from scratch" (providing some more mid-level abstractions is something on our radar). If we look at the definition of on_cron(), it's composed of a number of simpler conditions: https://sourcegraph.com/github.com/dagster-io/dagster/-/blob/python_modules/dagster/dagster/_core/definitions/declarative_automation/automation_condition.py?L321

The basic function of this condition is to materialize an asset after all of its parents have been updated (since a given cron tick). So setting the cron_schedule to "@daily" almost does what you want, but has the issue that it will be stuck waiting for an update to A forever. The way to make it ignore A is the .ignore() method on all_deps_match. To do this, we can just copy-paste the existing definition of on_cron and make our own modified version:

from dagster import AssetSelection, AutomationCondition


# cron_schedule and cron_timezone are taken as parameters here so that the
# copied body of on_cron() is self-contained; the defaults are illustrative
def my_automation_condition(
    cron_schedule: str = "@daily", cron_timezone: str = "UTC"
) -> AutomationCondition:
    cron_label = f"'{cron_schedule}' ({cron_timezone})"
    cron_tick_passed = AutomationCondition.cron_tick_passed(
        cron_schedule, cron_timezone
    ).with_label(f"tick of {cron_label} passed")
    all_deps_updated_since_cron = AutomationCondition.all_deps_match(
        AutomationCondition.newly_updated().since(cron_tick_passed)
        | AutomationCondition.will_be_requested()
    ).ignore(
        AssetSelection.keys("A"),  # MODIFIED THIS LINE
    ).with_label(f"all parents updated since {cron_label}")
    return (
        AutomationCondition.in_latest_time_window()
        & cron_tick_passed.since(AutomationCondition.newly_requested())
        & all_deps_updated_since_cron
    ).with_label(f"on cron {cron_label}")

mudravrik Jul 24, 2024

Thank you for the response! I should read the on_cron documentation more carefully :)

Your solution looks great. However, can we use AutomationCondition.all_deps_match().ignore() outside of the on_cron context?

I mean, in our real case "daily" is more of a human-readable description; technically the pipeline is triggered by a sensor watching an external source that updates roughly once a day.
In this case, having a separate cron-like schedule declaration seems a bit strange, and it may be bug-prone if the upstreams are triggered twice a day by some external change.

OwenKephart Jul 26, 2024
Maintainer Author

@mudravrik What I'd say is that removing the cron_tick_passed concept means you'll need to have some concept to replace it, which is easier said than done.

Essentially, you want some policy that says "wait until all of my parents have materialized since X", for some reference point X, to avoid a situation where you're materializing every time any parent updates. In the cron policy above, X is a specific cron tick, which is nice because it sets a specific cadence to the expected updates.

One seemingly-promising alternative is to replace X with "the last time I was updated", i.e. "wait until all my parents have materialized since the last time I was updated". While this sounds natural and desirable, it has a pretty serious issue in practice: it's very easy for an asset with this policy to become "desynchronized" with its upstreams.

i.e. imagine you have an asset C with parents A and B.

Day 1:
- A materializes, but because B hasn't materialized, C waits -> good
- now B materializes, C kicks off -> also good
Day 2:
- B materializes, but because A hasn't materialized, C waits -> good
- A attempts to materialize, but the run fails, so C is still waiting -> good
Day 3:
- A materializes successfully, so now both B and A have materialized more recently than C, C kicks off -> this is bad! it's running off of a combination of very recent data for A and yesterday's data for B
- B materializes successfully, but now C has been requested more recently than A, so it's waiting -> also bad, we're now stuck in an off cycle where C will need to wait for tomorrow's data from A to kick off.

Including the cron schedule introduces a simple defense from entering these sorts of cycles, and so even if your data isn't strictly-speaking cron-based, it can still be useful to use the on_cron AutomationCondition. If you want to handle cases where data could come in multiple times per day, you could dial the frequency of the cron schedule to once every 12 hours -- if there's a 12 hour period that doesn't have any updates from the parents, that's ok (no run will be kicked off), but if there is new data then you will still be able to respond to it.
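
The three-day scenario above can be replayed with a small pure-Python simulation. This is only a sketch of the reasoning, not Dagster's evaluation logic: the hour values, the `simulate` function, and the tie-breaking (a parent counts as fresh only if it materialized strictly after the reference point) are all assumptions made for illustration.

```python
DAY = 24  # hours

# (hour, asset, succeeded) for the three days described above.
EVENTS = [
    (1, "A", True), (2, "B", True),      # day 1
    (25, "B", True), (26, "A", False),   # day 2: A's run fails
    (49, "A", True), (50, "B", True),    # day 3
]
PARENTS = ("A", "B")


def simulate(use_cron):
    """Return [(hour C ran, {parent: hour of the data C read})]."""
    last = {}    # latest successful materialization per parent
    c_last = -1  # hour of C's latest run
    runs = []
    for hour, asset, ok in EVENTS:
        if not ok:
            continue
        last[asset] = hour
        # Reference point: latest daily tick, or C's own latest run.
        since = (hour // DAY) * DAY if use_cron else c_last
        all_fresh = all(last.get(p, -1) > since for p in PARENTS)
        if all_fresh and c_last <= since:
            c_last = hour
            runs.append((hour, dict(last)))
    return runs


print(simulate(use_cron=False))
# [(2, {'A': 1, 'B': 2}), (49, {'A': 49, 'B': 25})]
print(simulate(use_cron=True))
# [(2, {'A': 1, 'B': 2}), (50, {'A': 49, 'B': 50})]
```

With "since my last update" as the reference point, C's second run mixes day 3 data from A with day 2 data from B. With the daily tick as the reference point, C waits until both parents have day 3 data.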

axellpadilla Oct 26, 2024

Hi, amazing that we can use AssetSelections. This exact scenario could be simplified by adding a tag to all "just daily" assets and ignoring all of them for the hourly condition alone. Tags are a very powerful way to filter and control assets in the UI either way. Having said that, it would be more integrated if we could ignore or include assets with (or without; in this example, ignore all assets without this same automation) a specific automation condition, maybe just by adding this new selector to the asset selection framework.

Using tags and groups is a powerful way to control the automation without specifying a list of assets and I think this should be officially documented.

geoHeil · 2024-07-24T06:18:20Z

geoHeil
Jul 24, 2024

Also, please do not forget the unit testing story. I have seen that some first steps toward unit testing the old API were made in #22292. Given that:

branch deploys do not have sensors and also no scheduling policies firing
having a better testing story
having a better E2E integration validation for how the assets (at enterprise scale, ~ 100k assets) might be firing and updating each other ideally from a simple environment i.e. branching and main - and nothing in between

could all be dramatically simplified by this new API, ideally these aspects can be considered there as well.

0 replies

geoHeil · 2024-07-25T10:14:16Z

geoHeil
Jul 25, 2024

It would be very neat if this new API became blocking for asset checks, as outlined in #22427

1 reply

OwenKephart Jul 26, 2024
Maintainer Author

Totally agreed -- this work is somewhat tied in with the broader goal of making asset checks themselves possible to automate using policies (i.e. you can set a check to run every hour, or when its parent materializes, etc.)

geoHeil · 2024-07-25T10:58:31Z

geoHeil
Jul 25, 2024

It would be quite neat if there were a default condition that allows updating based on the next business day (with a pluggable calendar).

2 replies

OwenKephart Jul 26, 2024
Maintainer Author

Are you imagining this being for partitioned assets specifically? i.e. some sort of lag such that the latest partition gets materialized X amount of time after it pops into existence?

Definitely seems like reasonable behavior. The plan is to build up some "middle layer" between fully-built in things like AutomationCondition.eager() and very close-to-the-metal things like AutomationCondition.in_latest_time_window() to make it simple to create policies like you're describing without having to do things purely "from scratch" (and without us creating a super wide range of built-in complicated policies)

geoHeil Jul 27, 2024

yes

CSRessel · 2024-08-07T18:20:10Z

CSRessel
Aug 7, 2024

Just because I didn't see it explicitly mentioned, the new declarative automation will still not work between different code locations? Assuming so because I see the behavior of the default daemon is still the same, with one AutomationSensor by default per code location, but want to confirm

Overall, definitely liking the design of the new approach to automation! Excited for the arrival of this

2 replies

OwenKephart Aug 7, 2024
Maintainer Author

Hi @CSRessel! Can you say a bit more about the expected cross-code-location behavior you're looking for?

Currently, if you have asset A in CodeLocation1, and asset B in CodeLocation2 (which depends on A, and has an AutomationCondition.eager()), materializations of A will cause B to materialize, which satisfies some definition of "working between different code locations" -- is there something else you're looking for here?

CSRessel Oct 9, 2024

Thanks, that was exactly what I was looking for 👍

matthias-Q · 2024-08-15T13:50:56Z

matthias-Q
Aug 15, 2024

I would appreciate an automation_condition that allows me to enable/disable automation for a certain stage (think development, integration, production).

In previous versions of Dagster (using schedules + jobs to trigger assets), we set the schedule state depending on the stage.

1 reply

OwenKephart Aug 15, 2024
Maintainer Author

Hi @matthias-Q ! I think AutomationConditionSensorDefinition may be what you want here, i.e.:

from dagster import AssetSelection, AutomationConditionSensorDefinition, Definitions

defs = Definitions(
    sensors=[
        AutomationConditionSensorDefinition("development_conditions", asset_selection=AssetSelection.groups("development")),
        AutomationConditionSensorDefinition("integration_conditions", asset_selection=AssetSelection.groups("integration")),
        ...
    ]
)

Each of these sensors would be evaluated independently and can be turned on or off independently

axellpadilla · 2024-10-30T20:20:57Z

axellpadilla
Oct 30, 2024

Hi, I understand that the since part only works if the since condition was detected on a previous tick, and I partially understand how to build the automation, but I can't clearly translate my need into code.
I wanted to know how I can declare something like this:
"(Update if all deps updated since a cron that is not part of a tick or if doesn't have deps) and cron passed but asset is still not updated since that cron", "do that after all in progress deps and if not already in progress or if the deps will be updated alongside the asset".
That part is a little difficult to understand, but it is the main basis, for example, for standardizing daily updates with the same automation whether it is applied to the root asset or the last asset.

The main idea is that it should still update after, for example, a downtime: even if a cron tick evaluation was missed, if the asset or all daily assets weren't updated, it should request a run to do so.

2 replies

axellpadilla Nov 7, 2024

Hi OwenKephart

This is closer to a standard daily automation that can be used on all assets, but it is missing the logic to handle assets that have no dependencies, because it looks like root assets don't execute:

def my_daily_automation_condition() -> AutomationCondition:
    # get_for_current_env, default_timezone_teg and tags_repo are not Dagster
    # APIs; they come from the poster's own codebase.
    cron_schedule = get_for_current_env({"dev": "0 1 * * *", "prd": "0 0 * * *"})
    cron_timezone = default_timezone_teg
    cron_label = f"'{cron_schedule}' ({cron_timezone})"
    cron_tick_passed_since_last_handle = (
        AutomationCondition.cron_tick_passed(cron_schedule, cron_timezone).since_last_handled()
        | AutomationCondition.missing()
    ).with_label(f"tick of {cron_label} passed")
    all_deps_updated_since_cron = (
        AutomationCondition.all_deps_match(
            AutomationCondition.newly_updated().since(
                AutomationCondition.cron_tick_passed(cron_schedule, cron_timezone)
            )
            | AutomationCondition.will_be_requested()
        )
        .ignore(
            selection=(
                AssetSelection.all()
                - AssetSelection.tag(key=tags_repo.Daily.key, value=tags_repo.Daily.value)
            )
        )
        .with_label(f"all same period parents updated since {cron_label}")
    )
    return (
        AutomationCondition.in_latest_time_window()
        & cron_tick_passed_since_last_handle
        & all_deps_updated_since_cron  # (all_deps_updated_since_cron | ~HasDependencies()) #??????????
        & ~AutomationCondition.in_progress()
        & ~AutomationCondition.any_deps_in_progress()
    ).with_label(f"on cron {cron_label}")

This custom condition, which is missing from the core automations, doesn't work; it even stops the evaluation without any warnings or errors:

import dagster as dg

class HasDependencies(dg.AutomationCondition):
    @property
    def name(self) -> str:
        return "has_dependencies"
    
    def evaluate(self, context: dg.AutomationContext) -> dg.AutomationResult:
        asset_graph = context.asset_graph
        key = context.key
        dep_keys = asset_graph.get(key).parent_entity_keys
        if dep_keys:
            true_subset = context.candidate_subset
        else:
            true_subset = context.get_empty_subset()
        return dg.AutomationResult(true_subset=true_subset, context=context)

axellpadilla Nov 7, 2024

Follow-up: the main problem was that my sensor with use_user_code_server=True wasn't actually added to the definitions (it had the same name, so I didn't realize it). The final standardized automation is this (I recommend including IsRootExecutable as a baseline automation):

Edit: Changed on_missing to newly_missing since last handled, because it cancels the root executable with a check to deps missing and also cancels the time window lookback for partitions.

import dagster as dg

class IsRootExecutable(dg.AutomationCondition):

    @property
    def name(self) -> str:
        return "is_root_executable"
    
    def evaluate(self, context: dg.AutomationContext) -> dg.AutomationResult:
        root_keys = context.asset_graph.root_executable_asset_keys
        key = context.key
        if key in root_keys:
            true_subset = context.candidate_subset
        else:
            true_subset = context.get_empty_subset()
        return dg.AutomationResult(true_subset=true_subset, context=context)

and the standardized cron automation:

from datetime import timedelta

from dagster import AssetSelection, AutomationCondition

def my_cron_automation_condition(
    cron_schedule: str,
    ignored_deps_updated_selection: AssetSelection | None = None,
    lookback_delta: timedelta | None = None,
) -> AutomationCondition:
    """
    Returns an automation condition that checks if the cron schedule has passed and all dependencies are updated.

    The condition is met if:
    - The cron schedule has passed since the last handle.
    - The asset is not in progress and none of its dependencies are in progress.
    - All non ignored dependencies are updated since the last cron tick or will be requested.
    - The asset is a root executable or has updated dependencies.

    Args:
        cron_schedule (str): The cron schedule to use for the automation condition.
        ignored_deps_updated_selection (AssetSelection | None, optional): The dependencies to ignore. Defaults to None.
        lookback_delta (timedelta | None, optional): The time window to look back for updates. Defaults to None.

    Returns:
        AutomationCondition: The automation condition based on the provided cron schedule.
    """
    cron_timezone = default_timezone_teg
    cron_schedule_label = f"'{cron_schedule}' ({cron_timezone})"
    cron_tick_passed_since_last_handle = (
        AutomationCondition.cron_tick_passed(cron_schedule, cron_timezone)
        .since_last_handled()
        .with_label(f"cron_tick_passed: {cron_schedule_label}")
        | AutomationCondition.newly_missing().since_last_handled()
    )
    deps_updated_since_cron = AutomationCondition.all_deps_match(
        AutomationCondition.newly_updated().since(
            AutomationCondition.cron_tick_passed(cron_schedule, cron_timezone)
        )
        | AutomationCondition.will_be_requested()
    )
    if ignored_deps_updated_selection:
        deps_updated_since_cron = deps_updated_since_cron.ignore(ignored_deps_updated_selection)
    return (
        AutomationCondition.in_latest_time_window(lookback_delta=lookback_delta)
        & cron_tick_passed_since_last_handle
        & ~AutomationCondition.in_progress()
        & ~AutomationCondition.any_deps_in_progress()
        & (
            IsRootExecutable()
            | deps_updated_since_cron.with_label(
                f"dependencies_updated_since: {cron_schedule_label}"
            )
        )
    ).with_label(f"cron_schedule_passed_and_complied: {cron_schedule_label}")

v1gnesh · 2024-11-09T11:29:34Z

v1gnesh
Nov 9, 2024

Hello,

Can you point to the authoritative source of the latest documentation with working code, including coverage of even experimental features? It's really pissing me off trying to piece together sample code from PRs, docs-preview.dagster.io and docs.dagster.io.
For example, just give us a single page explaining declarative automation covering what is possible today (1.9.1).
Obviously, this discussion is a design doc and not all of it may be implemented yet.

Similarly, if something has been superseded (like AutomationCondition superseding manually defined sensors & schedules), please mark sensors & schedules with something in the docs page to say that it has been superseded in the current release.
Or better yet, just remove it from the current release's doc.
Declarative Automation in Concepts -> Automation, in docs.dagster.io, even now has the experimental tag.

I don't understand how all of your users are coping with this multiple sources of partial truth situation...

6 replies

v1gnesh Nov 20, 2024

@garethbrickman could you please help clarify this example for me -
https://docs-preview.dagster.io/guides/partitioning#dynamic-partitions
https://docs.dagster.io/concepts/partitions-schedules-sensors/partitioning-assets#dynamically-partitioned-assets

Are a @sensor and a define_asset_job still required for working with dynamic partitions?
How does AutomationCondition relate to a sensor's minimum_interval_seconds?

garethbrickman Nov 20, 2024
Maintainer

Looking at the docs for DynamicPartitionsDefinition I don't think a sensor and asset job are strictly required as "Partitions can be added and removed using instance.add_dynamic_partitions and instance.delete_dynamic_partition methods." But they are the most common utilities to use for doing that.

I'm not sure about your second question.

v1gnesh Nov 21, 2024

"But they are the most common utilities to use for doing that."

You mean sensor & asset job, right?

"How does AutomationCondition relate to a sensor's minimum_interval_seconds?"

If sensor & asset job is the recommended way to work with dynamic partitions (despite there being the direct methods you refer to), and considering that AutomationCondition supersedes manually built schedules & jobs, can't AutoCon also supersede the asset job & sensor in this context (of dynamic partition build-up)?

garethbrickman Nov 21, 2024
Maintainer

I think a blocker to that for now is the necessity for the sensor to return a SensorResult with the dynamic_partitions_requests. I'm not sure if AutomationConditionSensorDefinition can handle that case yet. cc @OwenKephart

v1gnesh Nov 22, 2024

Yes, thank you. Will be good to have this baked into AutoCon as priorities permit.

nsteins · 2024-11-20T22:33:23Z

nsteins
Nov 20, 2024

I'm struggling to fully understand the interaction between AutomationCondition.newly_updated and AutomationCondition.cron_tick_passed. AutomationCondition.newly_updated claims to return true if a partition "has been updated since the previous tick", but it is not clear to me whether that refers to the cron tick or to the automation_sensor tick.

I have an asset with daily partitions, that depends on multiple assets that also have daily partitions. I would like to update the asset for all of the partitions that have been updated in any of the upstream assets. I am doing so with the materialization conditions:

fifteen_mins_passed = AutomationCondition.cron_tick_passed(
    "*/15 * * * *", cron_timezone="UTC"
)
any_deps_updated = AutomationCondition.any_deps_match(
    AutomationCondition.newly_updated(),
)
fifteens_min_passed_and_any_deps_updated = fifteen_mins_passed & any_deps_updated

@asset(
...
    automation_condition=fifteens_min_passed_and_any_deps_updated
)
...

This works for many conditions, however I am finding that it can fail if multiple partitions are updated for a dependency in the same 15 minute window. If I materialize the partition 2022-10-01 and then materialize 2024-11-20 soon after, when the sensor evaluates, it will only materialize the 2024-11-20 partition.

Can you provide any info on what might be causing this problem and how to achieve the conditions we're looking for?

2 replies

OwenKephart Nov 20, 2024
Maintainer Author

Hi @nsteins ! newly_updated is referring to the automation_sensor tick, not the cron tick. So the condition you have essentially will only fire if an upstream updates at around the same time as the cron tick passes.

To directly achieve "update the asset for all of the partitions that have been updated in any of the upstream assets", you can use AutomationCondition.eager(), but I'm guessing that you're involving cron here to sort of let upstream updates "bunch up", is that right? As in, you may have multiple upstreams for a given partition update within a 15 minute boundary, and you'd want to just launch a single run, rather than multiple individual ones?

One thing to note is that AutomationCondition.eager() does by default wait for in-progress upstreams to complete, which provides some protection for this scenario. Otherwise, you can use the following, as the eager condition stores information about what runs have been kicked off in response to upstream updates:

AutomationCondition.eager().without(
    # you need this bit to get the eager condition to target older time partitions
    AutomationCondition.in_latest_time_window(),
) & AutomationCondition.cron_tick_passed(...)

nsteins Nov 20, 2024

Thanks for the response @OwenKephart

"I'm guessing that you're involving cron here to sortof let upstream updates "bunch up", is that right? As in, you may have multiple upstreams for a given partition update within a 15 minute boundary, and you'd want to just launch a single run, rather than multiple individual ones?"

Yes, that is correct

Eager did not work for me, I think because it had additional conditions that I do not wish to require. I do not need all upstream assets to be materialized, and I worry that if I wait for no upstream assets to be in progress, then my sensor will never actually execute. I have a large number of upstream assets that are updating frequently, but they generally target the same 1-3 dates, so this asset allows us to generate metadata in a limited time range. However when we backfill older dates, we want to capture that as well.

Looking at the code for eager(), it appears to just be a composite of other conditions. I could use without but since I want to remove more than I keep I ended up with:

fifteen_mins_passed = AutomationCondition.cron_tick_passed(
    "*/15 * * * *", cron_timezone="UTC"
)
any_deps_updated = AutomationCondition.any_deps_updated().since_last_handled() & ~AutomationCondition.in_progress()
fifteens_min_passed_and_any_deps_updated = fifteen_mins_passed & any_deps_updated

So far, this appears to be working as desired, though I will need to test more scenarios. I believe the missing piece here was the since_last_handled condition, which appears to be looking at all updated partitions since the last time that partition was updated, which matches the conditions I am looking for.

I would definitely suggest adding documentation and examples of this function, or possibly make it a default behavior when interacting with cron conditions, as I think Dagster's paradigm for sensors + cron is not intuitive unless you are familiar with dagster's internals. My understanding of the issue is that while I may want this sensor to evaluate every 15 minutes, internally, the sensor is evaluating every minute, but returning False because of the cron condition. So newly_updated is only true for partitions materialized in the one minute window prior to when the cron condition evaluates. This is not the way I would assume based on previous experience with cron schedules, without a deeper knowledge of how dagster implements sensors.
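
The behavior described above can be illustrated with a toy pure-Python model of sensor ticks. It is a sketch under stated assumptions, not Dagster's implementation: one sensor tick per minute, a cron tick every 15 minutes, and a hypothetical `latch` flag standing in for the effect of since_last_handled (remember an update until it has been handled).

```python
# Sensor tick (minute) at which each upstream partition update is first seen.
UPDATES = {"2022-10-01": 3, "2024-11-20": 15}
CRON_EVERY = 15  # "*/15 * * * *"


def simulate(latch, minutes=30):
    """Return {tick: sorted partitions requested on that tick}."""
    pending = set()  # updates remembered since they were last handled
    requested = {}
    for tick in range(1, minutes + 1):
        newly_updated = {p for p, t in UPDATES.items() if t == tick}
        if latch:
            pending |= newly_updated    # stays true until handled
            candidates = pending
        else:
            candidates = newly_updated  # true for this tick only
        if tick % CRON_EVERY == 0 and candidates:
            requested[tick] = sorted(candidates)
            pending = set()             # handled: reset the latch
    return requested


print(simulate(latch=False))  # {15: ['2024-11-20']}
print(simulate(latch=True))   # {15: ['2022-10-01', '2024-11-20']}
```

Without the latch, the 2022-10-01 update is only visible on tick 3, when no cron tick has passed, so it is never requested. With the latch, both partitions are still pending when the cron tick arrives.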
