[Plugin]: Add example of per model dbt execution #3135

brandon-segal · 2022-12-06T15:50:34Z

Provide a documented example of how to do substantial orchestration of a dbt task. I was interested to see if I could replicate the logic in this proposed Airflow DAG setup that orchestrates a dbt project. In the workflow they do the following:

Runs dbt compile to create a fresh copy of manifest.json (Where dependencies are stored)
Reads the model selectors defined in the YAML file
Uses the dbt ls command to list all of the models associated with each model selector in the YAML file
Turns the dbt DAG from manifest.json into a Graph object with the networkx library
Uses the methods available on the Graph object to figure out the correct set of dependencies for each group of models defined in the YAML file
Writes the dependencies for each group of models (stored as a list of tuples) to file
Creates an Airflow DAG for each group of models based on the given dependencies
Registers each DAG with their orchestrator
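As a rough illustration of the dependency step in the list above, here is a minimal sketch that works from the `parent_map` section of `manifest.json`. It uses plain dicts instead of networkx, and the function names, the example model IDs, and the choice to walk through unselected models are all assumptions for illustration, not what the linked Airflow setup does.

```python
import json


def load_parent_map(manifest_path="target/manifest.json"):
    """Read the node -> parents mapping that dbt writes into manifest.json."""
    with open(manifest_path) as fh:
        return json.load(fh)["parent_map"]


def group_dependencies(parent_map, group):
    """Return sorted (upstream, downstream) pairs between models in `group`.

    Unselected models are walked through, so if a -> b -> c and only a and c
    are selected, the pair (a, c) is still reported.
    """
    selected = set(group)
    edges = set()
    for model in selected:
        stack = list(parent_map.get(model, []))
        seen = set()
        while stack:
            parent = stack.pop()
            if parent in seen:
                continue
            seen.add(parent)
            if parent in selected:
                # Nearest selected ancestor: record the edge and stop climbing.
                edges.add((parent, model))
            else:
                stack.extend(parent_map.get(parent, []))
    return sorted(edges)


# Hypothetical parent_map, shaped like the one in manifest.json.
example_parent_map = {
    "model.demo.stg_orders": [],
    "model.demo.int_orders": ["model.demo.stg_orders"],
    "model.demo.fct_orders": ["model.demo.int_orders"],
    "model.demo.fct_customers": ["model.demo.stg_orders"],
}
example_edges = group_dependencies(
    example_parent_map,
    ["model.demo.stg_orders", "model.demo.fct_orders", "model.demo.fct_customers"],
)
```

The resulting list of tuples is the kind of structure that could be written to file and later turned into one workflow per model group.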

Is this kind of logic possible using something like @dynamic workflows to generate the tasks during execution, or through a script that generates them prior to registration?


timle2 · 2023-03-08T16:41:54Z

I've written a blog post on this topic; here's the section relevant to this discussion. This is something I've been interested in doing for a long time, and we set up a demo of this (which is the write-up) over the summer using ContainerTasks. I think this can easily be converted to DBTRun tasks as well, though.

Perhaps the dbt DAG export could be explicitly written as a dbt plugin task. But I've also submitted a feature request to make 'execution plan exports' a standard feature in dbt here. Otherwise, creating the DAG in Flyte can be easily accomplished with ImperativeWorkflows.

Here's a summary of the steps to get the DAG:

Generate manifest, and graph (see dbt.graph; it uses networkx.classes.digraph.DiGraph)
Parse selectors, get the selection spec, apply it to the manifest to get the affected nodes, then apply it to the graph to keep just the selected nodes
Create a GraphQueue (dbt.graph.queue) from the graph
Iterate across the queue (it's ordered by score, based on topological sort) to get the run order. For each step, pull its parents, if any.
(dbt 1.5 is doing major rewrites to the interface, so maybe this gets easier; I've also submitted a feature request to make 'execution plan exports' a standard feature in dbt: [CT-2272] [Feature] Export/list DAG execution plan, dbt-labs/dbt-core#7137.)
In this way, we get an execution plan that mirrors the run process of dbt with maximum fidelity.
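To make the shape of such an execution plan concrete, here is a minimal sketch that derives a run order plus parents for each step. It is a stand-in built on the standard library's `graphlib.TopologicalSorter`, not dbt's `GraphQueue`, and the function name, the `(model, parents)` tuple layout, and the toy `parent_map` are assumptions for illustration.

```python
from graphlib import TopologicalSorter


def execution_plan(parent_map, selected):
    """Return [(model, [parents...]), ...] in an order where parents come first.

    Only edges whose two ends are both selected are kept.
    """
    selected = set(selected)
    graph = {
        node: {p for p in parent_map.get(node, []) if p in selected}
        for node in selected
    }
    sorter = TopologicalSorter(graph)  # maps each node to its predecessors
    sorter.prepare()
    plan = []
    while sorter.is_active():
        # Everything returned here has all of its parents already done.
        ready = sorted(sorter.get_ready())
        for node in ready:
            plan.append((node, sorted(graph[node])))
        sorter.done(*ready)
    return plan


# Hypothetical four-model project: d depends on b and c, which depend on a.
toy_parent_map = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
toy_plan = execution_plan(toy_parent_map, ["a", "b", "c", "d"])
```

Each batch returned by `get_ready()` contains models that could run in parallel, which is roughly the information an orchestrator needs to build its own DAG.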

This execution plan can then be built in Flyte with ImperativeWorkflows. In my article I've done that with ContainerTasks, but I think it's simple to convert over to DBTRun tasks if someone wanted!
Here's a POC for how that could start to look with DBTRun tasks:

from flytekit.core.workflow import ImperativeWorkflow
from flytekitplugins.dbt.schema import DBTRunInput
from flytekitplugins.dbt.task import DBTRun

# Local paths to the demo dbt project and its profiles.yml
DBT_PROJECT_DIR = "/Users/timothyl/git/tim-flyte-test/dbt_demo_project"
DBT_PROFILES_DIR = "/Users/timothyl/git/tim-flyte-test/dbt_demo_project"
DBT_PROFILE = "bq-oauth"

# Both tasks share one input here; a per-model plan would vary `select`
input_ = DBTRunInput(
    project_dir=DBT_PROJECT_DIR,
    profiles_dir=DBT_PROFILES_DIR,
    profile=DBT_PROFILE,
    select=["tag:something"],
)

task_1 = DBTRun(name="test-task")
task_2 = DBTRun(name="test-task2")

wb3 = ImperativeWorkflow(name="imperative_dbt_demo")

# add_entity returns the workflow node created for each task
task1_task_id = wb3.add_entity(task_1, input=input_)
task2_task_id = wb3.add_entity(task_2, input=input_)

# Ordering constraint: test-task2 runs before test-task
task2_task_id.runs_before(task1_task_id)
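Scaling the POC above from two hand-wired tasks to a whole plan could look roughly like the sketch below. It assumes a plan given as `(model, parents)` pairs listed parents-first, and a `make_node` callback that returns an object with a `runs_before` method. Both `wire_plan` and `RecordingNode` are hypothetical helpers; in Flyte, `make_node` might wrap something like `wb3.add_entity(DBTRun(name=...), input=DBTRunInput(..., select=[model]))`, which is untested here.

```python
def wire_plan(plan, make_node):
    """Create one node per model and chain each parent before its child.

    `plan` must list parents before children so every parent node already
    exists when its child is wired.
    """
    nodes = {}
    for model, parents in plan:
        nodes[model] = make_node(model)
        for parent in parents:
            nodes[parent].runs_before(nodes[model])
    return nodes


class RecordingNode:
    """Stand-in for a workflow node that only records ordering constraints."""

    def __init__(self, name):
        self.name = name
        self.downstream = []

    def runs_before(self, other):
        self.downstream.append(other.name)


# Hypothetical plan: stg_orders feeds two downstream models.
demo_plan = [
    ("stg_orders", []),
    ("fct_orders", ["stg_orders"]),
    ("fct_customers", ["stg_orders"]),
]
demo_nodes = wire_plan(demo_plan, RecordingNode)
```

Keeping the wiring logic independent of Flyte makes it easy to check the ordering with a fake node before swapping in real workflow nodes.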

github-actions · 2023-12-04T00:06:48Z

Hello 👋, this issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will engage on it to decide if it is still applicable.
Thank you for your contribution and understanding! 🙏

someshfengde · 2024-10-01T09:29:11Z

@davidmirror-ops can you assign this issue to me?

davidmirror-ops · 2024-10-01T10:47:39Z

@someshfengde yes! Please let us know soon if you have questions.

someshfengde · 2024-10-01T10:55:40Z

yes sure thanks for assigning :)

brandon-segal changed the title ~~[Plugin]:~~ [Plugin]: Add example of per model dbt execution Dec 6, 2022

cosmicBboy added the documentation and plugins labels Dec 19, 2022

github-actions bot added the stale label Dec 4, 2023

davidmirror-ops added the hacktoberfest label Sep 27, 2024

davidmirror-ops mentioned this issue Sep 28, 2024 in "Flyte Hacktoberfest 2024: issues and guidelines" #5783 (closed, 56 tasks)

davidmirror-ops assigned someshfengde Oct 1, 2024

davidmirror-ops removed the stale label Oct 1, 2024
