[CT-1732] [Feature] Add Native Persistent UDF Materialization #451
Comments
@anaghshineh Thanks for taking the initiative to open the issue, and the accompanying PR! I am very supportive of adding native support for UDFs within dbt, and doing it in a consistent way (as much as possible!) on all adapters for data warehouses that support persistent UDFs. It seems like you've done your research, and have a good feel for what some of the existing patterns and sentiments are in the wider community.

This feels even more important now that dbt supports Python (dbt-labs/dbt-core#5741). On some data platforms (e.g. Snowpark), UDFs are more than just an ergonomic way to call Python functions from SQL; they're also an important tool for achieving performant, parallelizable Python. On the flip side, this gets us over the traditional hesitation around UDFs: if it's not SQL (Python, JavaScript), it's probably gross & slow. And if it's just SQL, why not a macro, which dbt will clearly "compile" (template) to its "source code" representation? Nowadays, there are clearly good reasons to define and use Python UDFs, and we can give people the ability to do it in files with a […]

You've asked some specific questions, which are fair. I have an even bigger question down below.
If we're going to implement this by building it into "dbt Core" (…)

The way we initially "supported" temporary UDFs on […]

The big question

Should "UDF" be a type of model materialization, or a new node type in its own right? And is this any different for scalar versus tabular UDFs? (Some prior discussion in this older issue: #132.)

The upside of implementing this as a custom materialization is expediency: you were able to do it mostly with Jinja + SQL, and just in the adapter plugin, without having to modify dbt-core. I worry, though, that it breaks the mental model (!) of what a dbt model is: a query that returns a dataset, which can be compiled & previewed. (UDTFs are a bit closer to fitting the bill, if you ignore the fact that they can take arguments. They're almost like saved CTEs, or ephemeral models.) The appeal of separating "modeling" logic from "materialization" logic is that you can have the same basic SQL, and then just switch the materialization, e.g. from `view` to `table`.

The upside of implementing this as a totally new node type: no risk of confusion between what's a model and what's a function. Like seeds or snapshots, functions would still be "materialized" in the database, they could be referenced in other models/functions/etc, and all the same patterns around node selection would still apply.
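To make the custom-materialization trade-off concrete, here is a minimal sketch of what such a materialization could look like, entirely inside the adapter plugin. This is illustrative only and not necessarily what the accompanying PR does; everything UDF-specific is an assumption.

```sql
{% materialization udf, adapter='bigquery' %}
  {#- Hypothetical "udf" materialization: the model body ({{ sql }}) is
      assumed to be a complete CREATE OR REPLACE FUNCTION statement. -#}
  {%- set target_relation = this -%}

  {#- Run the model body as the main statement against the warehouse. -#}
  {%- call statement('main') -%}
    {{ sql }}
  {%- endcall -%}

  {#- Report the affected relation back to dbt, per the materialization contract. -#}
  {{ return({'relations': [target_relation]}) }}
{% endmaterialization %}
```

The surrounding protocol (the `statement('main')` call and the `relations` return value) follows dbt's documented custom-materialization contract; the rest is a sketch.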
That, in turn, looks like starting by writing some throwaway code, without any expectation or guarantee that it's going to be merged. Is continuing down this path, and doing some additional discovery work to figure out the answer to this question (model materialization vs. new node type), something you'd be interested in over the next weeks/months? If so, let's talk more about what an ongoing collaboration might look like! If not, and this is as far as you want to take it for now, I completely understand.

(cc @lostmygithubaccount @ChenyuLInx - remember when we spiked this last August? internal Notion link)
Hi! I have read through this issue and yesterday posted a question about it in the dbt Community Forum. What's the temporary workaround you recommend for calling persistent functions from a dbt model? Just hardcoding project_id and dataset_id? Thanks!
Question solved in the link!
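For anyone else landing here with the same question: until something native exists, one interim approach is to build the function path from dbt's `target` context variable instead of hardcoding IDs. A sketch, assuming the UDF lives in the target project and dataset and is named `cents_to_dollars` (the model, column, and function names are placeholders):

```sql
-- Hypothetical model calling a persistent UDF without hardcoded IDs;
-- assumes the function was created in the target project/dataset.
SELECT
  order_id,
  `{{ target.project }}.{{ target.dataset }}`.cents_to_dollars(amount_cents) AS amount_dollars
FROM {{ ref('stg_orders') }}
```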
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days. |
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers. |
This looks like an independent, experimental implementation in a real-world public project: https://tempered.works/posts/2024-02-19-udf-dbt-models/ (mentioned in the Slack channel #i-made-this)
Is this your first time submitting a feature request?
Describe the feature
There should be a native materialization for persistent UDFs in BigQuery. This materialization will enable users to create UDFs via `dbt run` and to track them appropriately within their DAGs and lineage graphs.

Previously, BigQuery only supported temporary UDFs. The community wrote several issues related to the handling of temporary BigQuery UDFs. For example:
Support BigQuery UDFs (and other ddl) by pulling them out of "create or replace" dbt-core#1879
support UDFs on BigQuery dbt-core#1112
However, BigQuery now supports persistent UDFs. We should have a native materialization for persistent BigQuery UDFs! This issue - dbt-labs/dbt-core#136 - was created a while ago for cross-database UDF handling. The issue at hand, however, specifically concerns persistent BigQuery UDFs.
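For concreteness, a persistent UDF on BigQuery is a single DDL statement; a minimal sketch (project, dataset, and function names are placeholders):

```sql
-- Persistent (not temporary) UDF: the qualified name pins it to a dataset.
CREATE OR REPLACE FUNCTION `my-project.my_dataset.cents_to_dollars`(cents INT64)
RETURNS FLOAT64
AS (
  ROUND(cents / 100, 2)
);
```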
Describe alternatives you've considered
A few popular alternatives discussed throughout the community:

- Define UDF models in the `models` directory, with the `CREATE OR REPLACE` statement in the SQL. Use `dbt` to compile the model SQL and reference the UDF models via `ref`. This way, UDF models show up in the lineage graph. However, UDF creation is handled outside of `dbt run`.
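The alternative described above can be sketched as a hypothetical model file (names are illustrative, and the custom materialization or pre-hook needed to actually run this DDL is out of scope here):

```sql
-- models/udfs/cents_to_dollars.sql (hypothetical UDF "model")
-- The target context variable avoids hardcoding project/dataset IDs.
CREATE OR REPLACE FUNCTION `{{ target.project }}.{{ target.dataset }}`.cents_to_dollars(cents INT64)
RETURNS FLOAT64
AS (
  ROUND(cents / 100, 2)
);
```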
Who will this benefit?
`dbt` users who leverage persistent BigQuery UDFs and want to track them in their DAGs and lineage graphs. This is particularly useful for those who use UDFs to define reusable, core business logic.

Are you interested in contributing this feature?
Yes! I've started working on this. Interested to see what people think.
Anything else?
Some questions: