
Persistent UDF Materialization #454

Closed
wants to merge 8 commits into from


anaghshineh

@anaghshineh anaghshineh commented Jan 1, 2023

resolves #451

Description

Adds a udf materialization to support BigQuery persistent SQL UDFs. This materialization lets users declare SQL UDFs as models and manage them with dbt commands such as dbt ls, dbt compile, and dbt run.

The new materialization takes two optional configuration arguments: args and return_type. args is an array of dictionaries, where each dictionary represents a single UDF argument and has two keys: name, the name of the argument, and type, the type of the argument (e.g., INT64 or STRING). An array is used to preserve argument order: arguments are declared on the UDF in the same order they are listed in the array. return_type specifies the type of the value the UDF returns (e.g., STRING or STRUCT<domain STRING, path STRING>).
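To illustrate how these two configs could map onto BigQuery DDL, here is a minimal sketch, assuming the materialization ultimately issues a CREATE OR REPLACE FUNCTION statement. The helper render_create_function and the example project, dataset, and function names are hypothetical and are not taken from this PR's macros; the sketch only shows why a list (rather than a dict) keeps the declared signature in config order.

```python
from typing import Dict, List, Optional


def render_create_function(
    relation: str,
    body: str,
    args: Optional[List[Dict[str, str]]] = None,
    return_type: Optional[str] = None,
) -> str:
    """Render a BigQuery CREATE OR REPLACE FUNCTION statement for a SQL UDF.

    Hypothetical helper: `args` mirrors the proposed `args` config (a list of
    {"name": ..., "type": ...} dicts) and `return_type` mirrors `return_type`.
    """
    # Walk the list in order, so the declared signature follows the config order.
    signature = ", ".join(f"{arg['name']} {arg['type']}" for arg in (args or []))
    # RETURNS is optional for BigQuery SQL UDFs; omit it when no type is configured.
    returns = f" RETURNS {return_type}" if return_type else ""
    return f"CREATE OR REPLACE FUNCTION {relation}({signature}){returns} AS (\n  {body}\n)"


# Hypothetical UDF: add a tax rate to an amount.
ddl = render_create_function(
    "`my-project.my_dataset.add_tax`",
    body="amount * (1 + rate)",
    args=[{"name": "amount", "type": "FLOAT64"}, {"name": "rate", "type": "FLOAT64"}],
    return_type="FLOAT64",
)
print(ddl)
```

Because both configs are optional, calling the helper with only a relation and a body yields a zero-argument function whose return type BigQuery infers from the SQL body.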

Checklist

@cla-bot

cla-bot bot commented Jan 1, 2023

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above. If you have questions about the CLA, or if you believe you've received this message in error, don't hesitate to ping @drewbanin.

CLA has not been signed by users: @anaghshineh

@anaghshineh anaghshineh changed the title "Udf materialization" to "Persistent UDF Materialization" Jan 1, 2023
@cla-bot cla-bot bot added the cla:yes label Jan 3, 2023
@anaghshineh anaghshineh marked this pull request as ready for review January 4, 2023 00:15
@dataders dataders requested review from dbeatty10, a team and Fleid January 4, 2023 16:20
Comment on lines +595 to +602
```python
def get_bq_routine(self, database: str, schema: str, identifier: str) -> google.cloud.bigquery.Routine:
    """Get a BigQuery routine (UDF) for a schema/model."""
    conn = self.get_thread_connection()
    # backwards compatibility: fill in with defaults if not specified
    database = database or conn.credentials.database
    schema = schema or conn.credentials.schema
    routine_ref = self.routine_ref(database, schema, identifier)
    return conn.handle.get_routine(routine_ref)
```
Contributor

Any reason you can't use info_schema.routines? If we convert these Python functions to SQL macro equivalents, we could publish this materialization as a dbt package (at least for a trial period).
https://cloud.google.com/bigquery/docs/information-schema-routines
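To make this suggestion concrete, here is a minimal sketch, assuming the existence check queries the dataset-level INFORMATION_SCHEMA.ROUTINES view linked above instead of calling get_routine through the Python client. The helper name routines_lookup_sql and the example project and dataset names are hypothetical; a dbt package would express the same query as a Jinja SQL macro, and Python is used here only to show the shape of the query.

```python
def routines_lookup_sql(project: str, dataset: str) -> str:
    """Build a query against BigQuery's dataset-level INFORMATION_SCHEMA.ROUTINES view.

    The routine name is left as the named parameter @routine_name rather than
    interpolated into the string, so the caller binds it when running the query.
    """
    return (
        "SELECT routine_name, routine_type, data_type, ddl\n"
        f"FROM `{project}`.{dataset}.INFORMATION_SCHEMA.ROUTINES\n"
        "WHERE routine_name = @routine_name"
    )


sql = routines_lookup_sql("my-project", "my_dataset")
print(sql)
```

With the google-cloud-bigquery client, @routine_name could be bound through a ScalarQueryParameter in a QueryJobConfig; an empty result would indicate that the UDF does not exist yet.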

Contributor

@dataders dataders left a comment

Hey @anaghshineh, this is so cool! Before we think about merging this code, I'd love to share it with the community. Would you be interested in forking jaffle shop to show how this new materialization would be used? Maybe even recording a quick demo showing how the DAG changes?

Another random thought: what happens if you {{ ref() }} a UDF from another model again?
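One way to reason about the DAG question: if a UDF is a model, and a downstream model references it with {{ ref() }} like any other model, that reference becomes a dependency edge, so the UDF would be created before anything that calls it. The toy sketch below models only that ordering with Python's standard graphlib module (Python 3.9+); the model names are hypothetical, and it makes no claim about how dbt itself builds or schedules its graph.

```python
from graphlib import TopologicalSorter

# Hypothetical project: one UDF model and models that would call it via {{ ref() }}.
# Each key maps a node to the set of nodes it depends on (its refs).
dependencies = {
    "udf_add_tax": set(),
    "stg_orders": set(),
    "orders_with_tax": {"stg_orders", "udf_add_tax"},
    "revenue_report": {"orders_with_tax"},
}

# static_order() yields every node only after all of its dependencies.
build_order = list(TopologicalSorter(dependencies).static_order())
print(build_order)
```

The relative order of independent nodes (here udf_add_tax and stg_orders) is not fixed, but the UDF always precedes orders_with_tax, which is the property a ref() to a UDF would need to guarantee.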

@jtcohen6
Contributor

jtcohen6 commented Jan 5, 2023

@anaghshineh This is awesome, thanks so much for taking the time & initiative to contribute. I haven't had a chance to play around with this yet, but to @dataders' point, it might already be in a place where community members could copy-paste macros (or install from a package!) to try this out and give early feedback. I think that type of feedback would be really valuable, given that we're talking about a pretty meaningful addition to the dbt user experience on modern data platforms that support persistent UDFs.

To that end, I left a big comment + question over on the linked issue: #451 (comment)

@Fleid
Contributor

Fleid commented Feb 1, 2023

Hi @anaghshineh, I'm going to close that PR for now.

We haven't reached a clear consensus on what to do with UDFs yet, and the next step would be to build momentum toward one, following @dataders' advice.
Please let us know if you need help getting there, if you do want to keep pushing on that topic.

This is fantastic work that you did here, and I feel bad about closing this PR.
That comes with the territory of dbt being an opinionated tool, which we all love (@jtcohen6 explained the context of this specific issue in detail), but it's still not a great feeling.

We will leverage your work if/when the situation evolves.

@Fleid Fleid closed this Feb 1, 2023
@brabster

brabster commented Feb 19, 2024

I think this is an independent, experimental implementation in a real-world public project: https://tempered.works/posts/2024-02-19-udf-dbt-models/ (mentioned in the Slack channel #i-made-this)

For example: https://github.com/brabster/pypi_vulnerabilities/blob/main/models/published/udfs/matches_multi_spec.sql

@brabster

brabster commented Jul 2, 2024

Note discussion #10395 on support for UDFs as a materialization

Development

Successfully merging this pull request may close these issues.

[CT-1732] [Feature] Add Native Persistent UDF Materialization
6 participants