Adapt get_url_parameter to work with SparkSQL #11

Open
1 of 5 tasks
foundinblank opened this issue Apr 19, 2021 · 2 comments
Comments

@foundinblank
Contributor

Describe the bug

The get_url_parameter() macro breaks on Spark SQL (Databricks). I've come up with a replacement macro that'll work on SparkSQL and am wondering if I could contribute that fix.

Steps to reproduce

This was triggered when setting up the Fivetran Google Ads source package, whose staging model calls the get_url_parameter() macro: https://github.com/fivetran/dbt_google_ads_source/blob/master/models/stg_google_ads__final_url_performance.sql#L30-L34.

Expected results

I expected no errors to be thrown and UTM parameters to be parsed out per the model definition.

Actual results

Model fails to build with the error message:

Runtime Error in model stg_google_ads__final_url_performance (models/stg_google_ads__final_url_performance.sql)
  Database Error
    Error running query: java.util.regex.PatternSyntaxException: Illegal Unicode escape sequence near index 2
    \utm_content=
      ^
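
For context on the message above: Java's regex engine reads `\u` as the start of a Unicode escape that must be followed by four hex digits, so a pattern beginning with `\utm_content=` is rejected. Python's `re` module applies a similar rule, so the sketch below reproduces the failure by analogy only. It is not the Spark code path, and this issue does not show where the leading backslash gets added. `compiles` is a hypothetical helper introduced for illustration.

```python
import re

# The pattern text from the error message: a backslash followed by "utm_content=".
bad_pattern = r"\utm_content="


def compiles(pattern: str) -> bool:
    """Return True if `pattern` is a valid regex for Python's `re` module."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Escaping the literal delimiter as a whole (rather than prefixing a lone
# backslash) keeps the "u" from being read as part of an escape sequence.
safe_pattern = re.escape("utm_content=")

print(compiles(bad_pattern))
print(compiles(safe_pattern))
print(re.split(safe_pattern, "utm_source=google&utm_content=banner"))
```

The first pattern fails to compile, while the escaped delimiter compiles and splits the query string as intended.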

It passes when using this local macro as a replacement (stored in our /macros folder), which overrides the dbt_utils macro:

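{# SparkSQL-compatible version of dbt_utils.get_url_parameter #}
{%- macro default__get_url_parameter(field, url_parameter) -%}

{#- Build the quoted delimiter, e.g. 'utm_content=' -#}
{%- set formatted_url_parameter = "'" + url_parameter + "='" -%}

{#- Take the query string, keep the text after the delimiter up to the next '&', and turn '' into null -#}
nullif(split(split(parse_url({{ field }}, 'QUERY'), {{ formatted_url_parameter }})[1],'&')[0], '')

{%- endmacro -%}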
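
To make the macro's logic easier to check outside a warehouse, here is a minimal Python sketch of the same steps: take the query string, split on `<parameter>=`, keep the text up to the next `&`, and map an empty string to null. It is a model under assumptions, not Spark itself: `str.split` is literal whereas Spark's `split` takes a regex, and an out-of-range array index is assumed to yield NULL (Spark's behavior when ANSI mode is off). The function name and the example URLs are illustrative.

```python
from typing import Optional
from urllib.parse import urlsplit


def get_url_parameter(url: Optional[str], url_parameter: str) -> Optional[str]:
    """Mimic the macro's split-based extraction of one query parameter."""
    if url is None:
        return None
    # Stand-in for parse_url(field, 'QUERY'); treat a missing query as NULL.
    query = urlsplit(url).query
    if not query:
        return None
    # split(query, '<param>=')[1]: the text after the first occurrence.
    parts = query.split(url_parameter + "=")
    if len(parts) < 2:
        return None  # assumed: out-of-range index gives NULL
    # split(..., '&')[0]: the value up to the next parameter.
    value = parts[1].split("&")[0]
    # nullif(value, ''): empty values become NULL.
    return value or None


print(get_url_parameter("https://example.com/?utm_source=google&utm_content=banner", "utm_content"))
print(get_url_parameter("https://example.com/?utm_source=google", "utm_content"))
```

One consequence of splitting on a substring: a query such as `not_utm_source=a&utm_source=b` returns `a` for `utm_source`, because the first match sits inside the longer parameter name.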

System information

packages.yml

packages:
  - package: fishtown-analytics/dbt_utils
    version: 0.6.4
  - package: fishtown-analytics/spark_utils
    version: 0.1.0
  - package: fishtown-analytics/dbt_external_tables
    version: 0.6.2
  - package: fivetran/google_ads
    version: 0.2.0
  - git: "https://github.com/netlify/segment.git"
    revision: master

Which database are you using dbt with?

  • [ ] postgres
  • [ ] redshift
  • [ ] bigquery
  • [ ] snowflake
  • [x] other (specify: Databricks)

The output of dbt --version:

installed version: 0.19.1
   latest version: 0.19.1

Up to date!

Plugins:
  - spark: 0.19.1

Are you interested in contributing the fix?

I'm happy to contribute my macro, which works on SparkSQL. If there's a way for dbt_utils to know which database or adapter it's running on, could it call the appropriate macro?
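
On the adapter question: dbt's dispatch convention resolves a macro name by first looking for an adapter-prefixed implementation (for example `spark__get_url_parameter`) and then falling back to `default__get_url_parameter`. The Python sketch below is a simplified model of that lookup order only. The real mechanism also searches across packages, and the `dispatch` function, the `MACROS` registry, and the placeholder default body are hypothetical names for illustration.

```python
from typing import Callable, Dict

Macro = Callable[[str, str], str]


def default__get_url_parameter(field: str, url_parameter: str) -> str:
    # Placeholder for the existing dbt_utils implementation (not reproduced here).
    return f"/* default implementation for {url_parameter} on {field} */"


def spark__get_url_parameter(field: str, url_parameter: str) -> str:
    # Renders the Spark SQL expression proposed in this issue.
    return (
        f"nullif(split(split(parse_url({field}, 'QUERY'), "
        f"'{url_parameter}=')[1],'&')[0], '')"
    )


MACROS: Dict[str, Macro] = {
    "default__get_url_parameter": default__get_url_parameter,
    "spark__get_url_parameter": spark__get_url_parameter,
}


def dispatch(macro_name: str, adapter_type: str, macros: Dict[str, Macro] = MACROS) -> Macro:
    """Pick '<adapter>__<name>' if present, else 'default__<name>'."""
    for prefix in (adapter_type, "default"):
        candidate = macros.get(f"{prefix}__{macro_name}")
        if candidate is not None:
            return candidate
    raise LookupError(f"no implementation found for {macro_name}")


print(dispatch("get_url_parameter", "spark")("final_url", "utm_content"))
print(dispatch("get_url_parameter", "postgres")("final_url", "utm_content"))
```

With this ordering, a Spark-specific macro can live alongside the default one without changing behavior on other adapters.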

@clrcrl clrcrl transferred this issue from dbt-labs/dbt-utils Apr 19, 2021
@clrcrl

clrcrl commented Apr 19, 2021

Just transferred this to the spark-utils repo, since I think we'll want to contribute the fix here rather than in dbt-utils!

@jtcohen6
Collaborator

@clrcrl Thanks for transferring!

@foundinblank I think this could be fixed by the improvements to spark__split_part in spark-utils v0.2.0 (just released last week). Could you try upgrading your version of spark-utils, and see if that works any better?
