Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expectations break when expecting a Struct in BigQuery #166

Closed
JakoboEU opened this issue Sep 1, 2023 · 11 comments
Closed

expectations break when expecting a Struct in BigQuery #166

JakoboEU opened this issue Sep 1, 2023 · 11 comments
Labels
enhancement New feature or request

Comments

@JakoboEU
Copy link

JakoboEU commented Sep 1, 2023

Generating an expected result such as:

    {% call dbt_unit_testing.expect() %}
        SELECT
            DATE('2023-02-11') AS snapshot_date,
            STRUCT(
                '01'                AS event_id,
                DATE('2022-01-01')  AS created_on,
                'Sam'         AS name
            ) AS event
        UNION ALL
        SELECT
            DATE('2023-02-11') AS snapshot_date,
            STRUCT(
                '02'                AS event_id,
                DATE('2022-01-01')  AS created_on,
                'Bob'         AS name
            ) AS event
    {% endcall %}

Results in the following SQL being run against BigQuery:

select * FROM 
      (select count(1) as expectation_count from (
        select count(1) as count, `snapshot_date`,`event` from (
        SELECT
            DATE('2023-02-11') AS snapshot_date,
            STRUCT(
                '01'                AS event_id,
                DATE('2022-01-01')  AS created_on,
                'Sam'         AS name
            ) AS event
        UNION ALL
        SELECT
            DATE('2023-02-11') AS snapshot_date,
            STRUCT(
                '02'                AS event_id,
                DATE('2022-01-01')  AS created_on,
                'Bob'         AS name
            ) AS event
    ) as s group by `snapshot_date`,`event`

Which generates the following error:

Grouping by expressions of type STRUCT is not allowed at [21:37]

Because group by snapshot_date, event is illegal due to event being a struct.

dbt_unit_testing version:

  - package: EqualExperts/dbt_unit_testing
    version: 0.3.2

dbt version:

Running with dbt=1.5.1
@dbrtly
Copy link

dbrtly commented Oct 31, 2023

I get exactly the same issue.

I think if we use the bigquery function to_json_string the comparison would get around this problem.

@psousa50 psousa50 added the enhancement New feature or request label Jan 11, 2024
@psousa50
Copy link
Collaborator

Please take a look at the latest release (0.4.0) and check if it fixes this issue.

Thank you!

@JakoboEU
Copy link
Author

I still get exactly the same behaviour. The generated SQL being run against bigquery has a group by struct column. This produces the bigquery error:

"Grouping by expressions of type STRUCT is not allowed" .

@JakoboEU
Copy link
Author

@dbrtly solution would work; however, the test subject would need modifying to use TO_JSON_STRING as well. I guess if the library replaced STRUCT( ... ) with TO_JSON_STRING(STRUCT( ...)) in both the expectation and the test subject then this would resolve the issue. Seems a bit hacky though having to change the test subject.

@psousa50
Copy link
Collaborator

Hi @JakoboEU, can you post here the model and test that you are running? Did you apply a column transformation to all of the struct columns in the model?

@JakoboEU
Copy link
Author

@psousa50 I used the same test as in the issue description.

{% call dbt_unit_testing.expect() %}
SELECT
    DATE('2023-02-11') AS snapshot_date,
    STRUCT(
    '01'                AS event_id,
    DATE('2022-01-01')  AS created_on,
    'Sam'         AS name
    ) AS event
UNION ALL
SELECT
    DATE('2023-02-11') AS snapshot_date,
    STRUCT(
    '02'                AS event_id,
    DATE('2022-01-01')  AS created_on,
    'Bob'         AS name
    ) AS event
    {% endcall %}

which gets compiled to

/* {"app": "dbt", "dbt_version": "1.6.6", "profile_name": "default", "target_name": "local", "node_id": "test.xxxxxxxxx"} */

    select * FROM 
      (select count(1) as expectation_count from (
        select count(1) as count, `snapshot_date`, `event` from (
SELECT
    DATE('2023-02-11') AS snapshot_date,
    STRUCT(
    '01'                AS event_id,
    DATE('2022-01-01')  AS created_on,
    'Sam'         AS name
    ) AS event
UNION ALL
SELECT
    DATE('2023-02-11') AS snapshot_date,
    STRUCT(
    '02'                AS event_id,
    DATE('2022-01-01')  AS created_on,
    'Bob'         AS name
    ) AS event
    ) as s group by `snapshot_date`, `event`
  
      ) as exp) as exp_count,
      (select count(1) as actual_count from (
        select count(1) as count, `snapshot_date`, `event` from ( 
      with
      `src_xxx` as (
      

    
    select * from `xxx`.`xxx`.`xxx` where false)

    select * from (

WITH
raw_xxx AS (
  SELECT
      to_hex(md5(cast(coalesce(cast(guid as 
    string
), '') || '-' || coalesce(cast(last_modified_date as 
    string
), '') as 
    string
))) AS event_id,
      guid AS review_id,
      ....
  FROM `src_xxx`
)
SELECT * EXCEPT(row_num)
FROM (
  SELECT
    *,
    row_number() OVER (PARTITION BY event_id) AS row_num
  FROM raw_xxx
)
WHERE row_num = 1 
 ) as t ) as s group by `snapshot_date`, `event`
  
      ) as act) as act_count

@psousa50
Copy link
Collaborator

HI @JakoboEU, I would like to see the full test, the dbt-unit-testing.test macro, where you define your column transformations unless you are defining them in dbt_project.yml

@JakoboEU
Copy link
Author

Hi @psousa50 , I am not aware of any transformations.

The whole test looks like

{{
    config(
        tags=['unit-test', 'unit_test'],
        run_date="2023-02-11"
    )
}}

{% call dbt_unit_testing.test('xxx', 'blah blah blah') %}
    {% call dbt_unit_testing.expect() %}
        # as above
    {% endcall %}
{% endcall %}

And I don't have anything defined in my dbt_project.yml

@psousa50
Copy link
Collaborator

Oh, I see the problem now 🙂 For you to be able to use STRUCT columns you need to apply a column transformation to those columns, as explained in the README here. Please refer to this use case for instructions on how to do it.

If you have any issue implementing this please let me know, I'll try to assist you

@JakoboEU
Copy link
Author

That's the ticket.

Thank you. Works well.

@psousa50
Copy link
Collaborator

Cool 😎

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants