Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for BigQuery adapter #172

Merged
merged 18 commits into from
Aug 30, 2022
Merged

Conversation

charles-astrafy
Copy link
Contributor

No description provided.

@charles-astrafy charles-astrafy had a problem deploying to Approve Integration Tests August 12, 2022 14:11 Failure
@charles-astrafy charles-astrafy had a problem deploying to Approve Integration Tests August 12, 2022 14:11 Failure
@charles-astrafy charles-astrafy mentioned this pull request Aug 12, 2022
@NiallRees
Copy link
Contributor

Thank you @charles-astrafy ! Would you mind adding a description with any key design decisions or things to be aware of in this? That would be super helpful!

@charles-astrafy
Copy link
Contributor Author

charles-astrafy commented Aug 12, 2022

@NiallRees

Main design decisions:

  • In each "create_" macros, a BigQuery macro has been added
  • Biggest design change is in the "upload_" macros. Current code could not accommodate BigQuery syntax seamlessly. Main macro name in each "upload_" file remained but content of that function is now to dispatch to various adapters to get the SQL content that will allow to insert records.
  • The "insert_into_metadata" macro has also been decoupled and put within the "upload_results" macro to keep the code DRY.
  • A logical "if" in case DML SQL is empty has been added to the macro "insert_into_metadata_table".

All those changes are in commit 0d5e732.

Other commits details:

8f8286b: Feature for incremental logic in staging tables. Nice one in case you want to materialize your staging tables; the logic is present in the SQL but won't be applied if you keep those models as views.
40cc3ef: Add bytes_processed column in model_executions models. Relevant information for BigQuery.
628f680: Add configurations for integration tests for BigQuery. All code for testing the BigQuery adapter.
6b9d101: some README update.
bc9ea30: only consider bytes_processed column if adapter type is BigQuery

Let me know if you need any further explanation and we can jump in a call if you prefer.

@charles-astrafy charles-astrafy had a problem deploying to Approve Integration Tests August 15, 2022 13:55 Failure
@charles-astrafy charles-astrafy had a problem deploying to Approve Integration Tests August 15, 2022 13:55 Failure
README.md Outdated
Comment on lines 64 to 78
You can also implement your own logic by overriding the default [generate_database_name](https://docs.getdbt.com/docs/building-a-dbt-project/building-models/using-custom-databases)
and [generate_schema_name](https://docs.getdbt.com/docs/building-a-dbt-project/building-models/using-custom-schemas) macros.

Note that the model materializations are defined in this package's `dbt_project.yml` as views. Incremental logic is set up in the
staging models and if you want to leverage this materialization, you need to specify the following configuration in the `dbt_project.yml`
of your project:

```yml
models:
...
dbt_artifacts:
staging:
+materialized: incremental
...
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this as it's not within scope of the PR if ok :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure ;) and will put it in a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NiallRees : removed from the PR

README.md Outdated Show resolved Hide resolved
@NiallRees NiallRees had a problem deploying to Approve Integration Tests August 26, 2022 14:50 Failure
@NiallRees NiallRees had a problem deploying to Approve Integration Tests August 26, 2022 14:50 Failure
@NiallRees NiallRees had a problem deploying to Approve Integration Tests August 26, 2022 14:50 Failure
@NiallRees NiallRees temporarily deployed to Approve Integration Tests August 26, 2022 14:50 Inactive
@NiallRees
Copy link
Contributor

Hey @charles-astrafy left a couple of comments and got the BQ integration tests running now. Let me know if I can be of any help to getting this over the line and I appreciate your patience!

@charles-astrafy
Copy link
Contributor Author

Thank you for your comments. Will block some time this weekend to work on it and will keep you posted as soon as those small changes are done.

Great job for the BQ integration tests.


{% set insert_into_table_query %}
insert into {{database_name}}.{{ schema_name }}.{{ table_name }}
VALUES
Copy link

@adrpino adrpino Aug 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @charles-astrafy I was just playing around with your branch, and I got this query from from the statement that stores the tests (I've generalized the info yo you can play with it in your BQ instance):

insert into project.dbt_artifacts.tests
VALUES

select
  1,
  2,
  3,
  4,
  parse_json(5),
  6,
  7,
  parse_json(8)
from values
(
  '061f741c-c350-4774-9343-c409f0518d48', 
  'test.project.test_name', 
  '2022-08-24 20:46:12.337589+00:00', 
  'test_name', 
  '["model.project.model_name"]', 
  'project', 
  'path/to/model.sql', 
  '[]' 
)

However this query produces an error, saying that

Syntax error: INSERT target cannot have an alias at [2:1]

Is there something I'm doing wrong?

Copy link
Contributor Author

@charles-astrafy charles-astrafy Aug 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adrpino : I think you are doing smth wrong because the DML that insert values in the raw table for BigQuery does not use the syntax
insert into {{database_name}}.{{ schema_name }}.{{ table_name }} select ...... from values
but it uses instead the syntax:
insert into {{database_name}}.{{ schema_name }}.{{ table_name }} VALUES

So it seems you are not running the code with the BigQuery adapter as it is taking the code from the default adapter.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @charles-astrafy thanks!

is there any way in which I have to enforce the BQ adapter? isn't this done by default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @adrpino ,

The adapter that dbt will use is defined via your profiles.yaml file. You define it in the "type" parameter of the target definition within that profiles.yaml file (see screenshot in attachment).

Let me know if you still have issues.

image

@charles-astrafy charles-astrafy temporarily deployed to Approve Integration Tests August 29, 2022 12:50 Inactive
@charles-astrafy charles-astrafy temporarily deployed to Approve Integration Tests August 29, 2022 15:01 Inactive
@charles-astrafy charles-astrafy temporarily deployed to Approve Integration Tests August 29, 2022 15:01 Inactive
@charles-astrafy charles-astrafy temporarily deployed to Approve Integration Tests August 29, 2022 15:01 Inactive
@charles-astrafy charles-astrafy temporarily deployed to Approve Integration Tests August 29, 2022 15:01 Inactive
@NiallRees NiallRees had a problem deploying to Approve Integration Tests August 29, 2022 15:17 Failure
@NiallRees NiallRees had a problem deploying to Approve Integration Tests August 29, 2022 15:17 Failure
@NiallRees NiallRees had a problem deploying to Approve Integration Tests August 29, 2022 15:17 Failure
@NiallRees NiallRees had a problem deploying to Approve Integration Tests August 29, 2022 15:17 Failure
README.md Outdated
Comment on lines 39 to 42
4. In case you don't want the artifacts to be uploaded for certain dbt commands (for instance 'dbt compile'), add the following conditional:
`on-run-end: "{% if not var('dbt_artifacts_upload_results', False) %}{{ dbt_artifacts.upload_results(results) }}{% endif %}"`
If you then run the following command by providing a variable to the command, the 'on-run-end' will not execute:
`dbt compile --vars 'artifact: disable'`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to remove this for now but feel free to add to a separate PR @charles-astrafy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will put it in a separate PR.

README.md Outdated Show resolved Hide resolved
@NiallRees NiallRees temporarily deployed to Approve Integration Tests August 29, 2022 15:19 Inactive
@NiallRees NiallRees temporarily deployed to Approve Integration Tests August 29, 2022 15:19 Inactive
@NiallRees NiallRees temporarily deployed to Approve Integration Tests August 29, 2022 15:19 Inactive
@NiallRees NiallRees temporarily deployed to Approve Integration Tests August 29, 2022 15:19 Inactive
@NiallRees NiallRees temporarily deployed to Approve Integration Tests August 29, 2022 15:42 Inactive
@NiallRees NiallRees temporarily deployed to Approve Integration Tests August 29, 2022 15:42 Inactive
@NiallRees NiallRees temporarily deployed to Approve Integration Tests August 29, 2022 15:42 Inactive
@NiallRees NiallRees temporarily deployed to Approve Integration Tests August 29, 2022 15:42 Inactive
@NiallRees NiallRees temporarily deployed to Approve Integration Tests August 30, 2022 06:38 Inactive
@NiallRees NiallRees temporarily deployed to Approve Integration Tests August 30, 2022 06:38 Inactive
@NiallRees NiallRees temporarily deployed to Approve Integration Tests August 30, 2022 06:38 Inactive
@NiallRees NiallRees temporarily deployed to Approve Integration Tests August 30, 2022 06:38 Inactive
@NiallRees NiallRees merged commit 3a648f0 into brooklyn-data:main Aug 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants