
Support BigQuery #6

Closed
yu-iskw opened this issue Mar 11, 2021 · 20 comments
Labels
enhancement New feature or request

Comments

@yu-iskw

yu-iskw commented Mar 11, 2021

The dbt package is awesome. I would like to do the same thing for BigQuery. However, I am not sure we can achieve the same with run_query alone because, in the case of BigQuery, we would have to upload the artifact files to GCS first. As far as I know, there is no SQL statement type that can put a local file into GCS or BigQuery directly.

NOTE

Unfortunately, there is no DDL-like statement to load data from a local file. It would be worthwhile raising that with Google Cloud.
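
For reference, staging a local artifact in GCS from Python is straightforward with the google-cloud-storage client. A minimal sketch (the bucket and object names are placeholders, not part of any package):

```python
from google.cloud import storage

# Stage a dbt artifact in GCS so BigQuery could load it from there.
client = storage.Client()
bucket = client.bucket("my-dbt-artifacts")         # placeholder bucket name
blob = bucket.blob("run_artifacts/manifest.json")  # placeholder object path
blob.upload_from_filename("target/manifest.json")  # dbt's default artifact location
```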

@aaronsteers

aaronsteers commented Jun 10, 2021

I'm also curious about supporting other sources. Is copying the raw data the primary blocker? Or, said another way, is there any other platform-specific code that could not be written generically?

And for the copy operation, is there any chance we could trick/hijack the seed/snapshot capability to get a similar result? There must be a generic API layer for data ingestion in order to make seed capabilities work across platforms. (I know this is a long shot, and probably not a critical priority, but I'm curious to hear others' thoughts on this.) Thanks!

@NiallRees
Contributor

My initial thoughts are that we could extend the adapter for each warehouse to add an upload function which is accessible through a
{% do adapter.upload_file(file_path, destination) %}
or similar. That function can then just call any API methods needed to load the data into the warehouse.

From @jtcohen6:

dbt-bigquery connects using Google's Python clients, and I know those also support uploading JSON files. I tried doing this way back when, hacking into some methods intended for seeds, and got pretty close. I do think it needs a few (probably simple) changes to the adapter code.

Brooklyn Data can take this on in the next few weeks, or contributions are very welcome! See https://github.com/dbt-labs/dbt-bigquery.
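
A minimal sketch of what such an adapter-level upload could look like with the google-cloud-bigquery client (upload_file is the name proposed above; the file path, schema autodetection, and table ID are assumptions for illustration):

```python
from google.cloud import bigquery

def upload_file(file_path: str, destination: str) -> None:
    """Load a local newline-delimited JSON file into a BigQuery table."""
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=True,  # infer the table schema from the file (an assumption here)
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    with open(file_path, "rb") as f:
        job = client.load_table_from_file(f, destination, job_config=job_config)
    job.result()  # block until the load job completes

# Hypothetical usage; the project, dataset, and table are placeholders.
upload_file("target/manifest.json", "my-project.dbt_artifacts.manifest")
```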

@pgoslatara

From @jtcohen6:

dbt-bigquery connects using Google's Python clients, and I know those also support uploading JSON files. I tried doing this way back when, hacking into some methods intended for seeds, and got pretty close. I do think it needs a few (probably simple) changes to the adapter code.

@NiallRees Do you have a link for this quote? I'm interested in seeing if there is more background to this statement and my googling skills have failed me when I tried to find this quote.

@NiallRees
Contributor

@NiallRees Do you have a link for this quote? I'm interested in seeing if there is more background to this statement and my googling skills have failed me when I tried to find this quote.

Hi, this was in a message so I don't. You could definitely ask for some pointers over on the issue in dbt-bigquery though! dbt-labs/dbt-bigquery#102

@tuftkyle

It looks like the macro to upload files to BigQuery has been completed and will be included in dbt v1.1. Maybe we can start working on a port now?

@pgoslatara

@tuftkyle Thanks for the interest in this! I've been looking into this issue for a few weeks (I'm the author of the MR you reference). I think the next step is still within dbt-bigquery: the artifacts produced by dbt are JSON, whereas BigQuery expects NDJSON files for uploads (mentioned in this comment). I'm not sure how to convert JSON to NDJSON using Python. I'm also not sure how to handle periods (".") in the keys in manifest.json, as BigQuery does not support column names with periods. Maybe we can replace periods with double underscores, or remove them? Opinions are welcome as I don't know the best approach!
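
On the conversion question, one possible approach in Python (a minimal sketch, assuming each artifact becomes a single NDJSON record; json.dumps emits no internal newlines by default):

```python
import json

def artifact_to_ndjson(src_path: str, dst_path: str) -> None:
    """Rewrite a dbt artifact as NDJSON: one JSON object per line."""
    with open(src_path) as src:
        doc = json.load(src)  # each artifact is a single JSON document
    with open(dst_path, "w") as dst:
        dst.write(json.dumps(doc) + "\n")  # one compact object per line

artifact_to_ndjson("target/manifest.json", "target/manifest.ndjson")
```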

@NiallRees
Contributor

Hey @pgoslatara - the artifacts produced by dbt can already be considered newline delimited; they just have one object per file, so we should be good to go there. As for the periods (".") in the keys, those are fine as long as we use the JSON subscript operator in BigQuery, e.g. manifest['key.with.periods'].

To go about this, I'd suggest first attempting to upload the artifacts into a BigQuery table (or table for each artifact type if needed), and from there working out how to get them into the format required by https://github.com/brooklyn-data/dbt_artifacts/blob/main/models/staging/stg_dbt__artifacts.sql. After that, it should just be a case of making the models compatible with both BigQuery and Snowflake.

I'm happy to be as involved as required so please let me know if you'd like more help :)
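
A quick way to confirm the newline-delimited point for a local artifact (a sketch; the path assumes dbt's default target directory):

```python
import json

with open("target/manifest.json") as f:
    lines = f.read().splitlines()

# dbt writes each artifact as one compact JSON object on a single line,
# which is exactly one valid NDJSON record.
assert len(lines) == 1
record = json.loads(lines[0])
print(type(record))  # <class 'dict'>
```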

@pgoslatara

@NiallRees I've submitted MR153 for this and am currently working through some suggested changes.

Can you elaborate on the "as long as we use the JSON subscript operator" element of your comment? I'm not sure I follow it fully. If an artifact is uploaded as a STRING, I understand that it can be converted to a JSON data type (although this is still in preview and requires enrollment to access). The blocker I see is that after this conversion there is still a period in the key name, and I'm not sure whether BigQuery can handle that (as I don't have access to these features yet).

NiallRees added a commit that referenced this issue Jul 21, 2022
Add tests, exposures and snapshots verticals
@NiallRees
Contributor

Hey again all - we've been busy reimplementing the package, opening it up to be compatible with more adapters without having to come up with a warehouse-specific way of uploading the artifacts - by avoiding the artifacts altogether. We are now uploading the graph and results context variables instead.

In the new world, adding BigQuery compatibility involves implementing a BigQuery version of each warehouse-dispatchable macro defined in https://github.com/brooklyn-data/dbt_artifacts/tree/main/macros. Let us know if you'd be interested in contributing those changes!

@charles-astrafy
Contributor

Awesome, I will be working on making it compatible with BigQuery, starting tomorrow. If anyone is up for pair programming on this, please let me know.

@NiallRees
Contributor

How are you getting on here @charles-astrafy? Shout if I can help at all!

@charles-astrafy
Contributor

Sorry, I have been sidetracked a bit with other things. I have time to work on it tomorrow and will give an update by EOD tomorrow.

@NiallRees
Contributor

All good @charles-astrafy no pressure from over here!

@jecolvin jecolvin added the enhancement New feature or request label Aug 2, 2022
@charles-astrafy
Contributor

@NiallRees I started to work on it as of today. I will keep you posted and will make time on a daily basis in the coming days. I might need your guidance/help if I encounter some blockers, but all good at the moment.

@NiallRees
Contributor

Awesome @charles-astrafy!

@charles-astrafy
Contributor

@NiallRees Just a small update. I have been making good progress, but quite some refactoring is needed to make it work with BigQuery. For instance, the upload macros are currently generic, using "SELECT ... FROM VALUES (...), (...)". That syntax does not work in BigQuery, so I am adding a dispatch abstraction layer for those macros. I should have a pull request ready by Friday.

@NiallRees
Contributor

Sounds great @charles-astrafy, appreciate your efforts! Really looking forward to seeing how you get it working.

@NiallRees NiallRees reopened this Aug 10, 2022
@NiallRees
Contributor

clicked the wrong button 🙈

@charles-astrafy
Contributor

charles-astrafy commented Aug 12, 2022

@NiallRees --- Pull request done.

#172

@adrpino

adrpino commented Aug 24, 2022

Hi all! Sorry for jumping in! This could be very helpful for me as well :D

Is this only needing review? Is it fully functional already, @charles-astrafy?

Thanks!
