Support BigQuery #6
Comments
I'm curious also about supporting other sources. And for the copy operation, any chance we could trick/hijack the seed/snapshot capability to get a similar result? There must be a generic API layer for data ingestion in order to make seed capabilities work across platforms. (I know this is a longshot, and probably not a critical priority, but I'm curious about others' thoughts on this.) Thanks!
My initial thoughts are that we could extend the adapter for each warehouse to add an upload function which is accessible through a macro.

From @jtcohen6:
Brooklyn Data can take this on in the next few weeks, or contributions are very welcome! See https://github.com/dbt-labs/dbt-bigquery.
@NiallRees Do you have a link for this quote? I'm interested in seeing if there is more background to this statement, and my googling skills have failed me when trying to find it.
Hi, this was in a message so I don't have a link. You could definitely ask for some pointers over on the issue in dbt-bigquery though! dbt-labs/dbt-bigquery#102
It looks like the macro to upload files to BigQuery has been completed and will be included in dbt v1.1. Maybe we can start working on a port now?
@tuftkyle Thanks for the interest in this! I've been looking into this issue for a few weeks (I'm the author of the MR you reference). I think the next step is still within
Hey @pgoslatara - the artifacts produced by dbt can be considered newline delimited, they just have one object per file, so we should be good to go there. On the periods (".") in the keys, that is fine as long as we use the JSON subscript operator in BigQuery, e.g. manifest['key.with.periods'].

To go about this, I'd suggest first attempting to upload the artifacts into a BigQuery table (or a table for each artifact type if needed), and from there working out how to get them into the format required by https://github.com/brooklyn-data/dbt_artifacts/blob/main/models/staging/stg_dbt__artifacts.sql. After that, it should just be a case of making the models compatible with both BigQuery and Snowflake. I'm happy to be as involved as required, so please let me know if you'd like more help :)
@NiallRees I've submitted MR153 for this and am currently working through some suggested changes. Can you elaborate on the "as long as we use the JSON subscript operator" element of your comment? I'm not sure I follow this fully. If an artifact is uploaded as a STRING, I understand that this can be converted to a JSON data type (although this is still in preview and requires enrollment to access). The blocker I see is that after this conversion there is still a period in the key name, and I'm not sure if BigQuery can handle that (as I don't have access to these features yet).
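For anyone following along, here is a minimal sketch of the two access patterns being discussed. The table name (artifacts), column names (manifest_json, manifest_str) and node id are made up for illustration; the first query assumes the preview JSON data type, the second works on a plain STRING column.

```sql
-- JSON subscript operator on the (preview) JSON type: keys containing periods
-- are fine because the key is passed as a literal string, not a JSONPath.
select manifest_json['nodes']['model.my_project.my_model'] as node
from artifacts;

-- STRING column: escape the dotted key with double quotes inside the JSONPath.
select json_query(manifest_str, '$.nodes."model.my_project.my_model"') as node
from artifacts;
```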
Hey again all - we've been busy reimplementing the package, opening it up to be compatible with more adapters without having to come up with a warehouse-specific way of uploading the artifacts - by avoiding the artifacts altogether. We are now uploading the graph and results context variables. In the new world, adding BigQuery compatibility involves implementing a BigQuery version of each warehouse-dispatchable macro defined in https://github.com/brooklyn-data/dbt_artifacts/tree/main/macros. Let us know if you'd be interested in contributing those changes!
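To illustrate the dispatch pattern being referred to, here is a rough sketch. The macro name parse_json is purely hypothetical (it is not an actual dbt_artifacts macro); the point is that a bigquery__ implementation is added alongside the default one and dbt resolves the right version at runtime via adapter.dispatch.

```sql
{# macros/parse_json.sql - hypothetical macro, shown only to illustrate adapter.dispatch #}
{% macro parse_json(field) %}
    {{ return(adapter.dispatch('parse_json', 'dbt_artifacts')(field)) }}
{% endmacro %}

{# Default (Snowflake-flavoured) implementation #}
{% macro default__parse_json(field) %}
    parse_json({{ field }})
{% endmacro %}

{# BigQuery implementation: dbt picks this one when the target is BigQuery #}
{% macro bigquery__parse_json(field) %}
    json_extract({{ field }}, '$')
{% endmacro %}
```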
Awesome, I will be working on making it compatible with BigQuery and will start on it tomorrow. Anyone up for pair programming on this, please let me know.
How are you getting on here @charles-astrafy? Shout if I can help at all!
Sorry, I have been sidetracked a bit with other stuff. I have time to work on it tomorrow and will give an update by EOD tomorrow.
All good @charles-astrafy, no pressure from over here!
@NiallRees Started working on it as of today. Will keep you posted and will make time on a daily basis in the coming days. Might need your guidance/help if I encounter some blockers, but all good at the moment.
Awesome @charles-astrafy!
@NiallRees Just a small update. I have been making good progress, but quite some refactoring is needed to make it work with BigQuery. For instance, the upload macros are at the moment generic, using "SELECT ... FROM VALUES (...), (...)". This syntax does not work for BigQuery, so I am adding a dispatch abstraction layer for those macros. I should have a pull request ready by Friday.
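As a rough sketch of the incompatibility being described (column names and values below are made up for illustration, not the package's actual upload SQL): the generic macros build an inline table with FROM VALUES, whereas BigQuery needs a different construct, for example an unnested array of structs.

```sql
-- Generic pattern used by the upload macros (works on Snowflake):
select column1 as command_invocation_id, column2 as node_id
from values
    ('inv-1', 'model.my_project.a'),
    ('inv-1', 'model.my_project.b');

-- One BigQuery-compatible equivalent: build the rows as an array of structs and unnest it.
select command_invocation_id, node_id
from unnest([
    struct('inv-1' as command_invocation_id, 'model.my_project.a' as node_id),
    struct('inv-1' as command_invocation_id, 'model.my_project.b' as node_id)
]);
```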
Sounds great @charles-astrafy, appreciate your efforts! Really looking forward to seeing how you get it working.
clicked the wrong button 🙈
@NiallRees Pull request done.
Hi all! Sorry for jumping in! This could be very helpful for me as well :D Is this only needing review, or is it fully functional already @charles-astrafy? Thanks!
The dbt package is awesome. I would like to do the same thing for BigQuery. However, I am not sure we can realize the same with only run_query, because, for instance, we would have to upload the artifact files to GCS in the case of BigQuery. As far as I know, there is no statement type to put them into GCS or BigQuery directly.

NOTE: Unfortunately, there is nothing like a DDL statement to load data from local files. It would be worthwhile asking Google Cloud for that.
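For future readers, a sketch of the gap being described: BigQuery's LOAD DATA statement (which may not have been available when this issue was opened) accepts only Cloud Storage URIs, not local paths, so artifact files would still need to be uploaded to GCS first, or loaded through the client library, before any SQL statement could ingest them. Dataset, table, and bucket names below are made up.

```sql
-- Hypothetical names; LOAD DATA can only read gs:// URIs, never local files,
-- so the dbt artifacts must first be copied to Cloud Storage.
load data into my_dataset.dbt_run_results
from files (
  format = 'JSON',
  uris = ['gs://my-artifacts-bucket/run_results.json']
);
```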