-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BigQuery: bytes processed should be logged for all dbt commands that perform queries #2747
Comments
This makes sense to me! I think the long-term answer here still looks like giving users more control via configurable log output (#2580). In the meantime, I see the value of reporting bytes processed for queries that can be sneakily expensive, such as schema tests or freshness checks that scan massive tables. I think the change would need to happen in the "everything else" query result: In particular, you'll want to pull in
Given that "if present" caveat, it'd be a good idea to add some handling in the event that Take a swing at this, and let me know if you run into any trouble! |
@jtcohen6 Is the STATUS produced in this |
Looking at the BigQuery logs, I noticed that the |
@alepuccetti @bodschut. That sounds like a good approach. I knew this team would find a way :) |
Hi guys I've done some digging in the code and unfortunately I'm afraid it won't be as simple as we think :-) Let me explain what I found. As you can see in the For a normal I'm pretty sure that the The issue with the As you can see, the adapter returns both the status string (catched here in the So, I think we will need to adapt how this status string is passed up the call chain, but as we're stepping out of the bigquery plugin and into the core dbt code, I'm a bit reluctant to start changing stuff... So @jtcohen6 could you maybe look for someone with knowledge of this part of the core code to see what we could do here? |
Good catch @bodschut, and sorry for leading you astray further up. I believe this is the code responsible for printing data + schema test results: To include bytes processed, I think we'd need to change @beckjake Do you have thoughts about how we could incorporate structured representation of integer values (rows affected, byes processed) into the proposed run results spec? I wouldn't say that the way we've done this for model run results today is all that good, either: |
The proposal in #2493 already handles the "status string vs status integer" issue, so that's nice. For the bytes process/rows affected aspect, I see a couple ways we can do it:
I don't have strong feelings either way, both are probably valid. The first option is easier to implement and probably doesn't have to be a breaking change for adapters, the second option is probably more maintainable in the long run, gives more flexibility for adapters to implement custom result types, and probably is better for consumers on some level since they can know what they're consuming. We could probably start with the first and transition to the second, though it'll be unpleasant to maintain if we have to preserve the schema and the total development time will surely be much longer than either of the choices on their own. |
IIRC, after discussing this live, we decided we'd opt for:
The trickiness will be in surfacing that data to stdout. While this isn't a blocker for #2493, it's highly related. I'm removing "good first issue" and "bigquery"—this gets into our core plumbing—and pulling it into v0.19. Edit: Eventually, we may want this to be a list of dictionaries ( |
#2961 adds an adapter-specific dict to |
Describe the feature
With the release of dbt 0.18.0, launching a
dbt run
command against a bigquery database will output the bytes processed when a model is successfully deployed. However, to get a full picture of the amount of bytes dbt processes on bigquery (and hence to get an idea of the cost of running a specific dbt command), this should also be logged when launchingdbt test
,dbt source snapshot-freshness
anddbt run-operation
commands.Additional context
For
dbt run
commands, this feature was discussed in #2526, it would be nice to extend this to also log this number for the other commands mentioned above.Who will this benefit?
This is a bigquery specific issue.
Are you interested in contributing this feature?
Sure, happy to see how we could add this.
The text was updated successfully, but these errors were encountered: