add new bq job timeout and retry config #50
Conversation
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Hui Zheng.
The branch was force-pushed from c363bb1 to c5f8874.
hello @jtcohen6,
@hui-zheng These changes look good to me.
As with all connection-related issues, it's difficult to stand up integration tests that replicate behavior in the wild. If you've been running this branch locally / in your deployment environment with success, that's a pretty strong vote of confidence for me. Is that something you've been able to do?
If so, could you also add an entry to the changelog (under "dbt-bigquery 1.0.0 (Release TBD)" > "Features"), and add yourself to the list of contributors?
After merging, the last step is to update the prerelease docs: https://next.docs.getdbt.com/reference/warehouse-profiles/bigquery-profile#optional-configurations
'retries': 'job_retries',
'timeout_seconds': 'job_execution_timeout_seconds',
This is clever! Aliases also affect node-level configs, but I don't see a reason why we can't do it this way
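For context, here is a minimal sketch of the aliasing approach (simplified; the real BigQueryCredentials dataclass inherits from dbt's Credentials base class and defines many more fields, and the defaults shown are illustrative):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BigQueryCredentials:  # simplified stand-in for the real Credentials subclass
    job_retries: Optional[int] = 1
    job_execution_timeout_seconds: Optional[int] = 300

    # Old profile keys are accepted and translated to the new field names,
    # so existing profiles using `retries` / `timeout_seconds` keep working.
    _ALIASES = {
        "retries": "job_retries",
        "timeout_seconds": "job_execution_timeout_seconds",
    }
```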
I haven't tested out this exact PR in our deployment environment. We currently use dbt v0.20.2, in which dbt-bigquery is still part of dbt-core, and we have a similar local patch for it. I am happy to test this PR out. Could you let me know which stable version of dbt-core it could pair with for testing? And could you give me some direction on how to install this branch?
@hui-zheng This change will be going into dbt-bigquery v1.0.0, which is still in prerelease (and pairs with the dbt-core v1.0.0 prereleases), so there isn't a stable version (yet) to test in production. You could test it by installing this package locally from your branch. As it is, these changes look good. Given that we're coming up on v1.0.0-rc1, we may merge this so it can be included and tested more in the release candidate.
Going to close and reopen just to trigger adapter integration tests.
I tried installing it, but ran into an error. My branch is up to date with the base branch. Do you know how to resolve it?
@hui-zheng I don't know exactly what went wrong there, but it seems like an issue with trying to install dbt-bigquery in an environment that already has dbt-core==1.0.0b1. The simplest solution might be to clear away the virtual environment, create a fresh one, and try again from there.
@@ -323,17 +330,24 @@ def open(cls, connection):
        return connection

    @classmethod
-    def get_timeout(cls, conn):
+    def get_job_execution_timeout_seconds(cls, conn):
There are some integration test failures like:
Unhandled error while executing ...
'BigQueryConnectionManager' object has no attribute 'get_timeout'
We need to replace get_timeout → get_job_execution_timeout_seconds within load_dataframe, to fix dbt seed:
dbt-bigquery/dbt/adapters/bigquery/impl.py, line 700 (at bc6113c):
timeout = self.connections.get_timeout(conn)
Plus one unit test:
self.connections.get_timeout = lambda x: 100.0
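Put together, a sketch of the two call-site changes implied above (context lines omitted; the method names come straight from this PR):

```python
# dbt/adapters/bigquery/impl.py, inside load_dataframe (used by `dbt seed`)
# before: timeout = self.connections.get_timeout(conn)
timeout = self.connections.get_job_execution_timeout_seconds(conn)

# and the corresponding unit-test mock
# before: self.connections.get_timeout = lambda x: 100.0
self.connections.get_job_execution_timeout_seconds = lambda x: 100.0
```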
I could look into that
done
…:hui-zheng/dbt-bigquery into enhancement/bq-job-retry-timeout-configs
updated Changelog
done
@jtcohen6, how do I update the docs?
I triggered the integration test to re-run. (We had some intermittent failures over the past few days due to our sandbox BQ environment.) It looks like there are some flake8 failures (simple), and failures in the unit tests related to mock code (less simple). One of the failures looks like:
Unfortunately, I can't dig in deeper right now. It should be somewhat easier to run and reproduce those unit test failures locally. Let me know if you're still stuck later this week, and I can try to dive in.
Docs updates happen in this repo: https://github.com/dbt-labs/docs.getdbt.com. I recommend opening an issue, and then (if you feel up to it) a PR targeting the prerelease docs branch.
@hui-zheng Sorry about that, I tried merging in the recent changes from main. There are still a handful of unit tests failing.
The issue is that this PR has changed the function signature that the unit-test mocks rely on, so those mocks need to be updated to match.
@jtcohen6 Sorry for the delay in responding to you; I've been a bit busy lately. I will fix those tests once I have time.
…retry-timeout-configs
@jtcohen6 I haven't updated the docs; let me know if it's something I should do before this PR gets merged or after.
update docs according the changes in dbt-labs/dbt-bigquery#50
I also updated the docs; please see dbt-labs/docs.getdbt.com#962. Please let me know if anything else is required.
Hi @hui-zheng, it seems like this branch is falling behind main and is now failing some test checks; if you could bring it up to date with main, that would be very much appreciated.
Hi @McKnight-42, I just brought it up to date.
great work on this!
* Update bigquery-profile.md update docs according the changes in dbt-labs/dbt-bigquery#50 * Update upgrading-to-1-0-0.md * Add versioning logic. Edit * Update migration guides * PR feedback, corrections Co-authored-by: Hui Zheng <[email protected]>
* add new bq job timeout and retry config * fix * fix * update changelog * fix * Fix merge * fix test_query_and_results * 2nd attempt to fix test_query_and_results * fixed other test cases * fix aliases * update changelog * update changelog * fix linting Co-authored-by: Jeremy Cohen <[email protected]>
resolves #45
Description
This PR provides fine-grained control of the timeout and retry behavior of BigQuery queries, via four dbt profile configs.
These settings allow us to control the timeouts of step 1 and step 2 of a BigQuery query independently (see Context below), maximizing our chances of mitigating different kinds of intermittent errors.
For example, we could set the configs to fail faster at the BQ job-creation step, while still allowing queries with long-running results, as sketched below.
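Values here are made up, and job_creation_timeout_seconds / job_retry_deadline_seconds are assumed names for the other two configs; only the two renamed configs are confirmed in this description. The parsed profile target might look like:

```python
# Illustrative values only: fail fast on job creation, tolerate long-running execution.
target = {
    "type": "bigquery",
    "method": "oauth",
    "project": "my-gcp-project",            # hypothetical project
    "dataset": "analytics",                 # hypothetical dataset
    "job_creation_timeout_seconds": 30,     # assumed name: give up quickly if the job can't be created
    "job_execution_timeout_seconds": 1800,  # renamed `timeout_seconds`: allow long-running queries
    "job_retries": 3,                       # renamed `retries`
    "job_retry_deadline_seconds": 600,      # assumed name: overall retry budget
}
```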
NOTE:
job_execution_timeout_seconds renames the previous timeout config.
job_retries renames the previous retries config.
Context
At its core, a BigQuery query is made in two steps in dbt:
In the first step, client.query() submits a query to the BQ JobInsert API server. When that succeeds, the BQ server creates a new BigQuery query job and returns the query job id back to the client as part of the query_job object. This step is usually very quick, normally under a few seconds; however, in some rare cases it can take much longer, even up to 4 minutes according to the BigQuery engineering team.
In the second step, query_job.result() waits for BigQuery to execute the query and return the results to the client as an iterator. Depending on the complexity of the query, this step can take a long time, from tens of seconds to tens of minutes.
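A rough sketch of those two steps using the google-cloud-bigquery client directly (the values are illustrative, and the actual plumbing inside the dbt adapter differs):

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes application-default credentials and project

# Step 1: submit the query. This timeout bounds the jobs.insert API call,
# i.e. how long we wait for BigQuery to create the job at all.
query_job = client.query("SELECT 1", timeout=30)

# Step 2: wait for the job to execute and stream back the results. This timeout
# bounds query execution, which can run from seconds to many minutes.
rows = list(query_job.result(timeout=1800))
```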
Checklist
I have updated the CHANGELOG.md and added information about my change to the "dbt-bigquery next" section.