
[AIRFLOW-4236] Add num_retries to MySqlToGoogleCloudStorageOperator #5043

Closed
wants to merge 5 commits

Conversation

@ryanyuan (Contributor) commented Apr 5, 2019

Add num_retries to MySqlToGoogleCloudStorageOperator

Make sure you have checked all steps below.

Jira

  • My PR addresses AIRFLOW-4236 and references it in the PR title, e.g. "[AIRFLOW-4236] Add num_retries to MySqlToGoogleCloudStorageOperator".
    • https://issues.apache.org/jira/browse/AIRFLOW-4236
    • In case you are fixing a typo in the documentation you can prepend your commit with [AIRFLOW-XXX], code changes always need a Jira issue.
    • In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal (AIP).
    • In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Description

  • Here are some details about my PR, including screenshots of any UI changes:
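    As a rough sketch of what the title describes (the real diff is not reproduced here; the class, argument, and method names below are simplified stand-ins rather than the actual Airflow code), the idea is to let the operator pass a retry count through to the GCS upload call instead of relying on whatever the hook does by default:

    ```python
    # Hypothetical, simplified sketch -- not the actual code changed by this PR.
    class MySqlToGcsOperatorSketch:
        def __init__(self, bucket, filename, num_retries=5):
            self.bucket = bucket
            self.filename = filename
            # New knob: how many times the GCS upload is retried internally within
            # a single task attempt (distinct from BaseOperator.retries).
            self.num_retries = num_retries

        def _upload_to_gcs(self, gcs_hook, tmp_file_path):
            # Forward num_retries to the hook rather than leaving the retry count
            # hard-coded (or unset) inside the hook's upload call.
            gcs_hook.upload(
                bucket=self.bucket,
                object=self.filename,
                filename=tmp_file_path,
                num_retries=self.num_retries,
            )
    ```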

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards-incompatible changes, please leave a note in UPDATING.md so we can assign it to an appropriate release

Code Quality

  • Passes flake8

@codecov-io commented Apr 5, 2019

Codecov Report

Merging #5043 into master will decrease coverage by 0.12%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5043      +/-   ##
==========================================
- Coverage   76.36%   76.24%   -0.13%     
==========================================
  Files         471      466       -5     
  Lines       30290    30102     -188     
==========================================
- Hits        23130    22950     -180     
+ Misses       7160     7152       -8
Impacted Files Coverage Δ
airflow/contrib/operators/mysql_to_gcs.py 90.14% <100%> (+0.06%) ⬆️
airflow/contrib/hooks/cloudant_hook.py 0% <0%> (-100%) ⬇️
airflow/utils/helpers.py 82.51% <0%> (-0.36%) ⬇️
airflow/contrib/executors/kubernetes_executor.py 63.17% <0%> (-0.21%) ⬇️
airflow/contrib/kubernetes/worker_configuration.py 95.72% <0%> (-0.18%) ⬇️
airflow/utils/db.py 90.29% <0%> (-0.1%) ⬇️
...rflow/contrib/operators/kubernetes_pod_operator.py 98.59% <0%> (-0.04%) ⬇️
.../kubernetes_request_factory/pod_request_factory.py 100% <0%> (ø) ⬆️
...example_dags/example_kubernetes_executor_config.py 0% <0%> (ø) ⬆️
airflow/contrib/hooks/gcp_api_base_hook.py 84.76% <0%> (ø) ⬆️
... and 7 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b93f264...d32beb9.

@OmerJog (Contributor) commented Apr 5, 2019

Thanks for picking this up.
The ticket mentions that this is an issue with other operators as well (though it doesn't name them specifically):

CassandraToGoogleCloudStorageOperator
PostgresToGoogleCloudStorageOperator
S3ToGoogleCloudStorageOperator
AdlsToGoogleCloudStorageOperator

These also don't allow setting the num_retries of the upload method in the GoogleCloudStorageHook.

Can you address them as well? If not, it would be better to link this PR to a new Jira ticket that is a sub-task of https://issues.apache.org/jira/browse/AIRFLOW-4236.

@kaxil (Member) commented Apr 5, 2019

Better to address similar issues in one PR than to split them across many.

@ryanyuan (Contributor, author) commented Apr 5, 2019

@OmerJog Cool. I will get it done.

@mik-laj (Member) commented Apr 5, 2019

Other operators have a fixed number of retry attempts, for example:

).execute(num_retries=5)

I'm not sure it's a good idea to give the user the option to configure this parameter in this way. Too many parameters will make it difficult to use the operator. In my opinion, this value should be configured in the connection settings rather than at each specific use, but a fixed value is good enough in this case.
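As a minimal sketch of that connection-level approach (the "num_retries" extras key below is an assumed name for illustration, not an established Airflow field), a hook could read a per-connection default like this:

```python
# Hypothetical sketch: read a per-connection retry default from the
# connection's extras; fall back to a fixed default if it is absent or invalid.
def get_num_retries_from_connection(connection, default=5):
    extras = connection.extra_dejson  # Airflow Connection exposes extras as a dict
    try:
        return int(extras.get("num_retries", default))
    except (TypeError, ValueError):
        return default
```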

I'm working on a similar mechanism for Google Cloud AI operators: https://github.com/PolideaInternal/airflow/pull/85/files
I want to support retrying in reaction to exceeding a temporary quota. I would like all GCP operators to be used in a similar way, so it is important to me to settle on one variant.

CC: @potiuk

@ryanyuan (Contributor, author) commented Apr 7, 2019

@kaxil Any opinions on what @mik-laj suggested above?

@ashb (Member) commented Apr 8, 2019

This parameter is confusingly similar to the BaseOperator.retries parameter.

If we were to have this parameter, the docs would need to be much clearer about what is "internal" to one task attempt compared to the number of task attempts, without changing the name on BaseOperator.

@kaxil (Member) commented Apr 8, 2019

I agree with @mik-laj. It would be ideal to have a uniform number across all the operators. We could have a max retries in the connection, a default value in the hooks, and allow it to be overridden in the operator. Keep in mind that we also have retries for all Airflow operators, so we need to make sure the two don't conflict.
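One way to read that precedence, sketched with made-up names (operator value if given, otherwise the hook default, capped by a connection-level maximum); this is an illustration of the suggestion, not existing Airflow behaviour:

```python
# Hypothetical precedence sketch for the suggestion above.
def resolve_num_retries(operator_value=None, hook_default=5, connection_max=10):
    # Operator-level override wins when provided, else fall back to the hook default.
    value = operator_value if operator_value is not None else hook_default
    # Never exceed the maximum configured on the connection.
    return min(value, connection_max)

resolve_num_retries()                   # -> 5  (hook default)
resolve_num_retries(operator_value=20)  # -> 10 (capped by the connection max)
```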

@ashb (Member) commented Apr 8, 2019

Another example of "internal" retries

def poll_query_status(self, query_execution_id, max_tries=None):

@OmerJog (Contributor) commented Apr 8, 2019

I think having a default value in connection is a good approach.

@OmerJog (Contributor) commented Apr 11, 2019

@kaxil your PR removed num_retries:
#5054
You mentioned that the new library handles multipart uploads on its own, but you didn't say what happens with num_retries.

@kaxil (Member) commented Apr 11, 2019

@OmerJog Sorry, I should have included it in UPDATING.md.

Upload requests will be automatically retried.

From https://github.com/googleapis/google-cloud-python/blob/master/storage/google/cloud/storage/blob.py#L87 :

"`num_retries` has been deprecated and will be removed in a future "
    "release. The default behavior (when `num_retries` is not specified) when "
    "a transient error (e.g. 429 Too Many Requests or 500 Internal Server "
    "Error) occurs will be as follows: upload requests will be automatically "
    "retried. Subsequent retries will be sent after waiting 1, 2, 4, 8, etc. "
    "seconds (exponential backoff) until 10 minutes of wait time have "
    "elapsed. At that point, there will be no more attempts to retry."

@OmerJog (Contributor) commented Apr 14, 2019

@kaxil thanks. So it sounds like this PR is no longer needed; the fix in your PR solves the problem.

@ryanyuan (Contributor, author) commented:

@OmerJog Yes, I will close this PR and create another PR to let the user set a default number of request retries in the GCP connection.
