Feature backoff and delay #680
Conversation
Thanks for the PR! Logic looks solid to me, just a few nits, and noting the failing automated checks.
Also noting that we don't have unit tests for this module, so raised #681.
Thanks for the review @ramo-j. I've addressed your suggestions and pushed the changes. Please let me know if there is anything else I can do regarding the automated checks.
Should be good to go on the linting tests now @ramo-j, thanks!
@@ -14,7 +14,9 @@
       "name": "GCPLogsCollector",
       "args": {
         "project_name": "@project_name",
-        "filter_expression": "logName=projects/@project_name/logs/cloudaudit.googleapis.com%2Factivity timestamp>\"@start_date\" timestamp<\"@end_date\""
+        "filter_expression": "@filter_expression",
Here and in other recipes: any arguments to be interpolated (that is, @arg_name) into a recipe need to be added to the "args" section. In this recipe, that would be filter_expression, backoff and delay.
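To make the mechanism concrete, here is a minimal sketch of how placeholder interpolation could work; the `interpolate` helper and `RECIPE` snippet are illustrative assumptions, not dftimewolf's actual implementation:

```python
# Sketch only: if "@filter_expression" appears in a recipe but has no matching
# entry in user-supplied args, there is nothing to substitute it with.
import json

RECIPE = """
{
  "args": {
    "project_name": "@project_name",
    "filter_expression": "@filter_expression"
  }
}
"""

def interpolate(recipe_text: str, user_args: dict) -> dict:
  """Replaces each @name placeholder with the user-supplied value."""
  for name, value in user_args.items():
    recipe_text = recipe_text.replace('@' + name, value)
  return json.loads(recipe_text)

print(interpolate(RECIPE, {
    'project_name': 'my-project',
    'filter_expression': "resource.type = 'gce_instance'",
}))
```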
Also, do we want to change up the filter expression here anyway? I didn't author the relevant module, so I'm not sure why the original filter_expression takes the form it does, but I assume there must have been a reason. How does changing this affect log collection? I notice data/recipes/gcp_logging_collect.json has the form you are changing to, so I assume it's probably fine, but we should make sure we're not breaking any expected behaviour.
["filter_expression", "Filter expression to use to query GCP logs. See https://cloud.google.com/logging/docs/view/query-library for examples.", "resource.type = 'gce_instance'"], | ||
["--backoff", "If API query limits are exceeded, retry with an increased delay between each query to try complete the query at a slower rate.", false], | ||
["--delay", "Number of seconds to wait between each query to avoid hitting API query limits", 0] |
As per my other comment, you'll just need to copy these additions into the other recipes you've changed.
This PR adds two optional arguments to the gcp_logging_collect recipe: backoff and delay. These are introduced to help manage GCP's logging API limits, which I have regularly exceeded when using dftimewolf.
Both are disabled by default, so should not impact existing dftimewolf workflows.
Backoff is a boolean value which, if True, will retry log collection if the GCP logging API quota is exceeded. Due to the way the logging API handles subsequent requests after the API limit has been exceeded (#679), a new cloud logging client must be created for the retry, which means the collection must start again. If no delay was configured, the collection is retried with a delay of 1s between each request; if a delay was configured, the collection is retried with a delay of 2x the set value.
Delay is the number of seconds to wait between each API request.
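As a rough illustration of these semantics (a sketch with placeholder names, not the actual module code), the retry loop could look like this:

```python
# A sketch of the backoff/delay behaviour described above. It assumes the
# quota error surfaces as an exception and that a fresh client is needed for
# each attempt (#679); make_client, run_query and handle_page are placeholders.
import time

class QuotaExceededError(Exception):
  """Stand-in for the logging API's quota-exceeded error."""

def collect(make_client, run_query, handle_page, backoff=False, delay=0):
  while True:
    client = make_client()  # a new client per attempt, per #679
    try:
      for page in run_query(client):
        handle_page(page)
        time.sleep(delay)  # pause `delay` seconds between API requests
      return
    except QuotaExceededError:
      if not backoff:
        raise  # backoff disabled: surface the quota error as before
      # Restart from scratch at a slower rate: 1s between requests if no
      # delay was configured, otherwise double the configured delay.
      delay = 1 if delay == 0 else delay * 2
```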
When dftimewolf is run without the delay or backoff arguments and the API limit is exceeded, the output will look like this:
When backoff and delay are specified, the output will look like this:
Please note that I had to break the bulk of the code in the Process function up into distinct functions; however, most of the code is the same. I had to break it up because I needed a way to re-call those functions to create a new logging client and set a new output path once the API limit was exceeded.
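For reviewers, the shape of that refactor is roughly the following; this is a skeleton with assumed method names and empty bodies, not the real module:

```python
# Skeleton of the refactor described above; names and structure are
# assumptions for illustration only.
class QuotaExceededError(Exception):
  """Stand-in for the logging API's quota error."""

class GCPLogsCollector:

  def __init__(self, backoff=False, delay=0):
    self._backoff = backoff
    self._delay = delay

  def _CreateClient(self):
    """Would build a new cloud logging client; recreated per attempt (#679)."""

  def _CreateOutputPath(self):
    """Would open a fresh output file for each collection attempt."""

  def _CollectLogs(self, client, output_path):
    """Would page through results, sleeping self._delay between requests."""

  def Process(self):
    while True:
      client = self._CreateClient()
      output_path = self._CreateOutputPath()
      try:
        self._CollectLogs(client, output_path)
        return
      except QuotaExceededError:
        if not self._backoff:
          raise
        # Retry the whole collection at a slower rate.
        self._delay = 1 if self._delay == 0 else self._delay * 2
```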