
Feature backoff and delay #680

Merged: 14 commits merged into log2timeline:main on Dec 22, 2022
Conversation

@bskeggs (Collaborator) commented Dec 2, 2022

This PR adds two optional arguments to the gcp_logging_collect recipe: backoff and delay. They are introduced to help manage GCP's logging API limits, which I have regularly exceeded when using dftimewolf.

Both are disabled by default, so they should not impact existing dftimewolf workflows.

Backoff is a boolean value which, if True, will retry log collection if the GCP logging API quota is exceeded. Due to the way the logging API handles subsequent requests after the API limit is exceeded (#679), a new cloud logging client must be created for the retry. This means the collection must start again from the beginning. If no delay was configured, the collection is retried with a delay of 1s between each request; if a delay was configured, the collection is retried with a delay of 2x the set value.

Delay is the number of seconds to delay between each API request.
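The retry behaviour described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual module code: `collect_logs`, `fetch_all` and `QuotaExceededError` are invented names standing in for the real google-cloud-logging client plumbing.

```python
import time


class QuotaExceededError(Exception):
  """Stand-in for the quota error raised by the GCP logging API."""


def collect_logs(fetch_all, backoff=False, delay=0.0, sleep=time.sleep):
  """Run a full log collection, optionally retrying at a slower rate.

  fetch_all(delay) is assumed to run the whole query with `delay`
  seconds between API requests, raising QuotaExceededError if the
  quota is hit. Because the API keeps rejecting requests made by the
  same client after a quota error (#679), the retry must restart the
  collection from scratch.
  """
  try:
    return fetch_all(delay)
  except QuotaExceededError:
    if not backoff:
      raise  # the collection is most likely incomplete
    # No delay configured: retry at 1 request per second.
    # Delay configured: retry with double the configured delay.
    retry_delay = 1.0 if delay == 0 else delay * 2
    sleep(60)  # wait out the quota window before restarting
    return fetch_all(retry_delay)
```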

When dftimewolf is run without the delay or backoff arguments and the API limit is exceeded, the output looks like this:

poetry run dftimewolf gcp_logging_collect 'projectname' 'timestamp >= "2022-11-30T00:00:00Z" AND timestamp <= "2022-12-08T08:00:00Z"'
  [ dftimewolf       ] Debug log: /tmp/dftimewolf-run-20221202_081136_l_1y_kd9.log
  [ GCPLogsCollector ] backoff False
  [ GCPLogsCollector ] delay 0.0
  [ GCPLogsCollector ] Hit quota limit requesting GCP logs.
  [ GCPLogsCollector ] Exponential backoff was not enabled, so query has exited.
  [ GCPLogsCollector ] The collection is most likely incomplete.
  [ GCPLogsCollector ] Downloaded logs to /tmp/tmpjfwq0jhv.jsonl

When backoff and delay are specified, the output looks like this:

poetry run dftimewolf gcp_logging_collect 'projectname' 'timestamp >= "2022-11-30T00:00:00Z" AND timestamp <= "2022-12-08T08:00:00Z"' --backoff --delay 1
  [ dftimewolf       ] Debug log: /tmp/dftimewolf-run-20221202_081057_1ph_z9ws.log
  [ GCPLogsCollector ] backoff True
  [ GCPLogsCollector ] delay 1.0
  [ GCPLogsCollector ] Hit quota limit requesting GCP logs.
  [ GCPLogsCollector ] Retrying in 60 seconds with a slower query rate.
  [ GCPLogsCollector ] Due to the GCP logging API, the query must restart from the beginning
  [ GCPLogsCollector ] Setting up new logging client.
  [ GCPLogsCollector ] Restarting query with an API request rate of 1 per 2s
  [ GCPLogsCollector ] Downloaded logs to /tmp/tmpfr1nz782.jsonl

Please note that I had to break the bulk of the code in the 'Process' function up into distinct functions, although most of the code itself is unchanged. The split was necessary so that, once the API limit is exceeded, the functions that create a new logging client and set a new output path can be called again.
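Structurally, the split looks something like the skeleton below. This is a hypothetical sketch for illustration only, not the actual dftimewolf module; all names are invented, and the real client construction uses google.cloud.logging.

```python
import tempfile


class GCPLogsCollectorSketch:
  """Hypothetical skeleton of the split-up 'Process' function."""

  def __init__(self):
    self._client = None
    self._output_path = None

  def _build_client(self):
    # In the real module this constructs a new google.cloud.logging
    # client; a fresh client is required after a quota error (#679).
    return object()

  def _new_output_file(self):
    # A new temporary .jsonl output file for each (re)started run.
    handle = tempfile.NamedTemporaryFile(suffix='.jsonl', delete=False)
    handle.close()
    return handle.name

  def _start_collection(self):
    """Called on the first attempt and again on each retry."""
    self._client = self._build_client()
    self._output_path = self._new_output_file()

  def process(self):
    self._start_collection()
    return self._output_path
```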

@ramo-j (Collaborator) left a comment

Thanks for the PR! Logic looks solid to me, just a few nits, and noting the failing automated checks.

Also noting that we don't have unit tests for this module, so raised #681.

dftimewolf/lib/collectors/gcp_logging.py: 6 review threads (outdated, resolved)
@bskeggs (Collaborator, Author) commented Dec 13, 2022

Thanks for the review @ramo-j. I've addressed your suggestions and pushed the changes. Please let me know if there is anything else I can do regarding the automated checks.

@bskeggs (Collaborator, Author) commented Dec 13, 2022

Should be good to go on the linting tests now @ramo-j , thanks!

@bskeggs requested a review from @ramo-j on December 21, 2022 08:19
@@ -14,7 +14,9 @@
"name": "GCPLogsCollector",
"args": {
"project_name": "@project_name",
"filter_expression": "logName=projects/@project_name/logs/cloudaudit.googleapis.com%2Factivity timestamp>\"@start_date\" timestamp<\"@end_date\""
"filter_expression": "@filter_expression",
Here and in other recipes: Any arguments to be interpolated (that is, @arg_name) into a recipe need to be added into the "args" section. In this recipe, that would be filter_expression, backoff and delay.

Also, do we want to change up the filter expression here anyway? I didn't author the relevant module, so I'm not sure why the original filter_expression takes the form it does, but I assume there must have been a reason. How does changing this affect log collection? I notice data/recipes/gcp_logging_collect.json has the form you are changing to, so I assume it's probably fine, but we should make sure we're not breaking any expected behaviour.
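To make the interpolation point concrete, the expected recipe shape would be roughly the following. This is a minimal sketch based on the diff above, not a complete recipe; help strings are abbreviated.

```json
{
  "modules": [{
    "name": "GCPLogsCollector",
    "args": {
      "project_name": "@project_name",
      "filter_expression": "@filter_expression",
      "backoff": "@backoff",
      "delay": "@delay"
    }
  }],
  "args": [
    ["filter_expression", "Filter expression to use to query GCP logs.", "resource.type = 'gce_instance'"],
    ["--backoff", "Retry at a slower rate if API query limits are exceeded.", false],
    ["--delay", "Number of seconds to wait between each query.", 0]
  ]
}
```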

Comment on lines 24 to 26
["filter_expression", "Filter expression to use to query GCP logs. See https://cloud.google.com/logging/docs/view/query-library for examples.", "resource.type = 'gce_instance'"],
["--backoff", "If API query limits are exceeded, retry with an increased delay between each query to try complete the query at a slower rate.", false],
["--delay", "Number of seconds to wait between each query to avoid hitting API query limits", 0]
As per my other comment, you'll just need to copy these additions into the other recipes you've changed.

@ramo-j merged commit cfa6ca2 into log2timeline:main on Dec 22, 2022