Clean-up log retention rules #663

Closed
9 tasks done
robrap opened this issue May 28, 2024 · 5 comments

@robrap
Contributor

robrap commented May 28, 2024

AC:

Notes:

  • Review both the edx-logs and main index to see what needs to be adjusted.
  • The main index has an edx rule for deployment:edx. It has other edx rules as well.
  • All temp retention rules should be removed, or ticketed based on blockers? Their name could get the ticket link. Review them all.
  • See the Aug 21, 2024 comment on this issue (#663) for more notes.
@robrap robrap added this to Arch-BOM May 28, 2024
@robrap robrap converted this from a draft issue May 28, 2024
@jristau1984 jristau1984 moved this to Groomed in Arch-BOM Jul 1, 2024
@jristau1984 jristau1984 moved this from Groomed to Backlog in Arch-BOM Jul 2, 2024
@robrap
Contributor Author

robrap commented Aug 21, 2024

I was considering closing this, but I learned from this search on service:edx-* index:main that we were capturing edx logs outside of index:edx-logs. The following issues exist:

  • I thought I had learned that each index has a catch-all of 100% retention unless otherwise noted, but it seems logs can go through multiple indexes if no retention rule is hit. We need to move the 100% catch-all for edx services from main to edx-logs. Can you have a rule that matches all, so you don't need to repeat the index rule?
  • edx-edxapp-lms-frontend and edx-edxapp-lms-cms are not matching because "frontend" is in the wrong place in the name. They should be named edx-frontend-*. See code here. Does this need a separate ticket? It looks like the application name can be changed in the UI, but it should probably be renamed in both places at around the same time.
  • Once this is fixed, do we want a monitor to ensure no edx services land in index:main?
  • Once this is all fixed, hopefully the edx-specific rules under index:main are showing nothing in the live view, and we can disable them, and eventually delete them.
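
To illustrate the naming problem in the second bullet, here is a minimal sketch. It assumes the service filter behaves like shell-style wildcard matching; Python's `fnmatch` stands in for the log query engine and is not Datadog's actual implementation. The `matches` helper is hypothetical.

```python
from fnmatch import fnmatchcase

# Service names quoted in the comment above.
SERVICES = ["edx-edxapp-lms-frontend", "edx-edxapp-lms-cms"]


def matches(service: str, pattern: str) -> bool:
    """Approximate a `service:<pattern>` wildcard filter (assumption: glob-like)."""
    return fnmatchcase(service, pattern)


if __name__ == "__main__":
    for name in SERVICES:
        # Compare the narrow frontend pattern with the broad edx prefix pattern.
        print(name, matches(name, "edx-frontend-*"), matches(name, "edx-*"))
```

Under that assumption, neither name matches `edx-frontend-*`, while both match the broader `edx-*`, which is why either a rename or a wider index filter would be needed.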

@robrap robrap changed the title from "Final-ish log retention tuning" to "Clean-up log retention rules" Aug 21, 2024
@robrap robrap moved this from Backlog to Ready For Development in Arch-BOM Aug 21, 2024
@robrap robrap self-assigned this Aug 26, 2024
@robrap robrap moved this from Ready For Development to In Progress in Arch-BOM Aug 26, 2024
@robrap
Contributor Author

robrap commented Aug 26, 2024

Started Slack thread around renaming legacy edxapp RUM services, and cross-posted to several channels. Need to determine if we can complete this as part of this ticket, or if we want to create a temporary fix to the index rule and ticket completing that work elsewhere.

@robrap
Contributor Author

robrap commented Aug 26, 2024

I'm having trouble with Live Tail in some of the retention rules, so I posted a support ticket here: https://help.datadoghq.com/hc/en-us/requests/1820640.

@robrap
Contributor Author

robrap commented Aug 30, 2024

I just created #786, because it would be very useful for RUM service renaming in this ticket. I may end up picking that up.

@robrap
Contributor Author

robrap commented Sep 6, 2024

Updates made:

  • Updated Sept ?, 5:07pm: Changed index:edx-logs query to: aws_account_name:_edx* OR (service:edx-* -aws_account_name:*).
    • Captures the lms and cms frontend services, which have "frontend" in a different part of the name.
    • Captures lms, cms, and other logs that don't have the associated aws_account_name for some reason.
  • Deleted index:edx-logs filter with service:*edx-frontend-app-account, since it had no matches other than "edx-frontend-app-account", which would be caught by the "edx services" filter.
  • Updated index:edx-logs filter for service:monitoring-scripts to exclude all (not just multi-line) and dropped the "TEMP" designation.
  • Updated index:edx-logs filter "[TEMP] Exclude invalid traceparent." with link to ticket for resolving.
  • Deleted index:main filter for deployment:edx. No logs were found for this search.
  • Deleted index:main "EdX Clusters" filter that was disabled.
  • Deleted index:main "edx services" filter using aws_account_name:_edx-prod. No logs were found for this search.
  • Deleted index:main filter using service:tmccormack-local-dev.
  • Deleted new monitor for service:edx-* in index:main, because other teams are using the edx- prefix.
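
The updated index:edx-logs query from the first bullet can be read as a small predicate. The sketch below is an approximation only: it assumes `*` is a glob-style wildcard and that `-aws_account_name:*` means "the attribute is absent". The function name `in_edx_logs` and the sample account name `other-team-account` are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Optional


def in_edx_logs(service: Optional[str], aws_account_name: Optional[str]) -> bool:
    """Approximate: aws_account_name:_edx* OR (service:edx-* -aws_account_name:*)."""
    has_account = aws_account_name is not None
    # First clause: any log from an AWS account whose name starts with "_edx".
    edx_account = has_account and fnmatchcase(aws_account_name, "_edx*")
    # Second clause: edx-prefixed services, but only when no account name is attached.
    edx_service_without_account = (
        service is not None
        and fnmatchcase(service, "edx-*")
        and not has_account
    )
    return edx_account or edx_service_without_account


if __name__ == "__main__":
    print(in_edx_logs("edx-edxapp-lms-frontend", None))
    print(in_edx_logs("edx-something", "other-team-account"))
```

Read this way, the second clause picks up edx-prefixed services that lack an account name, while an edx-prefixed service from a non-edx account stays out of the index, which fits the note that other teams also use the edx- prefix.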

@robrap robrap closed this as completed Sep 9, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in Arch-BOM Sep 9, 2024
@jristau1984 jristau1984 moved this from Done to Done - Long Term Storage in Arch-BOM Sep 30, 2024