Occasionally ingesters fail to clear a head block #569

Closed
joe-elliott opened this issue Mar 4, 2021 · 0 comments · Fixed by #570

Describe the bug
After a successful flush, ingesters sometimes fail to clear the head block as they should. The usual sequence of events is:

  • Fail to flush head block
  • Run out of disk space filling the next head block
  • OOM
  • Restart and replay WAL

Sometimes on restart the ingester replays the WAL, flushes it, and carries on normally. Other times the ingester enters a crash loop while attempting to replay the WAL and requires manual intervention.

When Tempo is in this state, lsof shows that the filesystem considers the files deleted even though the Tempo process still holds them open:

lsof -n | grep deleted
1	/tempo	/var/tempo/wal/7a5357b8-13fd-47b9-80c0-15c7279958a1:1 (deleted)
1	/tempo	/var/tempo/wal/38e10487-d4fa-4fc5-9c6a-63a01ae5ffa8:1 (deleted)
1	/tempo	/var/tempo/wal/completed/898aff05-a7e9-4ff5-ace2-28eee5070971:1 (deleted)
1	/tempo	/var/tempo/wal/completed/e3279e02-ff5d-4d50-b05f-fdf48d32e03b:1 (deleted)

Likely there is a code path in the ingester that accidentally leaves a handle open to the head block.
