Occasionally ingesters fail to clear a head block #569

joe-elliott · 2021-03-04T18:36:11Z

Describe the bug
Sometimes, after successfully flushing, ingesters fail to clear the head block as they should. The sequence of events is typically:

1. Fail to flush head block
2. Run out of disk space filling the next head block
3. OOM
4. Restart and replay WAL

Sometimes on restart the ingester replays the WAL, flushes it, and continues on its merry way. Other times it enters a crash loop while attempting to replay the WAL and requires manual intervention.

When Tempo is in this state, lsof shows that the filesystem considers the files deleted even though the process still holds them open:

lsof -n | grep deleted
1	/tempo	/var/tempo/wal/7a5357b8-13fd-47b9-80c0-15c7279958a1:1 (deleted)
1	/tempo	/var/tempo/wal/38e10487-d4fa-4fc5-9c6a-63a01ae5ffa8:1 (deleted)
1	/tempo	/var/tempo/wal/completed/898aff05-a7e9-4ff5-ace2-28eee5070971:1 (deleted)
1	/tempo	/var/tempo/wal/completed/e3279e02-ff5d-4d50-b05f-fdf48d32e03b:1 (deleted)

Likely there is a code path in the ingester that accidentally leaves a handle open to the head block.


joe-elliott assigned mdisibio Mar 4, 2021

mdisibio mentioned this issue Mar 5, 2021

Close gaps where append/completeblock file handles would rely on GC to close #570

Merged


joe-elliott closed this as completed in #570 Mar 8, 2021
