feat(v2): evict failed jobs from compaction scheduler queue #3892

Open: wants to merge 3 commits into main

@kolesnikovae (Collaborator) commented Feb 5, 2025

This change introduces a mechanism to evict faulty jobs from the compaction scheduler queue. Queue overflow is primarily a concern during development, but it can also occur in other scenarios.

To avoid infinite reassignment loops, the scheduler tracks the number of reassignments (failures) for each job. If the
number of failures exceeds a configured threshold (the error limit), the job is no longer reassigned and remains at the
bottom of the queue. Once the cause of the failures is resolved, the error limit can be temporarily increased so that
these jobs are processed again.
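
A minimal Go sketch of the failure accounting described above, assuming a simple in-memory queue. The names `job`, `queue`, `reportFailure`, `assignable`, `next`, and `errorLimit` are hypothetical and are not the scheduler's actual API; the sketch only illustrates that a job whose failure count exceeds the limit is skipped and kept at the bottom of the queue until the limit is raised.

```go
package main

import (
	"fmt"
	"sort"
)

// job is a hypothetical queue entry (not the real scheduler type).
type job struct {
	name     string
	failures int // how many times the job failed and was reassigned
}

// queue is a hypothetical scheduler queue with a configurable error limit.
type queue struct {
	jobs       []*job
	errorLimit int // jobs with more failures than this are not reassigned
}

// reportFailure records one more failed attempt for the job.
func (q *queue) reportFailure(j *job) { j.failures++ }

// assignable reports whether the job may be handed to a worker again.
func (q *queue) assignable(j *job) bool { return j.failures <= q.errorLimit }

// next moves jobs over the error limit to the bottom of the queue (otherwise
// preserving order) and returns the first assignable job, or nil if none.
func (q *queue) next() *job {
	sort.SliceStable(q.jobs, func(a, b int) bool {
		return q.assignable(q.jobs[a]) && !q.assignable(q.jobs[b])
	})
	if len(q.jobs) > 0 && q.assignable(q.jobs[0]) {
		return q.jobs[0]
	}
	return nil
}

func main() {
	a, b := &job{name: "a"}, &job{name: "b"}
	q := &queue{jobs: []*job{a, b}, errorLimit: 2}

	// Job "a" keeps failing; after the third failure it exceeds the limit.
	for i := 0; i < 3; i++ {
		q.reportFailure(a)
	}
	fmt.Println(q.next().name) // b

	// Once the cause is fixed, temporarily raising the limit makes "a" eligible again.
	q.errorLimit = 3
	fmt.Println(q.assignable(a)) // true
}
```

The stable sort partitions the queue into assignable and exhausted jobs without reordering jobs within each group, which models "remains at the bottom of the queue" without dropping anything.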

The scheduler queue has a size limit. In practice, this limit is reached only when the compaction process is not
functioning correctly (e.g., due to a bug in the compaction procedure), which prevents blocks from being compacted and
leaves many jobs in a failed state. Once the queue size limit is reached, failed jobs are evicted, meaning the
corresponding blocks will never be compacted. This may cause read amplification for data queries and bloat the metadata
index. Therefore, the size limit should be large enough to hold failed jobs until the underlying problem is fixed. The
recommended course of action is to roll back or fix the bug and restart the compaction process, temporarily increasing
the error limit if necessary.
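
The eviction rule can be sketched in Go as follows, again under stated assumptions. `boundedQueue`, `sizeLimit`, `push`, and `markFailed` are hypothetical names; jobs that exceeded the error limit are kept in a separate `failed` list that models the bottom of the queue; failed jobs are evicted oldest-first; and when the queue is full but holds no failed jobs, the sketch still accepts the new job, because the description does not say what happens in that case.

```go
package main

import "fmt"

// boundedQueue is a hypothetical model of the scheduler queue: active jobs are
// still eligible for assignment; failed jobs exceeded the error limit and sit
// at the bottom of the queue.
type boundedQueue struct {
	active    []string
	failed    []string
	sizeLimit int
}

func (q *boundedQueue) size() int { return len(q.active) + len(q.failed) }

// markFailed moves a job from the active list to the bottom of the queue.
func (q *boundedQueue) markFailed(name string) {
	for i, n := range q.active {
		if n == name {
			q.active = append(q.active[:i], q.active[i+1:]...)
			q.failed = append(q.failed, name)
			return
		}
	}
}

// push adds a new job. While the queue is at its size limit, the oldest failed
// job is evicted; evicted jobs are returned so the caller can log them, since
// their blocks will never be compacted.
func (q *boundedQueue) push(name string) (evicted []string) {
	for q.size() >= q.sizeLimit && len(q.failed) > 0 {
		evicted = append(evicted, q.failed[0])
		q.failed = q.failed[1:]
	}
	// Assumption: if no failed jobs remain, the job is accepted anyway.
	q.active = append(q.active, name)
	return evicted
}

func main() {
	q := &boundedQueue{sizeLimit: 3}
	q.push("job-1")
	q.push("job-2")
	q.push("job-3")
	q.markFailed("job-1")           // job-1 exceeded the error limit
	fmt.Println(q.push("job-4"))    // [job-1]
	fmt.Println(q.active, q.failed) // [job-2 job-3 job-4] []
}
```

Only failed jobs are ever evicted here, so a healthy queue is never trimmed; this is also why a size limit that is too small turns a temporary compaction bug into permanently uncompacted blocks.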

@kolesnikovae marked this pull request as ready for review February 5, 2025 12:18
@kolesnikovae requested a review from a team as a code owner February 5, 2025 12:18