[BugFix] fix scan blocked by unreleased tokens #36836

Merged
merged 1 commit into from
Dec 12, 2023

@fzhedu fzhedu commented Dec 12, 2023

Why I'm doing:

Some scan operators are blocked because they can't get tokens, which are held by others. This happens in a fragment with scan + limit: some drivers run faster and reach the limit, but the scan's has_output() always returns the full state, so the incoming drivers can't be issued, which blocks the fragment from finishing.

What I'm doing:

Let a finishing scan release its token in time.
In addition, add some key logs for when the scan is blocked, to help fix similar issues in the future.
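
To make the failure mode concrete, here is a minimal single-threaded sketch of the idea, not the actual StarRocks code. `TokenPool`, `ScanDriver`, `set_finishing` and `count_started` are hypothetical names invented for illustration; the real change lives in `scan_operator.cpp` and may be structured quite differently. The sketch only assumes that each driver needs one token to be issued and that a driver finishes as soon as it reaches the limit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-size token pool (illustration only, not StarRocks code).
class TokenPool {
public:
    explicit TokenPool(std::size_t capacity) : _available(capacity) {}

    // Hands out one token if any is left.
    bool try_acquire() {
        if (_available == 0) return false;
        --_available;
        return true;
    }

    void release() { ++_available; }

private:
    std::size_t _available;
};

// Hypothetical scan driver that needs one token before it can be issued.
class ScanDriver {
public:
    explicit ScanDriver(TokenPool* pool) : _pool(pool) {}

    bool try_start() {
        if (!_holds_token && _pool->try_acquire()) _holds_token = true;
        return _holds_token;
    }

    // Called when the driver reaches the limit. With release_token == false
    // the token stays held forever, which models the bug described above.
    void set_finishing(bool release_token) {
        if (release_token && _holds_token) {
            _pool->release();
            _holds_token = false;
        }
    }

private:
    TokenPool* _pool;
    bool _holds_token = false;
};

// Counts how many drivers get issued when every issued driver reaches the
// limit immediately and then finishes.
std::size_t count_started(std::size_t num_drivers, std::size_t num_tokens, bool release_on_finish) {
    TokenPool pool(num_tokens);
    std::vector<ScanDriver> drivers(num_drivers, ScanDriver(&pool));
    std::size_t started = 0;
    for (auto& driver : drivers) {
        if (driver.try_start()) {
            ++started;
            driver.set_finishing(release_on_finish);
        }
    }
    return started;
}
```

Under these assumptions, `count_started(4, 2, false)` issues only two of the four drivers because the finished ones never give their tokens back, while `count_started(4, 2, true)` issues all four.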

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.2
    • 3.1
    • 3.0
    • 2.5

@@ -409,6 +419,8 @@ Status ScanOperator::_trigger_next_scan(RuntimeState* state, int chunk_source_in
    int64_t prev_scan_bytes = chunk_source->get_scan_bytes();
    auto status = chunk_source->buffer_next_batch_chunks_blocking(state, kIOTaskBatchSize, _workgroup.get());
    if (!status.ok() && !status.is_end_of_file()) {
        LOG(ERROR) << "scan fragment " << print_id(state->fragment_instance_id()) << " driver "
                   << get_driver_sequence() << " Scan tasks error: " << status.to_string();
        _set_scan_status(status);
    }

The most risky bug in this code is:
Concurrent modification of shared resources without proper synchronization may lead to undefined behavior or data races.

You can modify the code like this:

void ScanOperator::close(RuntimeState* state) {
    std::lock_guard guard(_task_mutex); // Acquire the lock before modifying shared resources
    set_buffer_finished();
    // For the running io task, we close its chunk sources in ~ScanOperator not in ScanOperator::close.
    for (size_t i = 0; i < _chunk_sources.size(); i++) {
        // std::lock_guard guard(_task_mutex); // This line is commented out as we've already acquired the lock above
        ...

Note: The provided code snippet seems to be a diff patch from a version control system. However, the context around modifications suggests that there is some thread-sensitive logic regarding the management of IO tasks and buffer state. Moving set_buffer_finished() outside of the close method without locking (_task_mutex) could result in data races if another thread is simultaneously involved with the chunk sources or related counters such as _num_running_io_tasks or _submit_task_counter.

Acquiring the mutex lock at the start of the ScanOperator::close method can help ensure these shared resources are protected against concurrent access, preventing data races and ensuring the consistency of the runtime state.
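
The locking pattern suggested in this comment can be sketched in isolation as follows. This is a simplified stand-in, not the real `ScanOperator`: the class `SketchScanOperator`, its members other than the `_task_mutex` name, and the helper `submit_around_close` are assumptions made up for illustration. It shows one `std::lock_guard` taken at the top of `close()`, so the finished flag and the per-source state change together, and a submit path that takes the same mutex before reading the flag.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical, simplified stand-in for a scan operator (not StarRocks code).
class SketchScanOperator {
public:
    explicit SketchScanOperator(std::size_t num_sources) : _source_closed(num_sources, false) {}

    // Takes the lock once, so the flag and every source are updated atomically
    // with respect to try_submit_task().
    void close() {
        std::lock_guard<std::mutex> guard(_task_mutex);
        _buffer_finished = true;
        for (std::size_t i = 0; i < _source_closed.size(); ++i) {
            // No second lock here: std::mutex is not recursive, so locking it
            // again on the same thread would be undefined behavior.
            _source_closed[i] = true;
        }
    }

    // An I/O thread would call this; it observes the state either entirely
    // before close() or entirely after it.
    bool try_submit_task() {
        std::lock_guard<std::mutex> guard(_task_mutex);
        if (_buffer_finished) return false;
        ++_submitted;
        return true;
    }

    std::size_t submitted() const {
        std::lock_guard<std::mutex> guard(_task_mutex);
        return _submitted;
    }

private:
    mutable std::mutex _task_mutex;
    bool _buffer_finished = false;
    std::vector<bool> _source_closed;
    std::size_t _submitted = 0;
};

// Tries `before` submissions, closes the operator, then tries `after` more,
// and returns how many submissions were accepted.
std::size_t submit_around_close(std::size_t before, std::size_t after) {
    SketchScanOperator op(3);
    for (std::size_t i = 0; i < before; ++i) op.try_submit_task();
    op.close();
    for (std::size_t i = 0; i < after; ++i) op.try_submit_task();
    return op.submitted();
}
```

In this sketch `submit_around_close(2, 5)` accepts only the two submissions made before `close()`. Whether holding the lock for the whole of `close()` is acceptable in the real operator depends on what the elided loop body does, which this sketch does not model.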

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)

[BE Incremental Coverage Report]

fail : 2 / 10 (20.00%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
| --- | --- | --- | --- | --- |
| 🔵 src/exec/pipeline/scan/scan_operator.cpp | 2 | 10 | 20.00% | [234, 235, 236, 237, 238, 239, 422, 423] |

@fzhedu fzhedu merged commit d3621d0 into StarRocks:main Dec 12, 2023
44 of 45 checks passed

@Mergifyio backport branch-3.2

@github-actions github-actions bot removed the 3.2 label Dec 12, 2023

@Mergifyio backport branch-3.1

@github-actions github-actions bot removed the 3.1 label Dec 12, 2023

@Mergifyio backport branch-3.0

@Mergifyio backport branch-2.5

mergify bot commented Dec 12, 2023

backport branch-3.2

✅ Backports have been created

mergify bot commented Dec 12, 2023

backport branch-3.1

✅ Backports have been created

mergify bot commented Dec 12, 2023

backport branch-3.0

✅ Backports have been created

mergify bot commented Dec 12, 2023

backport branch-2.5

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Dec 12, 2023
Signed-off-by: Zhuhe Fang <[email protected]>
(cherry picked from commit d3621d0)
mergify bot pushed a commit that referenced this pull request Dec 12, 2023
Signed-off-by: Zhuhe Fang <[email protected]>
(cherry picked from commit d3621d0)
mergify bot pushed a commit that referenced this pull request Dec 12, 2023
Signed-off-by: Zhuhe Fang <[email protected]>
(cherry picked from commit d3621d0)
mergify bot pushed a commit that referenced this pull request Dec 12, 2023
Signed-off-by: Zhuhe Fang <[email protected]>
(cherry picked from commit d3621d0)
wanpengfei-git pushed a commit that referenced this pull request Dec 12, 2023
wanpengfei-git pushed a commit that referenced this pull request Dec 13, 2023
andyziye pushed a commit that referenced this pull request Dec 15, 2023
Signed-off-by: Zhuhe Fang <[email protected]>
(cherry picked from commit d3621d0)
andyziye pushed a commit that referenced this pull request Dec 15, 2023
wanpengfei-git pushed a commit that referenced this pull request Dec 21, 2023