Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BugFix] Reset all fields of runtime filter when exceeding global build size #35776

Merged
merged 3 commits into from
Nov 24, 2023

Conversation

ZiheLiu
Copy link
Contributor

@ZiheLiu ZiheLiu commented Nov 24, 2023

Why I'm doing

BE will crash, when the hash table build size exceeds global_runtime_filter_build_max_size.

*** Aborted at 1700708505 (unix time) try "date -d @1700708505" if you are using GNU date ***
PC: @     0x7fd3a38b36a6 __memcpy_ssse3_back
*** SIGSEGV (@0x1ffff0) received by PID 3045 (TID 0x7fd329f54700) from PID 2097136; stack trace: ***
    @          0x63a11c2 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7fd3a52a1cfb os::Linux::chained_handler()
    @     0x7fd3a52a686e JVM_handle_linux_signal
    @     0x7fd3a529aad8 signalHandler()
    @     0x7fd3a4462630 (unknown)
    @     0x7fd3a38b36a6 __memcpy_ssse3_back
    @          0x43ecae9 starrocks::JoinRuntimeFilter::serialize()
    @          0x2c41066 starrocks::RuntimeBloomFilter<>::serialize()
    @          0x43cf16d starrocks::RuntimeFilterHelper::serialize_runtime_filter()
    @          0x4e32e1b starrocks::RuntimeFilterPort::publish_runtime_filters()
    @          0x2e5b879 starrocks::pipeline::HashJoinBuildOperator::set_finishing()
    @          0x2ac99e7 starrocks::pipeline::PipelineDriver::_mark_operator_finishing()
    @          0x2acb8fc starrocks::pipeline::PipelineDriver::process()
    @          0x56a4e5e starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
    @          0x4f8afa2 starrocks::ThreadPool::dispatch_thread()
    @          0x4f85a3a starrocks::Thread::supervise_thread()
    @     0x7fd3a445aea5 start_thread
    @     0x7fd3a385bb0d __clone
    @                0x0 (unknown)

What I'm doing

Fixed it.

Root cause

Clear RF

After #32909 , when the bloom RF is being merged and the build size exceeds global_runtime_filter_build_max_size, it will call SimdBlockFilter::clear().
However, SimdBlockFilter::clear() does nothing if _directory is nullptr, which leads to arbitrary value of _log_num_buckets and _directory_mask (because they don't have initial value).

Serialize RF

  • max_serialized_size() thinks alloc_size is zero, because _directory is nullptr.
  • serialize thinks alloc_size is get_alloc_size, which is greater than max_serialized_size().
    Therefore, the buffer with size of max_serialized_size() cannot bear the larger size from alloc_size.

The root cause is that _directory and _log_num_buckets are inconsistent before calling init, which will be never called when exceeding global build size.
Therefore, give _log_num_buckets a zero initial value.

Status PartialRuntimeFilterMerger::merge_multi_partitioned_local_bloom_filters() {
        for (auto& desc : _bloom_filter_descriptors) {
            if (desc->has_remote_targets() && row_count > _global_rf_limit) {
                filter->clear_bf();
            } else {
                filter->init(row_count);
            }
            desc->set_runtime_filter(filter);
        }
}

class SimdBlockFilter {
public:
    clear() {
        if (_directory) {
            free(_directory);
            _directory = nullptr;
            _log_num_buckets = 0;
            _directory_mask = 0;
        }
    }

    size_t get_alloc_size() const {
        return _log_num_buckets == 0 ? 0 : (1ull << (_log_num_buckets + LOG_BUCKET_BYTE_SIZE));
    }

    size_t max_serialized_size() const {
        const size_t alloc_size = _directory == nullptr ? 0 : get_alloc_size(); // ********** A **********
        return sizeof(_log_num_buckets) + sizeof(_directory_mask) + // data size + max data size
            sizeof(int32_t) + alloc_size;
    }

    size_t serialize(uint8_t* data) const {
        size_t offset = 0;
    #define SIMD_BF_COPY_FIELD(field)                 \
        memcpy(data + offset, &field, sizeof(field)); \
        offset += sizeof(field);
        SIMD_BF_COPY_FIELD(_log_num_buckets);
        SIMD_BF_COPY_FIELD(_directory_mask);

        const size_t alloc_size = get_alloc_size();
        int32_t data_size = alloc_size;
        SIMD_BF_COPY_FIELD(data_size);
        if (LIKELY(data_size > 0)) { // ********** B **********
            memcpy(data + offset, _directory, data_size);
            offset += data_size;
        }
        return offset;
    #undef SIMD_BF_COPY_FIELD
    }

private:
    int _log_num_buckets;
    uint32_t _directory_mask;
    Bucket* _directory = nullptr;
};

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.2
    • 3.1
    • 3.0
    • 2.5

Signed-off-by: zihe.liu <[email protected]>
Signed-off-by: zihe.liu <[email protected]>
Signed-off-by: ZiheLiu <[email protected]>
Copy link

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)

Copy link

[BE Incremental Coverage Report]

pass : 0 / 0 (0%)

@ZiheLiu ZiheLiu merged commit ff58c88 into StarRocks:main Nov 24, 2023
81 checks passed
@ZiheLiu ZiheLiu deleted the fix/main/rf_clear branch November 24, 2023 13:32
Copy link

@Mergifyio backport branch-3.2

@github-actions github-actions bot removed the 3.2 label Nov 24, 2023
Copy link

@Mergifyio backport branch-3.1

@github-actions github-actions bot removed the 3.1 label Nov 24, 2023
Copy link

@Mergifyio backport branch-3.0

Copy link
Contributor

mergify bot commented Nov 24, 2023

backport branch-3.2

✅ Backports have been created

@github-actions github-actions bot removed the 3.0 label Nov 24, 2023
Copy link

@Mergifyio backport branch-2.5

@github-actions github-actions bot removed the 2.5 label Nov 24, 2023
Copy link
Contributor

mergify bot commented Nov 24, 2023

backport branch-3.1

✅ Backports have been created

Copy link
Contributor

mergify bot commented Nov 24, 2023

backport branch-3.0

✅ Backports have been created

Copy link
Contributor

mergify bot commented Nov 24, 2023

backport branch-2.5

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Nov 24, 2023
…ld size (#35776)

Signed-off-by: zihe.liu <[email protected]>
Signed-off-by: ZiheLiu <[email protected]>
(cherry picked from commit ff58c88)
mergify bot pushed a commit that referenced this pull request Nov 24, 2023
…ld size (#35776)

Signed-off-by: zihe.liu <[email protected]>
Signed-off-by: ZiheLiu <[email protected]>
(cherry picked from commit ff58c88)
mergify bot pushed a commit that referenced this pull request Nov 24, 2023
…ld size (#35776)

Signed-off-by: zihe.liu <[email protected]>
Signed-off-by: ZiheLiu <[email protected]>
(cherry picked from commit ff58c88)
mergify bot pushed a commit that referenced this pull request Nov 24, 2023
…ld size (#35776)

Signed-off-by: zihe.liu <[email protected]>
Signed-off-by: ZiheLiu <[email protected]>
(cherry picked from commit ff58c88)

# Conflicts:
#	be/src/exprs/runtime_filter.h
wanpengfei-git pushed a commit that referenced this pull request Nov 24, 2023
…ld size (#35776)

Signed-off-by: zihe.liu <[email protected]>
Signed-off-by: ZiheLiu <[email protected]>
(cherry picked from commit ff58c88)
wanpengfei-git pushed a commit that referenced this pull request Nov 24, 2023
…ld size (#35776)

Signed-off-by: zihe.liu <[email protected]>
Signed-off-by: ZiheLiu <[email protected]>
(cherry picked from commit ff58c88)
wanpengfei-git pushed a commit that referenced this pull request Nov 24, 2023
…ld size (#35776)

Signed-off-by: zihe.liu <[email protected]>
Signed-off-by: ZiheLiu <[email protected]>
(cherry picked from commit ff58c88)
Astralidea pushed a commit that referenced this pull request Nov 27, 2023
…ld size (#35776)

Signed-off-by: zihe.liu <[email protected]>
Signed-off-by: ZiheLiu <[email protected]>
Signed-off-by: Astralidea <[email protected]>
ZiheLiu added a commit to ZiheLiu/starrocks that referenced this pull request Nov 27, 2023
satanson pushed a commit that referenced this pull request Nov 27, 2023
trueeyu pushed a commit to trueeyu/starrocks that referenced this pull request Nov 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants