Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Receive] Fix race condition when adding multiple new tenants at once #7941

Merged
merged 5 commits into from
Dec 3, 2024

Conversation

jnyi
Copy link
Contributor

@jnyi jnyi commented Nov 25, 2024

Update: actually fix the issue instead of reverting the old one, memorize the TSDB client list is valuable to avoid creating thousands of slices in memory, see a6fbb9f

This reverted PR #7782 and fixed Issue #7892

Reproducible by newly added unit tests TestMultiTSDBAddNewTenant:

Screenshot 2024-11-25 at 10 00 54 AM

After this fix, unit test would pass

=== RUN   TestMultiTSDBAddNewTenant
=== PAUSE TestMultiTSDBAddNewTenant
=== CONT  TestMultiTSDBAddNewTenant
--- PASS: TestMultiTSDBAddNewTenant (13.82s)
=== RUN   TestMultiTSDBAddNewTenant/iteration-0
    --- PASS: TestMultiTSDBAddNewTenant/iteration-0 (1.39s)
=== RUN   TestMultiTSDBAddNewTenant/iteration-1
    --- PASS: TestMultiTSDBAddNewTenant/iteration-1 (1.36s)
=== RUN   TestMultiTSDBAddNewTenant/iteration-2
    --- PASS: TestMultiTSDBAddNewTenant/iteration-2 (1.39s)
=== RUN   TestMultiTSDBAddNewTenant/iteration-3
    --- PASS: TestMultiTSDBAddNewTenant/iteration-3 (1.40s)
=== RUN   TestMultiTSDBAddNewTenant/iteration-4
    --- PASS: TestMultiTSDBAddNewTenant/iteration-4 (1.40s)
=== RUN   TestMultiTSDBAddNewTenant/iteration-5
    --- PASS: TestMultiTSDBAddNewTenant/iteration-5 (1.36s)
=== RUN   TestMultiTSDBAddNewTenant/iteration-6
    --- PASS: TestMultiTSDBAddNewTenant/iteration-6 (1.37s)
=== RUN   TestMultiTSDBAddNewTenant/iteration-7
    --- PASS: TestMultiTSDBAddNewTenant/iteration-7 (1.38s)
=== RUN   TestMultiTSDBAddNewTenant/iteration-8
    --- PASS: TestMultiTSDBAddNewTenant/iteration-8 (1.41s)
=== RUN   TestMultiTSDBAddNewTenant/iteration-9
    --- PASS: TestMultiTSDBAddNewTenant/iteration-9 (1.36s)
PASS
  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Verification

@jnyi jnyi changed the title [Receive] Fix issue-7892 race condition [Receive] Fix race condition when adding multiple new tenants at once Nov 25, 2024
Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it's not a fix really but a revert. Thanks for the test which reproduces the bug, it will be much easier to fix. I think we should fix this instead of reverting because the cost of creating this slice is not trivial when you have thousands of selects per second.

@jnyi
Copy link
Contributor Author

jnyi commented Nov 27, 2024

hi @GiedriusS

I got your point, this client list should be relatively stable most of the time (wasteful of memory to create slices), I spent some time to actually fix it, appreciate another review

pkg/receive/multitsdb.go Outdated Show resolved Hide resolved
@jnyi
Copy link
Contributor Author

jnyi commented Nov 27, 2024

I think the e2e test failure is tranisent, but i don't have permission to rerun

@jnyi jnyi requested a review from GiedriusS November 27, 2024 18:21
@jnyi jnyi force-pushed the issue-7892 branch 2 times, most recently from 519b937 to ea3c2a0 Compare November 27, 2024 18:50
pkg/receive/multitsdb.go Outdated Show resolved Hide resolved
pkg/receive/multitsdb.go Outdated Show resolved Hide resolved
@jnyi jnyi requested a review from GiedriusS December 2, 2024 18:51
@jnyi jnyi force-pushed the issue-7892 branch 4 times, most recently from 8582bfc to 982408e Compare December 2, 2024 20:36
GiedriusS
GiedriusS previously approved these changes Dec 2, 2024
Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 thanks a lot for this and sorry for the problems

@GiedriusS GiedriusS enabled auto-merge (squash) December 2, 2024 20:39
auto-merge was automatically disabled December 2, 2024 21:04

Head branch was pushed to by a user without write access

@jnyi jnyi force-pushed the issue-7892 branch 2 times, most recently from 28580fd to 83b09f5 Compare December 2, 2024 21:30
Signed-off-by: Yi Jin <[email protected]>
@jnyi
Copy link
Contributor Author

jnyi commented Dec 2, 2024

🚀 thanks a lot for this and sorry for the problems

no worries, I've tried to fix the tests, might help merge it since all checks pass

@jnyi jnyi requested a review from GiedriusS December 2, 2024 22:06
Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use force pushes in the future as it makes it hard to follow what changes between reviews, thanks!

@GiedriusS GiedriusS merged commit 1ea4e69 into thanos-io:main Dec 3, 2024
22 checks passed
saswatamcode pushed a commit to saswatamcode/thanos that referenced this pull request Dec 3, 2024
…thanos-io#7941)

* [Receive] fix race condition

Signed-off-by: Yi Jin <[email protected]>

* add a change log

Signed-off-by: Yi Jin <[email protected]>

* memorize tsdb local clients without race condition

Signed-off-by: Yi Jin <[email protected]>

* fix data race in testing with some concurrent safe helper functions

Signed-off-by: Yi Jin <[email protected]>

* address comments

Signed-off-by: Yi Jin <[email protected]>

---------

Signed-off-by: Yi Jin <[email protected]>
saswatamcode pushed a commit to saswatamcode/thanos that referenced this pull request Dec 3, 2024
…thanos-io#7941)

* [Receive] fix race condition

Signed-off-by: Yi Jin <[email protected]>

* add a change log

Signed-off-by: Yi Jin <[email protected]>

* memorize tsdb local clients without race condition

Signed-off-by: Yi Jin <[email protected]>

* fix data race in testing with some concurrent safe helper functions

Signed-off-by: Yi Jin <[email protected]>

* address comments

Signed-off-by: Yi Jin <[email protected]>

---------

Signed-off-by: Yi Jin <[email protected]>
Signed-off-by: Saswata Mukherjee <[email protected]>
saswatamcode added a commit that referenced this pull request Dec 4, 2024
* Merge pull request #7674 from didukh86/query_frontend_tls_redis_fix

Query-frontend: Fix connection to Redis cluster with TLS.
Signed-off-by: Saswata Mukherjee <[email protected]>

* Capnp: Use segment from existing message (#7945)

* Capnp: Use segment from existing message

Signed-off-by: Filip Petkovski <[email protected]>

* Downgrade capnproto

Signed-off-by: Filip Petkovski <[email protected]>

---------

Signed-off-by: Filip Petkovski <[email protected]>
Signed-off-by: Saswata Mukherjee <[email protected]>

* [Receive] Fix race condition when adding multiple new tenants at once (#7941)

* [Receive] fix race condition

Signed-off-by: Yi Jin <[email protected]>

* add a change log

Signed-off-by: Yi Jin <[email protected]>

* memorize tsdb local clients without race condition

Signed-off-by: Yi Jin <[email protected]>

* fix data race in testing with some concurrent safe helper functions

Signed-off-by: Yi Jin <[email protected]>

* address comments

Signed-off-by: Yi Jin <[email protected]>

---------

Signed-off-by: Yi Jin <[email protected]>
Signed-off-by: Saswata Mukherjee <[email protected]>

* Cut patch release v0.37.1

Signed-off-by: Saswata Mukherjee <[email protected]>

* Update promql-engine for subquery fix (#7953)

Signed-off-by: Saswata Mukherjee <[email protected]>

* Sidecar: Ensure limit param is positive for compatibility with older Prometheus (#7954)

Signed-off-by: Saswata Mukherjee <[email protected]>

* Update changelog

Signed-off-by: Saswata Mukherjee <[email protected]>

* Fix changelog

Signed-off-by: Saswata Mukherjee <[email protected]>

---------

Signed-off-by: Saswata Mukherjee <[email protected]>
Signed-off-by: Filip Petkovski <[email protected]>
Signed-off-by: Yi Jin <[email protected]>
Co-authored-by: Filip Petkovski <[email protected]>
Co-authored-by: Yi Jin <[email protected]>
GiedriusS pushed a commit to vinted/thanos that referenced this pull request Dec 12, 2024
…thanos-io#7941)

* [Receive] fix race condition

Signed-off-by: Yi Jin <[email protected]>

* add a change log

Signed-off-by: Yi Jin <[email protected]>

* memorize tsdb local clients without race condition

Signed-off-by: Yi Jin <[email protected]>

* fix data race in testing with some concurrent safe helper functions

Signed-off-by: Yi Jin <[email protected]>

* address comments

Signed-off-by: Yi Jin <[email protected]>

---------

Signed-off-by: Yi Jin <[email protected]>
jnyi added a commit to jnyi/thanos that referenced this pull request Dec 13, 2024
…thanos-io#7941)

* [Receive] fix race condition

Signed-off-by: Yi Jin <[email protected]>

* add a change log

Signed-off-by: Yi Jin <[email protected]>

* memorize tsdb local clients without race condition

Signed-off-by: Yi Jin <[email protected]>

* fix data race in testing with some concurrent safe helper functions

Signed-off-by: Yi Jin <[email protected]>

* address comments

Signed-off-by: Yi Jin <[email protected]>

---------

Signed-off-by: Yi Jin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants