[exporterhelper] Fix race in concurrency check in batch sender leading to smaller batch sizes #9761

Merged

carsonip (Contributor) commented Mar 14, 2024

Description:

Although activeRequests is atomic, two arriving requests can both
increment it before either one enters the critical section that checks
bs.activeRequests.Load() >= bs.concurrencyLimit. In that case the check
evaluates to true for both requests. The correct behavior is that only
the second check is true.
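
To make the interleaving concrete, here is a minimal Go sketch that replays it deterministically. The `limiter` type and its methods are hypothetical stand-ins, not the exporterhelper code; only the names `activeRequests` and `concurrencyLimit` come from the description above. `arriveAndCheck` shows one possible way to close the window (incrementing under the same mutex as the check); it is assumed here for illustration and is not taken from the diff.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// limiter is a hypothetical stand-in for the batch sender's bookkeeping.
type limiter struct {
	mu               sync.Mutex
	activeRequests   atomic.Int64
	concurrencyLimit int64
}

// arrive is step 1 of the racy path: an atomic increment outside the lock.
func (l *limiter) arrive() {
	l.activeRequests.Add(1)
}

// limitReached is step 2 of the racy path: the check inside the critical section.
func (l *limiter) limitReached() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.activeRequests.Load() >= l.concurrencyLimit
}

// arriveAndCheck performs the increment and the check under one lock,
// so no other request can increment between the two steps.
func (l *limiter) arriveAndCheck() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.activeRequests.Add(1) >= l.concurrencyLimit
}

// racyInterleaving replays the problematic schedule by hand:
// both requests increment first, then both run the check.
func racyInterleaving() (first, second bool) {
	l := &limiter{concurrencyLimit: 2}
	l.arrive() // request A increments
	l.arrive() // request B increments before A has checked
	first = l.limitReached()
	second = l.limitReached()
	return first, second
}

// lockedInterleaving runs the same two requests with increment and check
// combined into one critical section.
func lockedInterleaving() (first, second bool) {
	l := &limiter{concurrencyLimit: 2}
	first = l.arriveAndCheck()
	second = l.arriveAndCheck()
	return first, second
}

func main() {
	fmt.Println(racyInterleaving())   // true true
	fmt.Println(lockedInterleaving()) // false true
}
```

In the racy schedule both requests see the limit as reached, whereas with the combined critical section only the second one does.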

Remove the workaround in tests that bypassed the bug.

With this change the results are slightly better, but they still depend on goroutine scheduling.

@carsonip carsonip changed the title [exporterhelper] Fix race in concurrency check in batch sender [exporterhelper] Fix race in concurrency check in batch sender leading to smaller batch sizes Mar 14, 2024

codecov bot commented Mar 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.96%. Comparing base (cc485e0) to head (88358c8).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #9761   +/-   ##
=======================================
  Coverage   90.96%   90.96%           
=======================================
  Files         353      353           
  Lines       18626    18628    +2     
=======================================
+ Hits        16943    16945    +2     
  Misses       1356     1356           
  Partials      327      327           


@carsonip carsonip marked this pull request as ready for review March 14, 2024 19:21
@carsonip carsonip requested review from a team and djaglowski March 14, 2024 19:21
@carsonip carsonip marked this pull request as draft March 14, 2024 19:28
@carsonip carsonip marked this pull request as ready for review March 14, 2024 19:53
dmitryax (Member) commented

Thanks for the fix, @carsonip! I'm curious to know more about your adoption of this API. Any feedback is welcome.

@dmitryax dmitryax merged commit 3cb1250 into open-telemetry:main Mar 14, 2024
65 of 80 checks passed
@github-actions github-actions bot added this to the next release milestone Mar 14, 2024
carsonip (Contributor, Author) commented Mar 14, 2024

@dmitryax not sure if you caught my latest edit. In my tests, with the bugfix, the behavior is slightly better, but there are still a lot of requests with just 1 item. It boils down to goroutine scheduling. It may require a bigger change in how queue sender and batch sender interact with each other to eliminate this issue.

> I'm curious to know more about your adoption of this API. Any feedback is welcome.

I'm still experimenting with it. The batch sender blocking the send() function with <-batch.done is quite a nice way to avoid immediately calling the queue sender callback that deletes items from the persistent queue. Maybe I can give a better summary of our usage later!
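
For readers unfamiliar with the pattern, here is a rough Go sketch of a send call that blocks on a per-batch `done` channel, so the caller can acknowledge an item only after the batch it joined has been exported. Every name here (`batcher`, `batch`, `flush`, `pending`, `runDemo`) is hypothetical and simplified; it is not the exporterhelper implementation, and the "ack" event merely stands in for the queue callback mentioned above.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// batch collects items and signals completion by closing done.
type batch struct {
	items []string
	done  chan struct{}
	err   error
}

// batcher is a hypothetical, simplified batch sender.
type batcher struct {
	mu     sync.Mutex
	active *batch
	export func(items []string) error
}

// send adds the item to the active batch, then blocks until that batch
// has been exported, returning the export error to the caller.
func (b *batcher) send(item string) error {
	b.mu.Lock()
	if b.active == nil {
		b.active = &batch{done: make(chan struct{})}
	}
	bt := b.active
	bt.items = append(bt.items, item)
	b.mu.Unlock()

	<-bt.done // blocks until flush closes the channel
	return bt.err
}

// pending reports how many items are waiting in the active batch.
func (b *batcher) pending() int {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.active == nil {
		return 0
	}
	return len(b.active.items)
}

// flush exports the active batch and releases every blocked sender.
func (b *batcher) flush() {
	b.mu.Lock()
	bt := b.active
	b.active = nil
	b.mu.Unlock()
	if bt == nil {
		return
	}
	bt.err = b.export(bt.items)
	close(bt.done)
}

// runDemo sends two items concurrently and returns the recorded events.
func runDemo() []string {
	var evMu sync.Mutex
	var events []string
	record := func(e string) {
		evMu.Lock()
		defer evMu.Unlock()
		events = append(events, e)
	}

	b := &batcher{export: func(items []string) error {
		record(fmt.Sprintf("export %d items", len(items)))
		return nil
	}}

	var wg sync.WaitGroup
	for _, item := range []string{"a", "b"} {
		wg.Add(1)
		go func(it string) {
			defer wg.Done()
			if err := b.send(it); err == nil {
				record("ack") // stands in for the queue callback
			}
		}(item)
	}

	// Wait until both senders have joined the batch, then export it.
	for b.pending() < 2 {
		runtime.Gosched()
	}
	b.flush()
	wg.Wait()
	return events
}

func main() {
	fmt.Println(runDemo())
}
```

Because each sender is parked on `<-bt.done`, the "ack" events can only be recorded after the export event, which is the ordering the comment above relies on.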
