Improve pipeline backoff #1097

Merged
merged 4 commits into from
Jun 10, 2024

Conversation

Mallets
Member

@Mallets Mallets commented Jun 7, 2024

This PR improves the pipeline backoff implementation and reduces the possibility of spinning tasks.

Without this patch, the following logs could be observed in high-load scenarios:

2024-06-07T07:56:30.783760Z  WARN tx-0 ThreadId(16) zenoh_transport::common::pipeline: Pipeline pull backoff overflow detected! Retrying in 4294967295ns. (tcp/127.0.0.1:7447 => tcp/127.0.0.1:51696)
2024-06-07T07:56:30.783771Z  WARN tx-0 ThreadId(16) zenoh_transport::common::pipeline: Pipeline pull backoff overflow detected! Retrying in 4294967295ns. (tcp/127.0.0.1:7447 => tcp/127.0.0.1:51696)
2024-06-07T07:56:30.783772Z  WARN tx-0 ThreadId(16) zenoh_transport::common::pipeline: Pipeline pull backoff overflow detected! Retrying in 4294967295ns. (tcp/127.0.0.1:7447 => tcp/127.0.0.1:51696)
2024-06-07T07:56:30.783774Z  WARN tx-0 ThreadId(16) zenoh_transport::common::pipeline: Pipeline pull backoff overflow detected! Retrying in 4294967295ns. (tcp/127.0.0.1:7447 => tcp/127.0.0.1:51696)
2024-06-07T07:56:30.783775Z  WARN tx-0 ThreadId(16) zenoh_transport::common::pipeline: Pipeline pull backoff overflow detected! Retrying in 4294967295ns. (tcp/127.0.0.1:7447 => tcp/127.0.0.1:51696)
2024-06-07T07:56:30.783830Z  WARN tx-0 ThreadId(16) zenoh_transport::common::pipeline: Pipeline pull backoff overflow detected! Retrying in 4294967295ns. (tcp/127.0.0.1:7447 => tcp/127.0.0.1:51696)

With this patch, I no longer observed those logs in my tests.
I expect this patch to heavily mitigate the problem, although it might not solve it in 100% of cases.
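
For context, the delay reported in the logs above, 4294967295ns, equals `u32::MAX`, which suggests a 32-bit nanosecond value hitting its limit. The following is a minimal sketch of a bounded exponential backoff that caps the delay instead of overflowing. It is an illustration only: the `Backoff` type, its fields, and its methods are hypothetical names and are not taken from zenoh's pipeline code.

```rust
use std::time::Duration;

/// Hypothetical backoff state (illustrative names, not zenoh's actual types).
#[derive(Debug, Clone)]
pub struct Backoff {
    /// Delay to use on the next retry, in nanoseconds.
    next_ns: u32,
    /// Initial delay restored by `reset`.
    initial_ns: u32,
    /// Upper bound on the delay, in nanoseconds.
    max_ns: u32,
}

impl Backoff {
    pub fn new(initial: Duration, max: Duration) -> Self {
        // Clamp both bounds into the u32 nanosecond range.
        let to_ns = |d: Duration| u32::try_from(d.as_nanos()).unwrap_or(u32::MAX);
        let max_ns = to_ns(max).max(1);
        let initial_ns = to_ns(initial).clamp(1, max_ns);
        Self { next_ns: initial_ns, initial_ns, max_ns }
    }

    /// Returns the delay to wait now, then doubles the stored delay.
    /// The doubling saturates and is capped at `max_ns`, so it never wraps.
    pub fn next_delay(&mut self) -> Duration {
        let current = self.next_ns;
        self.next_ns = current.saturating_mul(2).min(self.max_ns);
        Duration::from_nanos(u64::from(current))
    }

    /// Restores the initial delay, e.g. after a successful pull.
    pub fn reset(&mut self) {
        self.next_ns = self.initial_ns;
    }
}
```

With an initial delay of 100ns and a maximum of 350ns, successive calls to `next_delay` yield 100ns, 200ns, 350ns, 350ns, and so on until `reset` is called. A caller would typically sleep or yield for the returned duration before retrying.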

@Mallets
Member Author

Mallets commented Jun 7, 2024

@yellowhatter can you please help in reviewing this PR?

@Mallets Mallets changed the title from "Improve backoff" to "Improve pipeline backoff" Jun 7, 2024
@yellowhatter
Contributor

yellowhatter commented Jun 7, 2024

@Mallets looks nice!

In parallel, I've been working on a slightly different design. Our queue implementation uses a mutex to synchronize access to the current batch. This is a weak point, because it puts the producer thread to sleep while the current batch is being processed by a consumer task:

pipeline.rs: 138:
// Lock the current serialization batch.
let mut c_guard = self.mutex.current();

My idea is that the producer thread should take care of always keeping "one batch behind": after each serialization, it checks the atomic count of batches in s_out and moves the current batch (potentially not full) to s_out if s_out is empty. In this case, there won't be any producer/consumer contention around the current batch. As for performance in terms of batch size utilization, this approach essentially self-regulates: a consumer that underperforms because it is sending non-full batches gives the producer time to completely fill the current batch, which restores consumer performance.
The only problem with this approach is how to efficiently consume a non-empty current batch when there are no new serializations.
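
The idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: `Out`, `Producer`, `queued`, and `push_message` are hypothetical names, a single producer is assumed, and a mutex-protected `VecDeque` stands in for whatever queue s_out really is. The point shown is that the producer owns the current batch exclusively, so no lock is taken around it.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for `s_out`: batches ready for the consumer.
pub struct Out {
    batches: Mutex<VecDeque<Vec<u8>>>,
    /// Number of queued batches, readable without taking the lock.
    len: AtomicUsize,
}

impl Out {
    pub fn new() -> Arc<Self> {
        Arc::new(Self { batches: Mutex::new(VecDeque::new()), len: AtomicUsize::new(0) })
    }

    fn push(&self, batch: Vec<u8>) {
        let mut queue = self.batches.lock().unwrap();
        queue.push_back(batch);
        self.len.fetch_add(1, Ordering::Release);
    }

    /// Consumer side: take the oldest ready batch, if any.
    pub fn pull(&self) -> Option<Vec<u8>> {
        let mut queue = self.batches.lock().unwrap();
        let batch = queue.pop_front();
        if batch.is_some() {
            self.len.fetch_sub(1, Ordering::Release);
        }
        batch
    }

    /// Atomic count of queued batches.
    pub fn queued(&self) -> usize {
        self.len.load(Ordering::Acquire)
    }
}

/// Single producer that exclusively owns the current batch.
pub struct Producer {
    current: Vec<u8>,
    capacity: usize,
    out: Arc<Out>,
}

impl Producer {
    pub fn new(capacity: usize, out: Arc<Out>) -> Self {
        Self { current: Vec::with_capacity(capacity), capacity, out }
    }

    /// Serialize `msg` into the current batch, then apply the
    /// "one batch behind" rule.
    pub fn push_message(&mut self, msg: &[u8]) {
        // If the message does not fit, hand over the current batch first.
        if !self.current.is_empty() && self.current.len() + msg.len() > self.capacity {
            self.flush();
        }
        self.current.extend_from_slice(msg);
        // If the consumer has nothing queued, hand over the current batch
        // even though it may not be full.
        if self.out.queued() == 0 {
            self.flush();
        }
    }

    fn flush(&mut self) {
        let batch = std::mem::replace(&mut self.current, Vec::with_capacity(self.capacity));
        self.out.push(batch);
    }
}
```

In this sketch, a fast consumer receives small batches immediately, while a slow consumer leaves s_out non-empty, so later messages accumulate in the current batch until it fills up. The open problem noted above is visible here too: a partially filled current batch stays with the producer until the next `push_message` call.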

@Mallets Mallets merged commit 0942a69 into main Jun 10, 2024
21 checks passed
Mallets added a commit that referenced this pull request Jun 19, 2024
* Add NOTE for LowLatency transport. (#1088)

Signed-off-by: ChenYing Kuo <[email protected]>

* fix: Improve debug messages in `zenoh-transport` (#1090)

* fix: Improve debug messages for failing RX/TX tasks

* fix: Improve debug message for `accept_link` timeout

* chore: Fix `clippy::redundant_pattern_matching` error

* Improve pipeline backoff (#1097)

* Yield task for backoff

* Improve comments and error handling in backoff

* Simplify pipeline pull

* Consider backoff configuration

* Add typos check to CI (#1065)

* Fix typos

* Add typos check to CI

* Start link tx_task before notifying router (#1098)

* Fix typos (#1110)

* bump quinn & rustls (#1086)

* bump quinn & rustls

* fix ci windows check

* add comments

* Fix interface name scanning when listening on IP unspecified for TCP/TLS/QUIC/WS (#1123)

Co-authored-by: Julien Enoch <[email protected]>

* Enable releasing from any branch (#1136)

* Fix cargo clippy (#1145)

* Release tables locks before propagating subscribers and queryables declarations to void dead locks (#1150)

* Send simple sub and qabl declarations using a given function

* Send simple sub and qabl declarations after releasing tables lock

* Send simple sub and qabl declarations after releasing tables lock (missing places)

* Update async-io

* Update base64 dependency

* Update event-listener dependency

* Update jsonschema dependency

* Update keyed-set dependency

* Update console-subscriber dependency

* Update pnet dependency

* Update rcgen dependency

* Update tokio-tungstenite dependency

* Update thread-priority dependency

* Fix typos

* Fix typos

* Add Unicode-3.0 to allowed licenses

---------

Signed-off-by: ChenYing Kuo <[email protected]>
Co-authored-by: ChenYing Kuo (CY) <[email protected]>
Co-authored-by: Mahmoud Mazouz <[email protected]>
Co-authored-by: Luca Cominardi <[email protected]>
Co-authored-by: Tavo Annus <[email protected]>
Co-authored-by: JLer <[email protected]>
Co-authored-by: Julien Enoch <[email protected]>
Mallets added a commit that referenced this pull request Jul 30, 2024
* Add NOTE for LowLatency transport. (#1088)

Signed-off-by: ChenYing Kuo <[email protected]>

* fix: Improve debug messages in `zenoh-transport` (#1090)

* fix: Improve debug messages for failing RX/TX tasks

* fix: Improve debug message for `accept_link` timeout

* chore: Fix `clippy::redundant_pattern_matching` error

* Improve pipeline backoff (#1097)

* Yield task for backoff

* Improve comments and error handling in backoff

* Simplify pipeline pull

* Consider backoff configuration

* Add typos check to CI (#1065)

* Fix typos

* Add typos check to CI

* Start link tx_task before notifying router (#1098)

* Fix typos (#1110)

* bump quinn & rustls (#1086)

* bump quinn & rustls

* fix ci windows check

* add comments

* Fix interface name scanning when listening on IP unspecified for TCP/TLS/QUIC/WS (#1123)

Co-authored-by: Julien Enoch <[email protected]>

* Enable releasing from any branch (#1136)

* Fix cargo clippy (#1145)

* Release tables locks before propagating subscribers and queryables declarations to void dead locks (#1150)

* Send simple sub and qabl declarations using a given function

* Send simple sub and qabl declarations after releasing tables lock

* Send simple sub and qabl declarations after releasing tables lock (missing places)

* feat: make `TerminatableTask` terminate itself when dropped (#1151)

* Fix bug in keyexpr::includes leading to call get_unchecked on empty array UB (#1208)

* REST plugin uses unbounded flume channels for queries (#1213)

* fix: typo in selector.rs (#1228)

* fix: zenohd --cfg (#1263)

* fix: zenohd --cfg

* ci: trigger

* Update zenohd/src/main.rs

---------

Co-authored-by: Luca Cominardi <[email protected]>

* Fix failover brokering bug reacting to linkstate changes (#1272)

* Change missleading log

* Fix failover brokering bug reacting to linkstate changes

* Retrigger CI

---------

Co-authored-by: Luca Cominardi <[email protected]>

* Code format

* Fix clippy warnings

* Code format

* Fix Clippy errors from Rust 1.80 (#1273)

* Allow unexpected `doc_auto_cfg` flag

* Keep never-constructed logger interceptor

* Ignore interior mutability of `Resource`

* Fix typo

* Resolve `clippy::doc-lazy-continuation` errors

* Upgrade `[email protected]` to `[email protected]`

See time-rs/time#693

* Update Cargo.toml (#1277)

Updated description to be aligned with what we use everywhere else

* Merge ci.yaml

---------

Signed-off-by: ChenYing Kuo <[email protected]>
Co-authored-by: ChenYing Kuo (CY) <[email protected]>
Co-authored-by: Mahmoud Mazouz <[email protected]>
Co-authored-by: Tavo Annus <[email protected]>
Co-authored-by: JLer <[email protected]>
Co-authored-by: Julien Enoch <[email protected]>
Co-authored-by: OlivierHecart <[email protected]>
Co-authored-by: Yuyuan Yuan <[email protected]>
Co-authored-by: Diogo Matsubara <[email protected]>
Co-authored-by: OlivierHecart <[email protected]>
Co-authored-by: kydos <[email protected]>
Mallets added a commit that referenced this pull request Aug 27, 2024
* Add NOTE for LowLatency transport. (#1088)

Signed-off-by: ChenYing Kuo <[email protected]>

* fix: Improve debug messages in `zenoh-transport` (#1090)

* fix: Improve debug messages for failing RX/TX tasks

* fix: Improve debug message for `accept_link` timeout

* chore: Fix `clippy::redundant_pattern_matching` error

* Improve pipeline backoff (#1097)

* Yield task for backoff

* Improve comments and error handling in backoff

* Simplify pipeline pull

* Consider backoff configuration

* Add typos check to CI (#1065)

* Fix typos

* Add typos check to CI

* Start link tx_task before notifying router (#1098)

* Fix typos (#1110)

* bump quinn & rustls (#1086)

* bump quinn & rustls

* fix ci windows check

* add comments

* Fix interface name scanning when listening on IP unspecified for TCP/TLS/QUIC/WS (#1123)

Co-authored-by: Julien Enoch <[email protected]>

* Enable releasing from any branch (#1136)

* Fix cargo clippy (#1145)

* Release tables locks before propagating subscribers and queryables declarations to void dead locks (#1150)

* Send simple sub and qabl declarations using a given function

* Send simple sub and qabl declarations after releasing tables lock

* Send simple sub and qabl declarations after releasing tables lock (missing places)

* feat: make `TerminatableTask` terminate itself when dropped (#1151)

* Fix bug in keyexpr::includes leading to call get_unchecked on empty array UB (#1208)

* REST plugin uses unbounded flume channels for queries (#1213)

* fix: typo in selector.rs (#1228)

* fix: zenohd --cfg (#1263)

* fix: zenohd --cfg

* ci: trigger

* Update zenohd/src/main.rs

---------

Co-authored-by: Luca Cominardi <[email protected]>

* Fix failover brokering bug reacting to linkstate changes (#1272)

* Change missleading log

* Fix failover brokering bug reacting to linkstate changes

* Retrigger CI

---------

Co-authored-by: Luca Cominardi <[email protected]>

* Code format

* Fix clippy warnings

* Code format

* Fix Clippy errors from Rust 1.80 (#1273)

* Allow unexpected `doc_auto_cfg` flag

* Keep never-constructed logger interceptor

* Ignore interior mutability of `Resource`

* Fix typo

* Resolve `clippy::doc-lazy-continuation` errors

* Upgrade `[email protected]` to `[email protected]`

See time-rs/time#693

* Update Cargo.toml (#1277)

Updated description to be aligned with what we use everywhere else

* fix: typos (#1297)

* Replace trees computation tasks with a worker (#1303)

* Replace trees computation tasks with a worker

* Address review comments

* Remove review comments

* zenohd-default config error #1292 (#1298)

* Zenohd panic when tring load file

When zenohd trying load file, if it have a problem it crash cause another treat was "unwrap", and it return to a type config. So, it crash and cause painic.

* zenohd default config error #1292

When tring load config file defined by -c option. With haver any problema "unwrap" has been to Config type.

I treat it return a Default Config whe it happen

* If file fail when try load configs

If file fail when try load configs

* Update main.rs

* Resolve typos at comment

Resolve typos at comment

* fix: typos (#1297)

* zenohd-default config error #1292 (#1298)

* Zenohd panic when tring load file

When zenohd trying load file, if it have a problem it crash cause another treat was "unwrap", and it return to a type config. So, it crash and cause painic.

* zenohd default config error #1292

When tring load config file defined by -c option. With haver any problema "unwrap" has been to Config type.

I treat it return a Default Config whe it happen

* If file fail when try load configs

If file fail when try load configs

* Update main.rs

* Resolve typos at comment

Resolve typos at comment

* Replace trees computation tasks with a worker (#1303)

* Replace trees computation tasks with a worker

* Address review comments

* Remove review comments

* revering fix #1298

---------

Signed-off-by: ChenYing Kuo <[email protected]>
Co-authored-by: ChenYing Kuo (CY) <[email protected]>
Co-authored-by: Mahmoud Mazouz <[email protected]>
Co-authored-by: Luca Cominardi <[email protected]>
Co-authored-by: Tavo Annus <[email protected]>
Co-authored-by: JLer <[email protected]>
Co-authored-by: Julien Enoch <[email protected]>
Co-authored-by: OlivierHecart <[email protected]>
Co-authored-by: Yuyuan Yuan <[email protected]>
Co-authored-by: Diogo Matsubara <[email protected]>
Co-authored-by: OlivierHecart <[email protected]>
Co-authored-by: kydos <[email protected]>
Co-authored-by: brianPA <[email protected]>
Co-authored-by: Tiago Neves <[email protected]>
@Mallets Mallets deleted the fix/backoff branch October 16, 2024 13:58

2 participants