
serialization for zenoh types #16

Open

wants to merge 36 commits into base: serialization_ext2

Conversation

@milyin commented Sep 29, 2024

No description provided.

yellowhatter and others added 30 commits September 17, 2024 17:58
Conflicts:
	Cargo.lock
	Cargo.toml
	commons/zenoh-shm/Cargo.toml
	commons/zenoh-shm/src/cleanup.rs
The `StorageService` was splitting apart the `StorageConfig` that was
used to create it. Besides adding noise, this prevented separating the
creation of the structure from spawning the subscriber and queryable
associated with a Storage.

This commit changes the fields of the `StorageService` structure to keep
the entire `StorageConfig`, thus allowing the creation to be separated
from spawning the queryable and subscriber (see the sketch below).
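
A minimal sketch of the shape of this change, using simplified stand-in
types (`String` instead of `OwnedKeyExpr`, and no actual Zenoh calls):

```rust
// Simplified stand-in for the plugin's StorageConfig.
struct StorageConfig {
    key_expr: String,
    complete: bool,
    strip_prefix: Option<String>,
}

// After this commit: the whole `StorageConfig` is kept instead of the
// individual `key_expr`, `complete` and `strip_prefix` fields.
struct StorageService {
    configuration: StorageConfig,
}

impl StorageService {
    // `GarbageConfig` no longer needs to be passed as a parameter:
    // everything required is reachable through `self.configuration`.
    fn start_storage_queryable_subscriber(&self) {
        let key_expr = &self.configuration.key_expr;
        let complete = self.configuration.complete;
        println!("declaring queryable on {key_expr} (complete: {complete})");
    }
}

fn main() {
    let service = StorageService {
        configuration: StorageConfig {
            key_expr: "demo/storage/**".to_string(),
            complete: true,
            strip_prefix: Some("demo/storage".to_string()),
        },
    };
    service.start_storage_queryable_subscriber();
}
```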

* plugins/zenoh-plugin-storage-manager/src/storages_mgt/service.rs:
  - Removed the following fields from the `StorageService` structure:
    - `key_expr`,
    - `complete`,
    - `strip_prefix`.
  - Added the field `configuration` to keep track of the associated
    `StorageConfig`.
  - Changed the signature of `start_storage_queryable_subscriber`,
    removing the `GarbageConfig` parameter as it is now contained in
    `&self`.
  - Updated the calls to access `key_expr`, `complete` and
    `strip_prefix`.
  - Removed an `unwrap()` and instead logged an error.

Signed-off-by: Julien Loudet <[email protected]>
This commit separates creating the `StorageService` from starting it.

This change is motivated by the Replication feature: when performing the
initial alignment we want to delay the Storage from answering queries
until after that alignment has completed.

In order to have this functionality we need to be able to dissociate
creating the `StorageService` from starting it.

As the method `start_storage_queryable_subscriber` takes ownership of
the `StorageService`, it became mandatory to first create the
`StorageService`, then start the Replication and lastly start the
Storage. Because of this, as the Replication code was inside a task, the
code to create and start the Storage was also moved inside the task.
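
A minimal sketch of the resulting start-up order, again with stand-in
types (the real functions take more parameters and spawn tasks):

```rust
struct StorageService;

impl StorageService {
    // Renamed from `start`: it now only creates the instance.
    fn new() -> Self {
        StorageService
    }

    // Takes ownership of the service, so it has to be the last step.
    fn start_storage_queryable_subscriber(self) {
        println!("Storage now answering queries");
    }
}

// Only needs a reference, e.g. to call `process_sample`.
fn start_replication_service(service: &StorageService) {
    let _ = service;
    println!("Replication started (initial alignment may run first)");
}

fn main() {
    // 1. create the StorageService, 2. start the Replication,
    // 3. start the Storage.
    let service = StorageService::new();
    start_replication_service(&service);
    service.start_storage_queryable_subscriber();
}
```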

* plugins/zenoh-plugin-storage-manager/src/replication/service.rs:
  - Take a reference over the `StorageService` structure as it is only
    needed before spawning the different Replication tasks. The
    StorageService is still needed to call `process_sample`.
  - Clone the `Arc` of the underlying Storage before spawning the
    Replication tasks.

* plugins/zenoh-plugin-storage-manager/src/storages_mgt/mod.rs:
  - Move the logging until starting the Storage.
  - Move the code starting the Storage inside the task.
  - Start the `StorageService` after having started the
    `ReplicationService`.

* plugins/zenoh-plugin-storage-manager/src/storages_mgt/service.rs:
  - Renamed the function `start` to `new` as it now only creates an
    instance of the `StorageService`.
  - Removed the parameter `rx` from the call to `new` as it no longer
    also starts it.
  - Removed the call to `start_storage_queryable_subscriber` from `new`.
  - Changed the visibility of the method
    `start_storage_queryable_subscriber` to `pub(crate)` as it is called
    from outside the `service` module.
  - Added logging information before the Storage "loop" is started (to
    help confirm, with the logs, the order in which the different
    elements are started).

Signed-off-by: Julien Loudet <[email protected]>
As we were using the key expression of the Storage to generate the key
expressions used in the Replication, it was possible to receive Digests
emitted by Replicas that were operating on only a subset of the key
space of the Storage.

This commit changes the way the key expressions for the Replication are
generated by using the hash of the configuration of the Replication:
this renders these key expressions unique, hence avoiding the issue just
described.

This property is interesting for the initial Alignment: had we not made
that change, we would have had to ensure that we perform that Alignment
on a Replica operating on exactly the same key space (and not a subset)
and the same configuration (in particular, the `strip_prefix`).

NOTE: This does not solve the initial alignment step, which will still
contact Storages that are operating on a subset (if there is no better
match on the network).
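
A sketch of the idea; the hasher and the key-expression layout are
illustrative, not necessarily what the plugin uses:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for the Replication configuration.
#[derive(Hash)]
struct ReplicationConfig {
    key_expr: String,
    strip_prefix: Option<String>,
    interval_ms: u64,
}

fn digest_key_expr(config: &ReplicationConfig) -> String {
    let mut hasher = DefaultHasher::new();
    config.hash(&mut hasher);
    let hash_configuration = hasher.finish();
    // Two Replicas only share this key expression if their entire
    // configuration (key space, strip_prefix, ...) is identical.
    format!("@-digest/{hash_configuration}")
}

fn main() {
    let config = ReplicationConfig {
        key_expr: "demo/storage/**".to_string(),
        strip_prefix: None,
        interval_ms: 10_000,
    };
    println!("{}", digest_key_expr(&config));
}
```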

* plugins/zenoh-plugin-storage-manager/src/replication/core.rs:
  - Renamed `storage_ke` to `hash_configuration` in the key expression
    formatters of the Digest and the Aligner.
  - Removed the unnecessary clones when spawning the Digest Publisher +
    fixed the different call sites.
  - Removed the scope to access the configuration as we clone it earlier
    in the code + fixed the different call sites.
  - Used the hash of the configuration to generate the key expression
    for the Digest.
  - Removed the unnecessary clones when spawning the Digest Subscriber +
    fixed the different call sites.
  - Used the hash of the configuration to generate the key expression
    for the Digest.
  - Removed the unnecessary clones when spawning the Aligner + fixed the
    different call sites.
  - Used the hash of the configuration to generate the key expression
    for the Aligner Queryable.

Signed-off-by: Julien Loudet <[email protected]>
There is no benefit in waiting and retrying on error when attempting to
declare a Queryable or a Subscriber: either the Session is established
and these operations will succeed, or the Session no longer exists, in
which case we should terminate (see the sketch below).
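
A sketch of the simplified error handling, assuming the zenoh 1.x API
where builders are awaited directly (function name and error message are
illustrative):

```rust
async fn run_aligner_queryable(session: zenoh::Session, aligner_ke: String) {
    // Declare once: either the Session is established and this succeeds,
    // or the Session is gone and the removed wait/retry loop could never
    // have helped anyway.
    let queryable = match session.declare_queryable(aligner_ke.as_str()).await {
        Ok(queryable) => queryable,
        Err(e) => {
            tracing::error!("Failed to declare the Aligner Queryable: {e:?}");
            return;
        }
    };

    while let Ok(query) = queryable.recv_async().await {
        // ... process alignment queries ...
        let _ = query;
    }
}
```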

* plugins/zenoh-plugin-storage-manager/src/replication/core.rs:
  - The `MAX_RETRY` and `WAIT_PERIODS_SECS` constants are no longer
    needed.
  - Removed the wait/retry loops when creating the Digest Subscriber and
    Aligner Queryable.

Signed-off-by: Julien Loudet <[email protected]>
This commit changes the way a replicated Storage starts: if it is empty
and configured to be replicated, it will attempt to align with an active
and compatible (i.e. same configuration) Replica before anything else.

The previous behaviour made a query on the key expression of the
Storage. Although it could, in some cases, perform the same initial
alignment, it could not guarantee to only query a Storage that was
configured to be replicated.

To perform this controlled initial alignment, new variants were added to
the `AlignmentQuery` and `AlignmentReply` enumerations (sketched below):
`Discovery` to discover an active Replica and have it reply with its
`ZenohId`, and `All` to request the data from the discovered Replica.

To avoid contention, this transfer is performed in batches, one
`Interval` at a time.
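
A sketch of the new variants (existing variants elided, payload types
simplified):

```rust
// Stand-in for zenoh's ZenohId.
type ZenohId = String;

enum AlignmentQuery {
    // Discover an active, compatible Replica.
    Discovery,
    // Ask the discovered Replica for all of its data.
    All,
    // ... existing variants ...
}

enum AlignmentReply {
    // A Replica answers a `Discovery` with its `ZenohId`, so that the
    // follow-up `All` query can target it specifically.
    Discovery(ZenohId),
    // ... existing variants ...
}

fn main() {
    // Initial alignment flow: Discovery -> Discovery(zid) -> All -> data,
    // the data being transferred in batches, one `Interval` at a time.
    let _query = AlignmentQuery::Discovery;
    let _reply = AlignmentReply::Discovery("a1b2c3".to_string());
    let _follow_up = AlignmentQuery::All;
}
```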

* plugins/zenoh-plugin-storage-manager/src/replication/aligner.rs:
  - Added new variants `AlignmentQuery::All` and
    `AlignmentQuery::Discovery`.
  - Added new variant `AlignmentReply::Discovery`.
  - Updated the `aligner` method to:
    - Send the `ZenohId` as a reply to an `AlignmentQuery::Discovery`.
    - Send all the data of the Storage as a reply to an
      `AlignmentQuery::All`. This leverages the already existing
      `reply_events` method.
  - Updated the `reply_events` method to not attempt to fetch the
    content of the Storage if the action is set to `delete`. Before this
    commit, the only time this method was called was during an alignment
    which filters out the deleted events (hence not requiring this
    code).
  - Updated the `spawn_query_replica_aligner` method:
    - It now returns the handle of the newly created task as we want to
      wait for it to finish when performing the initial alignment.
    - Changed the consolidation to `ConsolidationMode::Monotonic` when
      sending an `AlignmentQuery::Discovery`: we want to contact the
      fastest answering Replica (hopefully the closest).
    - Stopped processing replies when processing an
      `AlignmentQuery::Discovery` as we only want to perform the initial
      alignment once.
  - Updated the `process_alignment_reply` method:
    - Process an `AlignmentReply::Discovery` by sending a follow-up
      `AlignmentQuery::All` to retrieve the content of the Storage of
      the discovered Replica.
    - Stop attempting to delete an entry in the Storage when processing
      an `AlignmentReply::Retrieval`: this can only happen when
      performing an initial alignment, in which case the receiving
      Storage is empty. We only need to record in the Replication Log
      the fact that a delete was performed.

* plugins/zenoh-plugin-storage-manager/src/replication/core.rs:
  implemented the `initial_alignment` method that attempts to discover a
  Replica by sending out an `AlignmentQuery::Discovery` on the Aligner
  Queryable for all Replicas. Before making this query we wait for a
  short delay to give Zenoh enough time to propagate the routing tables
  (see the sketch below).
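
A sketch of the discovery step, assuming a zenoh 1.x-style API; the
one-second delay and the error handling are illustrative:

```rust
use std::time::Duration;

async fn initial_alignment(session: &zenoh::Session, aligner_ke: &str) {
    // Wait a short delay so Zenoh has time to propagate the routing
    // tables before the Discovery query is sent out.
    tokio::time::sleep(Duration::from_secs(1)).await;

    let replies = match session
        .get(aligner_ke)
        // Monotonic consolidation: the fastest answering (hopefully the
        // closest) Replica wins the discovery.
        .consolidation(zenoh::query::ConsolidationMode::Monotonic)
        .await
    {
        Ok(replies) => replies,
        Err(e) => {
            tracing::error!("Initial alignment query failed: {e:?}");
            return;
        }
    };

    // A single reply is enough: the initial alignment is performed once,
    // with a follow-up `AlignmentQuery::All` sent to the discovered
    // Replica.
    if let Ok(reply) = replies.recv_async().await {
        let _ = reply;
    }
}
```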

* plugins/zenoh-plugin-storage-manager/src/replication/service.rs:
  - Removed the constants `MAX_RETRY` and `WAIT_PERIOD_SECS` as they
    were no longer needed.
  - Updated the documentation of the `spawn_start` function.
  - Removed the previous implementation of the initial alignment that
    made a query on the key expression of the Storage.
  - Added a check after creating the `Replication` structure: if the
    Replication Log is empty, which indicates an empty Storage, then
    perform an initial alignment.

Signed-off-by: Julien Loudet <[email protected]>
…al_ports_in_tests

do not use Linux Ephemeral ports in tests
* fix: starting rest plugin like any other plugin

Signed-off-by: Gabriele Baldoni <[email protected]>

* fix: using mix of blockon_runtime and runtime_spawn

Signed-off-by: Gabriele Baldoni <[email protected]>

* chore: removing commented line

Signed-off-by: Gabriele Baldoni <[email protected]>

---------

Signed-off-by: Gabriele Baldoni <[email protected]>
…ase-yaml

chore: update release.yml for required labels
* connect_peers returns false if not connected to avoid wrong start_conditions notif

* Document `connect` and `connect_peer`

* Clarify comments

* Update comments

---------

Co-authored-by: Mahmoud Mazouz <[email protected]>
* Remove closing from hat trait

* Also move orchestrator session closing code to closed phase

* Remove closing from TransportPeerEventHandler
PR missing one of the required labels: {'dependencies', 'breaking-change', 'new feature', 'bug', 'internal', 'documentation', 'enhancement'}

@wyfo commented Sep 29, 2024

Thinking about it once more, I'm not sure we should add support for these specific types for the sole reason that they are used in a backend. The work should maybe be done in the backend code directly.
