teh-cmc changed the title from "Support sending a DataCell's size over the wire" to "Support sending a DataCell's size (& other metadata) over the wire" on Apr 4, 2023
The size computation now happens on the clients no matter what (we need the value for the `size_bytes` trigger of the batching system), so not sending it over the wire is a literal waste of compute resources.
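To illustrate why the size is already available client-side, here is a minimal sketch of a size-triggered batcher. `SizedCell`, `Batcher`, and the threshold value are hypothetical names made up for this example, not the actual Rerun types; the only point is that the size is computed once, cached next to the payload, and reused by the flush trigger, so it could travel with the data instead of being recomputed by the receiver.

```rust
/// Hypothetical stand-in for a cell that caches its own size.
/// Not the real `DataCell`; here the "size" is just the payload length.
pub struct SizedCell {
    pub payload: Vec<u8>,
    /// Computed once on the client, then reused everywhere.
    pub size_bytes: u64,
}

impl SizedCell {
    pub fn new(payload: Vec<u8>) -> Self {
        let size_bytes = payload.len() as u64;
        Self { payload, size_bytes }
    }
}

/// Hypothetical batcher that flushes once the pending bytes reach a threshold.
pub struct Batcher {
    max_bytes: u64,
    pending: Vec<SizedCell>,
    pending_bytes: u64,
}

impl Batcher {
    pub fn new(max_bytes: u64) -> Self {
        Self {
            max_bytes,
            pending: Vec::new(),
            pending_bytes: 0,
        }
    }

    /// Queues a cell. Returns the whole batch if the size trigger fired,
    /// otherwise `None`.
    pub fn push(&mut self, cell: SizedCell) -> Option<Vec<SizedCell>> {
        // The trigger only needs the cached size, never the payload itself.
        self.pending_bytes += cell.size_bytes;
        self.pending.push(cell);

        if self.pending_bytes >= self.max_bytes {
            self.pending_bytes = 0;
            Some(std::mem::take(&mut self.pending))
        } else {
            None
        }
    }
}
```

With a threshold of 10 bytes, pushing three 4-byte cells returns `None` twice and then a batch of three cells, each still carrying its precomputed `size_bytes`.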
A `TransportChunk` is a `Chunk` that is ready for transport and/or
storage.
It is very cheap to go from `Chunk` to a `TransportChunk` and
vice-versa.
A `TransportChunk` maps 1:1 to a native Arrow `RecordBatch`. It has a
stable ABI, and can be cheaply sent across process boundaries.
`arrow2` has no `RecordBatch` type; we will get one once we migrate to
`arrow-rs`.
A `TransportChunk` is self-describing: it contains all the data _and_
metadata needed to index it into storage.
We rely heavily on chunk-level and field-level metadata to communicate
Rerun-specific semantics over the wire, e.g. whether some columns are
already properly sorted.
The Arrow metadata system is fairly limited (it's all untyped strings),
but for now that seems good enough. It will be trivial to switch to
something else later, if need be.
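
As a rough illustration of what "untyped strings" implies in practice, the sketch below round-trips two typed values through a string-to-string map, which is the shape Arrow metadata takes. The key names, `ChunkInfo`, `encode`, and `decode` are all hypothetical and chosen for this example only; they are not the keys or types used by Rerun, and a plain `BTreeMap` stands in for the Arrow schema metadata.

```rust
use std::collections::BTreeMap;

// Hypothetical metadata keys, for illustration only.
const KEY_IS_SORTED: &str = "example.is_sorted";
const KEY_SIZE_BYTES: &str = "example.size_bytes";

/// Arrow-style metadata: untyped string keys mapped to untyped string values.
type Metadata = BTreeMap<String, String>;

/// Typed view of the semantics carried in the metadata.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ChunkInfo {
    pub is_sorted: bool,
    pub size_bytes: u64,
}

/// Serializes the typed values into string metadata.
pub fn encode(info: &ChunkInfo) -> Metadata {
    let mut md = Metadata::new();
    md.insert(KEY_IS_SORTED.to_owned(), info.is_sorted.to_string());
    md.insert(KEY_SIZE_BYTES.to_owned(), info.size_bytes.to_string());
    md
}

/// Looks up a required key, reporting which one is missing.
fn get<'a>(md: &'a Metadata, key: &str) -> Result<&'a str, String> {
    md.get(key)
        .map(String::as_str)
        .ok_or_else(|| format!("missing metadata key: {}", key))
}

/// Parses the string metadata back into typed values.
/// Since everything is a string, every field must be validated on read.
pub fn decode(md: &Metadata) -> Result<ChunkInfo, String> {
    let is_sorted = get(md, KEY_IS_SORTED)?
        .parse::<bool>()
        .map_err(|e| format!("{}: {}", KEY_IS_SORTED, e))?;
    let size_bytes = get(md, KEY_SIZE_BYTES)?
        .parse::<u64>()
        .map_err(|e| format!("{}: {}", KEY_SIZE_BYTES, e))?;
    Ok(ChunkInfo { is_sorted, size_bytes })
}
```

The cost of the string-only approach shows up in `decode`: each value has to be parsed and validated by hand, which is tolerable for a handful of flags and sizes but would become tedious for richer metadata.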
- Fixes #1760
- Fixes #1692
- Fixes #3360
- Fixes #1696
---
Part of a PR series to implement our new chunk-based data model on the
client-side (SDKs):
- #6437
- #6438
- #6439
- #6440
- #6441
This would allow us to compute the size of `DataCell`s (a very costly operation) on the clients and therefore: