Minimal component-transform interface #1882
Comments
Imho the original RFC has several somewhat separable layers/steps:
Adding the semantics concept is probably the hardest part of this! Anyway, I think all we need as first steps would be 1 through 4. If we keep doing custom non-systemized code for jpeg, we can get away with just (1), but what's the point then :) Proposal for an easy & non-invasive way: every component is required to have a phantom type field which refers to a semantic trait which itself is never implemented. This way semantic information is accessible but incurs no runtime cost and doesn't change the way we store things.
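The phantom-type proposal in the comment above could look roughly like the sketch below. This is only an illustration under assumptions: `Component`, `Position`, `Color`, `Point3D`, `ColorRgb` and `translate` are hypothetical names made up here, not existing rerun types. The marker traits are never implemented; they are only named through `PhantomData<dyn ...>`.

```rust
use std::marker::PhantomData;

// Hypothetical semantic marker traits. Nothing ever implements them;
// they exist only to be named at the type level.
trait Position {}
trait Color {}

// Hypothetical component: a storage datatype `T` tagged with a semantic `S`.
// `PhantomData` is zero-sized, so the tag adds no bytes to the stored value.
struct Component<T, S: ?Sized> {
    value: T,
    _semantic: PhantomData<S>,
}

impl<T, S: ?Sized> Component<T, S> {
    fn new(value: T) -> Self {
        Self { value, _semantic: PhantomData }
    }
}

// Same datatype, different semantics: these are distinct types.
type Point3D = Component<[f32; 3], dyn Position>;
type ColorRgb = Component<[f32; 3], dyn Color>;

// Only accepts position-tagged data; passing a `ColorRgb` is a compile error.
fn translate(p: &Point3D, d: [f32; 3]) -> Point3D {
    Point3D::new([p.value[0] + d[0], p.value[1] + d[1], p.value[2] + d[2]])
}
```

In this sketch `Point3D` and `ColorRgb` share the same in-memory layout as a bare `[f32; 3]`, so the semantic tag is checked at compile time only and storage is unchanged.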
My thought is to approach this by doing some form of 1, 3, and part of 4. Rename the current Now, re-introduce a new Each DataCell will also store the ComponentName -- this will need to be provided at log-time and book-kept through the serialization. The component name will go into the arrow-table metadata for the column, though columns will continue to be named after the Buckets in the data-store, however, should continue to be indexed by component. They will no longer be single-typed, but I believe this is technically ok with the new DataCell architecture (at least until we bring back compaction? @teh-cmc to verify). Now, All of the query logic stays the same but the returned This basically gives us the (component, datatype) tuple for our logged data as described by the RFC, and a hand-rolled way of mapping (component, X) -> (component, Y) at query-time. It just punts on all the dynamic / registration / lookup stuff.
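A minimal sketch of the hand-rolled (component, X) -> (component, Y) mapping described in the comment above, assuming a rectangle component with two possible datatypes. The `DataCell` here is a simplified stand-in rather than the real store type, and `RectData` and `to_min_max` are hypothetical names used only for illustration.

```rust
// Hypothetical datatype half of the (component, datatype) tuple.
#[derive(Debug, Clone, PartialEq)]
enum RectData {
    Xywh([f32; 4]),
    MinMax { min: [f32; 2], max: [f32; 2] },
}

// Simplified stand-in for a cell that carries its component name
// alongside the data, so the name survives serialization.
#[derive(Debug, Clone, PartialEq)]
struct DataCell {
    component: String,
    data: RectData,
}

// Hand-rolled query-time transform: (component, X) -> (component, MinMax).
// The component name is preserved; only the datatype changes.
fn to_min_max(cell: &DataCell) -> DataCell {
    let data = match &cell.data {
        RectData::Xywh([x, y, w, h]) => RectData::MinMax {
            min: [*x, *y],
            max: [x + w, y + h],
        },
        // Already in the requested representation.
        other => other.clone(),
    };
    DataCell { component: cell.component.clone(), data }
}
```

A query path written this way can return a single requested datatype per component without any dynamic registration or lookup, which is the part the comment proposes to punt on.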
Single columns containing cells of distinct datatypes will definitely be challenging (assuming we don't rely on native arrow unions obviously), or at the very least full of surprises... but I think that's feasible:
Compaction and serialization/batching are one and the same these days, so same conclusion.
I'm probably not following entirely, but wouldn't that mean that we again allow several representations of the same thing on a path? E.g. two ways of representing a box. I thought we wanted to make this a runtime error. I'm not sure I like the idea of a single preferred data representation. Different parts of the application may have different requirements. On the other hand, it does have advantages, as we can keep the number of conversions down and predictable. Sidenote: This entire discussion cuts very deep into the data/semantic separation, which I would have liked to punt on as it is obviously quite deep.
Only if we have to?
What's the alternative? I mean, what do we render/return when there are two conflicting definitions of the same thing? That could easily happen with transforms, for example.
Different representations of This means that representing a Rect with min/max would require changing the semantics of several components. I.e. it becomes a transform of the archetype and not merely of components.
This would be a first step towards the bigger RFC: https://github.com/rerun-io/rerun/blob/main/design/component_datatypes.md
We would add something that transforms:
More design needed here