One of my biggest problems with the `csv` crate is the sheer number of limitations it imposes. For example, you can't serialize scalars (plain primitives) outside of structs, and lots of otherwise-helpful types can't be exported at all.
When it comes to building tables - and that's CSV's only 'real' use case - you sometimes find that your data won't fit into the `csv` crate's model.
This crate is intended to address these concerns. Unfortunately, that means inventing yet another CSV dialect, one that represents Rust types within headers. For example, I currently want an enum to be represented by something like `Nested::Enum::Variant` in the header, with `Variant`'s data dutifully `v.serialize(self)`'d during construction.
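To make that concrete, here's a purely hypothetical sketch of the mapping I have in mind (none of this syntax is implemented yet):

```rust
// purely hypothetical: how an enum's variant data might map to a header
enum Nested {
    Variant { value: u8 },
}

// for a record holding `Nested::Variant { value: 7 }`, the imagined
// output would be something like:
//
//   header row: Nested::Variant::value
//   data row:   7
```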
It also has some design woes...
## Design
I've considered various API designs for this library. It seems that `csv` can represent most types so long as they fit the `Record` model, meaning you're stuck serializing one `struct` with a bazillion fields.
That's pretty ugly in your codebase, but more importantly, it's difficult to maintain. In addition, the overall structure of `csv` disallows decent usage of `#[serde(flatten)]` on fields. This means it can't really handle 'recursive' fields full of data, which is exactly what Rust is usually good at!
Even worse, `csv` surfaces all of these unsupported cases as runtime errors. You don't know what's going to work until you try it, making your code feel like some flimsy Python. That's fine for many folks using `csv`, but some use cases can't tolerate such fragile constructs.
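For instance, here's a minimal sketch (assuming `csv` and `serde` as dependencies) of code that compiles cleanly and only blows up once it runs:

```rust
use serde::Serialize;

#[derive(Serialize)]
struct Inner {
    x: u8,
    y: u8,
}

#[derive(Serialize)]
struct Outer {
    name: String,
    #[serde(flatten)] // serde serializes `Outer` as a map because of this
    inner: Inner,
}

fn main() {
    let mut wtr = csv::Writer::from_writer(vec![]);
    // this compiles without complaint...
    let result = wtr.serialize(Outer {
        name: "hi".into(),
        inner: Inner { x: 1, y: 2 },
    });
    // ...and only fails here, at runtime, because `csv` can't serialize maps
    assert!(result.is_err());
}
```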
It's not `csv`'s fault, though - `serde` requires you to implement its serialization API in full, even though some formats don't support certain types. Even so, I aim to somehow reach around the trait system and alert users to these problems at compile time, at least when serializing a header (e.g., actually making a whole file).
To that end, some automatic determination of the given types' ordering would be fantastic! However, there are some difficulties here.
## The Spirit of Reflection
I think, then, the design will be oriented around checking the types users give us. Here's some context:
### An Example
Let's say we have three structs: `A`, `B`, and `C`. We always want them to appear in that order: CSV is a table, and its records should never deviate from the table's original column structure.
However, if our serializer allows users to input multiple types, we're just asking for inconsistencies.
My first thought was to use marker traits and a `PhantomData` to represent the current state of the serializer. I still think that could be helpful for handling header build state, but it vastly increases the number of checks we need in each implementation. That sounds supremely annoying to implement, so I considered other options.
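For reference, the typestate version would look roughly like this (a sketch; all the names are mine):

```rust
use std::marker::PhantomData;

// marker types, one per stage of the header build
struct ExpectA;
struct ExpectB;

struct HeaderBuilder<Stage> {
    columns: Vec<String>,
    _stage: PhantomData<Stage>,
}

impl HeaderBuilder<ExpectA> {
    fn new() -> Self {
        HeaderBuilder { columns: Vec::new(), _stage: PhantomData }
    }

    // consuming `self` advances the state machine, so calling
    // these methods out of order simply fails to type-check
    fn push_a(mut self) -> HeaderBuilder<ExpectB> {
        self.columns.push("A".into());
        HeaderBuilder { columns: self.columns, _stage: PhantomData }
    }
}

// ...and so on: one marker and one impl block per stage, which is
// exactly the boilerplate explosion described above
```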
### Grabbing Type Information
Well, how about `std::any::TypeId`? Using it would let us keep a list of the types we want in each `Record` without enforcing the struct of shame. It's also generated at compile time, and... oh! We can only compare it at runtime.
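Sketched out (with illustrative names), the idea looks like this - and the final comparison is exactly the problem:

```rust
use std::any::TypeId;

// illustrative only: an order check built on `TypeId`
struct OrderCheck {
    expected: Vec<TypeId>,
    seen: Vec<TypeId>,
}

impl OrderCheck {
    fn push<T: 'static>(&mut self, _value: &T) {
        self.seen.push(TypeId::of::<T>());
    }

    fn verify(&self) -> Result<(), String> {
        // `TypeId::of` is known at compile time, but this comparison
        // can only ever happen at runtime
        if self.seen == self.expected {
            Ok(())
        } else {
            Err("record fields pushed in the wrong order".into())
        }
    }
}
```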
I briefly revisited `const` evaluation and how it could inspire my design. It helped, mostly by reminding me that I already know the order of serialization. With that in mind, here's what I'm looking for:
```rust
use nqcsv::RecordBuilder;

struct A {
    a: u8,
}
struct B;
struct C;

fn main() {
    // define the fields (registering types, not values)
    let mut record_maker = RecordBuilder::new()
        .add::<A>()
        .add::<B>()
        .add::<C>()
        .build();

    // make some data
    let a = A { a: 1 };
    let b = B;
    let c = C;

    // alright, let's try to serialize them in the wrong order!
    let record = record_maker.push(&c).push(&a).push(&b).make();
}
```
This should result in some kind of compile-time error:
```console
$ cargo build
   Compiling nqcsv v0.1.0 (/home/user/Documents/projects/nqcsv)
   Compiling ...
error: this record contains fields in the wrong order
       expected order: [A, B, C]
       found order:    [C, A, B]
  --> src/main.rs:24:38
   |
24 |     let record = record_maker.push(&c).push(&a).push(&b).make();
   |                               ^^^^
   |
   = note: this error originates in the macro `nqcsv::record`
```
To be clear, I don't love this interface design, either.
I do like the idea of making a builder, but the fact that you need to specify everything twice is relatively annoying. It feels... shameful. I'd rather have some way to create a `Record` type from the given information. After all, wasn't that the purpose of `TypeId`?
One way to get there would be a macro design. For example, a procedural macro could create a marker type and implement a `StateMarker` trait for it.
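For example, the macro's expansion might emit something like this (everything here, `StateMarker` included, is hypothetical):

```rust
// hypothetical output of such a macro for the order [A, B, C]
trait StateMarker {
    // the stage that becomes legal once this one is consumed
    type Next;
}

struct ExpectA;
struct ExpectB;
struct ExpectC;
struct Complete;

impl StateMarker for ExpectA { type Next = ExpectB; }
impl StateMarker for ExpectB { type Next = ExpectC; }
impl StateMarker for ExpectC { type Next = Complete; }
```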
That's not fully up to snuff, though. Because the entire `Serializer` trait needs to be implemented for each stage, there'd still be runtime errors! Gah!
To get rid of those, we'd have to move the macro stuff into the implementation side of things. Here's the ideal way I'd do this if you could keep state across procedural macro invocations:
```rust
impl<'a> ser::Serializer for &'a mut CsvSerializer {
    // ...snip

    // just a random one of these MANY methods
    fn serialize_i8(self, v: i8) -> Result<Self::Ok, Self::Error> {
        // panics (during compilation) if order's wrong.
        // otherwise, resolves to unit type, `()`. ;)
        crate::util::type_check!(&mut self.order, v);

        self.push_str_unchecked(&itoa::Buffer::new().format(v));
        Ok(())
    }

    // snip...
}
```
However, I am not convinced that it's possible without reflection (or macro state, which is generally viewed as bad).
## Falling Back
With these problems combined, I think we need a formal `Record` type that can represent the data a user's putting in. That increases flexibility!
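In its simplest form, that type could just be an ordered list of already-rendered cells (a sketch, ignoring quoting and escaping):

```rust
// a minimal sketch of a formal `Record` type; names are illustrative
struct Record {
    // each field, already rendered to its CSV cell text
    cells: Vec<String>,
}

impl Record {
    fn push_cell(&mut self, cell: String) {
        self.cells.push(cell);
    }

    fn to_line(&self) -> String {
        self.cells.join(",")
    }
}
```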
However, it still has that "doublespeak" pattern where you first specify what you'll give to the library and then hand it over. And... we're back to the single-`struct` syntax that we wanted to avoid.
I suppose we're hitting the edge of what `serde` (and Rust) can do, which pushes us back into macro territory again. Essentially, we'd need to know EVERY call that ever comes in to avoid those runtime errors.
Additionally, I'm not even sure how to get the entire context in my proc macro. It'd have to be something incredibly hacky that practically does this:
1. Make all the header methods except the canonical `Record` (struct) syntax panic. Hide them from the docs, put scary doc comments on them for IDEs, and, if possible, mark them deprecated (I'm not sure that works on trait impls). See the sketch after this list.
2. Create a state model for serialization similar to the one in `csv`.
3. Wrap everything up in `CsvWriter`/`CsvReader`.
4. Consider improvements from there. Look for ways to split `serde`'s interface, perhaps with a weird fork, and see if other format impls have done anything more successful.
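For the first item, the "scary" methods might look like this (same snipped style as the earlier fragment; everything is hypothetical):

```rust
impl<'a> ser::Serializer for &'a mut CsvSerializer {
    // ...snip

    /// Not supported: serialize a whole `Record` instead.
    #[doc(hidden)] // keep it out of the docs...
    fn serialize_i8(self, _v: i8) -> Result<Self::Ok, Self::Error> {
        // ...and make misuse loud. (`#[deprecated]` has no effect on
        // trait impl methods, so a panic is the bluntest tool left.)
        panic!("nqcsv: bare scalars are unsupported; use the `Record` syntax");
    }

    // snip...
}
```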
It may not be possible to use `serde` as-is; instead, we may need our own `#[derive(Csv)]` trait to make anything flexible. And even that still gives "single struct" vibes.
I'm not sure where to go from here, but I'll keep researching and provide additional details when I have them.
Until then, I'm definitely getting to see why `csv` is designed the way it is. 😄