-
-
Notifications
You must be signed in to change notification settings - Fork 7
Data format adapter to expose control over the representation of bytes #48
Comments
I think we considered this approach before doing serde-rs/json#656 an ran into some issues. However, it does seem like a wrapper should work. @jseyfried? |
I believe the proposal is not possible. Implementations of |
Not clear why it couldn't wrap that. Check out these existing deserializer adapter libraries:
For example next_element_seed is wrapped here in one of them: https://github.com/dtolnay/serde-stacker/blob/9f19950f4490fc125f2a808301c58fa168def34d/src/de.rs#L678. |
Are you thinking something along the lines of the example in this gist: https://gist.github.com/cfsamson/18009ca988ab1405b6ea6aaac819aed6 This would support a fixed set of encoders/decoders for example base64, hex, Uuencoding? etc. One thing I couldn't get working was serializing a struct using |
That is in the right direction but you need an implementation of serialize_struct and many of the other methods. Currently you've serialized structs exactly the same as the underlying serializer. Refer to https://github.com/dtolnay/serde-stacker/blob/0.1.4/src/ser.rs for an example of propagating custom behavior to struct fields correctly. |
Yeah, that's a good reference implementation. I've always wanted to dive a bit more into both generics and Serde. This looks like a good opportunity to do that, so I'd be happy to take a stab at this and try to get something working. |
@cfsamson We're intending to port over serde-rs/json#656 to this design. |
@jethrogb Ah, ok. Well, I've implemented a working serializer here https://github.com/cfsamson/rfi-serde-byte-repr, and I thought I'd check in before I went any further. I'm not sure if this still has any value for other serializers/deserializers (toml, yaml) though. |
@dtolnay I might have jumped the guns a bit by finishing an implementation without waiting for any discussion. If so, that's on me, but sometimes it's hard to stop. I recon this might still be useful for toml/yaml (de)serialization even though there are ongoing discussions on how to handle this in It could do with more tests though and ideas for good test cases is very welcome. I haven't published anything yet but those interested can take a look here:
I decided to change the naming a bit. I'm still not sure about it though, it might be better to follow the conventions used in |
@cfsamson can you create an empty branch in your repository and then file a pull request from master into that branch? That would make it easy to leave review comments. |
@jethrogb Sure. You can find the PR here: cfsamson/rfi-serde-byte-repr#1 Edit |
@jethrogb So, I found a way to make this easy to review, but it included creating a new repository. You can take a look here: cfsamson/serde-byte-repr#1 |
I'd prefer to have a default bytestring representation for In an attempt to follow the principle of least surprise, I cooked up an encoding scheme for my own use which works by using the null codepoint (which the JSON spec requires support for but which is only allowed in paths in the NT kernel object manager, where counted strings are used) as an escape character to indicate "the following codepoint should be re-interpreted as a byte". Thus, with my scheme, any I haven't published it to crates.io, but my own implementation is at https://gist.github.com/ssokolow/0d9f5c5e4a8a37a962875af205bcc723 if anyone wants to take a look, MIT/Apache licensed, complete with a small round-trip testing suite and, though the harness isn't included in the gist, I even did some performance tuning using Criterion. (Though I have no experience with low-level performance tuning and have no idea what performance to expect, so I was just flailing around, trying things blindly.) If there's sufficient interest, I could try to make it the first crate I publish to crates.io, but it may take some time, since things are a bit of a mess right now. |
@dtolnay Any idea on how/if we can move this forward? I've created a working implementation in line with the initial proposal (AFAIK). @ssokolow If I understand correctly you want to special case I would think that this check could be done by another crate that took care of these kinds of text/bytes special cases and chose the right (de)serialization strategy based on some checks for validity. |
I will close out this issue with a link to that repo. We can follow up on any further feature work in the issue tracker over there! If you find @ssokolow's custom encoding scheme compelling then that would be implementable as one of the ByteFormat options in serde-byte-repr. |
Sounds good. Please link to https://github.com/cfsamson/rfi-serde-byte-repr, the other repo was one I crated to make the codebase easier to comment on and will get deleted. |
Looks like that is the one I linked to. |
@cfsamson I want to ensure that It's also in line with the current behaviour, which treats it as a Because I used null as my escape character and neither POSIX nor Win32 allow nulls in paths, the encoding remains inter-compatible with what Serde currently does. (Null is the only character which the JSON spec requires support for but which is disallowed in POSIX paths. The alternative would probably be something in the Private Use Area, in which case I'd have to research which codepoint is least likely to collide with things that live there, like ConScript and MUFI... it'd still work, but inter-compatibility with non-escaping serializers/deserializers would be lost for that codepoint.) It just so happens that my encoding scheme works for any |
@ssokolow I see, well, as suggested that can be implemented as an option on If you're willing to publish a crate with this encoding scheme with an API like the Base64 crate for example it would be only a few lines of code to add that option in. I think I'm going to publish this crate on crates.io as it is now and we'll add the support you want as soon as it's finished. You can send an issue to that repo with a link to the crate or a PR. Does that sound like a plan? |
Sounds good. For the last few days, I've had to pour all my effort into preparing for some hard drive replacements and overdue upgrades, but I'll leave this on my TODO list. |
Great. The crate is now published: https://crates.io/crates/serde-bytes-repr |
Human readable formats tend not to include a universally agreed way to represent arbitrary binary data, which means those serde libraries can end up using a representation for serde's "bytes" type which isn't ideal for all uses.
Three examples:
It would be good to have an adapter which one can wrap around a Serde serializer or deserializer to make it use a different underlying representation for binary bytes, such as base64 (which itself may need to be customizable with base64 character set and padding).
Usage may look like:
The implementation would look like a wrapper which wraps a Serializer or Deserializer and forwards through all the methods of those traits except for serialize_bytes and deserialize_bytes, converting those instead to serialize_str / deserialize_str with the appropriate base64 (or other) encoding applied.
The text was updated successfully, but these errors were encountered: