Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The se::to_writer function now requires std::fmt::Write instead of std::io::Write #499

Open
calops opened this issue Oct 28, 2022 · 22 comments
Labels
question serde Issues related to mapping from Rust types to XML

Comments

@calops
Copy link
Contributor

calops commented Oct 28, 2022

Is there a reason for this change? It makes it impossible to use this function with a Vec<u8> (which does implement std::io::Write), while it is still possible in the current released version.

@Mingun
Copy link
Collaborator

Mingun commented Oct 28, 2022

Yes, the reason is because XML is a text-based format which could use many encodings. You should manually encode the string result to bytes for now. If you want an UTF-8 encoded data in Vec<u8>, it is easily and zero-cost to get it:

let ser = Serializer::new(String::new());
let string: String = data.serialize(ser).unwrap();
let bytes: Vec<u8> = string.into_bytes();

@Mingun Mingun closed this as completed Oct 28, 2022
@Mingun Mingun added question serde Issues related to mapping from Rust types to XML labels Oct 28, 2022
@calops
Copy link
Contributor Author

calops commented Oct 30, 2022

That's a fair point. I have a question though:

I create a Writer, which requires a Write argument that implements std::io::Write. But if I want to pass that writer around and append to what it's doing, I can only call to_writer(writer.inner()), which won't work, because to_writer requires std::fmt::Write. Right now I'm having to use an intermediate type that implements the fmtversion and forwards it toio` (or vice versa).

I feel like the actual solution to my use-case would be the ability to call the Writer directly so that it can serialize a random argument I throw at it, without having to go through create_element, which requires defining a root tag (which I don't want in this context).

@Mingun
Copy link
Collaborator

Mingun commented Oct 30, 2022

You want to mix serde serializer with event-based Writer API? Could you describe your case better? Also, I suppose you use master, right?

@Mingun Mingun reopened this Oct 30, 2022
@calops
Copy link
Contributor Author

calops commented Oct 31, 2022

Basically, I write a bunch of nodes myself through the event-based API. But then the inner part is serialized through serde, so I end up calling to_writer(ToFmtWrite(writer.inner())) in the middle of my events, ToFmtWrite being a dumb newtype with the correct fmt::Write implementation for anything that implements io::Write. I'm using a Vec<u8> as the buffer for the writer (but I'd use a String just as well if it meant I don't have to do this Write trickery).

Being able to replace something like that

writer.write_event(Event::Start(BytesStart::new("foo")))?;
quick_xml::se::to_writer(ToWriteFmt(writer.inner()), &Bar { baz: 42 })?;
writer.write_event(Event::End(BytesEnd::new("foo")))?;

with something like this

writer
    .create_element("foo")
    .write_serialized_content(Bar { baz: 42 })?;

would be the ideal implementation for my use-case.

And yes, I am using master.

@Mingun
Copy link
Collaborator

Mingun commented Oct 31, 2022

with something like this

writer
    .create_element("foo")
    .write_serialized_content(Bar { baz: 42 })?;

would be the ideal implementation for my use-case.

Yes, I agree that we should have it, but it just not yet implemented. Did you mind to make a PR?

Ideally, we should have a reader, that would be able to deserialize objects using serde and a writer, that should be able to serialize them, but implementing a deserializer part a much more hard task then the serializer one.

@calops
Copy link
Contributor Author

calops commented Nov 3, 2022

Yeah I can work on a PR for this, I'll get back to you once it's ready.

@zeenix
Copy link
Contributor

zeenix commented Jan 4, 2023

I just tried to upgrade to 0.27 in zbus and was surprised by this change breaking my code. I know it's a different semver release so breaking changes are to be expected but it would have been nice to find any mention of this change and the rationale in either the release notes, or the relevant commit log (4f376b1).

In any case, isn't it change making to_writer inconsistent with both other similar serde APIs and also with the to_reader, which expects a io::Read?

@zeenix
Copy link
Contributor

zeenix commented Jan 4, 2023

In any case, isn't it change making to_writer inconsistent with both other similar serde APIs and also with the to_reader, which expects a io::Read?

Oops, give the incorrect link to the JSON lib. I meant serde_json of course: https://docs.rs/serde_json/latest/serde_json/fn.to_writer.html

@zeenix
Copy link
Contributor

zeenix commented Jan 4, 2023

Yes, the reason is because XML is a text-based format

I would think that's true for JSON as well.

which could use many encodings.

Does fmt::Write help there? AFAIK it's UTF-8 specific. In fact the docs say:

This trait only accepts UTF-8–encoded data ... If you only want to accept Unicode and you don’t need flushing, you should implement this trait; otherwise you should implement std::io::Write.

So that seems like a counter-argument to using fmt::Write unless I'm missing something? 🤔

@Mingun
Copy link
Collaborator

Mingun commented Jan 4, 2023

I would think that's true for JSON as well.

Yes, and I think this is oversight in the serde_json. It writes UTF-8 only, which is a task for fmt::Write.

Our serde writer doesn't write the XML declaration (<?xml ... ?>) for composability (actually, it writes XML fragments instead of XML documents) and produces only UTF-8 encoded output. This is by intention: if you need to get the result in another encoding, you should encode the result yourself and append a correct XML declaration. Both tasks can be added as helper functions to quick-xml.

Adapter from fmt::Write to io::Write performs infallible conversion, but the opposite is not true. So we decided to use more specific trait to write data.

@zeenix
Copy link
Contributor

zeenix commented Jan 4, 2023

Thanks for explaining.

Yes, and I think this is oversight in the serde_json

Are you sure? Is there an issue on serde_json about this? I'm sorry for being skeptical here but this is @dtolnay (the main person behind a lot of Rust's fundamental API, including serde itself) we're talking about.

Our serde writer doesn't write the XML declaration (<?xml ... ?>) for composability (actually, it writes XML fragments instead of XML documents) and produces only UTF-8 encoded output. This is by intention: if you need to get the result in another encoding, you should encode the result yourself and append a correct XML declaration. Both tasks can be added as helper functions to quick-xml.

Yeah but AFAIK Write is an abstraction for producing the output and should be independent of the format. E.g if I want to write to a file or socket, I'd use the io::Write implementing types for that (just like I'd use io::Read for reading from them). If only UTF-8 is supported, then I don't see any problem. If other encodings are planned or expected in the future, then I'd suggest making it configurable via some parameters to to_writer and from_reader. I actually implemented such an API (see the ctxt param).

Also what about the internal inconsistency with from_reader?

@Mingun
Copy link
Collaborator

Mingun commented Jan 4, 2023

Is there an issue on serde_json about this?

I don't known. While @dtolnay can be great in other projects, serde project is not in him focus nowadays and it is effectively frozen as you can see from numerous problems in serde bugtracker those does not get any attention from maintainers.

Yeah but AFAIK Write is an abstraction for producing the output and should be independent of the format.

Yes, it is, but in our case, it is more convenient for us to work internally with a type that protects us from the accidental use of incorrect encoding when writing XML parts (or worse, the use of different encodings when writing different parts). That type is everything that impl fmt::Write. Since there is no reason to hide this implementation from an external user, it is part of the public API. The presence of an additional API with io::Write is not prohibited and I will be happy to accept PR, which will add it. I am unlikely to work in this direction, since I do not need to write XML.

Also what about the internal inconsistency with from_reader?

Difference from reader side in two points:

  • there are no trait similar to fmt::Write for reader part (but probably we'll switch to an internal reader that will follow that concept in ergonomics & encodings #158)
  • XML encoding can be detected during reading

@dtolnay
Copy link

dtolnay commented Jan 4, 2023

I think io::Write is the better API for to_writer. Pretty near 100% of users of a to_writer API are going to be using it with types that implement io::Write and do not implement fmt::Write. Things like File, BufWriter, BytesMut, Stdout, etc. Forcing near 100% of callers to use a fmt/io adapter would be a worse API regardless of the adapter being zero cost.

@zeenix
Copy link
Contributor

zeenix commented Jan 4, 2023

Is there an issue on serde_json about this?

I don't known. While @dtolnay can be great in other projects, serde project is not in him focus nowadays and it is effectively frozen as you can see from numerous problems in serde bugtracker those does not get any attention from maintainers.

Yeah but both these traits existed when serde was developed so I'm not sure the current state of serde projects is very relevant here.

The presence of an additional API with io::Write is not prohibited and I will be happy to accept PR, which will add it.

If this API could be zero-cost, I wouldn't mind at all and can look into contributing it. However, looking at the implementations in std, I don't see how that would work w/o some external crate or new fmt::Write impls. 🤔

I am unlikely to work in this direction, since I do not need to write XML.

That's fair and hope you can agree that it'd be nice not to ditch functionality that could be essential for users.

XML encoding can be detected during reading

Not sure if this difference is relevant? In case of writing, there is no need for detection.

@Mingun
Copy link
Collaborator

Mingun commented Jan 4, 2023

Forcing near 100% of callers to use a fmt/io adapter would be a worse API regardless of the adapter being zero cost.

That adapter could represent essential entity of XML world -- an XML document, which is a container for XML declaration, DTD, used namespaces. Serde part writes only XML fragments which is not a documents, so writing they to file / network directly could be error-prone for reader (if reader expects standalone XML). This is the case where the explicit is better, than implicit.

Yeah but both these traits existed when serde was developed so I'm not sure the current state of serde projects is very relevant here.

It was the answer to why serde_json shouldn't be seen as the truth of last resort, and why the missing issue in their bug tracker means nothing.

I don't see how that would work w/o some external crate or new fmt::Write impls.

The simplest adapter is only 10 lines:
https://github.com/tafia/quick-xml/pull/508/files#diff-7d5c45da93a71cc9d454396255360338e2476d783dcd17e9b57a09c4b36aa182R368-R379

struct ToFmtWrite<T>(pub T);

impl<T> std::fmt::Write for ToFmtWrite<T>
where
    T: std::io::Write,
{
    fn write_str(&mut self, s: &str) -> std::fmt::Result {
        self.0.write_all(s.as_bytes()).map_err(|_| std::fmt::Error)
    }
}

Not sure if this difference is relevant? In case of writing, there is no need for detection.

When you write, you himself command which encoding to use (= which bytes to write when you write text string). When you read, XML could contain used encoding inside XML itself (so you need to read XML in some encoding to determine which encoding is used...)

@dralley
Copy link
Collaborator

dralley commented Jan 4, 2023

It was the answer to why serde_json shouldn't be seen as the truth of last resort, and why the missing issue in their bug tracker means nothing.

Nonetheless it would be a good idea to file an issue with serde

@zeenix
Copy link
Contributor

zeenix commented Jan 4, 2023

Serde part writes only XML fragments which is not a documents, so writing they to file / network directly could be error-prone for reader (is reader expects standalone XML). This is the case where the explicit is better, than implicit.

Right, that's a good point.

The simplest adapter is only 10 lines:

Ah right, thanks. I was thinking the other way around.

@zeenix
Copy link
Contributor

zeenix commented Jan 4, 2023

Serde part writes only XML fragments which is not a documents

BTW, is this a new behavior? I'm asking cause my testcase now fails on reading a sample xml when i upgrade with

Error: QuickXml(Custom("missing field `type`"))

@Mingun
Copy link
Collaborator

Mingun commented Jan 4, 2023

This is because you need to name attributes in your structs starting with a @ character. The code under provided link is probably already was changed, because I do not see the field type in your Node struct, but I see that field name is not renamed to @name.

@iwinux
Copy link

iwinux commented Aug 23, 2024

After reading all the comments above, I still haven't figure how to write XML to std::fs::File with to_writer (without holding the complete XML output in memory first) :(

@Mingun
Copy link
Collaborator

Mingun commented Aug 23, 2024

For UTF-8 case you can use:

use quick_xml::se::to_writer;
use serde::Serialize;

struct ToFmtWrite<T>(pub T);

impl<T> std::fmt::Write for ToFmtWrite<T>
where
    T: std::io::Write,
{
    fn write_str(&mut self, s: &str) -> std::fmt::Result {
        self.0.write_all(s.as_bytes()).map_err(|_| std::fmt::Error)
    }
}
#[derive(Serialize)]
struct Xml {
    #[serde(rename = "@attribute")]
    attribute: &'static str,
    element: &'static str,
}
let data = Xml {
    attribute: "attribute",
    element: "element",
};
let mut io = ToFmtWrite(std::io::stdout());
to_writer(&mut io, &data).unwrap();

@pronebird
Copy link
Contributor

pronebird commented Dec 29, 2024

I really feel like we're just copying and pasting a part of quick-xml in our codebases. UTF-8 is nearly everywhere, so lacking a layman facility is unacceptable.

Given that ToFmtWrite is already a part of quick-xml, maybe there should be a helper method accepting std::io::Writer, something along the lines of to_utf8_writer().

Serializer can probably offer an alternative constructor accepting io::Writer and auto-wrapping it into ToFmtWrite -- I explored this but that seems rather tidious to implement because of generic constraint on Serializer

Does that make sense?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question serde Issues related to mapping from Rust types to XML
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants