Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deserializing untagged enums is slow #2101

Closed
epage opened this issue Oct 14, 2021 · 9 comments
Closed

Deserializing untagged enums is slow #2101

epage opened this issue Oct 14, 2021 · 9 comments

Comments

@epage
Copy link

epage commented Oct 14, 2021

When profiling toml_edit, a hand-written Deserialize for an untagged enum took a benchmark for a common case in cargo from 484us to 181us (for a single Cargo.toml file, using this in the dependency resolving would multiple that by hundreds).

Root Cause

The derive's untagged enum implementation tries each variant, one after another.

#[automatically_derived]
impl <'de> _serde::Deserialize<'de> for Value {
    fn deserialize<__D>(__deserializer: __D)
     -> _serde::__private::Result<Self, __D::Error> where
     __D: _serde::Deserializer<'de> {
        let __content =
            match <_serde::__private::de::Content as
                      _serde::Deserialize>::deserialize(__deserializer)
                {
                _serde::__private::Ok(__val) => __val,
                _serde::__private::Err(__err) => {
                    return _serde::__private::Err(__err);
                }
            };
        if let _serde::__private::Ok(__ok) =
               _serde::__private::Result::map(<i64 as
                                                  _serde::Deserialize>::deserialize(_serde::__private::de::ContentRefDeserializer::<__D::Error>::new(&__content)),
                                              Value::Integer)
           {
            return _serde::__private::Ok(__ok);
        }
        if let _serde::__private::Ok(__ok) =
               _serde::__private::Result::map(<f64 as
                                                  _serde::Deserialize>::deserialize(_serde::__private::de::ContentRefDeserializer::<__D::Error>::new(&__content)),
                                              Value::Float) {
            return _serde::__private::Ok(__ok);
        }
	// ...
        if let _serde::__private::Ok(__ok) =
               _serde::__private::Result::map(<Table as
                                                  _serde::Deserialize>::deserialize(_serde::__private::de::ContentRefDeserializer::<__D::Error>::new(&__content)),
                                              Value::Table) {
            return _serde::__private::Ok(__ok);
        }
        _serde::__private::Err(_serde::de::Error::custom("data did not match any variant of untagged enum Value"))
    }
}

For each failure, we generate an error which allocates its message and is adverised as a #[cold]

#[cold]
fn invalid_type(unexp: Unexpected, exp: &Expected) -> Self {
    Error::custom(format_args!("invalid type: {}, expected {}", unexp, exp))
}

Potential Solutions

  • Add a new attribute for suggesting a type to dispatch off of. This can be brittle as it requires knowing the Deserialize implementation of each of the types within the untagged enum.
  • Find a backwards-compatible way of checking each variant without a call to invalid_type
  • Find a backwards-compatible way of not allocating in invalid_type
  • Consider making this error not be #[cold] (I'm assuming this will make little difference)
@Kixunil
Copy link
Contributor

Kixunil commented Oct 18, 2021

Another solution is to not use untagged enum and implement deserialization manually. However I wonder what is the exact use case of untagged enums in cargo?

@epage
Copy link
Author

epage commented Oct 18, 2021

This was caught in my benchmarks for general APIs reading Cargo.toml files (think serde_json::Value)

For a general API, you also have to deal with values being one of several types, for example

key_to_string = "value"
key_to_num = 200

As for cargo, Cargo Manifests allow short-circuiting an entire Table with a single value (e.g. dependencies table). I do not recall whether cargo uses hand-written deserialization for this.

And yes, another solution is the "do nothing". The challenge with that is this is surprisingly, embarrassingly slow and doesn't meet with the general expectation for serde's performance.

@Kixunil
Copy link
Contributor

Kixunil commented Oct 19, 2021

Ah, OK, I was under the impression that optimizing for cargo was your main goal and seemed easier to modify its deserialization.

@dtolnay
Copy link
Member

dtolnay commented Jan 23, 2022

This previously came up in #748. My preference is that this be explored in a different crate with a Deserialize derive macro geared toward producing high-performance deserialization of untagged enums. The slow part isn't in serde but in the derive-generated code, so a different derive could have different performance characteristics while still using only the same traits, although would likely take longer to compile than serde_derive.

@epage
Copy link
Author

epage commented Jan 23, 2022

Should we at least have some note in the documentation for untagged?

@epage
Copy link
Author

epage commented Aug 31, 2023

As an update on this, serde-untagged was recently created to make hand-implementing the derive easier. In rust-lang/cargo#12574, I forgot about this conversation and brought up the idea of this being supported natively in serde_derive

dtolnay said

Yeah we could potentially add derive support by having the call site tag the type of each of the variants.

pub enum SslVersionConfig {
    #[serde(string)]
    Single(String),
    #[serde(map)]
    Range(SslVersionConfigRange),
}

@dtolnay are you thinking that this might be re-opened or do you still lean on this being experimented on outside of serde_derive first?

@dtolnay
Copy link
Member

dtolnay commented Aug 31, 2023

Currently there is a moratorium on new attributes to keep serde_derive compile time under control.

@epage
Copy link
Author

epage commented Aug 31, 2023

Would a PR for the untagged documentation that discusses the trade offs and calls out serde_untagged be of interest?

@dtolnay
Copy link
Member

dtolnay commented Sep 2, 2023

Yes -- thanks! I am not sure what to say about the serde-untagged yet since I am rewriting erased-serde at the moment; after that I will have a better idea of what the guidance should be.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

3 participants