Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spanned values don't play well with dotted notation: getting "invalid type: string [..], expected a borrowed string" #798

Open
yannham opened this issue Oct 2, 2024 · 6 comments
Labels
A-serde Area: Serde integration C-enhancement Category: Raise on the bar on expectations

Comments

@yannham
Copy link

yannham commented Oct 2, 2024

I'm using Spanned to deserialize TOML to Nickel (a configuration language) while preserving spans as much as possible, as Nickel adds validation capabilities and we'd like to link back validation errors to the precise piece of TOML data that failed.

To do so, we define a bespoke datastructure that is more or less like a TOML value but with Spanned appropriately sprinkled, and write a custom deserializer using serde_untagged. You can find the type definition and the deserializer here: https://github.com/tweag/nickel/blob/927ee23993747b7851e51bcfe3eb3e685ba4ebb1/core/src/serialize.rs#L491-L582

However, when deserializing the following file:

[foo.bar]
baz = "qux"

This gives the following surprising error:

error: toml parse error: TOML parse error at line 1, column 1
  |
1 | [foo.bar]
  | ^
invalid type: string "bar", expected a borrowed string



in `foo`

It's surprising because we never try to deserialize a borrowed string: all strings, both as terminal values and keys, are owned in SpannedValue (NickelString is a simple wrapper around String). Also, any TOML file without dotted notation is parsed fine. After some experimentation, it seems that this happens when trying to deserialize the value (and not the key) of the outer map, that is the value associated to foo.

I suspect that there are some shenanigans around getting the location of the nested map {bar = {baz = "qux"}}. It seems that the spanned deserializer of toml-rs tries to deserialize markers as borrowed string (

impl<'de, T> serde::de::Deserialize<'de> for Spanned<T>
) but it also expects a very precise structure, so I'm not entirely sure what's going on here.

The issue is that I don't see any easy work-around: once we've tried to deserialize the content of a map as spanned (which is entirely legit for files that don't have the dotted notation), there doesn't seem to be anyway to retry the same deserialization at a different type.

@yannham
Copy link
Author

yannham commented Oct 2, 2024

(For the record, the explicitly nested version foo = {bar = {baz = "qux"}} parses correctly, so it really seems to be around dotted field syntax)

@epage
Copy link
Member

epage commented Oct 2, 2024

Do you have an isolated reproduction case for this?

@yannham
Copy link
Author

yannham commented Oct 2, 2024

I can try to make one, yes.

@jneem
Copy link

jneem commented Oct 21, 2024

Here's a smallish reproduction, which crashes with "invalid type: string "bar", expected a borrowed string". Removing the Spanned on line 16 makes it succeed.

#!/usr/bin/env -S cargo -Zscript
---cargo
[dependencies]
serde = { version = "1", features = ["derive"] }
serde-untagged = "0.1.6"
toml = "0.8.19"
---

use serde_untagged::UntaggedEnumVisitor;
use serde::de::{Deserializer, MapAccess};
use toml::Spanned;

#[derive(Debug)]
pub enum SpannedValue {
    String(String),
    Map(Vec<(String, Spanned<SpannedValue>)>)
}

impl<'de> serde::Deserialize<'de> for SpannedValue {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        let data = UntaggedEnumVisitor::new()
            .string(|str| Ok(SpannedValue::String(str.into())))
            .map(|mut map| {
                let mut result = Vec::new();

                while let Some((k, v)) = map.next_entry()? {
                    result.push((k, v));
                }

                Ok(SpannedValue::Map(result))
            })
            .deserialize(deserializer)?;

        Ok(data)
    }
}

const INPUT: &str = r#"
[foo.bar]
baz = "qux"
"#;

fn main() {
    let val: SpannedValue = toml::from_str(INPUT).unwrap();
    dbg!(val);
}

@epage epage added C-enhancement Category: Raise on the bar on expectations A-serde Area: Serde integration labels Oct 24, 2024
@epage
Copy link
Member

epage commented Oct 24, 2024

Thanks for the reproduction case!

We have tests for Spanned being used in arrays, keys, and values, but not in recursive data structures like this. It appears that untagged enums, whether using serde_untagged or using #[serde(untagged)] isn't supported at this time.

serde is a bit of a mess to dig into to support cases like this. I personally will likely not get to this for a bit but would be happy with any help on this.

@yannham
Copy link
Author

yannham commented Oct 30, 2024

In the meantime, if anyone has the same issue and is looking for a way out, our current work-around is to use the lower-level toml-edit crate, which works: tweag/nickel#2074.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-serde Area: Serde integration C-enhancement Category: Raise on the bar on expectations
Projects
None yet
Development

No branches or pull requests

3 participants