Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggestion(Seq): Map to Vector of structs #565

Closed
ClSlaid opened this issue Mar 6, 2023 · 5 comments · Fixed by #566
Closed

Suggestion(Seq): Map to Vector of structs #565

ClSlaid opened this issue Mar 6, 2023 · 5 comments · Fixed by #566

Comments

@ClSlaid
Copy link

ClSlaid commented Mar 6, 2023

For structs like:

struct TupSt<K,V>(K,V);
struct OdnSt<K,V>{
    custom_key: K,
    v: V,
    // #[serde(skip)]
    // serde_skipped_fields
}

And implemented From<(K,V)> and Into<(K,V)>, allow them to be used inner type of Seq.

Expected behaviour

#[serde_as]
pub struct Data {
    #[serde_as(as = "Seq<TupSt<_,_>>")]
    tuple_map: BTreeMap<i32,i32>,
    #[serde_as(as = "Seq<OdnSt<_,_>>")]
    ordinary_map: BTreeMap<i32,i32>,
}

Can be serialized into (or deserialized from):

{
    "tuple_map": [
        {"key": 1, "value": 1},
        {"key": 2, "value": 2},
        {"key": 3, "value": 3},
        {"key": 4, "value": 4}
    ],
    "ordinary_map": [
         {"custom_key": 1, "v": 1},
         {"custom_key": 2, "v": 2},
         {"custom_key": 3, "v": 3},
         {"custom_key": 4, "v": 4},
    ]
}

Motivation

I've been trying to implement reading from Iceberg's AVRO files, and using apache-avro Serde API to read Iceberg encoded Maps will produce an array of small, key-value with naming tuples.

@jonasbb
Copy link
Owner

jonasbb commented Mar 6, 2023

From/Into is not the correct interface here. During serialization you only have references, so this might lead to cloning. The way you wrote the attributes you would get a TupSt<Same, Same>, which is not really a useful struct to have.

The SerializeAs/DeserializeAs traits do cover that use case. With them, this can work (It doesn't currently, because Seq is not flexible enough). It does make quite heavy use of the generics.

#[derive(serde::Serialize, serde::Deserialize)]
struct Custom<K, V> {
    custom_key: K,
    v: V,
}

impl<K, KAs, V, VAs> SerializeAs<(K, V)> for Custom<KAs, VAs>
where
    KAs: SerializeAs<K>,
    VAs: SerializeAs<V>,
{
    fn serialize_as<S>((k, v): &(K, V), serializer: S) -> Result<S::Ok, S::Error>
    where
        S: serde::Serializer,
    {
        (Custom {
            custom_key: SerializeAsWrap::<K, KAs>::new(k),
            v: SerializeAsWrap::<V, VAs>::new(v),
        })
        .serialize(serializer)
    }
}

impl<'de, K, KAs, V, VAs> DeserializeAs<'de, (K, V)> for Custom<KAs, VAs>
where
    KAs: DeserializeAs<'de, K>,
    VAs: DeserializeAs<'de, V>,
{
    fn deserialize_as<D>(deserializer: D) -> Result<(K, V), D::Error>
    where
        D: serde::Deserializer<'de>,
    {
        let c = <Custom<DeserializeAsWrap<K, KAs>, DeserializeAsWrap<V, VAs>>>::deserialize(
            deserializer,
        )?;
        Ok((c.custom_key.into_inner(), c.v.into_inner()))
    }
}

#[serde_as]
#[derive(Debug, Serialize, Deserialize, PartialEq)]
struct SM(#[serde_as(as = "Seq<Custom<_, _>>")] BTreeMap<u32, IpAddr>);

let map: BTreeMap<_, _> = vec![(1, ip), (10, ip), (200, ip2)].into_iter().collect();
is_equal(
    SM(map),
    expect![[r#"
        [
          {
            "custom_key": 1,
            "v": "1.2.3.4"
          },
          {
            "custom_key": 10,
            "v": "1.2.3.4"
          },
          {
            "custom_key": 200,
            "v": "255.255.255.255"
          }
        ]"#]],
);

Is that a good enough solution?

@ClSlaid
Copy link
Author

ClSlaid commented Mar 7, 2023

Sorry this doesn't seem works, I got error:

error[E0277]: the trait bound `serde_with::Seq<KvProxy<Same, Same>>: SerializeAs<_>` is not satisfied
   --> src/model/manifest.rs:245:24
    |
244 | / #[serde_as]
245 | | #[derive(Debug, Clone, Serialize, Deserialize)]
    | |                        ^^^^^^^^-
    | |________________________|_______|
    |                          |       required by a bound introduced by this call
    |                          the trait `SerializeAs<_>` is not implemented for `serde_with::Seq<KvProxy<Same, Same>>`
    |
    = help: the following other types implement trait `SerializeAs<T>`:
              <serde_with::Seq<(KAs, VAs)> as SerializeAs<BTreeMap<K, V>>>
              <serde_with::Seq<(KAs, VAs)> as SerializeAs<HashMap<K, V>>>
    = note: required for `DefaultOnNull<serde_with::Seq<KvProxy<Same, Same>>>` to implement `SerializeAs<_>`
note: required by a bound in `serde_with::ser::<impl As<T>>::serialize`
   --> /home/cl/.cargo/registry/src/github.com-1ecc6299db9ec823/serde_with-2.2.0/src/ser/mod.rs:166:12
    |
166 |         T: SerializeAs<I>,
    |            ^^^^^^^^^^^^^^ required by this bound in `serde_with::ser::<impl As<T>>::serialize`
    = note: this error originates in the derive macro `Serialize` (in Nightly builds, run with -Z macro-backtrace for more info)

Looks like SerializeAs<BTreeMap<K,V>, Seq<KvProxy<KAs, VAs>>> cannot be deduced.

--- code ---

#[derive(Clone, Debug, Serialize, Deserialize)]
/// helper struct to deserialize a custom named struct into K-V pairs in Maps
struct KvProxy<KAs, VAs> {
    /// proxy typed key
    key: KAs,
    /// proxy typed value
    value: VAs,
}

impl<K, V, KAs, VAs> SerializeAs<(K, V)> for KvProxy<KAs, VAs>
where
    KAs: SerializeAs<K>,
    VAs: SerializeAs<V>,
{
    fn serialize_as<S>((k, v): &(K, V), serializer: S) -> Result<S::Ok, S::Error>
    where
        S: serde::Serializer,
    {
        let proxy = KvProxy {
            key: SerializeAsWrap::<K, KAs>::new(k),
            value: SerializeAsWrap::<V, VAs>::new(v),
        };
        proxy.serialize(serializer)
    }
}

impl<'de, K, V, KAs, VAs> DeserializeAs<'de, (K, V)> for KvProxy<KAs, VAs>
where
    KAs: DeserializeAs<'de, K>,
    VAs: DeserializeAs<'de, V>,
{
    fn deserialize_as<D>(deserializer: D) -> Result<(K, V), D::Error>
    where
        D: Deserializer<'de>,
    {
        let proxy = KvProxy::<DeserializeAsWrap<K, KAs>, DeserializeAsWrap<V, VAs>>::deserialize(
            deserializer,
        )?;
        Ok((proxy.key.into_inner(), proxy.value.into_inner()))
    }
}

//...

struct DataFile {
// other fields
    #[serde(default)]
    #[serde_as(as = "DefaultOnNull<Seq<KvProxy<_,_>>>")]
    column_sizes: BTreeMap<i32, i64>,
}

@jonasbb
Copy link
Owner

jonasbb commented Mar 7, 2023

This still requires changes to Seq for it to work. I said as much in my last message

With them, this can work (It doesn't currently, because Seq is not flexible enough).

@ClSlaid
Copy link
Author

ClSlaid commented Mar 7, 2023

This still requires changes to Seq for it to work. I said as much in my last message

With them, this can work (It doesn't currently, because Seq is not flexible enough).

Sorry, my mistake.

bors bot added a commit that referenced this issue Mar 7, 2023
566: Generalize the trait bounds of `Seq` r=jonasbb a=jonasbb

Generalize the trait bounds of `Seq` to allow for more flexible. Instead of forcing the inner type to be a tuple, take anything convertible to/from a tuple. This can be expressed using the

Closes #565

Co-authored-by: Jonas Bushart <[email protected]>
@bors bors bot closed this as completed in 361f0b1 Mar 7, 2023
@jonasbb
Copy link
Owner

jonasbb commented Mar 10, 2023

This doesn't actually work. You get this error message for this code snippet. Unfortunately, it wasn't caught with the tests.

#[serde_as(as = "Seq<(_, Vec<_>)>")]
BTreeMap<usize, Vec<usize>>,
error[E0277]: the trait bound `serde_with::Seq<(serde_with::Same, std::vec::Vec<serde_with::Same>)>: serde_with::SerializeAs<_>` is not satisfied

The problem is that during serialization you never get the key or value directly, you only get references to them. That is ok for the SerializeAs implementation of Seq where the two types are used separately. But you cannot make a SerializeAs which works on a (K, V). At best you can do a (&K, &V) but that doesn't really work well with SerializeAs. The SerializeAs trait unfortunately cannot handle different amounts of references. The type and the type pattern always must match. For the snippet above: After the PR #566 it tried to match the Vec of the type with a &Vec in the type pattern (the thing after "as = "). So there is a mismatch in the number of references and it no longer compiles.


Ultimately, this probably means it cannot be implemented properly. As mentioned before, From/Into does not seem like the correct traits here either, since they would require cloning too.
This means if you want something like Seq which supports other types, you will need to create your own type. The trait implementations are not that complicated.

bors bot added a commit that referenced this issue Mar 10, 2023
573: Add regression test for problems with #565/#566 r=jonasbb a=jonasbb

#572

#565 (comment)

bors r+

Co-authored-by: Jonas Bushart <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants