Flattening Structs #315

Open
davfsa opened this issue Feb 10, 2023 · 11 comments

Comments

@davfsa
Contributor

davfsa commented Feb 10, 2023

Description

This is more of a question than a feature request, but could turn into one.

One of my use cases involves deserialising something similar to:

{
    "version": 1,
    "data": {
        "options": [{}, {}]
    }
}

into

class Option(msgspec.Struct):
    ...

class Foo(msgspec.Struct):
    version: int
    options: list[Option]

I have scoured the documentation and can't find an easy way to do this. My current workaround is to deserialise the JSON to a dict and then parse that dict into objects (using attrs), but I would like to move away from it to reduce the amount of code to maintain (the reason I have been looking at msgspec, apart from the obvious speed gains!)

Thanks!

@luochen1990

Is msgspec.json.decode(msg, type=Foo) what you want?

@davfsa
Contributor Author

davfsa commented Feb 13, 2023

Is msgspec.json.decode(msg, type=Foo) what you want?

Yeah, it would be nice to be able to call msgspec.json.decode(msg, type=Foo) and have it be aware that options can be found inside the data field and extracted from there.

@jcrist
Owner

jcrist commented Feb 14, 2023

Hi! Support for flattening structs would be hard. It's doable, but not easily - there's a bunch of edge cases that can pop up as features are mixed together. I'd be happy to write up what makes this hard if you're interested, but in short I don't have plans to add this feature.

That said, I'm curious about your use case. Why do you want to flatten the runtime structure here? Why not write out the full structure of Foo matching how it's serialized?

In [8]: class Option(msgspec.Struct):
   ...:     x: int  # made up some fields for here
   ...:

In [9]: class Data(msgspec.Struct):
   ...:     options: list[Option]
   ...:

In [10]: class Foo(msgspec.Struct):
    ...:     version: int
    ...:     data: Data                        
    ...:

In [11]: msg = """                             
    ...: {                                     
    ...:     "version": 1,
    ...:     "data": {                         
    ...:         "options": [{"x": 1}, {"x": 2}]
    ...:     }                                 
    ...: }                                     
    ...: """                                   

In [12]: msgspec.json.decode(msg, type=Foo)
Out[12]: Foo(version=1, data=Data(options=[Option(x=1), Option(x=2)]))

@davfsa
Contributor Author

davfsa commented Feb 14, 2023

Thanks for the answer!

The reason for this is mostly an opinionated approach to an API wrapper I am working on. The data field for this payload feels a bit clunky and useless, as it doesn't really contain much and just makes things harder to access, especially since Options contain more Options:

obj.data.options.data.options
# vs
obj.options.options

It was a choice we went with when implementing this part of the API for simplicity's sake.

When I opened the issue my idea for this was something along the lines of:

class Foo(msgspec.Struct):
    version: int
    option: Option = msgspec.field(location="data__option")

A small side effect here is that it would also provide a syntax for renaming attributes.


Here is a quick dump of the details, since this idea has been coming and going in my head. The syntax would go something like this:

  • data__option would signify option=payload["data"]["option"]
  • data[0] would be option = payload["data"][0]
  • (And a combination of both) data[0]__option would be option = payload["data"][0]["option"]

I believe this should cover all use cases.
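To make the proposed syntax concrete, here is a minimal plain-Python sketch of how such location strings could be parsed and resolved against an already-decoded payload. This is not a msgspec feature: `parse_location` and `resolve` are hypothetical helper names, and the sketch assumes keys never contain `__`, `[` or `]` themselves.

```python
import re

# One step inside a "__"-separated part: a bracketed integer index or a plain key.
_TOKEN = re.compile(r"\[(\d+)\]|([^\[\]]+)")


def parse_location(location: str) -> list:
    """Split e.g. 'data[0]__option' into ['data', 0, 'option']."""
    steps = []
    for part in location.split("__"):
        for index, key in _TOKEN.findall(part):
            steps.append(int(index) if index else key)
    return steps


def resolve(payload, location: str):
    """Follow the parsed steps through nested dicts and lists."""
    current = payload
    for step in parse_location(location):
        current = current[step]  # KeyError/IndexError if the path is absent
    return current


payload = {"data": [{"option": {"x": 1}}]}
print(parse_location("data[0]__option"))    # ['data', 0, 'option']
print(resolve(payload, "data[0]__option"))  # {'x': 1}
```

The sketch deliberately ignores keys that contain the separator; that ambiguity is the tricky case discussed next.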

A tricky case I also thought about would be:

{
    "data": {
        "option": {}
    },
    "data__": {
        "option": {}
    }
}

class Obj(msgspec.Struct):
    data: Data
    data__: MoreData
    data_option: Option = msgspec.field(location="data__option")
    more_data_option: Option = msgspec.field(location="data____option")
    # or (which would be equivalent)
    some_data: Data = msgspec.field(location="data")
    some_more_data: Data = msgspec.field(location="data__")
    data_option: Option = msgspec.field(location="data__option")
    more_data_option: Option = msgspec.field(location="data____option")

In this case, the data fields will resolve properly: whether a location is treated as a plain key or as a flattening path is decided by whether that key exists in the payload, with an existing key taking priority.
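One possible reading of that rule, sketched in plain Python on a decoded dict: an exact key match wins, and otherwise every `__` is tried as a nesting separator from left to right. The `resolve` helper is hypothetical and only illustrates the lookup order; it is not how msgspec works today.

```python
def resolve(payload: dict, location: str):
    # Rule 1: a key that exists literally always takes priority.
    if location in payload:
        return payload[location]
    # Rule 2: otherwise try each "__" occurrence as a nesting separator.
    start = 0
    while (i := location.find("__", start)) != -1:
        prefix, rest = location[:i], location[i + 2:]
        child = payload.get(prefix)
        if rest and isinstance(child, dict):
            try:
                return resolve(child, rest)
            except KeyError:
                pass  # this split led nowhere; try the next position
        start = i + 1
    raise KeyError(location)


payload = {
    "data": {"option": {"a": 1}},
    "data__": {"option": {"b": 2}},
}
print(resolve(payload, "data__"))          # {'option': {'b': 2}}
print(resolve(payload, "data__option"))    # {'a': 1}
print(resolve(payload, "data____option"))  # {'b': 2}
```

Trying every split position costs extra lookups on a miss, which hints at why mixing this with other features could get complicated.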

For extreme cases that I don't believe can really be found in the wild, an extra argument could be added to force a location to be treated as a flattener.


I understand this could be a lot more work than is actually useful, but I just wanted to dump the idea. I unfortunately don't have the C skills to try and implement this myself, but would love to try.

I'm also interested in the limitations that you mentioned, as they might render my whole idea useless, since I lack information on the internals of msgspec 😅

@Rogdham

Rogdham commented Mar 25, 2023

The rename mechanism could probably be used for this (from the point of view of the user of the lib), something like this:

class Option(msgspec.Struct):
    x: int

foo_names = {
    "options": ["data", "options"],  # for example, TBD
}

class Foo(msgspec.Struct, rename=foo_names):
    version: int
    options: list[Option]

AFAIK Pydantic will support flattening in V2:

class Foo(BaseModel):
    bar: str = Field(aliases=[['baz', 2, 'qux']])

They have probably thought about edge cases, so it might be worth looking into as a good starting point.
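For illustration, the path-list idea (a field name mapped to a list of keys and indices) can be emulated after decoding with the standard library alone. This is only a sketch: `FOO_PATHS` and `build` are hypothetical names, a dataclass stands in for the Struct, and no type validation is performed.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Foo:
    version: int
    options: list


# Hypothetical mapping: flat field name -> path of keys/indices in the payload.
FOO_PATHS = {"options": ["data", "options"]}


def build(cls, payload, paths):
    """Fill each field from its mapped path, or from the top-level key of the same name."""
    kwargs = {}
    for f in fields(cls):
        value = payload
        for step in paths.get(f.name, [f.name]):
            value = value[step]
        kwargs[f.name] = value
    return cls(**kwargs)


msg = '{"version": 1, "data": {"options": [{}, {}]}}'
print(build(Foo, json.loads(msg), FOO_PATHS))  # Foo(version=1, options=[{}, {}])
```

Because the path steps are used directly as subscripts, integer steps such as the `2` in `['baz', 2, 'qux']` index into lists without extra handling.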

@ml31415

ml31415 commented Jun 1, 2023

@davfsa If it's only about making the parsed objects more usable, what about simply:

class Foo(msgspec.Struct):
    version: int
    data: ...
    
    @property
    def options(self):
        return self.data.options

You might even hide the original data field, having it renamed to e.g. _data.

@mjkanji

mjkanji commented Aug 18, 2023

I have a similar use case. This is what the data looks like:

{
    "username": "jcrist",
    "attributes": [
        {"Name": "first_name", "Value": "Jim"},
        {"Name": "last_name", "Value": "Crist"},
        ...
    ]
    ...
}

I'd like to model it such that the attribute keys (like first_name) and the corresponding Values are attributes of the Struct and also type validated. That is,

class User(Struct):
    username: str
    first_name: str
    last_name: str

msgspec.json.decode(data, type=User)
# > User(username='jcrist', first_name='Jim', last_name='Crist')

Even if I created a new Attribute struct and set attributes: list[Attribute], there's no (obvious) way to validate the type of the Value based on what the Name is.
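As a point of comparison, the pivot itself is easy to do by hand after decoding. The sketch below uses only the standard library (a dataclass stands in for the Struct, and `decode_user` is a hypothetical helper); the manual isinstance check only works for simple, non-generic field types.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class User:
    username: str
    first_name: str
    last_name: str


def decode_user(raw: str) -> User:
    payload = json.loads(raw)
    # Pivot [{"Name": k, "Value": v}, ...] into {k: v, ...}.
    flat = {item["Name"]: item["Value"] for item in payload.get("attributes", [])}
    flat["username"] = payload["username"]
    kwargs = {}
    for f in fields(User):
        value = flat[f.name]  # KeyError if an expected attribute is missing
        if not isinstance(value, f.type):
            raise TypeError(
                f"{f.name}: expected {f.type.__name__}, got {type(value).__name__}"
            )
        kwargs[f.name] = value
    return User(**kwargs)


raw = """
{
    "username": "jcrist",
    "attributes": [
        {"Name": "first_name", "Value": "Jim"},
        {"Name": "last_name", "Value": "Crist"}
    ]
}
"""
print(decode_user(raw))
# User(username='jcrist', first_name='Jim', last_name='Crist')
```

Attributes that do not correspond to a field are silently ignored here; whether that is desirable depends on the API being wrapped.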

(PS: Not sure if this is the right issue to ask this; it seemed very similar to mine, but also slightly different because there's a level of...indirection(?), where the relevant key-value pairs are 'hidden' under the Name and Value keys of the list of dicts. Let me know if I should create a new issue instead.)


For reference, I found a solution to a similar problem using Pydantic's @root_validator(pre=True) decorator. [Stack Overflow comment, example code]

@cutecutecat

I also ran into a similar case. I think this kind of data schema shows up frequently in GraphQL APIs.

{
  "data":{
    "issues":{
      "nodes":[
        {
          "id":"12345"
        },
        {
          "id":"67890"
        }
      ]
    }
  }
}

Thanks to @ml31415, #315 (comment) helps a lot, but I still need to define 4 one-line structs to express it. I would be really grateful if there were native support.

@ml31415

ml31415 commented Nov 30, 2023

@mjkanji

What you could do is create tagged attribute objects. Then msgspec can distinguish them and you can add some verification.

class Attribute(msgspec.Struct, tag_field="Name"):
    pass

class Firstname(Attribute, tag="first_name"):
    Value: str  # add validation for first_name here as required

class Lastname(Attribute, tag="last_name"):
    Value: str  # separate validation for last_name goes here

# Rebind the name to the tagged union so list[Attribute] accepts either subclass
Attribute = Firstname | Lastname

class User(msgspec.Struct):
    username: str
    attributes: list[Attribute]

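Otherwise, if it's just about making the object easier to access, you can again leave the data unmodified and expose the values through attribute access, much like the property approach above (Attribute is assumed here to be a plain struct with Name and Value fields). Roughly like this: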

class User(msgspec.Struct):
    username: str
    attributes: list[Attribute]

    def _attribute_dict(self):
        return {attr.Name.lower(): attr.Value for attr in self.attributes}
        
    def __getattr__(self, attr):
        try:
            return self._attribute_dict()[attr]
        except KeyError:
            raise AttributeError(attr)

@ml31415

ml31415 commented Nov 30, 2023

Hi @cutecutecat,
if you don't care about any further fields of "data" and "issues", just go with ordinary dictionaries and happily nest the type definition:

from typing import Literal

class Node(msgspec.Struct):
    id: int
    
class Container(msgspec.Struct):
    data: dict[Literal["issues"], dict[Literal["nodes"], list[Node]]]

    @property
    def nodes(self):
        return self.data["issues"]["nodes"]

>>> container = msgspec.json.decode(data, type=Container, strict=False)
>>> container.nodes
[Node(id=12345), Node(id=67890)]

@notpushkin

notpushkin commented Sep 2, 2024

I'm currently working on a Docker API client and flattening would be really useful.

For example, we have a struct like this:

class ServiceSpec(Struct):
    name: str
    labels: dict[str, str]
    image: str
    environment: list[str]

And Docker expects something like this:

{
  "Name": "web",
  "Labels": {"com.docker.example": "string"},
  "TaskTemplate": {
    "ContainerSpec": {
      "Image": "nginx:alpine",
      "Env": ["SECRET_KEY=123"]
    }
  }
}

To achieve this, I currently use the following hack:

class DockerContainerSpec(Struct):
    image: str = field(name="Image")
    environment: list[str] = field(name="Env")

    @classmethod
    def from_spec(cls, spec: ServiceSpec):
        obj = msgspec.convert(spec, cls, from_attributes=True)
        return obj

class DockerTaskTemplate(Struct):
    _container_spec: DockerContainerSpec = field(default=None, name="ContainerSpec")

    @classmethod
    def from_spec(cls, spec: ServiceSpec):
        obj = msgspec.convert(spec, cls, from_attributes=True)
        obj._container_spec = DockerContainerSpec.from_spec(spec)
        return obj

class DockerService(Struct):
    name: str = field(name="Name")
    labels: dict[str, str] = field(name="Labels")
    _task_template: DockerTaskTemplate = field(default=None, name="TaskTemplate")

    @classmethod
    def from_spec(cls, spec: ServiceSpec):
        obj = msgspec.convert(spec, cls, from_attributes=True)
        obj._task_template = DockerTaskTemplate.from_spec(spec)
        return obj

This is a bit clumsy, but works out fairly well:

>>> spec = ServiceSpec(
...     name="app",
...     labels={},
...     image="nginx:alpine",
...     environment=["HELLO=world"]
... )
>>> msgspec.json.encode(DockerService.from_spec(spec))
b'{"Name":"app","Labels":{},"TaskTemplate":{"ContainerSpec":{"Image":"nginx:alpine","Env":["HELLO=world"]}}}'

UPD: this can be refactored as a wrapper for msgspec.convert: https://gist.github.com/notpushkin/3639f45acd2aa053b9d2416375135045
(see example at the bottom)
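For comparison, the nesting step can also be driven by a table of paths rather than one class per level. The sketch below is dependency-free and purely illustrative: `DOCKER_PATHS` and `to_docker_payload` are hypothetical names, a dataclass stands in for the Struct, and it is not the code from the gist.

```python
import json
from dataclasses import dataclass


@dataclass
class ServiceSpec:
    name: str
    labels: dict
    image: str
    environment: list


# Hypothetical mapping: flat attribute -> path of keys in the Docker payload.
DOCKER_PATHS = {
    "name": ("Name",),
    "labels": ("Labels",),
    "image": ("TaskTemplate", "ContainerSpec", "Image"),
    "environment": ("TaskTemplate", "ContainerSpec", "Env"),
}


def to_docker_payload(spec: ServiceSpec) -> dict:
    """Place each flat attribute at its nested location, creating dicts as needed."""
    out = {}
    for attr, path in DOCKER_PATHS.items():
        node = out
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = getattr(spec, attr)
    return out


spec = ServiceSpec(
    name="app",
    labels={},
    image="nginx:alpine",
    environment=["HELLO=world"],
)
print(json.dumps(to_docker_payload(spec), separators=(",", ":")))
# {"Name":"app","Labels":{},"TaskTemplate":{"ContainerSpec":{"Image":"nginx:alpine","Env":["HELLO=world"]}}}
```

The trade-off is that the intermediate levels are untyped dicts, whereas the per-level Struct classes above keep every layer typed.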
