Extending the built-in type set with (a) tagged union(s) isn't supported? #140
Comments
I'm sorry, I'm not sure I understand this issue? What are you trying to do here?
Note that
In [1]: import msgspec

In [2]: class Point(msgspec.Struct):
   ...:     x: int
   ...:     y: int
   ...:

In [3]: msg = b'{"x": 1, "y": 2}'

In [4]: msgspec.json.decode(msg, type=Point)  # returns a Point object
Out[4]: Point(x=1, y=2)

In [5]: from typing import Union, Any

In [6]: msgspec.json.decode(msg, type=Union[Point, Any])  # returns a raw dict, same as the default below
Out[6]: {'x': 1, 'y': 2}

In [7]: msgspec.json.decode(msg)
Out[7]: {'x': 1, 'y': 2} |
@jcrist sorry if i'm not being clear. It's your first case if I'm not mistaken: msgspec.json.decode(msg, type=Point) This will not decode, for example,
[ins] In [7]: msgspec.json.Decoder(Point).decode(msgspec.json.encode((1, 2)))
---------------------------------------------------------------------------
DecodeError                               Traceback (most recent call last)
Input In [7], in <cell line: 1>()
----> 1 msgspec.json.Decoder(Point).decode(msgspec.json.encode((1, 2)))

DecodeError: Expected `object`, got `array`

However, if you do the union including
[ins] In [9]: from typing import Any

[nav] In [10]: msgspec.json.Decoder(Point | list).decode(msgspec.json.encode((1, 2)))
Out[10]: [1, 2]

hopefully that's clearer in terms of what i was trying to describe 😂 UPDATE: |
Ahh I also see what you mean now, which I didn't anticipate:
[nav] In [13]: class Point(msgspec.Struct, tag=True):
          ...:     x: int
          ...:     y: int

[nav] In [14]: msgspec.json.Decoder(Point | Any).decode(msgspec.json.encode(Point(1, 2)))
Out[14]: {'type': 'Point', 'x': 1, 'y': 2}

[nav] In [15]: msgspec.json.Decoder(Point).decode(msgspec.json.encode(Point(1, 2)))
Out[15]: Point(x=1, y=2)

That actually is super non-useful to me; i would expect the Like, does a struct always have to be the outer most decoding type? I was actually going to create another issue saying something like embedded structs don't decode (like say a |
More details of what I'm doing:
Maybe I'm having a dreadful misconception about all this 😂 |
Ahh so I see why this feels odd, it seems the limitation is really due to the use of
from typing import Union
from msgspec import Struct

class Point(Struct, tag=True):
    x: float
    arr: list[int]

msgspec.json.Decoder(
    Union[Point] | list
).decode(msgspec.json.encode(Point(1, [2])))

Works just fine, but if you try What's more odd to me is that you can support |
Any
Ahh so one way to maybe do what I'd like is to use This python code i think replicates what I thought was going to be the default behavior with tagged union structs:
from contextlib import contextmanager as cm
from typing import Union, Any, Optional

from msgspec import Struct, Raw
from msgspec.msgpack import Decoder, Encoder


class Header(Struct, tag=True):
    uid: str
    msgtype: Optional[str] = None


class Msg(Struct, tag=True):
    header: Header
    payload: Raw


class Point(Struct, tag=True):
    x: float
    y: float


_root_dec = Decoder(Msg)
_root_enc = Encoder()

# sub-decoders for retrieving embedded
# payload data and decoding to a sender
# side defined (struct) type.
_decs: dict[Optional[str], Decoder] = {
    None: Decoder(Any),
}


@cm
def init(msg_subtypes: list[list[Struct]]):
    for types in msg_subtypes:
        first = types[0]

        # register using the default tag_field of "type"
        # which seems to map to the class "name".
        tags = [first.__name__]

        # create a tagged union decoder for this type set
        type_union = Union[first]
        for typ in types[1:]:
            type_union |= typ
            tags.append(typ.__name__)

        dec = Decoder(type_union)

        # register all tags for this union sub-decoder
        for tag in tags:
            _decs[tag] = dec
        try:
            yield dec
        finally:
            for tag in tags:
                _decs.pop(tag)


def decmsg(msg: Msg) -> Any:
    msg = _root_dec.decode(msg)
    tag_field = msg.header.msgtype
    dec = _decs[tag_field]
    return dec.decode(msg.payload)


def encmsg(payload: Any) -> Msg:
    tag_field = None

    plbytes = _root_enc.encode(payload)
    if b'type' in plbytes:
        assert isinstance(payload, Struct)
        tag_field = type(payload).__name__
        payload = Raw(plbytes)

    msg = Msg(Header('mymsg', tag_field), payload)
    return _root_enc.encode(msg)


if __name__ == '__main__':
    with init([[Point]]):

        # arbitrary struct payload case
        send = Point(0, 1)
        rx = decmsg(encmsg(send))
        assert send == rx

        # arbitrary dict payload case
        send = {'x': 0, 'y': 1}
        rx = decmsg(encmsg(send))
        assert send == rx

I guess for me the more ideal default would be that the/some standard decoder is capable of handling tagged union structs and the built-in type set (which I've probably emphasized ad nauseam at this point 😂). So for example I could still do my But, as a (short term) solution I guess the above could be a way to get what I want? The even more ideal case for me would be that you could embed tagged structs inside other std container data types ( |
Heh, actually the more I think about this context-oriented msg type decoding policy, the more i like it. This kind of thing would play super well with structured concurrency.
msg: Msg

with open_msg_context(
    types=[IOTStatustMsg, CmdControlMsg],
    capability_uuid='sd0-a98sdf-9a0ssdf'
) as decoder:
    # this will simply log an error on non-enabled payload msg types
    payload = decoder.decode(msg) |
Sorry, there's a lot above, I'll try to respond to what I think are your current issues.
This does work. All types are fully composable, there is no limitation in msgspec requiring structs be at the top level, or that structs can't be subtypes in containers.
Side note - when posting comments referring to errors, it's helpful to include the full traceback so we're all on the same page. Right now I'm left guessing what you're seeing raising the type error. First, there's no difference in encoding/decoding support between Unions of tagged structs and structs in general. Also,
import msgspec
from typing import Union

class Point(msgspec.Struct):
    x: int
    y: int

for typ in [Union[Point, list, set], Union[Point, dict], Union[int, list, dict]]:
    print(f"Trying a decoder for {typ}...")
    try:
        msgspec.json.Decoder(typ)
    except TypeError as exc:
        print(f"  Failed: {exc}")
    else:
        print("  Succeeded")

This outputs:
Note that the error is coming from creating the In both cases the issue is that the union contains multiple Python types that all map to the same JSON type with no way to determine which one to decode into. Both
This is not possible, for the same reason as presented above.
import msgspec
from typing import Any

class Point(msgspec.Struct):
    x: int
    y: int

dec = msgspec.json.Decoder(Point | Any)  # right now this works, but ignores the struct completely since `Any` is present

Given a message like
Knowing nothing about what you're actually trying to achieve here, why not just define an extra
import msgspec
from typing import Any, Union

class Msg(msgspec.Struct, tag=True):
    pass

class Msg1(Msg):
    x: int
    y: int

class Msg2(Msg):
    a: int
    b: int

class Custom(Msg):
    obj: Any

enc = msgspec.json.Encoder()
dec = msgspec.json.Decoder(Union[Msg1, Msg2, Custom])

def encode(obj: Any) -> bytes:
    if not isinstance(obj, Msg):
        obj = Custom(obj)
    return enc.encode(obj)

def decode(buf: bytes) -> Any:
    msg = dec.decode(buf)
    if isinstance(msg, Custom):
        return msg.obj
    return msg

buf_msg1 = encode(Msg1(1, 2))
print(buf_msg1)
print(decode(buf_msg1))

buf_custom = encode(["my", "custom", "message"])
print(buf_custom)
print(decode(buf_custom))

Output:
Note that the builtin message types (
MsgTypes = Union[Msg1, Msg2, Custom]

# Decoder expects either one of the above msg types, or a list of the above msg types
decoder = msgspec.json.Decoder(Union[MsgTypes, list[MsgTypes]]) |
In the future, large issue dumps like this that are rapidly updated are hard to follow as a maintainer. If you expect a concise and understanding response from me, please put in the effort to organize and present your thoughts in a cohesive manner. While the examples presented in this blogpost aren't 100% relevant for this repository, the general sentiment of "users should provide clear, concise, and reproducible examples of what their issue is" is relevant. |
My apologies, I didn't know the root issue that I was seeing at outset, it's why I've tried to update things as I've discovered both using the lib and seeing what's possible through tinkering. Also a lot of this is just thinking out loud as a new user, my apologies if that's noisy, hopefully someone else will find it useful if they run into a similar issue. The main issue I was confused by was mostly this (and i can move this to the top for reference if you want):
I do think making some examples of the case I'm describing would be super handy to
Totally, originally I thought this was a simple question and now I realize it's a lot more involved; I entirely mis-attributed the real problem to something entirely different, hence my original issue title being ill-informed 😂 In summary, my main issue was more or less addressed in your answer here, which is what I also concluded:
In other words you can't pass in an arbitrary:
So I think this is pretty similar to what i presented in the embedded So really, I guess what I am after now is some way to dynamically describe such schemas, maybe even during a struct-msg flow. Again my apologies about this being noisy, not well formed, ill-attributed; I really just didn't know what the real problem was. |
@jcrist I updated the description to include the summary of everything, hopefully that makes all the noise, back and forth, more useful to onlookers 😎 To just finally summarize and answer all questions you left open for me:
Yes, this does work as long as you specify the schema ahead of time, but even still it's not clear to me how you would use some "top level" decoder to decode non-static-schema embedded
Agreed, I mis-attributed the error:
Agreed, but with the case of tagged
Ok so this sounds like what I'm asking for is supposed to work right?
But then you say it isn't and give an example with a non-tagged-
Yes, this is more or less what I concluded except using
So i guess the problem here would be decode aliasing due to a tag field collision? |
The greasy details are strewn throughout a `msgspec` issue: jcrist/msgspec#140 and specifically this code was mostly written as part of POC example in this comment: jcrist/msgspec#140 (comment) This work obviously pertains to our desire and prep for typed messaging and capabilities aware msg-oriented-protocols in #196, caps sec nods in I added a "wants to have" method to `Context` showing how I think we could offer a pretty neat msg-type-set-as-capability-for-protocol system.
Just as one final proposal, couldn't we just presume if you find a
And then the user will know either the serialized object is malformed or there is a collision they have to work around by changing the |
My use case: handling an IPC stream of arbitrary object messages, specifically with msgpack. I desire to use Structs for custom object serializations that can be passed between memory boundaries.
My presumptions were originally:
msgpack bytes and tagged msgspec.Structs
{"type": "CustomStruct", "field0": "blah"}) and automatically know that the embedded msgpack object is one of our custom tagged structs and should be decoded as a CustomStruct.
Conclusions
Based on below thread:
Union
Decoder(Any | Struct) won't work even for top level Structs in the msgpack frame
This took me a (little) while to figure out because the docs didn't have an example for this use case, but if you want to create a Decoder that will handle a Union of tagged structs and it will still also process the standard built-in type set, you need to specify the subset of the std types that don't conflict with Struct as per @jcrist's comment in the section starting with:
So Decoder(Any | MyStructType) will not work.
I had to dig into the source code to figure this out and it's probably worth documenting this case for users?
Alternative solutions:
It seems there is no built-in way to handle an arbitrary serialization encode-stream that you wish to decode into the default set as well as be able to decode embedded tagged Struct types.
But, you can do a couple other things inside custom codec routines to try and accomplish this:
create a custom boxed Any struct type, as per @jcrist's comment under the section starting with:
consider creating a top-level boxing Msg type and then using msgspec.Raw and a custom decoder table to decode payload msgpack data as in my example below
data as in my example belowThe text was updated successfully, but these errors were encountered: