Expectations of to_dict methods in returning JSON-typed values, regarding geometry #1047
Comments
Curious here, shouldn't the tool be doing a
Presumably yes. I also think that a jsonschema (openapi schema?) validator that takes a Dict should treat tuples and lists as equivalent to arrays, but here we are.
Thinking about this more, enforcing "JSON-correct" types on all An alternative might be:
I'm a bit hesitant to add extra overhead to As another aside, I think we're probably working too hard here -- serialization libraries should take care of this stuff for us. Perhaps we should consider converting the base types to use a serialization library for v2? @TomAugspurger pointed me to https://github.com/jcrist/msgspec as a faster-than-pydantic option.
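To illustrate the point that a serializer can take care of the type conversion, here is a minimal sketch using only the standard library `json` module as a stand-in (it is not msgspec, and the `geometry` dict is a made-up example rather than anything produced by this project). A dump/load round trip turns tuples into lists, at the cost of an extra pass over the data on the caller's side.

```python
import json

# Hypothetical geometry dict whose coordinates were built as a tuple.
geometry = {"type": "Point", "coordinates": (12.5, 41.9)}

# json.dumps writes tuples as JSON arrays, so loading the text back
# yields plain lists regardless of the original container type.
round_tripped = json.loads(json.dumps(geometry))

print(type(geometry["coordinates"]).__name__)
print(type(round_tripped["coordinates"]).__name__)
```

This keeps `to_dict` itself cheap and leaves the normalization to callers who need strictly JSON-typed values.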
Hi @gadomski You may also consider an alternative https://github.com/Fatal1ty/mashumaro which is mature, fast, customizable and provides
Maybe it would be enough just having some documentation that it returns a dictionary representation, not necessarily one that exactly matches the GeoJSON structure based on what objects it was constructed with?
I really like this idea of outsourcing serialization/deserialization to another library and letting types provide more of a guarantee and validation step.
Hi - msgspec author here. I'm not familiar with STAC, but if you decide to try out msgspec, you might find this GeoJSON example in our docs useful (it does (de)serialization and validation). I'm also willing to help as needed. No pressure to use msgspec of course, there are lots of good and useful libraries in this space.
Currently, the `to_dict` method returns a "shallow" dict representation of the object. For example, if the Item has a geometry whose `coordinates` value is a nested tuple of numbers, the resulting dict's `geometry` key points to that same tuple. However, I think a more common assumption is that `to_dict` returns a JSON-typed representation of the Item, e.g., with geometry being nested arrays and numbers, regardless of what the internal representation is. Maybe there could be a flag on `to_dict` to determine this behavior, or another method like `to_geojson_dict`?

mypy checking doesn't catch this because geometry is typed as `Optional[Dict[str,Any]]`. (Union types would help here, allowing the stricter `Optional[Dict[str,str|list[list|float]]]`.)

The current docstring for `to_dict` is `Generate a dictionary representing the JSON of this serialized object.`, which is wrong with respect to the current behavior if non-JSON-compliant tuples are used.

The context behind this is that we had an Item that (unintentionally) had tuple geometry coordinate values, and used `.to_dict()` to create a value to pass to stac-validator validation, expecting that we were getting a JSON-style dict. The error message was:
because the tuple geometry doesn't match the schema of nested arrays of numbers, and then gets `str` serialized to a value of nested tuples of numbers.

Relatedly, `__geo_interface__` will return an invalid value if tuple coordinates are used.
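A minimal sketch of what a JSON-typed conversion could look like, assuming nothing about the library's internals: `to_json_types` is a hypothetical helper (not an existing method), and `item_dict` is a hand-written stand-in for the shallow dict described above.

```python
from typing import Any


def to_json_types(value: Any) -> Any:
    """Recursively copy a structure, turning tuples into lists.

    Hypothetical helper for illustration only; dicts and lists are
    rebuilt, and every other value is returned unchanged.
    """
    if isinstance(value, dict):
        return {key: to_json_types(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_json_types(item) for item in value]
    return value


# Stand-in for a shallow dict whose geometry was built from tuples.
item_dict = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": (((0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 0.0)),),
    },
}

normalized = to_json_types(item_dict)
print(normalized["geometry"]["coordinates"])
```

The input is left untouched, so the tuple-based structure and the list-based copy can coexist; whether this belongs behind a flag on `to_dict` or in a separate method is the open question raised above.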