Expectations of to_dict methods in returning JSON-typed values, regarding geometry #1047

philvarner · 2023-03-16T15:40:17Z

Currently, the to_dict method returns a "shallow" dict representation of the object. For example, if the Item has a geometry whose coordinates value is a nested tuple of numbers, the resulting dict's geometry key points to that same tuple. However, I think a more common assumption is that to_dict returns a JSON-typed representation of the Item, e.g., with geometry being nested arrays of numbers, regardless of what the internal representation is. Maybe there could be a flag on to_dict to determine this behavior, or another method like to_geojson_dict?

mypy doesn't catch this because geometry is typed as Optional[Dict[str, Any]]. (Union types would help here, allowing the stricter Optional[Dict[str, str | list[list | float]]].)
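As a rough sketch of what stricter typing could look like, a recursive alias lets a type checker that supports recursive aliases (recent mypy versions do) flag tuple coordinates that Dict[str, Any] accepts. The names Coordinates, Geometry and is_json_coordinates are hypothetical and are not part of PySTAC:

```python
from typing import Dict, List, Union

# Hypothetical aliases for illustration; PySTAC does not define these.
# A coordinates value is either a position (a list of numbers) or a list of
# further coordinates values, nested to any depth.
Coordinates = Union[List[float], List["Coordinates"]]
Geometry = Dict[str, Union[str, Coordinates]]


def is_json_coordinates(value: object) -> bool:
    """Runtime check: True only if value is built from lists and numbers."""
    if isinstance(value, bool):
        # bool is a subclass of int, but it is not a coordinate number.
        return False
    if isinstance(value, (int, float)):
        return True
    if isinstance(value, list):
        return all(is_json_coordinates(item) for item in value)
    # Tuples (and anything else) are rejected.
    return False
```

The runtime helper is only there because a static alias cannot protect values that arrive untyped, for example from user code that builds a geometry by hand.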

The current docstring for to_dict is "Generate a dictionary representing the JSON of this serialized object.", which is wrong with respect to the current behavior if non-JSON-compliant tuples are used.

The context behind this is we had an Item that (unintentionally) had tuple geometry coordinate values, and used .to_dict() to create a value to pass to stac-validator validation, expecting that we were getting a JSON-style dict. The error message was:

Exception: STAC Item validation failed. Error: {'type': 'Polygon', 'coordinates': (((-91.00013888888888, 43.00013888888889), (-91.00013888888888, 44.00013888888889), (-92.00013888888888, 44.00013888888889), (-92.00013888888888, 43.00013888888889), (-91.00013888888888, 43.00013888888889)),)} is not valid under any of the given schemas. Error is in geometry.

because the tuple geometry doesn't match the schema of nested arrays of numbers, and then gets str serialized to a value of nested tuples of numbers.
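To make the difference concrete, here is a minimal standard-library sketch. The geometry is a shortened stand-in for the one in the error message, not the real Item:

```python
import json

# Shortened stand-in geometry with tuple coordinates, as in the report above.
geometry = {
    "type": "Polygon",
    "coordinates": (((-91.0, 43.0), (-91.0, 44.0), (-92.0, 44.0), (-91.0, 43.0)),),
}

as_str = str(geometry)          # Python repr: the tuples keep their parentheses
as_json = json.dumps(geometry)  # JSON text: tuples are written as arrays

# Parsing the JSON text back yields lists, never tuples.
round_tripped = json.loads(as_json)

print(as_str)
print(as_json)
print(type(round_tripped["coordinates"]).__name__)
```

So the dict itself still holds tuples after to_dict; only a pass through JSON text (or an explicit conversion) turns them into lists.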

Relatedly, __geo_interface__ will return an invalid value if tuple coordinates are used.


gadomski · 2023-03-16T16:48:49Z

> because the tuple geometry doesn't match the schema of nested arrays of numbers, and then gets str serialized to a value of nested tuples of numbers.

Curious here, shouldn't the tool be doing a json.dumps rather than a str conversion?

philvarner · 2023-03-20T14:30:09Z

Presumably yes. I also think that a jsonschema (openapi schema?) validator that takes a Dict should treat tuples and lists as equivalent to arrays, but here we are.

gadomski · 2023-03-21T11:59:33Z

Thinking about this more, enforcing "JSON-correct" types on all to_dict methods would be a relatively heavy lift. I count 30 distinct to_dict definitions, each of which would need to get some sort of recursive "convert tuples to lists" treatment (aside: are there other "JSONify" conversions that we should be tracking as well?). Not impossible, but for sure adds complexity and runtime cost to a commonly-used method.
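For a sense of scale, the recursive "convert tuples to lists" treatment could look roughly like the sketch below. The function name tuples_to_lists is hypothetical and this is not PySTAC code; it only illustrates the extra traversal each to_dict would have to pay for:

```python
from typing import Any


def tuples_to_lists(value: Any) -> Any:
    """Return a copy of value with every tuple replaced by a list."""
    if isinstance(value, dict):
        # Rebuild the dict so the input is left untouched.
        return {key: tuples_to_lists(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        # Lists are rebuilt too, since they may contain nested tuples.
        return [tuples_to_lists(item) for item in value]
    # Strings, numbers, None and other leaves pass through unchanged.
    return value
```

This walks and copies the whole structure on every call, which is the runtime cost mentioned above.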

An alternative might be:

1. Add a utility function pystac.utils.to_json_dict that does a blanket conversion (maybe just with a json.loads(json.dumps(d))).
2. Update the documentation for each to_dict to point to to_json_dict as a way to ensure all your tuples are lists, etc.
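A minimal sketch of that blanket conversion, assuming the proposed name to_json_dict (this is only the idea from this comment, not an existing PySTAC function):

```python
import json
from typing import Any, Dict


def to_json_dict(d: Dict[str, Any]) -> Dict[str, Any]:
    """Round-trip d through JSON text so the result holds only JSON types."""
    # json.dumps writes tuples as arrays; json.loads reads arrays back as lists.
    return json.loads(json.dumps(d))
```

Two caveats with this approach: json.dumps raises TypeError for values it cannot serialize (a datetime, for example), and non-string dict keys such as ints come back as strings.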

I'm a bit hesitant to add extra overhead to to_dict for this use case, because (per my previous comment) well-behaved consumers should be able to handle non-JSON-y dicts.

As another aside, I think we're probably working too hard here -- serialization libraries should take care of this stuff for us. Perhaps we should consider converting the base types to use a serialization library for v2? @TomAugspurger pointed me to https://github.com/jcrist/msgspec as a faster-than-pydantic option.

Fatal1ty · 2023-03-21T12:45:15Z

Hi @gadomski

You may also consider an alternative https://github.com/Fatal1ty/mashumaro which is mature, fast, customizable and provides to_dict and from_dict methods. As an author of this library I'd be happy to help.

philvarner · 2023-03-21T12:51:51Z

Maybe it would be enough just having some documentation that it returns a dictionary representation, not necessarily one that exactly matches the GeoJSON structure based on what objects it was constructed with?

jsignell · 2023-03-21T14:29:45Z

> As another aside, I think we're probably working too hard here -- serialization libraries should take care of this stuff for us. Perhaps we should consider converting the base types to use a serialization library for v2? @TomAugspurger pointed me to https://github.com/jcrist/msgspec as a faster-than-pydantic option.

I really like this idea of outsourcing serialization/deserialization to another library and letting types provide more of a guarantee and validation step.

jcrist · 2023-03-21T14:54:17Z

Hi - msgspec author here. I'm not familiar with STAC, but if you decide to try out msgspec, you might find this GeoJSON example in our docs useful (it does (de)serialization and validation). I'm also willing to help as needed. No pressure to use msgspec of course, there are lots of good and useful libraries in this space.

gadomski added this to the 1.7.1 milestone Mar 16, 2023

gadomski self-assigned this Mar 16, 2023

gadomski mentioned this issue Mar 20, 2023

Extensions should coerce values to correct type #1044

Closed

gadomski modified the milestones: 1.7.1, 1.8 Mar 21, 2023

gadomski added the enhancement and documentation labels Mar 21, 2023

gadomski removed the enhancement label Mar 31, 2023

gadomski mentioned this issue Mar 31, 2023

Remove text around to_dict and JSON #1074

Merged

6 tasks

gadomski mentioned this issue Apr 11, 2023

Use a serialization library #1092

Open

gadomski closed this as completed in #1074 Apr 11, 2023
