Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comments on Part 3 #575

Closed
philvarner opened this issue May 7, 2021 · 10 comments · Fixed by #616
Closed

Comments on Part 3 #575

philvarner opened this issue May 7, 2021 · 10 comments · Fixed by #616
Assignees
Labels
CQL Grammar CQL2 Editorial Editorial changes and document updates outside of requirements classes Scope

Comments

@philvarner
Copy link
Contributor

philvarner commented May 7, 2021

I'm working on incorporating Part 3 into STAC API, so these comments come from that work. Once I have a summary written up of how CQL can be used by STAC, I'm going to get more feedback from the STAC community and will send that to you.

Overall, I think this looks great. I'm particularly excited about the queryables concept -- I've seen this done in ad-hoc or difficult-to-use ways in the past, and this seems like a very clean way of defining this. I think it's also useful for providing a well-defined "translation layer" so that an implementer can easily define what the variable term names are independent of how they're actually stored in the user's datastore.

  • I'd like to see a tighter description on the behavior when a variable that is not defined as a queryable is used in an expression. In section 6.2 Queryables, I think adding "only" would help to make it clear that only queryables are valid in expressions, and not arbitrary properties that you happen to know (out-of-band) exist on a feature, e.g., "to determine the only property names that may be used to construct...". I'd like to see the language be more strict, e.g., "Only variables that are defined as a queryable may be used in an expression. If an expression contains a variable that is not
    defined as a queryable, the server must respond with a 400 Bad Request status code". Without this explicit description, I could imagine a fragile implementation that did variable re-writing only for the queryables, but passed others through into a database query, that may or may not work consistently.
    I see now that there is a mention of this in 6.4 R5 B, but only as it applies to cross-collection queries -- so, the error behavior should be made more explicit for per-collection query also.
  • 6.6 filter-crs needs a reference to where to find these URIs and a examples in that section. The only example I found for a valid CRS URI is in 8.6 Example 10. One example I'd like to see is what the value for MODIS Sinusoidal is b/c that doesn't have an EPSG code. Also, which one of http://www.opengis.net/def/crs/OGC/1.3/CRS84 or http://www.opengis.net/def/crs/EPSG/0/4326 is preferred (if there is a preference).
  • In Queryables Example 1, add a title to the "geometry" field as is done with all of the other ones. Adding "description" values to all of these would also be
    good, since it would be useful to have in a UI displaying these as query options.
  • The requirement that properties be allowed on the left side of binaryComparisonPredicate will make implementation more challenging with some datastores. Elasticsearch doesn't allow fields (propertyName) on the left side of expressions, so the slower "script query" feature must be used instead. At first I thought this was going to be a problem, but now I think that implementing Simple CQL is going to be complex enough already, so having one more special case when binaryComparisonPredicate has a propertyName for the first scalarExpression isn't going to be much harder to implement.
  • In 8.3, a mention that the spatial geometry (text) is WKT, the spatial geometry (JSON) is GeoJSON Geometry or envelope, and the temporal is using RFC 3339
  • There should be an explicit mention in 8.7 that the format of the datetime is derived from RFC 3339 and a reference to the format being used in Part 1.
  • In 8.6, it says "CQL uses Well-Known-Text (WKT)" -- but this should say that it use WKT or GeoJSON Geometry or envelope. It would be useful to have that these are the encodings at the beginning of 8.6, rather than only as a lead-in as to why filter-crs is necessary. Also, there's an extra dash between Known and Text.
  • For servers that support filter-crs, it would be good to have a queryables-like mechanism for advertising which CRSes are supported.

Typo / Wordsmithing:

  • In section 1, the last sentence says "How the set of properties or keys that can be used..." -- I am unclear what what a property or key is here, since it seems to imply there are two different things that can be used in expressions. Maybe it should just say "properties"?
  • in section 2, there's a reference to the "Arithmetic operators" conformance class that should be "Arithmetic expressions"
  • also next to Arithmetic operators, there are backslashes before the parentheses \(+, -, *, /\)
  • section 2 - "suitable for use with as the body" ==> "suitable for use as ..." (remove "with")
  • In 8.5 Example 4, the "balance-150.0 > 0" has windSpeed in the JSON.
  • In 8.5 R18 H, misspelling of "SINGLECAHR"
  • In 8.7, use of term "temporal coordinate reference system" should probably just be "temporal reference system"
  • Figure 5 is a good diagram explaining the terms when on screen, but printed in monochrome, the color disappears and it's not clear there's only one i and many j's. This will also be problem for people who are green-red colorblind. Maybe adding some texture to i line would help with this.
  • Annex C -- the font face for the schema, Droid Sans Mono","DejaVu Sans Mono",monospace;, prints very oddly for the field values that are in bold -- it appears as if it's printed two very then characters about 1mm offset, and is very hard to read
@cportele cportele added the CQL2 label May 8, 2021
@cportele
Copy link
Member

cportele commented May 8, 2021

Thank you for the feedback and the thorough review. Getting more feedback from the STAC community will be very valuable, too.

I'd like to see a tighter description on the behavior when a variable that is not defined as a queryable is used in an expression. In section 6.2 Queryables, I think adding "only" would help to make it clear that only queryables are valid in expressions, and not arbitrary properties that you happen to know (out-of-band) exist on a feature, e.g., "to determine the only property names that may be used to construct...". [...]

Yes, that is this the intent and we need to make that clear(er).

6.6 filter-crs needs a reference to where to find these URIs and a examples in that section. The only example I found for a valid CRS URI is in 8.6 Example 10. One example I'd like to see is what the value for MODIS Sinusoidal is b/c that doesn't have an EPSG code. Also, which one of http://www.opengis.net/def/crs/OGC/1.3/CRS84 or http://www.opengis.net/def/crs/EPSG/0/4326 is preferred (if there is a preference).

Agreed, we need to clarify where to find the supported CRSs (in /collections/{collectionId}#/crs).

And we also need to be clearer in the wording that the default is http://www.opengis.net/def/crs/OGC/1.3/CRS84 for 2D coordinates and http://www.opengis.net/def/crs/OGC/0/CRS84h for 3D coordinates. Currently we use the short form, but that is ambiguous.

Adding an example would also help, I think.

In Queryables Example 1, add a title to the "geometry" field as is done with all of the other ones. Adding "description" values to all of these would also be good, since it would be useful to have in a UI displaying these as query options.

Good points, will add those.

The requirement that properties be allowed on the left side of binaryComparisonPredicate will make implementation more challenging with some datastores. Elasticsearch doesn't allow fields (propertyName) on the left side of expressions, so the slower "script query" feature must be used instead. At first I thought this was going to be a problem, but now I think that implementing Simple CQL is going to be complex enough already, so having one more special case when binaryComparisonPredicate has a propertyName for the first scalarExpression isn't going to be much harder to implement.

We don't have an issue about this, but there was a comment on Gitter about this and that Elasticsearch does not allow to "compare a field to another field". Maybe one option could be to only support properties on the left side and literals on the right side in Simple CQL and add property/property and literal/literal in a separate conformance class?

One question, you say that "Elasticsearch doesn't allow fields (propertyName) on the left side of expressions", but I think that is the typical way that expressions are written (e.g., floors > 5). Did you mean to say "right side"?

In 8.3, a mention that the spatial geometry (text) is WKT, the spatial geometry (JSON) is GeoJSON Geometry or envelope, and the temporal is using RFC 3339
There should be an explicit mention in 8.7 that the format of the datetime is derived from RFC 3339 and a reference to the format being used in Part 1.
In 8.6, it says "CQL uses Well-Known-Text (WKT)" -- but this should say that it use WKT or GeoJSON Geometry or envelope. It would be useful to have that these are the encodings at the beginning of 8.6, rather than only as a lead-in as to why filter-crs is necessary. Also, there's an extra dash between Known and Text.

These suggestions all make sense to me.

For servers that support filter-crs, it would be good to have a queryables-like mechanism for advertising which CRSes are supported.

Yes, that exists, see above. In the JSON representation of the collection (/collections/{collectionId}) the crs member is an array of supported CRSs. We need to make that clear and point to the relevant section in Part 2.

We should also point out that, if there is no crs member, only the default is supported and the filter-crs parameter has no use. This is implied in the condition in requirement 8, but we should be more explicit in the informative text.

Typo / Wordsmithing: ...

Thank you.

@cportele cportele added CQL Grammar Editorial Editorial changes and document updates outside of requirements classes Scope labels May 10, 2021
@cportele
Copy link
Member

2021-05-10: @pvretano will split separate issues out, if they need a discussion. Regarding the binary operator discussion and the support for properties on both sides, there was consensus to restrict Simple CQL to property/literal comparisons and add others in a separate conformance class.

@philvarner
Copy link
Contributor Author

The requirement that properties be allowed on the left side of binaryComparisonPredicate will make implementation more challenging with some datastores. Elasticsearch doesn't allow fields (propertyName) on the left side of expressions, so the slower "script query" feature must be used instead. At first I thought this was going to be a problem, but now I think that implementing Simple CQL is going to be complex enough already, so having one more special case when binaryComparisonPredicate has a propertyName for the first scalarExpression isn't going to be much harder to implement.

We don't have an issue about this, but there was a comment on Gitter about this and that Elasticsearch does not allow to "compare a field to another field". Maybe one option could be to only support properties on the left side and literals on the right side in Simple CQL and add property/property and literal/literal in a separate conformance class?

One question, you say that "Elasticsearch doesn't allow fields (propertyName) on the left side of expressions", but I think that is the typical way that expressions are written (e.g., floors > 5). Did you mean to say "right side"?

Yes, I meant so say that ES only allows fields on the left side by default. But, there is a slightly different syntax you can use that allows them to be used on either side ("script query" syntax). I don't know off-hand if there's any restriction like that on literals.

Putting RHS properties into a separate conformance class does add a tiny bit of complexity, but it just means implementations have to add that one entry, and it's a good safety value in case there's another datastore out there that cannot support RHS properties for some reason.

@philvarner
Copy link
Contributor Author

For servers that support filter-crs, it would be good to have a queryables-like mechanism for advertising which CRSes are supported.

Yes, that exists, see above. In the JSON representation of the collection (/collections/{collectionId}) the crs member is an array of supported CRSs. We need to make that clear and point to the relevant section in Part 2.

We should also point out that, if there is no crs member, only the default is supported and the filter-crs parameter has no use. This is implied in the condition in requirement 8, but we should be more explicit in the informative text.

Ah, thank you for clarifying that. I'm going to create a tracking issue for the STAC spec to see if we want to add support for Part 2, even if it's simply advertising that the CRS is always WGS84.

@jampukka
Copy link
Contributor

jampukka commented May 11, 2021

@philvarner do you see any issues with floors > doors e.g. comparing property-property rather than propery-literal / literal-property?

Edit: Just read @cportele's comment fully. Agree with the idea of simple-cql core being as simple as possible while still useful for majority of use-cases and extending the functionality with conformance classes to handle more complex stuff.

@jisantuc
Copy link

I worked on an implementation of the STAC API version of this in Franklin and added some comments in @philvarner's PR (radiantearth/stac-api-spec#128 (comment)). I'm copying the relevant bits here, working only with the CQL JSON side of things:

Strange mix of types?

One thing that I keep tripping on is a strange mixture of types. An expression for filtering a field is a single key at the root of an object, followed by either a map also holding a single key, or an array of maps holding a single key. This singularity strikes me as odd, because objects can hold many keys, so if there's only supposed to be one thing, why does the data structure in principle hold more than one? Similarly, for or and and, the value behind the key is supposed to be, e.g.:

[
  {"eq": ...},
  {"neq": ...}
]

These string keys in a plural structure are much less convenient to work with than an alternative in which they live under a kind key in each json object, like:

[
  {"kind": "eq", ...},
  {"kind": "neq", ...}
]

In the kind case, it's no longer a map from strings to the actual filter (which potentially can hold many keys), but one object discriminated by kind. The nice thing is that it makes singular entries singular and has a much nicer type in the and / or children (Array<CQLFilter> instead of "Array<Map<str, CQLFilter>> where the Map has exactly one key").

queryables seem pretty tough to know ahead of time

This is becoming a recurring theme with the open-endedness of the STAC spec, but it's not obvious to me what I'm supposed to do for queryables in a server implementation when the server will serve other people's data. It's also not clear to me whether queryables is required. I can accumulate item properties as someone fills Franklin's database with items (the dynamic case mentioned in the PR to the STAC API spec), but then it's not obvious to me what "queryable" means -- does it mean it's technically possible to query based on x, or that querying based on x will be fast? If the latter, then I need to expose some way for users to control what's queryable and what isn't, since Postgres JSONB indexing is really effective for equality at a key filters but pretty much worthless otherwise. There's a related issue for this in Franklin from before the CQL work here azavea/franklin#616 -- basically there's a trade-off in what fields we can search efficiently and index size / cost of inserting items, and it's hard to make a good choice about that without knowing what promise "x is queryable" represents.

It's also pretty difficult in the dynamic case to figure out what to do with fields that don't have a JSON schema. Let's say someone posts an item that has a string value in mySpecialField -- if I'm dynamically aggregating queryables, I don't know that all myVerySpecialField values are strings, just that I happen to have seen one.

@philvarner
Copy link
Contributor Author

philvarner commented May 19, 2021

Related to the format for the operators, the option James mentions above with using kind also works better because this text fails JSON validation because it has duplicate "eq" fields.

This was a bad example, nevermind!

@pvretano
Copy link
Contributor

pvretano commented Jun 7, 2021

@philvarner according to cql.json, shouldn't your example be:

    "and": [
      { "eq": [{ "property": "id" },"LC08_L1TP_060247_20180905_20180912_01_T1_L1TP"}] },
      { "eq": [{ "property": "collection" },"landsat8_l1tp"]}
   ]

@philvarner
Copy link
Contributor Author

@philvarner according to cql.json, shouldn't your example be:

ah, yes, i missed a level of braces. That's not even valid json. corrected.

@cportele
Copy link
Member

cportele commented Aug 2, 2021

@jisantuc - We merged a PR to address the issues raised by @philvarner (except for the ones that have been moved to their own issue).

However, we did not really address your comment, because the context was not clear to us and also how it is related to Phil's comments. For example, we are missing context for the "kind" discussion and how it is related to CQL. Please open a new issue, if there is a change that you would like to propose. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CQL Grammar CQL2 Editorial Editorial changes and document updates outside of requirements classes Scope
Development

Successfully merging a pull request may close this issue.

5 participants