-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Browseable recommendations and Children conformance class #229
Browseable recommendations and Children conformance class #229
Conversation
core/README.md
Outdated
@@ -62,24 +72,66 @@ A `service-doc` endpoint is recommended, but not required. | |||
| ------------- | ----------- | -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | |||
| `service-doc` | `/api.html` | OAFeat OpenAPI | An HTML service description. Uses the `text/html` media type to refer to a human-consumable description of the service. The path for this endpoint is only recommended to be `/api.html`, but may be another path. | | |||
|
|||
Additionally, `child` relations may exist to individual catalogs and collections. | |||
Additionally, `child` relations may exist to child Catalogs and Collections and `item` relations to Items. These | |||
relations form a directed acyclic graph that supports browseable traversal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm throwing this comment here because it's the first mention of the acyclicality of the graph -- I don't think this is technically correct. I think the existence of parent
links means that the graph has cycles. Moreover, I think it's good that the graph has cycles -- it means we can get back up a level without using the back button / that we can retrace our steps using only the data. Is acyclicality an important property? I've thought about this a bit and I'm not sure what it gets us. I think this is just a normal directed graph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, so that's a good point and one I should be more clear on. I was primarily thinking about the graph of child
links, which must be a DAG. Also to make it clear that these are DAGs and not only trees, since the subcatalogs don't have to be distinct (e.g., you can slice them up by date or grid id, not only one)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that even necessarily the case (that child links are acyclic)? I doubt anyone has mutual child
relationships, but:
- it's not explicitly disallowed
- I'm not aware of any notes recommending against multi-part cycles like catalog a has item a as a child, item a has catalog b as a child (nothing says items can't have children as far as I know), catalog a is a child of catalog b
I have no idea why anyone would do such a thing, I just read a nice Alloy 6 tutorial earlier today and it might have poisoned my brain forever.
Either way, unless the acyclicality is important for some reason, I'm not sure it's doing much work except encouraging thinking about edge cases where it doesn't hold -- I think a plain directed graph should be sufficient for supporting browseable traversal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @jisantuc here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated the language to only be directed graph
Would it make sense to split the Children conformance class and the rest of the PR into two PRs? This PR is so large that it's nearly impossible for me to really do a full PR in one go... Also, could it be that this PR also includes some other changes? For example, I've also seen pagination changes in here... |
children/README.md
Outdated
- **Extension [Maturity Classification](../extensions.md#extension-maturity):** Pilot | ||
- **Dependencies**: [STAC API - Core](../core) | ||
|
||
A STAC API can return information about all STAC [Catalogs](../stac-spec/catalog-spec/catalog-spec.md) available using a link |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused if the server is expected to return just the first generation of children, the last generation of children, or the entire ancestry. For example, if I had Landsat catalogs organized as /catalogs/landsat_8_c1/{path}_{row}_{date}
, the /children
endpoint could:
- Return all
{path}
catalogs (immediate children). - Return every combination of
{path}_{row}_{date}
(last generation of children). - Return every combination of
{path}
,{path}_{row}
, and{path}_{row}_{date}
(entire ancestry).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... is this a conformance class a "simple" way for providers to express their "preferred list of catalogs and collections to show on a frontpage"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To answer @geospatial-jeff 's question, it's up to the implementer. I think typically it will return exactly the same set of children that have child
relations from the root. The benefit to having it at this endpoint is that the title, desc, etc can all be returned, so a client doesn't have to retrieve every single child link just to find out the title to display -- this was Rob's use case.
Maybe "simple" isn't the right word. I'll think about this.
children/README.md
Outdated
A STAC API can return information about all STAC [Catalogs](../stac-spec/catalog-spec/catalog-spec.md) available using a link | ||
from the landing page that uses the link relation `children`, which links to an endpoint called | ||
`/children`. The purpose of this endpoint is to present a single resource from which clients can retrieve | ||
all the child objects (Catalogs and Collections) of a Catalog. This eliminates then need for a client to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that returning both Catalogs and Collections in the /children
endpoint is confusing. My understanding after reading this PR is that collections are for searching while catalogs are for browsing. It even says in the Browseable Catalogs
best practices that collections should not be included as part of the browseable tree of catalogs. Allowing collections to be used for browsing is poor separation of concerns, and instead of just recommending against their use the spec should forbid using collections in this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is a bit confusing. Like do I not need to retrieve /collections anymore if /children is implemented? How should clients handle all the duplication to show a list of unique catalogs/collections?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see the benefit of having children
return both Catalogs and Collections - it more fits with the STAC core specification which uses rel=child
for both of those cases. It also allows for all of the catalogs and collections to be retrieved in a single paginated call - useful for rendering the content of the API.
One addition we could make is that a query parameter can determine which type to return - e.g.
https://stac.api/children?type=catalog
Where is the best practices text you referenced located?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's included as part of https://github.com/philvarner/stac-api-spec/blob/catalogs-and-browseable/core/README.md#browseable-catalogs introduced in this PR. The exact language is:
These are the two standard ways of structuring a browseable tree of catalogs, the only difference being whether the Collection is used as part of the tree or not:
- Catalog (root) -> Catalog* -> Item (recommended)
- Catalog (root) -> Collection -> Catalog* -> Item
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll see if I explicitly stated it anywhere, but the intention was that /children
would return the same list of entities that are linked to via rel=child
from the root. The benefit (as Rob has articulated before) is that a client can get all of the entities with one call instead of having to make one HTTP request for each one to, say, only get the title of the collection or catalog.
@@ -0,0 +1,156 @@ | |||
# STAC API - Children |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are catalogs returned by this endpoint allowed to implement item search (i.e. can they have a /search
endpoint)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The landing page is just a catalog, and itself implements /search
, so I'm guessing this is allowed. And the /search
response would only return items that are contained by the catalog (potentially across many collections).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, any (sub) Catalog can implement a search endpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for bringing this up. I thought I'd explicitly mentioned this, but I can't find it, so I'll make sure to add something.
The top level README (https://github.com/philvarner/stac-api-spec/tree/catalogs-and-browseable#in-this-repository) should be updated to link to the |
Some comments talking through this PR at with Matthias Matt and Chris:
Matt suggested scheduling a working session to clarify these points and others, and get this PR over the finish line - he will follow up |
I was confused by this in the call and did not pick this up as being agreed on yet. What's the background on this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite done with full review, but want to get some of my comments in as not sure when I'll get back to it.
core/README.md
Outdated
that indicates to clients that this is a STAC API and how to access conformance classes, including this | ||
one. The relevant conformance URI's are listed in each part of the | ||
API specification. If a conformance URI is listed then the service must implement all of the required capabilities. | ||
Whenever a static STAC catalog is served over HTTP, it is a defacto hypermedia-driven web API. Even without implementing any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably should define what a 'static STAC catalog' is? Maybe it's just rephrasing this, but as it reads it seems to assume some knowledge of a static STAC. Could also link to the section in stac-spec on static catalogs.
core/README.md
Outdated
Whenever a static STAC catalog is served over HTTP, it is a defacto hypermedia-driven web API. Even without implementing any | ||
STAC API conformance classes, the entire catalog can be traversed from the root via `child` and `item` link relations. Support for | ||
this "browse" mode of interaction is complementary to the dynamic search capabilities defined by other STAC API conformance classes. | ||
Conversely, many STAC API implementations do not support browse, even though the root is a Catalog object, because they do not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line and the next seems to be talking about existing STAC API implementations? I think it'd be good to rephrase it more abstractly for the spec. Describe the use case when API's do not support browse, and perhaps in parenthesis explain how many 1.0-beta and early catalogs didn't have the appropriate link relations to traverse. Like the spec should read as the spec, without too much dialog on the existing state of things.
core/README.md
Outdated
Conversely, many STAC API implementations do not support browse, even though the root is a Catalog object, because they do not | ||
have the appropriate `child` and `item` link relations to traverse over the objects in the catalog. | ||
Providing users with these two different, complementary ways of navigating the catalog allows them to interrogate the data in whichever | ||
way best meets their needs. Supporting these also opens up a catalog to both |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe explain the use cases of both of these ways? One for crawling / browsing, one for getting the endpoints to search against.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(could also be a link to a spot that discusses it more)
1. Catalog -> Catalog (product) -> Catalog (date) -> Catalog (path) -> Catalog (row) | ||
2. Catalog -> Catalog (product) -> Catalog (path) -> Catalog (row) -> Catalog (date) | ||
|
||
There are many options for how to structure these catalog graphs, so it will take some analysis work to figure out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be good to have some sort of 'best practices' linked to where options for this are discussed more. It also might be worth a bit of guidance here, just like thinking about it from a users perspective - how people would like to browse to data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I filed an issue to come back to these #243
I think this is an improvement over what we had before, but recognize there's still a better way to describe these that I can't quite articulate right now
core/README.md
Outdated
- child -> /catalogs/sentinel_2_l2a | ||
|
||
Since the catalogs structure is a directed acyclic graph which allows | ||
you to provide numerous different Catalog and Collection graphs reach leaf Items. For example, for a Landsat 8 data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be good to provide some guidance on the final 'item' link from different views. Like should they all link to the exact same item url? Or have different item urls that all have canonical hrefs to the same one? And should the canonical one be in the browse hierarchy, or in the collection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
I see this is in direct conflict with this declaration of Catalogs as a browsable unit that contains Items. When we were discussing it, I was less thinking about the "browseable" part of APIs, which to me is less interesting. The children extension is interesting for my use case in that it allows the ability to create sub-STAC APIs to organize Collections into a multi-level hierarchy, which solves an issue when you have many Collections in an API, some of them strongly related. If we consider Collections as the "searchable" container vs child Catalogs being "sub-APIs", while a Catalog that is a child of a Landing Page root of a STAC API could have Items, it is confusing to have Items in Catalogs, which can't be searched, and confusing for Collections to contain anything but Items. This is true if we only want to use Catalogs in an API them as an organizational mechanism for Collections. Collections that contain Catalogs that organize the Items into things like date/path/row would allow for users to click through an organized set of items, enabling the "Browsable" part of all this work - I see that now. This compounds the complexity of using Catalogs for both organizing Collections (and existing as sub-APIs) and also organizing Items (which may or may not be a sub-API, though I'm not sure you'd want that many sub-APIs for groups of Items in the same Collection, though in order to have their own To think through something specific and imagine an API that might have both cases:
Perhaps someone can help clear that up for me by continuing that specific example? I was trying to get to the point where the MOD14A2 Collection has subcatalogs that organize the Items into browsable categories, let's say year month day. Then thinking through how that would translate into a front-end experience like STAC Browser, i.e. what endpoints the front end would call when. |
I extracted a bunch of the typo and wordsmithing changes into https://github.com/radiantearth/stac-api-spec/pull/234/files |
One other issue to consider is that OGC uses the term "crawlable", which I think is synonymous with our use of "browseable", so we should consider adopting that term instead. |
Thanks @geospatial-jeff -- I noticed the same and added it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! I added a few commitable suggestions, but feel free to tweak them. And one or two other suggestions that are not 'must have', so I'm approving this.
I was wondering why you ended up with 'browseable', as I thought you were leaning towards 'crawlable' like OGC. I'm happy with it as is, just curious.
browseable/README.md
Outdated
This JSON is what would be expected from an API that implements `STAC API - Browseable`. | ||
|
||
This particular catalog provides both the ability to browse down to child Catalog objects through its | ||
`child` links, which then will eventually reach Items through `item` link relations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The example below doesn't seem particularly meaningful. Could be good to try to illustrate the point more, since usually the example helps show that. Perhaps a little diagram, that shows a non-browsable catalog vs a browsable one? Or perhaps just explain after the example that a catalog not implementing browsable would not have the 'child' links, but it would have them in the 'data' rel link. I think it'd be good to just provide more context, and to help make it clear to existing implementations what they need to do.
This looks great, approved, I just left a comment regarding the use of the word "shall". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some comments related to using catalogs/
instead of collections/
for hosting sub-catalogs for Item groupings for the Hierarchy recommendations. However I don't think these comments are blocking.
💯 well done!
|
||
| Endpoint | Returns | Description | | ||
| --------------------- | ---------------------------------------------- | -------------------- | | ||
| `/catalogs/{catalogId}` | [Catalog](../stac-spec/catalog-spec/README.md) | child Catalog object | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My previous question on the example using /catalogs/landsats-8-c1
is answered here. I see that it's a recommendation to avoid conflicts, though I think implementations could work around that and it would provide a cleaner set of URLs to avoid both a Catalog and Collection with the same name. Actually I was surprised to see that Catalog.id and Collection.id don't at least recommend there isn't a conflict inside the same root, as having duplicate IDs for Collections and Catalogs might get confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's a pretty reasonable recommendation to make. My only hesitancy in making it is that I feel like Catalog and Collections (as we currently define them) are not related in a way in which there would ever be confusion between them. If we'd defined a Collection as-a Catalog (maybe with the additional restriction that only one Collection can exist in a hierarchy?), then I could see a good case for not duplicating ids.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work @philvarner!
Co-authored-by: Chris Holmes <[email protected]>
Co-authored-by: Rob Emanuele <[email protected]>
Co-authored-by: Chris Holmes <[email protected]>
Co-authored-by: Chris Holmes <[email protected]>
My understanding was that the semantics of this may be slightly different than whatever crawlable ends up meaning in OGC, and that crawlable might not even be the term they end up using. Given that uncertainty, if we go with our own terminology, we can easily align it to theirs when then work is finalized, whereas it would be very confusing if we had something that was named the same but different. |
Related Issue(s): #17 #159 #138 #137 #230
Proposed Changes:
PR Checklist:
stac-spec
directory (these are included as a subtree and should be updated directly in radiantearth/stac-spec)