Proposal: Bundled streams w/ self-identification #3875

Closed
wants to merge 3 commits into from

@handrews handrews commented Jun 3, 2024

[NOTE: Please prioritize the open 3.0.4 and 3.1.1 PRs]

This adds a proposal for bundling both OAS (including full 3.1 support) and Arazzo using YAML native streams or any of several JSON document stream formats. It includes self-identification of documents to ensure that all references within the stream can be resolved without looking outside of the stream (except for External Documents).
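A minimal sketch of the idea, assuming a JSON Lines stream (one JSON document per line) as the stream format and a top-level `self` field in each document. The example URIs, the document contents, and the helper names `load_stream`, `iter_refs`, and `unresolved_refs` are illustrative assumptions, not part of the proposal text. It indexes each document by its `self` URI and reports any reference whose target document is missing from the stream.

```python
import json
from urllib.parse import urldefrag, urljoin

# Hypothetical two-document bundle in JSON Lines form.
# The "self" values and all URIs are illustrative assumptions.
BUNDLE = "\n".join([
    json.dumps({
        "openapi": "3.2.0",
        "self": "https://example.com/apis/openapi.json",
        "paths": {"/pets": {"get": {"responses": {"200": {
            "$ref": "shared/responses.json#/components/responses/PetList"}}}}},
    }),
    json.dumps({
        "openapi": "3.2.0",
        "self": "https://example.com/apis/shared/responses.json",
        "components": {"responses": {"PetList": {"description": "A list of pets"}}},
    }),
])


def load_stream(text):
    """Parse a JSON Lines stream and index each document by its `self` URI."""
    docs = {}
    for line in text.splitlines():
        if line.strip():
            doc = json.loads(line)
            docs[doc["self"]] = doc
    return docs


def iter_refs(node):
    """Yield every "$ref" string found anywhere in a parsed document."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


def unresolved_refs(docs):
    """Return (document URI, reference) pairs whose target document is not in the stream."""
    missing = []
    for base, doc in docs.items():
        for ref in iter_refs(doc):
            # Resolve against the document's own URI, then drop the fragment.
            target, _fragment = urldefrag(urljoin(base, ref))
            if target not in docs:
                missing.append((base, ref))
    return missing


docs = load_stream(BUNDLE)
print(unresolved_refs(docs))
```

An empty result means every reference can be resolved without leaving the stream. The sketch ignores External Documents and treats every `$ref` the same way.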

Also paging @frankkilcommins for the Arazzo aspect (GitHub won't let me make you a reviewer, for who knows what reason).

@handrews handrews added this to the v3.2.0 milestone Jun 3, 2024
@handrews handrews requested review from a team as code owners June 3, 2024 23:05
Contributor

@ralfhandl ralfhandl left a comment

+1, minor nits


## Introduction

Poor support for external references has fractured the OAS tooling landscape, with many tools requiring multi-document OpenAPI Descriptions (OADs) to be combined into a single document. Arazzo requires resolving sources and runtime expressions from multiple OADs, each of which might consist of multiple documents. There is no way to combine all of the OAD and Arazzo documents involved into a single document, but an alternate solution would be a bundle similar to [what we recommend on our blog for bundling Schema Objects](https://www.openapis.org/blog/2021/08/23/json-schema-bundling-finally-formalised). This would require a similar mechanism to JSON Schema's `$id` for OAS and Arazzo documents and the components within them. It would also provide an alternative to current multi-to-single-document OAD tools, most (possibly all) of which do not fully support OAS 3.1, and allow for _lossless bundling of identifiable components_, which is increasingly needed by industry standards groups publishing API "building blocks" for use across many APIs by many different providers.
Contributor

> There is no way to combine all of the OAD and Arazzo documents involved into a single document

I'd like to understand why this is the case. What about just OAD documents?

Contributor

Objects in descriptions are scoped to the document in which they originate. We don't really have semantics for merging multiple documents. In some cases, it would involve renaming unique identifiers like map keys. Tools that have done this create an inconsistent experience, and to reverse the process requires a source map of some kind. Maintaining document identity in a multi-document scenario obviates the need for that.
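To illustrate the renaming problem, here is a small sketch under assumed inputs: two hypothetical documents that each define a schema named `Error` under `components.schemas`. The documents and the helper `colliding_schema_names` are made up for this example. Merging them into one document would force one of the two keys to be renamed and every reference to it to be rewritten.

```python
# Two hypothetical documents that both define components.schemas.Error.
DOC_A = {"components": {"schemas": {"Error": {"type": "object"}, "Pet": {"type": "object"}}}}
DOC_B = {"components": {"schemas": {"Error": {"type": "string"}}}}


def colliding_schema_names(*docs):
    """Return schema names defined in more than one document, in sorted order."""
    seen = set()
    collisions = set()
    for doc in docs:
        for name in doc.get("components", {}).get("schemas", {}):
            if name in seen:
                collisions.add(name)
            seen.add(name)
    return sorted(collisions)


print(colliding_schema_names(DOC_A, DOC_B))
```

Keeping each document intact and identified by its own URI avoids the collision, because the name `Error` stays scoped to the document that defines it.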

Member Author

@mikekistler basically what @kevinswiber said, but I'll add two things:

I chatted with someone on some Slack a while ago who was having a lot of problems because the "merge to a single document" tool that he needed in order to get something (AWS gateway, maybe?) to accept the OAD was messing things up with what it thought were "safe" transformations. As Kevin pointed out, there are a lot of choices that tools doing this have to make, and those can be surprising and sometimes breaking. When you start with a single document, split it, and then use the same toolchain to re-combine the pieces, that works quite well because the split and combine tools are the same and make the same assumptions. But mixing toolchains doesn't work well here.

Another way to look at it is to look at how JSON Schema bundling works. It relies heavily on "$id" to preserve not just the referencing behavior but the literal reference values. This way there isn't any "rewriting" of the documents, and no need to merge them. They are simply placed in the standard location ("$defs") and used in a way that the name under "$defs" is irrelevant.
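As a rough sketch of that bundling pattern, assuming absolute `"$id"` values and a single level of `"$defs"` (real JSON Schema allows relative and nested `"$id"`s, which this does not handle): the reference value `"person"` is never rewritten, and the key under `"$defs"` plays no part in the lookup. The schema contents and the helper `find_embedded` are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical bundled schema: the referenced schema is embedded under "$defs"
# with its own "$id", and the original "$ref" value is left untouched.
BUNDLED = {
    "$id": "https://example.com/schemas/pet",
    "type": "object",
    "properties": {"owner": {"$ref": "person"}},
    "$defs": {
        "any-name-works": {
            "$id": "https://example.com/schemas/person",
            "type": "object",
        }
    },
}


def find_embedded(bundle, target_uri):
    """Return the schema whose "$id" equals target_uri, ignoring "$defs" key names."""
    if bundle.get("$id") == target_uri:
        return bundle
    for schema in bundle.get("$defs", {}).values():
        if isinstance(schema, dict) and schema.get("$id") == target_uri:
            return schema
    return None


# Resolve the unchanged reference against the enclosing schema's "$id".
ref = BUNDLED["properties"]["owner"]["$ref"]
target = urljoin(BUNDLED["$id"], ref)
print(target)
print(find_embedded(BUNDLED, target))
```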

As briefly mentioned in the call this morning, if we want to support "$id" (or an equivalent named differently to reduce confusion with JSON Schema) in every Object type in the OAS and Arazzo, we could absolutely implement single-document bundling. But nested "$id"s make it much harder to determine the base URI, which is something that tooling vendors seem to struggle with (or are just indifferent to) as it is. Using a document stream minimizes the changes and implementation work involved.

I'm trying to get the maximum improvement for the minimum effort with this proposal.


### Arazzo not supported

All current tools depend on it being possible to structure OADs as single JSON or YAML documents. This is not possible with Arazzo, as it coordinates multiple OADs without being part of any of them. We do not yet know how much of a challenge this will be for Arazzo, but history suggests that the ecosystem will be healthier if a clear solution is endorsed early on.
Contributor

@kevinswiber kevinswiber Jul 16, 2024

Could we make composite keys using a combination of the identifier for the entry OAD and identifiers for dependency documents?

Member Author

@kevinswiber Arazzo identifying targets in OADs is not a problem, if that's what you mean. I'm just kind-of assuming that since multi-document support has been a challenge (or just not viewed as cost-effective) for our community, publishing a spec that depends on reading and using multiple documents seems a touch risky.


Field Name | Type | Description
---|:---|:---
self | `URI-reference` (without a fragment) | Sets the URI of this document, which also serves as its base URI in accordance with [RFC 3986 §5.1.1](https://www.rfc-editor.org/rfc/rfc3986#section-5.1.1); the value MUST NOT be the empty string and MUST NOT contain a fragment
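A minimal sketch of how a tool might apply this field, assuming the `self` value is already an absolute URI (a relative `self` would first need resolving against the retrieval URI, which is not shown). The helper names `check_self` and `resolve` and the example URIs are assumptions for illustration.

```python
from urllib.parse import urldefrag, urljoin


def check_self(value):
    """Apply the two constraints from the field description: non-empty, no fragment."""
    if value == "":
        raise ValueError("self MUST NOT be the empty string")
    if "#" in value:
        raise ValueError("self MUST NOT contain a fragment")
    return value


def resolve(self_uri, reference):
    """Resolve a reference using the document's `self` value as the base URI.

    Returns a (url, fragment) named tuple: the target document URI and the
    fragment identifying a location within it.
    """
    return urldefrag(urljoin(check_self(self_uri), reference))


result = resolve(
    "https://example.com/apis/openapi.json",
    "schemas/pet.json#/components/schemas/Pet",
)
print(result.url)
print(result.fragment)
```

The target document URI is what a bundle-aware tool would look up among the `self` values in the stream; the fragment is then applied inside that document.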
Contributor

For interleaving multiple OAD streams, we could also include an optional field to identify the bundle, perhaps using the entry document ID. Ideally, it would allow multiple bundles to prevent duplicating documents in certain scenarios, but I'm usually the odd one out for being cool with arrays.

Member Author

@kevinswiber I think this would be a good further direction, but I worry about making the initial thing too complicated. Really, the only thing you need to solve for having multiple OADs in a stream is how to identify each entry document. This proposal just says "the first document in the stream is the entry document", which means there can only be one clear OAD in the stream. But there's no reason you couldn't get a stream, and then select (by URI) each entry document you want to try to parse from the stream. As long as all needed documents are in the stream, it would work fine.
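A sketch of both selection rules, assuming a JSON Lines stream and documents trimmed to the fields used here. The URIs and the helpers `parse_stream` and `select_entry` are hypothetical. With no URI given, the first document is the entry document, as the proposal says; with a URI, any document in the stream can be picked as an entry point.

```python
import json

# Hypothetical stream holding two OAD entry documents and one shared document.
STREAM = "\n".join(json.dumps(doc) for doc in [
    {"openapi": "3.2.0", "self": "https://example.com/orders/openapi.json"},
    {"openapi": "3.2.0", "self": "https://example.com/billing/openapi.json"},
    {"openapi": "3.2.0", "self": "https://example.com/shared/components.json"},
])


def parse_stream(text):
    """Return the documents of a JSON Lines stream in stream order."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def select_entry(docs, uri=None):
    """Pick the entry document: the first one by default, or the one whose `self` is uri."""
    if not docs:
        raise ValueError("empty stream")
    if uri is None:
        return docs[0]
    for doc in docs:
        if doc.get("self") == uri:
            return doc
    raise LookupError(f"no document with self={uri!r} in the stream")


docs = parse_stream(STREAM)
print(select_entry(docs)["self"])
print(select_entry(docs, "https://example.com/billing/openapi.json")["self"])
```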

@handrews
Member Author

Per yesterday's TDC call, I am going to split this into two proposals, as the `self` field met with wide acceptance, while there are still questions and concerns about bundling. Ironically, these two were only together because I thought the bundling piece would be enthusiastically embraced, and would be the only way to sell folks on the `self` field 🙃

@handrews handrews closed this Jul 19, 2024
@handrews handrews deleted the self-id branch October 19, 2024 19:34
5 participants