Skip to content

Latest commit

 

History

History
1054 lines (857 loc) · 46.3 KB

0075-object-param-and-result-types.md

File metadata and controls

1054 lines (857 loc) · 46.3 KB
status title creation-date last-updated authors
implementable
Object/Dictionary param and result types
2021-07-14
2022-04-08
@bobcatfish
@skaegi

TEP-0075: Object/Dictionary param and result types

Summary

Recommendation: read TEP-0076 (array support in results and indexing syntax) before this TEP, as this TEP builds on that one.

This TEP proposes adding object (aka dictionary) types to Tekton Task and Pipeline results and params, as well as adopting a small subset of JSONPath syntax (precedent in the array expansion syntax we are using) for accessing values in these objects.

This proposal is dependent on TEP-0076 (array support in results and indexing syntax) in that this TEP should follow the precedents set there and also in that if we decide not to support array results, we probably won't support object results either.

This proposal also includes adding support for limited use of json object schema, to express the expected structure of the object.

Dictionary vs. object: The intended feature was supporting "dictionaries", but JSON Schema calls these "objects" so this proposal tries to use "objects" as the term for this type.

Motivation

Tasks declare workspaces, params, and results, and these can be linked in a Pipeline, but external tools looking at these Pipelines cannot reliably tell when images are being built, or git repos are being used. Current ways around this are:

This proposal takes the param name type hinting a step further by introducing object types for results: allowing Tasks to define structured results. These structured results can be used to define interfaces, for example the values that are expected to be produced by Tasks which do git clone. This is possible with just string results, but without a grouping mechanisms, all results and params are declared at the same level and it is not obvious which results are complying with an interface and which are specific to the task.

Goals

  • Tidy up Task interfaces by allowing Tasks to group related parameters (similar to the long parameter list or too many parameters "code smell")
  • Enabled defining well known structured interfaces for Tasks (typing), for example, defining the values that a Task should produce when it builds an image, so that other tools can interface with them (e.g. Tekton Chains type hinting)
  • Take a step in the direction of allowing Tasks to have even more control over their params and results (see pipelines#1393) (e.g. one day providing built in validation for params as part of their declaration)

Non-Goals

  • Adding complete JSONPath syntax support
  • Adding support for nesting, e.g. object params where the values are themselves objects or arrays (no reason we can't add that later but trying to keep it simple for now)
  • Adding complete JSON schema syntax support
  • Supporting use of the entire object in a Task or Pipeline field, i.e. as the values for a field that requires a object, the way we do for arrays. Keeping this out of scope because we would need to explictly decide what fields we want to support this for.

Use Cases

  1. Grouping related params and results such that users and tools can make inferences about the tasks. For example allowing Tekton Chains and other tools to be able to understand what types of artifacts a Pipeline is operating on.

    Tekton could also define known interfaces, to make it easier to build these tools. For example:

    • Images - Information such as the URL and digest
    • Wheels and other package formats (see below)
    • Git metadata - e.g. the state of a git repo after a clone, pull, checkout, etc.
    • Git repo configuration - e.g. the info you need to connect to a git repo including url, proxy configuration, etc.

    For example the upload-pypi Task defines 4 results which are meant to express the attributes of the built wheel:

      results:
      - name: sdist_sha
        description: sha256 (and filename) of the sdist package
      - name: bdist_sha
        description: sha256 (and filename) of the bdist package
      - name: package_name
        description: name of the uploaded package
      - name: package_version
        description: version of the uploaded package

    Some interesting things are already happening in this example: sdist_sha contains both the filename and the sha - an attempt to group related information.

    With object results, we could define the above with a bit more structure like this:

      results:
        - name: sdist
          description: |
            The source distribution
             * sha: The sha256 of the contents
             * path: Path to the resulting tar.gz
          type: object
          properties:
            sha:
             type: string
            path:
             type: string
        - name: bdist
          description: |
            The built distribution
             * sha: The sha256 of the contents
             * path: Path to the resulting .whl file
          type: object
          properties:
            sha:
             type: string
            path:
             type: string
        - name: package
          description: |
            Details about the created package
             * name: The name of the package
             * version: The version of the package
          type: object
          properties:
            name:
             type: string
            version:
             type: string

    Eventually when we have nested support, we could define a wheel interface which contains all of the above.

  2. Grouping related params to create simpler interfaces, and allowing similar Tasks to easily define pieces of their interface. For example the git-clone task has 15 parameters , and the git-rebase task has 10 parameters . Some observations:

    • Each has potential groupings which stand out, for example:
      • The git-rebase task could group the PULL and PUSH remote params; each object would need the same params, which could become an interface for "git remotes"
      • Potential groupings for the git-clone task stand out as well, for example the proxy configuration
      • On that note, since the git-rebase task is using git and accessing remote repos, it probably needs the same proxy configuration, so what they probably both need is some kind of interface for what values to provide when accessing a remote git repo Other examples include okra-deploy which is using param name prefixes to group related params (e.g. ssh-, okra-vm, okra-token).

Requirements

  • Must be possible to programmatically determine the structure of the object
  • Must be possible for a result object to be empty
  • Must be possible to use the object results of one Task in a Pipeline as the param of another Task in the pipeline which has a object param when the interface matches (more detail in matching interfaces and required vs optional keys)
  • Must be possible to use one specific value in an object result of one Task in a Pipeline as the param of another Task in the pipeline which has a string param
    • If there is no value for the specified key, the Pipeline execution should fail (or we may be able to use TEP-0048 to specify a default value to use in that case instead)

Proposal

  • We would add support for object types for results and params, in addition to the existing string and array support
    • Initially we would only support string values, eventually we can expand this to all values (string, array, object) (note that this is the case for arrays as well, we don't yet support arrays of arrays)
    • Only string key types would be supported (which is the only key type supported by json)
  • We would use json object schema syntax (as previously suggested - and see also why JSON Schema) to express the object structure
  • To support object results, we would support writing json results to the results file, as described in TEP-0076 Array results and indexing (see "why json" in that proposal)

This feature would be considered alpha and would be (optional)gated by the alpha flag.

Defaulting to string types for values

As an optimization, we'd support defaulting to string types for keys without the Task author needing to explicitly specify this. For example this more verbose specification:

 - name: sdist
   description: |
     The source distribution
      * sha: The sha256 of the contents
      * path: Path to the resulting tar.gz
   type: object
   properties:
     sha:
       type: string
     path:
       type: string

Would be equivalent to:

 - name: sdist
   description: |
     The source distribution
      * sha: The sha256 of the contents
      * path: Path to the resulting tar.gz
   type: object
   properties:
     sha: { } # type:string is implied
     path: { } # type:string is implied

This would be supported in recognition that:

a. Only string types would initially be supported b. Even when other types are supported, the most common usage of objects will likely use string values

Adding additional JSON Schema properties

This proposal suggests adding a new properties section to param and result definition. If we later support more json schema attribute such as additionalProperties and required, we'd also support them at the same level as the properties field here. ( See Alternative #1 adding a schema section .) (At that point we should also consider whether we want to adopt strict JSON schema syntax or if we want to support Open API schema instead; see why JSON Schema.)

Why JSON Schema?

Assuming we move forward with using JSON to specify results (see "why json" in TEP-0076 Array results and indexing), we'll need a syntax that allows us to define schemas for JSON.

Since JSON schema was created for exactly that purpose, it seems like a reasonable choice (see original suggestion). OpenAPI schema objects are an interesting alternative which builds on JSON schema and are already used by Kubernetes to publish schema valiation for CRDs . The subset of JSON Schema that we are proposing to adopt in this TEP is compatible with OpenAPI schema (Open API supports JSON Schema Keywords), so at the point when we start proposing more JSON Schema support we should consider if we want to support the Open API schema variations instead.

Alternative: CUE

The language CUE also provides a syntax for defining schemas. It allows using less text to express the same schema

Pros:

  • Less text required; much more succinct to express complex validation
  • CUE can be used to validate JSON so if we decided to use CUE for expressing schemas, Tasks could still output results in JSON (i.e. no additional complication for Task authors when writing steps and no need for additional tools to generate results in the right format)

Cons:

  • CUE is a superset of json; in order to express CUE within our existing yaml and json types, we'd need a way to encode CUE within those types
  • CUE is intended for more than just expressing schemas, it is intended also for code generation, and expressing configuration. (Instead of embedding it into Tekton types, maybe a Tekton integration that takes advantage of all CUE has to offer would be to use CUE to generate Tekton types?)
  • In the open API schema comparison the open api version can be read without needing to learn a specific syntax (e.g. the difference between max?: uint & <100 the verbose Open API version which specifies max is an integer with minimum 0 and exclusive maximum 100).
  • Our flexibility design standard recommends we prefer that when we need to pull in some other language, we use existing languages which are widely used and supported. Instead of being an early adopter of CUE we could wait until it is more popular and then consider using it (we can also delay this decision until we want to add more schema support).

Matching interfaces and extra keys

An object provided by one Task (as a result) will be considered to match the object expected by another Task (as a params) if the object in the result contains at least the required keys declared by the param in the consuming Task. The extra keys will be ignored, allowing us to take a duck typing approach and maximize interoperabiltiy between tasks.

For example imagine a Task which requires a param object that contains url and httpsProxy. Another task produces an object with those two keys in addition to other keys. It should be possible to pass the object with the additional keys directly to the Task which requires only url and httpsProxy without needing to modify the object in between.

If we expand our JSON Schema support in the future (and we have use cases to support it) we could allow Tasks to express if they would like to override this and not allow additional keys to be provided via JSON Schema's additionalProperties field (which defaults to allowing additional properties).

When the Task writes more keys than it declares in its results:

  • The TaskRun will succeed
  • The additional keys will not be included in the TaskRun status
  • The additional keys will not be available for variable replacement in a Pipeline; only the declared keys will be available

When a Task uses the result of a previous Task which declares more keys in addition to what the Task needs:

  • This will be allowed
  • The TaskRun will be created with only the keys that the Task needs; the rest will be ignored

Required vs optional keys

We have several options for how we define and handle required and optional keys:

  1. Initially do not support the required field and imply implicitly that all keys are required (i.e. make the assumption in the controller but do not reflect it in the types themselves).
    • Pro: By not supporting the required field at all (initially) there will be no ambiguity or confusion in the initial version of this behavior. This approach lets us postpone answering the question of how to effectively support required vs. optional fields. We've been going back and forth on this issue, and by punting the question for now, we can let people use the feature and gather feedback before deciding (e.g. maybe we won't need to support optional fields at all)
    • Pro: We are already suggesting infering some things such as that type: string is being used for empty properties, so we're already setting a precedent for infering some parts of the schema definition
    • Con: Anyone reading the JSON Schema in the Task would think the attributes are optional when they are not
  2. Embrace the optional by default behavior of JSON Schema; instead of validating the presences of "required" keys via jsonschema, we could do the validation based on the declared variable replacement, e.g. say a Task declares it needs an object called foo with an attribute called bar. At runtime if the Task is provided with an object foo that doesn't have an attribute called bar this will be okay unless the Task contains variable replacement that uses it ($(params.foo.bar)).
    • Pro: Allows us to use JSON Schema as is and gives us the behavior we want.
    • Pro: Optional by default behavior will be useful if we pursue the future option of supporting pre-defined schemas: optional by default will allow Tasks to refer to schemas for their params, and additive changes can be made to those schemas without requiring all Tasks using them to be updated.
    • Con: weird because effectively we're implying required (assuming the fields are used, and if they aren't used, why are they there) even though the JSON Schema declaration says the fields are optional.
  3. Infer that all keys are required and add the required field via a mutating admission controller for all properties in the dictionary (unless a required field is already added).
    • Con: it would be strange to mutate instances that will be reused; in fact mutations like this would interfere with efforts we might explore in the future to provides hashes of Tasks and Pipelines so folks can ensure the are running what they think they are running.
  4. Make our version of JSON Schema deviate from the official version: default to required and instead introduce syntax for optional (in fact early versions of JSON Schema used optional instead of required)
  5. Create our own JSON Schema based syntax instead
  6. When using variable replacement with optional fields (i.e. fields that are not explicitly listed in the required stanza), if they are not provided at runtime, replace them with a default zero value (e.g. "" for string types)
    • TEP-0048 is exploring a similar feature, i.e. a way of providing defaults when values are not available when doing variable replacment. Suggest we let TEP-0048 progress and if we decide to provide zero values for optional fields we follow a similar approach. If we support this we may also need a syntax to allow authors to check if a value has been set or is a default value.

This proposal suggests we use (1):

  • Do not support the required field (at least initially)
  • Imply that all fields are reuquired (as if the required field was present and all object fields were listed.)

For example in the following param gitrepo the url and commitish fields will both be required; if at runtime one or both of them are not present in the object provided for the gitrepo parameter, execution will fail (in the same way as it would for a parameter that is entirely missing and has no default specified):

spec:
  params:
    - name: gitrepo
      type: object
      properties:
        url: {}
        commitish: {}

The behavior will be as if the above param specification included required: [url, commitish].

Notes/Caveats

See also TEP-0076 Array results and indexing for notes and caveats specific to supporting json results.

  • Since json only supports string types for keys (aka "names") we would only support string types for keys
  • What if a Pipeline tries to use a object key that doesn't exist in a param or result?
    • Some of this could be caught at creation time, i.e. pipeline params would declare the keys each object param will contain, and we could validate at creation time that those keys exist
    • For invalid uses that can only be caught at runtime (e.g. after fetching the task spec), the PipelineRun would fail

Risks and Mitigations

See TEP-0076 Array results and indexing for risks and mitigations specific to expanding ArrayOrString to support more types, including size limitations caused by our implementation of results (pipelines#4012).

User Experience

  • Using the JSON Schema syntax to describe the structure of the object is a bit verbose but looking at our Alternatives this seems like the best option assuming we want to support more complex types in the future.

Design Details

Declaring object results and params

We would use JSON object schema syntax to declare the structure of the object.

Declaring defaults for parameters that are of type object would follow the pattern we have already established with array param defaults which is to declare the value in the expected format in yaml (which is treated as json by the k8s APIs).

If a value is provided for the param, the default will not be used (i.e. there will be no behavior to merge the default with the provided value, if for example the provided value specified some fields but not others).

Params example with default value:

  params:
    - name: pull_remote
      description: JSON Schema has no "description" fields, so we'd have to include documentation about the structure in this field
      type: object
      properties:
        url: {
          type: string
        }
        path: {
          type: string
        }
      default:
        url: https://github.com/somerepo
        path: ./my/directory/

Results example:

  results:
    - name: sdist
      description: JSON Schema has no "description" fields, so we'd have to include documentation about the structure in this field
      type: object
      properties:
        sha: {
          type: string
        }
        path: {
          type: string
        }

Emitting object results

As described in TEP-0076 Array results and indexing, we add support for writing json content to /tekton/results/resultName, supporting strings, objects and arrays of strings.

For example, say we want to write to emit a built image's url and digest in a dictionary called image:

  1. Write the following content to $(results.image.path):

    {"url": "gcr.io/somerepo/someimage", "digest": "a61ed0bca213081b64be94c5e1b402ea58bc549f457c2682a86704dd55231e09"}
  2. This would be written to the pod termination message as escaped json, for example (with a string example included as well):

    message: '[{"key":"image","value":"{\"url\": \"gcr.io\/somerepo\/someimage\", \"digest\": \"a61ed0bca213081b64be94c5e1b402ea58bc549f457c2682a86704dd55231e09\"}","type":"TaskRunResult"},{"key":"someString","value":"aStringValue","type":"TaskRunResult"}]'
    
  3. We would use the same ArrayOrString type (expanded to support dictionaries, and in TEP-0074, arrays) for task results, e.g. for the above example, the TaskRun would contain:

      taskResults:
      - name: someString
        value: aStringValue
      - name: image
        value:
          url: gcr.io/somerepo/someimage
          digest: a61ed0bca213081b64be94c5e1b402ea58bc549f457c2682a86704dd55231e09

Using objects in variable replacement

We add support for the JSONPath subscript operator for accessing "child" members of the object by name in variable replacement.

For example:

apiVersion: tekton.dev/v1beta1
kind: Pipeline
...
spec:
  params:
    - name: gitrepo
      type: object
      properties:
        url: {}
        commitish: {}
...
tasks:
  - name: notify-slack-before
    params:
      - name: message
        value: "about to clone $(params.gitrepo.url) at $(params.gitrepo.commitish)"
  - name: clone-git
    runAfter: [ 'notify-slack' ]
    params:
      - name: gitrepo
        value: $(params.gitrepo[*])
    results:
      - name: cloned-gitrepo
        type: object
        properties:
          url: {}
          commitish: {}
    taskSpec:
      params:
        - name: gitrepo
          type: object
          properties:
            url: {}
            commitish: {}
      steps:
        - name: do-the-clone
          image: some-git-image
          args:
            - "-url=$(params.gitrepo.url)"
  - name: notify-slack-after
    params:
      - name: message
        value: "cloned $(tasks.cloned-git.results.cloned-gitrepo.url) at $(tasks.cloned-git.results.cloned-gitrepo.commitish)"

This proposal does not include adding support for any additional syntax (though it could be added in the future!).

Variable replacement with object params

When providing values for objects Task and Pipeline authors can provide an entire object as a value only when the value is also an object (see matching interface and extra keys) by using the same [*] syntax used to provide entire arrays. (Note that supporting this replacement for arbitrary Task and Pipeline fields is currently out of scope; this feature would apply to binding results to params only.

In the above example:

params:
  - name: gitrepo
    value: $(params.gitrepo[*])

When providing values for strings, Task and Pipeline authors can access individual attributes of an object param; they cannot access the object as whole (we could add support for this later).

In the above example, within a Pipeline Task:

value: "about to clone $(params.gitrepo.url) at $(params.gitrepo.commitish)"

In the above example, within a Task spec:

- "-url=$(params.gitrepo.url)"

When populating a string field, It would be invalid (at least initially) to attempt to do variable replacement on the entire gitrepo object ($(params.gitrepo)). If we choose to support this later we could potentially replace the value with the json representation of the object directly.

Variable replacement with object results

Within a Task

Within the context of a Task, Task authors would continue to have access to the path attribute of a result (identified by name) and would not have any additional variable replacement access to the keys.

In the above example:

cat /place/with/actual/cloned/gitrepo.json > $(results.cloned-gitrepo.path)

$(results.cloned-gitrepo.path) refers to the path at which to write the clone-gitrepo object as json (it does not refer to an attribute of cloned-gitrepo called path, and even if cloned-gitrepo had a path attribute, there would be no collision because variable replacement for individual attributes of the result is not supported within the Task.

(See emitting object reesults for more details.)

Within a Pipeline Task

Within the context of a Pipeline Task, the Pipeline author can refer to individual attributes of the object results of other Tasks.

In the above example:

value: "cloned $(tasks.cloned-git.results.cloned-gitrepo.url) at $(tasks.cloned-git.results.cloned-gitrepo.commitish)"

As with params only access to individual keys will be allowed initially.

Conflicts

1. Collisions with builtin variable replacement

Tekton provides builtin variable replacements which may one day conflict with the keys in a dictionary. Existing variable replacements will not conflict (the closest candidate is path, as in $(results.foo.path) but since this is only used in the context of writing Tasks results, there will be no conflict - see variable replacement with object params for more detail).

But still, features we add in the future may conflict, for example if we added an optional params feature, we might provide a variable replacement like $params.foo.bound. What if the foo param was an object with a key called bound ?

For example:

results:
  - name: foo
    type: object
    properties:
      bound: {
        type: string
      }

TEP-0080 added support for using [] syntax instead of . to support variables which are have names that include .. Resolving the ambiguity here would only require using $params.foo.bound to refer to the built in variable replacement, and $params.foo["bound"] to unambiguously refer to the key bound within foo. Other options include:

  1. Do not allow keys to be defined which conflict with variable replacement
    • Con: this means that a Task which is perfectly fine today might break tomorrow if we introduce new variable replacement - in the above example, foo.bound is perfectly valid today and would be allowed but would suddenly stop working after the bound feature is introduced
  2. Let the defined object override the variable replacement; i.e. in this case the built in replacement will not be available

2. Collisions when other string/array parameter names contain dots .

Problem: It has been supported to have dots. in parameter names. But things get tricky when both a string/array parameter named foo.bar and an object parameter named foo with a key called bar are declared in the same taskSpec.

apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  generateName: object-param-test-
spec:
  taskSpec:
    params:
      - name: foo
        properties:
          key1: {}
          bar: {}
        default:
          key1: "val1"
          bar: "val2"
      - name: foo.bar
        default: "tricky"
    results:
      - name: echo-output
    steps:
      - name: echo-params
        image: bash
        script: |
          set -e
          echo $(params.foo.bar) | tee $(results.echo-output.path)

Design: $(params.foo.bar) would be treated as a reference to the key bar of the object parameter foo. Similar design applies to result names i.e. $(tasks.task1.results.foo.bar) refers to the key bar of the object result foo from task1.

Reason: Without this TEP-0075 implemented, $(params.foo.bar) would be treated by the validation webhook as an invalid reference for the parameter named foo.bar since parameter names containing dots can only be referenced using bracket notation i.e. $(params["foo.bar"]) because TEP-0080 added support for using [] syntax instead of dots . to support variables which are have names that include dots . (This doc also mentioned this.). Therefore, after this TEP-0075 is implemented, $(params.foo.bar) will be treated naturally as a reference to a key of an object parameter without any conflicts.

3. Problem when the object parameter name itself or its key name contains dots .

Problem: If a parameter of object type is named foo.bar and has two keys mykey and my.dot.key like the example below, it would become confusing to users how to reference the whole object parameter itself, and how to reference its individual keys.

apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  generateName: object-param-test-
spec:
  taskSpec:
    params:
      - name: foo.bar
        properties:
          mykey: {}
          my.dot.key: {}
      - name: foo
        properties:
          bar: {}
          anotherkey: {}

Design: For parameters of object type, using dots in its names or its key names is NOT allowed (at least in the initial implementation). Similarly, object result and key names shouldn't contain dots.

Reason: The motivation of TEP-0080 supporting dots in variable names is using domain-scoped names as a way for the domain owner to "own" the definition of what those parameters and results are for and how they will be used. For parameters of object type, the parameter name and its individual key names are already domain-scoped, which means there seems no need to still use dots in object parameter names or its key names.

Alternatives: Allow users to use dots in object parameter names and its key names. That means names containing dots must be referenced in bracket []. Possible ways to reference individual keys

  1. using brackets only: $(params["my.object"]["mykey"])
  2. mixing bracket and dot: $(params["my.object"].mykey)

NOTE: If the key name also contain dots i.e. my.key, only #1 is allowed i.e. $(params["my.object"]["my.key"]).

Test Plan

In addition to unit tests the implementation would include:

  • At least one reconciler level integration test which includes an object param and an object result
  • A tested example of a Task, used in a Pipeline, which:
    • Declares an object param
    • Emits an object result
    • In the Pipeline, another Task consumes a specific value from the object

Design Evaluation

See also the TEP-0076 Array results and indexing design evaluation.

  • Reusability:
    • Pro: PipelineResources provided a way to make it clear what artifacts a Pipeline/Task is acting on, but using them made a Task less resuable
      • this proposal introduces an alternative that does not involve making a Task less reusable (especially once coupled with TEP-0044)
  • Simplicity
  • Flexibility
    • Pro: Improves support for structured interfaces which tools can rely on
    • Con: Although there is a precedent for including JSONPath syntax, this is a step toward including more hard coded expression syntax in the Pipelines API (without the ability to choose other language options)
    • Con: We're also introducing and committing to JSON Schema! Seems worth it for what we get tho
  • Conformance
    • Supporting this syntax would be part of the conformance surface; the JSON Schema syntax is a bit verbose for simple cases (but paves hte way for the more complex cases we can support later, including letting Tasks express how to validate their own parameters)

Drawbacks

See also TEP-0076 Array results and indexing for more drawbacks.

  • In the current form, this sets a precedent for pulling in more schema support (why JSON Schema)
  • This improves the ability to make assertions about Task interfaces but doesn't (explicitly) include workspaces at all. For example, say you had a “git” param for a pipeline containing a commitish and a url. This is likely going to be used to clone from git into a workspace, and there will be some guarantees that can be made about the data that ends up in the workspace (e.g. the presence of a .git directory). But there is no way of expressing the link between the workspace and the params.

Alternatives

Alternative #1: Introduce a schema section specifically for the type schema

Instead of embedding the details of the type in the param/result definition, we could introduce a new schema section.

For example, instead of the proposed syntax where we allow a new value object for type and we add properties:

params:
  - name: someURL
    type: string
  - name: flags
    type: array
  - name: sdist
    type: object
    properties:
      sha: { }
      path: { }

We could add an explicit new schema section:

params:
  - name: someURL
    schema:
      type: string
  - name: flags
    schema:
      type: array
  - name: sdist
    schema:
      type: object
      properties:
        sha: { }
        path: { }

Pros

  • It's very clear where the schema is defined; specifically which part of the Task/Pipeline spec is considered JSON Schema. In the current proposal the JSON Schema is mixed in with our own fields (specifically name).

Cons

  • For simple types this is more verbose - once we allow more complex objects the extra level of indentation won't be as noticeable, but when defining simple types it feels unnecessary.

  • If we go this route, we'll have to grapple with the question of what to do with the existing type field. For example, if we add the syntax suggested here to the existing type syntax, their uses would look like:

    params:
    - name: someURL
      type: string
    - name: flags
      type: array
    - name: sdist
      schema:
        type: object
        properties:
          sha: {
            type: string
          }
          path: {
            type: string
          }

    If we deprecated the type syntax and only used JSON Schema, the above would become very verbose:

    params:
    - name: someURL
      schema:
        type: string
    - name: flags
      schema:
        type: array
    - name: sdist
      schema:
        type: object
        properties:
          sha: {
            type: string
          }
          path: {
            type: string
          }

Alternative #2: Create a wrapper for JSON Schema

We could also go one step further and create a syntax which we translate into JSON Schema inside the controller, but is customized to meet our needs, specifically:

For example, instead of:

params:
  - name: someURL
    description: ""
    schema:
      type: string
  - name: flags
    description: ""
    schema:
      type: array
  - name: sdist
    description: ""
    schema:
      type: object
      properties:
        sha: {
          type: string
        }
        path: {
          type: string
        }

We could have:

params:
  - name: someURL
    description: ""
    type: string
  - name: flags
    description: ""
    type: array
  - name: sdist
    description: ""
    type: dict # use dict intead of object
    keys: # use keys instead of properties
      sha: {
        description: "" # add description section for each key
      }
      path: { }
  optional: [ ] # add "optional" and have required by default

But the big differences are cosmetic (dict vs object, keys vs properties), and with the current proposal we could reasonably add our own description field into the existing property dictionaries (and description is already supported by Open API schema!.

Alternative #3: Create our own syntax just for dictionaries

We could make our own syntax, specific to dictionaries, however if we later add more JSON Schema support we'd need to revisit this. For example:

 results:
   - name: sdist
     type: dict
     keys: [ 'sha', 'path' ]

This is clean and clear when we're talking about a dictionary of just strings, but what if we want to allow nesting, e.g. array or dict values?

More alternatives

  1. Add support for nested types right away
  2. Add complete JSON Schema support
  3. Use something other than JSON Schema (or our own syntax proposed in Alternative #1) for expressing the object/dictionary structure It feels at this point like it's better to adopt a syntax that already handles these cases. (See why JSON Schema)
  4. Reduce the scope, e.g.:
    • Don't add support for grabbing individual items in a dictionary
      • We need this eventually for the future to be useful
    • Don't let dictionaries declare their keys (e.g. avoid bringing in JSON Schema)
      • Without a programmatic way to know what keys a dictionary needs/provides, we actually take a step backward in clarity, i.e. a list of strings has more information than a dictionary where you don't know the keys

Upgrade & Migration Strategy

As mentioned in TEP-0076 Array results and indexing this update will be completely backwards compatible.

Potential future work

Support for "no schema"

To make authoring time easier, we could support users declaring that a param or result is of type object, but not require them to specify anything further, including the keys they expect the object to support. For example inside of a Task you could do something like this:

spec:
  params:
    - name: grab-bag
      type: object
  steps:
    - name: grab-specific-key-from-bag
      image: ubuntu
      script: |
        echo $(params.grab-bag.magic-key)"
    ...
  results:
    - name: buncha-stuff
      type: object

The use of $(params.grab-bag.magic-key) would imply that the grab-bag param is expected to have a key called magic-key but this would not need to be specified in the schema.

The result buncha-stuff could include any keys (no keys would also be valid); no schema would be enforced at runtime.

Increased schema support

Once Task and Pipeline authors are able to define object schemas for Tasks and Params, it would be very useful to:

  1. Allow definition reuse instead of having to copy and paste them around
  2. If we ship a known set of these with the Tekton Pipelines controller, they could define interfaces that tools such as Tekton Chains could rely on and build around.

We could add a way to for users to define these, for example a new CRD such as:

apiversion: tekton.dev/v1
kind: ParamSchema
metadata:
  name: GitRepo
spec:
  type: object # see question below
  properties:
    url: {
      type: string # for now, all values are strings, so we could imply it
    }
    path: { } # example of implying type: string

(Or support something like the above via a ConfigMap.)

Which could be used in Tasks and Pipelines:

params:
  - name: pull_remote
    schemaRef: GitRepo

We could also pursue supporting JSON Schema refs.

(Thanks to @wlynch for the suggestion and above example!)

Implementation Pull request(s)

TBD.

References