This document describes how to create CI jobs for Openshift components using ci-operator and is intended for component developers who want to add tests to their CI process.
To begin setting up a CI jobs for a new repository, run make new-repo
.
After editing the files under this directory, make sure to run the generator to ensure that your changes are compliant to our conventions and pass the CI tests that will run when you submit your changes as a PR:
make jobs
Pre-submit tests on this repository will ensure that a run of the latest generator does not error on proposed configuration changes and also does not generate any new configuration.
Under this directory, we have three main directories:
config/$org/$repo/$org-$repo-$branch.yaml
Contains your ci-operator definition which describes how the images and tests in your repo works. These files are copied into config maps on the CI cluster and referenced by Prow jobs. If you are building branches within a fork of a repo in another organization, $repo should point to the fork that holds the branch (for example github.com/openshift/kubernetes-metrics-server instead of k8s.io/metrics-server).jobs/$org/$repo/$org-$repo-$branch-(presubmit|postsubmit|periodic).yaml
Contains Prow job definitions for each repository that are run on PRs, on merges, or periodically. When we branch jobs, we will copy the current master jobs into a release branch specific job. Each prow job calls into the appropriate subset of the tests defined in your ci-operator config and passes in the secrets and infrastructure info specific to our CI environmenttemplates/*.yaml
These templates are used for more complicated jobs that don't run in a single pod. The templates are referenced by the Prow jobs and are instantiated by the ci-operator using parameters generated by the build (references to images usually).
This section describes how to configure end-to-end tests using ci-operator. In this context, "end-to-end" means the functionality of the application is being tested on top of a Kubernetes cluster from an end-user perspective.
The preferred way to write this type of tests is using ci-operator
. See the
documentation for details on how to download, build, and execute it:
https://github.com/openshift/ci-tools.git
ci-operator requires a configuration file for the repository being tested.
These files are located in the config
directory. The ci-operator
repository has documentation for adding a new configuration file in case one
doesn't already exist. These files take care of most of the CI process:
downloading the source, building binaries, building RPMs, creating images,
executing unit tests, etc., and can be built upon for e2e tests with little or
no modification.
To add an e2e test:
- Determine the pre-requisites for the test. Practically, this means choosing
the ci-operator template according to the type of test. There are already
files in the
templates
directory for the most common cases, see Using a template below. - Determine and configure the template's inputs. This is specific to each template and should be documented in its parameters. The e2e test might need minor modifications to fit the environment created by the template.
- Add one or more Prow jobs to Prow's configuration file with the information gathered in the previous steps.
Contrary to other types of tests, e2e tests usually require a cluster, not just
a single container. While there aren't yet native primitives in ci-operator
for cluster provisioning, it provides one open-ended feature that can be
leveraged to accomplish that: template steps.
Template steps allow the creation of arbitrary objects in the cluster where the CI pipeline is executed. This is used to start a pod that will then provision a separate cluster for the tests. This directory already contains a few templates that can be used either directly or (rarely necessary, in practice) as a reference:
cluster-launch-e2e.yaml
: launches a cluster in GCP using openshift-ansible and runs Origin e2e tests on it, parameterized by test focus.cluster-launch-src.yaml
: launches a cluster in GCP using openshift-ansible and runs a script from the repository being tested with the resulting$KUBECONFIG
, parameterized by test script.cluster-launch-installer-e2e.yaml
: same ascluster-launch-e2e.yaml
, but usesopenshift-installer
instead ofopenshift-ansible
.cluster-launch-installer-src.yaml
: same ascluster-launch-src.yaml
, but usesopenshift-installer
instead ofopenshift-ansible
.master-sidecar-4.2.yaml
: spins up a simple openshift control plane as a sidecar and waits for theCOMMAND
specified to the template to be executed, before itself exiting. The test container is given access to the generated configuration and theadmin.kubeconfig
.
To access the cluster, the test should use the standard configuration loading rules, which are described in the upstream documentation:
https://kubernetes.io/docs/tasks/administer-cluster/access-cluster-api
The preferred way to add a test that uses a template is to add it to the
ci-operator
configuration file and use the configuration generator to
generate the job.
The list of supported test types can be found in the
configuration documentation.
The process for adding jobs manually is significantly more complex. For templates that are expected to be used by many jobs, it may be easier to add support for automatic generation. The example job in the next section can be used as a reference for jobs that are not in that category.
This section covers the process of creating a new template when none of the
existing ones provide the workflow required for a particular type of test — e.g.
when a new installer needs to be supported. It supplements the ci-operator
template documentation.
From the perspective of an end-to-end test, a ci-operator
template is simply
a way to create one or more pods and auxiliary objects to setup and clean up
the environment and execute the test.
While users should deal mostly with the ci-operator
configuration file and
generate Prow jobs automatically from it, the structure of the Prow jobs has to
be taken into consideration when writing a template.
For example, the following snippet from the configuration file of repo openshift/origin
is used to generate the presubmit job pull-ci-openshift-origin-master-e2e-conformance-k8s
which uses the template cluster-launch-installer-src.yaml
:
- as: e2e-conformance-k8s
commands: test/extended/conformance-k8s.sh
openshift_installer_src:
cluster_profile: aws
The CI process begins when a webhook from Github triggers the creation of one or more Prow jobs. For a complete description of Prow jobs, see the upstream documentation.
# ci-operator/jobs/openshift/origin/openshift-origin-master-presubmits.yaml
presubmits:
openshift/origin:
# …
- agent: kubernetes
always_run: true
# Each branch needs its own ci-operator configuration file.
branches:
- master
context: ci/prow/e2e-conformance-k8s
decorate: true
# The name should follow the format used for auto-generated jobs:
# pull-ci-$org-$repo-$branch-$name or branch-ci-$org-$repo-$branch-$name.
# "e2e-conformance-k8s" is used as a unique identifier for for this job
# thoughout the job definition (e.g. in `context` above).
name: pull-ci-openshift-origin-master-e2e-conformance-k8s
rerun_command: /test e2e-conformance-k8s
# ci-operator doesn't require the source code of the repository, it will
# be cloned in a separate container.
skip_cloning: true
spec:
containers:
# The names passed to `--secret-dir`, `--target`, and `--template` are
# important and should follow the format presented here.
- args:
- --artifact-dir=$(ARTIFACTS)
- --give-pr-author-access-to-namespace=true
# `--secret-dir` references a directory that is volume-mounted in the
# container by combining secrets and configmaps from the cluster. This
# is one way of passing extra configuration as input to the template.
- --secret-dir=/usr/local/e2e-conformance-k8s-cluster-profile
- --target=e2e-conformance-k8s
# The template is stored in a configmap in the cluster and
# volume-mounted in the container.
- --template=/usr/local/e2e-conformance-k8s
command:
- ci-operator
# Other than CONFIG_SPEC, these are specific to the template being
# used.
env:
- name: CLUSTER_TYPE
value: gcp
# The ci-operator configuration stored in a configmap in the cluster.
- name: CONFIG_SPEC
valueFrom:
configMapKeyRef:
key: openshift-origin-master.yaml
name: ci-operator-master-configs
- name: JOB_NAME_SAFE
value: e2e-conformance-k8s
- name: TEST_COMMAND
value: test/extended/conformance-k8s.sh
image: ci-operator:latest
imagePullPolicy: Always
name: ""
resources:
requests:
cpu: 10m
volumeMounts:
- mountPath: /usr/local/e2e-conformance-k8s-cluster-profile
name: cluster-profile
- mountPath: /usr/local/e2e-conformance-k8s
name: job-definition
subPath: cluster-launch-src.yaml
serviceAccountName: ci-operator
# Specific to the template being used. Combine a secret and a configmap
# into a directory that will be copied to the namespace created by
# ci-operator using the `--secret-dir` option.
volumes:
- name: cluster-profile
projected:
sources:
- secret:
name: cluster-secrets-gcp
- configMap:
name: cluster-profile-gcp
# The template stored in a configmap in the cluster.
- configMap:
name: prow-job-cluster-launch-src
name: job-definition
trigger: ((?m)^/test( all| e2e-conformance-k8s),?(\s+|$))
The Secret
s and ConfigMap
s referenced by the job reside in the ci
namespace. cluster-profile-*
are ConfigMap
s that contain the cluster
profiles in this repository. cluster-secrets-*
are
Secret
s that contain credentials to provision and access clusters in a
specific cloud provider (the contents can be seen in the
script that populates them.
When instantiating the template, data about the pipeline is provided as parameters. The location of images and RPMs from both the release and the CI pipeline is available this way. Extra parameters can be provided via environment variables, which will have to be set by the Prow job.
External access to the images that were built in the test namespace is required by most end-to-end tests, so templates often create this role binding:
- kind: RoleBinding
apiVersion: authorization.openshift.io/v1
metadata:
name: ${JOB_NAME_SAFE}-image-puller
namespace: ${NAMESPACE}
roleRef:
name: system:image-puller
subjects:
- kind: SystemGroup
name: system:unauthenticated
The template can reference objects from any namespace, but Kubernetes requires
them to be in the same namespace to be used as volume mounts. As described in
the section above, the Prow job definition and ci-operator
's --secret-dir
can be used to combine objects into a volume mount and make them available in
the test namespace.
The outputs of a template test are:
- Success/failure status, determined from the test pod.
- The pod's
stdout
andstderr
, reflected in ci-operator's output in case of failure. - Artifacts.
These are described in more detail in the
ci-operator
documentation.
With the template file ready, the steps required to add it to the repository and make it available for CI jobs are:
- Create the yaml file in the
templates/
directory. - Add the files to the
config-updater
section of Prow's configuration file to ensure they are added to aConfigMap
in the CI cluster. - Optional: add a test type to
ci-operator
to enable automatic generation of jobs that use this template. - Add necessary secrets (if any) to the deployment configuration in this repository and apply it to the cluster.
Because the configuration updater configuration has to be updated before a PR with the files is merged, those changes have to be merged previously in a separate PR.
A job that uses a template can be tested in two different ways. The easiest is
to just create a pull request, which will then trigger a run of all added or
changed jobs.
For complete control of the execution, the more manual process of assembling
the ci-operator
call can be used.
ci-operator
tests can be executed locally with little effort, but setting up
the dependencies for template tests is more involved. The typical end-to-end
test requires:
- A kubeconfig pointing to a cluster with external access.
- The
ci-operator
configuration file. - The template file.
- The secrets required for the
--secret-dir
option, if applicable. - The environment variables required, if applicable.
The simplest way to get started is to create a personal namespace in the
CI cluster. Substitute mynamespace
below with
the name of that namespace.
The secrets and environment variables are very specific to the template in use,
but the e2e-conformance-k8s
can be used as a general example. The template it
uses (cluster-launch-src
) requires two parameters (the others are all
provided by ci-operator
): CLUSTER_TYPE
determines the cloud provider used
to provision the cluster, and TEST_COMMAND
is the command that executes the
test.
This template also requires a secret containing the cluster profile and
credentials. In the CI cluster, it is created using a volume that combines the
projection of a Secret
and a ConfigMap
. Locally, it has to be assembled
into a directory manually. How these objects are composed is described in the
"writing a template" section above. One final note:
because the name of the secret is determined by the argument passed to
--secret-dir
, the directory has to be named in a way that reflects the secret
name expected by the template.
Putting this all together, to execute the e2e-conformance-k8s
test the
following command can be used:
name=mytestname
CLUSTER_TYPE=gcp
mkdir artifacts/ "$name-cluster-profile"/
ln -s "$PWD/cluster/test-deploy/$CLUSTER_TYPE/"* "$name-cluster-profile"/
# populate the following files in the $name-cluster-profile directory:
# - gce.json
# - ops-mirror.pem
# - ssh-privatekey
# - ssh-publickey
# - telemeter-token
export CLUSTER_TYPE JOB_NAME_SAFE=$name TEST_COMMAND=test/extended/conformance-k8s.sh
ci-operator \
--artifact-dir artifacts/ \
--config ci-operator/config/openshift/origin/openshift-origin-master.yaml \
--git-ref openshift/origin@master \
--template ci-operator/templates/cluster-launch-src.yaml \
--target cluster-launch-src \
--secret-dir "$name-cluster-profile/" \
--namespace mynamespace
If test volume for a given platform exceeds the Boskos lease capacity, jobs-failing-with-lease-acquire-timeout
will fire.
Presubmit jobs may be rebalanced to move platform-agnostic jobs to platforms with available capacity.
Component teams may mark their presubmit jobs as platform-agnostic by configuring as
names which exclude the platform slug (e.g. aws
), whose absence is used as a marker of "this test is platform-agnostic".
For example, see release#10152.
To locate platform-specific jobs which might be good candidates for moving to the platform-agnostic pool, you can use:
$ hack/step-jobs-by-platform.py
workflows which need alternative platforms to support balancing:
baremetalds-e2e
ipi-aws
ipi-aws-ovn-hybrid
openshift-e2e-aws-csi
...
count platform status alternatives job
39 gcp balanceable aws,azure,vsphere pull-ci-openshift-cluster-version-operator-master-e2e
26 aws unknown azure,gcp,vsphere pull-ci-openshift-sriov-dp-admission-controller-master-e2e-aws
15 aws unknown azure,gcp,vsphere pull-ci-openshift-cluster-authentication-operator-master-e2e-aws
10 aws balanceable azure,vsphere pull-ci-openshift-machine-config-operator-master-e2e-ovn-step-registry
9 aws unknown gcp pull-ci-openshift-cluster-samples-operator-release-4.1-e2e-aws-image-ecosystem
...
Occasionally we hit install errors like:
Error launching source instance: InsufficientInstanceCapacity: We currently do not have sufficient m5.xlarge capacity in the Availability Zone you requested (us-east-1b). Our system will be working on provisioning additional capacity. You can currently get m5.xlarge capacity by not specifying an Availability Zone in your request or choosing us-east-1a, us-east-1c, us-east-1d, us-east-1f
Or AWS will have issues in a particular region, resulting in:
[DEBUG] plugin.terraform-provider-aws: 2019/03/22 18:02:51 [DEBUG] [aws-sdk-go] DEBUG: Response ec2/RunInstances Details:"
[DEBUG] plugin.terraform-provider-aws: ---[ RESPONSE ]--------------------------------------"
[DEBUG] plugin.terraform-provider-aws: HTTP/1.1 500 Internal Server Error"
or similar. To keep CI going during these events, we can reconfigure to push CI load away from impacted regions and zones. Focusing on step-registry consumers, you could avoid us-east-1b by changing:
ipi-conf-aws-commands.sh
to set explicitzone_1
andzone_2
for a particularaws_region
. You can also drop an entry entirely, although you will need to update$((RANDOM % 4))
to match the number of defined entries. If your changes affect concurrent AWS capacity (e.g. because you removed a high-volume region), you may also need to adjust the Boskos lease capacity to avoid overloading the remaining capacity.ipi-conf-aws-sharednetwork-commands.sh
to drop affected regions. All of the same caveats from the previousipi-conf-aws-commands.sh
entry apply here too, although it is harder to shift zones within a region without creating completely new shared subnets. If you do need to create new shared subnets, the procedure is covered here.- There are currently no steps excercising the user-provided flow on AWS, so no pointers about what to adjust there.
- There are legacy templates which could be updated to pivot regions and zones, but they should have few consumers and leaving them impacted would help motivate the remaining consumers to move to the step registry.