Upgrading Guide

Breaking changes typically have "!" in the commit message, as per Conventional Commits, although sometimes we don't realise that a change is breaking.

Upgrading to v3.6

See also the list of new features in 3.6.

Deprecations

The following features are deprecated and will be removed in a future version of Argo Workflows:

  • The Python SDK is deprecated; we recommend migrating to Hera.
  • schedule in CronWorkflows, podPriority, mutex and semaphore in Workflows and WorkflowTemplates.

For more information on how to migrate these, see deprecations.

Fixed Server --basehref inconsistency

For consistency, the Server now uses --base-href and ARGO_BASE_HREF. Previously it was --basehref (no dash in between) and ARGO_BASEHREF (no underscore in between).

Removed redundant Server environment variables

ALLOWED_LINK_PROTOCOL and BASE_HREF have been removed as redundant. Use ARGO_ALLOWED_LINK_PROTOCOL and ARGO_BASE_HREF instead.
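
If you generate the Server's environment or arguments from a script, the two renames above could be applied along these lines. This is a minimal sketch: the helper names `migrate_server_env` and `migrate_server_args` are hypothetical, and only the flag and variable names come from this guide.

```python
# Old name -> new name, as listed in the notes above.
ENV_RENAMES = {
    "ARGO_BASEHREF": "ARGO_BASE_HREF",
    "BASE_HREF": "ARGO_BASE_HREF",
    "ALLOWED_LINK_PROTOCOL": "ARGO_ALLOWED_LINK_PROTOCOL",
}


def migrate_server_env(env):
    """Return a copy of `env` with removed or renamed variables mapped to their new names."""
    migrated = {}
    for name, value in env.items():
        new_name = ENV_RENAMES.get(name, name)
        # If the new name is already set explicitly, that value wins.
        if new_name != name and new_name in env:
            continue
        migrated[new_name] = value
    return migrated


def migrate_server_args(args):
    """Rewrite --basehref (both the "--flag value" and "--flag=value" forms) to --base-href."""
    out = []
    for arg in args:
        if arg == "--basehref":
            out.append("--base-href")
        elif arg.startswith("--basehref="):
            out.append("--base-href=" + arg[len("--basehref="):])
        else:
            out.append(arg)
    return out
```

The values you pass through are your own; the sketch only touches names.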

Legacy insecure pod patch fallback removed. (#13100)

For the Emissary executor to work properly, you must set up RBAC. See workflow RBAC.

Archived Workflows on PostgreSQL

To improve performance, this upgrade will automatically transform the column used to store archived workflows from type json to type jsonb on controller start-up. This requires PostgreSQL version 9.4 or higher.

The migration involves obtaining an ACCESS EXCLUSIVE lock on the argo_archived_workflows table, which blocks all reads and writes until it has finished. For the vast majority of users, we anticipate this will take less than a minute, but it could take much longer if you have a large number of workflows (100,000+), or the average workflow size is high (100KB+). If you don't fall into one of those two categories, or if minimizing downtime isn't important to you, then you don't need to read any further. Otherwise, you have a few options to keep downtime to a minimum:

  1. If you don't actually need the archived workflows anymore, simply delete them with delete from argo_archived_workflows and the migration will complete almost instantly.

  2. Using a variation of Altering a Postgres Column with Minimal Downtime, it's possible to manually perform this migration with nearly no downtime. This is a two-step process:

    1. Before the upgrade, run the following queries to create a temporary workflowjsonb column and populate it with the existing data. This is safe to do whilst running version 3.5 because the column types are compatible.

      -- Add temporary workflowjsonb column
      ALTER TABLE argo_archived_workflows ADD COLUMN workflowjsonb JSONB NULL;
      
      -- Add trigger to update workflowjsonb for each insert
      CREATE OR REPLACE FUNCTION update_workflow_jsonb() RETURNS TRIGGER AS $BODY$
      BEGIN
          NEW.workflowjsonb=NEW.workflow;
          RETURN NEW;
      END
      $BODY$ LANGUAGE PLPGSQL;
      
      CREATE TRIGGER argo_archived_workflows_update_workflow_jsonb
      BEFORE INSERT ON argo_archived_workflows
      FOR EACH ROW EXECUTE PROCEDURE update_workflow_jsonb();
      
      -- Backfill existing rows
      UPDATE argo_archived_workflows SET workflowjsonb = workflow WHERE workflowjsonb IS NULL;
    2. Once the above has completed and you're ready to proceed with the upgrade, run the following queries before starting the controller:

      BEGIN;
      LOCK TABLE argo_archived_workflows IN SHARE ROW EXCLUSIVE MODE;
      DROP TRIGGER argo_archived_workflows_update_workflow_jsonb ON argo_archived_workflows;
      ALTER TABLE argo_archived_workflows DROP COLUMN workflow;
      ALTER TABLE argo_archived_workflows RENAME COLUMN workflowjsonb TO workflow;
      ALTER TABLE argo_archived_workflows ADD CONSTRAINT workflow CHECK (workflow IS NOT NULL) NOT VALID;
      COMMIT;
  3. Version 3.6 retains compatibility with workflows stored as type json. Therefore, it's currently safe to skip the migration by setting skipMigration: true. This should only be used as an emergency stop-gap, as future versions may drop support for json without notice.

Metrics changes

You can now retrieve metrics over the OpenTelemetry Protocol using the OpenTelemetry collector, and this is the recommended mechanism.

These notes explain what has changed when scraping the Prometheus /metrics endpoint, for a minimal-effort upgrade. We do not recommend following this guide blindly: the new metrics were introduced because they add value, so they should be worth collecting and using.

New metrics

The following are new metrics:

  • cronworkflows_concurrencypolicy_triggered
  • cronworkflows_triggered_total
  • deprecated_feature
  • is_leader
  • k8s_request_duration
  • pod_pending_count
  • pods_total_count
  • queue_duration
  • queue_longest_running
  • queue_retries
  • queue_unfinished_work
  • total_count
  • version
  • workflowtemplate_runtime
  • workflowtemplate_triggered_total

These can be disabled with:

metricsConfig: |
  modifiers:
    build_info:
      disable: true
...

Renamed metrics

If you are using these metrics in your recording rules, dashboards, or alerts, you will need to update their names after the upgrade:

| Old name | New name |
|---|---|
| argo_workflows_count | argo_workflows_gauge |
| argo_workflows_pods_count | argo_workflows_pods_gauge |
| argo_workflows_queue_depth_count | argo_workflows_queue_depth_gauge |
| log_messages | argo_workflows_log_messages |
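
If you keep recording rules, dashboards, or alerts as text files, the renames listed above could be applied with a small script such as the following. This is a sketch only: `rename_metrics` is a hypothetical helper, and it assumes metric names appear as whole tokens in your files.

```python
import re

# Old name -> new name, as listed above.
RENAMED_METRICS = {
    "argo_workflows_count": "argo_workflows_gauge",
    "argo_workflows_pods_count": "argo_workflows_pods_gauge",
    "argo_workflows_queue_depth_count": "argo_workflows_queue_depth_gauge",
    "log_messages": "argo_workflows_log_messages",
}

# Characters that may appear inside a Prometheus metric name. Refusing a match
# when one of them is adjacent stops "log_messages" from matching inside
# "argo_workflows_log_messages".
_NAME_CHAR = "[A-Za-z0-9_:]"
_ALTERNATIVES = "|".join(re.escape(name) for name in RENAMED_METRICS)
_PATTERN = re.compile(
    "(?<!" + _NAME_CHAR + ")(" + _ALTERNATIVES + ")(?!" + _NAME_CHAR + ")"
)


def rename_metrics(text):
    """Replace every whole-token occurrence of an old metric name with its new name."""
    return _PATTERN.sub(lambda match: RENAMED_METRICS[match.group(1)], text)
```

Because already-renamed names never match, running the script twice over the same file leaves it unchanged.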

Custom metrics

Custom metric names and labels must now be valid Prometheus and OpenTelemetry names. This prevents the use of :, which was allowed in earlier versions of Argo Workflows.
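
To spot custom metric names that may be rejected, a rough pre-check could look like the sketch below. The rule used here (a leading letter, then letters, digits, or underscores, at most 255 characters) is an assumption that approximates the overlap of Prometheus and OpenTelemetry naming rules; it is not taken from the Argo Workflows validation code, which may differ. `is_portable_metric_name` is a hypothetical helper.

```python
import re

# Assumption: approximate intersection of Prometheus and OpenTelemetry naming rules.
_VALID_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
_MAX_LENGTH = 255  # assumed upper bound on name length


def is_portable_metric_name(name):
    """Return True if `name` passes the assumed naming rule (note that ":" is rejected)."""
    return len(name) <= _MAX_LENGTH and _VALID_NAME.fullmatch(name) is not None
```

A name that fails this check is worth reviewing before the upgrade; a name that passes is not guaranteed to be accepted.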

A custom metric, as defined by a workflow, could be defined as one type (say, a counter) in one workflow and then as a histogram of the same name in a different workflow. This would work in 3.5 if the first usage of the metric had reached its TTL and been deleted. This no longer works in 3.6: custom metrics may not be redefined. It doesn't really make sense to change a metric in this way, and the OpenTelemetry SDK prevents you from doing so.

metricsTTL for histogram metrics is not functional, as OpenTelemetry doesn't allow deletion of metrics. For the other metric types, deletion is emulated via asynchronous meters.

TLS

The Prometheus /metrics endpoint now has TLS enabled by default. To disable this, set metricsConfig.secure to false.

Removed Swagger UI

The Swagger UI has been removed from the /apidocs page. It has been replaced with a link to the Swagger UI in the versioned documentation and download links for the OpenAPI spec and JSON schema.

JSON templating fix

When returning a map or array in an expression, you would get a Golang representation. This now returns plain JSON.
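
If anything downstream parses these values, the practical difference is that the new output can be handed straight to a JSON parser. The sketch below uses illustrative strings only: `go_style` imitates Go's default formatting of a map and was not captured from a real workflow, so the exact 3.5 output may differ.

```python
import json


def parse_expression_output(text):
    """Return the decoded value, or None if `text` is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


# Illustrative strings only (assumptions, not real workflow output).
go_style = "map[name:a tags:[x y]]"
json_style = '{"name": "a", "tags": ["x", "y"]}'

old_value = parse_expression_output(go_style)    # not JSON, so None
new_value = parse_expression_output(json_style)  # a dict with "name" and "tags"
```

Any custom parsing written for the old representation will need to be replaced with a JSON parser.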

Added container name to workflow node error messages

Workflow node error messages are now prefixed with the container name. If you are using Conditional Retries, you may need to adjust your usage of lastRetry.message expressions or the TRANSIENT_ERROR_PATTERN environment variable.

ARGO_TEMPLATE removed from main container

The environment variable ARGO_TEMPLATE, which is an internal implementation detail, is no longer available inside the main container of your workflow pods. This is documented here because we are aware that some users of Argo Workflows rely on it.

Upgrading to v3.5

There are no known breaking changes in this release. Please file an issue if you encounter any unexpected problems after upgrading.

Unified Workflows List API and UI

The Workflows List in the UI now shows Archived Workflows on the same page. As such, the previously separate Archived Workflows page in the UI has been removed.

The List API /api/v1/workflows also returns both types of Workflows now. This is not breaking as the Archived API still exists and was not removed, so this is an addition.

Upgrading to v3.4

Non-Emissary executors are removed. (#7829)

Emissary executor is now the only supported executor. If you are using other executors, e.g. docker, k8sapi, pns, and kubelet, you need to remove your containerRuntimeExecutors and containerRuntimeExecutor from your controller's configmap. If you have workflows that use different executors with the label workflows.argoproj.io/container-runtime-executor, this is no longer supported and will not be effective.
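
As a rough aid for this clean-up, the sketch below works on already-parsed data: the `data` section of the controller's configmap as a dict, and workflow manifests as dicts. Both helper names are hypothetical.

```python
REMOVED_KEYS = ("containerRuntimeExecutor", "containerRuntimeExecutors")
REMOVED_LABEL = "workflows.argoproj.io/container-runtime-executor"


def strip_executor_settings(configmap_data):
    """Return a copy of the configmap data without the removed executor keys."""
    return {key: value for key, value in configmap_data.items() if key not in REMOVED_KEYS}


def workflows_with_executor_label(workflows):
    """Return the names of workflows that still carry the now-ineffective label."""
    names = []
    for workflow in workflows:
        metadata = workflow.get("metadata", {})
        if REMOVED_LABEL in (metadata.get("labels") or {}):
            names.append(metadata.get("name"))
    return names
```

The label check only reports workflows; removing the label from them is optional, since it no longer has any effect.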

chore!: Remove dataflow pipelines from codebase. (#9071)

You are affected if you are using dataflow pipelines in the UI or via the /pipelines endpoint. We no longer support dataflow pipelines and all relevant code has been removed.

feat!: Add entrypoint lookup. Fixes #8344

Affected if:

  • Using the Emissary executor.
  • Used the args field for any entry in images.

This PR automatically looks up the command and entrypoint. The implementation for config look-up was incorrect (it allowed you to specify args but not entrypoint). args has been removed to correct the behaviour.

If you are incorrectly configured, the workflow controller will error on start-up.

Actions

You don't need to configure images that use v2 manifests anymore, such as argoproj/argosay:v2. You can remove them:

% docker manifest inspect argoproj/argosay:v2
# ...
"schemaVersion": 2,
# ...

For v1 manifests, such as docker/whalesay:latest:

% docker image inspect -f '{{.Config.Entrypoint}} {{.Config.Cmd}}' docker/whalesay:latest
[] [/bin/bash]

images:
  docker/whalesay:latest:
    cmd: [/bin/bash]
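
Before upgrading, you may want to check whether any entry under images still sets args, since that would stop the controller from starting. A minimal sketch, assuming the images section has already been parsed into a dict keyed by image name (`images_using_args` is a hypothetical helper):

```python
def images_using_args(images):
    """Return the sorted names of image entries that still set the removed args field."""
    return sorted(
        name
        for name, settings in images.items()
        if isinstance(settings, dict) and "args" in settings
    )
```

An empty result means no entry uses args; any name returned needs its args field removed.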

feat: Fail on invalid config. (#8295)

The workflow controller will error on start-up if incorrectly configured, rather than silently ignoring mis-configuration.

Failed to register watch for controller config map: error unmarshaling JSON: while decoding JSON: json: unknown field \"args\"

feat: add indexes for improve archived workflow performance. (#8860)

This PR adds indexes to the archived workflow tables. The upgrade may take a long time if you have a large table.

feat: enhance artifact visualization (#8655)

For AWS users using S3: visualizing artifacts in the UI and downloading them now requires an additional "Action" to be configured in your S3 bucket policy: "ListBucket".

Upgrading to v3.3

662a7295b feat: Replace patch pod with create workflowtaskresult. Fixes #3961 (#8000)

The PR changes the permissions that can be used by a workflow to remove the pod patch permission.

See workflow RBAC and #8013.

06d4bf76f fix: Reduce agent permissions. Fixes #7986 (#7987)

The PR changes the permissions used by the agent to report back the outcome of HTTP template requests. The permission patch workflowtasksets/status replaces patch workflowtasksets, for example:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: agent
rules:
  - apiGroups:
      - argoproj.io
    resources:
      - workflowtasksets/status
    verbs:
      - patch

Workflows running during any upgrade should be given both permissions.

See #8013.
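
To check that a role covers both the old and the new permission for the duration of an upgrade, something like the sketch below could be used on a parsed Role manifest. `UPGRADE_ROLE` and `grants` are illustrative names, and the check deliberately ignores wildcard rules such as "*".

```python
# A role that carries both permissions while the upgrade is in progress.
UPGRADE_ROLE = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "Role",
    "metadata": {"name": "agent"},
    "rules": [
        {
            "apiGroups": ["argoproj.io"],
            "resources": ["workflowtasksets", "workflowtasksets/status"],
            "verbs": ["patch"],
        }
    ],
}


def grants(role, api_group, resource, verb):
    """Return True if some rule lists the group, resource, and verb explicitly."""
    for rule in role.get("rules", []):
        if (
            api_group in rule.get("apiGroups", [])
            and resource in rule.get("resources", [])
            and verb in rule.get("verbs", [])
        ):
            return True
    return False
```

Once no workflows from before the upgrade are running, the older workflowtasksets entry can be dropped.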

feat!: Remove deprecated config flags

This PR removes the following configmap items:

  • executorImage (use executor.image in configmap instead), e.g. a workflow controller configmap similar to the following is no longer valid:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: workflow-controller-configmap
    data:
      ...
      executorImage: argoproj/argocli:latest
      ...

    From now on, provide the executor image under executor.image in the configmap, as shown below:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: workflow-controller-configmap
    data:
      ...
      executor: |
        image: argoproj/argocli:latest
      ...
  • executorImagePullPolicy (use executor.imagePullPolicy in configmap instead), e.g. a workflow controller configmap similar to the following is no longer valid:

    data:
      ...
      executorImagePullPolicy: IfNotPresent
      ...

    Change it as shown below:

    data:
      ...
      executor: |
        imagePullPolicy: IfNotPresent
      ...
  • executorResources (use executor.resources in configmap instead), e.g. a workflow controller configmap similar to the following is no longer valid:

    data:
      ...
      executorResources:
        requests:
          cpu: 0.1
          memory: 64Mi
        limits:
          cpu: 0.5
          memory: 512Mi
      ...

    Change it as shown below:

    data:
      ...
      executor: |
        resources:
          requests:
            cpu: 0.1
            memory: 64Mi
          limits:
            cpu: 0.5
            memory: 512Mi
      ...
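
The three moves above follow one pattern, which can be sketched as follows. The sketch assumes the controller configuration has already been parsed into a dict, including the executor block (in the ConfigMap itself, executor is a YAML string and would need parsing first). `migrate_executor_config` is a hypothetical helper.

```python
import copy

# Removed top-level key -> key under "executor".
FLAT_TO_NESTED = {
    "executorImage": "image",
    "executorImagePullPolicy": "imagePullPolicy",
    "executorResources": "resources",
}


def migrate_executor_config(config):
    """Return a copy of `config` with the removed keys moved under "executor"."""
    migrated = copy.deepcopy(config)
    executor = migrated.setdefault("executor", {})
    for old_key, new_key in FLAT_TO_NESTED.items():
        if old_key in migrated:
            value = migrated.pop(old_key)
            # A value already present under "executor" takes precedence.
            executor.setdefault(new_key, value)
    if not executor:
        # Nothing to nest, so do not leave an empty "executor" behind.
        del migrated["executor"]
    return migrated
```

The input is left untouched, so the result can be compared with the original before anything is written back.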

fce82d572 feat: Remove pod workers (#7837)

This PR removes pod workers from the code; the pod informer now writes directly into the workflow queue. As a result, the --pod-workers flag has been removed.

93c11a24ff feat: Add TLS to Metrics and Telemetry servers (#7041)

This PR adds the ability to send metrics over TLS with a self-signed certificate. In v3.5 this will be enabled by default, so it is recommended that users enable this functionality now.

0758eab11 feat(server)!: Sync dispatch of webhook events by default

This is not expected to impact users.

Event dispatch in the Argo Server has been changed from async to sync by default. This is so that errors are surfaced to the client, rather than only appearing as logs or Kubernetes events. It is possible that response times under load are too long for your client and you may prefer to revert this behaviour.

To revert this behaviour, restart Argo Server with ARGO_EVENT_ASYNC_DISPATCH=true. Make sure that asyncDispatch=true is logged.

bd49c6303 fix(artifact)!: default https to any URL missing a scheme. Fixes #6973

An HTTPArtifact URL without a scheme now defaults to https instead of http.

Users need to explicitly include the http scheme if they want to retrieve an HTTPArtifact over http.
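
The effect of this change can be modelled roughly as below. This is a sketch of the behaviour described here, not the controller's implementation: it assumes a URL is treated as having a scheme only when it contains "://", and `effective_artifact_url` is a hypothetical helper.

```python
def effective_artifact_url(url):
    """Return the URL that would be fetched under the new https default."""
    if "://" in url:
        # An explicit scheme, including http, is left untouched.
        return url
    return "https://" + url
```

For example, `effective_artifact_url("example.com/data.tgz")` yields an https URL, while a URL that already starts with http:// is returned unchanged.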

chore!: Remove the hidden flag --verify from argo submit

The hidden flag --verify has been removed from argo submit. This is an internal testing flag we don't need anymore.

Upgrading to v3.2

e5b131a33 feat: Add template node to pod name. Fixes #1319 (#6712)

This adds the template name to the pod name, to make it easier to understand which pod ran which step. This behaviour can be reverted by setting POD_NAMES=v1 on the workflow controller.

be63efe89 feat(executor)!: Change argoexec base image to alpine. Closes #5720 (#6006)

Changing from Debian to Alpine reduces the size of the argoexec image, resulting in faster-starting workflow pods, and it also reduces the risk of security issues. There is no such thing as a free lunch: there may be other behaviour changes we don't know of yet.

Some users found this change prevented workflows with very large parameters from running. See #7586.

48d7ad3 chore: Remove onExit naming transition scaffolding code (#6297)

When upgrading from v2.12 or earlier directly to v3.2 or later, workflows that are running at the time of the upgrade and have onExit steps may run the onExit step twice. This only applies to workflows that began running before the workflow-controller upgrade and are still running after it completes. Even under these conditions, duplicate work may not occur.

Upgrading to v3.1

3fff791e4 build!: Automatically add manifests to v* tags (#5880)

The manifests in the repository on the tag will no longer contain the image tag, instead they will contain :latest.

  • You must not get your manifests from the Git repository, you must get them from the release notes.
  • You must not use the stable tag. This is defunct, and will be removed in v3.1.

ab361667a feat(controller) Emissary executor. (#4925)

The Emissary executor is not a breaking change per se, but it is brand new, so we would not recommend you use it by default yet. Instead, we recommend you test it out on some workflows using a workflow-controller-configmap configuration.

# Specifies the executor to use.
#
# You can use this to:
# * Tailor your executor based on your preference for security or performance.
# * Test out an executor without committing yourself to use it for every workflow.
#
# To find out which executor was actually used, see the `wait` container logs.
#
# The list is in order of precedence; the first matching executor is used.
# This has precedence over `containerRuntimeExecutor`.
containerRuntimeExecutors: |
  - name: emissary
    selector:
      matchLabels:
        workflows.argoproj.io/container-runtime-executor: emissary

be63efe89 feat(controller): Expression template tags. Resolves #4548 & #1293 (#5115)

This PR introduced a new expression syntax known as "expression tag template". A user has reported that this does not always play nicely with the when condition syntax (govaluate).

This can be resolved using a single quote in your when expression:

when: "'{{inputs.parameters.should-print}}' != '2021-01-01'"

Learn more

Upgrading to v3.0

defbd600e fix: Default ARGO_SECURE=true. Fixes #5607 (#5626)

The server now starts with TLS enabled by default if a key is available. The original behaviour can be configured with --secure=false.

If you have an ingress, you may need to add the appropriate annotations (varies by ingress):

alb.ingress.kubernetes.io/backend-protocol: HTTPS
nginx.ingress.kubernetes.io/backend-protocol: HTTPS

01d310235 chore(server)!: Required authentication by default. Resolves #5206 (#5211)

To log in to the user interface, you must provide a login token. The original behaviour can be configured with --auth-mode=server.

f31e0c6f9 chore!: Remove deprecated fields (#5035)

Some fields that were deprecated in early 2020 have been removed.

| Field | Action |
|---|---|
| template.template and template.templateRef | The workflow spec must be changed to use steps or DAG, otherwise the workflow will error. |
| spec.ttlSecondsAfterFinished | Change to spec.ttlStrategy.secondsAfterCompletion, otherwise the workflow will not be garbage collected as expected. |

To find impacted workflows:

kubectl get wf --all-namespaces -o yaml | grep templateRef
kubectl get wf --all-namespaces -o yaml | grep ttlSecondsAfterFinished
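
The grep for templateRef also matches steps and DAG tasks that use templateRef, which remain valid. If you want a narrower check, or want to rewrite the TTL field mechanically, a sketch operating on parsed workflow manifests could look like this (both helper names are hypothetical):

```python
import copy


def uses_removed_template_fields(workflow):
    """Return True if any template sets template or templateRef at the template level."""
    for template in workflow.get("spec", {}).get("templates", []):
        if "template" in template or "templateRef" in template:
            return True
    return False


def migrate_ttl(workflow):
    """Return a copy with spec.ttlSecondsAfterFinished moved to spec.ttlStrategy.secondsAfterCompletion."""
    migrated = copy.deepcopy(workflow)
    spec = migrated.get("spec", {})
    if "ttlSecondsAfterFinished" in spec:
        seconds = spec.pop("ttlSecondsAfterFinished")
        # An existing secondsAfterCompletion value takes precedence.
        spec.setdefault("ttlStrategy", {}).setdefault("secondsAfterCompletion", seconds)
    return migrated
```

Workflows flagged by the first function still need a manual rewrite to steps or DAG; only the TTL change is mechanical.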

c8215f972 feat(controller)!: Key-only artifacts. Fixes #3184 (#4618)

This change is not breaking per se, but many users do not appear to be aware of artifact repository ref, so check your usage of that feature if you have problems.