Deleting DAG runs through the UI causes incomplete deletion of DAG run details. This affects tasks logging capabilities #15818

Closed
darthale opened this issue May 13, 2021 · 5 comments
Labels: area:webserver (Webserver related Issues), kind:bug (This is a clearly a bug)

Comments

@darthale

Apache Airflow version: 2.0.2

Kubernetes version:

Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.9-eks-d1db3c", GitCommit:"d1db3c46e55f95d6a7d3e5578689371318f95ff9", GitTreeState:"clean", BuildDate:"2020-10-20T22:18:07Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

Environment: bitnami/airflow: 2.0.2

  • Cloud provider or hardware configuration: AWS, EKS K8 cluster
  • OS : Debian GNU/Linux 10 (buster)
  • Kernel : Linux airflow-scheduler-64d8c676ff-h5zkk 4.14.209-160.339.amzn2.x86_64 #1 SMP Wed Dec 16 22:44:04 UTC 2020 x86_64 GNU/Linux
  • Install tools: helm 3.2.4
  • Others: KubernetesExecutor

What happened:

When a DAG run is deleted through the Web UI, the corresponding entries in the task_instance table (Airflow DB) are not deleted.

I noticed this whilst testing logging to S3. When the DAG runs the first time, the logs are generated and written successfully to S3 (and shown in the Airflow UI).
When the DAG run is deleted and the DAG runs again, the logs are not generated.

After some debugging, I noticed that deleting the DAG run through the UI doesn't delete the entries for its tasks in the task_instance table. Deleting those entries manually, before the DAG runs again, fixed the logging problem (logs are written to S3 and the UI again).
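
To make the observation above concrete, here is a minimal sketch of how the leftover rows could be found and removed. It runs against an in-memory SQLite database with a deliberately simplified schema: only the `dag_run` and `task_instance` table names come from this report, while the columns shown, the `example_dag` DAG, the `extract` and `load` tasks, and the helper function names are assumptions made for illustration. The real Airflow metadata tables have many more columns, so treat this as a model of the behaviour, not as a query to run unchanged against a production database.

```python
import sqlite3

# Simplified, hypothetical schema: the real Airflow tables have more columns.
SCHEMA = """
CREATE TABLE dag_run (
    dag_id TEXT NOT NULL,
    execution_date TEXT NOT NULL,
    state TEXT,
    PRIMARY KEY (dag_id, execution_date)
);
CREATE TABLE task_instance (
    task_id TEXT NOT NULL,
    dag_id TEXT NOT NULL,
    execution_date TEXT NOT NULL,
    state TEXT,
    PRIMARY KEY (task_id, dag_id, execution_date)
);
"""


def make_demo_db():
    """Create an in-memory database with one DAG run and two task instances."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO dag_run VALUES ('example_dag', '2021-05-13', 'success')"
    )
    conn.executemany(
        "INSERT INTO task_instance VALUES (?, 'example_dag', '2021-05-13', 'success')",
        [("extract",), ("load",)],
    )
    return conn


def find_orphaned_task_instances(conn):
    """Return task instances whose (dag_id, execution_date) has no DAG run."""
    return conn.execute(
        """
        SELECT ti.dag_id, ti.execution_date, ti.task_id
        FROM task_instance AS ti
        LEFT JOIN dag_run AS dr
          ON dr.dag_id = ti.dag_id
         AND dr.execution_date = ti.execution_date
        WHERE dr.dag_id IS NULL
        ORDER BY ti.dag_id, ti.execution_date, ti.task_id
        """
    ).fetchall()


def delete_task_instances(conn, dag_id, execution_date):
    """Delete the task instances of one run; return how many rows were removed."""
    cur = conn.execute(
        "DELETE FROM task_instance WHERE dag_id = ? AND execution_date = ?",
        (dag_id, execution_date),
    )
    return cur.rowcount


if __name__ == "__main__":
    conn = make_demo_db()
    # Model the UI action: only the dag_run row is removed.
    conn.execute("DELETE FROM dag_run WHERE dag_id = 'example_dag'")
    print("orphaned:", find_orphaned_task_instances(conn))
    print("removed:", delete_task_instances(conn, "example_dag", "2021-05-13"))
    print("orphaned after cleanup:", find_orphaned_task_instances(conn))
```

The left join keeps every task instance and leaves the `dag_run` side empty where no matching run exists, which is how the orphaned rows are picked out. The delete helper mirrors the manual cleanup described above, scoped to a single DAG and execution date.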

What you expected to happen:

After a DAG run is deleted through the UI and the DAG runs again, the logs for the tasks in that run should be written to the destination that has been set up.

How to reproduce it:

  1. Run a DAG once
  2. Check that the logs have been written
  3. Delete the DAG run through the UI
  4. Let the DAG run again
  5. Check that no log has been written for the new run

darthale added the kind:bug label May 13, 2021

@boring-cyborg

boring-cyborg bot commented May 13, 2021

Thanks for opening your first issue here! Be sure to follow the issue template!

bbovenzi added the area:webserver label May 13, 2021

@kaxil
Member

kaxil commented May 13, 2021

If you only want to delete the DagRun in order to re-run it, "Clear" it instead.

@darthale
Author

Thanks for the hint, @kaxil. Can you confirm that the behaviour above is intended, though?

Anyway, whilst clearing the DagRun as you suggested, I'm facing this: #14265.

I'll remove the timeout for my DAGs for the time being.

@kaxil
Member

kaxil commented May 15, 2021

@darthale It is indeed the intended behaviour, i.e. deleting a DagRun does not delete its TaskInstances.

kaxil closed this as completed May 15, 2021

@darthale
Author

Cool, thanks @kaxil
