Replies: 1 comment
Converted to a discussion. Most likely the Python code executed in the task crashes (perhaps due to a bug in a C library). In that case the details need to be looked for in the logs of the Celery worker, not the task, because if your Python code crashes in low-level C code, there is no way to send information back from the worker.
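The point about low-level C crashes can be illustrated with a small standalone sketch (this is not Airflow's actual `_execute_in_fork`, just the general mechanism): when a forked child dies from a signal such as SIGSEGV, the parent only ever sees a wait status, never a Python traceback, so there is nothing to report back to the task.

```python
import os
import signal

def run_in_fork(fn):
    """Run fn in a forked child and return the raw wait status."""
    pid = os.fork()
    if pid == 0:
        # Child: a Python-level exception can still be turned into an
        # exit code, but a segfault in C code kills the process before
        # any of this error handling gets a chance to run.
        try:
            fn()
            os._exit(0)
        except BaseException:
            os._exit(1)
    _, status = os.waitpid(pid, 0)
    return status

# Simulate a crash in low-level C code by raising SIGSEGV in the child.
status = run_in_fork(lambda: os.kill(os.getpid(), signal.SIGSEGV))
print(os.WIFSIGNALED(status))                     # True: killed by a signal
print(os.WTERMSIG(status) == signal.SIGSEGV)      # True: it was SIGSEGV
```

All the parent can log is the terminating signal number, which is why the worker's own logs (and the container's stderr/dmesg) are the only place the crash leaves a trace.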
Apache Airflow version
Other Airflow 2 version (please specify below)
If "Other Airflow 2 version" selected, which one?
v2.9.1
What happened?
I have 20 EC2 worker instances. On each EC2 instance I run, in Docker, a worker container and the triggerer container.
I start 100 DAGs at the same time, at 5:00 UTC.
Very often I see some tasks get scheduled and then killed/removed or never started.
Something is failing, but I can't see why. In the UI there is an attempt to run _execute_in_fork which fails/times out.
I checked on the workers: there is enough memory, CPU, disk space, and file descriptors; there is no limit on the number of processes, and no maximum fork count is reached.
Any ideas what it might be?
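As a side note, the per-process limit checks mentioned above can be scripted and run from inside the worker container; a minimal sketch using Python's standard `resource` module (limit names are the standard POSIX ones, availability may vary by platform):

```python
import resource

# Inspect the limits most relevant to fork/file-descriptor exhaustion.
for name, limit in [
    ("open files (RLIMIT_NOFILE)", resource.RLIMIT_NOFILE),
    ("processes/threads (RLIMIT_NPROC)", resource.RLIMIT_NPROC),
    ("address space (RLIMIT_AS)", resource.RLIMIT_AS),
]:
    soft, hard = resource.getrlimit(limit)
    print(f"{name}: soft={soft} hard={hard}")
```

Running this inside the same container as the worker matters, because Docker can impose different limits than the host shell shows.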
On the server/scheduler:
On the worker, in the logs of the worker process (in a Docker container):
And in Celery, in the UI:
What you think should happen instead?
No response
How to reproduce
No idea. It happens randomly. I tried increasing the number of workers to see if I was hitting some limit, but I still have the error.
Operating System
Linux
Versions of Apache Airflow Providers
the docker version
Deployment
Other Docker-based deployment
Deployment details
A master node on an EC2 instance, with many workers each on their own EC2 instance.
Airflow on the master runs in a container.
The workers also run Airflow in containers.
Anything else?
No response
Are you willing to submit PR?
Code of Conduct