Replies: 1 comment
Converted to a discussion. Most likely the Python code executed in the task crashes (perhaps due to a bug in a C library). In that case the details need to be looked for in the logs of the Celery worker, not the task, because if your Python code crashes in low-level C code, there is no way to send information back from the worker.
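The point about low-level C crashes can be illustrated with a small standalone sketch (this is not Airflow's actual `_execute_in_fork`, just the general mechanism): when a forked child dies from a signal such as SIGSEGV, the parent only ever sees a wait status, never a Python traceback, so there is nothing to report back to the task.

```python
import os
import signal

def run_in_fork(fn):
    """Run fn in a forked child and return the raw wait status."""
    pid = os.fork()
    if pid == 0:
        # Child: a Python-level exception can still be turned into an
        # exit code, but a segfault in C code kills the process before
        # any of this error handling gets a chance to run.
        try:
            fn()
            os._exit(0)
        except BaseException:
            os._exit(1)
    _, status = os.waitpid(pid, 0)
    return status

# Simulate a crash in low-level C code by raising SIGSEGV in the child.
status = run_in_fork(lambda: os.kill(os.getpid(), signal.SIGSEGV))
print(os.WIFSIGNALED(status))                     # True: killed by a signal
print(os.WTERMSIG(status) == signal.SIGSEGV)      # True: it was SIGSEGV
```

All the parent can log is the terminating signal number, which is why the worker's own logs (and the container's stderr/dmesg) are the only place the crash leaves a trace.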
Apache Airflow version
Other Airflow 2 version (please specify below)
If "Other Airflow 2 version" selected, which one?
v2.9.1
What happened?
I have 20 EC2 worker instances. On each EC2 instance I run, in Docker, a worker container and the triggerer container.
I start 100 DAGs at the same time, at 5:00 UTC.
Very often I see some tasks get scheduled and then killed/removed or never started.
Something is failing, but I can't see why. In the UI there is an attempt to run _execute_in_fork which fails/times out.
I checked on the workers: there is enough memory, CPU, disk space, and file descriptors; there is no limit on the number of processes, and no maximum fork count is reached.
Any ideas what it might be?
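As a side note, the per-process limit checks mentioned above can be scripted and run from inside the worker container; a minimal sketch using Python's standard `resource` module (limit names are the standard POSIX ones, availability may vary by platform):

```python
import resource

# Inspect the limits most relevant to fork/file-descriptor exhaustion.
for name, limit in [
    ("open files (RLIMIT_NOFILE)", resource.RLIMIT_NOFILE),
    ("processes/threads (RLIMIT_NPROC)", resource.RLIMIT_NPROC),
    ("address space (RLIMIT_AS)", resource.RLIMIT_AS),
]:
    soft, hard = resource.getrlimit(limit)
    print(f"{name}: soft={soft} hard={hard}")
```

Running this inside the same container as the worker matters, because Docker can impose different limits than the host shell shows.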
On the server/scheduler:
On the worker, in the logs of the worker process (in a Docker container):
And in Celery, in the UI:
What you think should happen instead?
No response
How to reproduce
No idea. It happens randomly. I tried increasing the number of workers to see if I was hitting some limit, but I still have the error.
Operating System
Linux
Versions of Apache Airflow Providers
the docker version
Deployment
Other Docker-based deployment
Deployment details
A master node on an EC2 instance, with many workers each on their own EC2 instance.
Airflow on the master runs in a container.
The workers also run Airflow in containers.
Anything else?
No response
Are you willing to submit PR?
Code of Conduct