airflow logging to S3 #34
@adib-next are you following the steps in the docs?
Yes, I have added the configuration below and the logs are being written to S3, but it is not stable: we sometimes get errors like the screenshot I shared, where the log file does not exist. Any idea why this is happening? (My values include the `airflow`, `scheduler`, `web`, and `workers` sections.)
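For context, S3 remote logging in the Airflow 1.10.x era is usually enabled with config along the lines of the sketch below. This is not the poster's actual snippet: the bucket path is a placeholder, and the `airflow.config` environment-variable map is assumed to be available in the chart's values.

```yaml
# Hedged sketch, not the poster's values: enabling S3 remote logging for
# Airflow 1.10.x through the chart's `airflow.config` map.
airflow:
  config:
    AIRFLOW__CORE__REMOTE_LOGGING: "True"
    AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: "s3://my-airflow-logs/logs"  # hypothetical bucket
    AIRFLOW__CORE__REMOTE_LOG_CONN_ID: "aws_default"                    # the connection discussed later in the thread
```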
@adib-next I think it could be that if the worker pod crashes before uploading the logs to S3, they will be lost (as they are only stored on the temp disk of the pod until they are uploaded).
@thesuperzapper any recommendation on how to prevent or minimize this issue?
@adib-next, I might be encountering the same problem you are these days. I'm speculating that if you set this up in your values, you'll end up seeing your pods fail. From my experience so far, the task instance pod (if you're rolling with KubernetesExecutor) is falling into a really unfortunate trap: ultimately, your task instance pods don't run because the security context for the task instance pod isn't set to be able to read the web identity token file (more on that in AWS's documentation). Right now, I'm trying to figure out a way to give every pod spawned by the KubernetesExecutor a security context so it can actually read that token and assume the correct IAM role (in your and my case, for logging). @adib-next, did you end up having any luck in the time since?
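For anyone hitting the same wall, here is a minimal sketch (not the poster's actual manifest) of the kind of pod security context that makes the projected IRSA token readable by a non-root container; the pod name, image, and the group id 65534 are assumptions.

```yaml
# Hedged sketch: set an fsGroup on the pod security context so a non-root
# container can read the projected web identity token mounted by IRSA.
apiVersion: v1
kind: Pod
metadata:
  name: example-task-pod              # hypothetical name
spec:
  securityContext:
    fsGroup: 65534                    # assumption: any group the container user belongs to should work
  containers:
    - name: base
      image: apache/airflow:1.10.14   # assumption based on the versions discussed above
```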
Ended up getting it to work. For Airflow 1.10.14 and lower, setting the `[kubernetes] fs_group` option results in the fs_group being set for the containers in the worker pods. Not sure if that's a viable solution for you, but it unblocked me in actually getting the pods to run. Documentation here for 1.10.14: https://airflow.apache.org/docs/apache-airflow/1.10.14/configurations-ref.html#fs-group
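A minimal sketch of how that option might be set through chart values, assuming the chart exposes an `airflow.config` map of environment variables and the usual `AIRFLOW__SECTION__KEY` naming; the group id is a placeholder.

```yaml
# Hedged sketch: set the [kubernetes] fs_group option so worker/task pods
# get an fsGroup and can read the projected IRSA web identity token.
airflow:
  config:
    AIRFLOW__KUBERNETES__FS_GROUP: "65534"   # placeholder; pick a group your container user belongs to
```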
Can @adib-next or @johngtam clarify if this is still an issue after the version 8 chart release?
@adib-next or @johngtam any updates?
@thesuperzapper, I haven't given this a shot yet, but hope to do so soon! Really excited about version 8 of this Helm chart :)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@thesuperzapper, ended up upgrading to both Airflow 2 and the version 8 chart here. Unfortunately, this ended up breaking logging :/. Looking at the logs, I see that the conn_id 'aws_default' isn't defined. I'm using the KubernetesExecutor. However, in my Helm chart I do have the remote logging values set (with some templating). Do you know why I might be getting this error?
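For reference, on Airflow 2 the remote logging settings live under the `[logging]` section instead of `[core]`. Below is a hedged sketch of such values; the bucket path and the `airflow.config` map are assumptions, not the poster's actual snippet.

```yaml
# Hedged sketch, not the poster's values: Airflow 2 moved remote logging
# settings from the [core] section to [logging].
airflow:
  config:
    AIRFLOW__LOGGING__REMOTE_LOGGING: "True"
    AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER: "s3://my-airflow-logs/logs"  # hypothetical bucket
    AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID: "aws_default"                    # must exist as an Airflow connection
```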
D'oh, all I had to do was define aws_default in the connections. I'm able to get the logs.
@johngtam, good to hear. For those finding this, please look at the chart docs for creating connections from your values.
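A minimal sketch of what declaring `aws_default` in the values might look like; the `airflow.connections` schema shown here is an assumption about the version 8 chart, and the region is a placeholder. No access keys are set because IRSA provides the credentials.

```yaml
# Hedged sketch: declare the aws_default connection via chart values so the
# S3 log handler can find it; the pods authenticate through IRSA instead of
# static access keys.
airflow:
  connections:
    - id: aws_default
      type: aws
      description: AWS connection used for S3 remote logging
      extra: '{"region_name": "eu-central-1"}'   # placeholder region
```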
@johngtam how did you configure the connection? Because when using EKS IAM roles for service accounts, we don't want to configure `aws_access_key_id` and `aws_secret_access_key`.
@Legion2 the "How to persist airflow logs?" FAQ explains how to use EKS IAM Roles for Service Accounts (IRSA) for authorization.
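For anyone who lands here, the IRSA side boils down to annotating the pods' service account with an IAM role ARN. Below is a sketch assuming the chart exposes a `serviceAccount.annotations` value; the account id and role name are placeholders.

```yaml
# Hedged sketch: attach an IAM role to the Airflow pods' service account so
# the S3 log handler can authenticate without static access keys.
serviceAccount:
  create: true
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::111122223333:role/airflow-logging"  # placeholder ARN
```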
Yes, I used IRSA (IAM Roles for Service Accounts)! That worked out nicely for me.
@thesuperzapper the FAQ does not explain how to configure the connection in that case.
@Legion2 thanks for highlighting that the docs could be clearer! I have updated the relevant docs to make this easier to understand. Please tell me if you think there is still room for improvement.
@thesuperzapper thanks, that documentation is much better.
We have been encountering some issues with logging to S3. It seems like, in some cases where the DAG fails, the log is missing as well and we get an error. Has anyone come across this issue? Any suggestions?