[BUG] Ephemeral Storage request is not respected #4965
Comments
Thank you for opening your first issue here! 🛠 |
🤖 Based on the changes made between Flyte 1.10.0 and Flyte 1.10.7, it seems like there has been a shift in how ephemeral storage is handled. Specifically, the term These changes could potentially be the reason why your ephemeral storage request is not being respected. However, without more information, it's hard to say for certain. To help troubleshoot this issue, could you please provide the following information:
This information will help us understand the context in which the task is being executed and why the ephemeral storage request might not be respected. In the meantime, you might want to consider downgrading to Flyte 1.10.0 if that version was working as expected for your use case. Please note that this is just a workaround and not a long-term solution. We'll continue to investigate this issue to find a proper fix. For more information about resource configuration in Flyte, you can refer to the Flyte documentation. |
It's fixed here. |
Hi @Future-Outlier, thanks for your quick response. I am addressing my response to the questions asked by the bot to you.
We already downgraded, and since then everything works fine again. But it is sad that we cannot update to a more recent version. Looking at the piece of code that generates the logs, I wonder why the resource requests from our task are overwritten at all. That should not happen, and I am concerned that it will still happen even with #4929 |
If you configured ephemeral storage with no limit, did you restart the Flyte cluster deployment afterwards? |
@Future-Outlier, I may not understand your last statement correctly, but what we did was limit the ephemeral storage in a task decorator to 100 GB and then start the workflow with pyflyte. In the end, the limit of the Kubernetes container was set to only 20 MiB. Fully lifting the limit on ephemeral storage is not an option for us. |
@robert-ulbrich-mercedes-benz, I mean that did you
|
Hi @Future-Outlier, it is hard for me to understand what your actual point is. It would make things easier for me if you could provide a few more sentences about your intentions. We deployed Flyte using the Flyte Core Helm Chart, so we do not have a flyte-sandbox deployment that I could restart. Which exact config map are you referring to? There are quite a lot of them in the Flyte Core Helm deployment. I could find the
This is the config on Flyte v1.10.7 in our dev environment. The question again is: Why is our config provided in the task overwritten by the defaults? It should rather be the other way around in my eyes |
Hi, @robert-ulbrich-mercedes-benz sorry for the misunderstanding. |
Hi @Future-Outlier, okay, I will request help in the Slack channel then. |
@robert-ulbrich-mercedes-benz , I was not able to repro the issue using the example in the description, only if I used pod templates. For example, let's say a task
If the configured default ephemeral storage is set to any value, then that's what's used (since that's what passes the validation). |
Hi @eapolinario, we also have default pod templates configured for our workflows. But those pod templates do not configure default ephemeral storage. But as mentioned in this ticket, there are task resource defaults. As mentioned we can easily reproduce the issue. Best regards Rob Ulbrich |
@robert-ulbrich-mercedes-benz , just to be clear, we have an outstanding bug in Flyte that essentially does not validate ephemeral storage values in pod templates. The moment we added a default value for ephemeral storage as a task resource default, that value was the one used by the task resource validation, regardless of the original value defined in the pod template. Just to be crystal clear, let's say we have this task:
These are the values as shown in the registered task template: And task resource defaults are defined as such:
Notice that we're just using the (nonsensical) values defined in 1.10.7. Upon running that task
The values for This bug is being fixed by #5019. After that's released we'll see an error during registration if the values defined in the pod template do not pass task validation. FYI: we are planning a release, 1.11.0, for next week. |
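The behaviour described in the comment above can be modelled with a small toy function. This is only a sketch of the reported symptom, not Flyte's actual validation code: the function name `effective_ephemeral_storage` and its parameters are hypothetical, and the `20Mi` default is an assumption taken from the pod limit reported in the issue description.

```python
from typing import Optional


def effective_ephemeral_storage(
    pod_template_value: Optional[str],
    task_resource_default: Optional[str],
) -> Optional[str]:
    """Toy model of the reported behaviour (hypothetical, not Flyte code).

    As described in the thread, once a task resource default exists for
    ephemeral storage, that default is the value used, regardless of what
    the pod template asked for.
    """
    if task_resource_default is not None:
        # The configured default silently replaces the pod template value.
        return task_resource_default
    # Without a configured default, the pod template value is kept.
    return pod_template_value


# Values borrowed from the issue description for illustration only.
reported = effective_ephemeral_storage("100Gi", "20Mi")
without_default = effective_ephemeral_storage("100Gi", None)
print(reported, without_default)
```

In this model the explicit `100Gi` only survives when no default is configured, which matches the observation that the problem appeared once a default for ephemeral storage was introduced.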
@robert-ulbrich-mercedes-benz can you confirm this is fixed in your environment? |
Hey @davidmirror-ops, yes, we do not see this issue anymore. It is fixed in our environment. |
Describe the bug
When submitting a task with an ephemeral storage request set (e.g. to 100Gi), the request is carried over into the task definition (see the first attached screenshot).
Nevertheless, the Kubernetes pod created from this request only sets a limit of 20Mi for the ephemeral storage (see the second attached screenshot).
Expected behavior
The Kubernetes pod should have a limit on ephemeral storage equal to the requested amount of storage.
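To make the size of the mismatch concrete, the two quantities from the report can be converted to bytes and compared. The helper below is a minimal sketch written for this illustration; `quantity_to_bytes` is a hypothetical name, it is not part of Flyte or any Kubernetes client library, and it only handles whole-number quantities with a few common suffixes.

```python
# Hypothetical helper for illustration; handles only integer quantities
# with the suffixes listed here, not the full Kubernetes quantity grammar.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
DECIMAL_SUFFIXES = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a storage quantity such as '100Gi' or '20Mi' to bytes."""
    quantity = quantity.strip()
    # Check the two-letter binary suffixes before the one-letter decimal ones.
    for suffixes in (BINARY_SUFFIXES, DECIMAL_SUFFIXES):
        for suffix, factor in suffixes.items():
            if quantity.endswith(suffix):
                return int(quantity[: -len(suffix)]) * factor
    # No suffix: the value is already a plain byte count.
    return int(quantity)


requested = quantity_to_bytes("100Gi")  # value set on the task
applied = quantity_to_bytes("20Mi")     # limit observed on the pod
ratio = requested // applied
print(requested, applied, ratio)
```

With these inputs the pod limit is several thousand times smaller than the requested amount, so a task that actually needs its requested scratch space would exceed the pod's limit almost immediately.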
Additional context to reproduce
Screenshots