Introduce carbon aware scaler #3467
Comments
I'm leaning towards using the Green Web Foundation's Go SDK, but open to thoughts.
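To make that concrete, here's a rough sketch of what a scaler's activation check against such an SDK could look like. The `CarbonIntensityProvider` interface and its method are stand-ins of my own for illustration, not the actual grid-intensity-go API:

```go
package main

import (
	"context"
	"fmt"
)

// CarbonIntensityProvider is a stand-in for whatever interface the
// grid-intensity-go SDK exposes; the real API may differ.
type CarbonIntensityProvider interface {
	// GetCarbonIntensity returns grid carbon intensity (gCO2e/kWh)
	// for a region such as "eu-west-1".
	GetCarbonIntensity(ctx context.Context, region string) (float64, error)
}

// staticProvider is a trivial fake so the sketch runs standalone.
type staticProvider struct{ intensity float64 }

func (s staticProvider) GetCarbonIntensity(_ context.Context, _ string) (float64, error) {
	return s.intensity, nil
}

// isActive mimics a KEDA scaler activation check: scale the workload out
// only while grid intensity is below the user-configured threshold.
func isActive(ctx context.Context, p CarbonIntensityProvider, region string, threshold float64) (bool, error) {
	intensity, err := p.GetCarbonIntensity(ctx, region)
	if err != nil {
		return false, err
	}
	return intensity < threshold, nil
}

func main() {
	p := staticProvider{intensity: 180}
	active, _ := isActive(context.Background(), p, "eu-west-1", 200)
	fmt.Println(active) // true: intensity below threshold, safe to scale out
}
```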
I like this idea! A couple of questions:
It depends on the workload, but some "secondary" / low-priority workloads can simply be scaled down in a given geo if the impact on the environment is too high. This is not specifically a multi-cluster scenario.
That's up to the end-user; the triggers are specific to a ScaledObject and thus apply on a per-workload basis. So it's up to you to choose what makes sense and what does not. For example, workloads that require a GPU could be scaled down while less demanding workloads continue to run.
@tomkerkhove Thanks for adding this and improving the design. This way we can implement this scaler sooner and combine it with more scalers once the OR support is in place.
Of course we'd love it if you can use the SDK! Just a heads up that we need to make some breaking changes in thegreenwebfoundation/grid-intensity-go#44 to be able to support more providers of carbon intensity data. I'm working on the changes and they should be done soon, so I hope they won't be disruptive.
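For context on what "supporting more providers" could mean structurally, here's one plausible shape, a small registry of provider constructors. The names are illustrative only; the real SDK's types will differ:

```go
package main

import "fmt"

// Provider is a hypothetical common interface that multiple carbon
// intensity data sources could implement.
type Provider interface {
	Name() string
}

type emberProvider struct{}

func (emberProvider) Name() string { return "ember" }

// registry maps a provider name to its constructor, so callers can pick
// a data source by configuration rather than by hard-coded type.
var registry = map[string]func() (Provider, error){
	"ember": func() (Provider, error) { return emberProvider{}, nil },
	// further providers would register here
}

func newProvider(name string) (Provider, error) {
	ctor, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("unknown carbon intensity provider %q", name)
	}
	return ctor()
}

func main() {
	p, err := newProvider("ember")
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Name())
}
```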
Correct!
Good to know, thanks for sharing! If you are contributing to the SDK, are you willing to contribute the scaler as well?
Hi @rootfs, I've been working with @rossf7 on the grid-intensity-go SDK. I've tried to provide some more background to the answers.
The above example works by moving workloads geographically (as in, it moves them through space). You can also move workloads temporally (as in, move them through time). The carbon intensity changes based on the time of day, so the same workload run at different times will have different emissions figures. The issue referred to one paper titled Let's Wait Awhile: How Temporal Workload Shifting Can Reduce Carbon Emissions in the Cloud, and it's a fun read, going into this in more detail. Last month, at the ACM SIGEnergy workshop, there was a talk from some folks at VMware sharing some new findings, called Breaking the Barriers of Stranded Energy through Multi-cloud and Federated Data Centers. It's really worth a watch, and the abstract gives an idea of why the time element is worth being able to act upon.
There's also some work by Facebook/Meta, where they have shared some results from using this same carbon aware workload scheduling as part of their sustainability strategy - see their recent carbon explorer repo. I think they might use their own scheduler, rather than Kubernetes, but the principle is the same - move work through space to make the most of cheaper green energy for your compute.
For the suitability question, that's down to the person running the cluster, and the job. Some jobs are better fits for moving through time (low latency, pause-able jobs), and some jobs better for moving through space (ones that don't have to be run within a specific jurisdiction). These are somewhat independent of the energy consumption. If you're curious about the energy consumption part, I think Scaphandre provides some numbers you can use and labelling of jobs for k8s, and this piece here from the BBC gives an example of it in use. Hope that helps!
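To make the temporal-shifting idea concrete, here's a toy sketch that picks the lowest-intensity hour within a job's deadline window. The forecast values and function are invented for illustration; a real implementation would pull forecasts from a grid data provider:

```go
package main

import "fmt"

// bestStartHour picks the hour with the lowest forecast carbon intensity
// within the job's deadline window.
func bestStartHour(forecast []float64, deadlineHours int) int {
	best := 0
	for h := 1; h < deadlineHours && h < len(forecast); h++ {
		if forecast[h] < forecast[best] {
			best = h
		}
	}
	return best
}

func main() {
	// Hourly gCO2e/kWh forecast: intensity drops overnight as wind output rises.
	forecast := []float64{420, 390, 350, 300, 240, 210, 260, 330}
	fmt.Printf("start batch job in %d hours\n", bestStartHour(forecast, 8)) // 5
}
```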
@tomkerkhove Yes definitely, I'd like to contribute the scaler. We need to finish up the SDK changes and some other dev work, but I should be able to start on this later in the month.
After discussing with @vaughanknight & @yelghali I've noticed that my proposal for just having a trigger does not make much sense, because it would scale straight from min to max replicas given that the emission does not change that often. Instead, I'm wondering if we should not make this part of the ScaledObject spec itself. Imagine the following:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: {scaled-object-name}
spec:
  scaleTargetRef:
    name: {name-of-target-resource} # Mandatory. Must be in the same namespace as the ScaledObject
  maxReplicaCount: 100 # Optional. Default: 100
  environmentalImpact:
    carbon:
    - measuredEmission: 5%
      allowedMaxReplicaCount: 50
    - measuredEmission: 10%
      allowedMaxReplicaCount: 10
  fallback: # Optional. Section to specify fallback options
    failureThreshold: 3 # Mandatory if fallback section is included
    replicas: 6 # Mandatory if fallback section is included
  triggers:
  # {list of triggers to activate scaling of the target resource}
```

This allows end-users to define how their application should scale based on its needs by defining triggers. If they want to control how it should adapt based on the carbon emission, then they can define the `environmentalImpact.carbon` section.

So if the emission is 5%, then the maximum replica count of 100 is overridden to 50, and if the emission is lower than 5%, then it will go back to 100 max replicas.

Any thoughts on this @rossf7 / @zroubalik / @JorTurFer?
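To pin down the semantics of that proposal, here's a minimal sketch of how the override could be evaluated. The types mirror the YAML above but are my own assumption, not an actual KEDA implementation:

```go
package main

import "fmt"

// CarbonRule mirrors one entry of the proposed environmentalImpact.carbon
// list: at or above measuredEmission, cap replicas at allowedMaxReplicaCount.
// (Hypothetical types; not part of KEDA today.)
type CarbonRule struct {
	MeasuredEmission       float64 // threshold, in percent
	AllowedMaxReplicaCount int32
}

// effectiveMaxReplicas returns the max replica count after applying the
// carbon rules: the most restrictive rule whose threshold is met wins,
// and if no rule matches, the spec's maxReplicaCount stands.
func effectiveMaxReplicas(specMax int32, emission float64, rules []CarbonRule) int32 {
	result := specMax
	for _, r := range rules {
		if emission >= r.MeasuredEmission && r.AllowedMaxReplicaCount < result {
			result = r.AllowedMaxReplicaCount
		}
	}
	return result
}

func main() {
	rules := []CarbonRule{
		{MeasuredEmission: 5, AllowedMaxReplicaCount: 50},
		{MeasuredEmission: 10, AllowedMaxReplicaCount: 10},
	}
	fmt.Println(effectiveMaxReplicas(100, 3, rules))  // 100: below every threshold
	fmt.Println(effectiveMaxReplicas(100, 5, rules))  // 50
	fmt.Println(effectiveMaxReplicas(100, 12, rules)) // 10: both rules match, most restrictive wins
}
```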
Should we do this instead of a carbon aware scaler? No. But I think that one only makes sense once we do #3567, and with the above proposal we don't need a trigger for it anymore.
I think it would make sense to have both features.
A proposal: use both the "Core Carbon Awareness" proposed above and the "Carbon Aware Trigger".
In terms of adoption, I think the "Core Carbon Awareness" is simpler to adopt because it does not require customers / companies to have power telemetry available (which only a few have, as of now). On the other hand, the "Carbon Aware Scaler" is also interesting because it offers actual power / carbon metrics for the workloads, and it would fit with the AND / OR logic with other scalers. PS: I have a suggestion for the fields / usage for the "Core Carbon Awareness" feature.
We can add this, but before we start building scalers we'd need to be sure what they look like, as once a scaler is added we can't simply introduce breaking changes. However, if my above proposal is agreed on then we can open a separate issue for it.
I think this is something we can document as details though, no need to be that verbose IMO. We can rename it to
Agreed, the scaler can be the next step. The proposal above has value and would be easy to build.
We need to keep in mind that KEDA users are exposing their services to end-users. The end-user, in the end, wants quality of service (shareholders too). We can justify a lower quality of service for a certain period of time, but the service needs to remain usable. So limiting the number of replicas to a fixed value does not seem appropriate to me at all. It would seem more relevant to me to apply a relative decline to the scaling rule, not an absolute cap. Imagine the following:

```yaml
spec:
  ...
  environmentalImpact:
    carbon:
    - measuredIntensity: 400
      reducedReplicaPercent: 50%
    - measuredIntensity: 200
      reducedReplicaPercent: 25%
    - measuredIntensity: 50
      reducedReplicaPercent: 10%
  triggers:
  ...
```

In that example, when the measured carbon intensity reaches 50, the replica count computed by the triggers is reduced by 10%; at 200, by 25%; and at 400, by 50%. The workload keeps scaling with demand, just at a reduced level.
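A similar sketch for this relative-decline variant, again with hypothetical types; the key difference is that the reduction scales with whatever replica count the triggers computed, rather than imposing a fixed ceiling:

```go
package main

import (
	"fmt"
	"math"
)

// IntensityRule mirrors one entry of the suggested carbon list: at or above
// measuredIntensity (gCO2e/kWh), shrink the desired replica count by
// reducedReplicaPercent. (Hypothetical types; not part of KEDA today.)
type IntensityRule struct {
	MeasuredIntensity     float64
	ReducedReplicaPercent float64
}

// applyReduction scales the trigger-computed desired replicas down by the
// percentage of the highest matching intensity threshold.
func applyReduction(desired int32, intensity float64, rules []IntensityRule) int32 {
	var pct float64
	for _, r := range rules {
		if intensity >= r.MeasuredIntensity && r.ReducedReplicaPercent > pct {
			pct = r.ReducedReplicaPercent
		}
	}
	return int32(math.Ceil(float64(desired) * (1 - pct/100)))
}

func main() {
	rules := []IntensityRule{
		{MeasuredIntensity: 400, ReducedReplicaPercent: 50},
		{MeasuredIntensity: 200, ReducedReplicaPercent: 25},
		{MeasuredIntensity: 50, ReducedReplicaPercent: 10},
	}
	fmt.Println(applyReduction(40, 450, rules)) // 20: 50% reduction
	fmt.Println(applyReduction(40, 250, rules)) // 30: 25% reduction
	fmt.Println(applyReduction(40, 30, rules))  // 40: no reduction
}
```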
A proposal to donate AKS's carbon aware operator is open on #4463 |
Proposal
Provide a carbon aware scaler that allows end-users to scale based on their impact on the environment.
As per @rossf7 on #3381:
Also:
Use-Case
Automatically scale workloads out while the impact on the environment is low, scale in if the impact is too high.
This is useful for batch-like workloads.
Anything else?
Relates to #3381
Related to our collaboration with the Environmental Sustainability TAG/WG (kedacore/governance#59)