-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add resource constraints for CEL #3144
Add resource constraints for CEL #3144
Conversation
keps/sig-api-machinery/2876-crd-validation-expression-language/README.md
Outdated
Show resolved
Hide resolved
CEL also provides a [cost subsystem](https://github.com/google/cel-go/blob/dfef54b359b05532fb9695bc88937aa8530ab055/cel/program.go#L309) that could be used in the future, | ||
but the cost subsystem would need to know the length of any relevant lists in order to be useful. | ||
That information can be supplied using `maxLength`, but this is an optional field, and if not | ||
passed, CEL would not be able to provide a useful figure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if it's useful, but we could set a max length of a list based on the 3MB request limit and the size of the most compact encoding possible:
- int / float lists =
3*1024*1024 / 2
(0,0,0,0,0,0,0,0,0,...
encoding) - string lists =
3*1024*1024 / 3
("","","","","",...
encoding) - list lists =
3*1024*1024 / 3
([],[],[],[],[],...
encoding) - object lists =
3*1024*1024 / 3
({},{},{},{},{},...
encoding)
I'm not sure if passing such big list sizes to CEL would make the cost subsystem go nuts or not, but we could check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'll give passing lists of that size through CEL a shot and see what happens. I think that list might work well to provide maxLength
with a default value, though I'm concerned that there will be a large discrepancy between what we'd estimate resource usage to be versus what we'd see in practice if we actually ran the expressions. I suspect any solution that works without maxLength
being explicitly set will run into the same issue, though.
For Beta, per-request execution time will be constrained via a context-passed timeout. For Beta, we are looking at a range between 100 milliseconds and 1 second for the timeout. We want to see how | ||
CEL performs in benchmarks once our Beta optimizations are in place before deciding what specific | ||
duration to set the timeout to. If the timeout is exceeded, the error message will include how | ||
long each expression took to evaluate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this has the downside of permitting costly CEL expressions that time out to be persisted in the CRD, and fail at CR request time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely - only having a timeout doesn't provide nearly as strong guarantees as estimating expression cost. I think if we can guarantee good cost estimation, we should go with it. I'm a little concerned about accuracy, since I feel there could be cases where an incorrect estimate is given, and an expression that should be allowed isn't permitted (or vice versa).
|
||
We will provide time and space benchmarks to show what can be expected in typical and worst-case | ||
scenarios for CEL expressions, and that for most use cases, the timeout alone will be sufficient. | ||
The alternatives listed below would be more aimed at malformed/malicious expressions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
guarding against malformed/malicious expressions is definitely in scope
cc @jiahuif |
`maxLength` on variable length data elements is generally considered good hygiene anyway, so having a cost | ||
system that incentivizes developers to set this field, has other benefits. | ||
|
||
We will use the minimum possible size of list elements to estimate the size (e.g. objects with many fields |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a couple edges to pay attention to:
- don't consider optional fields when determining the minimum item size (the request can omit them)
- don't consider required fields with a
default
specified when determining the minimum item size (the request can omit them)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good thinking, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(this doesn't need to be reflected in the KEP, but should be remembered for impl time)
keps/sig-api-machinery/2876-crd-validation-expression-language/README.md
Show resolved
Hide resolved
Also, not requiring `maxLength` on most O(n) CEL expressions keeps the cost system low friction; the majority | ||
of CEL expressions can be written and used without bumping into cost limits or needing to set `maxLength`. | ||
|
||
For the cases where a `maxLength` is needed to ensure cost is acceptable, encouraging CRD authors to include |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the incoming object is determined to be invalid according to the openapi schema (violates maxLength, or omits required fields in items that informed our calculated minimum size, etc), do we still evaluate cel expressions anyway? I think we currently evaluate CEL in addition to openapi validation (@jpbetz can confirm). We may need to change that if we're counting on maxLength or required fields in our cost calculations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point. We do run both openAPI validation and CEL validation. So at a minimum we would need to change the behavior and skip CEL expression validation when we fail maxLength validations that are relevant to that CEL expression. That needs to be stated here somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(this comment is non-blocking for the KEP update, but we'll need to resolve what we do here for the impl... I'm ok with seeing if the runtime cost bounds protect us sufficiently before committing to run or not run CEL rules in that scenario)
In this example the validation rule is contained within a list, so its expression cost would be `cost(startsWith) * worst_case_length(x)`. Note that this means some of the cost calculation will | ||
happen outside of the CEL cost function (which would only calculate the `startsWith` cost). | ||
|
||
During CRD validation, when an expression exceeds the cost limit, the error message will include both the limit and the cost of the expression to facilitate debugging. The cost calculation rules will be documented. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how confident are we in the stability of the cost calculations (the ones done in CEL and what we're planning to layer on top)? how will we ensure that we will not accept a CRD in one release and reject it in the future due to tweak in the cost heuristics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, will publishing the limit encourage construction of expressions that squeak in just under the limit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can do some things to boost our confidence. I would like to write a fuzzer that tries to find the most expensive CEL expression for a given "cost" limit. If we could get something like that working and show that we don't have any gapping discrepancies between the cost heuristic and reality, I think we could start to feel a bit more confident. That said, regex matching worries me.
re: expressions that squeak in just under the limit. Maybe? I'd like to see how developers respond to the ability to set maxLength, I suspect that the vast majority of cost problems will be most trivally solved by setting that to a value that is higher than they ever expect to actually hit, but still much lower than the max the cost heuristics would pick.
resource size limits (specifically the 3MB request limit) and we can use this to establish more manageable | ||
worst case sizes. | ||
|
||
If a list contains small elements (e.g. an int64) then the worst case list size can be 3MB/4bytes=~786k |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: 2 bytes per item, right? 0,0,0,0,0,...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's true, CRDs are stored in JSON. (Kermit and I were thinking about in-memory representation when writing this...)
|
||
If the per-expression timeout is exceeded, the error message will include how long the expression in question took to execute. If the per-request timeout is exceeded, the error message will instead include how long every expression took to execute up to and including the one where the timeout was exceeded. | ||
|
||
We will continue to refine and improve this cost data. For example, optimizations to how CEL integrates with Kubernetes data may allow us to increase our cost bounds. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are the compatibility implications of changing permitted cost bounds over time? if we tighten these in the future, we'll break existing CRDs. if we relax them, new CRDs that use the expanded capacity won't work on older versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's my understanding.
This portability issue applies to pretty much anything we change about CEL.
- per-expression cost: 120,000 | ||
- per-request cost: 12,000,000 | ||
- per-expression timeout limit: 100 milliseconds | ||
- per-request timeout limit: 1 second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should these be fixed or tunable by cluster admins? if tunable, should we define and exercise a minimum allowable expression and request cost in conformance to give CRD authors a portable target?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Portability I think is a big reason to keep the timeouts/cost limits fixed. Maybe in the future we could make these tunable if it turns out that sort of flexibility is something cluster admins need? It'll probably be easier to go from fixed > flexible than the other way around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm inclined to not make them configurable since:
- My intuition is that if a cost limit is hit, we want to incentivize developers to set maxLengths, not bump limits
- Portability is something we're already worried about and this would potentially make it a lot worse
- If we ping the timeout with the request timeout (like discussed in the above comments) I don't think we have any other need to make that configurable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use time limits, I don't think they should be tunable, it will make a CRD behave differently on different clusters.
|
||
We believe CEL cost will be sufficient to bound the resource utilization of the vast majority of CEL | ||
expressions. As a backstop we will use timeouts. We will set these timeouts high enough that they are | ||
not disruptive to all but the most extremely resource starved api-servers. We don't believe this will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dunno... I've seen some pretty resource-starved servers... if a server is starved and is context-switching between requests, is artificially failing the CEL evaluation at 100 ms or all expressions at 1 second helpful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be slightly better if this were counting CPU time and not wall time, but off the top of my head I'm not sure how to do that easily.
not disruptive to all but the most extremely resource starved api-servers. We don't believe this will | ||
cause problems in practice because there are other timeouts that will trigger under those conditions. | ||
|
||
Per-request and per-expression execution time will be constrained via go cancellation and a context-passed timeout. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at minimum, we should definitely constrain it to the request lifetime using the request context. I'm less certain about constraining it to 100ms/1s targets without knowledge of the server power and load
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'll be okay with just using the request context timeout so long as we can build enough confidence that the cost heuristic working reasonably well, which I think we need to do anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have an idea on how CEL is going to implement cancellation? Does it check if the context is cancelled in every step of loop, a few steps, or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
elements, but as the element size grows, the worst case list size quickly becomes manageable. We ran a [series of experiments](https://docs.google.com/document/d/1yR746Rf-rw-_zoq36Ypzu8LTqOa-LaCk1pv3e0kjF3A/edit?usp=sharing) to test the resource utilization of CEL expressions that iterate across lists. | ||
We found that: | ||
|
||
- O(n) iteration of a worst case list (list length == 786k) is <150ms |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does that mean an iteration of a list of integers would not fit in the proposed 100ms limit? edit: oh, I guess an actual list of integers in a 3MB CR...
also, I think worst-case is actually ~1572k (see above comment about items being 2 bytes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, there are some corner cases that wouldn't fit within the wall clock timeout, at least on the hardware tested. For instance, the 3-million-long list scenario in the performance doc. Though in that case iteration took nearly a full .8 seconds, hence the timeouts/cost limits being built at least in part over what would provide a good experience request-wise (as opposed to covering all CEL cases).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the integer case is a bit problematic that way (and you're correct that it's 2x what we had calculated, so it's cost estimate would be closer to a worse case of 300ms). I suspect we can get the expected latency down for this once we optimize some of our CEL type integration code. If we get this under 100ms, then I think we can get to the point where all O(n) ops can be within our cost limits. But there is some risk that this specific case won't, in which case we'd either need to make a special exception to allow it anyway (since most list in practice won't be nearly this bad), or require users set maxLength on the field in order to use CEL iteration operations on the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd be ok with basically any cel expression touching things within lists requiring a maxLength be specified on the list
not disruptive to all but the most extremely resource starved api-servers. We don't believe this will | ||
cause problems in practice because there are other timeouts that will trigger under those conditions. | ||
|
||
Per-request and per-expression execution time will be constrained via go cancellation and a context-passed timeout. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like that the time limit could make this inconsistent depending on how busy apiserver is. Does CEL offer a feature for counting up & limiting actual runtime cost? I'd really prefer a limit that was reproducible given the input object(s).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like that the time limit could make this inconsistent depending on how busy apiserver is.
agreed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems doable to me. At compile time we can compute a cost heuristic with incomplete information. Running a cost counter at evaluation time seems no harder, and maybe quite a bit easier. I'd be interested in exploring this. If we can get it working, I'd be in favor of using it instead of wall clock time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would strongly prefer this to a time-based approach if it is doable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, hrmm… I wasn't thinking about calculating cost per request. The worst case pre-calculation at CRD edit time seems more palatable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you should compute cost-per-request, I think you should give CEL an expression budget and let it halt if the expression goes over its budget.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A case that seems relevant to this discussion:
spec:
type: object
properties:
x:
type: array
items:
type: object
properties:
name: y
type: int
x-kubernetes-validation:
- rule: "self < 100"
Here the rule is cheap (I'm pretty sure the cost is 1). But it's contained in a list which might be huge so it might be evaluated many times.
(edit: If we pre-compute this cost at CRD update time, we were planning to multiply the expression cost with the worst case length of the array. If we instead give the expression a budget, should the budget be for all evaluations of that expression that happen during a request?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current alternatives I'm seeing:
- Proactive: We pre-compute the worst case costs of each expression when validation rules are written to a CRD. Expressions that are O(n) or worse will usually need to set maxLengths to bring their worst case cost down into range
- Reactive: We count cost at CEL expression runtime, accumulating all the cost that an expression has consumed and exiting early if the cost limit is hit (because this is at runtime APF can be used to pick limits?)
- We do both Proactive and Reactive (but with different limits?)
- We do one of the above but also with a per-request limit (to handle the case where a CRD has many expensive CEL expressions)
- We do one of the above but limit the total number of expressions allowed in a CRD (to handle the same situation as 4)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should keep two budgets, overall and per expression.
When running an expression, its budget is the minimum of the remaining overall budget and of the per-expression budget.
After running an expression, you should decrement whatever it used from the remaining overall budget.
(Assuming you can do this with CEL)
- per-expression cost: 120,000 | ||
- per-request cost: 12,000,000 | ||
- per-expression timeout limit: 100 milliseconds | ||
- per-request timeout limit: 1 second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unlike time spent on webhooks, time spent evaluating CEL expressions is likely to be very CPU intensive, so this sounds pretty high. This may require changes in e.g. APF configurations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do a runtime cost calculation in CEL like you suggested in https://github.com/kubernetes/enhancements/pull/3144/files#r798120466 and use that to bound CPU maybe this will be less bad?
|
||
We will continue to refine and improve this cost data. For example, optimizations to how CEL integrates with Kubernetes data may allow us to increase our cost bounds. | ||
|
||
Based on our current data we believe the limits should be: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a write request with both CEL expressions and webhooks, how many times will the CEL expressions be evaluated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's run with API server validation, so once per internal write attempt to etcd (if there's an etcd conflict that results in an internal retry, validateupdate(new,old)
runs again)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't validate before sending to webhooks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mutating admission (including webhooks) run first, then validation, then validating admission (including webhooks), then we write to etcd. if there's an etcd conflict and the request can be retried (patch request or unconditional update with no resourceVersion precondition) the whole chain runs again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks -- I cleared my entire cache while OOO :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/label tide/merge-method-squash
|
||
Based on our current data we believe the limits should be: | ||
|
||
- per-expression cost: 120,000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, do we know the unit of this cost?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it maps roughly to how many calls to Interpretable.Eval are made. Of course, different Eval
implementations will take up slightly different amounts of time (even on the same hardware), so I don't think a completely exact 1-to-1 mapping between cost units and time can be made.
message will include how much the limit was exceeded. | ||
|
||
A major problem with the cost system is assigning the cost of list iteration. The cost of a CEL expression is | ||
computed statically without any knowledge about the data that will be validated. Only the CEL expression is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And don't forget boolean short circuit. basically it became n-SAT Problem, which is NP Complete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. CEL is very constrained though. We're using a CEL runtime with a limit of 32 levels of expression nesting, 32 terms separated by ||
in a row, the expression can't define its own functions and has to fit within a CRD. I'm curious what the worst case CEL expression would be here? It might still be pretty expensive? If so we might want to tighten some bounds. It seems containable to me, but I'd love to see someone prove me wrong (or right).
not disruptive to all but the most extremely resource starved api-servers. We don't believe this will | ||
cause problems in practice because there are other timeouts that will trigger under those conditions. | ||
|
||
Per-request and per-expression execution time will be constrained via go cancellation and a context-passed timeout. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have an idea on how CEL is going to implement cancellation? Does it check if the context is cancelled in every step of loop, a few steps, or something else?
Based on our current data we believe the limits should be: | ||
|
||
- per-expression cost: 120,000 | ||
- per-request cost: 12,000,000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should feed this information into APF to throttle requests accordingly; this will incentivize authors to be stingy but also not prevent them from doing something expensive if it's truly necessary. May be worth mentioning as follow-up work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, I've amended the PR to include that APF will be taken into account for the finalized/refined bounds. I think we can definitely include APF as follow-up work.
/assign @lavalamp |
|
||
- per-expression cost: 8,000,000 | ||
- per-request cost: 800,000,000 | ||
- per-expression budget: Selected by API Priority and Fairness |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works the other way around -- you provide an estimate, and APF throttles calls appropriately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's say we'll set the budget after gaining some experience.
keps/sig-api-machinery/2876-crd-validation-expression-language/README.md
Outdated
Show resolved
Hide resolved
keps/sig-api-machinery/2876-crd-validation-expression-language/README.md
Outdated
Show resolved
Hide resolved
|
||
- per-expression cost: 8,000,000 | ||
- per-request cost: 800,000,000 | ||
- per-expression budget: Selected by API Priority and Fairness |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this right? I thought we told APF the cost, and it acted as a filter to determine if the requester had budget for it
The 8,000,000 figure is based around a worst case scenario of a list of 2,000,000 elements iterated through | ||
in O(n) fashion with a loop body with a cost of 4 (enough for a basic regex). Since basic CEL expressions have | ||
a cost of 1, this should be plenty in most cases, but `maxLength` can be set on any lists where more | ||
computation needs to be done. This figure will also prevent things such as expressions from accidentally | ||
being O(n<sup>2</sup>) or worse; the cost for iterating in O(n<sup>2</sup>) fashion over a list of only | ||
30,000 elements with a loop body of cost 1 is about 36,000,000. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like we can indicate this will be pinned down at impl time experimentally with benchmarking rather than committing to specific numbers here
/lgtm Thanks! |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: DangerOnTheRanger, lavalamp The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/sig api-machinery