-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue 495: Validate Pravega manifest settings before deployment #586
Conversation
Codecov Report
@@ Coverage Diff @@
## master #586 +/- ##
==========================================
+ Coverage 74.17% 74.94% +0.76%
==========================================
Files 16 16
Lines 4229 4402 +173
==========================================
+ Hits 3137 3299 +162
- Misses 962 972 +10
- Partials 130 131 +1
Continue to review full report at Codecov.
|
@@ -1194,6 +1206,66 @@ func (p *PravegaCluster) validateConfigMap() error { | |||
return nil | |||
} | |||
|
|||
func (p *PravegaCluster) ValidateSegmentStore() error { | |||
totalMemoryLimitsQuantity := p.Spec.Pravega.SegmentStoreResources.Limits[corev1.ResourceMemory] | |||
totalMemoryRequestsQuantity := p.Spec.Pravega.SegmentStoreResources.Requests[corev1.ResourceMemory] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are you verifying the scenario, limit is set but requests are not set for resource, also in that case do we need to set request same as limits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting the request to the limit value should be done in the defaults section, if I am not mistaken.
@nishant-yt Did you verify that totalMemoryRequestsQuantity.Value()
below produces a zero when the memory request is NOT set by the user?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the code in defaults section to make sure requests is equal to limits in case requests is not set by the user. Also, I have verified totalMemoryRequestsQuantity.Value()
does produces 0
value if memory requests is not set by the user
Options: map[string]string{"pravegaservice.cache.size.max": "1610612736"}, | ||
SegmentStoreJVMOptions: []string{"-Xmx1g", "-XX:MaxDirectMemorySize=2560m"}, | ||
SegmentStoreResources: &corev1.ResourceRequirements{ | ||
Requests: corev1.ResourceList{ | ||
corev1.ResourceCPU: resource.MustParse("1000m"), | ||
corev1.ResourceMemory: resource.MustParse("4Gi"), | ||
}, | ||
Limits: corev1.ResourceList{ | ||
corev1.ResourceCPU: resource.MustParse("2000m"), | ||
corev1.ResourceMemory: resource.MustParse("4Gi"), | ||
}, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where do these values come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting all the defaults here without verifying user supplied values is extremely dangerous.
If the user supplies memory limit less than 4Gi
(say, 2Gi
) without memory request,
they will end with a 4Gi
memory request (default) and 2Gi
memory limit (user-supplied).
What we rather need to do while setting the defaults is:
- See if the user has set a memory request
- If the request is not set, then set it to memory limit (the admission webhook would be verifying that the memory limit is set by the user).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This default values are only for e2e tests and the checks are already in place for all the scenarios
@@ -1194,6 +1206,66 @@ func (p *PravegaCluster) validateConfigMap() error { | |||
return nil | |||
} | |||
|
|||
func (p *PravegaCluster) ValidateSegmentStore() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, add comments in this method explaining what it does and why. Also, the name of the method should be more concrete, like ValidateSegmentStoreMemorySettings
, as ValidateSegmentStore
is not really descriptive about what it is doing.
maxDirectMemorySize := maxDirectMemoryQuantity.Value() | ||
cacheSize := cacheSizeQuantity.Value() | ||
|
||
if totalMemoryLimits <= maxDirectMemorySize+xmx { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: please, leave a space after and before the +
symbol (it is hard to read otherwise).
return fmt.Errorf("MaxDirectMemorySize(%v B) along with JVM Xmx value(%v B) is greater than or equal to the total available memory(%v B)!", maxDirectMemorySize, xmx, totalMemoryLimits) | ||
} | ||
|
||
if maxDirectMemorySize <= xmx { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, remove this part of the validation (also from the PR description), as this may be completely legitimate. You may want to configure a JVM of 4GB with a cache of 3GB (and therefore setting maxDirectMemorySize=4GB
). And there is nothing strictly wrong with that, is just a user choice. We have to validate to prevent what we know for sure is wrong, but not more than that, as we also want to give some degree of flexibility to users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to this document, we should generally provision the Segment Store with more direct memory than heap memory. So, shouldn't this validation be there. I agree with your example and can remove the equality
from the check so as to provide flexibility to the user.
@nishant-yt Could you please add end to end tests to cover the webhook validations? |
// JVM Direct memory > Segment Store read cache size (pravegaservice.cache.size.max). | ||
func (p *PravegaCluster) ValidateSegmentStoreMemorySettings() error { | ||
if p.Spec.Pravega.SegmentStoreResources == nil { | ||
return fmt.Errorf("Missing required value for field spec.pravega.segmentStoreResources.limits.memory") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RaulGracia Should we allow deployments without setting the resources?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following the offline discussion, we do need a check for p.Spec.Pravega.SegmentStoreResources
not to be nil
.
If we allow the nil
object here, skip the below validation and let the defaults be assigned later by withDefaults()
function, we cannot perform a check for user-supplied pravegaservice.cache.size.max
value or JVM memory settings. Therefore it would make sense to make p.Spec.Pravega.SegmentStoreResources
a required block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe for usability in the first usage, allowing a cluster to be deployed with default values makes sense, but it would not be at all a production-ready configuration. And of course, not at the cost of not doing the memory checks.
pkg/apis/pravega/v1beta1/pravega.go
Outdated
if s.SegmentStoreResources.Limits[v1.ResourceCPU] == (resource.Quantity{}) { | ||
changed = true | ||
s.SegmentStoreResources.Limits[v1.ResourceCPU] = resource.MustParse(DefaultSegmentStoreLimitCPU) | ||
} | ||
|
||
if s.SegmentStoreResources.Requests[v1.ResourceCPU] == (resource.Quantity{}) { | ||
changed = true | ||
s.SegmentStoreResources.Requests[v1.ResourceCPU] = resource.MustParse(DefaultSegmentStoreRequestCPU) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think this is a good idea.
If the user sets the CPU request and is not aware of the default limit, we can create a situation where the limit is below the request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should always ask for the memory limit
, while the request
may be optional. In case of setting only limit
, request
can be set to limit
, as this seems to be the behavior of Kubernetes: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RaulGracia What about CPU?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I think that CPU is less of a problem, it may be good also bound the CPU resources similarly to what is being done here for memory. In this case, we should i) enforce that at least limit
on CPU is set (to prevent a busy or malfunctioning pod starving all other pods in the node), ii) if requests
is not set, default it to limit
, and iii) if requests
is set, check that it is <= limit
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RaulGracia The code I'm commenting on is located in withDefaults()
function and is invoked after the CR has passed the validation webhook. Therefore there are no checks or enforcement are done in this area of the code.
My concern is that when the user creates a CR with high CPU request and no CPU limit, like this (memory settings aside):
segmentStore:
resources:
requests:
cpu: 2000m
then line 333 in the current code will transform it into the following, which is wrong:
resources:
requests:
cpu: 2000m
limits:
cpu: 1
Again, if the user creates a CR with low CPU limit and no CPU request, like this:
segmentStore:
resources:
limits:
cpu: 200m
then line 338 of the current code will again transform it into a wrong configuration:
segmentStore:
resources:
requests:
cpu: 500m
limits:
cpu: 200m
My ask is to not assign the CPU request/limit defaults independently of each other (i.e., of what the user might have set for the other parameter).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkhalack actually I will be modifying the withDefaults()
method as shown below so as to have the same check what we have for memory in addition to the checks what @RaulGracia mentioned :
if s.SegmentStoreResources.Requests[v1.ResourceCPU] == (resource.Quantity{}) {
changed = true
s.SegmentStoreResources.Requests[v1.ResourceCPU] = s.SegmentStoreResources.Limits[v1.ResourceCPU]
}
return fmt.Errorf("Missing required value for field spec.pravega.segmentStoreResources.limits.memory") | ||
} | ||
|
||
totalMemoryLimitsQuantity := p.Spec.Pravega.SegmentStoreResources.Limits[corev1.ResourceMemory] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need a check that p.Spec.Pravega.SegmentStoreResources.Limits != nil
and that p.Spec.Pravega.SegmentStoreResources.Requests != nil
before we go ahead and check for memory settings (the motivation being that the user might set requests without limits or limits without requests).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that limits
should not be null
. We could leave requests
nullable, and if so, set requests=limits
as it seems to be the underlying behavior in Kubernetes itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There still needs to be a check for a nil
Requests
object before we try to assign Requests[corev1.ResourceMemory]
to anything (like we currently do it on the next line)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed , I will be adding a nil
check for both Requests
and Limits
Objects.
if p.Spec.Pravega.SegmentStoreResources.Requests == nil { | ||
return fmt.Errorf("spec.pravega.segmentStoreResources.requests cannot be empty") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR changes in pravega.go
do allow for the requests not being set.
The better way to deal with the situation is to set totalMemoryRequests
to zero if segmentStoreResources.requests
is not set (otherwise proceed normally).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just one comment.
@@ -323,6 +323,24 @@ func (s *PravegaSpec) withDefaults() (changed bool) { | |||
} | |||
} | |||
|
|||
if s.SegmentStoreResources.Requests == nil { | |||
changed = true | |||
s.SegmentStoreResources.Requests = map[corev1.ResourceName]resource.Quantity{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure at this point in the code that s.SegmentStoreResources.Limits
exists?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes because in ValidateSegmentStoreMemorySettings
method we are first checking whether p.Spec.Pravega.SegmentStoreResources.Limits == nil
and in turn making sure user does pass Limits
"autoScale.tokenSigningKey": "secret", | ||
"pravega.client.auth.token": "YWRtaW46MTExMV9hYWFh", | ||
"pravega.client.auth.method": "Basic", | ||
"pravegaservice.container.count": "4", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good, this was still with the old nomenclature,
Signed-off-by: Nishant Gupta <[email protected]>
Signed-off-by: Nishant Gupta <[email protected]>
Signed-off-by: Nishant Gupta <[email protected]>
Signed-off-by: Nishant Gupta <[email protected]>
Signed-off-by: Nishant Gupta <[email protected]>
Signed-off-by: Nishant Gupta <[email protected]>
0ef2886
to
6783fe3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: Nishant Gupta [email protected]
Change log description
While deploying Pravega , Segment Store settings should adhere to the following rules:
POD_MEM_LIMIT > JVM Heap (
-Xmx
) + JVM Direct Memory (-XX:MaxDirectMemorySize
)JVM Direct memory > Segment Store read cache size (
pravegaservice.cache.size.max
)Here we are validating the above scenarios before deployment by the operator happens instead of manually relying on the user to set them right.
Purpose of the change
Fixes #495
What the code does
The code makes sure the following checks are performed by the operator before deploying Pravega:
POD_MEM_LIMIT > JVM Heap + JVM Direct Memory
JVM Direct memory > Segment Store read cache size
How to verify it
Try setting different values for
.spec.pravega.segmentStoreResources.requests.memory
,-Xmx
,-XX:MaxDirectMemorySize
andpravegaservice.cache.size.max
and the operator should return error in case the above mentioned rules are not met.