
Allow multi-networkpolicy to deploy to mlab-oti with canary nodeSelector #913

Merged
nkinkade merged 4 commits into main from sandbox-kinkade on Oct 24, 2024

Conversation

@nkinkade (Contributor) commented Oct 23, 2024

The primary intent of this PR is to allow the multi-networkpolicy DaemonSet to deploy to production, but only to nodes with the special label mlab/run: multi-networkpolicy-canary. The plan is to manually label a subset of nodes, perhaps 10 across various metros, to confirm that it works as intended and does not impact performance in any way.
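For reference, manually labeling a node would look something like this (the node name is a placeholder):

    kubectl label node <node-name> mlab/run=multi-networkpolicy-canary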

Additionally, the PR modifies the memory limits for the container. I discovered that the containers in sandbox use only around 30MB of memory, but in staging they use around 120MB, sometimes spiking higher and getting the container killed for exceeding the old memory limit of 150Mi.

I also made some small modifications to the pod labels to bring them more in line with how we label other pods.



I noticed that some multi-networkpolicy containers were getting OOM-killed. Typical memory usage across all staging nodes hovers right around 125MB, with occasional spikes up to the 150MB limit that get the container OOM-killed. This commit increases the request to 150MB and sets the limit to 250MB, which should be more reasonable settings based on monitoring data.
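As a rough sketch, the corresponding resources stanza in the DaemonSet jsonnet might look like the following; the field layout is assumed, and the sizes above are expressed in Mi notation:

    resources: {
      requests: {
        memory: '150Mi',  // typical staging usage is ~125MB
      },
      limits: {
        memory: '250Mi',  // headroom above the observed spikes
      },
    },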

Additionally, I changed the "name" label on the pod to "workload", which is more consistent with how we label other pods. I removed the "app" label, since we don't use that convention and, I believe, it was unused.
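For illustration, the resulting pod labels might look something like this (exact metadata structure assumed):

    metadata: {
      labels: {
        workload: 'multi-networkpolicy',  // replaces the old "name" label; "app" was dropped
      },
    },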
For sandbox and staging it's fine to deploy multi-networkpolicy everywhere, but
for production we want to do some canaries before deploying widely to be sure
that it works as intended and that it doesn't impact performance in any way.

Also, increase the memory limit to 500Mi. I have noticed that on physical machines in sandbox multi-networkpolicy only uses around 30 or 40MB, but in staging it uses more like 120MB. The only way I can account for this is that staging has more nodes, and perhaps the service keeps track of other nodes somehow.

For now the DaemonSet has a canary nodeSelector, so it will only deploy to nodes that we manually label to match that selector.
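A minimal sketch of what the canary nodeSelector might look like in the pod spec (structure assumed; the label is the one named above):

    spec: {
      nodeSelector: {
        'mlab/run': 'multi-networkpolicy-canary',
      },
    },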
@stephen-soltesz (Contributor) left a comment

:lgtm:

An unrelated request: please tag the measurementlab/multi-networkpolicy-iptables image so we're not running an unversioned image in production.

Reviewed 2 of 2 files at r1.
Reviewable status: 0 of 1 approvals obtained (waiting on @nkinkade)


k8s/daemonsets/core/multi-networkpolicy.jsonnet line 44 at r1 (raw file):

          {
            name: 'multi-networkpolicy',
            image: 'measurementlab/multi-networkpolicy-iptables:latest',

Please additionally tag the version we're using in production.

@nkinkade (Contributor, Author) left a comment

I set up automated builds for this in Docker Hub and released v1.0.0 of our fork of the repo. The DaemonSet's container image now uses the image built against the v1.0.0 release.

Reviewable status: 0 of 1 approvals obtained (waiting on @stephen-soltesz)


k8s/daemonsets/core/multi-networkpolicy.jsonnet line 44 at r1 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…

Please additionally tag the version we're using in production.

Done.
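For reference, the image line at r1 presumably now pins the release tag, along the lines of (exact tag string assumed from the v1.0.0 release mentioned above):

    image: 'measurementlab/multi-networkpolicy-iptables:v1.0.0',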

@nkinkade merged commit 9358136 into main on Oct 24, 2024 (2 of 3 checks passed)
@nkinkade deleted the sandbox-kinkade branch on October 24, 2024 at 19:40