[v1.7] routing: Fix route collisions in AWS ENI #14337
Conversation
Force-pushed aeeec7a → 33f56ae
LGTM
Some minor nits that I think would make the code more readable below.
The solution itself LGTM.
Two other remaining avenues for followup:
- I'd guess Azure could be affected by the same issue?
- Upgrade/downgrade. We had discussed some solutions out-of-band but my understanding is that this current implementation is sufficient assuming that you cordon the nodes prior to upgrade.
On the latter point, I'm not entirely sure how cleanup of previous rules/tables would happen during upgrade. Even if you cordon the node and vacate all workloads, will the old route tables / rules be removed? What about the ones for the cilium-health endpoint?
Yes Azure is affected and I plan to address this in a future PR. I didn't want to pollute the v1.7 tree with references to an IPAM mode that we don't support.
Edit: This comment is out-of-date. Please refer to the PR body.
Force-pushed 2c0af08 → dac0da6
Force-pushed 003a0ce → 06086f4
I'll look over the test as well now but I already have a few preliminary comments on the new migration code below.
Force-pushed 06086f4 → 1c22be6
Looks good. I wondered whether there's some more filtering we could do to reduce the number of rules being dumped out of the kernel (eg by requesting with a more specific filter of matching rules, ie "give me only the rules with priority X"), but I didn't look so closely at the libraries/kernel API to figure out whether that's possible.
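For reference, the userspace filtering the PR performs after dumping the rules can be sketched roughly as follows. `Rule` here is a pared-down stand-in for `netlink.Rule` (not the real type), and the function mirrors the `filterRulesByPriority` helper that appears later in the test diff:

```go
package main

import "fmt"

// Rule is a minimal stand-in for netlink.Rule; only the fields relevant
// to this sketch are included.
type Rule struct {
	Priority int
	Table    int
}

// filterRulesByPriority keeps only the rules whose priority matches prio.
// This mimics filtering in userspace after dumping all rules from the
// kernel, as opposed to asking the kernel for a pre-filtered list.
func filterRulesByPriority(rules []Rule, prio int) []Rule {
	candidates := make([]Rule, 0, len(rules))
	for _, r := range rules {
		if r.Priority == prio {
			candidates = append(candidates, r)
		}
	}
	return candidates
}

func main() {
	rules := []Rule{
		{Priority: 110, Table: 3},
		{Priority: 111, Table: 12},
		{Priority: 110, Table: 4},
	}
	// Keep only the original-datapath rules (priority 110).
	fmt.Println(len(filterRulesByPriority(rules, 110))) // prints: 2
}
```

Whether the kernel-side filtering Joe asks about is possible depends on the netlink library's rule-dump API, which is out of scope for this sketch.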
Force-pushed 3d57827 → 1b28823
This is useful to aggregate the items to revert in one stack, so that it can all be done at once.

Signed-off-by: Chris Tarazi <[email protected]>
Force-pushed ce6161a → f6480ab
test-backport-1.7
For reviewers: The latest push includes a major change in the approach to migrating the ENI datapath.

The previous approach treated the migration as all-or-nothing. This meant that any error encountered during the migration would cause us to abort and revert all successful operations leading up to the failure.

The new approach is not all-or-nothing, but rather per-rule (or per-endpoint). This means that any rule (or endpoint) can fail the migration, but we will still carry on and attempt to migrate the rest. We can do this because we are able to guarantee that a failure at any point in the migration process for a rule will not disrupt connectivity (see d100429#diff-4d5ad11dc62beee3c82e4eaee46e08a8edc1232423564923d5bd0e97864714f7R90-R137 for details). Either the "newer" datapath or the "older" datapath will be in place. In some cases, it is possible to have both datapaths in place, but connectivity is not disrupted in that case, and we take measures to warn the user when it occurs.

The main reason for the new approach is that it is less risky due to being iterative rather than all-or-nothing. In addition, if a failure were to occur in the very late stages of the previous approach, the number of operations to revert would be quite large, and the more operations we do, the more likely a failure occurs. The new approach performs the minimum required operations by relying on the atomic nature of the migration.
```go
		scopedLog.WithError(err).WithField("revertError", revErr).Warn(downgradeRevertWarning)
	}

	return nil, fmt.Errorf("failed to delete old rule: %w", err)
```
I think that this should actually be:

```diff
-return nil, fmt.Errorf("failed to delete old rule: %w", err)
+return nil, nil
```
Since the rule insertion above actually made the downgrade effective. Thoughts?
^^^ Something I missed in my original post here: if so, we should also remove the RevertStack handling logic a few lines above.
Yes, that's correct. Will fix.
On second thought, would it be useful for the user to be informed of this, so that they can cleanup the leftover state themselves?
Yes, I think we should log like we do for the other "failed cleanup" steps.
💯 Just a few minor nits, but this is looking good. The main functional feedback is on adjusting the downgrade path so we call it successful once the higher-priority rule is in place, regardless of any subsequent failure in cleanup operations.
Force-pushed f6480ab → 8f1ec3a
Incremental diff implementing Joe's feedback from above:

diff --git a/Documentation/install/upgrade.rst b/Documentation/install/upgrade.rst
index 274f5c4b8..718bcda01 100644
--- a/Documentation/install/upgrade.rst
+++ b/Documentation/install/upgrade.rst
@@ -329,9 +329,7 @@ node. This version fixes the issue and introduces a new flag to handle
compatibility. This flag is ``--egress-multi-home-ip-rule-compat``.
The flag defaults to ``false`` meaning it will instruct Cilium to perform a
-migration to the new ENI datapath upon Cilium upgrade. This operation will
-modify the routing table to fix a collision that in rare cases can cause
-connectivity loss to the node.
+migration to the new ENI datapath upon Cilium upgrade.
If the flag is ``true`` upon upgrade, Cilium will **not** perform a migration.
diff --git a/pkg/datapath/linux/routing/migrate.go b/pkg/datapath/linux/routing/migrate.go
index add344b5e..47caaa0d1 100644
--- a/pkg/datapath/linux/routing/migrate.go
+++ b/pkg/datapath/linux/routing/migrate.go
@@ -26,8 +26,8 @@ import (
// MigrateENIDatapath migrates the egress rules inside the Linux routing policy
// database (RPDB) for ENI IPAM mode. It will return the number of rules that
// were successfully migrated and the number of rules we've failed to migrated.
-// A -1 is returned for the failed number of rules indicating we couldn't even
-// start the migration.
+// A -1 is returned for the failed number of rules if we couldn't even start
+// the migration.
//
// The compat flag will control what Cilium will do in the migration process.
// If the flag is false, this instructs Cilium to ensure the datapath is newer
@@ -119,12 +119,22 @@ func (m *migrator) MigrateENIDatapath(compat bool) (int, int) {
// If a failure occurs at (3), we have already succeeded in getting the new
// state in-place to direct traffic for the endpoint. In any case of
// upgrade or downgrade, it is possible for both states to be in-place if
- // there are any failures, including any failures in reverting. The
- // datapath selected will depend on the rule priority. Concretely, for
- // upgrades, the newer rule will have a lower priority, so the original
- // datapath will be selected. For downgrades, the newer rule will have a
- // higher priority, so the newer datapath will be selected. In either case,
- // no connectivity is affected for the endpoint.
+ // there are any failures, especially if there were any failures in
+ // reverting. The datapath selected will depend on the rule priority.
+ //
+ // Concretely, for upgrades, the newer rule will have a lower priority, so
+ // the original datapath will be selected. The migration is deemed a
+ // failure because the original datapath (with a rule that has a higher
+ // priority) is being selected for the endpoint. It is necessary to attempt
+ // reverting the failed migration work [(1) & (2)], as leaving the state
+ // could block a user's from retrying this upgrade again.
+ //
+ // For downgrades, the newer rule will have a higher priority, so the newer
+ // datapath will be selected. The migration is deemed a success and we
+ // explicitly avoid reverting, because it's not necessary to revert this
+ // work merely because we failed to cleanup old, ineffectual state.
+ //
+ // In either case, no connectivity is affected for the endpoint.
//
// If we fail at (4), then the old rule will have been deleted and the new
// state is in-place, which would be servicing the endpoint. The old routes
@@ -156,17 +166,6 @@ func (m *migrator) MigrateENIDatapath(compat bool) (int, int) {
if isUpgrade {
for _, r := range v1Rules {
- // Let's say we have an ENI device attached to the node with
- // ifindex 3 and interface number 2. The following rule will exist
- // on the node _before_ migration.
- // 110: from 192.168.11.171 to 192.168.0.0/16 lookup 3
- // After the migration, this rule will become:
- // 111: from 192.168.11.171 to 192.168.0.0/16 lookup 12
- // The priority has been updated to 111 and the table ID is 12
- // because the interface number is 2 plus the routing table offset
- // (linux_defaults.RouteTableInterfacesOffset). See copyRoutes()
- // for what happens with routes.
-
if routes, err := m.upgradeRule(r); err != nil {
log.WithError(err).WithField("rule", r).Warn("Failed to migrate endpoint to new ENI datapath. " +
"Previous datapath is still intact and endpoint connectivity is not affected.")
@@ -184,16 +183,6 @@ func (m *migrator) MigrateENIDatapath(compat bool) (int, int) {
}
} else if isDowngrade {
for _, r := range v2Rules {
- // Let's say we have an ENI device attached to the node with
- // ifindex 9 and interface number 3. The following rule will exist
- // on the node _before_ migration.
- // 111: from 192.168.11.171 to 192.168.0.0/16 lookup 13
- // After the migration, this rule will become:
- // 110: from 192.168.11.171 to 192.168.0.0/16 lookup 9
- // The priority has been reverted back to 110 and the table ID back
- // to 9 because the ifindex is 9. See copyRoutes() for what happens
- // with routes.
-
if routes, err := m.downgradeRule(r); err != nil {
log.WithError(err).WithField("rule", r).Warn("Failed to downgrade endpoint to original ENI datapath. " +
"Previous datapath is still intact and endpoint connectivity is not affected.")
@@ -280,6 +269,17 @@ func NewMigrator(getter interfaceDB) *migrator {
// returns the old routes that the caller should remove at a later time, along
// with an error.
func (m *migrator) upgradeRule(rule netlink.Rule) ([]netlink.Route, error) {
+ // Let's say we have an ENI device attached to the node with ifindex 3 and
+ // interface number 2. The following rule will exist on the node _before_
+ // migration.
+ // 110: from 192.168.11.171 to 192.168.0.0/16 lookup 3
+ // After the migration, this rule will become:
+ // 111: from 192.168.11.171 to 192.168.0.0/16 lookup 12
+ // The priority has been updated to 111 and the table ID is 12 because the
+ // interface number is 2 plus the routing table offset
+ // (linux_defaults.RouteTableInterfacesOffset). See copyRoutes() for what
+ // happens with routes.
+
scopedLog := log.WithField("rule", rule)
routes, err := m.rpdb.RouteListFiltered(netlink.FAMILY_V4, &netlink.Route{
@@ -335,9 +335,10 @@ func (m *migrator) upgradeRule(rule netlink.Rule) ([]netlink.Route, error) {
return nil, fmt.Errorf("failed to create new rule: %w", err)
}
- if err := m.deleteOldRule(rule); err != nil {
- // We revert here because we want to ensure that the new state
- // that we just created above is reverted.
+ if err := m.rpdb.RuleDel(&rule); err != nil {
+ // We revert here because we want to ensure that the new state that we
+ // just created above is reverted. See long comment describing the
+ // migration in MigrateENIDatapath().
if revErr := stack.Revert(); revErr != nil {
scopedLog.WithError(err).WithField("revertError", revErr).Warn(upgradeRevertWarning)
}
@@ -353,6 +354,15 @@ func (m *migrator) upgradeRule(rule netlink.Rule) ([]netlink.Route, error) {
// attached ENI device on the node. It returns the "old" routes (new datapath)
// that the caller should remove at a later time, along with an error.
func (m *migrator) downgradeRule(rule netlink.Rule) ([]netlink.Route, error) {
+ // Let's say we have an ENI device attached to the node with ifindex 9 and
+ // interface number 3. The following rule will exist on the node _before_
+ // migration.
+ // 111: from 192.168.11.171 to 192.168.0.0/16 lookup 13
+ // After the migration, this rule will become:
+ // 110: from 192.168.11.171 to 192.168.0.0/16 lookup 9
+ // The priority has been reverted back to 110 and the table ID back to 9
+ // because the ifindex is 9. See copyRoutes() for what happens with routes.
+
scopedLog := log.WithField("rule", rule)
oldTable := rule.Table
@@ -379,12 +389,13 @@ func (m *migrator) downgradeRule(rule netlink.Rule) ([]netlink.Route, error) {
return nil, fmt.Errorf("failed to create new routes: %w", err)
}
- revert, err = m.createNewRule(
+ // We don't need the revert stack return value because the next opertion to
+ // delete the rule will not revert the stack. See below comment on why.
+ _, err = m.createNewRule(
rule,
linux_defaults.RulePriorityEgress,
newTable,
)
- stack.Extend(revert)
if err != nil {
if revErr := stack.Revert(); revErr != nil {
scopedLog.WithError(err).WithField("revertError", revErr).Warn(downgradeRevertWarning)
@@ -393,12 +404,11 @@ func (m *migrator) downgradeRule(rule netlink.Rule) ([]netlink.Route, error) {
return nil, fmt.Errorf("failed to create new rule: %w", err)
}
- if err := m.deleteOldRule(rule); err != nil {
- if revErr := stack.Revert(); revErr != nil {
- scopedLog.WithError(err).WithField("revertError", revErr).Warn(downgradeRevertWarning)
- }
-
- return nil, fmt.Errorf("failed to delete old rule: %w", err)
+ if err := m.rpdb.RuleDel(&rule); err != nil {
+ // We avoid reverting and returning an error here because the newer
+ // datapath is already in-place. See long comment describing the
+ // migration in MigrateENIDatapath().
+ return nil, nil
}
return routes, nil
@@ -472,8 +482,8 @@ func (m *migrator) retrieveTableIDFromInterfaceNumber(ifaceNum int) (int, error)
// copyRoutes upserts `routes` under the `from` table ID to `to` table ID. It
// returns a RevertStack and an error. The RevertStack contains functions that
// would revert all the successful operations that occurred in this function.
-// The caller of this function should be aware to ensure the stack is reverted
-// upon an error.
+// The caller of this function MUST revert the stack when this function returns
+// an error.
func (m *migrator) copyRoutes(routes []netlink.Route, from, to int) (revert.RevertStack, error) {
var revertStack revert.RevertStack
@@ -527,8 +537,8 @@ func (m *migrator) copyRoutes(routes []netlink.Route, from, to int) (revert.Reve
// createNewRule inserts `rule` with the table ID of `newTable` and a priority
// of `toPrio`. It returns a RevertStack and an error. The RevertStack contains
// functions that would revert all the successful operations that occurred in
-// this function. The caller of this function should be aware to ensure the
-// stack is reverted upon an error.
+// this function. The caller of this function MUST revert the stack when this
+// function returns an error.
func (m *migrator) createNewRule(rule netlink.Rule, toPrio, newTable int) (revert.RevertStack, error) {
var revertStack revert.RevertStack
@@ -549,14 +559,6 @@ func (m *migrator) createNewRule(rule netlink.Rule, toPrio, newTable int) (rever
return revertStack, nil
}
-func (m *migrator) deleteOldRule(rule netlink.Rule) error {
- if err := m.rpdb.RuleDel(&rule); err != nil {
- return fmt.Errorf("unable to delete old rule: %w", err)
- }
-
- return nil
-}
-
func (m *migrator) deleteOldRoutes(routes []netlink.Route) error {
for _, r := range routes {
if err := m.rpdb.RouteDel(&r); err != nil {
Two more pieces of feedback:
Force-pushed 8f1ec3a → 702ed0d
Done, incremental diff:

commit baed31723b277de6f4bddea4dbd1b51b5336f44d
Author: Chris Tarazi <[email protected]>
Date: Tue Jan 5 11:42:16 2021 -0800
fixup! routing: Add ENI route table migration logic
Signed-off-by: Chris Tarazi <[email protected]>
diff --git a/pkg/datapath/linux/routing/migrate.go b/pkg/datapath/linux/routing/migrate.go
index 47caaa0d1..2740903fd 100644
--- a/pkg/datapath/linux/routing/migrate.go
+++ b/pkg/datapath/linux/routing/migrate.go
@@ -389,8 +389,8 @@ func (m *migrator) downgradeRule(rule netlink.Rule) ([]netlink.Route, error) {
return nil, fmt.Errorf("failed to create new routes: %w", err)
}
- // We don't need the revert stack return value because the next opertion to
- // delete the rule will not revert the stack. See below comment on why.
+ // We don't need the revert stack return value because the next operation
+ // to delete the rule will not revert the stack. See below comment on why.
_, err = m.createNewRule(
rule,
linux_defaults.RulePriorityEgress,
@@ -408,6 +408,7 @@ func (m *migrator) downgradeRule(rule netlink.Rule) ([]netlink.Route, error) {
// We avoid reverting and returning an error here because the newer
// datapath is already in-place. See long comment describing the
// migration in MigrateENIDatapath().
+ scopedLog.WithError(err).Warn(downgradeFailedRuleDeleteWarning)
return nil, nil
}
@@ -419,6 +420,8 @@ const (
"Endpoint connectivity should not be affected. It is advised to retry the migration."
downgradeRevertWarning = "Reverting the new ENI datapath failed. However, both the new and previous datapaths are still intact. " +
"Endpoint connectivity should not be affected. It is advised to retry the migration."
+ downgradeFailedRuleDeleteWarning = "Downgrading the datapath has succeeded, but failed to cleanup the original datapath. " +
+ "It is advised to manually remove the old rule."
)
// retrieveTableIDFromIfIndex computes the correct table ID based on the
test-backport-1.7

Edit: Cilium-Ginkgo-Tests failed to provision; everything else passed.
test-backport-1.7
@christarazi It seems like there's a test breakage in the new tests, maybe related to the most recent changes? https://jenkins.cilium.io/job/Cilium-PR-Ginkgo-Tests-Validated/20049/testReport/junit/(root)/Suite-runtime/RuntimePrivilegedUnitTests_Run_Tests/
This commit will fix up the ENI datapath depending on the egress-multi-home-ip-rule-compat flag (see previous commits for context). The migration logic supports both upgrading and downgrading the ENI datapath. This logic must run on startup before the API is served and before the health endpoint is created, so that no endpoints are prematurely created before Cilium has had the chance to migrate the entire datapath. See cilium#14336.

Suggested-by: Joe Stringer <[email protected]>
Suggested-by: Thomas Graf <[email protected]>
Signed-off-by: Chris Tarazi <[email protected]>
This commit fixes a potential route collision in AWS ENI IPAM mode, where the ifindex could equal the main routing table ID (from 253-255) [1], causing traffic to be subject to these routes incorrectly. This is admittedly rare, but we've seen this from a user report. The impact is that most traffic on the node is suddenly blackholed.

To fix this, we say that each device or interface (ENI) will have its own dedicated routing table. The table ID will start with an offset of 10 because it is highly unlikely to collide with the main routing table ID (from 253-255). We grab the number associated with the ENI device (`Number`) and add the offset. For example, if we have an ENI device "eni-0" which has a `Number` of 5, then the table ID will be 10 + 5 = 15.

Another important piece to note is that only the egress rule will reside inside the per-device tables, whereas the ingress rule will stay in the main routing table. This is because we want the main routing table to hold the routes to the endpoint.

Moving forward, the ENI datapath will now create rules under a new egress priority value (RulePriorityEgressv2), as long as the egress-multi-home-ip-rule-compat flag is false. If it's true, then the datapath will create rules under the original egress priority value (RulePriorityEgress). This helps disambiguate when running with the older or newer ENI datapath. See cilium#14336.

[1]: See ip-route(8)

Reported-by: Vlad Ungureanu <[email protected]>
Suggested-by: Joe Stringer <[email protected]>
Suggested-by: Thomas Graf <[email protected]>
Signed-off-by: Chris Tarazi <[email protected]>
Signed-off-by: Chris Tarazi <[email protected]>
Force-pushed 702ed0d → 0124757
Ah, I see what went wrong @joestringer. I simply forgot to update the unit test which covers the failing downgrade case. This is because we changed the logic to not fail the migration if we aren't able to delete the old rule (item 3 from the long comment, if you recall). Here's the diff for the fix:

diff --git a/pkg/datapath/linux/routing/migrate_test.go b/pkg/datapath/linux/routing/migrate_test.go
index d1f775144..9b7fc3aee 100644
--- a/pkg/datapath/linux/routing/migrate_test.go
+++ b/pkg/datapath/linux/routing/migrate_test.go
@@ -262,14 +262,13 @@ func (m *MigrateSuite) TestMigrateENIDatapathDowngradeSuccess(c *C) {
}
func (m *MigrateSuite) TestMigrateENIDatapathDowngradeFailure(c *C) {
- // This test case will cover one downgrade failure path where we
- // successfully migrate all the new rules and routes back to the original
- // scheme, but fail to delete the new rule. This test case will be set up
+ // This test case will cover one downgrade failure path where we failed to
+ // migrate the rule to the old scheme. This test case will be set up
// identically to the successful case. "New" meaning the rules and routes
// using the new datapath scheme, hence downgrading. After we call
// MigrateENIDatapath(), we assert that we failed to migrate 1 rule. We
// assert that the revert of the downgrade was successfully as well,
- // meaning we expect the "new" rules and routes to be reinstated.
+ // meaning we expect the "newer" rules and routes to be reinstated.
runFuncInNetNS(c, func() {
index := 5
@@ -278,16 +277,16 @@ func (m *MigrateSuite) TestMigrateENIDatapathDowngradeFailure(c *C) {
m.defaultNetlinkMock()
- // Here we inject the error on deleting a rule. The first call we want to
+ // Here we inject the error on adding a rule. The first call we want to
// fail, but the second we want to succeed, because that will be the
// revert.
- var onRuleDelCount int
- m.OnRuleDel = func(r *netlink.Rule) error {
- if onRuleDelCount == 0 {
- onRuleDelCount++
+ var onRuleAddCount int
+ m.OnRuleAdd = func(r *netlink.Rule) error {
+ if onRuleAddCount == 0 {
+ onRuleAddCount++
return errors.New("fake error")
}
- return netlink.RuleDel(r)
+ return netlink.RuleAdd(r)
}
// Set up the interfaceDB mock. The MAC address returned is coming from the
test-backport-1.7
LGTM. A few nits mainly around UX, but they're minor so feel free to ignore them.
@@ -54,6 +54,7 @@ cilium-agent [flags]
      --disable-k8s-services                  Disable east-west K8s load balancing by cilium
      --dns-max-ips-per-restored-rule int     Maximum number of IPs to maintain for each restored DNS rule (default 1000)
      --egress-masquerade-interfaces string   Limit egress masquerading to interface selector
      --egress-multi-home-ip-rule-compat      Use a new scheme to store rules and routes under ENI and Azure IPAM modes, if false. Otherwise, it will use the old scheme.
nit: The words "new" and "old" will likely only make sense to people who have read this PR. Is it possible to find a more descriptive wording? Maybe something like "Offset routing table IDs under ENI and Azure IPAM modes to avoid collisions with reserved table IDs."
@@ -770,6 +773,10 @@ func init() {
	flags.Bool(option.EnableAPIRateLimit, false, "Enables the use of the API rate limiting configuration")
	option.BindEnv(option.EnableAPIRateLimit)

	flags.Bool(option.EgressMultiHomeIPRuleCompat, false,
		"Use a new scheme to store rules and routes under ENI and Azure IPAM modes, if false. Otherwise, it will use the old scheme.")
nit: see the nit about the flag description in the generated Documentation/cmdref/cilium-agent.md above.
// RouteTableInterfacesOffset is the offset for the per-ENI routing tables.
// Each ENI interface will have its own table starting with this offset. It
// is 10 because it is highly unlikely to collide with the main routing
// table which is between 253-255. See ip-route(8).
nit: table id 0 is also reserved (as well as 253-255).
		scopedLog.WithError(err).WithField("routes", routes).
			Warnf("Failed to cleanup after successfully migrating endpoint to %s ENI datapath. "+
				"It is recommended that theses routes are cleaned up, as it is possible in the future "+
				"to collide with another endpoint with the same IP.", version)
How would the user actually perform the cleanup? By running `ip route del` manually to remove routes with priority 110, for example? It's maybe worth giving the user a hint here.
	downgradeRevertWarning = "Reverting the new ENI datapath failed. However, both the new and previous datapaths are still intact. " +
		"Endpoint connectivity should not be affected. It is advised to retry the migration."
	downgradeFailedRuleDeleteWarning = "Downgrading the datapath has succeeded, but failed to cleanup the original datapath. " +
		"It is advised to manually remove the old rule."
Maybe note that old rules have priority 110.
@@ -0,0 +1,575 @@
// Copyright 2016-2020 Authors of Cilium |
nit: Copyright year needs to be updated to 2021 I think. Same with other files touched in this PR.
	return filterRulesByPriority(rules, prio), nil
}

func (m *MigrateSuite) defaultNetlinkMock() {
Not for this PR, but there are a number of projects that generate Go mocks, avoiding having to manually create them. One example is gomock.
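For contrast, the hand-rolled pattern these tests already use (function fields on a mock struct that individual cases override, like `OnRuleAdd` in the diff above) can be sketched like this; the type and method names here are illustrative, not the actual test code:

```go
package main

import (
	"errors"
	"fmt"
)

// rpdbMock is an illustrative hand-rolled mock: each operation is a function
// field, so a test case can inject arbitrary behavior without a framework.
type rpdbMock struct {
	OnRuleAdd func(priority int) error
}

func (m *rpdbMock) RuleAdd(priority int) error {
	if m.OnRuleAdd != nil {
		return m.OnRuleAdd(priority)
	}
	return nil // default: succeed
}

// failFirstCall returns a hook that fails once, then succeeds — the same
// shape as the "fail the migration, let the revert succeed" injection in
// the downgrade-failure test above.
func failFirstCall() func(int) error {
	var calls int
	return func(int) error {
		calls++
		if calls == 1 {
			return errors.New("fake error")
		}
		return nil
	}
}

func main() {
	m := &rpdbMock{OnRuleAdd: failFirstCall()}
	fmt.Println(m.RuleAdd(110) != nil) // prints: true (first call fails)
	fmt.Println(m.RuleAdd(110) == nil) // prints: true (second call succeeds)
}
```

A generated mock (e.g. gomock) would replace the hand-written struct and hooks with call expectations, at the cost of a code-generation step.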
Good feedback from @twpayne, mostly cosmetic / future-looking improvements that we can address in a follow-up PR.
@twpayne Thanks for the review. I will address these items in a followup and I'll ping you on that PR.
Problem
To recap, the ENI IPAM mode allocates a per-ENI (interface) routing table. Each endpoint in this IPAM mode has two IP rules, "ingress" and "egress", and two routes. These four items are created under the same table ID, which is the ifindex of the interface. It is possible for the ifindex of the interface to equal 253, 254, or 255 [1], which are reserved for the main routing table. Therefore, it is possible for us to inject routes (and rules) into tables reserved for the main routing of the system, which can obviously cause connectivity disruption.
For more on this issue, see #14336.
Proposed Solution
The solution is to compute the proper table ID by taking the "interface number" and adding an offset of 10, to avoid the possibility of colliding with the main routing table.

The "interface number" is given to us by the AWS API for the interface (`CiliumNode.Status.ENI.ENIs[*].Number`). This number is sequential, meaning the first ENI has an "interface number" of 1, and so on. The max number of ENIs on a given node is 15, so the max possible table ID is 25. Therefore, we avoid the possibility of colliding with the main routing table. An offset of 10 was chosen because it avoids colliding with other system components which may allocate lower-numbered table IDs, and because it will be smaller than the main routing table ID range.

In order to roll out this solution, we must migrate the old ENI datapath on upgrade. Meaning, we must fix up all the old rules and routes which use the old table ID scheme to the new table ID scheme. This includes creating new rules and routes, then deleting the old rules and routes only if all previous steps succeeded. This is a sensitive operation, as any failure could render the node's connectivity useless. If we encounter any errors during the migration process, we will attempt to roll back to the previous table ID scheme, to hopefully restore the node's connectivity.
If the user wants to downgrade Cilium, the user must first set `--egress-multi-home-ip-rule-compat=true` and restart the agent. Cilium will see that the ENI datapath has been migrated and, with the flag set to `true`, it will revert the migration. After this is complete, the user may downgrade their version of Cilium. The docs have been updated to mention this.

Note, while this does affect Azure IPAM mode as well, v1.7 does not support Azure mode, therefore support for it is missing in this PR. In the forward-ported version of this PR, it will be included.

[1]: See ip-route(8)
This is a direct backport to the v1.7 branch.
See commit msgs.
Related: #14336