-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ringhash: handle config updates properly #5557
Conversation
// picker is pushed on the channel. | ||
func (s) TestUpdateClientConnState_NewRingSize(t *testing.T) { | ||
addrs := []resolver.Address{{Addr: testBackendAddrStrs[0]}} | ||
cc, b, p0 := setupTest(t, addrs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I know it's p0 elsewhere, but for some reason p1 and p2 make more sense in my mind than p0 and p1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
// If the ring configuration has changed, we need to regenerate the ring and | ||
// send a new picker. | ||
var regenerateRing bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In correspondence to your comment on my Outlier Detection PR:
regenerateRing := b.updateAddresses(s.ResolverState.Addresses)
if b.config == nil || b.config.MinRingSize != newConfig.MinRingSize || b.config.MaxRingSize != newConfig.MaxRingSize {
regenerateRing = true
}
b.config = newConfig
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thanks.
} | ||
|
||
if regenerateRing { | ||
ring, err := newRing(b.subConns, b.config.MinRingSize, b.config.MaxRingSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes logic, and I don't know if this is correct. I still think it needs to write to b.ring. Otherwise it would persist a built ring map from the previous unrelated good update? Then subsequent ResolverError calls (line 318) and UpdateSubConnState() calls (line 369) call into regeneratePicker which reads b.ring, which would not be in the correct state and "synchronized" with the UpdateClientConnState that caused the SubConn error from the address list. Please triage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Earlier, when we were assigning to b.ring
here and if err != nil
, we were still calling ResolverError
, but not sending a new picker. So, we would have still ended up using the old ring (from the old picker) to service RPCs. So, in that sense, i feel that the behavior has not changed. But either ways, I'm not sure if our existing behavior was right. So, I have posted on our chat room, let's see what comes out of it. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, based on the discussion in the chat room, I've change our ring creation routine newRing()
to not return an error.
return balancer.ErrBadResolverState | ||
} | ||
// Successful resolution; clear resolver error and return nil. | ||
b.resolverErr = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I don't know if this breaks it, but this doesn't clear it if hits line 295, whereas in master it did. Although it feels appropriate here, but please triage if this breaks anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one looks totally fine to me.
// TestUpdateClientConnState_NewRingSize tests the scenario where the ringhash | ||
// LB policy receives new configuration which specifies new values for the ring | ||
// min and max sizes. The test verifies that a new ring is created and a new | ||
// picker is pushed on the channel. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we're not testing whether the picker is pushed on "the channel". We're testing whether the new picker gets forwarded to this balancer's client conn, which in this case sends on the channel for verification purposes. Would prefer rewording "is pushed on the channel" to "sent to the ClientConn".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
// getWeightAttribute() returns 1 if the weight attribute is not found | ||
// on the address. And since this function is guaranteed to be called | ||
// with a non-empty subConns map, weightSum is guaranteed to be | ||
// non-zero. So, we need not worry about divide by zero error here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, super super minor nit: "need not worry about a divide"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// | ||
// Must be called with a non-empty subConns map. | ||
func normalizeWeights(subConns *resolver.AddressMap) ([]subConnWithWeight, float64) { | ||
var weightSum float64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of keeping this uint32, and instead of having another var to convert it like in master, you convert in the denominator float64(weightsum) to avoid cast on line 119. I don't care/don't feel strongly about this but was just proposing a suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
// normalizeWeights divides all the weights by the sum, so that the total weight | ||
// is 1. | ||
func normalizeWeights(subConns *resolver.AddressMap) ([]subConnWithWeight, float64, error) { | ||
// | ||
// Must be called with a non-empty subConns map. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing "non-empty" encapsualtes nil and len(0)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right.
@@ -291,6 +272,29 @@ func (b *ringhashBalancer) UpdateClientConnState(s balancer.ClientConnState) err | |||
b.ResolverError(errors.New("produced zero addresses")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not correct. This happens before b.subConns gets written to. Thus, line 303 won't hit guaranteed. Say you get a first update with 3 addresses, you make 3 subconns. Then, the next UpdateClientConnState comes in with 0 addresses. Then, this conditional hits, the SubConn len is still 3, and doens't set to connectivity.TransientFailure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right. I moved it to be right before we go about constructing the ring. So, if we end up with zero addresses, we save the effort of making a new ring.
Thanks for the review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I found my first correctness issue ever I'm so proud of myself :D!
Yay Yay Yay !!! |
Fixes #5460
Fix a few things in the config handling routine of the ringhash LB policy:
RELEASE NOTES: