feat(spanner): Adding GFE Latency and Header Missing Count Metrics #5199
Allows users to supply a custom ErrorFunc which will be used to determine whether the error is retryable.
You also need to change the mock server so it actually returns a server-timing header. To do this, add the following lines just before the return statement of the `BatchCreateSessions` method in `internal/testutil/inmem_spanner_server.go` (before line 690):
    header := metadata.New(map[string]string{"server-timing": "123"})
    if err := grpc.SendHeader(ctx, header); err != nil {
        return nil, gstatus.Errorf(codes.Internal, "unable to send 'server-timing' header")
    }
(And this also needs to be added to all the other RPCs that should also return a server timing value)
e4ae8e1 to 7c462cf
30558b1 to bcc20d2
7e2430f to f2646e9
f2646e9 to 83cba34
spanner/batch.go (outdated)
    if err != nil {
        return client, err
    }
    md, errGFE := client.Header()
nit: prefer `err` instead of `errGFE`
Hi Knut, the reason I did this is that in some places I was worried that the error was being returned at the end of the function (example here), so I would have to use a separate error variable in such places. To stay consistent, I used `errGFE` everywhere.
Yeah, I understand that. But I think that it is better to follow the standard naming of Golang when possible, and only make an exception when you need to. In this case for example, you no longer need to return `err` at the end of the function, as you know that it is `nil` because of the `if` statement that was added above. So the last statement in the function can now be changed to `return client, nil`.
Actually, I looked into it further and realised that if I define `err` inside the loop, I can use it even in the case I described, as the scope of that variable will be limited to the if condition loop. So I think it should be okay to do that. Wdyt?
I'm not really sure I understand what you mean in this case. But if you mean that you can define `err` only inside the if-block, then that would be good.
Yes that's what I mean
13d936d to 1641bce
@@ -136,7 +140,13 @@ func (t *BatchReadOnlyTransaction) PartitionReadUsingIndexWithOptions(ctx contex
         Columns:          columns,
         KeySet:           kset,
         PartitionOptions: opt.toProto(),
-    })
+    }, gax.WithGRPCOptions(grpc.Header(&md)))
Here and elsewhere: I think it is better to add an `if err != nil { return err }` statement before checking for the latency values. I worry that we otherwise might start returning errors that are harder to interpret for users if the following happens:
- The `client.PartitionRead` call returns an error. That error is not checked here at the moment.
- The RPC did return a header (so `md != nil`), but that header does not contain a GFE latency value. So instead of the error that was returned from `PartitionRead`, we return an error that indicates that it could not find a GFE latency value.
I worry that the above would be confusing for users, and make it harder to debug errors that occur because of, for example, invalid requests.
Actually, if there is no GFE latency value it won't throw an error. Instead it will add to the header missing count. It only throws an error if it gets a garbage value for the "server-timing" key. Also, the Java implementation currently intercepts every RPC call (even when the operation fails) and uses it to increase the header missing count. I feel if we don't add to the Header missing count in this case then across languages the header missing count might be different in some way.
> I feel if we don't add to the Header missing count in this case then across languages the header missing count might be different in some way
That's a good point. So agreed to keep this as is on that.
I still worry a little bit that we might be returning an error from extracting the metadata instead of the error from the actual operation. I think we had a discussion about that earlier, and agreed on returning the error from those methods. I think I'm starting to change my mind on that. It is not an error that the user can do anything about, and it might just cover other potentially more important errors. So I would suggest then ignoring those errors. What do you think?
Should we log these errors instead then? How do we let the user know that an error has occurred?
If we have access to the logger, then logging it is probably a good idea. I don't think users can do anything with these errors anyways, so I don't think we need to actively inform them any more than that (and the chance that it happens should also be minimal, as it would mean that the data that is sent back is invalid)
As per our conversation, I am going to add a trace instead of returning an error.
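The behaviour agreed on in this thread (count a missing header, record a valid latency, and only note an unparsable value instead of returning it) might look roughly like the sketch below. It is not the PR's code: `md`, `stats`, `parseLatency`, and `record` are hypothetical names, `log.Printf` stands in for the trace, and the value is assumed to be a plain integer as in the mock server's "123". The real header value may carry more structure.

```go
package main

import (
	"fmt"
	"log"
	"strconv"
	"strings"
)

// md mimics the shape of gRPC metadata (keys mapped to value slices) so the
// sketch needs only the standard library.
type md map[string][]string

// stats is a hypothetical holder for the two metrics discussed in the thread.
type stats struct {
	latenciesMs   []int64
	headerMissing int
}

// parseLatency reads "server-timing" from h. It reports whether the key was
// present, and returns an error only when the value cannot be parsed.
// Assumption: the value is a plain integer number of milliseconds.
func parseLatency(h md) (latency int64, found bool, err error) {
	vals := h["server-timing"]
	if len(vals) == 0 {
		return 0, false, nil
	}
	v, err := strconv.ParseInt(strings.TrimSpace(vals[0]), 10, 64)
	if err != nil {
		return 0, true, fmt.Errorf("invalid server-timing value %q: %w", vals[0], err)
	}
	return v, true, nil
}

// record updates the metrics and never returns an error, so a metrics
// problem cannot replace the error of the actual operation.
func (s *stats) record(h md) {
	latency, found, err := parseLatency(h)
	switch {
	case err != nil:
		log.Printf("ignoring GFE latency: %v", err) // stand-in for a trace
	case !found:
		s.headerMissing++
	default:
		s.latenciesMs = append(s.latenciesMs, latency)
	}
}

func main() {
	s := &stats{}
	s.record(md{"server-timing": {"123"}})     // valid value
	s.record(md{})                             // header missing
	s.record(md{"server-timing": {"garbage"}}) // unparsable value
	fmt.Println(s.latenciesMs, s.headerMissing)
}
```

Because `record` has no error result, callers keep returning whatever the RPC itself returned.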
1641bce to a3aff82
spanner/batch.go (outdated)
    if err != nil {
        return nil, err
    }
nit: this early return is no longer needed now that we don't return an error if getting the latency stats fails
I think this is still needed as there might be an error in getting the Header, and in that case we return the error
Yes, but that happens on line 333. The reason that we added the early return above in the first place was that we would return the error that might be returned by `createContextAndCaptureGFELatencyMetrics`. As we don't do that anymore, we can safely wait until line 333 before returning the error from the `client.Header()` method.
But then won't `md, err := client.Header()` override the value of the original error? I would have to add a separate check:
    md, errGFE := client.Header()
    if errGFE != nil {
        return nil, errGFE
    }
I'm not sure we mean the same return in this case. I think you can safely remove the `if err != nil` block starting at line 325. The `err` on line 329 will not override the `err` from the call to `client.Header()`, as that is a different variable, although it does have the same name. That `err` is only visible in the statement on line 329.
So to summarize:
- The `if err != nil` on line 321 should remain.
- The `if err != nil` on line 325 can be removed. (And yes, the `md, err := client.Header()` does override the previous value of `err`, but that is not a problem as we know that `err` at that point is `nil`.)
- The `if err := createContextAndCaptureGFELatencyMetrics` on line 329 can safely remain. That will not override anything, as it is a separate variable only defined on line 329.
- The `return client, err` on line 333 will therefore always return the value of `err` that was returned by `client.Header()`.
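The scoping rules summarized above can be checked with a standalone sketch. The names `fakeClient`, `newClient`, `captureMetrics`, `open`, and `errHeader` are hypothetical stand-ins, not the code under review; only the shape of the `err` handling follows the discussion.

```go
package main

import (
	"errors"
	"fmt"
)

// errHeader is a sentinel used to show which error open returns.
var errHeader = errors.New("header failed")

// fakeClient is a hypothetical stand-in for the client in the thread.
type fakeClient struct{ headerErr error }

// Header mimics a call that returns metadata and an error.
func (c *fakeClient) Header() (map[string][]string, error) {
	if c.headerErr != nil {
		return nil, c.headerErr
	}
	return map[string][]string{"server-timing": {"123"}}, nil
}

// newClient is a hypothetical constructor that can be made to fail.
func newClient(fail bool, headerErr error) (*fakeClient, error) {
	if fail {
		return nil, errors.New("create failed")
	}
	return &fakeClient{headerErr: headerErr}, nil
}

// captureMetrics always fails here, to show that its error stays local.
func captureMetrics(md map[string][]string) error {
	return errors.New("metrics failed")
}

func open(fail bool, headerErr error) (*fakeClient, error) {
	client, err := newClient(fail, headerErr)
	if err != nil { // this check remains
		return nil, err
	}
	// md is new, so := is allowed. err is the same variable as above and is
	// reassigned, which is harmless because it is known to be nil here.
	md, err := client.Header()
	// This err is a separate variable that exists only within the if statement.
	if err := captureMetrics(md); err != nil {
		fmt.Println("trace:", err)
	}
	// Returns the err from client.Header(), not the one from captureMetrics.
	return client, err
}

func main() {
	_, err := open(false, nil)
	fmt.Println("header ok:    ", err)
	_, err = open(false, errHeader)
	fmt.Println("header failed:", err)
}
```

Even though `captureMetrics` fails on every call, `open` returns `nil` when `Header()` succeeds, which is the point made about line 333.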
Okay makes sense. My bad. I thought you meant remove the error return on line 321. Will do that.
Benchmarking:
Each function was run 100 times in a test and the elapsed time was recorded. This was averaged across 50 runs. The results are summarised below:
For full details of each run, check here
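For readers who want to reproduce a measurement of this shape, the procedure described (100 calls per test, averaged over 50 runs) could be sketched as follows. This is an assumption about the method, not the harness that was actually used; `timeBatch`, `averageOverRuns`, and the placeholder workload are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// timeBatch calls fn the given number of times and returns the total
// elapsed wall-clock time.
func timeBatch(fn func(), iterations int) time.Duration {
	start := time.Now()
	for i := 0; i < iterations; i++ {
		fn()
	}
	return time.Since(start)
}

// averageOverRuns repeats timeBatch for the given number of runs and
// returns the mean duration of one batch.
func averageOverRuns(fn func(), iterations, runs int) time.Duration {
	if runs <= 0 {
		return 0
	}
	var total time.Duration
	for r := 0; r < runs; r++ {
		total += timeBatch(fn, iterations)
	}
	return total / time.Duration(runs)
}

func main() {
	// Placeholder workload; a real measurement would call the function
	// under test with and without the metrics code enabled.
	work := func() { time.Sleep(10 * time.Microsecond) }
	avg := averageOverRuns(work, 100, 50)
	fmt.Printf("average time per batch of 100 calls: %v\n", avg)
}
```

Go's `testing` package benchmarks are the more usual tool for this; the manual loop is shown only because it mirrors the description.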