Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(spanner): Adding GFE Latency and Header Missing Count Metrics #5199

Merged
merged 16 commits into from
Jan 4, 2022

Conversation

asthamohta
Copy link
Contributor

Benchmarking:
Each function was performed 100 times in a test and the time for it is noted . This is averaged out across 50 runs. Below is the summarisation of the numbers:. Following are the results:

Functionality ReadRow InsertOrUpdate Query
Time without adding GFE Latency Feature 18.47518476 23.79237414 17.92703158
Time with GFE Latency added and Disabled 18.84231938 21.38502053 23.93358834
Time with GFE Latency added and Enabled 19.45244371 22.03212286 21.28579487

For full details of each run, check here

tritone and others added 3 commits December 7, 2021 12:13
Allows users to supply a custom ErrorFunc which will be used
to determine whether the error is retryable.
Copy link
Contributor

@olavloite olavloite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also need to change the mock server so it actually returns a server-timing header. To do this, add the following lines to just before the return statement of the BatchCreateSessions method in internal/testutil/inmem_spanner_server.go (before line 690):

	header := metadata.New(map[string]string{"server-timing": "123"})
	if err := grpc.SendHeader(ctx, header); err != nil {
		return nil, gstatus.Errorf(codes.Internal, "unable to send 'server-timing' header")
	}

(And this also needs to be added to all the other RPCs that should also return a server timing value)

spanner/oc_test.go Outdated Show resolved Hide resolved
spanner/oc_test.go Outdated Show resolved Hide resolved
spanner/oc_test.go Show resolved Hide resolved
spanner/sessionclient.go Outdated Show resolved Hide resolved
spanner/sessionclient.go Outdated Show resolved Hide resolved
@asthamohta asthamohta force-pushed the GFELatency branch 3 times, most recently from e4ae8e1 to 7c462cf Compare December 9, 2021 20:34
@asthamohta asthamohta force-pushed the GFELatency branch 3 times, most recently from 30558b1 to bcc20d2 Compare December 12, 2021 20:10
@asthamohta asthamohta force-pushed the GFELatency branch 4 times, most recently from 7e2430f to f2646e9 Compare December 13, 2021 13:47
spanner/batch.go Show resolved Hide resolved
spanner/batch.go Outdated Show resolved Hide resolved
spanner/batch.go Outdated Show resolved Hide resolved
spanner/batch.go Outdated Show resolved Hide resolved
spanner/batch.go Outdated Show resolved Hide resolved
spanner/oc_test.go Outdated Show resolved Hide resolved
spanner/sessionclient.go Outdated Show resolved Hide resolved
spanner/stats.go Show resolved Hide resolved
spanner/transaction.go Outdated Show resolved Hide resolved
spanner/transaction.go Outdated Show resolved Hide resolved
spanner/batch.go Outdated Show resolved Hide resolved
spanner/batch.go Outdated Show resolved Hide resolved
spanner/batch.go Outdated Show resolved Hide resolved
spanner/batch.go Outdated Show resolved Hide resolved
spanner/client.go Outdated Show resolved Hide resolved
spanner/internal/testutil/inmem_spanner_server.go Outdated Show resolved Hide resolved
spanner/transaction.go Outdated Show resolved Hide resolved
spanner/transaction.go Show resolved Hide resolved
@asthamohta asthamohta added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 15, 2021
@vi3k6i5 vi3k6i5 added kokoro:force-run Add this label to force Kokoro to re-run the tests. and removed kokoro:force-run Add this label to force Kokoro to re-run the tests. labels Dec 15, 2021
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 15, 2021
@asthamohta asthamohta requested a review from olavloite December 15, 2021 22:12
spanner/batch.go Outdated Show resolved Hide resolved
spanner/batch.go Outdated Show resolved Hide resolved
spanner/batch.go Outdated
if err != nil {
return client, err
}
md, errGFE := client.Header()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: prefer err instead of errGFE

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Knut, the reason why I did this was because in someplaces I was worries that the error was getting returned at the end of the function, example here . So will have to use a separate error in such places. Hence to seem consistent I used errGFE

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I understand that. But I think that it is better to follow the standard naming of Golang when possible, and only make an exception when you need to. In this case for example, you no longer need to return err at the end of the function anymore, as you know that it is nil because of the if statement that was added above. So the last statement in the function can now be changed to return client, nil

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I looked into it further and I realised if I define err inside the loop even in the case I described I can use it as the scope of that variable will be limited to the if condition loop. So i think it should be okay to do that. Wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really sure I understand what you mean in this case. But if you mean that you can define err only inside the if-block, then that would be good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's what I mean

spanner/client.go Outdated Show resolved Hide resolved
@asthamohta asthamohta force-pushed the GFELatency branch 2 times, most recently from 13d936d to 1641bce Compare December 20, 2021 13:44
@asthamohta asthamohta requested a review from olavloite December 21, 2021 04:22
spanner/batch.go Outdated Show resolved Hide resolved
spanner/batch.go Outdated
if err != nil {
return client, err
}
md, errGFE := client.Header()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I understand that. But I think that it is better to follow the standard naming of Golang when possible, and only make an exception when you need to. In this case for example, you no longer need to return err at the end of the function anymore, as you know that it is nil because of the if statement that was added above. So the last statement in the function can now be changed to return client, nil

spanner/client.go Outdated Show resolved Hide resolved
@@ -136,7 +140,13 @@ func (t *BatchReadOnlyTransaction) PartitionReadUsingIndexWithOptions(ctx contex
Columns: columns,
KeySet: kset,
PartitionOptions: opt.toProto(),
})
}, gax.WithGRPCOptions(grpc.Header(&md)))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and elsewhere: I think it is better to add an if err != nil { return err }` statement before checking for the latency values. I worry that we otherwise might start returning errors that are harder to interpret for users if the following happens:

  1. The client.PartitionRead returns an error. That error is not checked here at the moment.
  2. The RPC did return a header (so md != nil), but that header does not contain a GFE latency value. So instead of the error that was returned from PartitionedRead, we return an error that indicates that it could not find a GFE Latency value.

I worry that the above would be confusing for users, and make it harder to debug errors that occur because of for example invalid requests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually if there is no GFE Latency value it won't throw an error. Instead it will add to the header missing count. It only throws an error if it got a garbage value in the value for the key value pair "server-timing". Also currently in the Java implementation, they intercept every rpc call(even when there is a failed operation) and use it to increase the header missing count. I feel if we don't add to the Header missing count in this case then across languages the header missing count might be different in someway

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel if we don't add to the Header missing count in this case then across languages the header missing count might be different in someway

That's a good point. So agreed to keep this as is on that.

I still worry a little bit that we might be returning an error from extracting the metadata instead of the error from the actual operation. I think we had a discussion about that earlier, and agreed on returning the error from those methods. I think I'm starting to change my mind on that. It is not an error that the user can do anything about, and it might just cover other potentially more important errors. So I would suggest then ignoring those errors. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we log these errors instead then? How do we let the user know that an error has occurred?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have access to the logger, then logging it is probably a good idea. I don't think users can do anything with these errors anyways, so I don't think we need to actively inform them any more than that (and the chance that it happens should also be minimal, as it would mean that the data that is sent back is invalid)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per our conversation going to add a trace instead of returning an error

spanner/batch.go Show resolved Hide resolved
spanner/transaction.go Outdated Show resolved Hide resolved
spanner/transaction.go Show resolved Hide resolved
spanner/transaction.go Show resolved Hide resolved
spanner/transaction.go Show resolved Hide resolved
spanner/transaction.go Outdated Show resolved Hide resolved
@asthamohta asthamohta added kokoro:force-run Add this label to force Kokoro to re-run the tests. kokoro:run Add this label to force Kokoro to re-run the tests. labels Dec 22, 2021
@kokoro-team kokoro-team removed kokoro:run Add this label to force Kokoro to re-run the tests. kokoro:force-run Add this label to force Kokoro to re-run the tests. labels Dec 22, 2021
@asthamohta asthamohta requested a review from olavloite December 22, 2021 10:24
spanner/batch.go Outdated
Comment on lines 325 to 327
if err != nil {
return nil, err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this early return is no longer needed now that we don't return an error if getting the latency stats fails

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is still needed as there might be an error in getting the Header, and in that case we return the error

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but that happens on line 333. The reason that we added the early return above in the first place, was that we would return the error that might be returned createContextAndCaptureGFELatencyMetrics. As we don't do that anymore, we can safely wait until line 333 before returning the error from the client.Header() method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then won't md, err := client.Header() override the value of the original error. I will have to add a separate check as md, errGFE : = client.Header()
if errGFE != nil {
return nil, errGFE
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we mean the same return in this case. I think you can safely return the if err != nil block starting at line 325. The err on line 329 will not override the err from the call to client.Header(), as that is a different variable, although it does have the same name. That err is only visible in the statement on line 329.

So to summarize:

  1. The if err != nil on line 321 should remain.
  2. The if err != nil on line 325 can be removed. (And yes, the md, err := client.Header() does override the previous value of err, but that is not a problem as we know that err at that point is nil)
  3. The if err := createContextAndCaptureGFELatencyMetrics on line 329 can safely remain. That will not override anything, as it is a separate variable only defined on line 329.
  4. The return client, err on line 333 will therefore always return the value of err that was returned by client.Header().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay makes sense. My bad. I thought you meant remove the error return on line 321. Will do that.

spanner/batch.go Outdated Show resolved Hide resolved
spanner/sessionclient.go Outdated Show resolved Hide resolved
spanner/stats.go Show resolved Hide resolved
spanner/stats.go Outdated Show resolved Hide resolved
spanner/transaction.go Outdated Show resolved Hide resolved
spanner/transaction.go Outdated Show resolved Hide resolved
@asthamohta asthamohta marked this pull request as ready for review December 22, 2021 17:33
@asthamohta asthamohta requested review from a team as code owners December 22, 2021 17:33
@codyoss codyoss changed the title Feat: Adding GFE Latency and Header Missing Count Metrics feat(spanner): Adding GFE Latency and Header Missing Count Metrics Dec 22, 2021
@asthamohta asthamohta merged commit 3d8a9ea into googleapis:main Jan 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants