Add support for listening gRPC over UNIX socket #1159
Conversation
Thanks for the change! I will review this sometime this week. Do you have any benchmarks to showcase improvements in latency (or throughput)?
Sorry, I had too much fun with TensorFlow and it took a while to get back here. :) Here are some benchmark results from a project that runs TensorFlow Serving on GPUs on Google Kubernetes Engine. Our client app uses Applifier/go-tensorflow to interface with TensorFlow Serving over gRPC. We also built a benchmark tool with the same library.

We did two separate runs with the benchmark tool, first by calling TensorFlow Serving over a UNIX domain socket and then by calling it over the default TCP socket, with a small pause between the runs. The first graph shows the average rate of successful predictions per second. The second graph shows the average endpoint latency seen by the gRPC client.

There's a huge difference in latency with the default TCP socket, which is actually why our test run eventually failed to finish. Peak median latency increased from 30 ms to 135 ms, while p99 latency increased from 260 ms to 480 ms. Unfortunately I don't have a pathological and reproducible example of this.
@netfs Do you still need more info?
Nope, this looks good. Thanks for the change; started to take a look/review!
Looks great. Minor nit and we should be done :-)
Thanks for the change!
PiperOrigin-RevId: 231284886
Hi @vtorhonen,
One typical deployment model for TensorFlow Serving is to run it as a sidecar container. With this approach the model is often served over HTTP through a loopback interface. For performance reasons it would make sense to offer the option of accessing Serving over UNIX domain sockets. This would remove TCP overhead and reduce context switching.
This PR adds a new CLI flag, --grpc_socket_path. If set, Serving will listen on a UNIX domain socket at this path, which can be relative or absolute. Note that abstract UNIX sockets are not supported by gRPC; there is an issue about this at grpc/grpc#4677.
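For reference, an invocation using the new flag might look like this (the model name and paths are illustrative; --model_name and --model_base_path are existing Serving flags, and --grpc_socket_path is the flag this PR adds):

```shell
# Illustrative only: serve a model over a UNIX domain socket instead of TCP.
tensorflow_model_server \
  --model_name=my_model \
  --model_base_path=/models/my_model \
  --grpc_socket_path=/tmp/tf_serving.sock
```

A client in the same pod can then dial the socket path directly instead of connecting to a loopback TCP port.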