
Add support for listening gRPC over UNIX socket #1159

Merged

Conversation

thevilledev (Contributor)

One typical deployment model for TensorFlow Serving is to run it as a sidecar container. With this approach the model is often served over HTTP through a loopback interface. For performance reasons it would make sense to offer a way to access Serving over UNIX sockets. This would remove TCP overhead and reduce context switching.

This PR adds a new CLI flag, --grpc_socket_path. If set, Serving will listen on a UNIX domain socket at that path, which may be relative or absolute. Note that abstract UNIX sockets are not supported by gRPC; there is an open issue about this at grpc/grpc#4677.
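
For illustration, here is a minimal client-side sketch (not the PR's exact code; the socket path is hypothetical) of what reaching Serving over the new socket would look like from a gRPC C++ client:

```cpp
#include <memory>

#include "grpcpp/grpcpp.h"
#include "tensorflow_serving/apis/prediction_service.grpc.pb.h"

int main() {
  // Assumes the model server was started with a hypothetical socket path:
  //   tensorflow_model_server --grpc_socket_path=/tmp/tfserving.sock ...
  // gRPC resolves "unix:<path>" targets to AF_UNIX sockets, so this
  // connection bypasses the TCP stack entirely.
  auto channel = grpc::CreateChannel("unix:/tmp/tfserving.sock",
                                     grpc::InsecureChannelCredentials());
  auto stub = tensorflow::serving::PredictionService::NewStub(channel);

  // From here on the stub is used exactly as over TCP: build a
  // PredictRequest and call stub->Predict(&context, request, &response).
  return 0;
}
```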

netfs (Collaborator) commented Oct 29, 2018

Thanks for the change! I will review this sometime this week.

Do you have any benchmarks to showcase improvements in latency (or throughput) when going over UNIX domain sockets vs. TCP/IP for ML inference? I'd be curious to see the results.

thevilledev (Contributor, Author)

Sorry, I had too much fun with TensorFlow and it took a while to get back here. :)

Here are some benchmark results from a project that runs TensorFlow Serving on GPUs on Google Kubernetes Engine. Our client app uses Applifier/go-tensorflow to interface with TensorFlow Serving over gRPC, and we built a benchmark tool with the same library. We did two separate runs with the benchmark tool: first calling TensorFlow Serving over a UNIX domain socket, then over the default TCP socket, with a small pause between the runs.
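
The benchmark tool itself is built on the Go library mentioned above, but at the gRPC level the only variable between the two runs is the channel target. A hypothetical C++ illustration of that difference (the socket path is a placeholder; 8500 is Serving's default gRPC port):

```cpp
#include <memory>
#include <string>

#include "grpcpp/grpcpp.h"

// The two runs differ only in how the channel is created; the stubs
// and requests above the transport layer are identical.
std::shared_ptr<grpc::Channel> MakeChannel(bool use_uds) {
  const std::string target =
      use_uds ? "unix:/tmp/tfserving.sock"  // run 1: UNIX domain socket
              : "localhost:8500";           // run 2: default TCP port
  return grpc::CreateChannel(target, grpc::InsecureChannelCredentials());
}
```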

This first graph shows the average rate of successful predictions per second.

[Graph: pr1159-predictions-per-second]

The second graph shows the average endpoint latency seen by the gRPC client.

[Graph: pr1159-grpc-endpoint-latency]

There's a huge difference in latency with the default TCP socket, which is actually why our test run eventually failed to finish. Peak median latency increased from 30 ms to 135 ms, while p99 latency increased from 260 ms to 480 ms.

Unfortunately I don't have a pathological, reproducible example of this.

orktes commented Jan 20, 2019

@netfs Do you still need more info?

(Review threads on tensorflow_serving/model_servers/server.cc — all resolved.)
netfs (Collaborator) commented Jan 22, 2019

> @netfs Do you still need more info?

Nope, this looks good. Thanks for the change; started to take a look/review!

netfs (Collaborator) left a comment


Looks great. Minor nit and we should be done. :-)

Thanks for the change!

tensorflow-copybara merged commit dbab51d into tensorflow:master on Jan 28, 2019.
tensorflow-copybara pushed a commit that referenced this pull request on Jan 28, 2019.
ndeepesh

Hi @vtorhonen,
Do you have the code for the benchmark you performed? I am trying a similar thing and p99 latency is worse over the UNIX domain socket. I just wanted to understand whether you ran your benchmark with some specific configuration.
