This repository contains a proof-of-concept (PoC) implementation of the Prequal load balancing approach, focusing specifically on asynchronous probing. The implementation is based on the concepts outlined in the paper "Load is Not What You Should Balance: Introducing Prequal", presented by Google at NSDI 2024. According to the paper, Prequal is used in YouTube and other production systems at Google, where it allows them to run at much higher utilization.
- Implements asynchronous probing to select the best server replica.
- Uses gRPC for communication between the load balancer and backend servers.
- Built using Rust, Tokio, and tonic for asynchronous execution.
- Dynamically selects a backend server based on requests-in-flight (RIF) and estimated latency.
- Balances on RIF and latency signals instead of CPU utilization, with the goal of reducing request latency in real time.
prequal/
├── Cargo.toml              # Rust workspace configuration
├── proto/helloworld.proto  # gRPC service definitions
└── crates/
    ├── load-balancer/      # Load balancer implementation
    ├── clients/            # Client implementations
    │   ├── client-1/
    │   ├── client-2/
    │   └── client-3/
    ├── servers/            # Backend server implementations
    │   ├── server-1/
    │   ├── server-2/
    │   └── server-3/
    └── utils/              # Utility functions (latency calculations, median finder, etc.)
- Rust (Latest stable version)
- Cargo (Rust package manager)
- Tokio (Asynchronous runtime)
- tonic (gRPC library for Rust)
- Protobuf compiler (protoc), for generating gRPC bindings
git clone https://github.com/karribalu/rs-prequal.git
cd rs-prequal
cargo build --release
Start the gRPC load balancer:
cargo run -p load-balancer
Start the backend gRPC servers, each in its own terminal:
cargo run -p server-1
cargo run -p server-2
cargo run -p server-3
Once the load balancer and backend servers are running, you can send gRPC requests using a client:
cargo run -p client-1
The list of backend servers is defined in the .env file:
SERVER_URLS=http://127.0.0.1:50051,http://127.0.0.1:50052,http://127.0.0.1:50053
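A comma-separated value like this can be split into individual backend addresses with a few lines of standard-library Rust. The following is a minimal sketch, not the repository's actual code: the function names `parse_server_urls` and `server_urls_from_env` are hypothetical, and it assumes the .env file has already been loaded into the process environment (for example by a dotenv-style crate).

```rust
use std::env;

/// Splits a comma-separated list of backend URLs, trimming whitespace
/// and dropping empty entries. (Hypothetical helper, for illustration.)
fn parse_server_urls(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect()
}

/// Reads SERVER_URLS from the process environment.
/// Returns an empty list when the variable is unset or not valid Unicode.
/// Assumption: the .env file was loaded into the environment beforehand.
fn server_urls_from_env() -> Vec<String> {
    env::var("SERVER_URLS")
        .map(|raw| parse_server_urls(&raw))
        .unwrap_or_default()
}
```

Tolerating stray spaces and empty entries keeps a trailing comma or a hand-edited .env file from producing an invalid backend address.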
- The load balancer receives incoming gRPC requests.
- It asynchronously probes multiple backend servers to measure latency and requests-in-flight (RIF).
- The best server is selected based on median latency and RIF.
- The request is forwarded, and the response is returned to the client.
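The selection step in the flow above can be illustrated with a small, dependency-free sketch. The exact rule this PoC applies is not spelled out here, so the code assumes one plausible interpretation: keep only the probed servers whose latency is at or below the median of the probe replies, then pick the one with the fewest requests in flight. The type `ProbeResult` and the functions `median_latency` and `select_backend` are hypothetical names, not taken from the repository.

```rust
use std::time::Duration;

/// One probe reply from a backend. (Hypothetical type, for illustration.)
#[derive(Debug, Clone, PartialEq)]
struct ProbeResult {
    server: String,
    latency: Duration,
    rif: u32, // requests in flight reported by the server
}

/// Median of the probed latencies (upper median for even counts).
/// Returns None when there are no probe replies.
fn median_latency(probes: &[ProbeResult]) -> Option<Duration> {
    let mut latencies: Vec<Duration> = probes.iter().map(|p| p.latency).collect();
    latencies.sort();
    latencies.get(latencies.len() / 2).copied()
}

/// Assumed selection rule: among servers at or below the median latency,
/// choose the lowest RIF, breaking ties by lower latency.
fn select_backend(probes: &[ProbeResult]) -> Option<&ProbeResult> {
    let median = median_latency(probes)?;
    probes
        .iter()
        .filter(|p| p.latency <= median)
        .min_by_key(|p| (p.rif, p.latency))
}
```

With three replies of (12 ms, RIF 7), (20 ms, RIF 2), and (45 ms, RIF 0), the median latency is 20 ms, the slowest server is filtered out, and the second server wins on RIF. In the real load balancer the probe replies would arrive asynchronously over gRPC; this sketch only covers the decision once they are collected.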
- Implement the hot-cold lexicographic (HCL) rule for better load balancing decisions.
- Add benchmarking tests to compare performance against weighted round-robin (WRR) load balancers.
- Improve error handling and fault tolerance mechanisms.
- Introduce health checks for better server selection.
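For the HCL item in the list above, the rule as described in the Prequal paper classifies each probed server as hot or cold by comparing its RIF to a quantile of the observed RIF distribution. If every candidate is hot, the lowest-RIF server is chosen; otherwise the cold server with the lowest latency is chosen. The sketch below is one possible reading of that rule, not code from this repository: the names `Probe`, `rif_threshold`, and `select_hcl`, the nearest-rank quantile, and the "hot means RIF strictly above the threshold" convention are all assumptions.

```rust
use std::time::Duration;

/// One probe reply. (Hypothetical type, for illustration.)
#[derive(Debug, Clone, PartialEq)]
struct Probe {
    server: String,
    rif: u32,
    latency: Duration,
}

/// RIF value at quantile `q` (clamped to 0.0..=1.0) of the observed
/// samples, using a simple nearest-rank estimate.
fn rif_threshold(samples: &[u32], q: f64) -> Option<u32> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    let idx = ((sorted.len() - 1) as f64 * q.clamp(0.0, 1.0)).round() as usize;
    Some(sorted[idx])
}

/// Hot-cold lexicographic choice. A probe is treated as hot when its RIF
/// is strictly above `hot_threshold` (assumed convention).
/// - If any probe is cold, return the cold probe with the lowest latency.
/// - If all probes are hot, return the probe with the lowest RIF.
fn select_hcl(probes: &[Probe], hot_threshold: u32) -> Option<&Probe> {
    probes
        .iter()
        .filter(|p| p.rif <= hot_threshold)
        .min_by_key(|p| p.latency)
        .or_else(|| probes.iter().min_by_key(|p| p.rif))
}
```

The threshold would typically come from `rif_threshold` applied to recently observed RIF values, with the quantile as a tunable parameter. Separating the threshold estimate from the selection keeps both pieces easy to test in isolation.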
This PoC was implemented by Balasubramanyam as an exploration of Prequal's asynchronous probing mechanism. 🚀