# Ray Serve Chat Demo: Serving Hugging Face Models

- Open an io.net account.
- Follow the standard procedure for launching a Ray Cluster. Select a small cluster, for example 4x T4 GPUs.
- When the cluster is ready, select Visual Studio Code (VSCode).
- Launch the VSCode terminal and clone this repo:

  ```shell
  git clone https://github.com/ionet-official/io-ray-serve-chat-demo.git
  ```

- Go to the folder:

  ```shell
  cd io-ray-serve-chat-demo
  ```
- Start the chat server:

  ```shell
  serve run chat.yaml
  ```
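For reference, the `chat.yaml` passed to `serve run` is a Ray Serve config file. A minimal config of that shape is sketched below following the Ray Serve config schema; the application name, import path, and replica count are illustrative assumptions, not the repo's actual values:

```yaml
# Hypothetical Ray Serve config; the real chat.yaml in the repo
# defines its own import path and deployment options.
applications:
  - name: chat
    route_prefix: /
    import_path: chat:app        # module:variable holding the Serve app
    deployments:
      - name: ChatDeployment
        num_replicas: 4          # e.g. one replica per T4 worker
        ray_actor_options:
          num_gpus: 1
```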
- Wait until Ray Serve deploys the chat app across the workers; you will see a "Model loaded" message in the terminal.
- Test your chatbot from inside the cluster. Open a new terminal and run the sample chat client:

  ```shell
  python chat_client.py
  ```
- Test your chatbot server endpoint from outside the cluster.
  - Server endpoint: `https://exposed-service-[YOUR-CLUSTER-SUFFIX].tunnels.io.systems/`
  - If your cluster suffix is `1d47a`, the endpoint is `https://exposed-service-1d47a.tunnels.io.systems/`.
  - One way to identify your suffix is from the VSCode URL, which looks like `https://vscode-1d47a.tunnels.io.systems/`.
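In other words, the mapping from the VSCode URL to the service endpoint is just a prefix swap. A small Python sketch (the helper name is ours; the URL patterns are the ones shown above):

```python
# Derive the exposed-service endpoint from the VSCode URL shown in the browser.
# Helper name is hypothetical; the URL patterns follow this guide.
def service_endpoint(vscode_url: str) -> str:
    """Swap the 'vscode-' subdomain prefix for 'exposed-service-'."""
    return vscode_url.replace("vscode-", "exposed-service-", 1)

print(service_endpoint("https://vscode-1d47a.tunnels.io.systems/"))
# https://exposed-service-1d47a.tunnels.io.systems/
```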
- You can use the code snippet below to interact with the Ray Serve application (update the endpoint to your server):

  ```python
  import requests

  SERVER_ENDPOINT = "https://exposed-service-1d47a.tunnels.io.systems/"

  message = "What is the capital of France?"
  history = []
  response = requests.post(SERVER_ENDPOINT, json={"user_input": message, "history": history})
  print(response.json())
  ```
  Or on a terminal:

  ```shell
  curl -X POST https://exposed-service-1d47a.tunnels.io.systems/ \
    -H "Content-Type: application/json" \
    -d '{"user_input": "What is the capital of France?", "history": []}'
  ```
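For a multi-turn conversation, the client keeps the running history and sends it with each request. A minimal sketch, assuming the server accepts the same `{"user_input", "history"}` payload shown above and that each completed turn is stored as a `[user, assistant]` pair (the exact history format and response body depend on the chat app):

```python
import requests

SERVER_ENDPOINT = "https://exposed-service-1d47a.tunnels.io.systems/"  # your endpoint

def build_payload(message, history):
    """Package a user message plus prior turns into the request body."""
    return {"user_input": message, "history": history}

def chat_turn(message, history):
    """Send one turn, then record the [user, assistant] pair in the history."""
    resp = requests.post(SERVER_ENDPOINT, json=build_payload(message, history))
    resp.raise_for_status()
    answer = resp.json()  # assumption: the response body is the assistant's reply
    history.append([message, answer])
    return answer

history = []
# chat_turn("What is the capital of France?", history)
# chat_turn("And what is its population?", history)  # prior turns give the model context
```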