# NoxtuaCompliance

## Get Started

This repository contains the code to run NoxtuaCompliance with vLLM. A Gradio application is included for quick testing via a chat interface.

### Prerequisites

1. Install Docker and Python (tested with version 3.11.2).

2. Run vLLM:

   ```bash
   docker run --runtime nvidia --gpus all \
     -v ~/.cache/huggingface:/root/.cache/huggingface \
     -p 8000:8000 --ipc=host \
     vllm/vllm-openai:v0.6.6.post1 \
     --model xaynetwork/NoxtuaCompliance \
     --tensor-parallel-size=8 \
     --disable-log-requests \
     --max-model-len 120000 \
     --gpu-memory-utilization 0.95
   ```

   Set `--tensor-parallel-size` to the number of available GPUs; it must match the number of GPUs exposed to the container via the `--gpus` flag of the docker command.

3. Validate the hosted model (a Python-based check is also sketched after this list):

   ```bash
   curl http://0.0.0.0:8000/v1/models
   ```
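Since vLLM exposes an OpenAI-compatible API, the hosted model can also be validated from Python. A minimal sketch using the `openai` client, which is an assumption here (install it with `pip install openai`; it is not necessarily listed in this repository's requirements):

```python
from openai import OpenAI

# vLLM does not require a real API key unless started with --api-key,
# so a placeholder value is fine.
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="EMPTY")

# List the hosted models -- equivalent to the curl check above.
for model in client.models.list():
    print(model.id)

# Send a minimal chat completion to confirm the model responds.
response = client.chat.completions.create(
    model="xaynetwork/NoxtuaCompliance",
    messages=[{"role": "user", "content": "Hello, are you up?"}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```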

### Setup

```bash
pip install -r requirements.txt
```

### Gradio Application

```bash
python app.py
```

This command starts the Gradio chat application on localhost under port 8020. Open the displayed link in the browser, e.g. `http://0.0.0.0:8020` or `http://localhost:8020`.
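For reference, here is a minimal sketch of how a Gradio chat front-end wired to the vLLM endpoint can look. This is an illustration under the assumptions above (endpoint on port 8000, chat UI on port 8020), not the actual contents of `app.py`:

```python
import gradio as gr
from openai import OpenAI

# Point the OpenAI client at the local vLLM server started above.
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="EMPTY")

def respond(message, history):
    # history is a list of (user, assistant) pairs in Gradio's classic
    # tuple format; rebuild it as OpenAI-style chat messages.
    messages = []
    for user_msg, assistant_msg in history:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": assistant_msg})
    messages.append({"role": "user", "content": message})
    completion = client.chat.completions.create(
        model="xaynetwork/NoxtuaCompliance",
        messages=messages,
    )
    return completion.choices[0].message.content

gr.ChatInterface(respond).launch(server_name="0.0.0.0", server_port=8020)
```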