NoxtuaCompliance

Get Started

This repository contains the code to run NoxtuaCompliance with vLLM. A Gradio application provides a chat interface for quick testing.

Prerequisites

  1. Install Docker and Python (tested with Python 3.11.2)

  2. Run vLLM

    docker run --runtime nvidia --gpus all -v ~/.cache/huggingface:/root/.cache/huggingface -p 8000:8000 --ipc=host vllm/vllm-openai:v0.6.6.post1 --model xaynetwork/NoxtuaCompliance --tensor-parallel-size=8 --disable-log-requests --max-model-len 120000 --gpu-memory-utilization 0.95

    Set --tensor-parallel-size to the number of available GPUs; it must match the number of GPUs exposed to the container by the docker command (here, all of them via --gpus all).

  3. Validate the hosted model

    curl http://0.0.0.0:8000/v1/models
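
Beyond the health check above, you can send a test chat request through the OpenAI-compatible API that vLLM exposes. The following is a minimal sketch: the base URL and model name match the commands above, while the prompt and the api_key value are placeholders (vLLM accepts any key unless the server was started with one).

    from openai import OpenAI

    # vLLM serves an OpenAI-compatible API; the key is a placeholder.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    response = client.chat.completions.create(
        model="xaynetwork/NoxtuaCompliance",
        messages=[{"role": "user", "content": "Summarize the GDPR in one sentence."}],
        max_tokens=128,
    )
    print(response.choices[0].message.content)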

Setup

pip install -r requirements.txt

Gradio Application

python app.py

This command starts the Gradio chat application on localhost under port 8020. Open the displayed link in the browser, e.g. "http://0.0.0.0:8020" or "http://localhost:8020".
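
The chat logic lives in app.py. For orientation, a minimal Gradio chat wired to the vLLM endpoint could look like the sketch below; this is an illustrative assumption, not the repository's actual implementation, though the model name and ports match the commands above.

    import gradio as gr
    from openai import OpenAI

    # Illustrative sketch only; the actual app.py may differ.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    def chat(message, history):
        # Convert Gradio's (user, assistant) history pairs into OpenAI chat format.
        messages = []
        for user_msg, assistant_msg in history:
            messages.append({"role": "user", "content": user_msg})
            messages.append({"role": "assistant", "content": assistant_msg})
        messages.append({"role": "user", "content": message})
        response = client.chat.completions.create(
            model="xaynetwork/NoxtuaCompliance",
            messages=messages,
        )
        return response.choices[0].message.content

    gr.ChatInterface(chat).launch(server_name="0.0.0.0", server_port=8020)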
