added vLLM doc page since we support it #545

Merged (3 commits) on Dec 1, 2023
docs/koboldcpp.md (1 addition, 1 deletion)
@@ -17,7 +17,7 @@ In your terminal where you're running MemGPT, run `memgpt configure` to set the
...
```

-If you have an existing agent that you want to move to the web UI backend, add extra flags to `memgpt run`:
+If you have an existing agent that you want to move to the koboldcpp backend, add extra flags to `memgpt run`:
```sh
memgpt run --agent your_agent --model-endpoint-type koboldcpp --model-endpoint http://localhost:5001
```
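If `memgpt run` can't reach the endpoint, it helps to sanity-check the KoboldCpp server first. A minimal sketch, assuming KoboldCpp is running locally on its default port 5001 and exposes the KoboldAI-compatible `/api/v1/model` route:
```sh
# should return the name of the currently loaded model if the server is up
curl http://localhost:5001/api/v1/model
```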
docs/llamacpp.md (1 addition, 1 deletion)
@@ -17,7 +17,7 @@ In your terminal where you're running MemGPT, run `memgpt configure` to set the
...
```

-If you have an existing agent that you want to move to the web UI backend, add extra flags to `memgpt run`:
+If you have an existing agent that you want to move to the llama.cpp backend, add extra flags to `memgpt run`:
```sh
memgpt run --agent your_agent --model-endpoint-type llamacpp --model-endpoint http://localhost:8080
```
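The endpoint above assumes llama.cpp's built-in HTTP server is already listening on port 8080. A minimal sketch of launching it, where the model path and context size are placeholders and the `server` binary is assumed to have been built from your llama.cpp checkout:
```sh
# run from the llama.cpp build directory; /path/to/model.gguf is a placeholder
./server -m /path/to/model.gguf -c 4096 --host 0.0.0.0 --port 8080
```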
docs/lmstudio.md (1 addition, 1 deletion)
@@ -22,7 +22,7 @@ In your terminal where you're running MemGPT, run `memgpt configure` to set the
...
```

-If you have an existing agent that you want to move to the web UI backend, add extra flags to `memgpt run`:
+If you have an existing agent that you want to move to the LM Studio backend, add extra flags to `memgpt run`:
```sh
memgpt run --agent your_agent --model-endpoint-type lmstudio --model-endpoint http://localhost:1234
```
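Before pointing MemGPT at LM Studio, you can check that the local server is actually listening. A sketch, assuming the server is enabled in LM Studio on the default port 1234 and your version exposes the OpenAI-style `/v1/models` route:
```sh
# lists the model(s) the LM Studio server is currently exposing
curl http://localhost:1234/v1/models
```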
docs/ollama.md (1 addition, 1 deletion)
@@ -37,7 +37,7 @@ In your terminal where you're running MemGPT, run `memgpt configure` to set the
...
```

-If you have an existing agent that you want to move to the web UI backend, add extra flags to `memgpt run`:
+If you have an existing agent that you want to move to the Ollama backend, add extra flags to `memgpt run`:
```sh
# use --model to switch Ollama models (always include the full Ollama model name with the tag)
# use --model-wrapper to switch model wrappers
...
```
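Before switching an agent over, it can help to confirm that Ollama is serving and that the model tag you plan to pass with `--model` has been pulled. A sketch, assuming Ollama is running locally on its default port 11434:
```sh
# the root endpoint responds if the Ollama server is reachable
curl http://localhost:11434
# list locally available models and their tags
ollama list
```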
docs/vllm.md (26 additions, 0 deletions)
@@ -0,0 +1,26 @@
1. Download + install [vLLM](https://docs.vllm.ai/en/latest/getting_started/installation.html) and the model you want to test with
2. Launch a vLLM **OpenAI-compatible** API server using [the official vLLM documentation](https://docs.vllm.ai/en/latest/getting_started/quickstart.html)


For example, if we want to use the model `dolphin-2.2.1-mistral-7b` from [HuggingFace](https://huggingface.co/ehartford/dolphin-2.2.1-mistral-7b), we would run:
```sh
python -m vllm.entrypoints.openai.api_server \
--model ehartford/dolphin-2.2.1-mistral-7b
```
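Once the server is up, you can verify that the OpenAI-compatible endpoint is serving the model before configuring MemGPT. A sketch, assuming the default address of `http://localhost:8000`:
```sh
# should list ehartford/dolphin-2.2.1-mistral-7b among the available models
curl http://localhost:8000/v1/models
```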

vLLM will automatically download the model (if it's not already downloaded) and store it in your [HuggingFace cache directory](https://huggingface.co/docs/datasets/cache).

In your terminal where you're running MemGPT, run `memgpt configure` to set the default backend for MemGPT to point at vLLM:
```
# if you are running vLLM locally, the default IP address + port will be http://localhost:8000
? Select LLM inference provider: local
? Select LLM backend (select 'openai' if you have an OpenAI compatible proxy): vllm
? Enter default endpoint: http://localhost:8000
? Enter HuggingFace model tag (e.g. ehartford/dolphin-2.2.1-mistral-7b): ehartford/dolphin-2.2.1-mistral-7b
...
```

If you have an existing agent that you want to move to the vLLM backend, add extra flags to `memgpt run`:
```sh
memgpt run --agent your_agent --model-endpoint-type vllm --model-endpoint http://localhost:8000
```
mkdocs.yml (3 additions, 2 deletions)
@@ -23,8 +23,9 @@ nav:
# - 'oobabooga web UI (on RunPod)': webui_runpod.md
- 'LM Studio': lmstudio.md
- 'llama.cpp': llamacpp.md
-- 'koboldcpp': koboldcpp.md
-- 'ollama': ollama.md
+- 'KoboldCpp': koboldcpp.md
+- 'Ollama': ollama.md
+- 'vLLM': vllm.md
- 'Troubleshooting': local_llm_faq.md
- 'Customizing MemGPT':
- 'Creating new MemGPT presets': presets.md