doc: NVIDIA API Catalog and NIMs #545

Merged · 5 commits · Jul 9, 2024
68 changes: 66 additions & 2 deletions docs/user_guides/configuration-guide.md
@@ -85,9 +85,73 @@ The meaning of the attributes is as follows:

You can use any LLM provider that is supported by LangChain, e.g., `ai21`, `aleph_alpha`, `anthropic`, `anyscale`, `azure`, `cohere`, `huggingface_endpoint`, `huggingface_hub`, `openai`, `self_hosted`, `self_hosted_hugging_face`. Check out the LangChain official documentation for the full list.

```{note}
To use any of the providers, you will need to install additional packages; when you first try to use a configuration with a new provider, you will typically receive an error from LangChain that will instruct you on what packages should be installed.
```
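
For example, a minimal `config.yml` entry for the `openai` provider looks as follows (a sketch; it assumes the `openai` Python package is installed and the `OPENAI_API_KEY` environment variable is set):

```yaml
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct
```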

```{important}
While you can technically instantiate any of the LLM providers above, some will work better than others with the NeMo Guardrails toolkit, depending on the capabilities of the model. The toolkit includes prompts that have been optimized for certain types of models (e.g., `openai`, `nemollm`). For others, you can optimize the prompts yourself (see the [LLM Prompts](#llm-prompts) section).
```
#### NIM for LLMs

[NVIDIA NIM](https://docs.nvidia.com/nim/index.html) is a set of easy-to-use microservices designed to accelerate the deployment of generative AI models across the cloud, data center, and workstations.
[NVIDIA NIM for LLMs](https://docs.nvidia.com/nim/large-language-models/latest/introduction.html) brings the power of state-of-the-art LLMs to enterprise applications, providing unmatched natural language processing and understanding capabilities. [Learn more about NIMs](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/).

NeMo Guardrails supports connecting to a NIM as follows:

```yaml
models:
  - type: main
    engine: nim
    model: <MODEL_NAME>
    parameters:
      base_url: <NIM_ENDPOINT_URL>
```

For example, to connect to a locally deployed `meta/llama3-8b-instruct` model on port 8000, use the following model configuration:

```yaml
models:
  - type: main
    engine: nim
    model: meta/llama3-8b-instruct
    parameters:
      base_url: http://localhost:8000/v1
```

```{important}
To use the `nim` LLM provider, you must install the `langchain-nvidia-ai-endpoints` package (`pip install langchain-nvidia-ai-endpoints`).
```
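
To sanity-check a local NIM deployment outside of NeMo Guardrails, you can query it directly through the same package; a minimal sketch, assuming the `meta/llama3-8b-instruct` deployment from the example above:

```python
# Minimal connectivity check against a locally deployed NIM.
from langchain_nvidia_ai_endpoints import ChatNVIDIA

llm = ChatNVIDIA(
    base_url="http://localhost:8000/v1",  # the NIM endpoint from the config above
    model="meta/llama3-8b-instruct",
)
print(llm.invoke("Say hello in one short sentence.").content)
```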


#### NVIDIA AI Endpoints

[NVIDIA AI Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/) give users easy access to NVIDIA hosted API endpoints for NVIDIA AI Foundation Models like Llama 3, Mixtral 8x7B, Stable Diffusion, etc.
These models, hosted on the [NVIDIA API catalog](https://build.nvidia.com/), are optimized, tested, and hosted on the NVIDIA AI platform, making them fast and easy to evaluate, further customize, and seamlessly run at peak performance on any accelerated stack.

To use an LLM through NVIDIA AI Endpoints, use the following model configuration:

```yaml
models:
  - type: main
    engine: nvidia_ai_endpoints
    model: <MODEL_NAME>
```

For example, to use the `meta/llama3-8b-instruct` model, use the following model configuration:

```yaml
models:
  - type: main
    engine: nvidia_ai_endpoints
    model: meta/llama3-8b-instruct
```

```{important}
To use the `nvidia_ai_endpoints` LLM provider, you must install the `langchain-nvidia-ai-endpoints` package (`pip install langchain-nvidia-ai-endpoints`) and configure a valid `NVIDIA_API_KEY`.
```
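
For example (a sketch; the key value is a placeholder, generate your own from the [NVIDIA API catalog](https://build.nvidia.com/)):

```bash
pip install langchain-nvidia-ai-endpoints
export NVIDIA_API_KEY="nvapi-..."  # replace with your actual API key
```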

For more details, check out this [user guide](./llm/nvidia_ai_endpoints/README.md).

Here's an example configuration for using the `llama3` model with [Ollama](https://ollama.com/):
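
A minimal sketch, assuming Ollama serves the model locally on its default port:

```yaml
models:
  - type: main
    engine: ollama
    model: llama3
    parameters:
      base_url: http://localhost:11434
```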

20 changes: 7 additions & 13 deletions docs/user_guides/llm/nvidia_ai_endpoints/README.md
@@ -1,6 +1,6 @@
# Using LLMs hosted on NVIDIA API Catalog

This guide teaches you how to use NeMo Guardrails with LLMs hosted on NVIDIA API Catalog. It uses the [ABC Bot configuration](../../../../examples/bots/abc) and changes the model to `meta/llama3-70b-instruct`.

## Prerequisites

@@ -12,17 +12,11 @@ Before you begin, ensure you have the following prerequisites in place:
1. Install the `langchain-nvidia-ai-endpoints` package:

   ```bash
   pip install -U --quiet langchain-nvidia-ai-endpoints
   ```


2. An NVIDIA NGC account to access AI Foundation Models. To create a free account, go to the [NVIDIA NGC website](https://ngc.nvidia.com/).

3. An API key from NVIDIA API Catalog:
- Generate an API key by navigating to the AI Foundation Models section on the NVIDIA NGC website, selecting a model with an API endpoint, and generating an API key. You can use this API key for all models available in the NVIDIA API Catalog.
- Export the NVIDIA API key as an environment variable:

```bash
export NVIDIA_API_KEY=$NVIDIA_API_KEY # Replace with your own key
```

@@ -51,13 +45,13 @@ Update the `models` section of the `config.yml` file to the desired model supported…
```yaml
models:
  - type: main
    engine: nvidia_ai_endpoints
    model: meta/llama3-70b-instruct
...
```

## Usage

Load the guardrail configuration:

```python
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")
rails = LLMRails(config)
```

@@ -82,11 +76,11 @@ print(response['content'])
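
Next, send a test message to the bot; a minimal sketch, assuming a question covered by the ABC Bot's sample employee handbook:

```python
response = rails.generate(messages=[{
    "role": "user",
    "content": "How many vacation days do I get?"
}])
print(response['content'])
```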

```
According to the employee handbook, eligible employees are entitled to 20 days of paid vacation per year, accrued monthly.
```

You can see that the bot responds correctly.

## Conclusion

In this guide, you learned how to connect a NeMo Guardrails configuration to an NVIDIA API Catalog LLM model. This guide uses `meta/llama3-70b-instruct`; however, you can connect any other model by following the same steps.