Merge pull request #647 from NVIDIA/docs/doc-updates-0.9.1
Documentation updates for 0.9.1.
drazvan authored Jul 24, 2024
2 parents d9a6215 + 20da46c commit 6354436
Showing 4 changed files with 55 additions and 67 deletions.
22 changes: 14 additions & 8 deletions README.md
@@ -100,7 +100,7 @@ The input and output format for the `generate` method is similar to the [Chat Co

#### Async API

NeMo Guardrails is an async-first toolkit, i.e., the core mechanics are implemented using the Python async model. The public methods have both a sync and an async version (e.g., `LLMRails.generate` and `LLMRails.generate_async`).
NeMo Guardrails is an async-first toolkit, which means that the core mechanics are implemented using the Python async model. The public methods have both a sync and an async version, such as `LLMRails.generate` and `LLMRails.generate_async`.
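For illustration, here is a minimal sketch of the two call styles, assuming a guardrails configuration stored in a local `./config` directory:

```python
import asyncio

from nemoguardrails import LLMRails, RailsConfig

# Load a guardrails configuration from a local directory (assumed to exist).
config = RailsConfig.from_path("./config")
rails = LLMRails(config)

messages = [{"role": "user", "content": "Hello!"}]

# Synchronous call: blocks until the response is ready.
response = rails.generate(messages=messages)
print(response["content"])


# Asynchronous call: can be awaited from within an existing event loop.
async def main():
    response = await rails.generate_async(messages=messages)
    print(response["content"])


asyncio.run(main())
```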

### Supported LLMs

@@ -116,7 +116,7 @@ NeMo Guardrails supports five main types of guardrails:

1. **Input rails**: applied to the input from the user; an input rail can reject the input, stopping any additional processing, or alter the input (e.g., to mask potentially sensitive data, to rephrase).

2. **Dialog rails**: influence how the LLM is prompted; dialog rails operate on canonical form messages (more details [here](https://docs.nvidia.com/nemo/guardrails/user_guides/colang-language-syntax-guide.html)) and determine if an action should be executed, if the LLM should be invoked to generate the next step or a response, if a predefined response should be used instead, etc.
2. **Dialog rails**: influence how the LLM is prompted; dialog rails operate on canonical form messages (for details, see the [Colang Guide](https://docs.nvidia.com/nemo/guardrails/user_guides/colang-language-syntax-guide.html)) and determine if an action should be executed, if the LLM should be invoked to generate the next step or a response, if a predefined response should be used instead, etc.

3. **Retrieval rails**: applied to the retrieved chunks in the case of a RAG (Retrieval Augmented Generation) scenario; a retrieval rail can reject a chunk, preventing it from being used to prompt the LLM, or alter the relevant chunks (e.g., to mask potentially sensitive data).

@@ -140,7 +140,7 @@ The standard structure for a guardrails configuration folder looks like this:
│ ├── ...
```

The `config.yml` contains all the general configuration options (e.g., LLM models, active rails, custom configuration data), the `config.py` contains any custom initialization code and the `actions.py` contains any custom python actions. For a complete overview, check out the [Configuration Guide](https://docs.nvidia.com/nemo/guardrails/user_guides/configuration-guide.html).
The `config.yml` file contains all the general configuration options, such as LLM models, active rails, and custom configuration data. The `config.py` file contains any custom initialization code and the `actions.py` file contains any custom Python actions. For a complete overview, see the [Configuration Guide](https://docs.nvidia.com/nemo/guardrails/user_guides/configuration-guide.html).

Below is an example `config.yml`:

@@ -208,19 +208,25 @@ define flow

To configure and implement various types of guardrails, this toolkit introduces **Colang**, a modeling language specifically created for designing flexible, yet controllable, dialogue flows. Colang has a python-like syntax and is designed to be simple and intuitive, especially for developers.

**NOTE**: Currently two versions of Colang are supported (1.0 and 2.0-beta) and Colang 1.0 is the default. Versions 0.1.0 up to 0.7.1 of NeMo Guardrails used Colang 1.0 exclusively. Versions 0.8.0 introduced Colang 2.0-alpha and version 0.9.0 introduced Colang 2.0-beta. We expect Colang 2.0 to go out of Beta and replace 1.0 as the default option in NeMo Guardrails version 0.11.0.
**NOTE**: Currently, two versions of Colang, 1.0 and 2.0, are supported, and Colang 1.0 is the default. Versions 0.1.0 up to 0.7.1 of NeMo Guardrails used Colang 1.0 exclusively. Version 0.8.0 introduced Colang 2.0-alpha and version 0.9.0 introduced Colang 2.0-beta. We expect Colang 2.0 to go out of beta and replace 1.0 as the default option in NeMo Guardrails version 0.11.0.

For a brief introduction to the Colang 1.0 syntax, check out the [Colang 1.0 Language Syntax Guide](https://docs.nvidia.com/nemo/guardrails/user_guides/colang-language-syntax-guide.html).
For a brief introduction to the Colang 1.0 syntax, see the [Colang 1.0 Language Syntax Guide](https://docs.nvidia.com/nemo/guardrails/user_guides/colang-language-syntax-guide.html).
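As a quick, illustrative sketch (the message and flow names below are hypothetical, and the model choice is an assumption), a Colang 1.0 greeting flow can be defined inline and loaded together with a model configuration:

```python
from nemoguardrails import LLMRails, RailsConfig

# Illustrative Colang 1.0 content: a canonical user message, a bot message,
# and a flow that connects them.
colang_content = """
define user express greeting
  "hello"
  "hi there"

define bot express greeting
  "Hello! How can I help you today?"

define flow greeting
  user express greeting
  bot express greeting
"""

# A minimal model configuration; the engine and model names are assumptions.
yaml_content = """
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct
"""

config = RailsConfig.from_content(colang_content=colang_content, yaml_content=yaml_content)
rails = LLMRails(config)
```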

To get started with Colang 2.0, check out the [Colang 2.0 Documentation](https://docs.nvidia.com/nemo/guardrails/colang_2/overview.html).
To get started with Colang 2.0, see the [Colang 2.0 Documentation](https://docs.nvidia.com/nemo/guardrails/colang_2/overview.html).

### Guardrails Library

NeMo Guardrails comes with a set of [built-in guardrails](https://docs.nvidia.com/nemo/guardrails/user_guides/guardrails-library.html).

> **NOTE**: The built-in guardrails are only intended to enable you to get started quickly with NeMo Guardrails. For production use cases, further development and testing of the rails are needed.

Currently, the guardrails library includes guardrails for: [jailbreak detection](https://docs.nvidia.com/nemo/guardrails/user_guides/guardrails-library.html#jailbreak-detection-heuristics), [output moderation](https://docs.nvidia.com/nemo/guardrails/user_guides/guardrails-library.html#self-check-output), [fact-checking](https://docs.nvidia.com/nemo/guardrails/user_guides/guardrails-library.html#fact-checking), [sensitive data detection](https://docs.nvidia.com/nemo/guardrails/user_guides/guardrails-library.html#presidio-based-sensitive-data-detection), [hallucination detection](https://docs.nvidia.com/nemo/guardrails/user_guides/guardrails-library.html#hallucination-detection), [input moderation using ActiveFence](<<https://docs.nvidia.com/nemo/guardrails/user_guides/guardrails-library.html#activefence>), [hallucination detection for RAG applications using Got It AI's TruthChecker API](docs/user_guides/guardrails-library.md#got-it-ai), and [RAG hallucination detection using Patronus Lynx](docs/user_guides/guardrails-library.md#patronus-lynx-based-rag-hallucination-detection).
Currently, the guardrails library includes guardrails for:
- [jailbreak detection](https://docs.nvidia.com/nemo/guardrails/user_guides/guardrails-library.html#jailbreak-detection-heuristics)
- [output moderation](https://docs.nvidia.com/nemo/guardrails/user_guides/guardrails-library.html#self-check-output)
- [fact-checking](https://docs.nvidia.com/nemo/guardrails/user_guides/guardrails-library.html#fact-checking)
- [sensitive data detection](https://docs.nvidia.com/nemo/guardrails/user_guides/guardrails-library.html#presidio-based-sensitive-data-detection)
- [hallucination detection](https://docs.nvidia.com/nemo/guardrails/user_guides/guardrails-library.html#hallucination-detection)
- [input moderation using ActiveFence](https://docs.nvidia.com/nemo/guardrails/user_guides/guardrails-library.html#activefence)
- [hallucination detection for RAG applications using Got It AI's TruthChecker API](docs/user_guides/guardrails-library.md#got-it-ai)
- [RAG hallucination detection using Patronus Lynx](docs/user_guides/guardrails-library.md#patronus-lynx-based-rag-hallucination-detection)
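
As a sketch of how built-in rails are activated, the following configuration enables two flows from the guardrails library; the model choice is an assumption, and the self-check flows additionally require their prompts to be defined, as described in the Guardrails Library documentation:

```python
from nemoguardrails import LLMRails, RailsConfig

# Enable two built-in rails from the guardrails library.
yaml_content = """
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct

rails:
  input:
    flows:
      - self check input
  output:
    flows:
      - self check output
"""

config = RailsConfig.from_content(yaml_content=yaml_content)
rails = LLMRails(config)
```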

## CLI

@@ -266,7 +272,7 @@ Sample output:

#### Docker

To start a guardrails server, you can also use a Docker container. NeMo Guardrails provides a [Dockerfile](./Dockerfile) that you can use to build a `nemoguardrails` image. For more details, check out the guide for [using Docker](https://docs.nvidia.com/nemo/guardrails/user_guides/advanced/using-docker.html).
To start a guardrails server, you can also use a Docker container. NeMo Guardrails provides a [Dockerfile](./Dockerfile) that you can use to build a `nemoguardrails` image. For further information, see the [using Docker](https://docs.nvidia.com/nemo/guardrails/user_guides/advanced/using-docker.html) section.

## Integration with LangChain

22 changes: 11 additions & 11 deletions docs/user_guides/configuration-guide.md
@@ -86,11 +86,11 @@ The meaning of the attributes is as follows:
You can use any LLM provider that is supported by LangChain, e.g., `ai21`, `aleph_alpha`, `anthropic`, `anyscale`, `azure`, `cohere`, `huggingface_endpoint`, `huggingface_hub`, `openai`, `self_hosted`, `self_hosted_hugging_face`. Check out the LangChain official documentation for the full list.

```{note}
To use any of the providers, you will need to install additional packages; when you first try to use a configuration with a new provider, you will typically receive an error from LangChain that will instruct you on what packages should be installed.
To use any of the providers, you must install additional packages; when you first try to use a configuration with a new provider, you typically receive an error from LangChain that tells you which packages to install.
```

```{important}
While from a technical perspective, you can instantiate any of the LLM providers above, depending on the capabilities of the model, some will work better than others with the NeMo Guardrails toolkit. The toolkit includes prompts that have been optimized for certain types of models (e.g., `openai`, `nemollm`). For others, you can optimize the prompts yourself (see the [LLM Prompts](#llm-prompts) section).
Although you can instantiate any of the previously mentioned LLM providers, depending on the capabilities of the model, the NeMo Guardrails toolkit works better with some providers than others. The toolkit includes prompts that have been optimized for certain types of models, such as `openai` and `nemollm`. For others, you can optimize the prompts yourself following the information in the [LLM Prompts](#llm-prompts) section.
```
#### NIM for LLMs

@@ -120,13 +120,13 @@ models:
```

```{important}
To use the `nim` LLM provider, you must install the `langchain-nvidia-ai-endpoints` package (`pip install langchain-nvidia-ai-endpoints`).
To use the `nim` LLM provider, install the `langchain-nvidia-ai-endpoints` package using the command `pip install langchain-nvidia-ai-endpoints`.
```


#### NVIDIA AI Endpoints

[NVIDIA AI Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/) give users easy access to NVIDIA hosted API endpoints for NVIDIA AI Foundation Models like Llama 3, Mixtral 8x7B, Stable Diffusion, etc.
[NVIDIA AI Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/) give users easy access to NVIDIA hosted API endpoints for NVIDIA AI Foundation Models such as Llama 3, Mixtral 8x7B, and Stable Diffusion.
These models, hosted on the [NVIDIA API catalog](https://build.nvidia.com/), are optimized, tested, and hosted on the NVIDIA AI platform, making them fast and easy to evaluate, further customize, and seamlessly run at peak performance on any accelerated stack.

To use an LLM model through the NVIDIA AI Endpoints, use the following model configuration:
@@ -148,10 +148,10 @@ models:
```

```{important}
To use the `nvidia_ai_endpoints` LLM provider, you must install the `langchain-nvidia-ai-endpoints` package (`pip install langchain-nvidia-ai-endpoints`) and configure a valid `NVIDIA_API_KEY`.
To use the `nvidia_ai_endpoints` LLM provider, you must install the `langchain-nvidia-ai-endpoints` package using the command `pip install langchain-nvidia-ai-endpoints`, and configure a valid `NVIDIA_API_KEY`.
```

For more details, check out this [user guide](./llm/nvidia_ai_endpoints/README.md).
For further information, see the [user guide](./llm/nvidia_ai_endpoints/README.md).

Here's an example configuration for using `llama3` model with [Ollama](https://ollama.com/):

@@ -257,7 +257,7 @@ models:

### The Embeddings Model

To configure the embedding model used for the various steps in the [guardrails process](../architecture/README.md) (e.g., canonical form generation, next step generation), you can add a model configuration in the `models` key as shown below:
To configure the embedding model used for the various steps in the [guardrails process](../architecture/README.md), such as canonical form generation and next step generation, add a model configuration in the `models` key as shown in the following configuration file:

```yaml
models:
@@ -279,7 +279,7 @@ models:

#### Supported Embedding Providers

The complete list of supported embedding providers is the following:
The following table lists the supported embedding providers:

| Provider Name | `engine_name` | `model` |
|----------------------|------------------------|------------------------------------|
@@ -289,7 +289,7 @@ The complete list of supported embedding providers is the following:
| NVIDIA AI Endpoints | `nvidia_ai_endpoints` | `nv-embed-v1`, etc. |

```{note}
For any of the supported embedding providers you can use any of the supported models.
You can use any of the supported models for any of the supported embedding providers.
The previous table includes an example of a model that can be used.
```

@@ -298,7 +298,7 @@ The previous table includes an example of a model that can be used.
You can also register a custom embedding provider by using the `LLMRails.register_embedding_provider` function.

To register a custom embedding provider,
you need to create a class that inherits from `EmbeddingModel` and register it in your `config.py`.
create a class that inherits from `EmbeddingModel` and register it in your `config.py`.

```python
from typing import List
@@ -354,7 +354,7 @@ models:

### Embedding Search Provider

NeMo Guardrails uses embedding search (a.k.a. vector databases) for implementing the [guardrails process](../architecture/README.md#the-guardrails-process) and for the [knowledge base](#knowledge-base-documents) functionality. The default embedding search uses FastEmbed for computing the embeddings (the `all-MiniLM-L6-v2` model) and [Annoy](https://github.com/spotify/annoy) for performing the search. As shown in the previous section, the embeddings model supports both FastEmbed and OpenAI. SentenceTransformers is also supported.
NeMo Guardrails uses embedding search, also called vector databases, for implementing the [guardrails process](../architecture/README.md#the-guardrails-process) and for the [knowledge base](#knowledge-base-documents) functionality. The default embedding search uses FastEmbed for computing the embeddings (the `all-MiniLM-L6-v2` model) and [Annoy](https://github.com/spotify/annoy) for performing the search. As shown in the previous section, the embeddings model supports both FastEmbed and OpenAI. SentenceTransformers is also supported.

For advanced use cases or integrations with existing knowledge bases, you can [provide a custom embedding search provider](advanced/embedding-search-providers.md).
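
As a rough sketch, the embedding search behavior can also be tuned from `config.yml`; the exact keys and parameter names are described in the linked guide, so treat the names below as assumptions for the default provider:

```python
from nemoguardrails import RailsConfig

# Assumed configuration keys for the default embedding search provider
# (FastEmbed for embeddings, Annoy for the search index); see the embedding
# search providers guide for the authoritative reference.
yaml_content = """
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct

core:
  embedding_search_provider:
    name: default
    parameters:
      embedding_engine: FastEmbed
      embedding_model: all-MiniLM-L6-v2

knowledge_base:
  embedding_search_provider:
    name: default
    parameters:
      embedding_engine: FastEmbed
      embedding_model: all-MiniLM-L6-v2
"""

config = RailsConfig.from_content(yaml_content=yaml_content)
```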

14 changes: 5 additions & 9 deletions docs/user_guides/llm/nvidia_ai_endpoints/README.md
@@ -1,6 +1,6 @@
# Using LLMs hosted on NVIDIA API Catalog

This guide teaches you how to use NeMo Guardrails with LLMs hosted on NVIDIA API Catalog. It uses the [ABC Bot configuration](../../../../examples/bots/abc) and changes the model to `meta/llama3-70b-instruct`.
This guide teaches you how to use NeMo Guardrails with LLMs hosted on NVIDIA API Catalog. It uses the [ABC Bot configuration](../../../../examples/bots/abc) with the `meta/llama-3.1-70b-instruct` model. Similarly, you can use `meta/llama-3.1-405b-instruct`, `meta/llama-3.1-8b-instruct`, or any other [AI Foundation Model](https://build.nvidia.com/explore/discover).

## Prerequisites

@@ -15,7 +15,7 @@ pip install -U --quiet langchain-nvidia-ai-endpoints
2. An NVIDIA NGC account to access AI Foundation Models. To create a free account, go to the [NVIDIA NGC website](https://ngc.nvidia.com/).

3. An API key from NVIDIA API Catalog:
- Generate an API key by navigating to the AI Foundation Models section on the NVIDIA NGC website, selecting a model with an API endpoint, and generating an API key. You can use this API key for all models available in the NVIDIA API Catalog.
- Generate an API key by navigating to the [AI Foundation Models](https://build.nvidia.com/explore/discover) section on the NVIDIA NGC website, selecting a model with an API endpoint, and generating an API key. You can use this API key for all models available in the NVIDIA API Catalog.
- Export the NVIDIA API key as an environment variable:

```bash
@@ -45,7 +45,7 @@ Update the `models` section of the `config.yml` file to the desired model suppor
models:
- type: main
engine: nvidia_ai_endpoints
model: meta/llama3-70b-instruct
model: meta/llama-3.1-70b-instruct
...
```

@@ -60,10 +60,6 @@ config = RailsConfig.from_path("./config")
rails = LLMRails(config)
```

```
Fetching 7 files: 0%| | 0/7 [00:00<?, ?it/s]
```

Test that it works:

```python
@@ -76,11 +72,11 @@ print(response['content'])
```

```
According to the employee handbook, eligible employees are entitled to 20 days of paid vacation per year, accrued monthly.
According to our company policy, you are eligible for 20 days of vacation per year, accrued monthly.
```

You can see that the bot responds correctly.

## Conclusion

In this guide, you learned how to connect a NeMo Guardrails configuration to an NVIDIA API Catalog LLM model. This guide uses `meta/llama3-70b-instruct`, however, you can connect any other model by following the same steps.
In this guide, you learned how to connect a NeMo Guardrails configuration to an NVIDIA API Catalog LLM model. This guide uses `meta/llama-3.1-70b-instruct`; however, you can connect any other model by following the same steps.