Add RAG demo to the app #4

Merged · 6 commits merged into containers:main on Jan 25, 2024
Conversation

Shreyanand (Contributor)

An example interaction

```python
[{'role': 'system',
  'content': 'You are a helpful assistant that is comfortable speaking\n                 with C level executives in a professional setting.'},
 {'role': 'user',
  'content': 'What is the difference between ROSA and Kubernetes?'},
 {'role': 'assistant',
  'content': "  Great, I'd be happy to help!\nROSA (Red Hat OpenShift Service on AWS) and Kubernetes are both container orchestration platforms, but they differ in their scope, functionality, and deployment models. Here are some key differences:\n1. Scope: ROSA is a complete platform that includes everything needed to deploy and manage containers, while Kubernetes is an open-source platform for managing containerized workloads and services across multiple hosts.\n2. Functionality: ROSA provides additional features such as automated cluster operations, networking, load balancing, service mesh, CI/CD, firewall, monitoring, registry, authentication, and authorization capabilities, while Kubernetes offers management tools for deploying, automating, monitoring, and scaling containerized apps with minimal to no manual intervention.\n3. Deployment model: ROSA is a managed platform provided by AWS, while Kubernetes can be deployed on-premises or in the cloud, including AWS, Google Cloud Platform, Microsoft Azure, and other cloud providers.\n4. Control plane: A Kubernetes cluster consists of a control plane and one or more compute nodes, while ROSA uses Red Hat OpenShift, which is an enterprise Kubernetes platform.\n5. Compute nodes: In Kubernetes, compute nodes are organized into compute pools based on CPU, memory, operating system, attached disks, and other properties, while in ROSA, these are managed by a centralized control plane that centrally controls and monitors all Kubernetes resources in the cluster.\n6. Automation: ROSA provides automated cluster operations, including over-the-air platform upgrades, while Kubernetes also offers automation tools for deploying and managing containerized workloads.\n7. Security: Both platforms provide security features such as authentication and authorization capabilities, but ROSA adds additional features like firewall and registry.\n8. Integration: ROSA is designed to work seamlessly with AWS services, while Kubernetes can be integrated with other cloud providers or on-premises environments.\nIn summary, ROSA is a complete platform that provides a managed environment for deploying and managing containers on AWS, while Kubernetes is an open-source platform for managing containerized workloads and services across multiple hosts, offering more flexibility in terms of deployment models and integration with other cloud providers or on-premises environments."},
 {'role': 'user', 'content': 'How much do I have to pay for ROSA?'},
 {'role': 'assistant',
  'content': '  To get started with ROSA, you will need to pay for the following two components:\n1. ROSA service fees: These are the fees charged by AWS for using the ROSA platform. The pricing varies based on the level of service you choose and the number of nodes in your cluster. You can find more details on the pricing page.\n2. AWS infrastructure fees: These are the fees charged by AWS for the underlying infrastructure resources that are used to run your ROSA cluster, such as EC2 instances, storage, and networking. The pricing for these resources is based on the actual usage and can be found on the AWS pricing page.\nThe total cost of ROSA will consist of both the ROSA service fees and the AWS infrastructure fees. You can find more details on the pricing and billing for ROSA in the AWS documentation.'},
 {'role': 'user', 'content': 'Thank you!'},
 {'role': 'assistant',
  'content': "  You're welcome! It was my pleasure to help you understand the pricing for ROSA. If you have any other questions or need further assistance, feel free to ask!"}]

To do

  • Test if the application with the new changes works in a container image
  • Decide if we want to build the embedding model in the image as well

src/chat.py Outdated
```diff
-                 with C level executives in a professional setting."""},
-        ]
+                 with C level executives in a professional setting."""},]
+        self.embeddings = SentenceTransformerEmbeddings(model_name="BAAI/bge-base-en-v1.5")
```
Collaborator
Is this going to download the embedding model unless it already exists in the cache?

Collaborator
Can we have our embedding model be part of the image in such a way that it's not coming from the internet every time we instantiate a Chat object? Like reading it from a local file that we downloaded ahead of time instead?

Contributor Author
Yes, certainly. It's going to add ~500 MiB to the image.
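
For reference, a minimal sketch of what that could look like, assuming the model files have already been copied into the image under a local `models/` directory (the path is illustrative, not part of this PR):

```python
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings

# Point the embeddings at a cache directory populated at image-build time,
# so instantiating a Chat object does not pull the model from the internet.
embeddings = SentenceTransformerEmbeddings(
    model_name="BAAI/bge-base-en-v1.5",
    cache_folder="models/",  # assumed location of the pre-downloaded model files
)
```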

Collaborator
This file has been pre-vectorized, right? What would the code look like if I had a new set of docs I wanted to use instead?

@lstocchi (Collaborator)
Playing with it 👀

@lstocchi (Collaborator) left a comment
Tested on my Mac with applehv and it worked great. I can see the difference between using and not using the RAG.

Comment on lines +43 to +54
```python
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.vectorstores import Chroma

raw_documents = TextLoader("data/fake_meeting.txt").load()
text_splitter = CharacterTextSplitter(separator = ".", chunk_size=150, chunk_overlap=0)
docs = text_splitter.split_documents(raw_documents)
e = SentenceTransformerEmbeddings(model_name="BAAI/bge-base-en-v1.5",cache_folder="models/")
db = Chroma.from_documents(docs,e,persist_directory="./data/chromaDB")
```
@lstocchi (Collaborator) · Jan 25, 2024
I copy/pasted this script to run the sample, but can we save it somewhere? And maybe have it accept the document URL to load.

Collaborator
Yeah, sure. I can work on a Python script that just does this for a user, given the documents' location. Let's do this in a follow-up PR though, if that is OK.
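
A rough sketch of what that follow-up script might look like, reusing the snippet above; the `argparse` flag names and default paths are assumptions for illustration, not part of this PR:

```python
import argparse

from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.vectorstores import Chroma

parser = argparse.ArgumentParser(description="Vectorize a document into a Chroma DB")
parser.add_argument("--docs", default="data/fake_meeting.txt",
                    help="path to the document to vectorize (assumed flag name)")
parser.add_argument("--persist-dir", default="./data/chromaDB",
                    help="where to store the Chroma database (assumed flag name)")
args = parser.parse_args()

# Same pipeline as the snippet in this PR, just parameterized on the document location.
raw_documents = TextLoader(args.docs).load()
text_splitter = CharacterTextSplitter(separator=".", chunk_size=150, chunk_overlap=0)
docs = text_splitter.split_documents(raw_documents)
e = SentenceTransformerEmbeddings(model_name="BAAI/bge-base-en-v1.5", cache_folder="models/")
db = Chroma.from_documents(docs, e, persist_directory=args.persist_dir)
```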

Collaborator
Sure, ok for me 👍

```python
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.vectorstores import Chroma

raw_documents = TextLoader("data/fake_meeting.txt").load()
```
Collaborator
It accepts any document, right? Any limitations (file size, format, ...)?

Collaborator
I think this will just read in a text file (of arbitrary size), but I have not fully tested it out. However, the package it comes from, langchain, has many different document loaders if we want to use a different sort of data.

https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/document_loaders
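
For example, swapping in a different loader from that package could look like the sketch below, assuming a PDF source (PyPDFLoader needs the `pypdf` package; the file path is illustrative):

```python
from langchain_community.document_loaders import PyPDFLoader

# Same downstream splitting/embedding pipeline as before; only the loader changes.
raw_documents = PyPDFLoader("data/meeting_notes.pdf").load()
```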

Collaborator
Ok, cool 👍 Thanks

### Download the embedding model

To encode our additional data and populate our vector database, we need an embedding model (a second language model) for this workflow. Here we will use `BAAI/bge-large-en-v1.5`; all the necessary model files can be found and downloaded from https://huggingface.co/BAAI/bge-large-en-v1.5.
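
One way to fetch those files ahead of time is sketched below; the use of `huggingface_hub` and the target directory are assumptions for illustration, since the README itself only points at the model page:

```python
from huggingface_hub import snapshot_download

# Download the embedding model once, ahead of time, into a local models/ directory
# so it can be baked into the container image or mounted at run time.
snapshot_download(
    repo_id="BAAI/bge-large-en-v1.5",
    local_dir="models/BAAI/bge-large-en-v1.5",  # assumed target path
)
```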
Collaborator
So will bge always be the one to use, or could there be different models? Does it depend on the primary model type we use? E.g., does this only work with llama2 and mistral?

Collaborator
No, there could be different models. But I wanted to keep this simple for now and not provide the additional option of choosing an embedding model as well. But maybe it's better to make it another user option now?

It does not depend on the primary model. It only matters that whatever embedding model was used to create your vector database is the same one used at run time with the vector database.
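
As an illustrative, minimal sketch (assuming the database was persisted by the script above), loading the database at run time with the same embedding model could look like this:

```python
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.vectorstores import Chroma

# Must be the same embedding model that was used when the database was created.
e = SentenceTransformerEmbeddings(model_name="BAAI/bge-base-en-v1.5", cache_folder="models/")
db = Chroma(persist_directory="./data/chromaDB", embedding_function=e)
results = db.similarity_search("What is the difference between ROSA and Kubernetes?")
```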

Collaborator
From a desktop perspective, I thought that when users want to build/play with a RAG sample, we make them pick a model, a RAG recipe, and then proceed with the usual tasks (download, build, run, ...).

Collaborator
Yeah, there would just be an additional step, so it would be like: pick a model, select a RAG recipe (RAG requires an embedding model), select an embedding model, then proceed as usual.

Comment on lines +6 to +14
```yaml
- name: rag-demo-service
  contextdir: model_services
  containerfile: base/Containerfile
  model-service: true
  backend:
    - llama
  arch:
    - arm64
    - amd64
```
@lstocchi (Collaborator) · Jan 25, 2024
I think we should add a property so we know that it supports RAG, and maybe the type, if there could be alternatives (e.g., this works with BAAI/bge-base-en-v1.5).

Collaborator
Sure, what would that look like?

```yaml
backend:
  - llama
rag:
  - True
```

What would that be used for? Right now this example uses an in-memory vector database in the same container. However, I would like to extend this example (in a follow-up PR) so that the vector database is a standalone container. How would that impact the ai-studio.yaml file?

Collaborator
In that case we should add a completely new container to the list. We can do it later when you push the follow-up PR.

@MichaelClifford changed the title from "Add RAG for ROSA docs to the app" to "Add RAG demo to the app" on Jan 25, 2024
@MichaelClifford merged commit ae63f36 into containers:main on Jan 25, 2024
1 check passed