Add RAG demo to the app #4

Merged · 6 commits merged into containers:main on Jan 25, 2024
Conversation

Shreyanand (Contributor)

An example interaction

```python
[{'role': 'system',
  'content': 'You are a helpful assistant that is comfortable speaking\n                 with C level executives in a professional setting.'},
 {'role': 'user',
  'content': 'What is the difference between ROSA and Kubernetes?'},
 {'role': 'assistant',
  'content': "  Great, I'd be happy to help!\nROSA (Red Hat OpenShift Service on AWS) and Kubernetes are both container orchestration platforms, but they differ in their scope, functionality, and deployment models. Here are some key differences:\n1. Scope: ROSA is a complete platform that includes everything needed to deploy and manage containers, while Kubernetes is an open-source platform for managing containerized workloads and services across multiple hosts.\n2. Functionality: ROSA provides additional features such as automated cluster operations, networking, load balancing, service mesh, CI/CD, firewall, monitoring, registry, authentication, and authorization capabilities, while Kubernetes offers management tools for deploying, automating, monitoring, and scaling containerized apps with minimal to no manual intervention.\n3. Deployment model: ROSA is a managed platform provided by AWS, while Kubernetes can be deployed on-premises or in the cloud, including AWS, Google Cloud Platform, Microsoft Azure, and other cloud providers.\n4. Control plane: A Kubernetes cluster consists of a control plane and one or more compute nodes, while ROSA uses Red Hat OpenShift, which is an enterprise Kubernetes platform.\n5. Compute nodes: In Kubernetes, compute nodes are organized into compute pools based on CPU, memory, operating system, attached disks, and other properties, while in ROSA, these are managed by a centralized control plane that centrally controls and monitors all Kubernetes resources in the cluster.\n6. Automation: ROSA provides automated cluster operations, including over-the-air platform upgrades, while Kubernetes also offers automation tools for deploying and managing containerized workloads.\n7. Security: Both platforms provide security features such as authentication and authorization capabilities, but ROSA adds additional features like firewall and registry.\n8. Integration: ROSA is designed to work seamlessly with AWS services, while Kubernetes can be integrated with other cloud providers or on-premises environments.\nIn summary, ROSA is a complete platform that provides a managed environment for deploying and managing containers on AWS, while Kubernetes is an open-source platform for managing containerized workloads and services across multiple hosts, offering more flexibility in terms of deployment models and integration with other cloud providers or on-premises environments."},
 {'role': 'user', 'content': 'How much do I have to pay for ROSA?'},
 {'role': 'assistant',
  'content': '  To get started with ROSA, you will need to pay for the following two components:\n1. ROSA service fees: These are the fees charged by AWS for using the ROSA platform. The pricing varies based on the level of service you choose and the number of nodes in your cluster. You can find more details on the pricing page.\n2. AWS infrastructure fees: These are the fees charged by AWS for the underlying infrastructure resources that are used to run your ROSA cluster, such as EC2 instances, storage, and networking. The pricing for these resources is based on the actual usage and can be found on the AWS pricing page.\nThe total cost of ROSA will consist of both the ROSA service fees and the AWS infrastructure fees. You can find more details on the pricing and billing for ROSA in the AWS documentation.'},
 {'role': 'user', 'content': 'Thank you!'},
 {'role': 'assistant',
  'content': "  You're welcome! It was my pleasure to help you understand the pricing for ROSA. If you have any other questions or need further assistance, feel free to ask!"}]

To do

  • Test if the application with the new changes works in a container image
  • Decide if we want to build the embedding model in the image as well

src/chat.py Outdated
```diff
-                 with C level executives in a professional setting."""},
-        ]
+                 with C level executives in a professional setting."""},]
+        self.embeddings = SentenceTransformerEmbeddings(model_name="BAAI/bge-base-en-v1.5")
```
Collaborator
Is this going to download the embedding model unless it already exists in the cache?

Collaborator
Can we have our embedding model be part of the image in such a way that it's not coming from the internet every time we instantiate a Chat object? Like reading it from a local file that we downloaded ahead of time instead?

Contributor Author
Yes, certainly. It's going to add ~500 MiB to the image.
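
For reference, a minimal sketch of what that could look like, assuming the model files have already been copied into the image under a local `models/` directory (the path is illustrative, not part of this PR):

```python
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings

# Point the embeddings at a cache directory populated at image-build time,
# so instantiating a Chat object does not pull the model from the internet.
embeddings = SentenceTransformerEmbeddings(
    model_name="BAAI/bge-base-en-v1.5",
    cache_folder="models/",  # assumed location of the pre-downloaded model files
)
```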

Collaborator
This file has been pre-vectorized, right? What would the code look like if I had a new set of docs I wanted to use instead?

@lstocchi (Collaborator)
Playing with it 👀

@lstocchi (Collaborator) left a comment
Tested on my Mac with applehv and it worked great. I can see the difference between using and not using the RAG.

Comment on lines +43 to +54
```python
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.vectorstores import Chroma

raw_documents = TextLoader("data/fake_meeting.txt").load()
text_splitter = CharacterTextSplitter(separator = ".", chunk_size=150, chunk_overlap=0)
docs = text_splitter.split_documents(raw_documents)
e = SentenceTransformerEmbeddings(model_name="BAAI/bge-base-en-v1.5",cache_folder="models/")
db = Chroma.from_documents(docs,e,persist_directory="./data/chromaDB")
```
@lstocchi (Collaborator) · Jan 25, 2024
I copy/pasted this script to run the sample, but can we save it somewhere? And maybe have it accept the document URL to load.

Collaborator
Yeah, sure. I can work on a Python script that just does this for a user, given the documents' location. Let's do this in a follow-up PR though, if that is OK.
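
A rough sketch of what that follow-up script might look like, reusing the snippet above; the `argparse` flag names and default paths are assumptions for illustration, not part of this PR:

```python
import argparse

from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.vectorstores import Chroma

parser = argparse.ArgumentParser(description="Vectorize a document into a Chroma DB")
parser.add_argument("--docs", default="data/fake_meeting.txt",
                    help="path to the document to vectorize (assumed flag name)")
parser.add_argument("--persist-dir", default="./data/chromaDB",
                    help="where to store the Chroma database (assumed flag name)")
args = parser.parse_args()

# Same pipeline as the snippet in this PR, just parameterized on the document location.
raw_documents = TextLoader(args.docs).load()
text_splitter = CharacterTextSplitter(separator=".", chunk_size=150, chunk_overlap=0)
docs = text_splitter.split_documents(raw_documents)
e = SentenceTransformerEmbeddings(model_name="BAAI/bge-base-en-v1.5", cache_folder="models/")
db = Chroma.from_documents(docs, e, persist_directory=args.persist_dir)
```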

Collaborator
Sure, ok for me 👍

```python
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.vectorstores import Chroma

raw_documents = TextLoader("data/fake_meeting.txt").load()
```
Collaborator
It accepts any document, right? Any limitations (file size, format, ...)?

Collaborator
I think this will just read in a text file (of arbitrary size), but I have not fully tested it out. However, the package it comes from, langchain, has many different document loaders if we want to use a different sort of data.

https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/document_loaders
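
For example, swapping in a different loader from that package could look like the sketch below, assuming a PDF source (PyPDFLoader needs the `pypdf` package; the file path is illustrative):

```python
from langchain_community.document_loaders import PyPDFLoader

# Same downstream splitting/embedding pipeline as before; only the loader changes.
raw_documents = PyPDFLoader("data/meeting_notes.pdf").load()
```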

Collaborator
Ok, cool 👍 Thanks

### Download the embedding model

To encode our additional data and populate our vector database, we need an embedding model (a second language model) for this workflow. Here we will use `BAAI/bge-large-en-v1.5`; all the necessary model files can be found and downloaded from https://huggingface.co/BAAI/bge-large-en-v1.5.
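
One way to fetch those files ahead of time is sketched below; the use of `huggingface_hub` and the target directory are assumptions for illustration, since the README itself only points at the model page:

```python
from huggingface_hub import snapshot_download

# Download the embedding model once, ahead of time, into a local models/ directory
# so it can be baked into the container image or mounted at run time.
snapshot_download(
    repo_id="BAAI/bge-large-en-v1.5",
    local_dir="models/BAAI/bge-large-en-v1.5",  # assumed target path
)
```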
Collaborator
So will bge always be the one to use, or could there be different models? Does it depend on the primary model type we use? E.g., does this only work with llama2 and mistral?

Collaborator
No, there could be different models. But I wanted to keep this simple for now and not provide the additional option of choosing an embedding model as well. But maybe it's better to make it another user option now?

It does not depend on the primary model. It only matters that whatever embedding model was used to create your vector database is the same one used at run time with the vector database.
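
As an illustrative, minimal sketch (assuming the database was persisted by the script above), loading the database at run time with the same embedding model could look like this:

```python
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.vectorstores import Chroma

# Must be the same embedding model that was used when the database was created.
e = SentenceTransformerEmbeddings(model_name="BAAI/bge-base-en-v1.5", cache_folder="models/")
db = Chroma(persist_directory="./data/chromaDB", embedding_function=e)
results = db.similarity_search("What is the difference between ROSA and Kubernetes?")
```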

Collaborator
From a desktop perspective, I thought that when users want to build/play with a RAG sample, we make them pick a model, a RAG recipe, and then proceed with the usual tasks (download, build, run, ...).

Collaborator
Yeah, there would just be an additional step, so it would be like: pick a model, select a RAG recipe (RAG requires an embedding model), select an embedding model, then proceed as usual.

Comment on lines +6 to +14
```yaml
- name: rag-demo-service
  contextdir: model_services
  containerfile: base/Containerfile
  model-service: true
  backend:
    - llama
  arch:
    - arm64
    - amd64
```
@lstocchi (Collaborator) · Jan 25, 2024
I think we should add a property so we know that it supports RAG, and maybe the type, if there could be alternatives (e.g., this works with BAAI/bge-base-en-v1.5).

Collaborator
Sure, what would that look like?

```yaml
backend:
  - llama
rag:
  - True
```

What would that be used for? Right now this example uses an in-memory vector database in the same container. However, I would like to extend this example (in a follow-up PR) so that the vector database is a standalone container. How would that impact the ai-studio.yaml file?

Collaborator
In that case we should add a completely new container to the list. We can do it later when you push the follow-up PR.

@MichaelClifford changed the title from "Add RAG for ROSA docs to the app" to "Add RAG demo to the app" on Jan 25, 2024
@MichaelClifford merged commit ae63f36 into containers:main on Jan 25, 2024
1 check passed