add: rerank samples

langchain-ai · Feb 1, 2025 · 5c8aca3 · 5c8aca3
1 parent 9d65d5e
commit 5c8aca3
Showing 1 changed file with 220 additions and 0 deletions.
diff --git a/samples/document_compressors/rerank.ipynb b/samples/document_compressors/rerank.ipynb
@@ -0,0 +1,220 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Rerank Document Compressor\n",
+    "\n",
+    "In this notebook we will go through how you can use a rerank document compressor with Bedrock.\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import boto3\n",
+    "\n",
+    "session = boto3.Session()\n",
+    "client = session.client('bedrock')\n",
+    "foundation_model = client.get_foundation_model(modelIdentifier=\"amazon.rerank-v1:0\")\n",
+    "\n",
+    "model_arn = foundation_model[\"modelDetails\"][\"modelArn\"]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The code below processes a list of documents to determine their relevance to a given query using AWS Bedrock's reranking capabilities. It initializes a BedrockRerank instance, providing a list of documents and a query. The `compress_documents` method then evaluates and ranks the documents based on relevance, ensuring that the most relevant ones are prioritized."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Content: AWS Bedrock enables access to AI models.\n",
+      "Score: 0.07081620395183563\n",
+      "Content: Artificial intelligence is transforming the world.\n",
+      "Score: 2.8350802949717036e-06\n",
+      "Content: LangChain is a powerful library for LLMs.\n",
+      "Score: 1.5903378880466335e-06\n"
+     ]
+    }
+   ],
+   "source": [
+    "from langchain_core.documents import Document\n",
+    "from langchain_aws import BedrockRerank\n",
+    "\n",
+    "# Initialize the class\n",
+    "reranker = BedrockRerank(model_arn=model_arn)\n",
+    "\n",
+    "# List of documents to rerank\n",
+    "documents = [\n",
+    "    Document(page_content=\"LangChain is a powerful library for LLMs.\"),\n",
+    "    Document(page_content=\"AWS Bedrock enables access to AI models.\"),\n",
+    "    Document(page_content=\"Artificial intelligence is transforming the world.\"),\n",
+    "]\n",
+    "\n",
+    "# Query for reranking\n",
+    "query = \"What is AWS Bedrock?\"\n",
+    "\n",
+    "# Call the rerank method\n",
+    "results = reranker.compress_documents(documents, query)\n",
+    "\n",
+    "# Display the most relevant documents\n",
+    "for doc in results:\n",
+    "    print(f\"Content: {doc.page_content}\")\n",
+    "    print(f\"Score: {doc.metadata['relevance_score']}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now let's enhance our base retriever by wrapping it with a `ContextualCompressionRetriever`. Here, we integrate `BedrockRerank`, which leverages AWS Bedrock's reranking capabilities to refine the retrieved results.\n",
+    "\n",
+    "When a query is executed, the retriever first retrieves relevant documents using FAISS and then reranks them based on relevance, providing more accurate and meaningful responses."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Content: AWS Bedrock provides cloud-based AI models.\n",
+      "Score: 0.07585818320512772\n",
+      "Content: Machine learning can be used for predictions.\n",
+      "Score: 2.8573158488143235e-06\n",
+      "Content: LangChain integrates LLM models.\n",
+      "Score: 1.640820528336917e-06\n"
+     ]
+    }
+   ],
+   "source": [
+    "from langchain_aws import BedrockEmbeddings\n",
+    "from langchain.retrievers.contextual_compression import ContextualCompressionRetriever\n",
+    "from langchain.vectorstores import FAISS\n",
+    "from langchain_core.documents import Document\n",
+    "from langchain_aws import BedrockRerank\n",
+    "\n",
+    "# Create a vector store using FAISS with Bedrock embeddings\n",
+    "documents = [\n",
+    "    Document(page_content=\"LangChain integrates LLM models.\"),\n",
+    "    Document(page_content=\"AWS Bedrock provides cloud-based AI models.\"),\n",
+    "    Document(page_content=\"Machine learning can be used for predictions.\"),\n",
+    "]\n",
+    "embeddings = BedrockEmbeddings()\n",
+    "vectorstore = FAISS.from_documents(documents, embeddings)\n",
+    "\n",
+    "# Create the document compressor using BedrockRerank\n",
+    "reranker = BedrockRerank(model_arn=model_arn)\n",
+    "\n",
+    "# Create the retriever with contextual compression\n",
+    "retriever = ContextualCompressionRetriever(\n",
+    "    base_compressor=reranker,\n",
+    "    base_retriever=vectorstore.as_retriever(),\n",
+    ")\n",
+    "\n",
+    "# Execute a query\n",
+    "query = \"How does AWS Bedrock work?\"\n",
+    "retrieved_docs = retriever.invoke(query)\n",
+    "\n",
+    "# Display the most relevant documents\n",
+    "for doc in retrieved_docs:\n",
+    "    print(f\"Content: {doc.page_content}\")\n",
+    "    print(f\"Score: {doc.metadata.get('relevance_score', 'N/A')}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Unlike `compress_documents`, which works with structured Document objects, the rerank method allows passing plain text strings. This simplifies the process of evaluating and ranking text data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Index: 1, Score: 0.07159119844436646\n",
+      "Document: AWS Bedrock provides access to cloud-based models.\n",
+      "Index: 2, Score: 9.666109690442681e-06\n",
+      "Document: Machine learning is revolutionizing the world.\n",
+      "Index: 0, Score: 8.25057043130073e-07\n",
+      "Document: LangChain is used to integrate LLM models.\n"
+     ]
+    }
+   ],
+   "source": [
+    "from langchain_aws import BedrockRerank\n",
+    "\n",
+    "# Initialize BedrockRerank\n",
+    "reranker = BedrockRerank(model_arn=model_arn)\n",
+    "\n",
+    "# Unstructured documents\n",
+    "documents = [\n",
+    "    \"LangChain is used to integrate LLM models.\",\n",
+    "    \"AWS Bedrock provides access to cloud-based models.\",\n",
+    "    \"Machine learning is revolutionizing the world.\",\n",
+    "]\n",
+    "\n",
+    "# Query\n",
+    "query = \"What is the role of AWS Bedrock?\"\n",
+    "\n",
+    "# Rerank the documents\n",
+    "results = reranker.rerank(query=query, documents=documents)\n",
+    "\n",
+    "# Display the results\n",
+    "for res in results:\n",
+    "    print(f\"Index: {res['index']}, Score: {res['relevance_score']}\")\n",
+    "    print(f\"Document: {documents[res['index']]}\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": ".venv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.15"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}