{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Rerank Document Compressor\n",
"\n",
"This notebook shows how to use `BedrockRerank`, a document compressor that reorders retrieved documents by relevance using an Amazon Bedrock rerank model."
]
},
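{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, install the dependencies and configure AWS credentials (for example with `aws configure`). The package list below is an assumption based on the imports used in this notebook; adjust it to your environment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Assumed dependencies for this notebook; adjust to your environment\n",
"%pip install -qU langchain-aws langchain langchain-community faiss-cpu boto3"
]
},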
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import boto3\n",
"\n",
"# Look up the ARN of the Amazon Rerank foundation model via the Bedrock control-plane API\n",
"session = boto3.Session()\n",
"client = session.client(\"bedrock\")\n",
"foundation_model = client.get_foundation_model(modelIdentifier=\"amazon.rerank-v1:0\")\n",
"\n",
"model_arn = foundation_model[\"modelDetails\"][\"modelArn\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The code below ranks a list of documents by relevance to a query using Bedrock's rerank capability. It initializes a `BedrockRerank` instance with the model ARN, then passes the documents and query to `compress_documents`, which returns the documents ordered by relevance with a `relevance_score` added to each document's metadata."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Content: AWS Bedrock enables access to AI models.\n",
"Score: 0.07081620395183563\n",
"Content: Artificial intelligence is transforming the world.\n",
"Score: 2.8350802949717036e-06\n",
"Content: LangChain is a powerful library for LLMs.\n",
"Score: 1.5903378880466335e-06\n"
]
}
],
"source": [
"from langchain_core.documents import Document\n",
"from langchain_aws import BedrockRerank\n",
"\n",
"# Initialize the compressor\n",
"reranker = BedrockRerank(model_arn=model_arn)\n",
"\n",
"# List of documents to rerank\n",
"documents = [\n",
"    Document(page_content=\"LangChain is a powerful library for LLMs.\"),\n",
"    Document(page_content=\"AWS Bedrock enables access to AI models.\"),\n",
"    Document(page_content=\"Artificial intelligence is transforming the world.\"),\n",
"]\n",
"\n",
"# Query for reranking\n",
"query = \"What is AWS Bedrock?\"\n",
"\n",
"# Call the rerank method\n",
"results = reranker.compress_documents(documents, query)\n",
"\n",
"# Display the most relevant documents\n",
"for doc in results:\n",
"    print(f\"Content: {doc.page_content}\")\n",
"    print(f\"Score: {doc.metadata['relevance_score']}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, let's wrap a base retriever in a `ContextualCompressionRetriever` that uses `BedrockRerank` as its compressor.\n",
"\n",
"When a query is executed, the retriever first fetches candidate documents from the FAISS vector store and then reranks them with Bedrock, so the most relevant documents are returned first."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Content: AWS Bedrock provides cloud-based AI models.\n",
"Score: 0.07585818320512772\n",
"Content: Machine learning can be used for predictions.\n",
"Score: 2.8573158488143235e-06\n",
"Content: LangChain integrates LLM models.\n",
"Score: 1.640820528336917e-06\n"
]
}
],
"source": [
"from langchain.retrievers.contextual_compression import ContextualCompressionRetriever\n",
"from langchain_aws import BedrockEmbeddings, BedrockRerank\n",
"from langchain_community.vectorstores import FAISS\n",
"from langchain_core.documents import Document\n",
"\n",
"# Create a vector store using FAISS with Bedrock embeddings\n",
"documents = [\n",
"    Document(page_content=\"LangChain integrates LLM models.\"),\n",
"    Document(page_content=\"AWS Bedrock provides cloud-based AI models.\"),\n",
"    Document(page_content=\"Machine learning can be used for predictions.\"),\n",
"]\n",
"embeddings = BedrockEmbeddings()\n",
"vectorstore = FAISS.from_documents(documents, embeddings)\n",
"\n",
"# Create the document compressor using BedrockRerank\n",
"reranker = BedrockRerank(model_arn=model_arn)\n",
"\n",
"# Create the retriever with contextual compression\n",
"retriever = ContextualCompressionRetriever(\n",
"    base_compressor=reranker,\n",
"    base_retriever=vectorstore.as_retriever(),\n",
")\n",
"\n",
"# Execute a query\n",
"query = \"How does AWS Bedrock work?\"\n",
"retrieved_docs = retriever.invoke(query)\n",
"\n",
"# Display the most relevant documents\n",
"for doc in retrieved_docs:\n",
"    print(f\"Content: {doc.page_content}\")\n",
"    print(f\"Score: {doc.metadata.get('relevance_score', 'N/A')}\")"
]
},
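{
"cell_type": "markdown",
"metadata": {},
"source": [
"The compression retriever can be dropped into a retrieval chain like any other retriever. Below is a minimal sketch of such a chain, not part of the original example: it assumes access to a Bedrock chat model (the Claude 3 Haiku model ID here is an example; substitute any chat model you have enabled)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_aws import ChatBedrockConverse\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"\n",
"# Assumed model ID; replace with any Bedrock chat model you have access to\n",
"llm = ChatBedrockConverse(model=\"anthropic.claude-3-haiku-20240307-v1:0\")\n",
"\n",
"prompt = ChatPromptTemplate.from_template(\n",
"    \"Answer the question using only this context:\\n{context}\\n\\nQuestion: {question}\"\n",
")\n",
"\n",
"def format_docs(docs):\n",
"    # Join the reranked documents into a single context string\n",
"    return \"\\n\\n\".join(doc.page_content for doc in docs)\n",
"\n",
"# Retrieve + rerank, then answer from the reranked context\n",
"chain = (\n",
"    {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n",
"    | prompt\n",
"    | llm\n",
"    | StrOutputParser()\n",
")\n",
"\n",
"print(chain.invoke(\"How does AWS Bedrock work?\"))"
]
},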
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Unlike `compress_documents`, which works with structured `Document` objects, the `rerank` method also accepts plain text strings and returns a list of dictionaries, each containing a document's index and its relevance score. This simplifies ranking raw text data."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Index: 1, Score: 0.07159119844436646\n",
"Document: AWS Bedrock provides access to cloud-based models.\n",
"Index: 2, Score: 9.666109690442681e-06\n",
"Document: Machine learning is revolutionizing the world.\n",
"Index: 0, Score: 8.25057043130073e-07\n",
"Document: LangChain is used to integrate LLM models.\n"
]
}
],
"source": [
"from langchain_aws import BedrockRerank\n",
"\n",
"# Initialize BedrockRerank\n",
"reranker = BedrockRerank(model_arn=model_arn)\n",
"\n",
"# Unstructured documents\n",
"documents = [\n",
"    \"LangChain is used to integrate LLM models.\",\n",
"    \"AWS Bedrock provides access to cloud-based models.\",\n",
"    \"Machine learning is revolutionizing the world.\",\n",
"]\n",
"\n",
"# Query\n",
"query = \"What is the role of AWS Bedrock?\"\n",
"\n",
"# Rerank the documents\n",
"results = reranker.rerank(query=query, documents=documents)\n",
"\n",
"# Display the results\n",
"for res in results:\n",
"    print(f\"Index: {res['index']}, Score: {res['relevance_score']}\")\n",
"    print(f\"Document: {documents[res['index']]}\")"
]
},
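{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, you can limit how many documents come back by setting `top_n` on the `BedrockRerank` instance. The snippet below is a small sketch of that option (the `top_n` parameter is an assumption mirroring other rerank compressors), reusing the documents and query from the previous cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Keep only the single most relevant document (top_n assumed to cap results)\n",
"top_reranker = BedrockRerank(model_arn=model_arn, top_n=1)\n",
"\n",
"results = top_reranker.rerank(query=query, documents=documents)\n",
"for res in results:\n",
"    print(f\"Index: {res['index']}, Score: {res['relevance_score']}\")\n",
"    print(f\"Document: {documents[res['index']]}\")"
]
}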
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.15"
}
},
"nbformat": 4,
"nbformat_minor": 2
}