add: rerank samples
jpfcabral committed Feb 1, 2025
1 parent 9d65d5e commit 5c8aca3
Showing 1 changed file with 220 additions and 0 deletions.
220 changes: 220 additions & 0 deletions samples/document_compressors/rerank.ipynb
@@ -0,0 +1,220 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Rerank Document Compressor\n",
"\n",
"This notebook shows how to use the `BedrockRerank` document compressor from `langchain_aws` to rerank documents with Amazon Bedrock.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
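"import boto3\n",
"\n",
"# Use the Bedrock control-plane client to look up the rerank model's ARN,\n",
"# which is passed to BedrockRerank in the cells below\n",
"session = boto3.Session()\n",
"client = session.client(\"bedrock\")\n",
"foundation_model = client.get_foundation_model(modelIdentifier=\"amazon.rerank-v1:0\")\n",
"\n",
"model_arn = foundation_model[\"modelDetails\"][\"modelArn\"]"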
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The code below scores a list of documents against a query using Amazon Bedrock's reranking capabilities. It initializes a `BedrockRerank` instance with the model ARN, then passes the documents and the query to `compress_documents`. The method returns the documents ordered from most to least relevant, with each score stored in the document's metadata under `relevance_score`."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Content: AWS Bedrock enables access to AI models.\n",
"Score: 0.07081620395183563\n",
"Content: Artificial intelligence is transforming the world.\n",
"Score: 2.8350802949717036e-06\n",
"Content: LangChain is a powerful library for LLMs.\n",
"Score: 1.5903378880466335e-06\n"
]
}
],
"source": [
"from langchain_core.documents import Document\n",
"from langchain_aws import BedrockRerank\n",
"\n",
"# Initialize the class\n",
"reranker = BedrockRerank(model_arn=model_arn)\n",
"\n",
"# List of documents to rerank\n",
"documents = [\n",
" Document(page_content=\"LangChain is a powerful library for LLMs.\"),\n",
" Document(page_content=\"AWS Bedrock enables access to AI models.\"),\n",
" Document(page_content=\"Artificial intelligence is transforming the world.\"),\n",
"]\n",
"\n",
"# Query for reranking\n",
"query = \"What is AWS Bedrock?\"\n",
"\n",
"# Score and reorder the documents with compress_documents\n",
"results = reranker.compress_documents(documents, query)\n",
"\n",
"# Display the most relevant documents\n",
"for doc in results:\n",
" print(f\"Content: {doc.page_content}\")\n",
" print(f\"Score: {doc.metadata['relevance_score']}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we wrap a base retriever in a `ContextualCompressionRetriever`, using `BedrockRerank` as the document compressor so that Amazon Bedrock's reranking model refines the retrieved results.\n",
"\n",
"When a query is executed, the retriever first fetches candidate documents from a FAISS vector store and then reranks them against the query, so the most relevant documents come first."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Content: AWS Bedrock provides cloud-based AI models.\n",
"Score: 0.07585818320512772\n",
"Content: Machine learning can be used for predictions.\n",
"Score: 2.8573158488143235e-06\n",
"Content: LangChain integrates LLM models.\n",
"Score: 1.640820528336917e-06\n"
]
}
],
"source": [
"from langchain_aws import BedrockEmbeddings\n",
"from langchain.retrievers.contextual_compression import ContextualCompressionRetriever\n",
"from langchain_community.vectorstores import FAISS\n",
"from langchain_core.documents import Document\n",
"from langchain_aws import BedrockRerank\n",
"\n",
"# Create a vector store using FAISS with Bedrock embeddings\n",
"documents = [\n",
" Document(page_content=\"LangChain integrates LLM models.\"),\n",
" Document(page_content=\"AWS Bedrock provides cloud-based AI models.\"),\n",
" Document(page_content=\"Machine learning can be used for predictions.\"),\n",
"]\n",
"embeddings = BedrockEmbeddings()\n",
"vectorstore = FAISS.from_documents(documents, embeddings)\n",
"\n",
"# Create the document compressor using BedrockRerank\n",
"reranker = BedrockRerank(model_arn=model_arn)\n",
"\n",
"# Create the retriever with contextual compression\n",
"retriever = ContextualCompressionRetriever(\n",
" base_compressor=reranker,\n",
" base_retriever=vectorstore.as_retriever(),\n",
")\n",
"\n",
"# Execute a query\n",
"query = \"How does AWS Bedrock work?\"\n",
"retrieved_docs = retriever.invoke(query)\n",
"\n",
"# Display the most relevant documents\n",
"for doc in retrieved_docs:\n",
" print(f\"Content: {doc.page_content}\")\n",
" print(f\"Score: {doc.metadata.get('relevance_score', 'N/A')}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Unlike `compress_documents`, which works with structured `Document` objects, the `rerank` method accepts plain text strings. Each result carries the `index` of the string in the input list and its `relevance_score`, so raw text can be ranked without first wrapping it in `Document` objects."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Index: 1, Score: 0.07159119844436646\n",
"Document: AWS Bedrock provides access to cloud-based models.\n",
"Index: 2, Score: 9.666109690442681e-06\n",
"Document: Machine learning is revolutionizing the world.\n",
"Index: 0, Score: 8.25057043130073e-07\n",
"Document: LangChain is used to integrate LLM models.\n"
]
}
],
"source": [
"from langchain_aws import BedrockRerank\n",
"\n",
"# Initialize BedrockRerank\n",
"reranker = BedrockRerank(model_arn=model_arn)\n",
"\n",
"# Unstructured documents\n",
"documents = [\n",
" \"LangChain is used to integrate LLM models.\",\n",
" \"AWS Bedrock provides access to cloud-based models.\",\n",
" \"Machine learning is revolutionizing the world.\",\n",
"]\n",
"\n",
"# Query\n",
"query = \"What is the role of AWS Bedrock?\"\n",
"\n",
"# Rerank the documents\n",
"results = reranker.rerank(query=query, documents=documents)\n",
"\n",
"# Display the results\n",
"for res in results:\n",
" print(f\"Index: {res['index']}, Score: {res['relevance_score']}\")\n",
" print(f\"Document: {documents[res['index']]}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.15"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
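
The last code cell of the notebook prints `rerank` results as dictionaries with `index` and `relevance_score` keys. As a minimal sketch of post-processing that output without calling Bedrock, the hypothetical helper `select_relevant` below (it is not part of `langchain_aws`) maps each entry back to its source text and keeps only entries at or above a score cutoff. The sample results are copied from the notebook output; the default `min_score` of 0.01 is an assumption chosen purely for illustration.

```python
from typing import Dict, List, Sequence, Tuple, Union

# Shape of one rerank result as printed in the notebook: an index and a score.
RerankResult = Dict[str, Union[int, float]]


def select_relevant(
    documents: Sequence[str],
    results: Sequence[RerankResult],
    min_score: float = 0.01,  # assumed cutoff, tune for your own data
) -> List[Tuple[str, float]]:
    """Hypothetical helper: pair each result with its text and drop low scores."""
    selected: List[Tuple[str, float]] = []
    # Sort defensively so the output is highest-score-first whatever the input order.
    ordered = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    for res in ordered:
        score = float(res["relevance_score"])
        if score >= min_score:
            selected.append((documents[int(res["index"])], score))
    return selected


documents = [
    "LangChain is used to integrate LLM models.",
    "AWS Bedrock provides access to cloud-based models.",
    "Machine learning is revolutionizing the world.",
]

# Results copied from the notebook output (no Bedrock call is made here).
results = [
    {"index": 1, "relevance_score": 0.07159119844436646},
    {"index": 2, "relevance_score": 9.666109690442681e-06},
    {"index": 0, "relevance_score": 8.25057043130073e-07},
]

kept = select_relevant(documents, results)
for text, score in kept:
    print(f"{score:.4f}  {text}")
```

With the sample scores, only the Bedrock sentence clears the assumed cutoff; the other two entries score several orders of magnitude lower, which is why a threshold can be more useful than a fixed top-N when most candidates are irrelevant.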
