⚡FlashRAG: A Python Toolkit for Efficient RAG Research

[ English | 中文 ]

Installation | Features | Quick-Start | Components | Supporting Methods | Supporting Datasets | FAQs

FlashRAG is a Python toolkit for the reproduction and development of Retrieval Augmented Generation (RAG) research. Our toolkit includes 36 pre-processed benchmark RAG datasets and 15 state-of-the-art RAG algorithms.

With FlashRAG and provided resources, you can effortlessly reproduce existing SOTA works in the RAG domain or implement your custom RAG processes and components.

🔗 Navigation

Features
Roadmap
Changelog
Installation
Quick Start
Components
Supporting Methods
Supporting Datasets & Document Corpus
Additional FAQs
License
Citation

✨ Features

Extensive and Customizable Framework: Includes essential components for RAG scenarios such as retrievers, rerankers, generators, and compressors, allowing for flexible assembly of complex pipelines.
Comprehensive Benchmark Datasets: A collection of 36 pre-processed RAG benchmark datasets to test and validate RAG models' performances.
Pre-implemented Advanced RAG Algorithms: Features 15 advancing RAG algorithms with reported results, based on our framework. Easily reproducing results under different settings.
Efficient Preprocessing Stage: Simplifies the RAG workflow preparation by providing various scripts like corpus processing for retrieval, retrieval index building, and pre-retrieval of documents.
Optimized Execution: The library's efficiency is enhanced with tools like vLLM, FastChat for LLM inference acceleration, and Faiss for vector index management.
Easy to Use UI : We have developed a very easy to use UI to easily and quickly configure and experience the RAG baselines we have implemented, as well as run evaluation scripts on a visual interface.

🔎 Roadmap

FlashRAG is still under development and there are many issues and room for improvement. We will continue to update. And we also sincerely welcome contributions on this open-source toolkit.

Support OpenAI models
Provdide instructions for each component
Integrate sentence Transformers
Inlcude more RAG approaches
Add more evaluation metrics (e.g., Unieval, name-entity F1) and benchmarks (e.g., RGB benchmark)
Enhance code adaptability and readability

📃 Changelog

[25/01/07] We have currently support the aggregation of multiple retrievers, see it in multi retriever usage.

[25/01/07] We have integrated a very flexible and lightweight corpus chunking library Chunkie, which supports various custom chunking methods (tokens, sentences, semantic, etc.). Use it in chunking doc corpus.

[24/10/21] We have released a version based on the Paddle framework that supports Chinese hardware platforms. Please refer to FlashRAG Paddle for details.

[24/10/13] A new in-domain dataset and corpus - DomainRAG have been added to the dataset. The dataset is based on the internal enrollment data of Renmin University of China, covering seven types of tasks, which can be used for conducting domain-specific RAG testing.

[24/09/24] We have released a version based on the MindSpore framework that supports Chinese hardware platforms. Please refer to FlashRAG MindSpore for details.

[24/09/18] Due to the complexity and limitations of installing Pyserini in certain environments, we have introduced a lightweight BM25s package as an alternative (faster and easier to use). The retriever based on Pyserini will be deprecated in future versions. To use retriever with bm25s, just set bm25_backend to bm25s in config.

[24/09/09] We add support for a new method Adaptive-RAG, which can automatically select the RAG process to execute based on the type of query. See it result in result table.

[24/08/02] We add support for a new method Spring, significantly improve the performance of LLM by adding only a few token embeddings. See it result in result table.

[24/07/17] Due to some unknown issues with HuggingFace, our original dataset link has been invalid. We have updated it. Please check the new link if you encounter any problems.

[24/07/06] We add support for a new method: Trace, which refine text by constructing a knowledge graph. See it results and details.

[24/06/19] We add support for a new method: IRCoT, and update the result table.

[24/06/15] We provide a demo to perform the RAG process using our toolkit.

[24/06/11] We have integrated sentence transformers in the retriever module. Now it's easier to use the retriever without setting pooling methods.

[24/06/05] We have provided detailed document for reproducing existing methods (see how to reproduce, baseline details), and configurations settings.

[24/06/02] We have provided an introduction of FlashRAG for beginners, see an introduction to flashrag (中文版 한국어).

[24/05/31] We supported Openai-series models as generator.

🔧 Installation

To get started with FlashRAG, you can simply install it with pip:

pip install flashrag-dev --pre

Or you can clone it from Github and install (requires Python 3.10+):

git clone https://github.com/RUC-NLPIR/FlashRAG.git
cd FlashRAG
pip install -e .

If you want to use vllm, sentence-transformers or pyserini, you can install the optional dependencies:

# Install all extra dependencies
pip install flashrag[full]

# Install vllm for faster speed
pip install vllm>=0.4.1

# Install sentence-transformers
pip install sentence-transformers

# Install pyserini for bm25
pip install pyserini

Due to the incompatibility when installing faiss using pip, it is necessary to use the following conda command for installation.

# CPU-only version
conda install -c pytorch faiss-cpu=1.8.0

# GPU(+CPU) version
conda install -c pytorch -c nvidia faiss-gpu=1.8.0

Note: It is impossible to install the latest version of faiss on certain systems.

From the official Faiss repository (source):

The CPU-only faiss-cpu conda package is currently available on Linux (x86_64 and arm64), OSX (arm64 only), and Windows (x86_64)

faiss-gpu, containing both CPU and GPU indices, is available on Linux (x86_64 only) for CUDA 11.4 and 12.1

🚀 Quick Start

Corpus Construction

To build an index, you first need to save your corpus as a jsonl file with each line representing a document.

{"id": "0", "contents": "content"}
{"id": "1", "contents": "content"}
...

If you want to use Wikipedia as your corpus, you can refer to our documentation Processing Wikipedia to convert it into an indexable format.

Index Construction

You can use the following code to build your own index.

For dense retrieval methods, especially popular embedding models, we use faiss to build the index.
For sparse retrieval methods (BM25), we use Pyserini or bm25s to build the corpus into a Lucene inverted index. The built index contains the original documents.

For Dense Retrieval Methods

Modify the parameters in the following code to your own.

python -m flashrag.retriever.index_builder \
  --retrieval_method e5 \
  --model_path /model/e5-base-v2/ \
  --corpus_path indexes/sample_corpus.jsonl \
  --save_dir indexes/ \
  --use_fp16 \
  --max_length 512 \
  --batch_size 256 \
  --pooling_method mean \
  --faiss_type Flat

--pooling_method: If this parameter is not specified, we will automatically select it based on the model name and model file. However, since different embedding models use different pooling methods, we may not have fully implemented them. To ensure accuracy, you can specify the pooling method corresponding to the retrieval model you are using (mean, pooler, or cls).
---instruction: Some embedding models require additional instructions to be concatenated to the query before encoding, which can be specified here. Currently, we will automatically fill in the instructions for E5 and BGE models, while other models need to be supplemented manually.

If the retrieval model supports the sentence transformers library, you can use the following code to build the index (without considering the pooling method).

python -m flashrag.retriever.index_builder \
  --retrieval_method e5 \
  --model_path /model/e5-base-v2/ \
  --corpus_path indexes/sample_corpus.jsonl \
  --save_dir indexes/ \
  --use_fp16 \
  --max_length 512 \
  --batch_size 256 \
  --pooling_method mean \
  --sentence_transformer \
  --faiss_type Flat

For Sparse Retrieval Methods (BM25)

If building a bm25 index, there is no need to specify model_path.

Building Index with BM25s

python -m flashrag.retriever.index_builder \
  --retrieval_method bm25 \
  --corpus_path indexes/sample_corpus.jsonl \
  --bm25_backend bm25s \
  --save_dir indexes/

Building Index with Pyserini

python -m flashrag.retriever.index_builder \
  --retrieval_method bm25 \
  --corpus_path indexes/sample_corpus.jsonl \
  --bm25_backend pyserini \
  --save_dir indexes/

Using the ready-made pipeline

You can use the pipeline class we have already built (as shown in pipelines) to implement the RAG process inside. In this case, you just need to configure the config and load the corresponding pipeline.

Firstly, load the entire process's config, which records various hyperparameters required in the RAG process. You can input yaml files as parameters or directly as variables.

Please note that variables as input take precedence over files.

from flashrag.config import Config

# hybrid load configs
config_dict = {'data_dir': 'dataset/'}
my_config = Config(
    config_file_path = 'my_config.yaml',
    config_dict = config_dict

We provide comprehensive guidance on how to set configurations, you can see our configuration guidance. You can also refer to the basic yaml file we provide to set your own parameters.

Next, load the corresponding dataset and initialize the pipeline. The components in the pipeline will be automatically loaded.

from flashrag.utils import get_dataset
from flashrag.pipeline import SequentialPipeline
from flashrag.prompt import PromptTemplate
from flashrag.config import Config

config_dict = {'data_dir': 'dataset/'}
my_config = Config(
    config_file_path = 'my_config.yaml',
    config_dict = config_dict
)
all_split = get_dataset(my_config)
test_data = all_split['test']

pipeline = SequentialPipeline(my_config)

You can specify your own input prompt using PromptTemplete:

prompt_templete = PromptTemplate(
    config,
    system_prompt = "Answer the question based on the given document. Only give me the answer and do not output any other words.\nThe following are given documents.\n\n{reference}",
    user_prompt = "Question: {question}\nAnswer:"
)
pipeline = SequentialPipeline(
  my_config,
  prompt_template = prompt_templete
)

Finally, execute pipeline.run to obtain the final result.

output_dataset = pipeline.run(test_data, do_eval=True)

The output_dataset contains the intermediate results and metric scores for each item in the input dataset. Meanwhile, the dataset with intermediate results and the overall evaluation score will also be saved as a file (if save_intermediate_data and save_metric_score are specified).

Build your own pipeline!

Sometimes you may need to implement more complex RAG process, and you can build your own pipeline to implement it. You just need to inherit BasicPipeline, initialize the components you need, and complete the run function.

from flashrag.pipeline import BasicPipeline
from flashrag.utils import get_retriever, get_generator

class ToyPipeline(BasicPipeline):
  def __init__(self, config, prompt_templete=None):
    # Load your own components
    pass

  def run(self, dataset, do_eval=True):
    # Complete your own process logic

    # get attribute in dataset using `.`
    input_query = dataset.question
    ...
    # use `update_output` to save intermeidate data
    dataset.update_output("pred",pred_answer_list)
    dataset = self.evaluate(dataset, do_eval=do_eval)
    return dataset

Please first understand the input and output forms of the components you need to use from our documentation.

Just use components

If you already have your own code and only want to use our components to embed the original code, you can refer to the basic introduction of the components to obtain the input and output formats of each component.

⚙️ Components

In FlashRAG, we have built a series of common RAG components, including retrievers, generators, refiners, and more. Based on these components, we have assembled several pipelines to implement the RAG workflow, while also providing the flexibility to combine these components in custom arrangements to create your own pipeline.

RAG-Components

Type	Module	Description
Judger	SKR Judger	Judging whether to retrieve using SKR method
Retriever	Dense Retriever	Bi-encoder models such as dpr, bge, e5, using faiss for search
	BM25 Retriever	Sparse retrieval method based on Lucene
	Bi-Encoder Reranker	Calculate matching score using bi-Encoder
	Cross-Encoder Reranker	Calculate matching score using cross-encoder
Refiner	Extractive Refiner	Refine input by extracting important context
	Abstractive Refiner	Refine input through seq2seq model
	LLMLingua Refiner	LLMLingua-series prompt compressor
	SelectiveContext Refiner	Selective-Context prompt compressor
	KG Refiner	Use Trace method to construct a knowledge graph
Generator	Encoder-Decoder Generator	Encoder-Decoder model, supporting Fusion-in-Decoder (FiD)
	Decoder-only Generator	Native transformers implementation
	FastChat Generator	Accelerate with FastChat
	vllm Generator	Accelerate with vllm

Pipelines

Referring to a survey on retrieval-augmented generation, we categorized RAG methods into four types based on their inference paths.

Sequential: Sequential execuation of RAG process, like Query-(pre-retrieval)-retriever-(post-retrieval)-generator
Conditional: Implements different paths for different types of input queries
Branching : Executes multiple paths in parallel, merging the responses from each path
Loop: Iteratively performs retrieval and generation

In each category, we have implemented corresponding common pipelines. Some pipelines have corresponding work papers.

Type	Module	Description
Sequential	Sequential Pipeline	Linear execution of query, supporting refiner, reranker
Conditional	Conditional Pipeline	With a judger module, distinct execution paths for various query types
Branching	REPLUG Pipeline	Generate answer by integrating probabilities in multiple generation paths
Branching	SuRe Pipeline	Ranking and merging generated results based on each document
Loop	Iterative Pipeline	Alternating retrieval and generation
	Self-Ask Pipeline	Decompose complex problems into subproblems using self-ask
	Self-RAG Pipeline	Adaptive retrieval, critique, and generation
	FLARE Pipeline	Dynamic retrieval during the generation process
	IRCoT Pipeline	Integrate retrieval process with CoT

🤖 Supporting Methods

We have implemented 15 works with a consistent setting of:

Generator: LLAMA3-8B-instruct with input length of 2048
Retriever: e5-base-v2 as embedding model, retrieve 5 docs per query
Prompt: A consistent default prompt, template can be found in the method details.

For open-source methods, we implemented their processes using our framework. For methods where the author did not provide source code, we will try our best to follow the methods in the original paper for implementation.

For necessary settings and hyperparameters specific to some methods, we have documented them in the specific settings column. For more details, please consult our reproduce guidance and method details.

It’s important to note that, to ensure consistency, we have utilized a uniform setting. However, this setting may differ from the original setting of the method, leading to variations in results compared to the original outcomes.

Method	Type	NQ (EM)	TriviaQA (EM)	Hotpotqa (F1)	2Wiki (F1)	PopQA (F1)	WebQA(EM)	Specific setting
Naive Generation	Sequential	22.6	55.7	28.4	33.9	21.7	18.8
Standard RAG	Sequential	35.1	58.9	35.3	21.0	36.7	15.7
AAR-contriever-kilt	Sequential	30.1	56.8	33.4	19.8	36.1	16.1
LongLLMLingua	Sequential	32.2	59.2	37.5	25.0	38.7	17.5	Compress Ratio=0.5
RECOMP-abstractive	Sequential	33.1	56.4	37.5	32.4	39.9	20.2
Selective-Context	Sequential	30.5	55.6	34.4	18.5	33.5	17.3	Compress Ratio=0.5
Trace	Sequential	30.7	50.2	34.0	15.5	37.4	19.9
Spring	Sequential	37.9	64.6	42.6	37.3	54.8	27.7	Use Llama2-7B-chat with trained embedding table
SuRe	Branching	37.1	53.2	33.4	20.6	48.1	24.2	Use provided prompt
REPLUG	Branching	28.9	57.7	31.2	21.1	27.8	20.2
SKR	Conditional	33.2	56.0	32.4	23.4	31.7	17.0	Use infernece-time training data
Adaptive-RAG	Conditional	35.1	56.6	39.1	28.4	40.4	16.0
Ret-Robust	Loop	42.9	68.2	35.8	43.4	57.2	33.7	Use LLAMA2-13B with trained lora
Self-RAG	Loop	36.4	38.2	29.6	25.1	32.7	21.9	Use trained selfrag-llama2-7B
FLARE	Loop	22.5	55.8	28.0	33.9	20.7	20.2
Iter-Retgen, ITRG	Loop	36.8	60.1	38.3	21.6	37.9	18.2
IRCoT	Loop	33.3	56.9	41.5	32.4	45.6	20.7

📓 Supporting Datasets & Document Corpus

Datasets

We have collected and processed 36 datasets widely used in RAG research, pre-processing them to ensure a consistent format for ease of use. For certain datasets (such as Wiki-asp), we have adapted them to fit the requirements of RAG tasks according to the methods commonly used within the community. All datasets are available at Huggingface datasets.

For each dataset, we save each split as a jsonl file, and each line is a dict as follows:

{
  'id': str,
  'question': str,
  'golden_answers': List[str],
  'metadata': dict
}

Below is the list of datasets along with the corresponding sample sizes:

Task	Dataset Name	Knowledge Source	# Train	# Dev	# Test
QA	NQ	wiki	79,168	8,757	3,610
QA	TriviaQA	wiki & web	78,785	8,837	11,313
QA	PopQA	wiki	/	/	14,267
QA	SQuAD	wiki	87,599	10,570	/
QA	MSMARCO-QA	web	808,731	101,093	/
QA	NarrativeQA	books and story	32,747	3,461	10,557
QA	WikiQA	wiki	20,360	2,733	6,165
QA	WebQuestions	Google Freebase	3,778	/	2,032
QA	AmbigQA	wiki	10,036	2,002	/
QA	SIQA	-	33,410	1,954	/
QA	CommonSenseQA	-	9,741	1,221	/
QA	BoolQ	wiki	9,427	3,270	/
QA	PIQA	-	16,113	1,838	/
QA	Fermi	wiki	8,000	1,000	1,000
multi-hop QA	HotpotQA	wiki	90,447	7,405	/
multi-hop QA	2WikiMultiHopQA	wiki	15,000	12,576	/
multi-hop QA	Musique	wiki	19,938	2,417	/
multi-hop QA	Bamboogle	wiki	/	/	125
multi-hop QA	StrategyQA	wiki	2290	/	/
Long-form QA	ASQA	wiki	4,353	948	/
Long-form QA	ELI5	Reddit	272,634	1,507	/
Long-form QA	WikiPassageQA	wiki	3,332	417	416
Open-Domain Summarization	WikiASP	wiki	300,636	37,046	37,368
multiple-choice	MMLU	-	99,842	1,531	14,042
multiple-choice	TruthfulQA	wiki	/	817	/
multiple-choice	HellaSWAG	ActivityNet	39,905	10,042	/
multiple-choice	ARC	-	3,370	869	3,548
multiple-choice	OpenBookQA	-	4,957	500	500
multiple-choice	QuaRTz	-	2696	384	784
Fact Verification	FEVER	wiki	104,966	10,444	/
Dialog Generation	WOW	wiki	63,734	3,054	/
Entity Linking	AIDA CoNll-yago	Freebase & wiki	18,395	4,784	/
Entity Linking	WNED	Wiki	/	8,995	/
Slot Filling	T-REx	DBPedia	2,284,168	5,000	/
Slot Filling	Zero-shot RE	wiki	147,909	3,724	/
In-domain QA	DomainRAG	Web pages of RUC	/	/	485

Document Corpus

Our toolkit supports jsonl format for retrieval document collections, with the following structure:

{"id":"0", "contents": "...."}
{"id":"1", "contents": "..."}

The contents key is essential for building the index. For documents that include both text and title, we recommend setting the value of contents to {title}\n{text}. The corpus file can also contain other keys to record additional characteristics of the documents.

In the academic research, Wikipedia and MS MARCO are the most commonly used retrieval document collections. For Wikipedia, we provide a comprehensive script to process any Wikipedia dump into a clean corpus. Additionally, various processed versions of the Wikipedia corpus are available in many works, and we have listed some reference links.

For MS MARCO, it is already processed upon release and can be directly downloaded from its hosting link on Hugging Face.

Index

To facilitate easier replication of the experiments, we now provide a preprocessed index available in the ModelScope dataset page: FlashRAG_Dataset/retrieval_corpus/wiki18_100w_e5_index.zip.

The index was created using the e5-base-v2 retriever on our uploaded wiki18_100w dataset, which is consistent with the index used in our experiments.

🙌 Additional FAQs

How should I set different experimental parameters?
How to build my own corpus, such as a specific segmented Wikipedia?
How to index my own corpus?
How to reproduce supporting methods?

🔖 License

FlashRAG is licensed under the MIT License.

🌟 Citation

Please kindly cite our paper if helps your research:

@article{FlashRAG,
    author={Jiajie Jin and
            Yutao Zhu and
            Xinyu Yang and
            Chenghao Zhang and
            Zhicheng Dou},
    title={FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research},
    journal={CoRR},
    volume={abs/2405.13576},
    year={2024},
    url={https://arxiv.org/abs/2405.13576},
    eprinttype={arXiv},
    eprint={2405.13576}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

⚡FlashRAG: A Python Toolkit for Efficient RAG Research

Installation | Features | Quick-Start | Components | Supporting Methods | Supporting Datasets | FAQs

🔗 Navigation

✨ Features

🔎 Roadmap

📃 Changelog

🔧 Installation

🚀 Quick Start

Corpus Construction

Index Construction

For Dense Retrieval Methods

For Sparse Retrieval Methods (BM25)

Building Index with BM25s

Building Index with Pyserini

Using the ready-made pipeline

Build your own pipeline!

Just use components

⚙️ Components

RAG-Components

Pipelines

🤖 Supporting Methods

📓 Supporting Datasets & Document Corpus

Datasets

Document Corpus

Index

🙌 Additional FAQs

🔖 License

🌟 Citation

Files

README.md

Latest commit

History

README.md

File metadata and controls

⚡FlashRAG: A Python Toolkit for Efficient RAG Research

Installation | Features | Quick-Start | Components | Supporting Methods | Supporting Datasets | FAQs

🔗 Navigation

✨ Features

🔎 Roadmap

📃 Changelog

🔧 Installation

🚀 Quick Start

Corpus Construction

Index Construction

For Dense Retrieval Methods

For Sparse Retrieval Methods (BM25)

Building Index with BM25s

Building Index with Pyserini

Using the ready-made pipeline

Build your own pipeline!

Just use components

⚙️ Components

RAG-Components

Pipelines

🤖 Supporting Methods

📓 Supporting Datasets & Document Corpus

Datasets

Document Corpus

Index

🙌 Additional FAQs

🔖 License

🌟 Citation