This project explores different approaches to building a question-answering (QA) chatbot over a local knowledge base.
Three approaches were used to create this chatbot. The corresponding models can be found in the notebooks folder.
notebooks/baseline_model contains a question-answering search engine built on Transformers. The documents are first encoded by a bi-encoder, which is used to retrieve the 5 most relevant passages from the corpus. A cross-encoder then re-ranks these 5 candidates, and the highest-ranked passage after re-ranking is returned as the answer.
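The two-stage flow described above can be sketched as follows. This is a minimal illustration of the retrieve-then-re-rank pattern only, not the notebook's code: `bow_embed` and `overlap_score` are hypothetical toy stand-ins (bag-of-words cosine similarity and token overlap) so the sketch runs without downloading any model. In the baseline notebook these roles are played by a Transformer bi-encoder and cross-encoder.

```python
import math
from collections import Counter
from typing import Callable, List


def bow_embed(text: str) -> Counter:
    """Toy stand-in for a bi-encoder: encodes one text on its own."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def overlap_score(query: str, passage: str) -> float:
    """Toy stand-in for a cross-encoder: scores the (query, passage) pair jointly."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0


def retrieve_and_rerank(
    query: str,
    corpus: List[str],
    top_k: int = 5,
    embed: Callable[[str], Counter] = bow_embed,
    pair_score: Callable[[str, str], float] = overlap_score,
) -> str:
    # Stage 1: encode query and passages independently, keep the top_k by similarity.
    corpus_vecs = [embed(doc) for doc in corpus]
    q_vec = embed(query)
    ranked = sorted(
        range(len(corpus)),
        key=lambda i: cosine(q_vec, corpus_vecs[i]),
        reverse=True,
    )
    candidates = ranked[:top_k]
    # Stage 2: re-score each candidate together with the query, return the best one.
    best = max(candidates, key=lambda i: pair_score(query, corpus[i]))
    return corpus[best]
```

The split exists because the first stage is cheap (passage vectors can be computed once and reused), while the second stage is more expensive per pair and is therefore applied only to the short candidate list.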
Two further models were developed by leveraging Large Language Models.
The first is a fully open-source solution: it uses Hugging Face embeddings and Google's Flan-T5 as the Large Language Model for synthesis. Further details can be found here: notebooks/Flan_T5
The winning model is a hybrid solution that uses Hugging Face for embeddings and OpenAI to synthesize the answer. Additional details can be found here: notebooks/Hybrid_Model
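Both LLM-based models share the same shape: embed the text, retrieve the most relevant chunks, and let a language model synthesize an answer from them. The sketch below illustrates that shape under stated assumptions. `answer_question`, `embed`, and `generate` are hypothetical names, and the two callables are injected so that either an open-source or a hosted model could be plugged in. It does not reproduce the notebooks' actual code or any specific library API.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product used here as the similarity between two embeddings."""
    return sum(x * y for x, y in zip(a, b))


def answer_question(
    question: str,
    chunks: List[str],
    embed: Callable[[str], Vector],   # assumption: wraps an embedding model
    generate: Callable[[str], str],   # assumption: wraps an LLM (e.g. Flan-T5 or OpenAI)
    top_k: int = 3,
) -> str:
    # Rank the knowledge-base chunks by similarity to the question.
    q_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: dot(q_vec, embed(c)), reverse=True)
    # Hand only the top_k chunks to the language model as context.
    context = "\n\n".join(ranked[:top_k])
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)
```

Keeping `embed` and `generate` as parameters means the difference between the open-source and hybrid variants is only which model sits behind `generate`.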
The material used to construct the local knowledge base is, in this case, a PDF. A sample statistics textbook is provided in the data folder. All notebooks were created using Google Colab. To execute the notebooks:
- Upload the material to a Google Drive folder and get the folder ID.
- Copy the folder ID of each of the subfolders and save them in the 'folder_id' variables.
Results generated by the chatbots were compared manually with the sample answers. The outcomes are as follows:
- Hybrid Model: 78%
- Flan-T5: 40%
The chatbot answers questions with near human-like fluency and also has multilingual capabilities. Although the corpus is in English, the chatbot can address questions posed in other languages. This was made possible by integrating Google Translator: a question posed in German receives a response in German.
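One way to picture the multilingual behaviour is as a thin wrapper around an English-only QA function. The sketch below is illustrative only: `detect`, `translate`, and `answer_in_english` are hypothetical callables, and the actual translator client used in the notebooks is not shown here.

```python
from typing import Callable


def answer_in_user_language(
    question: str,
    detect: Callable[[str], str],               # assumption: returns a language code such as "de"
    translate: Callable[[str, str, str], str],  # assumption: (text, source_lang, target_lang) -> text
    answer_in_english: Callable[[str], str],    # the English-only QA chatbot
) -> str:
    lang = detect(question)
    if lang == "en":
        # English questions go straight to the chatbot.
        return answer_in_english(question)
    # Translate the question to English, answer it, then translate the answer back.
    english_question = translate(question, lang, "en")
    english_answer = answer_in_english(english_question)
    return translate(english_answer, "en", lang)
```

With this design the knowledge base and the QA model stay English-only, and translation happens only at the input and output boundaries.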
Moreover, we employed prompt engineering techniques to ensure that the chatbot responds only to questions that can be answered from the provided context. This enhances the precision and relevance of its responses.
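A context-restricting prompt might look like the sketch below. The wording of the instructions, the `REFUSAL` text, and the function name `build_grounded_prompt` are assumptions made for illustration. The prompts actually used in the notebooks may differ.

```python
# Hypothetical refusal message the model is told to use for out-of-context questions.
REFUSAL = "I don't know based on the provided material."


def build_grounded_prompt(context: str, question: str) -> str:
    """Build a prompt that tells the model to answer only from the given context."""
    return (
        "Answer the question using only the context below.\n"
        f'If the answer is not in the context, reply exactly: "{REFUSAL}"\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

A fixed refusal string also makes out-of-context responses easy to spot when results are checked by hand.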
If you find this repo interesting or would like to suggest improvements, please get in touch. We would be happy to hear from you.