Mustashari

Table of Contents

  • About the project
  • Team
  • Usage
  • Architecture
  • Difficulties and challenges
  • Technology used and Credits

About the project

Mustashari is an app that the AIEMINES team developed during the ThinkAI competition to give Moroccans access to affordable legal consulting. Our goal is for Mustashari to take questions in Moroccan Darija and give an appropriate answer based on the relevant articles of Moroccan law. Right now, Mustashari can understand text in Darija and pair it with the appropriate sections of Moroccan law. However, it doesn't generate satisfying answers, because we currently lack access to a robust generative model API. For now it works as a prompt generator whose output can be fed to a powerful AI like ChatGPT.
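
As an illustration of the prompt-generator idea, here is a minimal sketch; the build_prompt helper, its signature, and the prompt wording are assumptions for illustration, not the actual code in this repository.

```python
# Minimal sketch of the prompt-generator step described above.
# build_prompt and the prompt wording are illustrative assumptions,
# not the repository's actual code.

def build_prompt(question_darija: str, law_chunks: list[str]) -> str:
    """Pair the user's question with the retrieved law articles in a single prompt."""
    context = "\n\n".join(
        f"Article {i + 1}:\n{chunk}" for i, chunk in enumerate(law_chunks)
    )
    return (
        "You are a legal assistant answering questions about Moroccan law.\n\n"
        f"Relevant articles:\n{context}\n\n"
        f"Question (in Moroccan Darija): {question_darija}\n\n"
        "Answer the question using only the articles above."
    )

if __name__ == "__main__":
    # The output is meant to be pasted into a powerful model such as ChatGPT.
    print(build_prompt("واش عندي الحق نطلب التعويض؟", ["Article text ..."]))
```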

Team

Usage

To use Mustashari on your machine:

  • Step 1: clone the repo
  • Step 2: run the command "pip install -r requirements"
  • Step 3: run the command "streamlit run streamlit_app.py"

Or you can just click on this link to use it on the web.

You can also look through the notebooks to better understand how the code works.

Architecture

Flowchart

Difficulties and challenges

This was our first time working on a generative AI project. We had to learn how to use APIs and combine multiple technologies from multiple sources to achieve a specific goal. We had to learn about retrieval models, embeddings, cosine similarity, Hugging Face, prompt engineering, and more.

In the beginning we wanted to use the OpenAI API to generate the final answers, but we soon learned that it wasn't free. We decided to work with Cohere as an alternative; however, we later found that its model wasn't robust enough, or at least we weren't able to find a good prompt to feed it.

We tried multiple retrieval and embedding models: Chroma, CohereEmbedding, all-MiniLM-L12-v2, FlauBERT, spaCy, docquery. The results weren't good on our dataset of laws written in French. We also tried some multilingual and French embedding models such as distiluse-base-multilingual-cased-v1, dangvantuan/sentence-camembert-large, and dangvantuan/sentence-camembert-base, but we couldn't get better results.
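
For reference, here is a minimal sketch of the embedding-plus-cosine-similarity retrieval described above, using sentence-transformers with all-MiniLM-L12-v2; the law chunks and function name are placeholders, not the actual pipeline in this repository.

```python
# Minimal sketch of semantic retrieval with all-MiniLM-L12-v2 and cosine
# similarity via sentence-transformers. The law chunks are placeholders.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")

law_chunks = [
    "Article 1: ...",  # placeholder chunks of the (translated) law corpus
    "Article 2: ...",
]
chunk_embeddings = model.encode(law_chunks, convert_to_tensor=True)

def retrieve_semantic(question: str, top_k: int = 3) -> list[str]:
    """Return the top_k law chunks most similar to the question (cosine similarity)."""
    query_embedding = model.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, chunk_embeddings, top_k=top_k)[0]
    return [law_chunks[hit["corpus_id"]] for hit in hits]
```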

We then decided to translate the laws into English, simply using Google Translate. The retrieval results using all-MiniLM-L12-v2 on the translated laws were better, but due to the low translation quality they were still not satisfactory. To mitigate this, we used two models (all-MiniLM-L12-v2 + spaCy): the first is a semantic model and the second a statistical one, and each retrieves 3 chunks of text from the law. We then combined their results using docquery.
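
To illustrate the two-retriever combination, here is a rough sketch: three hits from a spaCy-based statistical ranking are merged with three hits from the semantic model, dropping duplicates. The function names and the merging rule are assumptions; the actual combination step in the repository goes through docquery and is not reproduced here.

```python
# Rough sketch of combining a statistical (spaCy) retriever with a semantic one.
# Function names and the merging rule are assumptions; the repository's actual
# combination uses docquery and may differ.
import spacy

nlp = spacy.load("en_core_web_md")  # medium English model, includes word vectors

def retrieve_statistical(question: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Rank law chunks by spaCy vector similarity to the question."""
    query_doc = nlp(question)
    ranked = sorted(chunks, key=lambda c: nlp(c).similarity(query_doc), reverse=True)
    return ranked[:top_k]

def combine(semantic_hits: list[str], statistical_hits: list[str]) -> list[str]:
    """Merge both retrievers' hits, keeping order and dropping duplicates."""
    return list(dict.fromkeys(semantic_hits + statistical_hits))
```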

We also found some powerful models published by Meta; however, we were only able to run the weakest one because of computational costs.

Finally, the app is still very slow and impractical and needs more development.

Technology used and Credits
