
A code-mixed annotation tool designed to significantly enhance annotation quality and efficiency. It reduces annotation time and operational overheads by providing advanced features tailored for code-mixed data. The tool offers intuitive interfaces, automated suggestions, and robust error-checking mechanisms.


Commentator ✍️

  • A Code-mixed Multilingual Text Annotation Framework.
  • Supports code-mixed Hinglish (Hindi-English) data.
  • Easily extensible to other code-mixed language pairs, such as Gujarati-English and Marathi-English. To extend COMMENTATOR, please read the Configuration Changes file in the Documents section of this repository.

1. Relevant Links 🔗

Source Code: https://github.com/lingo-iitgn/commentator/


Demo Video: https://bit.ly/commentator_video


Project Website: https://lingo.iitgn.ac.in/codemixing/


Usage

As an Annotator
  • Sign up to create a new annotator account.

  • Log in using those credentials.

  • Special Credentials 😉

    username: commentator
    password: commentator
    
As an Admin
  username: admin
  password: admin

2. Folder Structure 📚

backend
	app.py
	requirements.txt
	Dockerfile
	LID_tool
frontend
	build
	node_modules
	public
	src
		Admin
		Auth
		Components
		Edit
		Matrix
		POS
		Home
		User
		utils
		Router.js
	.env
	.ignore
	package-lock.json
	package.json
Set the backend URL in frontend/.env:

REACT_APP_BACKEND_URL=http://<YOUR_BACKEND_IP_ADDRESS>:5000
OR
REACT_APP_BACKEND_URL=http://localhost:5000
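If you deploy the frontend on more than one machine, editing this file by hand gets repetitive. The Python sketch below is one way to script it: it updates (or adds) the REACT_APP_BACKEND_URL line in the .env file at the frontend project root while leaving any other variables untouched. The helper name set_backend_url is hypothetical and is not part of COMMENTATOR.

```python
from pathlib import Path
from urllib.parse import urlparse

KEY = "REACT_APP_BACKEND_URL"


def set_backend_url(frontend_dir, backend_url):
    """Update or add REACT_APP_BACKEND_URL in <frontend_dir>/.env (hypothetical helper)."""
    parsed = urlparse(backend_url)
    # Reject values such as "localhost:5000" that lack an http(s) scheme.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an http(s) URL, got {backend_url!r}")

    env_path = Path(frontend_dir) / ".env"
    lines = env_path.read_text(encoding="utf-8").splitlines() if env_path.exists() else []

    # Keep every other variable; drop any previous backend URL line.
    kept = [line for line in lines if line.split("=", 1)[0].strip() != KEY]
    # Strip a trailing slash so the value matches the form shown above.
    kept.append(f"{KEY}={backend_url.rstrip('/')}")

    env_path.write_text("\n".join(kept) + "\n", encoding="utf-8")
    return env_path
```

For example, set_backend_url("frontend", "http://localhost:5000") run from the repository root points the frontend at a local backend. Assuming the frontend is a Create React App project, as the REACT_APP_ prefix suggests, only variables with that prefix reach the browser bundle, and .env is read when npm start or npm run build runs, so restart the development server after changing the value.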

3. Database Schemas 🏬

Collection    Description
lid           LID-based language identification of tokens
matrix        Matrix-based identification of sentences
pos           POS-tag-based identification of tokens
sentences     Sentences to be annotated
users         Admin and annotator accounts
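A quick way to confirm that a database contains the collections listed above is a short pymongo script. This is a sketch under stated assumptions: the database name is not given in this README, so db_name is a placeholder you must supply, and the helper names missing_collections and collection_report are hypothetical.

```python
# The five collection names come from the table above.
EXPECTED_COLLECTIONS = frozenset({"lid", "matrix", "pos", "sentences", "users"})


def missing_collections(existing):
    """Return the expected collections that are absent, sorted for stable output."""
    return sorted(EXPECTED_COLLECTIONS - set(existing))


def collection_report(conn_str, db_name):
    """Return (missing collection names, document count per present collection)."""
    # Imported lazily so missing_collections works without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(conn_str)
    try:
        db = client[db_name]
        existing = set(db.list_collection_names())
        counts = {
            name: db[name].count_documents({})
            for name in sorted(EXPECTED_COLLECTIONS & existing)
        }
        return missing_collections(existing), counts
    finally:
        client.close()
```

Call it as collection_report("mongodb://127.0.0.1:27017/", "<DB_NAME>"). MongoDB creates a collection on its first insert, so a fresh installation may legitimately report some collections as missing until data has been written.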

4. Backend [ Local Server ] 💻

Steps to Follow

a. Navigate to the backend folder

cd backend

b. Install dependencies

pip install -r requirements.txt

c. Update the frontend URL

Open app.py in a code or text editor (e.g., Visual Studio Code, Sublime Text, or Notepad) and set:

frontend = "YOUR_FRONTEND_HOST_URL"
OR
frontend = "http://localhost:3003"
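Editing app.py for every deployment is easy to forget. One alternative, sketched below on the assumption that you are willing to change app.py, is to read the value from an environment variable and fall back to the default shown above. COMMENTATOR_FRONTEND_URL is a hypothetical variable name; this README does not say app.py reads it.

```python
import os

# Default taken from this README; COMMENTATOR_FRONTEND_URL is hypothetical.
DEFAULT_FRONTEND_URL = "http://localhost:3003"


def frontend_url(environ=os.environ):
    """Return the frontend URL, letting an environment variable override the default."""
    # A browser's Origin header never ends in "/", so strip it in case the
    # value is compared against request origins (e.g. for CORS).
    return environ.get("COMMENTATOR_FRONTEND_URL", DEFAULT_FRONTEND_URL).rstrip("/")


frontend = frontend_url()
```

With this in place, COMMENTATOR_FRONTEND_URL=http://<YOUR_HOST>:3003 python app.py would change the value without touching the file.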

d. Update the MongoDB URL

In app.py, set the MongoDB connection string:

conn_str = "YOUR_MONGODB_URL"
OR
conn_str = "mongodb://127.0.0.1:27017/"
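Before starting the server, you can check that conn_str actually reaches MongoDB. The sketch below uses pymongo's MongoClient with a short server-selection timeout and the ping command; the function name can_reach_mongodb is hypothetical.

```python
def can_reach_mongodb(conn_str, timeout_ms=2000):
    """Return True if a MongoDB server at conn_str answers a ping within timeout_ms."""
    # Imported lazily so this module loads even where pymongo is not installed.
    from pymongo import MongoClient
    from pymongo.errors import PyMongoError

    try:
        # Malformed URIs raise a ConfigurationError (a PyMongoError) here.
        client = MongoClient(conn_str, serverSelectionTimeoutMS=timeout_ms)
    except PyMongoError:
        return False
    try:
        # MongoClient connects lazily; ping forces server selection.
        client.admin.command("ping")
        return True
    except PyMongoError:
        return False
    finally:
        client.close()
```

For a local installation, can_reach_mongodb("mongodb://127.0.0.1:27017/") returns True when mongod is running. A False result usually means the server is down, the host or port is wrong, or credentials in the URL are rejected.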

e. Run the local server

python app.py
OR
python3 app.py
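Once the server is running, a TCP check tells you whether anything is listening on the backend port. The sketch assumes the backend listens on port 5000, as the REACT_APP_BACKEND_URL example above suggests. is_listening is a hypothetical helper, and it only proves that the port accepts connections, not that the Flask app is healthy.

```python
import socket


def is_listening(host="127.0.0.1", port=5000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

is_listening() checks the backend on 127.0.0.1:5000; is_listening(port=3003) does the same for the frontend development server.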

5. Frontend [ Local Server ] 💻

Steps to Follow

a. Navigate to the frontend folder

cd frontend

b. Install all frontend dependencies (needed once, after you first download the application).

npm i

c. Start the frontend local server.

npm start

Alternatively, run the frontend bash/shell script to start the frontend local server.


6. Administrative Configuration 🛂

Steps to Follow
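  1. Start the frontend and backend local servers (see 4.e and 5.c).
  2. Sign up to create the account that will become the admin.
  3. Open the MongoDB database and set admin=True on that account's document in the users collection to make it a superuser/admin account.
  4. Log in to the Admin Dashboard.
  5. Upload the sentences to be annotated to the database as a CSV file.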
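Step 3 can also be scripted. The pymongo sketch below sets admin=True on one document in the users collection. It assumes accounts are looked up by a username field and that you supply the database name; neither is stated in this README, so inspect the actual documents in the users collection before running it. The helper names are hypothetical.

```python
def admin_update(username):
    """Build the filter and update documents that set admin=True for one account."""
    # ASSUMPTION: "username" is the lookup field in the users collection.
    return {"username": username}, {"$set": {"admin": True}}


def promote_to_admin(conn_str, db_name, username):
    """Set admin=True on the matching user; return how many documents matched."""
    # Imported lazily so admin_update can be used without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(conn_str)
    try:
        flt, update = admin_update(username)
        result = client[db_name]["users"].update_one(flt, update)
        return result.matched_count
    finally:
        client.close()
```

For example, promote_to_admin("mongodb://127.0.0.1:27017/", "<DB_NAME>", "<USERNAME>"). A return value of 0 means no account matched, which usually points to a wrong lookup field or value.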

7. Containerization of Backend using Docker 🐋

Steps to Follow

a. Create a Docker Hub account and a public repository

Visit https://hub.docker.com/

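b. Update backend/Dockerfile

FROM python:3.9-slim-buster
# All following commands run inside /commentator in the image
WORKDIR /commentator
# Copy and install dependencies first so this layer is cached when only the source changes
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt
# Copy the rest of the backend source code
COPY . .
ENV FLASK_APP=app.py
# flask run listens on port 5000 by default
EXPOSE 5000/tcp
# Bind to 0.0.0.0 so the server is reachable from outside the container
CMD [ "python3", "-m" , "flask", "run", "--host=0.0.0.0"]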

c. Build the image and push it to Docker Hub (run these commands from the backend folder)

docker build . -t python-docker
docker tag python-docker <DOCKER_USERNAME>/<REPOSITORY_NAME>
docker login
docker push <DOCKER_USERNAME>/<REPOSITORY_NAME>

d. Run the container in detached mode (-d), publishing host port 5000 to container port 5000 (-p 5000:5000)

docker run -dp 5000:5000 <DOCKER_USERNAME>/<REPOSITORY_NAME>

e. List the running Docker containers and their IDs

docker ps

f. Stop a Docker container by its container ID

docker stop <CONTAINER_ID>
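If you do not want to copy container IDs by hand, docker ps can filter by the image a container was started from. The Python sketch below wraps docker ps -q --filter ancestor=<IMAGE> and docker stop with subprocess. The helper names are hypothetical, and it assumes the docker CLI is on your PATH.

```python
import subprocess


def ps_by_image_cmd(image):
    """argv that prints the IDs of running containers started from `image`."""
    return ["docker", "ps", "-q", "--filter", f"ancestor={image}"]


def stop_cmd(container_ids):
    """argv that stops the given containers."""
    return ["docker", "stop", *container_ids]


def stop_containers_for_image(image):
    """Stop every running container of `image`; return the IDs that were stopped."""
    out = subprocess.run(
        ps_by_image_cmd(image), capture_output=True, text=True, check=True
    ).stdout
    ids = out.split()
    if ids:
        subprocess.run(stop_cmd(ids), check=True)
    return ids
```

For example, stop_containers_for_image("<DOCKER_USERNAME>/<REPOSITORY_NAME>") stops the backend containers started in step d. Building argv lists instead of shell strings avoids quoting problems with image names.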

8. Contributors 👥

Rajvee Sheth https://www.linkedin.com/in/rajvee-sheth
Shubh Nisar https://shubh-nisar.github.io
Heenaben Prajapati https://www.linkedin.com/in/heena-prajapati-4977851a5/
Himanshu Beniwal https://himanshubeniwal.github.io/
Mayank Singh https://mayank4490.github.io/

9. Mentions 👀

Citation

If you use this framework in your research or work, please cite it as follows:

@inproceedings{sheth-etal-2024-commentator,
    title = "Commentator: A Code-mixed Multilingual Text Annotation Framework",
    author = "Sheth, Rajvee  and
      Nisar, Shubh  and
      Prajapati, Heenaben  and
      Beniwal, Himanshu  and
      Singh, Mayank",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-demo.11",
    pages = "101--109",
}
