Technologia-X/naive_bayes_text_classification_scratch

NAIVE BAYES FROM SCRATCH

Basics of Naive Bayes

Image of Naive Bayes

A Multinomial Naive Bayes model is trained and then used to predict sentiment. It also works for multi-class text classification, not just labels 1 and 0.
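A minimal sketch of how a multinomial Naive Bayes classifier can be written from scratch is shown below. It is not the repository's code: the class name `MultinomialNB`, its methods, the add-one (Laplace) smoothing, and the toy training data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Illustrative multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        # docs: list of token lists; labels: one class label per document.
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.word_counts = defaultdict(Counter)  # per-class token counts
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        class_docs = Counter(labels)
        n_docs = len(docs)
        # Log prior: fraction of training documents in each class.
        self.log_prior = {c: math.log(class_docs[c] / n_docs) for c in self.classes}
        # Total number of tokens seen in each class.
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, token, c):
        # Smoothing keeps an unseen (token, class) pair from zeroing the score.
        return math.log((self.word_counts[c][token] + 1) / (self.total[c] + len(self.vocab)))

    def predict(self, doc):
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for tok in doc:
                if tok in self.vocab:  # ignore tokens never seen in training
                    score += self._log_likelihood(tok, c)
            scores[c] = score
        # Works for any number of classes: pick the highest log-posterior.
        return max(scores, key=scores.get)


# Toy data (hypothetical): 1 and 0 are the two sentiment labels.
train_docs = [
    ["great", "movie"],
    ["loved", "great", "acting"],
    ["boring", "plot"],
    ["terrible", "boring", "movie"],
]
train_labels = [1, 1, 0, 0]
model = MultinomialNB().fit(train_docs, train_labels)
pred_pos = model.predict(["great", "acting"])
pred_neg = model.predict(["boring", "terrible"])
```

Scores are summed in log space so that long reviews do not underflow to zero when many small probabilities are multiplied.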

A tokenizer removes all HTML tags and special characters, as well as English stopwords.
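The cleaning steps described above could look roughly like the following sketch. The function name `tokenize`, the regular expressions, and the tiny `STOPWORDS` set are assumptions for illustration; the repository's tokenizer may differ, and a real run would use a full English stopword list.

```python
import re

# Tiny illustrative stopword subset (assumption, not the full English list).
STOPWORDS = {"a", "an", "and", "the", "is", "was", "it", "this", "of", "to", "in"}


def tokenize(text):
    text = re.sub(r"<[^>]+>", " ", text)      # drop HTML tags such as <br />
    text = re.sub(r"[^a-zA-Z\s]", " ", text)  # drop digits, punctuation, other special chars
    # Lowercase, split on whitespace, and filter out stopwords.
    return [tok for tok in text.lower().split() if tok not in STOPWORDS]


tokens = tokenize("This movie was <br /> GREAT!!! 10/10, loved it.")
print(tokens)  # ['movie', 'great', 'loved']
```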

Test Case

IMDB Movie Review Sentiment Analyzer

Naive Bayes is used to predict whether the sentiment of a review is positive or negative (a 2-class predictor).

Accuracy (86%):

                precision    recall  f1-score   support

           0       0.89      0.81      0.85      2481
           1       0.83      0.90      0.86      2519

    accuracy                           0.86      5000
   macro avg       0.86      0.86      0.86      5000
weighted avg       0.86      0.86      0.86      5000

Confusion matrix

Misclassifications on the test data: 460 wrong negative and 262 wrong positive, 722 of the 5000 test reviews in total.
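As a quick consistency check, the two error counts can be combined with the per-class support from the report to recover the overall accuracy. The variable names below are illustrative; the numbers are taken from the report and the counts above.

```python
support = 2481 + 2519  # test reviews per class, from the classification report
errors = 460 + 262     # misclassified reviews, from the counts above
accuracy = (support - errors) / support
print(f"{accuracy:.4f}")  # 0.8556
```

This rounds to the 0.86 accuracy shown in the report.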

Finally, the test data was used to check how well the model predicts sentiment on unseen data.

Output is here

How to use

  1. Make sure you have numpy, seaborn, pandas, scikit-learn, and wordcloud installed (if not, run pip install <libname> for each missing library)
  2. Clone the repo
  3. Download new data from Kaggle or elsewhere, or use the data provided in the data folder
  4. Run sentiment_predictor.py
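Before running the script, the dependency check in step 1 can be done with a short helper like the sketch below. The helper name `missing_packages` and the `REQUIRED` mapping are hypothetical and not part of the repository; note that scikit-learn is imported as `sklearn`.

```python
import importlib.util

# pip package name -> importable module name
REQUIRED = {
    "numpy": "numpy",
    "seaborn": "seaborn",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "wordcloud": "wordcloud",
}


def missing_packages(required):
    """Return the pip names whose modules cannot be found in this environment."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages(REQUIRED)
if missing:
    print("Install first: pip install " + " ".join(missing))
else:
    print("All dependencies found.")
```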

Contributor
