A Multinomial Naive Bayes classifier is trained and used to predict sentiment (it can also be used for multi-class text classification, not just classes 1 and 0).
A tokenizer removes all HTML tags and special characters, as well as English stopwords.
Naive Bayes is used to predict whether the sentiment of a review is positive or negative (a 2-class predictor).
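A minimal sketch of this kind of pipeline is shown below. The CSV name, column names, train/test split, and exact cleaning steps (regex-based tag/character stripping plus scikit-learn's built-in English stopword list) are assumptions for illustration and may differ from what sentiment_predictor.py actually does.

```python
# Sketch only: the file name, column names and cleaning details are assumptions,
# not necessarily what sentiment_predictor.py does.
import re

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

def clean(text):
    """Strip HTML tags and special characters; stopwords are removed by the vectorizer."""
    text = re.sub(r"<[^>]+>", " ", text)       # drop HTML tags
    text = re.sub(r"[^a-zA-Z\s]", " ", text)   # drop special characters
    return text.lower()

df = pd.read_csv("data/IMDB Dataset.csv")                 # hypothetical file name
X = df["review"].apply(clean)
y = (df["sentiment"] == "positive").astype(int)           # 1 = positive, 0 = negative

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

vectorizer = CountVectorizer(stop_words="english")        # drops English stopwords
model = MultinomialNB()
model.fit(vectorizer.fit_transform(X_train), y_train)

print(model.predict(vectorizer.transform(["What a great movie!"])))  # e.g. [1]
```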
Accuracy (86%):
|              | precision | recall | f1-score | support |
|--------------|-----------|--------|----------|---------|
| 0            | 0.89      | 0.81   | 0.85     | 2481    |
| 1            | 0.83      | 0.90   | 0.86     | 2519    |
| accuracy     |           |        | 0.86     | 5000    |
| macro avg    | 0.86      | 0.86   | 0.86     | 5000    |
| weighted avg | 0.86      | 0.86   | 0.86     | 5000    |
Confusion matrix: 460 negative reviews were wrongly predicted as positive and 262 positive reviews were wrongly predicted as negative on the test data.
Finally, the held-out test data was used to see how well the model predicts sentiment on unseen data.
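The numbers above can be reproduced with scikit-learn's reporting helpers; the sketch below reuses the model, vectorizer, and test split from the earlier sketch, so the exact values will depend on your data and split.

```python
# Evaluate on the held-out test set (model, vectorizer, X_test and y_test
# come from the training sketch above, not from the repo itself).
from sklearn.metrics import classification_report, confusion_matrix

y_pred = model.predict(vectorizer.transform(X_test))

print(classification_report(y_test, y_pred))  # per-class precision/recall/f1 plus accuracy
print(confusion_matrix(y_test, y_pred))       # rows = actual class, columns = predicted class
```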
The output is shown here:
- Make sure you have numpy, seaborn, pandas, scikit-learn, and wordcloud installed (if not, just `pip install <libname>`); seaborn and wordcloud are used for the visualizations sketched after this list
- Clone the repo
- Download new data from Kaggle (or anywhere else), or use the data provided in the `data` folder
- Run `sentiment_predictor.py`
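Since seaborn and wordcloud are listed as dependencies, the script presumably also produces plots; the sketch below is a hypothetical example of that kind of output, reusing names from the earlier sketches rather than the repo's actual plotting code.

```python
# Hypothetical visualization step; y_test, y_pred and X_train come from the
# earlier sketches, and the repo's actual plots may look different.
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix
from wordcloud import WordCloud

# Confusion matrix as a seaborn heatmap
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt="d",
            xticklabels=["negative", "positive"],
            yticklabels=["negative", "positive"])
plt.title("Confusion matrix")
plt.show()

# Word cloud of the (cleaned) training reviews; a subset keeps it quick
wc = WordCloud(width=800, height=400, background_color="white").generate(" ".join(X_train[:1000]))
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
```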