Twitter has become an important communication channel in times of emergency.
The ubiquity of smartphones enables people to announce an emergency they are observing in real time.
Because of this, more agencies (e.g. disaster relief organizations and news agencies) are interested in programmatically monitoring Twitter.
The complete approach to the project can be seen on Kaggle:
- Machine Learning approach: https://www.kaggle.com/mohitnirgulkar/disaster-tweets-classification-using-ml
- Deep Learning approach: https://www.kaggle.com/mohitnirgulkar/disaster-tweets-classification-using-deep-learning
- Exploratory Data Analysis
- EDA after Data Cleaning
- Data Preprocessing using NLP
- Machine Learning models for classifying Tweets data
- Deep Learning approach for classifying Tweets data
- Model Deployment
- Packages: Pandas, NumPy, Matplotlib, Plotly, WordCloud, TensorFlow, Scikit-learn, Keras, Keras Tuner, NLTK, etc.
- Dataset: https://www.kaggle.com/c/nlp-getting-started
- Word Embeddings: https://www.kaggle.com/danielwillgeorge/glove6b100dtxt
Visualising Target Variable of the Dataset
Visualising Length of Tweets
Visualising Average word lengths of Tweets
Visualising most common stop words in the text data
Visualising most common punctuations in the text data
We use the Python regex (re) library and NLTK lemmatization for data cleaning
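A minimal sketch of this cleaning step is shown below; the exact regex patterns and NLTK steps in the notebooks may differ, and the `df`, `text`, and `clean_text` names are assumptions.

```python
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords")
nltk.download("wordnet")

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def clean_tweet(text):
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # remove URLs
    text = re.sub(r"<.*?>", " ", text)                    # remove HTML tags
    text = re.sub(r"[^a-z\s]", " ", text)                 # keep letters only
    tokens = [lemmatizer.lemmatize(w) for w in text.split() if w not in stop_words]
    return " ".join(tokens)

# df is assumed to hold the Kaggle training data with a "text" column
df["clean_text"] = df["text"].apply(clean_tweet)
```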
Visualising words inside Real Disaster Tweets
Visualising words inside Fake Disaster Tweets
Visualising the top 10 N-grams for N = 1, 2, 3
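A small sketch of how such top-N-gram counts can be computed with scikit-learn; the helper name and the `clean_text` column are assumptions rather than the notebook's exact code.

```python
from sklearn.feature_extraction.text import CountVectorizer

def top_ngrams(corpus, n=1, top=10):
    # Hypothetical helper: count all n-grams and return the `top` most frequent ones
    vec = CountVectorizer(ngram_range=(n, n)).fit(corpus)
    counts = vec.transform(corpus).sum(axis=0)
    freqs = [(term, counts[0, idx]) for term, idx in vec.vocabulary_.items()]
    return sorted(freqs, key=lambda x: x[1], reverse=True)[:top]

for n in (1, 2, 3):
    print(f"Top {n}-grams:", top_ngrams(df["clean_text"], n=n))
```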
Data Preprocessing for ML models is done using two approaches (a short sketch follows the list below)
- Bag of Words using CountVectorizer
- Term Frequency-Inverse Document Frequency (TF-IDF) using TfidfVectorizer
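A sketch of the two vectorisation approaches, assuming the cleaned `clean_text` column and `target` label from the Kaggle data; the ngram_range and split parameters are illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    df["clean_text"], df["target"], test_size=0.2, random_state=42)

# Bag of Words features
bow = CountVectorizer(ngram_range=(1, 1))
X_train_bow = bow.fit_transform(X_train)
X_test_bow = bow.transform(X_test)

# TF-IDF features
tfidf = TfidfVectorizer(ngram_range=(1, 1))
X_train_tfidf = tfidf.fit_transform(X_train)
X_test_tfidf = tfidf.transform(X_test)
```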
Data Preprocessing for DL models is done using Tokenization
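A minimal sketch of this step with the Keras Tokenizer, reusing the train/test split from the ML sketch above; the maximum sequence length of 50 is an assumption.

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_LEN = 50  # illustrative maximum tweet length

tokenizer = Tokenizer(oov_token="<OOV>")
tokenizer.fit_on_texts(X_train)
vocab_size = len(tokenizer.word_index) + 1

train_seq = pad_sequences(tokenizer.texts_to_sequences(X_train),
                          maxlen=MAX_LEN, padding="post", truncating="post")
test_seq = pad_sequences(tokenizer.texts_to_sequences(X_test),
                         maxlen=MAX_LEN, padding="post", truncating="post")
```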
Machine Learning models are trained using different n-grams with both BoW and TF-IDF features, with a visualisation comparing their accuracy
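The sketch below illustrates the idea with a few classifiers and both feature sets, reusing the names from the vectorisation sketch above; the notebook covers more models and n-gram ranges than shown here.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Multinomial NB": MultinomialNB(),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}
features = {"BoW": (X_train_bow, X_test_bow), "TF-IDF": (X_train_tfidf, X_test_tfidf)}

# Train every model on every feature set and compare test accuracy
for feat_name, (Xtr, Xte) in features.items():
    for model_name, model in models.items():
        model.fit(Xtr, y_train)
        acc = accuracy_score(y_test, model.predict(Xte))
        print(f"{feat_name:7} {model_name:20} accuracy = {acc:.3f}")
```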
The best ML model trained, as we can see above, is the Voting Classifier, whose classification report and confusion matrix are shown below
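A sketch of how such a soft-voting ensemble and its evaluation could look; the base estimators and the TF-IDF feature set are assumptions, not the exact configuration used in the notebook.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.metrics import classification_report, confusion_matrix

voting = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", MultinomialNB()),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
    ],
    voting="soft",
)
voting.fit(X_train_tfidf, y_train)

pred = voting.predict(X_test_tfidf)
print(classification_report(y_test, pred))
print(confusion_matrix(y_test, pred))
```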
- Using GloVe word embeddings of embedding dimension = 100 to get the embedding matrix for our DL models
- For every DL model we create a model-building function and use Keras Tuner to tune it
- Finally we choose the Bidirectional LSTM for deployment (a combined sketch of these steps is given below)
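A combined sketch of the three bullets above: load glove.6B.100d.txt, build the embedding matrix, and tune a Bidirectional LSTM with Keras Tuner. The hyperparameter ranges, trial count, and file path are illustrative, and `vocab_size`, `MAX_LEN`, `tokenizer`, `train_seq`, and `y_train` come from the earlier sketches.

```python
import numpy as np
import tensorflow as tf
import keras_tuner as kt

EMBED_DIM = 100

# Build the embedding matrix from the 100-d GloVe vectors
embeddings_index = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        values = line.split()
        embeddings_index[values[0]] = np.asarray(values[1:], dtype="float32")

embedding_matrix = np.zeros((vocab_size, EMBED_DIM))
for word, i in tokenizer.word_index.items():
    vector = embeddings_index.get(word)
    if vector is not None:
        embedding_matrix[i] = vector

# Model-building function consumed by Keras Tuner
def build_bilstm(hp):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(MAX_LEN,)),
        tf.keras.layers.Embedding(
            vocab_size, EMBED_DIM,
            embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
            trainable=False),
        tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(hp.Int("units", 32, 128, step=32))),
        tf.keras.layers.Dropout(hp.Float("dropout", 0.2, 0.5, step=0.1)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Search the hyperparameter space and keep the best Bidirectional LSTM
tuner = kt.RandomSearch(build_bilstm, objective="val_accuracy",
                        max_trials=5, project_name="disaster_bilstm")
tuner.search(train_seq, y_train, validation_split=0.2, epochs=5)
best_model = tuner.get_best_models(num_models=1)[0]
```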
- The Bidirectional LSTM model obtained from the Deep Learning approach is used for deployment
- The micro web framework Flask is used to create the web app (a minimal sketch follows these bullets)
- Heroku is used to deploy our web app at https://disastertweetsdl.herokuapp.com/
- Deep Learning web app working
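For reference, a minimal sketch of what the Flask app (e.g. app.py) could look like; the model and tokenizer file names, the form field, and the template are assumptions, not the exact deployed code.

```python
import pickle

from flask import Flask, render_template, request
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences

app = Flask(__name__)
model = tf.keras.models.load_model("bilstm_model.h5")     # saved Bidirectional LSTM
tokenizer = pickle.load(open("tokenizer.pkl", "rb"))      # fitted Keras tokenizer
MAX_LEN = 50

@app.route("/", methods=["GET", "POST"])
def predict():
    result = None
    if request.method == "POST":
        # Clean-up is omitted here; preprocess the input the same way as the training data
        seq = pad_sequences(tokenizer.texts_to_sequences([request.form["tweet"]]),
                            maxlen=MAX_LEN, padding="post", truncating="post")
        result = "Real disaster" if model.predict(seq)[0][0] > 0.5 else "Not a disaster"
    return render_template("index.html", result=result)

if __name__ == "__main__":
    app.run(debug=True)
```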
- We can always use a larger dataset that covers almost every type of data, for both machine learning and deep learning
- We could use the best pretrained models, but they require a lot of computational power
- There are also various ways to increase model accuracy, such as k-fold cross-validation and data preprocessing techniques better than those used here
The data analysis and modelling were successfully completed, and the Deep Learning model was deployed on Heroku.
Please do ⭐ the repository if it helped you in any way.