This repository contains files from a client project aimed at performing sentiment analysis on an Instagram comments dataset and developing a Streamlit app for model deployment.

darren7753/sentiment_analysis_of_instagram_comments

Sentiment Analysis of Instagram Comments

This project was commissioned by a client on 2024-06-15. If you're interested in similar work, check out my freelance data analyst profile on Fastwork.

Objective

This project applies a linear-kernel SVM to sentiment analysis of an Instagram comment dataset provided by the client. A Streamlit app was also developed to deploy the model and let users interact with it.

Sentiment Analysis

The first step was text preprocessing, which ranged from removing unnecessary characters to lemmatization. A significant challenge was that the comments are written in Indonesian, often in slang, while most text processing libraries, such as spaCy, are optimized for English. To address this, I translated the comments into formal English with Meta's 70B-parameter Llama 3 model, which proved more effective than tools like Google Translate. Below is a sample of 3 rows from the dataset after preprocessing:

| username | sentimen | comment | translated_comment | case_folding | cleaning | lemmatization | remove_stopwords |
| --- | --- | --- | --- | --- | --- | --- | --- |
| pkk_desakisik | positif | terima kasih Bu Yani beserta Rombongan sudah datang di Desa Kisik | We would like to thank Mrs. Yani and her entourage for visiting Kisik Village. | we would like to thank mrs. yani and her entourage for visiting kisik village. | we would like to thank mrs yani and her entourage for visiting kisik village | like thank mrs yani entourage visit kisik village | like thank mrs yani entourage visit kisik village |
| abde_prastio | positif | alhamdulillah makin keren kabupatenku sekarang 😍 | Praise be to God, my regency is amazing now. | praise be to god, my regency is amazing now. | praise be to god my regency is amazing now | praise god regency amazing now | praise god regency amazing |
| maarif1515 | positif | Jalan poros kabupaten yang menghubungkan dari desa dampaan sampai dungus mohon untuk di tinjau | The highway connecting from Dampaan Village to Dungus, which passes through the district's axis, is requested to be reviewed. | the highway connecting from dampaan village to dungus, which passes through the district's axis, is requested to be reviewed. | the highway connecting from dampaan village to dungus which passes through the district s axis is requested to be reviewed | highway connect dampaan village dungus pass district axis request review | highway connect dampaan village dungus pass district axis request review |
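
The case folding, cleaning, and stopword removal columns can be approximated with a few lines of standard-library Python. This is only a minimal sketch and not the project's actual code: the function names and the tiny `STOPWORDS` set are hypothetical, and lemmatization is left out because it needs an NLP library such as spaCy.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only;
# a real run would use a full list from an NLP library.
STOPWORDS = {"we", "would", "to", "and", "her", "for", "be", "my", "is", "now", "the"}


def case_fold(text: str) -> str:
    """Lowercase the (already translated) comment."""
    return text.lower()


def clean(text: str) -> str:
    """Replace everything except lowercase letters and whitespace with a
    space, then collapse runs of whitespace. Expects case-folded input."""
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def remove_stopwords(text: str) -> str:
    """Drop tokens that appear in the stopword list."""
    return " ".join(tok for tok in text.split() if tok not in STOPWORDS)


if __name__ == "__main__":
    translated = "Praise be to God, my regency is amazing now."
    cleaned = clean(case_fold(translated))
    print(cleaned)                    # praise be to god my regency is amazing now
    print(remove_stopwords(cleaned))  # praise god regency amazing
```

Replacing punctuation with a space instead of deleting it is what turns "district's" into "district s", as seen in the cleaning column of the third sample row.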

The dataset was then split into training and testing sets with an 80:20 ratio. The pipeline used consisted of:

  • TF-IDF (Term Frequency-Inverse Document Frequency): Converts text into numerical features by weighting each word by how often it appears in a comment against how common it is across the entire dataset.
  • SMOTEENN (Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbors): Addresses class imbalance by oversampling minority classes with synthetic samples and then removing noisy samples from the data.
  • Linear Kernel SVM (Support Vector Machine): A machine learning model that finds the maximum-margin hyperplane separating the classes, effective for linearly separable data.

The model achieved an accuracy score of 78.24%.

Streamlit Web App

The Streamlit web app consists of 4 pages: Beranda (Home), Prediksi Data (Data Prediction), Prediksi Komentar (Comment Prediction), and Dataset, each offering unique features.

Beranda (Home)

[Screenshot: Beranda (Home) page]

This page serves as an introduction. It provides an overview of the web app on the left side and displays a pie chart of sentiment distribution in the dataset on the right side.

Prediksi Data (Data Prediction)

[Video: Prediksi Data (Data Prediction) page]

This page allows users to upload a new dataset. The trained model will then predict the sentiments of the uploaded data.
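
One way the batch prediction behind this page might work is sketched below. The upload format is not stated here, so the sketch assumes a UTF-8 CSV file; the helper name, the `comment` column default, and the `predicted_sentiment` output key are all hypothetical. `model` is any object with a scikit-learn style `predict` method.

```python
import csv
import io


def predict_uploaded_csv(model, raw_bytes: bytes, text_column: str = "comment"):
    """Parse uploaded CSV bytes and attach a predicted label to every row."""
    rows = list(csv.DictReader(io.StringIO(raw_bytes.decode("utf-8"))))
    if not rows:
        return []
    if text_column not in rows[0]:
        raise KeyError(f"column {text_column!r} not found in uploaded file")
    labels = model.predict([row[text_column] for row in rows])
    # Keep the original columns and add the prediction alongside them.
    return [
        {**row, "predicted_sentiment": str(label)}
        for row, label in zip(rows, labels)
    ]
```

In a Streamlit page the bytes could come from the object returned by `st.file_uploader`, via its `getvalue()` method. New comments would also need the same translation and preprocessing as the training data before prediction.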

Prediksi Komentar (Comment Prediction)

[Video: Prediksi Komentar (Comment Prediction) page]

This page allows users to type in text. The trained model will then predict the sentiment of the entered text.
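
A page like this could be wired up in Streamlit roughly as shown below. This is a hedged sketch, not the app's actual source: the function names, widget labels, and messages are hypothetical, and `model` is assumed to expose a scikit-learn style `predict` method that accepts raw text.

```python
from typing import Optional


def predict_comment(model, comment: str) -> Optional[str]:
    """Return the predicted label for one comment, or None if it is blank."""
    text = comment.strip()
    if not text:
        return None
    return str(model.predict([text])[0])


def render_comment_page(model) -> None:
    """Draw the page: a text box, a button, and the prediction result."""
    # Imported here so predict_comment() can be used and tested without Streamlit.
    import streamlit as st

    st.title("Prediksi Komentar")
    comment = st.text_area("Comment")
    if st.button("Predict"):
        label = predict_comment(model, comment)
        if label is None:
            st.warning("Please enter a comment first.")
        else:
            st.success(f"Predicted sentiment: {label}")
```

Keeping the prediction logic in a plain function separates it from the UI code, so it can be checked without starting the app.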

Dataset

[Video: Dataset page]

This page allows users to upload a new training dataset. Once the dataset is uploaded, the model will automatically retrain and update based on the new data.
