- About The Project
- About the Data
- Technology Stack
- Getting Started
- Contributing
- License
- Contact
- Acknowledgements
CineReviewX is an IMDB movie sentiment analysis project that classifies movie reviews into two categories: positive or negative. Using deep learning techniques, this project processes textual movie reviews to predict the sentiment behind them. By leveraging Recurrent Neural Network (RNN) models, it analyzes large datasets of reviews, providing insights into how audiences emotionally react to films.
- Data Collection: The project uses a dataset of movie reviews from IMDB, containing both positive and negative feedback from users.
- Data Preprocessing: The raw text data is tokenized and converted into vectors using One-Hot Encoding, transforming the text into a numerical format suitable for analysis.
- Model Building: The model is constructed using the following layers:
  - Embedding Layer: Converts words into dense vector representations.
  - SimpleRNN Layers: Capture patterns in the sequence of words to understand sentiment.
  - Dense Output Layer: Uses a sigmoid activation function to output a probability value, classifying the sentiment as either positive (1) or negative (0).
- Model Training: The model is trained on the preprocessed dataset, using an optimizer to minimize loss and improve accuracy. An early stopping mechanism monitors validation loss and halts training when performance plateaus.
- Model Evaluation: After training, the model is evaluated based on metrics such as accuracy, precision, recall, and F1-score to assess its performance on unseen data.
- Deployment: A Streamlit web application is developed to allow users to input movie reviews and get real-time sentiment predictions.
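To make the Model Building step more concrete, here is a minimal pure-Python sketch of what an Embedding, SimpleRNN, and sigmoid Dense stack computes for one review. It is not the project's Keras code: the function name, the tiny layer sizes, the random weights, and the 0.5 decision threshold are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def rnn_sentiment_probability(word_ids, embedding, w_in, w_rec, bias, w_out, b_out):
    """Embedding lookup -> simple RNN recurrence -> single sigmoid output."""
    hidden_size = len(bias)
    h = [0.0] * hidden_size  # initial hidden state
    for word_id in word_ids:
        x = embedding[word_id]  # embedding: dense vector for this word index
        # Simple RNN step: h_t = tanh(W_in x_t + W_rec h_{t-1} + b)
        h = [
            math.tanh(
                sum(w_in[j][i] * x[i] for i in range(len(x)))
                + sum(w_rec[j][k] * h[k] for k in range(hidden_size))
                + bias[j]
            )
            for j in range(hidden_size)
        ]
    # Dense output: one sigmoid unit applied to the final hidden state
    return sigmoid(sum(w_out[j] * h[j] for j in range(hidden_size)) + b_out)


# Hypothetical sizes and untrained random weights, for illustration only.
rng = random.Random(0)
vocab_size, embed_dim, hidden_size = 10, 4, 3


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


embedding = rand_matrix(vocab_size, embed_dim)
w_in = rand_matrix(hidden_size, embed_dim)
w_rec = rand_matrix(hidden_size, hidden_size)
bias = [0.0] * hidden_size
w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]

probability = rnn_sentiment_probability(
    [3, 1, 4, 1, 5], embedding, w_in, w_rec, bias, w_out, 0.0
)
label = 1 if probability >= 0.5 else 0  # assumed threshold: 1 = positive, 0 = negative
```

With untrained weights the probability is meaningless; training adjusts the embedding and weight matrices so that the output moves toward 1 for positive reviews and toward 0 for negative ones.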
- Sentiment Classification: Classifies movie reviews as Positive or Negative.
- Data Preprocessing: Prepares and cleans raw text for machine learning models.
- Text Vectorization: Converts raw text into numerical format using One-Hot Encoding.
- Modeling: Implements a Simple RNN for sentiment analysis.
- Evaluation Metrics: Measures model performance using accuracy.
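The Text Vectorization feature can be illustrated with a small standard-library sketch. It assumes the encoding maps each word to an integer index by hashing (which is what Keras-style `one_hot` helpers do, rather than building literal one-hot vectors) and then left-pads every review to a fixed length. The function names, the vocabulary size of 500, and the sequence length are hypothetical and not taken from the project.

```python
import hashlib


def word_to_index(word: str, vocab_size: int) -> int:
    """Map a word to a stable index in 1..vocab_size-1 (0 is reserved for padding)."""
    digest = hashlib.md5(word.encode("utf-8")).hexdigest()
    return int(digest, 16) % (vocab_size - 1) + 1


def encode_review(text: str, vocab_size: int) -> list[int]:
    """Turn a whitespace-tokenized review into a list of word indices."""
    return [word_to_index(word, vocab_size) for word in text.lower().split()]


def pad_sequence(ids: list[int], max_len: int, pad_value: int = 0) -> list[int]:
    """Keep the last max_len indices and left-pad shorter sequences (max_len >= 1)."""
    ids = ids[-max_len:]
    return [pad_value] * (max_len - len(ids)) + ids


encoded = encode_review("a great great movie", vocab_size=500)
padded = pad_sequence(encoded, max_len=6)
```

Because the mapping is a hash, two different words can collide on the same index; a larger vocabulary size makes collisions less likely.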
The dataset contains a collection of IMDB movie reviews and their corresponding sentiment labels (positive or negative). It includes user reviews, movie ratings, and associated metadata for each review. The dataset is preprocessed to remove irrelevant information such as stop words, HTML tags, and special characters.
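As a rough illustration of the cleaning described above, the following sketch strips HTML tags, special characters, and stop words using only the standard library. The `clean_review` name and the short stop-word list are assumptions for the example; the project may use a different list or library.

```python
import html
import re

# Hypothetical, deliberately short stop-word list. Negations such as "not"
# are left out of it because they often carry sentiment.
STOP_WORDS = frozenset(
    {"a", "an", "the", "and", "or", "is", "was", "it", "this", "that", "of", "to", "in"}
)


def clean_review(text: str) -> str:
    """Remove HTML tags, non-letter characters, and stop words from a review."""
    text = html.unescape(text)                        # "&amp;" -> "&"
    text = re.sub(r"<[^>]+>", " ", text)              # drop tags such as <br />
    text = re.sub(r"[^a-z\s]", " ", text.lower())     # keep letters and whitespace only
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


cleaned = clean_review("This movie was <br /><br />GREAT!!! Loved it &amp; the cast.")
# cleaned == "movie great loved cast"
```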
The target variable for this project is Sentiment (either 1 for positive or 0 for negative), which is derived from the movie reviews. The model's goal is to predict the sentiment label based on the review text.
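Because the target is a binary label, the evaluation metrics mentioned earlier reduce to counts of true and false positives and negatives. The sketch below shows one way to compute them from predicted probabilities; the `binary_metrics` name, the 0.5 threshold, and the sample values are illustrative assumptions, not project results.

```python
def binary_metrics(y_true, probabilities, threshold=0.5):
    """Threshold probabilities into 0/1 labels and compute basic metrics."""
    y_pred = [1 if p >= threshold else 0 for p in probabilities]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels and probabilities, only to exercise the function.
metrics = binary_metrics(
    y_true=[1, 0, 1, 1, 0, 0],
    probabilities=[0.9, 0.2, 0.4, 0.8, 0.6, 0.1],
)
```

In this made-up sample there are two true positives, two true negatives, one false positive, and one false negative, so accuracy, precision, recall, and F1 all come out to 2/3.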
The dataset used in this project is publicly available on Kaggle as the IMDB Dataset. You can download it from there for local use or to train the models.
- Python
- Streamlit
- TensorFlow
- TensorBoard
- Scikit-learn
- Keras
- Pandas
- NumPy
- Pickle
To get started with this project locally, you’ll need Python 3.10+ installed on your machine along with some necessary Python packages. You can either clone the repository and install dependencies manually or use Docker for an isolated environment.
- Clone the repository:
  - Open your terminal or command prompt.
  - Navigate to the directory where you want to install the project.
  - Run the following command to clone the GitHub repository:
    `git clone https://github.com/shubhamprajapati7748/CineReviewX.git`
- Create a Virtual Environment (Optional)
  - It's a good practice to create a virtual environment to manage project dependencies. Run the following command:
    `conda create -p <Environment_Name> python==<python version> -y`
- Activate the Virtual Environment (Optional)
  - Activate the virtual environment based on your operating system:
    `conda activate <Environment_Name>/`
- Install Dependencies
  - Navigate to the project directory:
    `cd [project_directory]`
  - Run the following command to install project dependencies:
    `pip install -r requirements.txt`
- Run the Project
  `streamlit run app.py`
- Access the Project
  - Visit http://localhost:8501 in your browser to use the app.
We welcome contributions to improve this project! Whether you are fixing bugs, adding features, or improving documentation, feel free to fork the repository and submit a pull request.
- Fork the repo.
- Create a new branch (`git checkout -b feature-name`).
- Make your changes.
- Commit your changes (`git commit -am 'Add feature'`).
- Push to your branch (`git push origin feature-name`).
- Create a new Pull Request.
Distributed under the MIT License. See LICENSE for more information.
Shubham Prajapati - @[email protected]
- TensorFlow: For providing the machine learning framework to train the predictive model.
- Streamlit: For creating the interactive web application.
- Scikit-learn: For preprocessing utilities such as scaling and encoding.
- Kaggle: For hosting the IMDB dataset used in this project.