Skip to content

alihassanml/fake-news-detection

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 

Repository files navigation

Fake News Detection Project

This repository contains code for a machine learning project focused on detecting fake news. The project utilizes Python and various libraries such as Pandas, NumPy, Matplotlib, NLTK, and Scikit-learn.

Kaggle : https://www.kaggle.com/code/alihassanml/fake-news-detection

Overview

Fake news has become a significant issue in today's information age, where misinformation can spread rapidly through various media channels. This project aims to build a machine learning model that can automatically detect fake news articles based on their content.

##Libraries Used:##

  • pandas: Used for data manipulation and analysis.
  • numpy: Provides support for mathematical functions and operations on arrays.
  • matplotlib: A plotting library used for data visualization.
  • re: Offers support for regular expressions, helpful for text preprocessing.
  • seaborn: Works alongside matplotlib for enhanced data visualization.
  • nltk: Natural Language Toolkit, used for text processing tasks such as stopword removal and stemming.
  • scikit-learn (sklearn):
    • TfidfTransformer and TfidfVectorizer: Used for feature extraction from text data using TF-IDF (Term Frequency-Inverse Document Frequency).
    • train_test_split: For splitting the dataset into training and testing sets.
    • LogisticRegression: Implements logistic regression, a commonly used classification algorithm.
    • accuracy_score: Calculates the accuracy of the model.

Dataset

The dataset used in this project contains a collection of news articles labeled as either fake or real. It includes various features such as the title, text, and other metadata.

Approach

  1. Data Preprocessing: Text data is cleaned and preprocessed using techniques such as removing stopwords, stemming, and vectorization.
  2. Feature Engineering: Text features are extracted using TF-IDF (Term Frequency-Inverse Document Frequency) vectorization.
  3. Model Training: The preprocessed data is split into training and testing sets. A logistic regression model is trained on the training data.
  4. Model Evaluation: The trained model is evaluated on the test set using accuracy as the performance metric.

Project Workflow:

  1. Data Loading: The project likely starts with loading a dataset containing both real and fake news articles.
  2. Data Preprocessing:
    • Text Cleaning: Removing unnecessary characters, special symbols, and URLs.
    • Tokenization: Splitting the text into individual words or tokens.
    • Stopword Removal: Eliminating common words that do not carry significant meaning.
    • Stemming: Reducing words to their root form to normalize the text.
  3. Feature Extraction: Using TF-IDF to convert text data into numerical vectors.
  4. Model Training: Splitting the data into training and testing sets, then training a logistic regression model on the training data.
  5. Model Evaluation: Evaluating the trained model's performance using accuracy metrics on the testing set.
  6. Deployment: After successful evaluation, the model can be deployed to predict fake news on new data.

Repository Structure

  • data/: Contains the dataset used in the project.
  • notebooks/: Jupyter notebooks containing code for data preprocessing, model training, and evaluation.
  • scripts/: Python scripts for various functions and utilities used in the project.
  • README.md: This file, providing an overview of the project.

Usage

To run the project:

  1. Clone this repository to your local machine:
    git clone https://github.com/alihassanml/fake-news-detection.git
  2. Navigate to the project directory:
    cd fake-news-detection
  3. Install the required dependencies:
    pip install -r requirements.txt
  4. Open and run the Jupyter notebooks in the notebooks/ directory to explore the project.

Feel free to contribute to this project by opening issues or pull requests.


Replace [Ali Hassan](https://github.com/lihassanml) with your GitHub profile link. This README provides an overview of your project, its structure, and instructions for usage and contribution. Feel free to customize it further according to your project's specifics.

About

Fake News Detection Machine Learning Project

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published