Naive Bayes Spam Detection from Scratch

About

This project is home to a Naive Bayes classifier implemented from scratch in Go. In the dataset this program was designed for, each instance consists of 12 binary features followed by a binary class label. Each feature corresponds to an indicator or attribute collected from spam and legitimate emails.
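
To make the data layout concrete, here is a minimal sketch of reading one instance. It assumes each line holds 13 whitespace-separated 0/1 values (12 features, then the label); the actual file layout may differ, and `parseRow` is a hypothetical helper, not a function from this project.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const numFeatures = 12

// parseRow splits one whitespace-separated line into 12 binary features
// and a trailing binary class label. The line format is an assumption.
func parseRow(line string) ([]int, int, error) {
	fields := strings.Fields(line)
	if len(fields) != numFeatures+1 {
		return nil, 0, fmt.Errorf("expected %d values, got %d", numFeatures+1, len(fields))
	}
	values := make([]int, len(fields))
	for i, f := range fields {
		v, err := strconv.Atoi(f)
		if err != nil {
			return nil, 0, fmt.Errorf("field %d: %w", i, err)
		}
		if v != 0 && v != 1 {
			return nil, 0, fmt.Errorf("field %d: value %d is not binary", i, v)
		}
		values[i] = v
	}
	return values[:numFeatures], values[numFeatures], nil
}

func main() {
	features, label, err := parseRow("1 0 0 1 0 0 0 1 0 0 1 0 1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("features:", features, "label:", label)
}
```

Rejecting non-binary values early keeps malformed rows from silently skewing the counts used later in training.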

Bayes' Theorem

Bayes' theorem concerns the probability of a hypothesis H conditional on a given body of data E. That conditional probability is the ratio of the unconditional probability of the conjunction of the hypothesis with the data to the unconditional probability of the data alone:

  P(H | E) = P(H & E) / P(E)

provided that both terms of this ratio exist and P(E) > 0. Writing P(H & E) as P(E | H) P(H) gives the usual form of the theorem: P(H | E) = P(E | H) P(H) / P(E).
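
As a small worked example, the sketch below computes P(H | E) = P(E | H) P(H) / P(E), expanding P(E) with the law of total probability. The function name `posterior` and the numbers are made up for illustration and do not come from the project's datasets.

```go
package main

import "fmt"

// posterior applies Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E),
// where P(E) = P(E|H) * P(H) + P(E|not H) * (1 - P(H)).
func posterior(priorH, eGivenH, eGivenNotH float64) float64 {
	evidence := eGivenH*priorH + eGivenNotH*(1-priorH)
	if evidence == 0 {
		// P(E) = 0: the conditional probability is undefined.
		// Returning 0 is a placeholder choice for this sketch.
		return 0
	}
	return eGivenH * priorH / evidence
}

func main() {
	// Illustrative values: 40% of emails are spam, a feature is present
	// in 50% of spam and in 10% of legitimate emails.
	p := posterior(0.4, 0.5, 0.1)
	fmt.Printf("P(spam | feature present) = %.3f\n", p)
}
```

With these numbers, P(E) = 0.5 * 0.4 + 0.1 * 0.6 = 0.26, so the posterior is 0.2 / 0.26, or roughly 0.769.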

Application

Bayes' theorem is used to calculate the conditional probability of an input belonging to a specific class based on prior knowledge. In this case, the program takes two command-line arguments: the paths to a labelled and an unlabelled dataset. First, it trains the model on the labelled dataset, estimating the conditional probabilities that relate each feature to the spam and non-spam classes. It then uses these probabilities to label each instance in the unlabelled set before displaying its findings.
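
The train-then-label flow described above can be sketched as follows. This is not the project's code: it swaps in simple per-feature frequency estimates with Laplace (add-one) smoothing, which suits binary features, and it uses a toy dataset with 3 features instead of 12. The names `model`, `train` and `classify` are hypothetical, and `train` assumes a non-empty training set.

```go
package main

import (
	"fmt"
	"math"
)

// model holds, per class (0 = non-spam, 1 = spam), the log prior and the
// smoothed probability that each feature equals 1.
type model struct {
	logPrior [2]float64
	pOne     [2][]float64
}

// train estimates the model from labelled rows using add-one smoothing.
// It assumes features is non-empty and every label is 0 or 1.
func train(features [][]int, labels []int) model {
	n := len(features[0])
	var classCount [2]float64
	var ones [2][]float64
	ones[0], ones[1] = make([]float64, n), make([]float64, n)
	for i, row := range features {
		c := labels[i]
		classCount[c]++
		for j, v := range row {
			ones[c][j] += float64(v)
		}
	}
	var m model
	total := float64(len(features))
	for c := 0; c < 2; c++ {
		m.logPrior[c] = math.Log((classCount[c] + 1) / (total + 2))
		m.pOne[c] = make([]float64, n)
		for j := range ones[c] {
			m.pOne[c][j] = (ones[c][j] + 1) / (classCount[c] + 2)
		}
	}
	return m
}

// classify returns the class with the higher log posterior score.
func (m model) classify(row []int) int {
	var score [2]float64
	for c := 0; c < 2; c++ {
		score[c] = m.logPrior[c]
		for j, v := range row {
			p := m.pOne[c][j]
			if v == 0 {
				p = 1 - p
			}
			score[c] += math.Log(p)
		}
	}
	if score[1] > score[0] {
		return 1
	}
	return 0
}

func main() {
	features := [][]int{{1, 1, 0}, {1, 0, 0}, {0, 0, 1}, {0, 1, 1}}
	labels := []int{1, 1, 0, 0}
	m := train(features, labels)
	fmt.Println(m.classify([]int{1, 0, 0}), m.classify([]int{0, 0, 1}))
}
```

Summing log probabilities instead of multiplying raw probabilities avoids floating-point underflow as the number of features grows.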

This project was completed mainly as a way to learn and practice Go. It was not intended for practical or varied use, and some functions are designed around the specific datasets.

Tested on Mac and Linux.

Features

Five-Step Process

1. Separate Data By Classes.
2. Summarize Dataset.
3. Summarize Data By Classes.
4. Run Gaussian Probability Density Function.
5. Estimate Class Probabilities.
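
The five steps can be sketched roughly as below. This is an illustration under stated assumptions, not the code in main.go: all function names are hypothetical, the toy data has 2 features instead of 12, the standard deviation is the population form, and it is floored at 0.1 so that a feature that is constant within a class does not cause a division by zero.

```go
package main

import (
	"fmt"
	"math"
)

// summary holds the mean and standard deviation of one feature.
type summary struct{ mean, stdev float64 }

// separateByClass groups rows by their label (step 1).
func separateByClass(rows [][]float64, labels []int) map[int][][]float64 {
	groups := make(map[int][][]float64)
	for i, row := range rows {
		groups[labels[i]] = append(groups[labels[i]], row)
	}
	return groups
}

// summarize computes the per-feature mean and population standard
// deviation of a non-empty set of rows (steps 2 and 3, when applied
// to each class group).
func summarize(rows [][]float64) []summary {
	n := float64(len(rows))
	out := make([]summary, len(rows[0]))
	for j := range out {
		var sum, sq float64
		for _, r := range rows {
			sum += r[j]
		}
		mean := sum / n
		for _, r := range rows {
			sq += (r[j] - mean) * (r[j] - mean)
		}
		// The 0.1 floor is an assumption of this sketch.
		out[j] = summary{mean, math.Max(math.Sqrt(sq/n), 0.1)}
	}
	return out
}

// gaussianPDF evaluates the normal density at x (step 4).
func gaussianPDF(x, mean, stdev float64) float64 {
	d := x - mean
	return math.Exp(-d*d/(2*stdev*stdev)) / (math.Sqrt(2*math.Pi) * stdev)
}

// predict scores each class as prior times the product of per-feature
// densities and returns the best-scoring class (step 5).
func predict(rows [][]float64, labels []int, x []float64) int {
	groups := separateByClass(rows, labels)
	best, bestScore := -1, -1.0
	for _, c := range []int{0, 1} {
		group := groups[c]
		if len(group) == 0 {
			continue
		}
		score := float64(len(group)) / float64(len(rows))
		for j, s := range summarize(group) {
			score *= gaussianPDF(x[j], s.mean, s.stdev)
		}
		if score > bestScore {
			best, bestScore = c, score
		}
	}
	return best
}

func main() {
	rows := [][]float64{{1, 1}, {1, 0}, {0, 1}, {0, 0}, {0, 1}, {1, 0}}
	labels := []int{1, 1, 1, 0, 0, 0}
	fmt.Println(predict(rows, labels, []float64{1, 1}), predict(rows, labels, []float64{0, 0}))
}
```

The scores are unnormalized: dividing each by their sum would give class probabilities, but the comparison between classes is unaffected.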

Usage

Build the program from source using

  go build main.go

To run the program, first make sure it is marked as executable with

  chmod +x main

Then run the program with

  ./main <path-to>spamLabelled.dat <path-to>spamUnlabelled.dat
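
For readers curious how a Go program might validate these two arguments, here is a minimal sketch. `parseArgs` is a hypothetical helper and the messages are illustrative; the project's main.go may handle its arguments differently.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// parseArgs expects the program name followed by exactly two dataset
// paths: the labelled file first, then the unlabelled file.
func parseArgs(args []string) (labelled, unlabelled string, err error) {
	if len(args) != 3 {
		return "", "", errors.New("expected two arguments: a labelled file and an unlabelled file")
	}
	return args[1], args[2], nil
}

func main() {
	labelled, unlabelled, err := parseArgs(os.Args)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("training on", labelled, "then labelling", unlabelled)
}
```

Taking the argument slice as a parameter, instead of reading `os.Args` inside the helper, keeps the check easy to test.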

Contributing

Feel free to fork or make contributions. Any feedback is always welcome!
