Comparison and evaluation of machine learning models (kNN, Decision Tree, Random Forest, XGBoost) for analysis of the CICIDS2017 Dataset for an anomaly-based IDS

Introduction

Network security is a critical issue, as cyberattacks can cause severe damage to individuals, organizations, and governments. In the era of big data, improving detection mechanisms has become increasingly essential. There are two main approaches to detection: signature-based and anomaly-based. Traditional signature-based IDSs rely on predefined rules or patterns to detect known attacks, but they are ineffective against novel or sophisticated attacks.

Anomaly-based IDSs overcome this limitation by comparing network traffic against a baseline of normal behavior and flagging any deviations as potential attacks. Machine learning models can learn this baseline from data, which helps them identify unknown or zero-day attacks.
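The baseline-and-deviation idea can be illustrated with a minimal sketch. This is not the method used in this project (which trains supervised classifiers); it is a simple z-score check on one hypothetical feature, and the feature values, function names and the threshold of 3 standard deviations are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def fit_baseline(normal_values):
    """Summarise benign traffic by its mean and sample standard deviation."""
    return mean(normal_values), stdev(normal_values)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        # Degenerate baseline: anything different from the constant is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical packets-per-flow counts observed during normal operation.
normal_packets_per_flow = [10, 12, 11, 9, 10, 13, 11, 10, 12, 11]
baseline = fit_baseline(normal_packets_per_flow)

print(is_anomalous(11, baseline))   # a typical flow
print(is_anomalous(500, baseline))  # a flow far outside the baseline
```

A real IDS would look at many features at once, which is where learned models become useful.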

This project compares supervised machine learning (ML) models, namely k-Nearest Neighbour (kNN), Decision Tree, Random Forest and XGBoost, for use in an anomaly-based intrusion detection system (AIDS). It evaluates their effectiveness using their accuracy, F1, precision and recall scores.
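For reference, the four scores can be computed from the counts of true/false positives and negatives. The sketch below assumes a binary setting where `1` means attack and `0` means benign; the helper name `binary_metrics` and the sample labels are hypothetical and are not taken from the notebook.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical ground truth and predictions (1 = attack, 0 = benign).
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
y_pred = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

Precision penalises false alarms and recall penalises missed attacks. On imbalanced traffic, accuracy alone can look high even when many attacks are missed, so reporting all four gives a fuller picture.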

Dataset

The CICIDS2017 dataset is a realistic dataset for network traffic analysis and intrusion detection. It contains benign traffic and the most up-to-date common attacks at the time of its release, and so resembles real-world data (PCAPs).

The composition, or features, of the dataset is shown in the diagram below:

It also includes the results of the network traffic analysis produced by CICFlowMeter, as flows labeled by timestamp, source and destination IPs, source and destination ports, protocol and attack (CSV files).
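A labeled flow CSV of this kind could be read and reduced to a binary benign/attack target roughly as follows. This is a sketch under assumptions: the three-row excerpt is made up, the column names (`Destination Port`, `Flow Duration`, `Label`) and the `BENIGN` label value are assumed rather than confirmed by this README, and header names are stripped because some copies of the files are reported to carry leading spaces in them.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt; real files contain many more columns and rows.
SAMPLE_CSV = """ Destination Port, Flow Duration, Label
80,1200,BENIGN
80,35,DDoS
22,500,BENIGN
"""


def load_flows(handle, label_column="Label", benign_value="BENIGN"):
    """Read flows, normalise header whitespace, and add a binary `is_attack` field."""
    flows = []
    for row in csv.DictReader(handle):
        # Strip stray whitespace around both column names and values.
        clean = {key.strip(): value.strip() for key, value in row.items()}
        clean["is_attack"] = 0 if clean[label_column] == benign_value else 1
        flows.append(clean)
    return flows


flows = load_flows(io.StringIO(SAMPLE_CSV))
print(Counter(flow["Label"] for flow in flows))
print([flow["is_attack"] for flow in flows])
```

To read a real file, pass an open file handle (opened with `newline=""`) in place of the `io.StringIO` object.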

Results

Training and Test Accuracies:
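Reporting both training and test accuracy matters because a gap between them suggests the model fits the training data better than it generalises. The sketch below shows this with a tiny pure-Python kNN classifier on made-up two-feature points. It does not reproduce the notebook's code or results, and the data, the helper names and `k=3` are all assumptions.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the majority label among the k training points closest to x."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, X, y, k=3):
    """Fraction of points in (X, y) that kNN classifies correctly."""
    correct = sum(knn_predict(train_X, train_y, x, k) == label
                  for x, label in zip(X, y))
    return correct / len(y)


# Hypothetical flows with two features; 0 = benign, 1 = attack.
train_X = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1),
           (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [(1.1, 1.0), (5.1, 5.0), (3.2, 3.2)]
test_y = [0, 1, 0]

train_acc = accuracy(train_X, train_y, train_X, train_y)
test_acc = accuracy(train_X, train_y, test_X, test_y)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

Here the borderline test point `(3.2, 3.2)` is misclassified, so test accuracy falls below training accuracy.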

`F1`, `Recall` & `Precision` scores of:

XGBoost

kNN

Random Forest (RF)

Decision Tree (DT)

