Linear SVM Using Squared Hinge Loss

This is an implementation of a Linear SVM that uses a squared hinge loss. This algorithm was coded using Python. This is my submission for the polished code release for DATA 558 - Statistical Machine Learning. The code was developed by Tejas Hosangadi.

The Optimization Problem

The Linear SVM that Uses Squared Hinge Loss writes out as shown below:

The above equation is differentiable and convex, hence we can apply gradient descent. This implementation of the SVM uses the fast gradient algorithm, which improves the speed and accuracy of the descent. As is expected, the SVM is a binary classifier, and can be used to perform classification of data that exists in two classes. If more than two classes exist, multi class classification strategies like one versus one (1v1) and one versus rest (1vr) can be used.

The algorithm uses backtracking to determine the best step size. In addition, the algorithm stops when the maximum number of iterations are reached.

Directory Structure

The structure of the directory is shown below:

|- Linear-SVM-Using-Squared-Hinge-Loss\
	|- linearsvm\
		|- linearsvm.py
		|- real_data_demo.py
		|- simulated_data_demo.py
		|- sklearn_compare.py
	|- LICENSE
	|- README.md
	|- equation.jpeg

linearsvm.py: This contains all the coded methods required by the project. If you want to understand how I coded my methods, you can have a look at the file. It is not explicitly executable, but contains methods used by the other modules.
real_data_demo.py: This module contains the code that demonstrates the SVM for real data, in this case, spam data from The Elements of Statistical Learning.
simulated_data_demo.py: This module contains the code that demonstrates the SVM for synthetic/simulated data.
sklearn_compare.py: This module contains code that checks the performance of our SVM compared to the one that is implemented by sklearn.

Dependencies

This code uses a number of different dependencies. First and foremost, Python 3 is required. Most Unix based systems (including Mac), include Python installed. It's always ideal to check to make sure the correct version is installed.

The Python based packages that are required are listed below

Numpy
Scipy
Scikit-learn
Pandas

These can be installed using the pip command or the conda command if an Anaconda based environment is being used.

Data

Two types of data are used in the demo files, one is a simulated dataset, and the other is the spam dataset from The Elements of Statistical Learning. All data is downloaded on execution, and you are not required to manually download or install any data.

Instructions

First and foremost, clone or download the repository to your local system. Make sure all the dependencies are installed, and work as they are supposed to. Finally, execute the demo and sklearn comparison modules as shown below from your terminal:

$>python name_of_the_module.py

On execution, you will get some textual output in the terminal window, and a series of plots will be generated. Plots are generated in singular. What that means is, once a plot is generated and you have viewed it, you need to close the plot for the next one to be generated and displayed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Linear SVM Using Squared Hinge Loss

The Optimization Problem

Directory Structure

Dependencies

Data

Instructions

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
linearsvm		linearsvm
LICENSE		LICENSE
README.md		README.md
equation.jpeg		equation.jpeg

License

tejasmhos/Linear-SVM-Using-Squared-Hinge-Loss

Folders and files

Latest commit

History

Repository files navigation

Linear SVM Using Squared Hinge Loss

The Optimization Problem

Directory Structure

Dependencies

Data

Instructions

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages