ML-based-SAST

ML-based-SAST is a tool that uses program slicing and a BLSTM network to reduce false positives in taint analysis.
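
The idea of filtering taint-analysis reports with a classifier can be sketched roughly as below. This is an illustration only, not the project's code: the `Finding` shape, the `triage` function, the 0.5 threshold, and especially `toy_score` (a stand-in for a trained BLSTM) are all hypothetical names introduced here.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical shape of one reported taint flow: an id plus the text of its slice.
Finding = Dict[str, str]


def triage(
    findings: List[Finding],
    score: Callable[[str], float],
    threshold: float = 0.5,
) -> Tuple[List[Finding], List[Finding]]:
    """Split findings into (kept, suppressed) by predicted true-positive probability."""
    kept: List[Finding] = []
    suppressed: List[Finding] = []
    for finding in findings:
        probability = score(finding["slice"])
        if probability >= threshold:
            kept.append(finding)
        else:
            suppressed.append(finding)
    return kept, suppressed


def toy_score(slice_text: str) -> float:
    """Stand-in for a trained model (assumption): treats escaped data as likely safe."""
    return 0.1 if "escapeHtml" in slice_text else 0.9


if __name__ == "__main__":
    reports = [
        {"id": "a", "slice": "x = request.getParameter(); out.print(x)"},
        {"id": "b", "slice": "x = escapeHtml(request.getParameter()); out.print(x)"},
    ]
    kept, suppressed = triage(reports, toy_score)
    print("kept:", [f["id"] for f in kept])
    print("suppressed:", [f["id"] for f in suppressed])
```

In the real tool the score would come from the prediction server rather than a keyword check.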

Project Structure

.
├── README.md
├── ml # Machine Learning Module
│   ├── _theano # BLSTM implementation in theano
│   ├── api.py # API Server
│   ├── console.py # CLI
│   ├── data # Knowledge base for learning (includes slice and label)
│   ├── model # BLSTM model
│   ├── preprocessing.py # Data preprocessing, including tokenization
│   ├── settings.py
│   ├── tests # Test cases
│   ├── tf # BLSTM implementation in tensorflow
│   └── utils # Used for format conversion
└── report2slice # Slicing module
    ├── slice # Generates slice files
    ├── core # Core slicing and prediction module
    ├── cli # CLI entry for slicing/prediction
    ├── spotbugsGUI # Modified version of spotbugsGUI
    └── pom.xml

Build

ML-based-SAST relies on a modified version of Joana, so the first step is to build Joana with the following commands:
```
# Fetch sources
git clone https://github.com/anemone95/joana-mvn
cd joana-mvn
mvn clean install -DskipTests
```

Build the slicing and prediction module:

# Fetch sources
git clone https://github.com/Anemone95/MLBasedSAST
cd MLBasedSAST/report2slice
mvn clean package

Install the learning module environment. The learning module depends on the following libraries:
```
tensorflow==2.0.0
requests==2.22.0
flask==1.1.1
theano==1.0.4
fire==0.2.1
```
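
To confirm that the installed packages match the pinned versions, a small check along these lines may help. It is a sketch, not part of the project: `check_environment` is a hypothetical helper, and on Python versions older than 3.8 it assumes the `importlib_metadata` backport is installed.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # older interpreters: assumes the importlib_metadata backport
    import importlib_metadata as metadata

# Pinned versions copied from the dependency list above.
PINNED = {
    "tensorflow": "2.0.0",
    "requests": "2.22.0",
    "flask": "1.1.1",
    "theano": "1.0.4",
    "fire": "0.2.1",
}


def check_environment(pinned=PINNED):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_environment().items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every listed package is present at the pinned version.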

Usage

api.py - Prediction Server

Start a server that serves predictions, accepts slices and labels, and initiates training:

cd MLBasedSAST/ml
python api.py --model-npz=xxx.npz # run api server
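
Before pointing the GUI or CLI at the server, it can be useful to check that something is accepting connections. The sketch below only tests TCP reachability and does not call any server endpoint. The default address `127.0.0.1:8888` is an assumption taken from the CLI example in this README, and `server_is_listening` is a hypothetical helper name.

```python
import socket


def server_is_listening(host="127.0.0.1", port=8888, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if server_is_listening():
        print("prediction server is reachable")
    else:
        print("nothing is listening; start api.py first")
```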

Spotbugs GUI

java -jar report2slice/spotbugsGUI/target/spotbugsGUI-1.0-SNAPSHOT.jar

After launching, you will see a modified version of the Spotbugs GUI. First, create or open a project and obtain the analysis results. This step works the same way as in the original Spotbugs GUI.

Set Server

Click "AI->Set Server" to set the server for predictions.

Slice and Get Prediction Results

Click "AI->Slice and Predict". The program will first analyze the taint propagation results and slice the related bugs.

After slicing, the program sends the results to the server for predictions, and you can see the prediction results on the left side.

Clear Data

If the analysis is interrupted, slicing again will start from the last successful step. If you want to start over, click "AI->Clean" to clear previous data.

Label Data

Whether or not a prediction has been made, you can label a vulnerability instance, provided it has been sliced first. Right-click the vulnerability instance to label it. The labeling results are sent to the server for future learning.

CLI Entry

The CLI entry takes the Spotbugs XML report file as input and outputs the slice/prediction results in JSON format.
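
To get a feel for the input file, the sketch below counts the bug types in a Spotbugs XML report using only the standard library. It relies on the report's `BugInstance` elements and their `type` attribute; the function name `count_bug_types` and the sample report are made up for illustration, and this is not how the CLI itself reads the report.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_bug_types(report_xml: str) -> Counter:
    """Count BugInstance elements in a Spotbugs XML report, grouped by bug type."""
    root = ET.fromstring(report_xml)
    return Counter(bug.get("type") for bug in root.iter("BugInstance"))


# Tiny hand-written report used only for this example.
SAMPLE_REPORT = """
<BugCollection>
  <BugInstance type="SQL_INJECTION_JDBC" priority="1" category="SECURITY"/>
  <BugInstance type="XSS_SERVLET" priority="2" category="SECURITY"/>
  <BugInstance type="SQL_INJECTION_JDBC" priority="1" category="SECURITY"/>
</BugCollection>
"""

if __name__ == "__main__":
    for bug_type, count in count_bug_types(SAMPLE_REPORT).most_common():
        print(bug_type, count)
```

For a report on disk, `ET.parse(path).getroot()` can replace `ET.fromstring`.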

Slice Only

java -jar report2slice/cli/target/cli-1.0-SNAPSHOT.jar slice -f java-sec-code-1.0.0-spotbugs.xml # By default, slices are saved to ./slice/{project} folder. Use --output-dir to specify the output directory.

Slice and Predict

java -jar report2slice/cli/target/cli-1.0-SNAPSHOT.jar slice -f java-sec-code-1.0.0-spotbugs.xml --server http://127.0.0.1:8888/ # Specify the server for predictions. By default, prediction results are saved to ./predict. Use --output to specify the output directory.
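
When many reports need processing, the two commands above can be assembled from a script. The sketch below builds the argument list using only the subcommand and flags shown in this README (`slice`, `-f`, `--output-dir`, `--server`). The helper name `build_cli_command` is hypothetical, and the jar path is assumed to be relative to the repository root.

```python
# Jar path as shown in the commands above (relative to the repository root).
CLI_JAR = "report2slice/cli/target/cli-1.0-SNAPSHOT.jar"


def build_cli_command(report, server=None, output_dir=None, jar=CLI_JAR):
    """Build the argument list for slicing a report, optionally with prediction."""
    cmd = ["java", "-jar", jar, "slice", "-f", report]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]  # where the slices are written
    if server is not None:
        cmd += ["--server", server]  # enables prediction against this server
    return cmd


if __name__ == "__main__":
    cmd = build_cli_command(
        "java-sec-code-1.0.0-spotbugs.xml",
        server="http://127.0.0.1:8888/",
    )
    print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to execute the CLI.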

console.py - Learning Console

Start a Learning Session

cd ml
python console.py train --slice-dir=data/slice/benchmark1.2 --label-dir=data/label/benchmark1.2 --epochs=20 # Slice data folder, label data folder, maximum number of iterations.
