MRT-OAST for Code Clone Detection

This repository includes the code, experimental data, and our trained models of MRT-OAST.

Repository structure

  • model: Contains trained Multiple Representation Transformer (MRT) models.
  • nnet: The btransform.py file includes the code for our Multiple Representation Transformer network.
  • origindata: The original OJClone/GCJ/BCB datasets we used. OJClone_with_AST+OAST.csv contains question numbers, file names, code, ASTs, and our Optimized AST (OAST). AST_dictionary.txt and OAST_dictionary.txt are the vocabularies for the two types of ASTs. The datasets of GCJ and BCB are in the same format (see the loading sketch after this list).
  • main_batch.py: Entry point for executing training and testing.
  • preprocess_data.py: Packs the data from origindata into inputs for the model.
  • tutils.py: Code related to training, validation, and evaluation.
  • quick_test.py: Code related to quick evaluation.
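
For reference, here is a minimal sketch of inspecting the OJClone CSV. It assumes only that the file has a header row; the exact column names are not documented here, so check them before relying on any specific field.

```python
# Minimal, illustrative sketch: peek at origindata/OJClone_with_AST+OAST.csv.
# Assumption: the CSV has a header row naming the columns described above
# (question number, file name, code, AST, OAST); verify before use.
import csv

csv.field_size_limit(10**7)  # code/AST/OAST fields can be very long

with open("origindata/OJClone_with_AST+OAST.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    print(reader.fieldnames)              # actual column names
    first_row = next(reader)              # one record: code plus its AST and OAST
    print({k: str(v)[:60] for k, v in first_row.items()})
```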

Requirements

  • python 3.7
  • pytorch 1.13.1
  • matplotlib 3.5.3
  • numpy 1.21.6
  • tqdm
  • javalang
  • A GPU with CUDA support is also required

Install PyTorch according to your environment; see https://pytorch.org/

How to use

  1. Run python main_batch.py --cuda to train and validate the MRT model. Please refer to main_batch.py for the specific parameters.
  2. Run python main_batch.py --cuda --is_test --quick_test to quickly test a trained MRT model.

About dataset

We saved the OJClone OASTs, processed with the clang-based LibTooling tool, in OJClone_with_AST+OAST.csv; the OASTs of GCJ and BCB are generated by oast_builder.py. You can also use your own dataset as long as it includes AST results. We use sequences enclosed in square brackets to represent restorable ASTs. For example, a node A with two child nodes B and C is represented as A [ B C ].
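
As an illustration only (not code from this repository; the function names are hypothetical), a bracketed sequence of this kind can be serialized and restored along these lines:

```python
# Illustrative sketch of the bracketed, restorable AST notation described
# above (e.g. "A [ B C ]"). Nodes are (label, [children]) tuples; the
# helper names are hypothetical and not part of MRT-OAST.

def serialize(node):
    """Turn a (label, children) tree into the bracketed token sequence."""
    label, children = node
    if not children:
        return label
    return label + " [ " + " ".join(serialize(c) for c in children) + " ]"

def parse(tokens):
    """Rebuild the tree from the whitespace-separated token list."""
    def helper(i):
        label, i = tokens[i], i + 1
        children = []
        if i < len(tokens) and tokens[i] == "[":
            i += 1
            while tokens[i] != "]":
                child, i = helper(i)
                children.append(child)
            i += 1  # consume the closing "]"
        return (label, children), i
    node, _ = helper(0)
    return node

tree = ("A", [("B", []), ("C", [])])
text = serialize(tree)              # "A [ B C ]"
assert parse(text.split()) == tree  # the representation is restorable
```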
