MRT-OAST for Code Clone Detection

This repository includes the code, experimental data, and our trained models of MRT-OAST.

Repository structure

  • model: Contains trained Multiple Representation Transformer (MRT) models.
  • nnet: The btransform.py file includes the code for our Multiple Representation Transformer network.
  • origindata: The original OJClone/GCJ/BCB datasets we used. OJClone_with_AST+OAST.csv contains question numbers, file names, code, ASTs, and our Optimized AST (OAST). AST_dictionary.txt and OAST_dictionary.txt are the vocabularies for the two types of ASTs. The datasets of GCJ and BCB are in the same format (see the loading sketch after this list).
  • main_batch.py: Entry point for executing training and testing.
  • preprocess_data.py: Packs the data from origindata into inputs for the model.
  • tutils.py: Code related to training, validation, and evaluation.
  • quick_test.py: Code related to quick evaluation.
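
For reference, here is a minimal sketch of inspecting the OJClone CSV. It assumes only that the file has a header row; the exact column names are not documented here, so check them before relying on any specific field.

```python
# Minimal, illustrative sketch: peek at origindata/OJClone_with_AST+OAST.csv.
# Assumption: the CSV has a header row naming the columns described above
# (question number, file name, code, AST, OAST); verify before use.
import csv

csv.field_size_limit(10**7)  # code/AST/OAST fields can be very long

with open("origindata/OJClone_with_AST+OAST.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    print(reader.fieldnames)              # actual column names
    first_row = next(reader)              # one record: code plus its AST and OAST
    print({k: str(v)[:60] for k, v in first_row.items()})
```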

Requirements

  • python 3.7
  • pytorch 1.13.1
  • matplotlib 3.5.3
  • numpy 1.21.6
  • tqdm
  • javalang
  • A GPU with CUDA support is also required

Install PyTorch according to your environment; see https://pytorch.org/

How to use

  1. Run python main_batch.py --cuda to train and validate the MRT model. Please refer to main_batch.py for the specific parameters.
  2. Run python main_batch.py --cuda --is_test --quick_test to quickly test a trained MRT model.

About dataset

We saved the OJClone OASTs, processed with the clang-based LibTooling tool, in OJClone_with_AST+OAST.csv; the OASTs of GCJ and BCB are generated by oast_builder.py. You can also use your own dataset as long as it includes AST results. We use sequences enclosed in square brackets to represent restorable ASTs. For example, a node A with two child nodes B and C is represented as A [ B C ].
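
As an illustration only (not code from this repository; the function names are hypothetical), a bracketed sequence of this kind can be serialized and restored along these lines:

```python
# Illustrative sketch of the bracketed, restorable AST notation described
# above (e.g. "A [ B C ]"). Nodes are (label, [children]) tuples; the
# helper names are hypothetical and not part of MRT-OAST.

def serialize(node):
    """Turn a (label, children) tree into the bracketed token sequence."""
    label, children = node
    if not children:
        return label
    return label + " [ " + " ".join(serialize(c) for c in children) + " ]"

def parse(tokens):
    """Rebuild the tree from the whitespace-separated token list."""
    def helper(i):
        label, i = tokens[i], i + 1
        children = []
        if i < len(tokens) and tokens[i] == "[":
            i += 1
            while tokens[i] != "]":
                child, i = helper(i)
                children.append(child)
            i += 1  # consume the closing "]"
        return (label, children), i
    node, _ = helper(0)
    return node

tree = ("A", [("B", []), ("C", [])])
text = serialize(tree)              # "A [ B C ]"
assert parse(text.split()) == tree  # the representation is restorable
```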
