This repository includes the code and experimental data.

- `model`: Contains trained Multiple Representation Transformer (MRT) models.
- `nnet`: The `btransform.py` file includes the code for our Multiple Representation Transformer network.
- `origindata`: The original OJClone/GCJ/BCB datasets we used. `OJClone_with_AST+OAST.csv` contains question numbers, file names, code, ASTs, and our Optimized ASTs (OASTs). `AST_dictionary.txt` and `OAST_dictionary.txt` are the vocabularies for the two types of ASTs. The GCJ and BCB datasets are in the same format.
- `main_batch.py`: Entry point for training and testing.
- `preprocess_data.py`: Prepares the data from `origindata` for the model.
- `tutils.py`: Code related to training, validation, and evaluation.
- `quick_test.py`: Code related to quick evaluation.
- python 3.7
- pytorch 1.13.1
- matplotlib 3.5.3
- numpy 1.21.6
- tqdm
- javalang
- a GPU with CUDA support is also required
Install PyTorch according to your environment; see https://pytorch.org/ for instructions.
To train and validate the MRT model (refer to `main_batch.py` for the specific parameters):

```
python main_batch.py --cuda
```

To quickly test the trained MRT model:

```
python main_batch.py --cuda --is_test --quick_test
```
We saved the OJClone OASTs, processed with our clang LibTooling-based tool, in `OJClone_with_AST+OAST.csv`; the OASTs of GCJ and BCB are generated by `oast_builder.py`. You can also use your own dataset as long as it includes AST results. We use sequences enclosed in square brackets to represent restorable ASTs: for example, a node A with two children B and C is represented as `A [ B C ]`.