@misc{bfdataset,
author = {Martin Monperrus},
title = {Curated dataset of bug fix commits from "An Empirical Study on Real Bug Fixes"},
year = 2017,
doi = {10.5281/zenodo.1004734},
url = {https://doi.org/10.5281/zenodo.1004734}
}
This archive contains the bug fix commit dataset of "An Empirical Study on Real Bug Fixes" (ICSE 2015).
- The 5 *orig.txt files contain the SVN commit identifiers. The files were given by Hao Zhong in a private communication, except for Cassandra, whose commit list has been reconstructed. The format of the fourth column is {commit id}_{issue id}. If the issue id is "internal", the commit was included because its commit message uses "bug" or "fix".
- The 14914 log.d/* files contain the commit metadata (message, author, date).
- The 14914 svn.d/* files contain the files touched by the commits (the file name contains the commit identifier).
- The 5956 jira.d/* files contain the JSON data of the JIRA bug reports mentioned in the commit messages.
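The {commit id}_{issue id} format of the fourth column can be split with a few lines of Python. The following is only a minimal sketch: the names `CommitRef` and `parse_commit_field` are hypothetical and not part of the dataset's scripts, the sample values are made up, and it assumes the commit id itself contains no underscore. How the columns of the *orig.txt files are delimited is not stated here, so the sketch takes an already extracted fourth-column value.

```python
from typing import NamedTuple, Optional


class CommitRef(NamedTuple):
    """Hypothetical container for one fourth-column value."""
    commit_id: str
    issue_id: Optional[str]  # None when the entry is marked "internal"


def parse_commit_field(field: str) -> CommitRef:
    """Split a '{commit id}_{issue id}' value into its two parts.

    Assumption: the commit id contains no underscore, so the first
    underscore separates the commit id from the issue id.
    """
    commit_id, sep, issue_id = field.strip().partition("_")
    if not sep or not commit_id or not issue_id:
        raise ValueError(f"expected '{{commit id}}_{{issue id}}', got {field!r}")
    if issue_id == "internal":
        # Commit selected because its message uses "bug" or "fix",
        # not because it references an issue.
        return CommitRef(commit_id, None)
    return CommitRef(commit_id, issue_id)


# Made-up sample values, for illustration only.
print(parse_commit_field("1234_CASSANDRA-567"))
print(parse_commit_field("99_internal"))
```

Mapping "internal" to `None` keeps commits without an issue easy to filter out before looking up the matching report under jira.d/.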
The two Python files create a single JSON file called bug-fix-commit-dataset.json with all information merged.