This tool provides scripts to download, analyze, and extract data from job logs hosted on GitLab CI. They are:
- Harvester (harvester_main.py): downloads the logs.
- Analyzer (analyzer_main.py): extracts relevant information from the logs.
- Data Extraction (data_extraction_main.py): splits the data by variant (read our article to learn more about Highly-Configurable Systems).
- Project Status (project_status_main.py): reports relevant information about the system and its variants, such as the period covered (how long each variant exists and the time span of the extracted logs), the number of builds, the number of faults (and of failed builds), the number of tests and how it varies, the mean test duration, and the interval between commits.
If this tool contributes to a project which leads to a scientific publication, I would appreciate a citation.
@InProceedings{PradoLima_Learning2020,
author = {Prado Lima, Jackson A. and Mendon\c{c}a, Willian D. F. and Vergilio, Silvia R. and Assun\c{c}\~{a}o, Wesley K. G.},
title = {{Learning-Based Prioritization of Test Cases in Continuous Integration of Highly-Configurable Software}},
booktitle = {Proceedings of the 24th ACM Conference on Systems and Software Product Line: Volume A - Volume A},
series = {SPLC'20},
year = {2020},
isbn = {9781450375696},
doi = {10.1145/3382025.3414967},
articleno = {31},
numpages = {11},
location = {Montreal, Quebec, Canada},
publisher = {Association for Computing Machinery},
}
Install the required dependencies with:
$ pip install -r requirements.txt
- Create a Personal Access Token (see the Personal Access Token guide) for the desired GitLab instance, for example, https://gitlab.com or https://gitlab.dune-project.org/. The token needs privileges to read the repository and fetch its job logs.
- Add your GitLab Access Token to the configuration.properties file.
WARNING: If the connection fails, change the path to the properties file in gitlabci_torrent/utils/gitlab_utils.py to an absolute path.
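The sketch below shows one way to read the token from configuration.properties through an absolute path, in the spirit of the warning. It assumes an INI-style file with one section per configuration key (for example, [GitLab]) holding hypothetical url and private_token entries; the layout actually expected by gitlab_utils.py may differ, and load_gitlab_config is not part of this tool.

```python
import configparser
from pathlib import Path


def load_gitlab_config(config_path, config_key="GitLab"):
    """Return (url, token) stored under `config_key` in the properties file.

    Assumption: INI-style file with hypothetical `url` and `private_token`
    keys; rename them to match your configuration.properties.
    """
    # Resolve to an absolute path so the lookup no longer depends on the
    # current working directory.
    path = Path(config_path).expanduser().resolve()
    parser = configparser.ConfigParser()
    if not parser.read(path):  # read() returns the list of files it parsed
        raise FileNotFoundError(f"cannot read configuration file: {path}")
    section = parser[config_key]  # raises KeyError if the key is missing
    return section["url"], section["private_token"]
```

For example, load_gitlab_config("/home/user/gitlabci_torrent/configuration.properties") reads the [GitLab] section. Building the path from Path(__file__).resolve().parent inside a module is another common way to get an absolute path that does not depend on where the script is launched.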
To download the logs from a project, do:
python harvester_main.py -p ProjectID -k ConfigKey
where:
- -p or --project_id is the repository ID (also called the project ID). Follow this answer to find the ID.
- -k or --configkey is the configuration key saved in configuration.properties (default: GitLab).
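The project ID can also be obtained from the GitLab REST API: GET /api/v4/projects/:id accepts the URL-encoded namespace/project path in place of the numeric ID and returns a JSON object whose id field is the value to pass to -p. A minimal standard-library sketch follows; project_api_url and fetch_project_id are hypothetical helpers, not part of this tool.

```python
import json
import urllib.parse
import urllib.request


def project_api_url(base_url, project_path):
    """Build the REST API URL for a project given its "namespace/project" path."""
    # GitLab expects the full path URL-encoded, so "/" becomes "%2F".
    encoded = urllib.parse.quote(project_path, safe="")
    return f"{base_url.rstrip('/')}/api/v4/projects/{encoded}"


def fetch_project_id(base_url, project_path, token):
    """Return the numeric project ID (performs a network request)."""
    request = urllib.request.Request(
        project_api_url(base_url, project_path),
        headers={"PRIVATE-TOKEN": token},  # Personal Access Token header
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["id"]
```

For instance, fetch_project_id("https://gitlab.com", "my-group/my-project", token) would return the ID of that (hypothetical) project, provided the token can read it.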
Other available parameters:
- -d or --save_dir sets the directory where the logs will be saved (default: logs).
- -t or --threshold sets a date threshold for the mining, in the format YYYY/MM/DD. If it is omitted, all logs are downloaded.
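To illustrate the threshold format, here is a sketch that validates a YYYY/MM/DD value and filters job records by date. The direction of the filter (keep jobs created on or after the threshold) is an assumption, as is the record shape: each job is a dict with a GitLab-style created_at timestamp. The function names are hypothetical.

```python
import argparse
from datetime import datetime


def parse_threshold(value):
    """Convert a YYYY/MM/DD string into a datetime, or raise an argparse error."""
    try:
        return datetime.strptime(value, "%Y/%m/%d")
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"invalid threshold {value!r}: expected YYYY/MM/DD"
        ) from None


def filter_jobs(jobs, threshold=None):
    """Keep jobs created on or after `threshold` (assumed semantics).

    Each job has a "created_at" value such as "2020-03-01T12:00:00.000Z";
    only the date part is compared. Without a threshold, all jobs are kept.
    """
    if threshold is None:
        return list(jobs)
    return [
        job for job in jobs
        if datetime.strptime(job["created_at"][:10], "%Y-%m-%d") >= threshold
    ]
```

Registering parse_threshold as the type of an argparse option (parser.add_argument("-t", "--threshold", type=parse_threshold)) makes a malformed date fail with a clear message at startup.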
To extract the features for one project, do:
python analyzer_main.py -d PathToLogs
where:
- -d or --logs_dir is the directory containing the logs.
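The analyzer's parsing rules are not described here. Purely as an illustration, the sketch below assumes a job log that ends with a CTest-style summary ("N% tests passed, F tests failed out of T" and "Total Test time (real) = S sec") and extracts the test counts and duration from it; the patterns and extract_features are hypothetical.

```python
import re

# Hypothetical patterns for a CTest-style summary; the analyzer's real
# rules may differ.
SUMMARY = re.compile(
    r"(?P<pct>\d+)% tests passed, (?P<failed>\d+) tests failed out of (?P<total>\d+)"
)
DURATION = re.compile(r"Total Test time \(real\) =\s*(?P<secs>[\d.]+) sec")


def extract_features(log_text):
    """Return test counts and total duration from one job log, or None."""
    summary = SUMMARY.search(log_text)
    if summary is None:
        return None  # e.g. the build failed before any test ran
    duration = DURATION.search(log_text)
    return {
        "total": int(summary["total"]),
        "failed": int(summary["failed"]),
        "duration_sec": float(duration["secs"]) if duration else None,
    }
```

Returning None for logs without a summary keeps failed builds distinguishable from builds whose tests all passed.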
To split the test results by variant, do:
python data_extraction_main.py -d PathToLogs -p ProjectName
where:
- -d or --logs_dir is the directory containing the logs.
- -p or --project_name is the project name. Some projects use similar names for the repository and the (GitLab) user, so this parameter lets you pick the right one.
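Conceptually, splitting the results by variant is a group-by operation. The sketch below assumes each test-result record is a dict with a hypothetical variant key (for example, the CI job name that identifies a configuration); the real record format produced by data_extraction_main.py is not described here.

```python
from collections import defaultdict


def split_by_variant(records):
    """Group test-result records by their (assumed) "variant" field.

    Records keep their original order within each variant, and variants
    appear in the order they are first seen.
    """
    by_variant = defaultdict(list)
    for record in records:
        by_variant[record["variant"]].append(record)
    return dict(by_variant)
```

Each group can then be saved separately (for example, one CSV file per variant) for per-variant analysis.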
To observe the project status for one project, do:
python project_status_main.py -d PathToLogs -p ProjectName
where:
- -d or --logs_dir is the directory containing the logs.
- -p or --project_name is the project name. Some projects use similar names for the repository and the (GitLab) user, so this parameter lets you pick the right one.
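Several of the Project Status metrics are simple aggregates over build records. The sketch below computes a few of them under an assumed record shape (a dict per build with commit_date, failed, and test_duration in seconds); project_status is a hypothetical helper, not the tool's implementation.

```python
from datetime import datetime
from statistics import mean


def project_status(builds):
    """Summarize builds: period, counts, mean test duration, commit interval."""
    if not builds:
        raise ValueError("no builds to summarize")
    dates = sorted(b["commit_date"] for b in builds)
    # Gaps between consecutive commits, converted to hours.
    gaps = [(later - earlier).total_seconds() / 3600
            for earlier, later in zip(dates, dates[1:])]
    return {
        "period": (dates[0], dates[-1]),
        "builds": len(builds),
        "failed_builds": sum(b["failed"] for b in builds),
        "mean_test_duration": mean(b["test_duration"] for b in builds),
        "mean_commit_interval_h": mean(gaps) if gaps else None,
    }


example = project_status([
    {"commit_date": datetime(2020, 1, 1, 0), "failed": False, "test_duration": 10.0},
    {"commit_date": datetime(2020, 1, 1, 6), "failed": True, "test_duration": 20.0},
    {"commit_date": datetime(2020, 1, 2, 0), "failed": False, "test_duration": 30.0},
])
print(example["mean_commit_interval_h"])  # 12.0 (gaps of 6 h and 18 h)
```

Running the same function on each group produced by the variant split would give per-variant figures.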