Replication Package for "On Using GUI Interaction Data to Improve Text Retrieval-based Bug Localization"
This article presents the replication package associated with our paper:
Junayed Mahmud, Nadeeshan De Silva, Safwat Ali Khan, Seyed Hooman Mostafavi, SM Hasan Mansur, Oscar Chaparro, Andrian Marcus, and Kevin Moran, “On Using GUI Interaction Data to Improve Text Retrieval-based Bug Localization,” in Proceedings of the 46th IEEE/ACM International Conference on Software Engineering (ICSE 2024)
Link to the Full paper: https://dl.acm.org/doi/10.1145/3597503.3608139
We provide access to the dataset, source code, and detailed instructions required to reproduce the experimental results discussed in our paper. We aim to apply for the Available & Reusable badges and hope to further extend research on GUI-based bug localization. We recommend using a recent version of macOS; we conducted our tests on Sonoma 14.0. For SentenceBERT and UniXCoder, we suggest using a system with GPU support. All of our source code is available at https://github.com/SageSELab/UI-Bug-Localization-Study, and the dataset is available at https://github.com/SageSELab/GUI-Bug-Localization-Data.
Our code and data are also permanently archived at: https://doi.org/10.5281/zenodo.10460609
A user will require ~80 hours to regenerate the results for all experiments.
One of the significant challenges in bug report management is localizing the fault in source code based on the information provided in bug reports. This task is particularly complicated by the incomplete or incorrect information in these reports. Researchers have attempted to automate the retrieval and ranking of relevant buggy files or code snippets using bug reports as queries. Although many researchers treat bug localization as a text retrieval (TR) problem, there is a noticeable semantic gap between the contents of bug reports and the source code written by developers. Researchers have explored various strategies to bridge this gap, such as processing bug reports or source code, or reformulating queries by incorporating information from diverse sources, including execution information, code dependencies, and historical data.
Our study explores leveraging graphical user interfaces (GUIs) in bug localization, which no prior research has thoroughly investigated. GUI information is readily obtainable and encapsulates the latent features of an application, manifested as pixel-based (i.e., screenshots) and metadata-based (i.e., HTML/uiautomator) information. Our objective is to utilize GUI information to boost the ranking of relevant files and to aid in query reformulation. We posit that analyzing the GUI of the application screen where a bug occurs, along with the one to three preceding screens, can aid in identifying faults in code. We refer to the GUI information on these screens as GUI interaction data. In our research, we specifically utilize three types of GUI interaction data: (1) the Activity and Window information for specific app screens, (2) the GUI components present in the selected app screens, and (3) the GUI components with which the user interacted on the selected app screens during bug reproduction. We believe that GUI interaction data can (i) filter out irrelevant files, (ii) boost relevant files, and (iii) aid in query reformulation.
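To make these three uses concrete, here is a minimal, hypothetical sketch of applying GUI interaction data on top of a TR ranking. The function, file names, and scores are illustrative only and are not part of the replication package, whose 657 configurations vary these choices systematically.

```python
# Hypothetical sketch: applying GUI interaction data to a TR-based ranking.
# `gui_terms` would come from the Activities/Windows and GUI components
# observed on the screens involved in bug reproduction.

def apply_gui_data(ranked_files, scores, gui_terms, query, boost=0.2):
    """Illustrates filtering, boosting, and query reformulation."""
    def related(path):
        # A file is "GUI-related" if its name mentions an observed GUI term.
        name = path.rsplit("/", 1)[-1].lower()
        return any(t.lower() in name for t in gui_terms)

    # (i) Filtering: drop files that share no terms with the observed GUI.
    filtered = [f for f in ranked_files if related(f)]

    # (ii) Boosting: raise the score of GUI-related files instead of
    # removing unrelated ones, then re-sort.
    boosted = {f: scores[f] + (boost if related(f) else 0.0) for f in ranked_files}
    reranked = sorted(ranked_files, key=lambda f: -boosted[f])

    # (iii) Query reformulation: append GUI terms to the bug report query.
    new_query = query + " " + " ".join(gui_terms)
    return filtered, reranked, new_query

files = ["app/ui/SettingsActivity.java", "app/net/HttpCache.java"]
scores = {"app/ui/SettingsActivity.java": 0.41, "app/net/HttpCache.java": 0.55}
print(apply_gui_data(files, scores, ["SettingsActivity"], "crash on save"))
```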
To assess the effectiveness of GUI information in bug localization, we employ four baseline approaches: BugLocator [1], Lucene [2], SentenceBERT [3], and UniXCoder [4]. Our focus is on bug localization within Android apps, specifically for four bug categories: crash, navigation, output, and cosmetic bugs. Our dataset comprises 80 fully localized Android bugs from 39 apps, with associated bug reproduction scenarios and GUI metadata. We compare these baseline TR-based bug localization approaches against 657 GUI-augmented configurations of each technique. Our findings reveal that the best-performing configurations outperform the baseline approaches, improving Hits@10 by 13% to 18%. These improvements mean that more buggy files appear in the top-10 ranked results. Consequently, our results support the rationale that leveraging GUI information enhances bug localization approaches.
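For reference, Hits@10 is the fraction of bug reports for which at least one actual buggy file is ranked in the top 10, i.e., Hits@10 = (# bug reports with a buggy file in the top 10) / (total # bug reports); the improvement figures above refer to this metric.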
├── UI Bug Localization
│   ├── InitialSteps
│   │   ├── ExtractGUIInformation --> extract the GUI information and filenames necessary for augmentation
│   │   ├── AugmentationCorpus --> copy files into another directory by matching filenames from the previous step
│   │   ├── Preprocessing --> preprocess queries and source code
│   │   ├── Preprocessing-BugLocator --> additional preprocessing required for BugLocator
│   ├── SentenceBERT --> generate rankings of all configurations for SentenceBERT
│   ├── UnixCoder --> generate rankings of all configurations for UniXCoder
│   ├── Lucene --> generate rankings of all configurations for Lucene
│   ├── BugLocator --> generate rankings of all configurations for BugLocator
│   ├── ResultComputation --> calculate metrics for all configurations
│   ├── Results
│   │   ├── SentenceBERT --> calculated metrics for SentenceBERT
│   │   ├── UnixCoder --> calculated metrics for UniXCoder
│   │   ├── Lucene --> calculated metrics for Lucene
│   │   ├── BugLocator --> calculated metrics for BugLocator
We bundled all of the experiments in a Docker image so that a user can replicate them easily. The image is built from the provided Dockerfile, and the required packages are listed in environment.yml. Please follow these steps to replicate the experiments:
- Install Docker following the official installation instructions.
- Download the Docker image:

  docker pull junayed21/buglocalization

- Run the container:

  docker run -it junayed21/buglocalization
- Generate results for each baseline:

  a. SentenceBERT

     cd UI-Bug-Localization-Study/sentenceBERT
     ./sentencebert-cmnd-all.sh
     cd ..

  b. UniXCoder

     cd UI-Bug-Localization-Study/Unixcoder
     ./unixcoder-cmnd-all.sh
     cd ..

  c. Lucene

     cd UI-Bug-Localization-Study/Lucene
     ./lucene-cmnd-all.sh
     cd ..

  d. BugLocator

     cd UI-Bug-Localization-Study/BugLocator
     ./buglocator-cmnd-all.sh
     cd ..

  To generate results for a small subset of configurations instead of all configurations, replace <approach-name>-cmnd-all.sh with <approach-name>-cmnd-small.sh.
- Compute metrics separately for each baseline:

  cd ResultComputation
  python3 results-summary-all.py -a <approach-name>

  Here <approach-name> is one of BugLocator, Lucene, SentenceBERT, or UniXCoder. The results will be saved in Results/<approach-Name>/Metrics.csv.
All experiments were run on a Mac. We recommend using the x86_64 architecture; however, users on Arm architecture (Apple Silicon) can emulate x86_64 by running the following command:
conda config --env --set subdir osx-64
A user needs to install Anaconda to run the experiments. Most of the experiments are run via either a shell script or a Python file. Before running the scripts, a user has to update the data_dir variable, which holds the path to the dataset, and the package_dir variable, which holds the path to the replication package.
Note: A user can skip the preprocessing steps and use the already preprocessed data instead. In that case, when generating rankings, the user needs to update the preprocessed_code_dir variable, where it exists in a shell script, so that it points to the dataset's BuggyProjects directory (the authors' scripts use /Users/sagelab/Documents/Projects/BugLocalization/Artifact-ICSE24/GUI-Bug-Localization-Data/BuggyProjects).
- Install the following packages:
conda install python=3.7.6
conda install bs4=4.11.1
conda install anaconda::nltk
conda install pandas=1.3.5
- Install JDK 11
- Install Apache Maven using the following command:
conda install -c conda-forge maven=3.9.6
- Clone the following repos: appcore and text-analyzer
- Go to appcore/appcore in the terminal and run the following command:

  ./gradlew clean testClasses install

- Go to text-analyzer/text-analyzer in the terminal and run the following command:

  ./gradlew clean testClasses install

- Go to Preprocessing/lib in the terminal and run the following command:

  mvn install:install-file "-Dfile=ir4se-fwk-0.0.2.jar" "-DgroupId=edu.wayne.cs.severe" "-DartifactId=ir4se-fwk" "-Dversion=0.0.2" "-Dpackaging=jar"
Note that the setup consists of two main steps: first, the data from the dataset needs to be preprocessed; second, each of the individual bug localization (BL) techniques needs to be configured.
The user has to run the following scripts for preprocessing:

- ExtractGUIInformation/filter_files_cmnd.sh: This script extracts the necessary GUI information and collects all the filenames needed by the text-retrieval augmentation methods.
- AugmentationCorpus/match_files_from_repo.sh: Using the filenames extracted in the previous step, this script copies all the matched files into another directory. This step significantly speeds up the experiments because we have 657 configurations for each baseline.
- Preprocessing/run_cmnd.sh: Run this shell script to preprocess the queries and the source code (a sketch of this kind of normalization appears after this list). Preprocessing must be performed for four types of information by updating the content_type variable, which should be set to each of the following four values, one at a time:
  - Title: preprocess bug report titles. Only necessary for BugLocator.
  - Content: preprocess bug report contents. Only necessary for BugLocator.
  - BugReport: preprocess bug reports. Necessary for all baselines except BugLocator.
  - Code: preprocess source code. Necessary for all baselines.
- Preprocessing-BugLocator/generate_xml_data_for_buglocator.sh: The preprocessing for BugLocator differs from that of the other approaches. Run this script to generate preprocessed queries for BugLocator.
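As referenced above, here is a minimal sketch of the kind of normalization such TR preprocessing typically performs (camel-case splitting, lowercasing, stop-word removal, stemming). It illustrates the idea only and is not the exact pipeline in Preprocessing/run_cmnd.sh.

```python
# Minimal sketch of typical TR preprocessing for queries and source code
# (illustrative only; the actual pipeline lives in Preprocessing/run_cmnd.sh).
import re
import nltk
nltk.download("stopwords", quiet=True)  # one-time corpus download
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

STOP = set(stopwords.words("english"))
STEM = PorterStemmer()

def preprocess(text):
    # Split identifiers like "onCreateOptionsMenu" into separate words.
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    # Lowercase and keep alphabetic tokens only.
    tokens = re.findall(r"[A-Za-z]+", text.lower())
    # Drop stop words and single letters, then stem.
    return [STEM.stem(t) for t in tokens if t not in STOP and len(t) > 1]

print(preprocess("NullPointerException in onCreateOptionsMenu of SettingsActivity"))
```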
A user needs to run the following commands to set up the environment for SentenceBERT.
conda install python=3.7.6
conda install pytorch=1.12.1
conda install transformers=4.24.0
conda install pandas=1.3.5
- sentenceBERT/sentencebert-cmnd-all.sh: Run to get rankings of all configurations for SentenceBERT.
- sentenceBERT/sentencebert-cmnd-small.sh: Run to get rankings of a subset of configurations for SentenceBERT.
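For intuition, SentenceBERT-style ranking embeds the bug report and each source file and sorts files by cosine similarity. The sketch below is a minimal illustration assuming the sentence-transformers package (not part of the environment above) and an arbitrary model name; it is not the replication script, which configures its own model and corpus.

```python
# Illustrative sketch of embedding-based ranking (not the replication scripts).
# Assumes the sentence-transformers package; the model choice is arbitrary.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
bug_report = "App crashes when rotating the screen on the settings page"
files = {
    "SettingsActivity.java": "public class SettingsActivity extends Activity { ... }",
    "HttpCache.java": "class HttpCache { void put(String key, byte[] data) { ... } }",
}

# Embed the query and the candidate files, then rank by cosine similarity.
query_emb = model.encode(bug_report, convert_to_tensor=True)
file_embs = model.encode(list(files.values()), convert_to_tensor=True)
sims = util.cos_sim(query_emb, file_embs)[0]

ranking = sorted(zip(files, sims.tolist()), key=lambda p: -p[1])
print(ranking)  # files ordered by similarity to the bug report
```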
A user needs to run the following commands to set up the environment for UniXCoder.
conda install python=3.7.6
conda install pytorch=1.4.0
conda install transformers=2.1.1
conda install pandas=1.1.5
- Unixcoder/unixcoder-cmnd-all.sh: Run to get rankings of all configurations for UniXCoder.
- Unixcoder/unixcoder-cmnd-small.sh: Run to get rankings of a subset of configurations for UniXCoder.
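UniXCoder can be used in the same embed-and-rank fashion, but through the transformers API directly. A minimal sketch, assuming a recent transformers version, the microsoft/unixcoder-base checkpoint, and mean pooling (the pooling strategy is an assumption, not necessarily what our scripts use):

```python
# Illustrative sketch of UniXCoder-style ranking via transformers
# (not the replication scripts; pooling strategy is an assumption).
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("microsoft/unixcoder-base")
model = AutoModel.from_pretrained("microsoft/unixcoder-base")

def embed(text):
    inputs = tok(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1)  # mean-pool token embeddings into one vector

query = embed("Back button does not close the search bar")
candidate = embed("public void onBackPressed() { closeSearchBar(); }")
print(torch.nn.functional.cosine_similarity(query, candidate).item())
```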
- Install JDK 11
- Install Apache Maven using the following command:
conda install -c conda-forge maven=3.9.6
- Clone the following repos: appcore and text-analyzer
- Go to appcore/appcore in the terminal and run the following command:

  ./gradlew clean testClasses install

- Go to text-analyzer/text-analyzer in the terminal and run the following command:

  ./gradlew clean testClasses install

- Go to Lucene/lib in the terminal and run the following command:

  mvn install:install-file "-Dfile=ir4se-fwk-0.0.2.jar" "-DgroupId=edu.wayne.cs.severe" "-DartifactId=ir4se-fwk" "-Dversion=0.0.2" "-Dpackaging=jar"
- Lucene/lucene-cmnd-all.sh: Run to get rankings of all configurations for Lucene.
- Lucene/lucene-cmnd-small.sh: Run to get rankings of a subset of configurations for Lucene.
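Conceptually, Lucene ranks files with a TF-IDF-based vector space model. The sketch below reproduces that idea with scikit-learn purely for illustration; scikit-learn is not part of the environment above, and the replication package runs the actual Java Lucene library.

```python
# Conceptual TF-IDF ranking, illustrating what Lucene-style retrieval does.
# Uses scikit-learn for brevity; the replication package runs real Lucene (Java).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = {
    "SettingsActivity.java": "settings activity save preference crash",
    "HttpCache.java": "http cache put get evict",
}
query = "crash when saving settings"

# Index the (preprocessed) files, then score them against the query.
vec = TfidfVectorizer()
doc_matrix = vec.fit_transform(corpus.values())
sims = cosine_similarity(vec.transform([query]), doc_matrix)[0]
print(sorted(zip(corpus, sims), key=lambda p: -p[1]))
```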
A user needs to run the following commands to set up the environment for BugLocator.
conda install python=3.7.6
conda install bs4=4.11.1
conda install pandas=1.3.5
conda install lxml=4.9.1
- BugLocator/buglocator-cmnd-all.sh: Run to get rankings of all configurations for BugLocator.
- BugLocator/buglocator-cmnd-small.sh: Run to get rankings of a subset of configurations for BugLocator.
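For background, BugLocator [1] ranks files by combining a revised VSM (rVSM) score with a similarity score derived from previously fixed, similar bug reports. A minimal sketch of that score combination, with made-up inputs and an assumed min-max normalization:

```python
# Minimal sketch of BugLocator's score combination (Zhou et al. [1]):
# final = (1 - alpha) * normalized rVSM score + alpha * normalized SimiScore.
def normalize(scores):
    lo, hi = min(scores.values()), max(scores.values())
    return {f: (s - lo) / (hi - lo or 1.0) for f, s in scores.items()}

def buglocator_rank(rvsm, simi, alpha=0.2):
    rvsm, simi = normalize(rvsm), normalize(simi)
    final = {f: (1 - alpha) * rvsm[f] + alpha * simi.get(f, 0.0) for f in rvsm}
    return sorted(final.items(), key=lambda p: -p[1])

# Made-up scores for two candidate files:
print(buglocator_rank({"A.java": 1.2, "B.java": 0.7},
                      {"A.java": 0.1, "B.java": 0.9}))
```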
Install the following package for result computation:
conda install pandas=1.3.5
- ResultComputation/results-summary-all.py: Running the baselines above produces the ranks of the buggy files. To calculate metrics for all configurations, run the following command after replacing <approach-name> with one of the baseline names (BugLocator, Lucene, SentenceBERT, or UniXCoder):

  python3 results-summary-all.py -a <approach-name>

- ResultComputation/results-summary-small.py: To calculate metrics for a subset of configurations, run the following command, again replacing <approach-name> with one of the baseline names:

  python3 results-summary-small.py -a <approach-name>

The results will be saved in Results/<approach-Name>/Metrics.csv.
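For readers inspecting Metrics.csv, the sketch below shows how Hits@10 can be computed from the best rank a buggy file achieves for each bug. The input format and column name here are hypothetical, not necessarily those used by the result scripts.

```python
# Minimal sketch: compute Hits@10 from the best rank of a buggy file per bug.
# The column name "best_rank" is hypothetical; adapt it to the actual CSV.
import pandas as pd

def hits_at_k(best_ranks, k=10):
    # Fraction of bugs whose best-ranked buggy file appears in the top k.
    return sum(1 for r in best_ranks if r <= k) / len(best_ranks)

df = pd.DataFrame({"best_rank": [3, 12, 1, 45, 7]})  # toy data
print(hits_at_k(df["best_rank"]))  # 0.6 -> 60% of bugs hit in the top 10
```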
[1] Jian Zhou, Hongyu Zhang, and David Lo. 2012. Where Should the Bugs Be Fixed? More Accurate Information Retrieval-Based Bug Localization Based on Bug Reports. In ICSE'12. 14–24.
[2] Apache Lucene. https://lucene.apache.org (2023).
[3] Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In EMNLP'19.
[4] Daya Guo, Shuai Lu, Nan Duan, Yanlin Wang, Ming Zhou, and Jian Yin. 2022. UniXcoder: Unified Cross-Modal Pre-training for Code Representation. In ACL'22.