IJIRST (International Journal for Innovative Research in Science & Technology), ISSN (online): 2349-6010


Bug Report Triaging Using Textual, Categorical and Contextual Features Using Latent Dirichlet Allocation


International Journal for Innovative Research in Science & Technology
Volume 1, Issue 9
Year of Publication: 2015
Authors: Anuradha Sharma; Sachin Sharma

BibTeX:

@article{IJIRSTV1I9033,
     title={Bug Report Triaging Using Textual, Categorical and Contextual Features Using Latent Dirichlet Allocation},
     author={Anuradha Sharma and Sachin Sharma},
     journal={International Journal for Innovative Research in Science \& Technology},
     volume={1},
     number={9},
     pages={85--96},
     year={2015},
     url={http://www.ijirst.org/articles/IJIRSTV1I9033.pdf},
     publisher={IJIRST (International Journal for Innovative Research in Science \& Technology)},
}



Abstract:

Software bugs occur for a wide range of reasons. Bug reports can be generated automatically or drafted by users of the software. Bug reports may also accompany other malfunctions of the software, mostly in beta or unstable versions. These bug reports are often enriched with user-contributed descriptions of what the user actually encountered. Addressing bugs accounts for the majority of the effort spent in the maintenance phase of a software project's life cycle. Often, several bug reports submitted by different users correspond to the same defect. Nevertheless, every bug report must be analyzed separately and carefully to determine whether it describes a genuine bug. The person responsible for processing newly reported bugs, checking for duplicates, and passing them to suitable developers to be fixed is called a triager, and this process is called triaging. The utility of bug tracking systems is hindered by a large number of duplicate bug reports; in many open source software projects, as many as one third of all reports are duplicates. Identifying duplicate bug reports is time-consuming and adds to the already high cost of software maintenance. In this dissertation, a model of the automated triaging process is proposed based on textual, categorical and contextual similarity features. The contribution of this dissertation is twofold. In the proposed scheme, a total of 80 textual features are extracted from the bug reports. In addition, topics are modeled from the complete text corpus using Latent Dirichlet Allocation (LDA). These topics are specific to the category, class or functionality of the software; for example, possible topics for the Android bug repository include Bluetooth, Download and Network. Bug reports are analyzed for context to relate them to these domain-specific topics, thereby enhancing the feature set used to compute similarity scores.
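
The abstract does not spell out the 80 textual features or the exact similarity formula, so the following is only a hypothetical sketch of how textual, contextual and categorical similarity could be combined into one feature vector per pair of reports. It assumes each report already carries its LDA topic proportions (from any LDA implementation), uses cosine similarity over bag-of-words counts as a stand-in textual feature, cosine similarity over topic proportions as a contextual feature, and exact-match flags on component and product fields as categorical features. All field and function names are illustrative assumptions, not the authors' code.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and split it on runs of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity of two sparse vectors stored as {key: weight} dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pair_features(a: dict, b: dict) -> list:
    """Hypothetical similarity features for a pair of bug reports.

    Each report is a dict with the (hypothetical) keys 'summary', 'description',
    'component', 'product' and 'topics', where 'topics' holds the report's
    LDA topic proportions, assumed to be precomputed by an LDA model.
    """
    # Textual feature: cosine similarity of bag-of-words term counts.
    words_a = Counter(tokenize(a["summary"] + " " + a["description"]))
    words_b = Counter(tokenize(b["summary"] + " " + b["description"]))
    textual = cosine(words_a, words_b)

    # Contextual feature: cosine similarity of the two topic mixtures.
    contextual = cosine(dict(enumerate(a["topics"])), dict(enumerate(b["topics"])))

    # Categorical features: 1.0 when the field values match exactly, else 0.0.
    same_component = float(a["component"] == b["component"])
    same_product = float(a["product"] == b["product"])
    return [textual, contextual, same_component, same_product]


if __name__ == "__main__":
    # Two made-up reports with made-up topic proportions over three topics.
    r1 = {"summary": "Bluetooth pairing fails", "description": "Device cannot pair over Bluetooth",
          "component": "Bluetooth", "product": "Android", "topics": [0.8, 0.1, 0.1]}
    r2 = {"summary": "Cannot pair Bluetooth headset", "description": "Pairing fails every time",
          "component": "Bluetooth", "product": "Android", "topics": [0.7, 0.2, 0.1]}
    print(pair_features(r1, r2))
```

A real system would replace the raw-count cosine with the paper's own textual features; the point of the sketch is only that each feature is a similarity score between two reports, so the resulting vectors can be fed to a binary classifier.
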
Finally, the bug reports are divided into two sets, duplicates and non-duplicates, for binary classification using a Support Vector Machine (SVM). Simulations are performed on a Bugzilla dataset. The proposed system improves the efficiency of duplicate checking by 15% compared with the contextual model proposed by Anahita Alipour et al. The system can reduce development cost by improving duplicate checking while still allowing at least one bug report for each real defect to reach developers.
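
The abstract does not state the SVM kernel, library or hyperparameters, so the following is only a minimal sketch of the binary duplicate/non-duplicate classification step. It assumes a linear kernel and trains it in pure Python with the Pegasos stochastic sub-gradient method, which is swapped in here for illustration and is not necessarily what the authors used. The function names, the +1/-1 label convention and the default `lam` and `epochs` values are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.1, epochs=200, seed=0):
    """Train a linear SVM with the Pegasos stochastic sub-gradient method.

    X: list of numeric feature vectors (e.g. per-pair similarity scores).
    y: labels, +1 for a duplicate pair and -1 for a non-duplicate pair.
    A constant 1.0 feature is appended so the last weight acts as a bias
    (a simplification: the bias is regularized along with the weights).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size 1 / (lambda * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink the weights (gradient of the L2 regularizer) ...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ... and take a hinge-loss step if the margin constraint is violated.
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Classify one feature vector: +1 (duplicate) or -1 (non-duplicate)."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0.0 else -1
```

In use, `X` would hold one similarity feature vector per pair of bug reports and `y` its duplicate label; `train_linear_svm(X, y)` returns a weight vector whose last entry is the bias, and `predict(w, x)` labels a new pair. A library SVM with a non-linear kernel could replace this sketch without changing the overall pipeline.
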


Keywords:

Bug Reports, Textual Features, Contextual Features, Automated Triaging, Support Vector Machines, Classification

