Year
2019
Season
Spring
Paper Type
Master's Thesis
College
College of Computing, Engineering & Construction
Degree Name
Master of Science in Computer and Information Sciences (MS)
Department
Computing
NACO controlled Corporate Body
University of North Florida. School of Computing
First Advisor
Dr. Sandeep Reddivari
Second Advisor
Dr. Sanjay P. Ahuja
Rights Statement
http://rightsstatements.org/vocab/InC/1.0/
Third Advisor
Dr. Karthikeyan Umapathy
Department Chair
Dr. Sherif Elfayoumy
College Dean
Dr. William F. Klostermeyer
Abstract
This thesis describes the potential benefits of Latent Dirichlet Allocation (LDA) as a technique for code clone detection. The objective is to propose a language-independent, effective, and scalable approach for identifying similar code fragments in relatively large software systems. The main assumption is that the latent topic structure of software artifacts indicates the presence of code clones. The hypothesis is that artifacts with similar topic distributions contain duplicated code fragments. To test this hypothesis, an experimental investigation was conducted using multiple datasets from various application domains. In addition, CloneTM, an LDA-based working prototype for code clone detection, was developed. Results showed that, if calibrated properly, topic modeling can deliver satisfactory performance in capturing different types of code clones, with particularly good performance in detecting Type III clones. CloneTM also achieved performance comparable to existing practical tools that adopt different clone detection strategies.
Suggested Citation
Khan, Mohammed Salman, "A Topic Modeling approach for Code Clone Detection" (2019). UNF Graduate Theses and Dissertations. 874.
https://digitalcommons.unf.edu/etd/874