CroLSSim: Cross‐language software similarity detector using hybrid approach of LSA‐based AST‐MDrep features and CNN‐LSTM model
Article
Ullah, F., Naeem, M., Naeem, H., Cheng, X. and Alazab, M. 2022. CroLSSim: Cross‐language software similarity detector using hybrid approach of LSA‐based AST‐MDrep features and CNN‐LSTM model. International Journal of Intelligent Systems. 37 (9), pp. 5768-5795. https://doi.org/10.1002/int.22813
Type | Article |
---|---|
Title | CroLSSim: Cross‐language software similarity detector using hybrid approach of LSA‐based AST‐MDrep features and CNN‐LSTM model |
Authors | Ullah, F., Naeem, M., Naeem, H., Cheng, X. and Alazab, M. |
Abstract | Software similarity in different programming codes is a rapidly evolving field because of its numerous applications in software development, software cloning, software plagiarism, and software forensics. Currently, software researchers and developers search cross-language open-source repositories for similar applications for a variety of reasons, such as reusing programming code, analyzing different implementations, and looking for a better application. However, it is a challenging task because each programming language has a unique syntax and semantic structure. In this paper, a novel tool called Cross-Language Software Similarity (CroLSSim) is designed to detect similar software applications written in different programming codes. First, the Abstract Syntax Tree (AST) features are collected from different programming codes. These are high-quality features that can show the abstract view of each program. Then, Methods Description (MDrep) in combination with AST is used to examine the relationship among different method calls. Second, the Term Frequency Inverse Document Frequency approach is used to retrieve the local and global weights from AST-MDrep features. Third, the Latent Semantic Analysis-based features extraction and selection method is proposed to extract the semantic anchors in reduced dimensional space. Fourth, the Convolution Neural Network (CNN)-based features extraction method is proposed to mine the deep features. Finally, a hybrid deep learning model of CNN-Long-Short-Term Memory is designed to detect semantically similar software applications from these latent variables. The data set contains approximately 9.5K Java, 8.8K C#, and 7.4K C++ software applications obtained from GitHub. The proposed approach outperforms as compared with the state-of-the-art methods. |
Keywords | Artificial Intelligence, Human-Computer Interaction, Theoretical Computer Science, Software |
Publisher | Wiley |
Journal | International Journal of Intelligent Systems |
ISSN | 0884-8173 |
Electronic | 1098-111X |
Publication dates | |
Online | 09 Jan 2022 |
30 Jul 2022 | |
Publication process dates | |
Deposited | 20 Jan 2022 |
Accepted | 25 Dec 2021 |
Output status | Published |
Accepted author manuscript | |
Copyright Statement | This is the peer reviewed version of the following article: Ullah, F, Naeem, MR, Naeem, H, Cheng, X, Alazab, M. CroLSSim: Cross-language software similarity detector using hybrid approach of LSA-based AST-MDrep features and CNN-LSTM model. Int J Intell Syst. 2022; 37: 5768-5795. doi:10.1002/int.22813, which has been published in final form at https://doi.org/10.1002/int.22813. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions. This article may not be enhanced, enriched or otherwise transformed into a derivative work, without express permission from Wiley or by statutory rights under applicable legislation. Copyright notices must not be removed, obscured or modified. The article must be linked to Wiley’s version of record on Wiley Online Library and any embedding, framing or otherwise making available the article or pages thereof by third parties from platforms, services and websites other than Wiley Online Library must be prohibited. |
Digital Object Identifier (DOI) | https://doi.org/10.1002/int.22813 |
Language | English |
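
The TF-IDF step named in the abstract (local and global weights over AST-MDrep features) can be illustrated with a minimal sketch. It is not the authors' implementation: the token sequences below are hand-written, hypothetical stand-ins for AST node types and method calls, the function names `tf_idf` and `cosine` are illustrative, and the smoothed IDF formula is one common choice that the abstract does not specify. The LSA and CNN-LSTM stages are not covered here; plain cosine similarity is swapped in only to show how the weighted vectors might be compared.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (a list of tokens)."""
    n_docs = len(docs)
    # Global weight: smoothed inverse document frequency (assumed formula).
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0
           for t, df in doc_freq.items()}
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        # Local weight: term frequency normalised by document length.
        weighted.append({t: (c / len(doc)) * idf[t]
                         for t, c in counts.items()})
    return weighted


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical token sequences; a real pipeline would obtain these from
# language-specific parsers and map them to a shared vocabulary.
java_sort = ["MethodDecl", "ForStmt", "ForStmt", "IfStmt",
             "ArrayAccess", "Assign", "call:swap"]
csharp_sort = ["MethodDecl", "ForStmt", "WhileStmt", "IfStmt",
               "ArrayAccess", "Assign", "call:swap"]
cpp_http = ["MethodDecl", "TryStmt", "call:connect",
            "call:send", "call:recv", "Return"]

vectors = tf_idf([java_sort, csharp_sort, cpp_http])
sim_sort = cosine(vectors[0], vectors[1])   # two sorting routines
sim_other = cosine(vectors[0], vectors[2])  # sorting vs. networking code
print(f"sort vs sort: {sim_sort:.3f}, sort vs http: {sim_other:.3f}")
```

In this toy setting the two sorting routines score higher than the sorting and networking pair because they share most of their weighted tokens. A shared cross-language vocabulary is assumed throughout.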
https://repository.mdx.ac.uk/item/89q19