Labelled vulnerability dataset on Android source code (LVDAndro) to develop AI-based code vulnerability detection models

Conference paper


Senanayake, J., Kalutarage, H., Al-Kadri, M.O., Piras, L. and Petrovski, A. 2023. Labelled vulnerability dataset on Android source code (LVDAndro) to develop AI-based code vulnerability detection models. Vimercati, S. and Samarati, P. (ed.) International Conference on Security and Cryptography (SECRYPT) 2023. Rome, Italy 10 - 12 Jul 2023 SCITEPRESS - Science and Technology Publications. pp. 659-666 https://doi.org/10.5220/0012060400003555
Type: Conference paper
Title: Labelled vulnerability dataset on Android source code (LVDAndro) to develop AI-based code vulnerability detection models
Authors: Senanayake, J., Kalutarage, H., Al-Kadri, M.O., Piras, L. and Petrovski, A.
Abstract

Ensuring the security of Android applications is a vital and intricate aspect of development that requires careful consideration. Unfortunately, many apps are published without sufficient security measures, possibly due to a lack of early vulnerability identification. One possible solution is to employ machine learning models trained on a labelled dataset, but currently available datasets are suboptimal. This study creates a sequence of datasets of Android source code vulnerabilities, named LVDAndro, labelled based on Common Weakness Enumeration (CWE). Three datasets were generated through app scanning by altering the number of apps and their sources. LVDAndro includes over 2,000,000 unique code samples, obtained by scanning over 15,000 apps. The AutoML technique was then applied to each dataset as a proof of concept to evaluate the applicability of LVDAndro in detecting vulnerable source code using machine learning. The AutoML model trained on the dataset achieved an accuracy of 94% and an F1-Score of 0.94 in binary classification, and an accuracy of 94% and an F1-Score of 0.93 in CWE-based multi-class classification. The LVDAndro dataset is publicly available and continues to expand as more apps are regularly scanned and added to it. The LVDAndro GitHub repository also includes the source code for dataset generation and model training.
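The abstract reports results for two labelling schemes built from the same scan output: a binary vulnerable/not-vulnerable label and a CWE-based multi-class label, each scored by accuracy and F1. The Python sketch below illustrates how a binary label can be derived from CWE labels and how these metrics are computed. It is not taken from the LVDAndro repository: the `NOT_VULNERABLE` placeholder, the toy labels (CWE-89, CWE-312), and the use of support-weighted F1 for the multi-class case are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical label for code lines with no reported CWE (an assumption,
# not necessarily the value used in LVDAndro).
NOT_VULNERABLE = "NOT_VULNERABLE"


def to_binary(cwe_labels):
    """Collapse CWE-based labels into 1 (vulnerable) or 0 (not vulnerable)."""
    return [0 if label == NOT_VULNERABLE else 1 for label in cwe_labels]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, cls):
    """Per-class F1 = 2PR / (P + R); returns 0.0 when there are no true positives."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 over the classes present in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(f1_for_class(y_true, y_pred, c) * n / total for c, n in support.items())


if __name__ == "__main__":
    # Toy CWE-labelled ground truth and predictions (illustrative only).
    y_true = ["CWE-89", NOT_VULNERABLE, "CWE-312", NOT_VULNERABLE]
    y_pred = ["CWE-89", NOT_VULNERABLE, "CWE-89", NOT_VULNERABLE]

    # Multi-class view: one CWE-312 sample is mislabelled as CWE-89.
    print(accuracy(y_true, y_pred))                # 0.75
    print(round(weighted_f1(y_true, y_pred), 3))   # 0.667

    # Binary view: both CWEs collapse to "vulnerable", so the error disappears.
    b_true, b_pred = to_binary(y_true), to_binary(y_pred)
    print(accuracy(b_true, b_pred))                # 1.0
    print(f1_for_class(b_true, b_pred, 1))         # 1.0
```

The toy example shows why binary and multi-class scores can differ on the same predictions: confusing one CWE for another is an error only in the multi-class view.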

Keywords: Android Application Security; Code Vulnerability; Labelled Dataset; Artificial Intelligence; Auto Machine Learning.
Sustainable Development Goals: 9 Industry, innovation and infrastructure
Middlesex University Theme: Creativity, Culture & Enterprise
Research Group: Software Engineering, Theory & Algorithms (SETA)
Conference: International Conference on Security and Cryptography (SECRYPT) 2023
Page range: 659-666
Proceedings Title: Proceedings of the 20th International Conference on Security and Cryptography, SECRYPT - Volume 1
Series: SECRYPT
Editors: Vimercati, S. and Samarati, P.
ISSN: 2184-7711
ISBN: 9789897586668
Publisher: SCITEPRESS - Science and Technology Publications
Publication dates
Print: 10 Jul 2023
Publication process dates
Accepted: 23 Apr 2023
Deposited: 18 Jul 2023
Output status: Published
Publisher's version
License: CC BY-NC-ND 4.0
File access level: Open
Copyright Statement

Senanayake, J., Kalutarage, H., Al-Kadri, M., Piras, L. and Petrovski, A., Labelled Vulnerability Dataset on Android Source Code (LVDAndro) to Develop AI-Based Code Vulnerability Detection Models. DOI: 10.5220/0012060400003555
In Proceedings of the 20th International Conference on Security and Cryptography (SECRYPT 2023), pages 659-666
ISBN: 978-989-758-666-8; ISSN: 2184-7711
Copyright © 2023 by SCITEPRESS – Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)

Digital Object Identifier (DOI): https://doi.org/10.5220/0012060400003555
Web of Science identifier: WOS:001072829100063
Web address (URL) of conference proceedings: https://doi.org/10.5220/0000167900003555
Language: English
Permalink: https://repository.mdx.ac.uk/item/8q739

Download files


Publisher's version
SECRYPT-23_Labelled_LVDAndro.pdf
License: CC BY-NC-ND 4.0
File access level: Open


Related outputs

Assuring privacy of AI-powered community driven Android code vulnerability detection
Senanayake, J., Kalutarage, H., Piras, L., Al-Kadri, M.O. and Petrovski, A. 2024. Assuring privacy of AI-powered community driven Android code vulnerability detection. 3rd International Workshop on System Security Assurance. Bydgoszcz, Poland 19 - 20 Sep 2024 Springer.
Formalizing federated learning and differential privacy for GIS systems in IIIf
Kammueller, F., Piras, L., Fields, B. and Nagarajan, R. 2024. Formalizing federated learning and differential privacy for GIS systems in IIIf. 3rd International Workshop on System Security Assurance. Bydgoszcz, Poland 19 - 20 Sep 2024 Springer.
Model-based gamification design with Web-Agon: an automated analysis tool for gamification
Zaw, H., Piras, L., Calabrese, F. and Al-Obeidallah, M. 2024. Model-based gamification design with Web-Agon: an automated analysis tool for gamification. 50th Euromicro Conference Series on Software Engineering and Advanced Applications. Paris, France 28 - 30 Aug 2024 IEEE.
Defendroid: real-time Android code vulnerability detection via blockchain federated neural network with XAI
Senanayake, L., Kalutarage, H., Petrovski, A., Piras, L. and Al-Kadri, M. 2024. Defendroid: real-time Android code vulnerability detection via blockchain federated neural network with XAI. Journal of Information Security and Applications. 82. https://doi.org/10.1016/j.jisa.2024.103741
Gamification of E-Learning apps via acceptance requirements analysis
Calabrese, L., Piras, L., Al-Obeidallah, M., Egbikuadje, B. and Alkubaisy, D. 2024. Gamification of E-Learning apps via acceptance requirements analysis. 19th International Conference on Evaluation of Novel Approaches to Software Engineering. Angers, France 28 - 29 Apr 2024 SCITEPRESS - Science and Technology Publications. pp. 291-298 https://doi.org/10.5220/0012550400003687
FedREVAN: real-time detection of vulnerable Android source code through federated neural network with XAI
Senanayake, J., Kalutarage, H., Petrovski, A., Al-Kadri, M.O. and Piras, L. 2024. FedREVAN: real-time detection of vulnerable Android source code through federated neural network with XAI. ESORICS Workshop on Attacks and Software Protection (WASP). The Hague, The Netherlands 25 - 29 Sep 2023 Springer. pp. 426-441 https://doi.org/10.1007/978-3-031-54129-2_25
Android code vulnerabilities early detection using AI-powered ACVED plugin
Senanayake, J., Kalutarage, H., Al-Kadri, M.O., Petrovski, A. and Piras, L. 2023. Android code vulnerabilities early detection using AI-powered ACVED plugin. Atluri, V. and Ferrara, A. (ed.) 37th Annual IFIP WG 11.3 Conference (DBSec 2023). Sophia-Antipolis, France 19 - 21 Jul 2023 Cham, Switzerland. Springer. pp. 339–357 https://doi.org/10.1007/978-3-031-37586-6_20
Goal-modeling privacy-by-design patterns for supporting GDPR compliance
Al-Obeidallah, M., Piras, L., Iloanugo, O., Mouratidis, H., Alkubaisy, D. and Dellagiacoma, D. 2023. Goal-modeling privacy-by-design patterns for supporting GDPR compliance. Fill, H.-G., Domínguez-Mayo, F.J., van Sinderen, M. and Maciaszek, L. (ed.) International Conference on Software Technologies (ICSOFT). Rome, Italy 10 - 12 Jul 2023 SCITEPRESS - Science and Technology Publications. pp. 361-368 https://doi.org/10.5220/0012080700003538
Android source code vulnerability detection: a systematic literature review
Senanayake, J., Kalutarage, H., Al-Kadri, M.O., Petrovski, A. and Piras, L. 2023. Android source code vulnerability detection: a systematic literature review. ACM Computing Surveys. 55 (9). https://doi.org/10.1145/3556974
A framework for privacy and security requirements analysis and conflict resolution for supporting GDPR compliance through privacy-by-design
Alkubaisy, D., Piras, L., Al-Obeidallah, M., Cox, K. and Mouratidis, H. 2022. A framework for privacy and security requirements analysis and conflict resolution for supporting GDPR compliance through privacy-by-design. Ali, R., Kaindl, H. and Maciaszek, L. (ed.) 16th International Conference on Evaluation of Novel Approaches to Software Engineering. Virtual 26 - 27 Apr 2021 Cham Springer. pp. 67-87 https://doi.org/10.1007/978-3-030-96648-5_4
Developing secured Android applications by mitigating code vulnerabilities with machine learning
Senanayake, J., Kalutarage, H., Al-Kadri, M., Petrovski, A. and Piras, L. 2022. Developing secured Android applications by mitigating code vulnerabilities with machine learning. ACM Asia Conference on Computer and Communications Security (ASIA CCS '22). Nagasaki, Japan 30 May - 03 Jun 2022 Association for Computing Machinery (ACM). pp. 1255–1257 https://doi.org/10.1145/3488932.3527290
Supporting the individuation, analysis and gamification of software components for acceptance requirements fulfilment
Calabrese, F., Piras, L. and Giorgini, P. 2022. Supporting the individuation, analysis and gamification of software components for acceptance requirements fulfilment. Barn, B. and Sandkuhl, K. (ed.) IFIP Working Conference on The Practice of Enterprise Modeling. London, UK 23 - 25 Nov 2022 Springer. pp. 33-48 https://doi.org/10.1007/978-3-031-21488-2_3
Confis: a tool for privacy and security analysis and conflict resolution for supporting GDPR compliance through privacy-by-design
Alkubaisy, D., Piras, L., Al-Obeidallah, M., Cox, K. and Mouratidis, H. 2021. Confis: a tool for privacy and security analysis and conflict resolution for supporting GDPR compliance through privacy-by-design. Ali, R., Kaindl, H. and Maciaszek, L. (ed.) 16th International Conference on Evaluation of Novel Approaches to Software Engineering. Virtual 26 - 27 Apr 2021 SCITEPRESS - Science and Technology Publications. pp. 80-91 https://doi.org/10.5220/0010406100800091
Privacy, security, legal and technology acceptance requirements for a GDPR compliance platform
Tsohou, A., Magkos, M., Mouratidis, H., Chrysoloras, G., Piras, L., Pavlidis, M., Debussche, J., Rotoloni, M. and Gallego-Nicasio Crespo, B. 2020. Privacy, security, legal and technology acceptance requirements for a GDPR compliance platform. 2019 International Workshop on Security and Privacy Requirements Engineering. Luxembourg City, Luxembourg 26 - 27 Sep 2019 Springer. https://doi.org/10.1007/978-3-030-42048-2_14
DEFeND DSM: a data scope management service for model-based privacy by design GDPR compliance
Piras, L., Al-Obeidallah, M., Pavlidis, M., Mouratidis, H., Tsohou, A., Magkos, E., Praitano, A., Iodice, A. and Gallego-Nicasio Crespo, B. 2020. DEFeND DSM: a data scope management service for model-based privacy by design GDPR compliance. 17th International Conference on Trust and Privacy in Digital Business. Bratislava, Slovakia 14 - 17 Sep 2020 Springer. https://doi.org/10.1007/978-3-030-58986-8_13
Design thinking and acceptance requirements for designing gamified software
Piras, L., Dellagiacoma, D., Perini, A., Susi, A., Giorgini, P. and Mylopoulos, J. 2019. Design thinking and acceptance requirements for designing gamified software. 13th International Conference on Research Challenges in Information Science. Brussels, Belgium 29 - 31 May 2019 IEEE. pp. 1-12 https://doi.org/10.1109/rcis.2019.8876973
Goal-oriented requirements engineering: an extended systematic mapping study
Horkoff, J., Aydemir, F., Cardoso, E., Li, T., Mate, A., Paja, E., Salnitri, M., Piras, L., Mylopoulos, J. and Giorgini, P. 2019. Goal-oriented requirements engineering: an extended systematic mapping study. Requirements Engineering. 24 (2), pp. 133-160. https://doi.org/10.1007/s00766-017-0280-z
DEFeND architecture: a privacy by design platform for GDPR compliance
Piras, L., Al-Obeidallah, M., Praitano, A., Tsohou, A., Mouratidis, H., Gallego-Nicasio Crespo, B., Bernard, J., Fiorani, M., Magkos, E., Castillo Sanz, A., Pavlidis, M., D'Addario, R. and Zorzino, G. 2019. DEFeND architecture: a privacy by design platform for GDPR compliance. 16th International Conference on Trust, Privacy and Security in Digital Business. Linz, Austria 26 - 29 Aug 2019 Springer. https://doi.org/10.1007/978-3-030-27813-7_6
Goal models for acceptance requirements analysis and gamification design
Piras, L., Paja, E., Giorgini, P. and Mylopoulos, J. 2017. Goal models for acceptance requirements analysis and gamification design. Mayr, H.C., Guizzardi, G., Ma, H. and Pastor, O. (ed.) 36th International Conference on Conceptual Modeling. Valencia, Spain 06 - 09 Nov 2017 Cham Springer. pp. 223-230 https://doi.org/10.1007/978-3-319-69904-2_18
Gamification solutions for software acceptance: a comparative study of requirements engineering and organizational behavior techniques
Piras, L., Paja, E., Giorgini, P., Mylopoulos, J., Cuel, R. and Ponte, D. 2017. Gamification solutions for software acceptance: a comparative study of requirements engineering and organizational behavior techniques. 11th International Conference on Research Challenges in Information Science. Brighton, UK 10 - 12 May 2017 IEEE. pp. 255-265 https://doi.org/10.1109/rcis.2017.7956544
Acceptance requirements and their gamification solutions
Piras, L., Giorgini, P. and Mylopoulos, J. 2016. Acceptance requirements and their gamification solutions. IEEE 24th International Requirements Engineering Conference. Beijing, China 12 - 16 Sep 2016 IEEE. pp. 365-370 https://doi.org/10.1109/RE.2016.43
Using gamification to incentivize sustainable urban mobility
Kazhamiakin, R., Marconi, A., Perillo, M., Pistore, M., Valetto, G., Piras, L., Avesani, F. and Perri, N. 2015. Using gamification to incentivize sustainable urban mobility. IEEE First International Smart Cities Conference. Guadalajara, Mexico 25 - 28 Oct 2015 IEEE. https://doi.org/10.1109/ISC2.2015.7366196
A portable wireless-based architecture for solving minimum digital divide problems
Fenu, G. and Piras, L. 2008. A portable wireless-based architecture for solving minimum digital divide problems. 4th International Conference on Wireless and Mobile Communications. Athens, Greece 27 Jul - 01 Aug 2008 IEEE. pp. 130-136 https://doi.org/10.1109/icwmc.2008.21