Labelled vulnerability dataset on Android source code (LVDAndro) to develop AI-based code vulnerability detection models

Conference paper


Senanayake, J., Kalutarage, H., Al-Kadri, M.O., Piras, L. and Petrovski, A. 2023. Labelled vulnerability dataset on Android source code (LVDAndro) to develop AI-based code vulnerability detection models. International Conference on Security and Cryptography (SECRYPT) 2023. Rome, Italy 10 - 12 Jul 2023 Rome (IT) SciTePress. pp. 659-666 https://doi.org/10.5220/0012060400003555
Type: Conference paper
Title: Labelled vulnerability dataset on Android source code (LVDAndro) to develop AI-based code vulnerability detection models
Authors: Senanayake, J., Kalutarage, H., Al-Kadri, M.O., Piras, L. and Petrovski, A.
Abstract

Ensuring the security of Android applications is a vital and intricate task that requires careful consideration during development. Unfortunately, many apps are published without sufficient security measures, possibly because vulnerabilities are not identified early. One possible solution is to employ machine learning models trained on a labelled dataset, but currently available datasets are suboptimal. This study creates a sequence of datasets of Android source code vulnerabilities, named LVDAndro, labelled according to the Common Weakness Enumeration (CWE). Three datasets were generated through app scanning by varying the number of apps and their sources. LVDAndro includes over 2,000,000 unique code samples, obtained by scanning over 15,000 apps. An AutoML technique was then applied to each dataset as a proof of concept, to evaluate the applicability of LVDAndro to detecting vulnerable source code using machine learning. The AutoML model trained on the dataset achieved an accuracy of 94% and an F1-Score of 0.94 in binary classification, and an accuracy of 94% and an F1-Score of 0.93 in CWE-based multi-class classification. The LVDAndro dataset is publicly available and continues to expand as more apps are scanned and added regularly. The LVDAndro GitHub repository also includes the source code for dataset generation and model training.

Keywords: Android Application Security; Code Vulnerability; Labelled Dataset; Artificial Intelligence; Auto Machine Learning.
Sustainable Development Goals: 9 Industry, innovation and infrastructure
Middlesex University Theme: Creativity, Culture & Enterprise
Research Group: Software Engineering, Theory & Algorithms (SETA)
Language: English
Conference: International Conference on Security and Cryptography (SECRYPT) 2023
Page range: 659-666
Proceedings Title: Proceedings of the 20th International Conference on Security and Cryptography, SECRYPT - Volume 1
Series: SECRYPT
ISSN: 2184-7711
ISBN: 9789897586668
Publisher: SciTePress
Place of publication: Rome (IT)
Publication dates
Online: Jul 2023
Print: 10 Jul 2023
Publication process dates
Accepted: 23 Apr 2023
Deposited: 18 Jul 2023
Output status: Published
Digital Object Identifier (DOI): https://doi.org/10.5220/0012060400003555
Web of Science identifier: WOS:001072829100063
Web address (URL) of conference proceedings: https://doi.org/10.5220/0000167900003555
File
File Access Level: Restricted
Permalink: https://repository.mdx.ac.uk/item/8q739

Restricted files

Accepted author manuscript

Related outputs

Android code vulnerabilities early detection using AI-powered ACVED plugin
Senanayake, J., Kalutarage, H., Al-Kadri, M.O., Petrovski, A. and Piras, L. 2023. Android code vulnerabilities early detection using AI-powered ACVED plugin. Atluri, V. and Ferrara, A. (ed.) 37th Annual IFIP WG 11.3 Conference (DBSec 2023). Sophia-Antipolis, France 19 - 21 Jul 2023 Cham Springer. pp. 339–357 https://doi.org/10.1007/978-3-031-37586-6_20
FedREVAN: real-time detection of vulnerable Android source code through federated neural network with XAI
Senanayake, J., Kalutarage, H., Petrovski, A., Al-Kadri, M.O. and Piras, L. 2023. FedREVAN: real-time detection of vulnerable Android source code through federated neural network with XAI. ESORICS Workshop on Attacks and Software Protection (WASP). The Hague, The Netherlands 25 - 29 Sep 2023 Springer.
Goal-modeling privacy-by-design patterns for supporting GDPR compliance
Al-Obeidallah, M., Piras, L., Iloanugo, O., Mouratidis, H., Alkubaisy, D. and Dellagiacoma, D. 2023. Goal-modeling privacy-by-design patterns for supporting GDPR compliance. International Conference on Software Technologies (ICSOFT). Rome (Italy) 10 - 12 Jul 2023 Rome (IT) SciTePress. https://doi.org/10.5220/0012080700003538
Android source code vulnerability detection: a systematic literature review
Senanayake, J., Kalutarage, H., Al-Kadri, M.O., Petrovski, A. and Piras, L. 2023. Android source code vulnerability detection: a systematic literature review. ACM Computing Surveys. 55 (9). https://doi.org/10.1145/3556974
Supporting the individuation, analysis and gamification of software components for acceptance requirements fulfilment
Calabrese, F., Piras, L. and Giorgini, P. 2022. Supporting the individuation, analysis and gamification of software components for acceptance requirements fulfilment. Barn, B. and Sandkuhl, K. (ed.) IFIP Working Conference on The Practice of Enterprise Modeling. London 23 - 25 Nov 2022 Springer. pp. 33-48 https://doi.org/10.1007/978-3-031-21488-2_3
Goal models for acceptance requirements analysis and gamification design
Piras, L., Paja, E., Giorgini, P. and Mylopoulos, J. 2017. Goal models for acceptance requirements analysis and gamification design. Mayr, H.C., Guizzardi, G., Ma, H. and Pastor, O. (ed.) 36th International Conference on Conceptual Modeling. Valencia 2017 Cham Springer. pp. 223-230 https://doi.org/10.1007/978-3-319-69904-2_18
Using gamification to incentivize sustainable urban mobility
Kazhamiakin, Raman, Marconi, Annapaola, Perillo, Mirko, Pistore, Marco, Valetto, Giuseppe, Piras, Luca, Avesani, Francesco and Perri, Nicola 2015. Using gamification to incentivize sustainable urban mobility. IEEE International Smart Cities Conference. IEEE. https://doi.org/10.1109/ISC2.2015.7366196