Endoscopic image classification using vision transformers
Conference paper
Bissoonauth-Daiboo, P., Heenaye-Mamode Khan, M., Auzine, M., Gao, X., Baichoo, S. and Heetun, Z. 2023. Endoscopic image classification using vision transformers. 7th International Conference on Advances in Artificial Intelligence. Istanbul, Turkey, 13-15 Oct 2023. Association for Computing Machinery (ACM). pp. 128–132. https://doi.org/10.1145/3633598.3633623
Type | Conference paper |
---|---|
Title | Endoscopic image classification using vision transformers |
Authors | Bissoonauth-Daiboo, P., Heenaye-Mamode Khan, M., Auzine, M., Gao, X., Baichoo, S. and Heetun, Z. |
Abstract | Convolutional Neural Networks (CNNs) have been the state-of-the-art technique in medical imaging for numerous image processing tasks. Recently, vision transformer networks have been emerging as another technique, complementing current CNNs in the medical field by providing on-par performance while also having a number of unique characteristics that may be useful for medical image processing. While CNNs have been predominantly applied to artefact detection and classification in endoscopic images, ViT has been sparsely applied in this area. Additionally, both CNN and ViT have been sparingly applied to colour misalignment artefact classification. In this work, we therefore explore the application of Vision Transformer (ViT) to the classification of artefacts in endoscopic images of the gastrointestinal tract organs. Furthermore, the performance of ViT is compared to that of CNN in the classification of colour misalignment artefacts. Our customised ViT model, based on DeiT (Data-efficient image Transformers), obtained an accuracy of 96.33%, compared with 78.67% for the CNN-based Inceptionv3 model and 76.67% for InceptionResNetv2. The results demonstrate that, when pretrained on ImageNet, ViT offers better performance than CNNs in colour misalignment artefact classification. This is due to the ability of ViT to better depict the relationship between image pixels through self-attention weights. Moreover, the built-in self-attention mechanism offers fresh insight into the decision-making processes of the model. |
Sustainable Development Goals | 3 Good health and well-being |
Middlesex University Theme | Health & Wellbeing |
Conference | 7th International Conference on Advances in Artificial Intelligence |
Page range | 128–132 |
Proceedings Title | ICAAI '23: Proceedings of the 2023 7th International Conference on Advances in Artificial Intelligence |
Series | ACM International Conference Proceeding Series |
ISBN | 9798400708985 |
Publisher | Association for Computing Machinery (ACM) |
Publication dates | |
Online | 13 Oct 2023 |
Publication process dates | |
Accepted | 10 Aug 2023 |
Deposited | 21 Mar 2025 |
Output status | Published |
Publisher's version | File access level: Restricted |
Digital Object Identifier (DOI) | https://doi.org/10.1145/3633598.3633623 |
Scopus EID | 2-s2.0-85184103093 |
Language | English |
https://repository.mdx.ac.uk/item/124y1w