Exploring vision transformers and explainable AI for enhanced artefact classification in esophageal endoscopic images
Article
Bissoonauth-Daiboo, P., Auzine, M.M., Inshal, M., Shannaq, F., Saba, T. and Gao, X. 2025. Exploring vision transformers and explainable AI for enhanced artefact classification in esophageal endoscopic images. IEEE Access. 13, pp. 176221-176244. https://doi.org/10.1109/ACCESS.2025.3616796
| Type | Article |
|---|---|
| Title | Exploring vision transformers and explainable AI for enhanced artefact classification in esophageal endoscopic images |
| Authors | Bissoonauth-Daiboo, P., Auzine, M.M., Inshal, M., Shannaq, F., Saba, T. and Gao, X. |
| Abstract | Esophageal cancer (EC) remains the disease with the highest incidence and mortality rate in global cancer statistics, underscoring the need to improve diagnostic precision and reliability with advanced technologies. While AI-enhanced systems can considerably improve the early detection of EC, artefacts, present in roughly 1 in 4 frames during endoscopy procedures, significantly compromise these systems and lead to unreliable medical decision-making. Vision transformer (ViT) networks, which adapt the transformer architecture originally designed for natural language processing tasks, have demonstrated outstanding performance on medical images owing to distinctive architectural properties that benefit image processing. The application of ViT to detecting and classifying artefacts in endoscopic images, particularly colour misalignment artefacts, is still being refined. This work investigates the use of ViT to classify colour misalignment artefacts in esophageal endoscopy images. Moreover, although ViT has been a major breakthrough, its acceptance in real-world applications is often jeopardised by a lack of interpretability regarding how its classification results are reached. Consequently, Explainable Artificial Intelligence (XAI) techniques have been explored to understand the criteria behind the outcome. Several variants of the ViT and Data-efficient image Transformer (DeiT) networks have been fine-tuned on our dataset to improve and evaluate their performance in classifying colour misalignment in esophageal endoscopic images. Furthermore, XAI methods have been implemented to reveal the criteria the network uses to reach its classification results. Our fine-tuned ViT model achieves an accuracy of 93.46%, precision of 93.48%, recall of 93.46% and F1 score of 93.46%, surpassing InceptionResNetV2, a state-of-the-art CNN-based model, which achieves an accuracy of 89.10%, precision of 89.10%, recall of 89.10% and F1 score of 88.23%. Additionally, the GradCAM XAI technique was found to highlight the decisive features used by the ViT model better than the other XAI methods applied in this work. ViT achieves remarkable performance in classifying colour misalignment artefacts, outperforming CNNs; this is attributed to ViT's enhanced ability to capture pixel relationships through self-attention weights. In addition, the model's intrinsic self-attention provides novel insights into its decision-making mechanism. |
| Keywords | Endoscopic Artefact; Vision Transformer; Explainable AI; Healthcare |
| Sustainable Development Goals | 3 Good health and well-being |
| Middlesex University Theme | Health & Wellbeing |
| Research Group | Artificial Intelligence group |
| Publisher | IEEE |
| Journal | IEEE Access |
| ISSN (electronic) | 2169-3536 |
| Publication dates | Online: 06 Oct 2025; 16 Oct 2025 |
| Publication process dates | Accepted: Oct 2025; Deposited: 09 Oct 2025 |
| Output status | Published |
| Publisher's version | License file; access level: Open |
| Copyright Statement | This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ |
| Digital Object Identifier (DOI) | https://doi.org/10.1109/ACCESS.2025.3616796 |
| Web of Science identifier | WOS:001600088200012 |
| Language | English |
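
Both models in the abstract report recall identical to accuracy (93.46% for ViT, 89.10% for InceptionResNetV2). That pattern is what support-weighted averaging produces: weighted recall is $\sum_c \frac{n_c}{N}\cdot\frac{TP_c}{n_c} = \frac{\sum_c TP_c}{N}$, which is exactly accuracy, while weighted precision and F1 can still differ. The sketch below assumes weighted averaging, which the abstract does not state, and uses hypothetical labels ("misaligned", "clean") rather than the paper's data.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall and F1.

    Assumption: each class is weighted by its share of true labels
    (weighted averaging); the abstract does not name the averaging scheme.
    """
    n = len(y_true)
    support = Counter(y_true)      # true instances per class
    predicted = Counter(y_pred)    # predicted instances per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class

    metrics = {"accuracy": sum(tp.values()) / n,
               "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for c in support:              # classes never in y_true would get weight 0
        prec = tp[c] / predicted[c] if predicted[c] else 0.0
        rec = tp[c] / support[c]
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = support[c] / n
        metrics["precision"] += weight * prec
        metrics["recall"] += weight * rec
        metrics["f1"] += weight * f1
    return metrics


# Hypothetical labels for a binary artefact task (not the paper's data).
y_true = ["misaligned"] * 4 + ["clean"] * 2
y_pred = ["misaligned"] * 3 + ["clean"] * 3
print(weighted_metrics(y_true, y_pred))
```

In this toy case one misaligned frame is predicted as clean, so weighted recall equals accuracy (5/6) while weighted precision (8/9) does not.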
https://repository.mdx.ac.uk/item/2wqvq7
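
The abstract attributes ViT's advantage to self-attention weights that capture relationships between image regions, and notes that these weights offer insight into the model's decisions. A minimal, dependency-free sketch of single-head scaled dot-product self-attention over patch embeddings is shown below. The toy embeddings and identity projection matrices are hypothetical stand-ins; a real ViT uses learned projections, multiple heads, a class token and positional embeddings, and the paper's fine-tuned models are not reproduced here.

```python
import math


def softmax(row):
    m = max(row)                               # subtract max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    # Plain-list matrix product: rows of a times columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: N patch embeddings (N x d); w_q, w_k, w_v: d x d_k projection matrices.
    Returns (output, attention), where attention[i][j] is the weight patch i
    places on patch j; each attention row sums to 1.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(w_q[0])
    scores = [[sum(qi * ki for qi, ki in zip(q_row, k_row)) / math.sqrt(d_k)
               for k_row in k] for q_row in q]
    attn = [softmax(row) for row in scores]    # normalise scores per query patch
    return matmul(attn, v), attn


# Hypothetical 2-d embeddings for three image patches.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]            # stand-in for learned projections
out, attn = self_attention(patches, identity, identity, identity)
for row in attn:
    print([round(a, 3) for a in row])
```

The attention matrix printed here is the kind of quantity that attention-based explanations inspect: each row shows how strongly one patch draws on every other patch when forming its output.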