Towards real-time detection of squamous pre-cancers from oesophageal endoscopic videos
Conference paper
Gao, X., Braden, B., Taylor, S. and Pang, W. 2019. Towards real-time detection of squamous pre-cancers from oesophageal endoscopic videos. ICMLA 2019. Boca Raton, Florida, USA, 16-19 Dec 2019. IEEE. pp. 1606-1612. https://doi.org/10.1109/ICMLA.2019.00264
Type | Conference paper |
---|---|
Title | Towards real-time detection of squamous pre-cancers from oesophageal endoscopic videos |
Authors | Gao, X., Braden, B., Taylor, S. and Pang, W. |
Abstract | This study investigates the feasibility of applying state-of-the-art deep learning techniques to detect precancerous stages of squamous cell carcinoma (SCC) in real time, addressing two challenges: diagnosing SCC from subtle appearance changes, and video processing speed. Two deep learning models are implemented: one to identify artefacts in video frames, and one to detect, segment and classify the artefact-free frames. For detection of SCC, both mask-RCNN and YOLOv3 architectures are implemented. In addition, to ensure that one bounding box is detected for each region of interest instead of multiple duplicated boxes, a faster non-maxima suppression (NMS) technique is applied on top of the predictions. As a result, the developed system can process videos at 16-20 frames per second. Three classes of SCC are classified: ‘suspicious’, ‘high grade’ and ‘cancer’. For videos with a resolution of 1920x1080 pixels, the average processing time when applying YOLOv3 is in the range of 0.064-0.101 seconds per frame, i.e. 10-15 frames per second, running under the Windows 10 operating system with 1 GPU (GeForce GTX 1060). The average accuracies for classification and detection are 85% and 74% respectively. Since YOLOv3 only provides bounding boxes, mask-RCNN is also evaluated in order to delineate lesioned regions. While a better detection result is achieved with 77% accuracy, the classification accuracy of 84% is similar to that of YOLOv3. However, the processing speed is more than 10 times slower, with an average of 1.2 seconds per frame, due to the creation of masks. The accuracy of segmentation by mask-RCNN is 63%. These results are based on datasets of 350 images. Further improvement is hence needed in future by collecting, annotating or augmenting more data. |
Keywords | Videos; cancer; machine learning; image color analysis; real-time systems; Proposals; image segmentation; oesophagus endoscopy; pre-cancer detection; deep learning; real-time video processing; segmentation |
Conference | ICMLA 2019 |
Page range | 1606-1612 |
Proceedings Title | Proceedings: 2019 18th IEEE International Conference on Machine Learning and Applications, ICMLA 2019 |
ISBN | |
Electronic | 9781728145501 |
Publisher | IEEE |
Publication dates | |
16 Dec 2019 | |
Online | 17 Feb 2020 |
Publication process dates | |
Deposited | 18 Oct 2019 |
Accepted | 07 Oct 2019 |
Output status | Published |
Accepted author manuscript | |
Copyright Statement | © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
Digital Object Identifier (DOI) | https://doi.org/10.1109/ICMLA.2019.00264 |
Scopus EID | 2-s2.0-85080892318 |
Web address (URL) of conference proceedings | https://ieeexplore.ieee.org/xpl/conhome/8974348/proceeding |
Related Output | |
Has metadata | http://www.scopus.com/inward/record.url?eid=2-s2.0-85080892318&partnerID=MN8TOARS |
Language | English |
https://repository.mdx.ac.uk/item/8888x
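Illustrative note on the abstract: the abstract states that non-maxima suppression (NMS) is applied so that each region of interest ends up with one bounding box rather than several duplicates. The sketch below shows the standard greedy form of NMS in plain Python, not the "faster" variant the paper refers to, whose details are not given in this record. The box format `(x1, y1, x2, y2)`, the function names `iou` and `nms`, the 0.5 overlap threshold, and the sample boxes are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) in pixels, with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of the boxes to keep, highest score first.

    A box is kept only if it does not overlap any already-kept box by more
    than iou_threshold (a hypothetical default, not a value from the paper).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Made-up detections on a 1920x1080 frame: two near-duplicates and one separate box.
    boxes = [(100, 100, 300, 300), (110, 110, 310, 310), (600, 400, 800, 600)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))  # prints [0, 2]
```

In this example the second box overlaps the first with an IoU of about 0.82, so it is suppressed, while the third box does not overlap either and is kept. In a multi-class setting such as the three SCC classes named in the abstract, NMS is commonly run separately per class, though whether the paper does so is not stated here.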