Deep combination of radar with optical data for gesture recognition: role of attention in fusion architectures
Article
Towakel, P., Windridge, D. and Nguyen, H. 2023. Deep combination of radar with optical data for gesture recognition: role of attention in fusion architectures. IEEE Transactions on Instrumentation and Measurement. 72, pp. 1-15. https://doi.org/10.1109/TIM.2023.3307768
Type | Article |
---|---|
Title | Deep combination of radar with optical data for gesture recognition: role of attention in fusion architectures |
Authors | Towakel, P., Windridge, D. and Nguyen, H. |
Abstract | Multimodal time series classification is an important aspect of human gesture recognition, in which limitations of individual sensors can be overcome by combining data from multiple modalities. In a deep learning pipeline, the attention mechanism further allows for a selective, contextual concentration on relevant features. However, while the standard attention mechanism is an effective tool when working with Natural Language Processing (NLP), it is not ideal when working with temporally- or spatially-sparse multi-modal data. In this paper, we present a novel attention mechanism, Multi-Modal Attention Preconditioning (MMAP). We first demonstrate that MMAP outperforms regular attention for the task of classification of modalities involving temporal and spatial sparsity and secondly investigate the impact of attention in the fusion of radar and optical data for gesture recognition via three specific modalities: dense spatiotemporal optical data, spatially sparse/temporally dense kinematic data, and sparse spatiotemporal radar data. We explore the effect of attention on early, intermediate, and late fusion architectures and compare eight different pipelines in terms of accuracy and their ability to preserve detection accuracy when modalities are missing. Results highlight fundamental differences between late and intermediate attention mechanisms with respect to the fusion of radar and optical data. |
Keywords | Attention mechanism; deep learning; gesture recognition; multimodality; radar combination |
Sustainable Development Goals | 9 Industry, innovation and infrastructure |
Middlesex University Theme | Sustainability |
Research Group | London Digital Twin Research Centre |
Publisher | IEEE |
Journal | IEEE Transactions on Instrumentation and Measurement |
ISSN | 0018-9456 |
Electronic | 1557-9662 |
Publication dates | |
Online | 23 Aug 2023 |
07 Sep 2023 | |
Publication process dates | |
Submitted | 10 May 2023 |
Accepted | 29 Jul 2023 |
Deposited | 21 Sep 2023 |
Output status | Published |
Accepted author manuscript | |
Copyright Statement | © 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
Digital Object Identifier (DOI) | https://doi.org/10.1109/TIM.2023.3307768 |
Web of Science identifier | WOS:001067706000011 |
Language | English |
https://repository.mdx.ac.uk/item/q170q