Unsupervised grounding of textual descriptions of object features and actions in video
Conference paper
Alomari, M., Chinellato, E., Gatsoulis, Y., Hogg, D. and Cohn, A. 2016. Unsupervised grounding of textual descriptions of object features and actions in video. 15th International Conference on Principles of Knowledge Representation and Reasoning (KR 2016). Cape Town, South Africa, 25-29 Apr 2016. Association for the Advancement of Artificial Intelligence (AAAI), pp. 505-508.
Type | Conference paper |
---|---|
Title | Unsupervised grounding of textual descriptions of object features and actions in video |
Authors | Alomari, M., Chinellato, E., Gatsoulis, Y., Hogg, D. and Cohn, A. |
Abstract | We propose a novel method for learning visual concepts and their correspondence to the words of a natural language. The concepts and correspondences are jointly inferred from video clips depicting simple actions involving multiple objects, together with corresponding natural language commands that would elicit these actions. Individual objects are first detected, together with quantitative measurements of their colour, shape, location and motion. Visual concepts emerge from the co-occurrence of regions within a measurement space and words of the language. The method is evaluated on a set of videos generated automatically using computer graphics from a database of initial and goal configurations of objects. Each video is annotated with multiple commands in natural language obtained from human annotators using crowd sourcing. |
Conference | 15th International Conference on Principles of Knowledge Representation and Reasoning (KR 2016) |
Page range | 505-508 |
ISBN | |
Hardcover | 9781577357551 |
Publisher | Association for the Advancement of Artificial Intelligence (AAAI) |
Publication dates | 25 Apr 2016 |
Publication process dates | |
Deposited | 05 May 2016 |
Accepted | 21 Jan 2016 |
Output status | Published |
Accepted author manuscript | |
Copyright Statement | This is the author's accepted manuscript included in this repository with permission, granted on 16/02/17 by the publisher AAAI. The final published paper appears as: "Alomari, Muhannad, Chinellato, Eris, Gatsoulis, Yiannis, Hogg, David, AND Cohn, Anthony. "Unsupervised Grounding of Textual Descriptions of Object Features and Actions in Video" Knowledge Representation and Reasoning Conference 2016". Published by the Association for the Advancement of Artificial Intelligence (AAAI), available at: http://www.aaai.org/ocs/index.php/KR/KR16/paper/view/12827 |
Web address (URL) | http://www.aaai.org/ocs/index.php/KR/KR16/paper/view/12827/ |
Language | English |
Book title | Proceedings, Fifteenth International Conference on Principles of Knowledge Representation and Reasoning (KR-16) |
https://repository.mdx.ac.uk/item/8658y