Unsupervised grounding of textual descriptions of object features and actions in video

Conference paper


Alomari, M., Chinellato, E., Gatsoulis, Y., Hogg, D. and Cohn, A. 2016. Unsupervised grounding of textual descriptions of object features and actions in video. 15th International Conference on Principles of Knowledge Representation and Reasoning (KR 2016). Cape Town, South Africa, 25 - 29 Apr 2016. Association for the Advancement of Artificial Intelligence (AAAI). pp. 505-508
Type: Conference paper
Title: Unsupervised grounding of textual descriptions of object features and actions in video
Authors: Alomari, M., Chinellato, E., Gatsoulis, Y., Hogg, D. and Cohn, A.
Abstract

We propose a novel method for learning visual concepts and their correspondence to the words of a natural language. The concepts and correspondences are jointly inferred from video clips depicting simple actions involving multiple objects, together with corresponding natural language commands that would elicit these actions. Individual objects are first detected, together with quantitative measurements of their colour, shape, location and motion. Visual concepts emerge from the co-occurrence of regions within a measurement space and words of the language. The method is evaluated on a set of videos generated automatically using computer graphics from a database of initial and goal configurations of objects. Each video is annotated with multiple natural language commands obtained from human annotators through crowdsourcing.
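As a rough illustration of the co-occurrence idea in the abstract, the sketch below counts how often each word in a command appears alongside each discretised region of a measurement space, then picks the region most strongly associated with a word. This is not the authors' algorithm: the function names (`cooccurrence_scores`, `best_region`), the region labels such as `hue:red`, and the toy clips are all hypothetical, and the score is a plain conditional frequency chosen for simplicity.

```python
from collections import Counter, defaultdict


def cooccurrence_scores(clips):
    """Estimate P(region | word) from (words, regions) pairs.

    Each clip is a hypothetical pair: the words of one command and the
    labels of the measurement-space regions observed in the video.
    """
    word_counts = Counter()
    pair_counts = defaultdict(Counter)
    for words, regions in clips:
        # Count each word and each region at most once per clip.
        for word in set(words):
            word_counts[word] += 1
            for region in set(regions):
                pair_counts[word][region] += 1
    return {
        word: {region: n / word_counts[word] for region, n in counts.items()}
        for word, counts in pair_counts.items()
    }


def best_region(scores, word):
    """Return the region with the highest co-occurrence score for a word."""
    return max(scores[word].items(), key=lambda item: item[1])[0]


# Toy data (invented for illustration only).
clips = [
    ("pick up the red block".split(), ["hue:red", "shape:cube"]),
    ("move the red ball".split(), ["hue:red", "shape:sphere"]),
    ("pick up the green block".split(), ["hue:green", "shape:cube"]),
]
scores = cooccurrence_scores(clips)
print(best_region(scores, "red"))
print(best_region(scores, "block"))
```

With so little data, a word seen in only one clip (here "green") ties across every region in that clip; in this simplified setting, many varied clips would be needed to separate such cases.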

Language: English
Conference: 15th International Conference on Principles of Knowledge Representation and Reasoning (KR 2016)
Page range: 505-508
ISBN
Hardcover: 9781577357551
Publisher: Association for the Advancement of Artificial Intelligence (AAAI)
Publication dates
Print: 25 Apr 2016
Publication process dates
Deposited: 05 May 2016
Accepted: 21 Jan 2016
Output status: Published
Accepted author manuscript
Copyright Statement

This is the author's accepted manuscript, included in this repository with permission granted on 16/02/17 by the publisher, AAAI. The final published paper appears as: Alomari, Muhannad, Chinellato, Eris, Gatsoulis, Yiannis, Hogg, David, and Cohn, Anthony. "Unsupervised Grounding of Textual Descriptions of Object Features and Actions in Video." Knowledge Representation and Reasoning Conference 2016. Published by the Association for the Advancement of Artificial Intelligence (AAAI), available at: http://www.aaai.org/ocs/index.php/KR/KR16/paper/view/12827

Web address (URL): http://www.aaai.org/ocs/index.php/KR/KR16/paper/view/12827/
Book title: Proceedings, Fifteenth International Conference on Principles of Knowledge Representation and Reasoning (KR-16)
Permalink: https://repository.mdx.ac.uk/item/8658y

