CIMeC - Centro interdipartimentale Mente/Cervello

Language and Vision (LaVi)

Language and vision are two fundamental modalities through which human beings acquire knowledge about the world. We see and speak about the things and events around us, and by doing so we learn the properties of objects and the relations among them. The two modalities are highly interdependent, and we constantly combine the information we acquire through them. However, computational models of language and of vision have developed separately, and the two research communities were for a long time unaware of each other's work. Interestingly, along these parallel research lines they have developed highly compatible representations of words and images, respectively.

The importance of developing computational models of language and vision together has been highlighted by philosophers and cognitive scientists since the birth of the Artificial Intelligence paradigm. Only recently, however, have computational linguists and computer vision researchers taken up the challenge empirically.

In the last two decades, the availability of large amounts of text on the web has led to tremendous improvements in NLP research. Sophisticated textual search engines are now well established and part of everybody's daily life. Images are the natural next challenge for the digital society, and combining language and vision is the most promising way to meet it.

The UniTrento researchers are at the forefront of this new challenge. Driven by theoretical questions, we look at applications as the test bed for our models. The focus so far has been on multimodal models that combine linguistic and visual vector representations, on cross-modal mapping from visual to linguistic space, and on the enhancement of visual recognizers through language models. Recent LaVi results at UniTrento have benefited from close collaboration with the team of the ERC project COMPOSES and with the MHUG Research Group.

We are part of the COST Action "The European Network on Integrating Vision and Language".

People

CURRENT STAFF-STUDENTS

  1. Raffaella Bernardi (Language-vision area coordinator, senior researcher)
  2. Marco Baroni (senior researcher, on leave at Facebook)
  3. Aurelie Herbelot (senior researcher)
  4. Sandro Pezzelle (PhD student)
  5. Ravi Shekhar (PhD student)
  6. Ionut-Teodor Sorodoc (research assistant)
  7. Yauhen Klimovich (EM LCT student)

FORMER STAFF-STUDENTS

  1. Gemma Boleda (senior researcher)
  2. Angeliki Lazaridou (former PhD student)
  3. Elia Bruni (former PhD student)
  4. Le Diu Thu (former PhD student)
  5. Giovanni Cassani (former MSc student)
  6. Dat Tien Nguyen (EM LCT student)
  7. Laura Bostan (EM LCT student)

Publications

  • E. Bruni, J. Uijlings, M. Baroni and N. Sebe (2012) Distributional semantics with eyes: Using image analysis to improve computational representations of word meaning. Brave New Idea paper. Proceedings of MM '12 (20th ACM International Conference on Multimedia), New York NY: ACM, 1219-1228.
  • E. Bruni, G. Boleda, M. Baroni and N. Tran (2012) Distributional semantics in technicolor. Proceedings of ACL 2012 (50th Annual Meeting of the Association for Computational Linguistics), East Stroudsburg PA: ACL, 136-145. The data sets from this study.
  • E. Bruni, G.B. Tran and M. Baroni (2011) Distributional semantics from text and images. Proceedings of the EMNLP 2011 Geometrical Models for Natural Language Semantics (GEMS) Workshop, East Stroudsburg PA: ACL, 22-32.