CIMeC - Center for Mind/Brain Sciences

Language and Vision - LaVi



Overview

Language and vision are two fundamental modalities through which human beings acquire knowledge about the world. We see and speak about the things and events around us, and in doing so we learn the properties of objects and the relations between them. The two modalities are deeply interdependent, and we constantly combine the information we acquire through them. However, computational models of language and vision developed separately, and for a long time the two research communities were unaware of each other's work. Interestingly, along these parallel research lines they have arrived at highly compatible representations of words and images, respectively.
The importance of developing computational models of language and vision together has been highlighted by philosophers and cognitive scientists since the birth of the Artificial Intelligence paradigm. Only recently, however, has the challenge been taken up empirically by computational linguists and computer vision researchers.
In the last two decades, the availability of large amounts of text on the web has led to tremendous improvements in NLP research. Sophisticated textual search engines are now well established and part of everyone's daily life. Images are the natural next challenge for the digital society, and the combination of language and vision is the key to meeting it.

Research Directions

UniTrento researchers are at the forefront of this new challenge. Driven by theoretical questions, we use applications as the test bed for our models. The focus so far has been on multimodal models that combine linguistic and visual vector representations; cross-modal mapping from the visual to the language space (a toy sketch follows the list below); and the enhancement of visual recognizers through language models. The recent LaVi results at UniTrento have profited from close collaboration with the teams of the ERC project COMPOSES and of the MHUG research group. We are currently focusing on:

  • Modeling the acquisition of core cognitive skills: can multimodal models acquire core cognitive skills such as numerosity? Is numerosity a core skill for the models as well; in other words, does mastering it facilitate learning other skills? (See the second sketch below.)
  • Learning to see through interaction: infants learn by interacting with others and with the world around them. Can models learn to ground language in vision through interaction?

We tackle these issues by investigating continual learning methods and grounded dialogue.
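
As a concrete illustration of the cross-modal mapping mentioned above, here is a minimal sketch in Python. It assumes toy random data in place of real image features and word embeddings, and uses a ridge-regression map from the visual space to the language space followed by nearest-neighbour retrieval; the actual LaVi models are considerably more sophisticated.

```python
# Toy sketch of cross-modal mapping (visual -> language space).
# All vectors below are random placeholders; a real system would use
# CNN image features paired with distributional word embeddings.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_pairs, d_vis, d_txt = 500, 128, 64          # hypothetical dimensions

visual = rng.normal(size=(n_pairs, d_vis))    # image feature vectors
word_vecs = rng.normal(size=(n_pairs, d_txt)) # paired word embeddings

# Learn a regularized linear map from visual space to language space.
mapper = Ridge(alpha=1.0).fit(visual, word_vecs)

# Map an unseen image into language space ...
new_image = rng.normal(size=(1, d_vis))
projected = mapper.predict(new_image)[0]

# ... and retrieve the nearest word vector by cosine similarity.
def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

nearest = max(range(n_pairs), key=lambda i: cosine(projected, word_vecs[i]))
print("nearest word index:", nearest)
```

A linear map is the simplest instance of cross-modal mapping; the same retrieval scheme works with any learned projection.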
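
The first research question above can likewise be turned into a toy experiment: generate synthetic dot images and ask whether a simple supervised model can learn to report how many dots are present. This sketch is purely illustrative (the task is trivially learnable here) and is not the setup used in the group's numerosity studies.

```python
# Toy numerosity sketch: learn to count dots in synthetic binary images.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
SIZE = 16                               # images are SIZE x SIZE

def dot_image(k):
    """Binary image with k randomly placed dots, flattened."""
    img = np.zeros(SIZE * SIZE)
    img[rng.choice(SIZE * SIZE, size=k, replace=False)] = 1.0
    return img

counts = rng.integers(1, 6, size=2000)  # 1..5 dots per image
X = np.stack([dot_image(k) for k in counts])

# Train a linear classifier to predict the count from raw pixels.
clf = LogisticRegression(max_iter=1000).fit(X[:1500], counts[:1500])
print("held-out accuracy:", clf.score(X[1500:], counts[1500:]))
```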

Members

  • Raffaella Bernardi, Principal Investigator
  • Stella Frank, Research Fellow
  • Claudio Greco, PhD Student
  • Alberto Testoni, PhD Student

Former members

  • Elia Bruni
  • Eleonora Gualdoni
  • Angeliki Lazaridou
  • D. Tien Nguyen
  • Sandro Pezzelle
  • Ravi Shekhar
  • David Addison Smith
  • Ionut-Teodor Sorodoc
  • Dieu Thu Le

Publications

For a complete list, see Raffaella Bernardi's personal page.

Ongoing collaborations

  • Luciana Benotti, Universidad Nacional de Córdoba
  • Gemma Boleda, Department of Translation and Language Sciences, Universitat Pompeu Fabra 
  • Raquel Fernández, Institute for Logic, Language & Computation (ILLC), University of Amsterdam
  • Albert Gatt, Institute of Linguistics and Language Technology, University of Malta
  • Moin Nabi, SAP
  • Barbara Plank, IT University of Copenhagen

We were part of the COST Action "The European Network on Integrating Vision and Language", attended the Dagstuhl Seminar "Joint Processing of Language and Visual Data for Better Automated Understanding", and organized various international workshops on this research topic.