Course details

8.3497

Grounding Language into the External Visual World

T
SS 2022 Dr. Elia Bruni ONLINE
3h/wk
4 ECTS
B.Sc modules:
CS-BWP-CL - (Computational) Linguistics
KOGW-WPM-CL - Computational Linguistics
M.Sc modules:
CC-MWP-CL - Computational Linguistics
CS-MWP-CL - (Computational) Linguistics

CS-MW - Master elective course
Thu: 9-12

Everyday interactions require a common understanding of language: for people to communicate effectively, words (for example ‘cat’) should invoke similar beliefs about physical concepts (what cats look like, the sounds they make, how they behave, what their fur feels like, etc.). However, how this ‘common understanding’ emerges is still unclear. One appealing hypothesis is that language is tied to how we interact with the environment. On this view, meaning emerges by ‘grounding’ language in the modalities of our environment (images, sounds, actions, etc.). This course will review recent work in machine learning that bridges visual and natural language understanding through visually grounded language learning tasks, e.g. tasks based on natural images (Visual Question Answering, Visual Dialog) or on interactions with virtual physical environments. Because the grounding problem calls for an interdisciplinary approach, this course aims to bring together students with expertise in a range of fields (machine learning, computer vision, natural language, neuroscience, and psychology) who are excited about grounding and interaction.