Leveraging vision-language models to improve representation learning

February 14, 2024


Campus de Beaulieu, Room i-50, Building 12D

Talk by Ewa Kijak, associate professor (maître de conférences) at ESIR and member of the Linkmedia team at IRISA, as part of the Computer Science department seminar series.


Obtaining a good representation of visual content is a prerequisite for the success of many vision tasks, such as image classification, object recognition, and information retrieval. The development of deep learning and the availability of vision and vision-language foundation models have profoundly changed the way visual data is represented. These models show great potential for learning visual representations that are both better and transferable across many vision tasks.
In this talk, we will present vision-language models and discuss several approaches to improving the learning of visual representations in various application contexts.

