dc.description.abstract | Representations of data obtained from deep neural networks automatically encode structures in a data distribution that are helpful for solving arbitrary downstream tasks such as classification and retrieval. To achieve this, design patterns for deep neural networks, as well as their training schemes, rely on a fundamental assumption about the completeness of the input data source. Specifically, they assume that each datum, consumed in its original form (an image at a certain scale, or from a certain domain), contains all the information needed to predict its label. However, this completeness assumption may be violated when the data distribution is ambiguous, noisy, or incomplete. This observation led to the development of multi-view representation learning, which posits that a complete concept can only be described holistically as a combination of multiple views, with each sample (data point) providing only one of the many required views. This thesis studies, from both theoretical and empirical perspectives, the conditions under which various problems in computer vision become better or worse candidates for multi-view representation learning.
We begin by studying how relationships between the different views of an object can uniquely encode semantic information. We develop a rigorous theoretical framework that formalizes this idea and demonstrate its benefits in the context of fine-grained visual categorization and zero-shot learning. We further study how relational representation learning can be made more interpretable by expressing the abstract ways in which different views combine within a deep neural network as transformations over a graph of image views.

In the second part of this thesis, we explore view multiplicity in the context of multi-modal representation learning. We focus primarily on cross-modal image retrieval, for which we develop state-of-the-art algorithms that mine complementary information across views to efficiently learn unified multi-modal representations, as well as algorithms that operate in data- and model-constrained environments.

In the final part of this thesis, we study various properties of conditional invariance learning in the context of domain adaptation. We present a novel perspective on invariance learning by viewing it through the lens of operators learned over domains, and we show that certain properties of the underlying operator dictate the nature of the invariance learned. We find that a simple and computationally efficient way of learning conditional invariances is to optimize the corresponding operator to non-commutatively direct the domain mapping towards the target. A common theme running throughout this thesis is a characterization of how the distribution shifts across different views influence the representation spaces of neural networks, which helps in understanding the generalization properties of various learning paradigms. | en_GB |