Multi-View Representation Learning in Computer Vision
Chaudhuri, A
Date: 28 October 2024
Thesis or dissertation
Publisher: University of Exeter
Degree Title: Doctor of Philosophy in Computer Science
Abstract
Representations of data obtained from deep neural networks automatically encode structures in a data distribution that are helpful for solving arbitrary downstream tasks such as classification and retrieval. To achieve this, the design patterns of deep neural networks, as well as their training schemes, rely on a fundamental assumption about the completeness of the input data source. Specifically, they assume that each datum, consumed in its original form (e.g., images at a certain scale or from a certain domain), contains all the information needed to predict its label. However, this completeness assumption may be violated when the data distribution is ambiguous, noisy, or incomplete. This led to the development of multi-view representation learning, which posits that a complete concept can only be described holistically as a combination of multiple views, with each sample (data point) being just one of the many required views. This thesis studies, from both theoretical and empirical perspectives, the conditions that make various problems better or worse candidates for multi-view representation learning in computer vision.
We start by understanding how relationships between the different views of an object can uniquely encode semantic information. We develop a rigorous theoretical framework that formalizes this idea and show its benefits in the context of fine-grained visual categorization and zero-shot learning. We further study how relational representation learning can be made more interpretable by expressing the abstract ways in which different views combine in a deep neural network as transformations over a graph of image views. In the second part of this thesis, we explore view multiplicity in the context of multi-modal representation learning. We primarily focus on cross-modal image retrieval, for which we develop state-of-the-art algorithms that mine complementary information across views to efficiently learn unified multi-modal representations, as well as algorithms that can operate in data- and model-constrained environments. In the final part of this thesis, we study various properties of conditional invariance learning in the context of domain adaptation. We present a novel perspective on invariance learning by viewing it through the lens of learning operators over domains. We then show that certain properties of the underlying operator dictate the nature of the invariance learned. We find that a simple and computationally efficient way of learning conditional invariances is to optimize the corresponding operator to non-commutatively direct the domain mapping towards the target. A common theme throughout this thesis is a characterization of how the distribution shifts that exist across different views influence the representation spaces of neural networks, which helps in understanding the generalization properties of various learning paradigms.
Doctoral Theses
Doctoral College
Related items
Showing items related by title, author, creator and subject.
The Value and Benefits of Learning a Foreign Language in Community Settings in the UK: older adults' perceptions of what this does and means for them
Hooker, Rebecca (University of Exeter Graduate School of Education, 1 April 2011). This is a qualitative and context-specific study into the meaning and value attributed by older people to learning a foreign language in their own time and for reasons mainly unconnected to attainment and qualifications. ...
The vocabulary learning behavior of Romanian high school students in a digital context
Cojocnean, Diana Maria (University of Exeter College of Social Sciences and International Studies, Graduate School of Education, 27 April 2015). This thesis investigates the vocabulary learning behavior of Romanian high school students in a digital context. The research identifies the vocabulary learning strategies used by EFL high school students and focuses on ...
Developing and Evaluating Peer Tutoring Programme (Maths PALS) for Trainee Teachers of SEN Pupils in Saudi Arabia
Alhasan, Naeema Abdulrahman (University of Exeter The Graduate School of Education, 16 February 2018). Peer tutoring has become well-established in higher education and, with growing interest in peer learning, has started to gain popularity at school level with evident success in a range of settings and subject areas. ...