dc.contributor.author               Chaudhuri, A
dc.date.accessioned                 2024-10-23T16:29:18Z
dc.date.issued                      2024-10-28
dc.date.updated                     2024-10-23T13:52:48Z
dc.description.abstract             Representations of data obtained from deep neural networks automatically encode structures in a data distribution that are helpful for solving arbitrary downstream tasks such as classification and retrieval. To achieve this, design patterns for deep neural networks, as well as their training schemes, rely on a fundamental assumption about the completeness of the input data source. Specifically, they assume that each datum, consumed in its original form (images at a certain scale, or from a certain domain), contains everything there is to know for predicting its label. However, this completeness assumption may be violated when the data distribution is ambiguous, noisy, or incomplete. This has led to the development of multi-view representation learning, which posits that a complete concept can only be holistically described as a combination of multiple views, each sample (data point) being only one of the many required views. This thesis studies, from both theoretical and empirical perspectives, the conditions that make various problems better or worse candidates for multi-view representation learning in the context of computer vision. We start by understanding how relationships between the different views of an object can uniquely encode semantic information. We develop a rigorous theoretical framework for formalizing this idea and show its benefits in the context of fine-grained visual categorization and zero-shot learning. We further study how relational representation learning can be made more interpretable by expressing the abstract ways in which different views combine in a deep neural network as transformations over a graph of image views. In the second part of this thesis, we explore view multiplicity in the context of multi-modal representation learning. We primarily focus on cross-modal image retrieval, for which we develop state-of-the-art algorithms that mine complementary information across views to efficiently learn unified multi-modal representations, as well as algorithms that can operate in data- and model-constrained environments. In the final part of this thesis, we study various properties of conditional invariance learning in the context of domain adaptation. We present a novel perspective on invariance learning by viewing it through the lens of learning operators over domains. We then show that certain properties of the underlying operator dictate the nature of the invariance learned. We find that a simple and computationally efficient way of learning conditional invariances is to optimize the corresponding operator to non-commutatively direct the domain mapping towards the target. A common theme running throughout this thesis is a characterization of the ways in which distribution shifts across different views influence the representation spaces of neural networks, which is helpful in understanding the generalization properties of various learning paradigms.  en_GB
dc.identifier.uri                   http://hdl.handle.net/10871/137765
dc.language.iso                     en  en_GB
dc.publisher                        University of Exeter  en_GB
dc.subject                          Domain Adaptation  en_GB
dc.subject                          Interpretability  en_GB
dc.subject                          Invariance Learning  en_GB
dc.subject                          Learning Theory  en_GB
dc.subject                          Multi-Modal Learning  en_GB
dc.subject                          Relational Representations  en_GB
dc.subject                          Representation Learning  en_GB
dc.title                            Multi-View Representation Learning in Computer Vision  en_GB
dc.type                             Thesis or dissertation  en_GB
dc.date.available                   2024-10-23T16:29:18Z
dc.contributor.advisor              Dutta, Anjan
dc.contributor.advisor              Akata, Zeynep
dc.contributor.advisor              Rowlands, Sareh
dc.publisher.department             Computer Science
dc.rights.uri                       http://www.rioxx.net/licenses/all-rights-reserved  en_GB
dc.type.degreetitle                 Doctor of Philosophy in Computer Science
dc.type.qualificationlevel          Doctoral
dc.type.qualificationname           Doctoral Thesis
rioxxterms.version                  NA  en_GB
rioxxterms.licenseref.startdate     2024-10-28
rioxxterms.type                     Thesis  en_GB
refterms.dateFOA                    2024-10-23T16:29:24Z

