dc.contributor.author	Wang, Z
dc.date.accessioned	2025-04-22T15:08:11Z
dc.date.issued	2025-04-22
dc.date.updated	2025-04-22T13:34:40Z
dc.description.abstract	As state-of-the-art (SOTA) Machine Learning (ML) techniques, particularly Deep Neural Networks (DNNs), are increasingly deployed across industry sectors, concerns about their trustworthiness are raised ever more frequently. The trustworthiness of DNNs can be undermined by adversarial examples: subtle input perturbations that are nearly imperceptible to humans yet can deceive ML models, leading to critical errors in applications such as cancer detection and traffic signal recognition. However, the underlying causes of adversarial examples still lack a comprehensive theoretical understanding.

To contribute to this understanding, we first investigate the layer-wise causes that influence the adversarial robustness of DNNs. We present a theoretical framework that connects the adversarial robustness of a DNN to the Cauchy (initial-value) problem of its underlying Ordinary Differential Equation (ODE), allowing the robustness of individual network components to be assessed independently of any particular attack algorithm. Two case studies illustrate the application of our theorem: first, we examine the robustness of commonly used bottleneck architectures and their variants, highlighting the enhanced robustness of inverted residuals; second, we evaluate the defensive capability of Multihead Self-Attention (MSA) in Vision Transformers (ViTs), revealing its inferior performance against strong adversarial attacks compared with convolutional neural networks.

Beyond individual layers, we investigate the effects of collaboration among them, asking: is there collaboration between layers against adversarial examples during gradient descent? To quantify this collaboration, we introduce a new concept, Collaboration Correlation (Co-Correlation), interpreted as the alignment of the feature selections that maximise each layer's output. We examine the implicit bias of gradient descent for adversarial robustness and prove that gradient descent enhances the Co-Correlation between layers. We also observe differing behaviours in under- and over-parameterised networks: under-parameterised networks tend to promote Co-Correlation among layers to improve performance, whereas the performance improvement of over-parameterised networks does not rely as heavily on establishing such Co-Correlation.

With regard to generalisation and its connection to adversarial robustness, we establish the theoretical relationship between natural and adversarial risks. Building on this connection, we ask: how does the cross-layer correlation of the weight matrices influence both natural and adversarial generalisation errors? Under mild conditions, we prove that the natural risk increases monotonically with the cross-layer correlations, whereas the effect on adversarial robustness is more nuanced, depending on the mean value of the Jacobian of the weight matrices.

In summary, this thesis investigates the internal structure of DNNs with respect to adversarial robustness and generalisation, aiming to understand how internal structures and their connections influence adversarial robustness and its trade-offs with generalisation. To this end, we propose a theoretical framework for analysing these internal structures, offering a potential pathway for future research.
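[Editorial illustration, not taken from the thesis; module and parameter names are assumptions.] The ODE connection referred to above rests on a standard observation: a residual block computes x + f(x), i.e. one explicit Euler step of the initial-value (Cauchy) problem dx/dt = f(x), so the stability of that ODE gives one lens on the block's sensitivity to input perturbations. A minimal PyTorch sketch of this reading:

import torch
import torch.nn as nn

class EulerResidualBlock(nn.Module):
    """A residual block read as one explicit Euler step x <- x + h * f(x) of dx/dt = f(x)."""
    def __init__(self, channels: int, step_size: float = 1.0):
        super().__init__()
        self.step_size = step_size          # Euler step size h; h = 1 recovers a plain residual block
        self.f = nn.Sequential(             # learned vector field f(x)
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.step_size * self.f(x)   # x_{t+1} = x_t + h * f(x_t)

block = EulerResidualBlock(channels=8)
x = torch.randn(1, 8, 32, 32)                   # the initial condition of the Cauchy problem
print(block(x).shape)                           # torch.Size([1, 8, 32, 32])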
From a practical standpoint, our exploration provides valuable guidance for the design and optimisation of deep neural networks, such as identifying and replacing unstable modules to enhance adversarial robustness, and employing tailored learning-rate strategies for different layers to improve both performance and robustness. We believe that understanding the intrinsic properties of neural networks will also enhance the interpretability of large and complex models.	en_GB
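[Editorial illustration, not taken from the thesis; the model, layer split, and learning-rate values are assumptions.] The per-layer learning-rate guidance mentioned above can be expressed with PyTorch optimiser parameter groups, as in this minimal sketch:

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),   # early layer
    nn.Linear(256, 10),               # output layer
)

# Per-layer learning rates via optimiser parameter groups (values are illustrative).
optimizer = torch.optim.SGD(
    [
        {"params": model[0].parameters(), "lr": 1e-2},  # larger steps for the early layer
        {"params": model[2].parameters(), "lr": 1e-3},  # smaller steps for the output layer
    ],
    momentum=0.9,
)

# One illustrative training step.
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()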
dc.identifier.uri	http://hdl.handle.net/10871/140844
dc.language.iso	en	en_GB
dc.publisher	University of Exeter	en_GB
dc.rights.embargoreason	This thesis is embargoed until 22/Oct/2026 as the author plans to publish their research.	en_GB
dc.subject	Neural Networks	en_GB
dc.subject	Adversarial Robustness	en_GB
dc.subject	Machine Learning Theory	en_GB
dc.title	Theoretical Understanding on Adversarial Robustness of Deep Neural Networks	en_GB
dc.type	Thesis or dissertation	en_GB
dc.date.available	2025-04-22T15:08:11Z
dc.contributor.advisor	Min, Geyong
dc.contributor.advisor	Mustafee, Nav
dc.contributor.advisor	Ruan, Wenjie
dc.publisher.department	Computer Science
dc.rights.uri	http://www.rioxx.net/licenses/all-rights-reserved	en_GB
dc.type.degreetitle	PhD in Computer Science
dc.type.qualificationlevel	Doctoral
dc.type.qualificationname	Doctoral Thesis
rioxxterms.version	NA	en_GB
rioxxterms.licenseref.startdate	2025-04-22
rioxxterms.type	Thesis	en_GB

