Theoretical Understanding on Adversarial Robustness of Deep Neural Networks
Wang, Z
Date: 22 April 2025
Thesis or dissertation
Publisher: University of Exeter
Degree Title: PhD in Computer Science
Abstract
As state-of-the-art (SOTA) Machine Learning (ML) techniques, particularly Deep Neural Networks (DNNs), are increasingly deployed across various industry sectors, concerns about their trustworthiness have been raised more frequently. The trustworthiness of DNNs can be undermined by adversarial examples: subtle input perturbations that are nearly imperceptible to humans yet can deceive ML models, leading to critical errors in applications such as cancer detection and traffic signal recognition. However, a comprehensive theoretical understanding of the underlying causes of adversarial examples is still lacking.
To contribute to this theoretical understanding, we first investigate the layer-wise causes that influence the adversarial robustness of DNNs. We present a comprehensive theoretical framework that connects adversarial robustness to the Cauchy problem of a network's underlying Ordinary Differential Equation (ODE), allowing the robustness of neural network components to be assessed independently of any particular attack algorithm. Two case studies illustrate the application of our theorem: first, we examine the robustness of commonly used bottleneck architectures and their variants, highlighting the enhanced robustness of inverted residuals; second, we evaluate the defensive capability of Multihead Self-Attention (MSA) in Vision Transformers (ViTs), revealing its inferior performance under strong adversarial attacks compared with convolutional neural networks.
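To make the connection concrete, here is a schematic sketch of the standard residual-network-to-ODE correspondence (not the thesis's exact formulation): a residual block z_{k+1} = z_k + h f_{\theta_k}(z_k) can be read as a forward-Euler step of the Cauchy (initial value) problem

\[ \frac{\mathrm{d}z(t)}{\mathrm{d}t} = f_{\theta}\big(z(t), t\big), \qquad z(0) = x, \qquad t \in [0, T]. \]

If \(f_{\theta}\) is \(L\)-Lipschitz in \(z\), a Grönwall-type estimate gives \(\|z(T) - \tilde{z}(T)\| \le e^{LT}\,\|x - \tilde{x}\|\), so the well-posedness of this initial value problem bounds how much an \(\epsilon\)-perturbation of the input can be amplified by the forward pass, independently of the attack used to craft it.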
In addition to focusing on individual layers, we investigate the effects of collaboration among them. We ask: Is there collaboration between layers against adversarial examples during gradient descent? To quantify this collaboration, we introduce a new concept called Collaboration Correlation (Co-Correlation), interpreted as the alignment of the feature selections that maximise each layer's output. We examine the implicit bias of gradient descent with respect to adversarial robustness and prove theoretically that gradient descent enhances the Co-Correlation between layers. We also observe differing behaviours in under- and over-parameterised neural networks: under-parameterised networks tend to promote Co-Correlation among layers to improve performance, whereas the performance improvement of over-parameterised networks does not rely heavily on establishing such Co-Correlation.
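As a purely illustrative example of how such alignment could be quantified (the symbols below are placeholders, not the thesis's definition of Co-Correlation): let \(s_k(x)\) denote the direction along which layer \(k\)'s features must move to maximise that layer's output; a cosine-similarity-style statistic between consecutive layers,

\[ \rho_{k,k+1} = \mathbb{E}_{x}\!\left[ \frac{\langle s_k(x),\, s_{k+1}(x) \rangle}{\|s_k(x)\|\,\|s_{k+1}(x)\|} \right], \]

is close to 1 when the layers select features consistently and close to 0 when their selections are unrelated; it is this kind of alignment measure whose growth under gradient descent is at issue.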
With regard to generalisation and its connection to adversarial robustness, we establish a theoretical relationship between the natural and adversarial risks. Building on this connection, we aim to answer the question: How does the cross-layer correlation of the weight matrices influence both natural and adversarial generalisation errors? Under mild conditions, we prove that the natural risk increases monotonically with the cross-layer correlation. The impact on adversarial robustness, however, is more nuanced and depends on the mean value of the Jacobian of the weight matrices.
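For context, the natural and adversarial risks are linked by a standard decomposition (stated here generically; the thesis's own relationship is not reproduced): with \(R_{\mathrm{nat}}(f) = \mathbb{E}\,[\ell(f(x), y)]\) and \(R_{\mathrm{adv}}(f) = \mathbb{E}\,[\sup_{\|\delta\| \le \epsilon} \ell(f(x+\delta), y)]\),

\[ R_{\mathrm{adv}}(f) \;=\; R_{\mathrm{nat}}(f) \;+\; \mathbb{E}\!\left[ \sup_{\|\delta\| \le \epsilon} \big( \ell(f(x+\delta), y) - \ell(f(x), y) \big) \right], \]

so any statement about how cross-layer weight correlations affect the natural risk and the nonnegative gap term on the right translates into a statement about adversarial generalisation.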
In this thesis, we investigate the internal structure of DNNs in relation to adversarial robustness and generalisation. We aim to understand how internal structures and their connections in neural networks influence adversarial robustness and its trade-offs with generalisation. To this end, we propose a theoretical framework for analysing these internal structures, offering a potential pathway for future research. From a practical standpoint, our exploration provides valuable guidance for the design and optimisation of deep neural networks, such as identifying and replacing unstable modules to enhance adversarial robustness and employing tailored learning-rate strategies for different layers to improve both performance and robustness. We believe that understanding the intrinsic properties of neural networks will also enhance the interpretability of large and complex models.