dc.description.abstract | Manually labelling training data for machine learning has always been incredibly time-consuming and expensive. For those who seek to apply modern deep learning algorithms, however, the cost of acquiring enough accurately labelled data is quickly becoming the single greatest obstacle to progress. Weakly-supervised learning offers a promising alternative by enabling practitioners to rapidly apply weak sources of supervision to large amounts of data. Unfortunately, the presence of label noise in these datasets remains a critical issue, as it can severely impair the performance of a machine learning model. In this thesis, we investigate a new approach for performing label cleaning on weakly-supervised data without human supervision. We propose that the boundary between correctly labelled and mislabelled examples might best be described in terms of the impact that an individual training example has on performance. Specifically, we hypothesise that mislabelled training examples will reliably degrade the generalization performance of a classifier and can be identified as such. To this end, we present the Training-Value approximation network (Training-ValueNet), which learns to estimate the training-value of each example: an objective measure of its impact on performance. In a series of three key experiments, we demonstrate that by simply discarding examples with a negative training-value, Training-ValueNet can significantly reduce the proportion of label noise in weakly-supervised datasets and, as a result, improve the final performance of an image classification model. In a label noise detection task, our method achieves a 39% lower detection error than the current state-of-the-art outlier detection method for label cleaning. Furthermore, we demonstrate that when our method is used for label cleaning, weakly-supervised learning can achieve performance comparable to that of the fully-supervised paradigm.
This highlights the potential for data-driven approaches like ours to eradicate the need for manual label cleaning altogether. | en_GB |