dc.contributor.author: Smyth, L
dc.date.accessioned: 2020-08-04T07:33:00Z
dc.date.issued: 2020-08-03
dc.description.abstract: Manually labelling training data for machine learning has always been incredibly time-consuming and expensive. For those who seek to apply modern deep learning algorithms, however, the cost of acquiring enough accurately labelled data is quickly becoming the single greatest obstacle impeding progress. Weakly-supervised learning offers a promising alternative by enabling practitioners to rapidly apply weak sources of supervision to large amounts of data. Unfortunately, the presence of label noise in these datasets remains a critical issue, as it can severely impair the performance of a machine learning model. In this thesis, we investigate a new approach for performing label cleaning on weakly-supervised data without human supervision. We propose that the boundary between correctly labelled and mislabelled examples might best be described in terms of the impact that an individual training example has on performance. Specifically, we hypothesise that mislabelled training examples will reliably harm the generalization performance of a classifier and can be identified as such. To this end, we present the Training-Value approximation network (Training-ValueNet), which learns to estimate the training-value of each example, an objective measure of its impact on performance. In a series of three key experiments, we demonstrate that by simply discarding examples with a negative training-value, Training-ValueNet can significantly reduce the proportion of label noise in weakly-supervised datasets and improve the final performance of an image classification model as a result. In a label noise detection task, our method achieves a substantial 39% lower detection error than the current state-of-the-art outlier detection method for label cleaning. Furthermore, we demonstrate that when our method is used for label cleaning, weakly-supervised learning can achieve performance comparable to the fully-supervised paradigm. This highlights the potential for data-driven approaches like ours to eradicate the need for manual label cleaning altogether. [en_GB]
dc.identifier.uri: http://hdl.handle.net/10871/122297
dc.publisher: University of Exeter [en_GB]
dc.title: Training-ValueNet: A new approach for label cleaning on weakly-supervised datasets [en_GB]
dc.type: Thesis or dissertation [en_GB]
dc.date.available: 2020-08-04T07:33:00Z
dc.contributor.advisor: Pugeault, N [en_GB]
dc.publisher.department: Computer Science [en_GB]
dc.rights.uri: http://www.rioxx.net/licenses/all-rights-reserved [en_GB]
dc.type.degreetitle: MbyRes Computer Science [en_GB]
dc.type.qualificationlevel: Masters [en_GB]
dc.type.qualificationname: MbyRes Dissertation [en_GB]
rioxxterms.version: NA [en_GB]
rioxxterms.licenseref.startdate: 2020-08-03
rioxxterms.type: Thesis [en_GB]
refterms.dateFOA: 2020-08-04T07:33:17Z

