Training-ValueNet: A new approach for label cleaning on weakly-supervised datasets

Smyth, L

dc.contributor.author	Smyth, L
dc.date.accessioned	2020-08-04T07:33:00Z
dc.date.issued	2020-08-03
dc.description.abstract	Manually labelling training data for machine learning has always been incredibly time-consuming and expensive. For those who seek to apply modern deep learning algorithms, however, the cost of acquiring enough accurately labelled data is quickly becoming the single greatest obstacle impeding progress. Weakly-supervised learning offers a promising alternative by enabling practitioners to rapidly apply weak sources of supervision to large amounts of data. Unfortunately, the presence of label noise in these datasets remains a critical issue as it can severely impair the performance of a machine learning model. In this thesis, we investigate a new approach for performing label cleaning on weakly-supervised data without human supervision. We propose that the boundary between correctly labelled and mislabelled examples might best be described in terms of the impact that an individual training example has on performance. Specifically, we hypothesise that mislabelled training examples will reliably harm the generalization performance of a classifier and can be identified as such. To this end, we present the Training-Value approximation network (Training-ValueNet), which learns to estimate the training-value of each example, an objective measure of its impact on performance. In a series of three key experiments, we demonstrate that by simply discarding examples with a negative training-value, Training-ValueNet can significantly reduce the proportion of label noise in weakly-supervised datasets and improve the final performance of an image classification model as a result. In a label noise detection task, our method achieves a substantial 39% lower detection error than the current state-of-the-art outlier detection method for label cleaning. Furthermore, we demonstrate that when our method is used for label cleaning, weakly-supervised learning can achieve performance comparable to the fully-supervised paradigm.
This highlights the potential for data-driven approaches like ours to eradicate the need for manual label cleaning altogether.	en_GB
dc.identifier.uri	http://hdl.handle.net/10871/122297
dc.publisher	University of Exeter	en_GB
dc.title	Training-ValueNet: A new approach for label cleaning on weakly-supervised datasets	en_GB
dc.type	Thesis or dissertation	en_GB
dc.date.available	2020-08-04T07:33:00Z
dc.contributor.advisor	Pugeault, N	en_GB
dc.publisher.department	Computer Science	en_GB
dc.rights.uri	http://www.rioxx.net/licenses/all-rights-reserved	en_GB
dc.type.degreetitle	MbyRes Computer Science	en_GB
dc.type.qualificationlevel	Masters	en_GB
dc.type.qualificationname	MbyRes Dissertation	en_GB
rioxxterms.version	NA	en_GB
rioxxterms.licenseref.startdate	2020-08-03
rioxxterms.type	Thesis	en_GB
refterms.dateFOA	2020-08-04T07:33:17Z

Files in this item

Name: SmythL.pdf
Size: 6.781Mb
Format: PDF

This item appears in the following Collection(s)

MbyRes Dissertations