Training-ValueNet: data-driven label noise cleaning on weakly-supervised web images

Smyth, L; Kangin, D; Pugeault, N

dc.contributor.author	Smyth, L
dc.contributor.author	Kangin, D
dc.contributor.author	Pugeault, N
dc.date.accessioned	2019-06-07T14:40:26Z
dc.date.issued	2019-09-30
dc.description.abstract	Manually labeling new datasets for image classification remains expensive and time-consuming. A promising alternative is to utilize the abundance of images on the web for which search queries or surrounding text offers a natural source of weak supervision. Unfortunately, the label noise in these datasets has limited their use in practice. Several methods have been proposed for performing unsupervised label noise cleaning, the majority of which use outlier detection to identify and remove mislabeled images. In this paper, we argue that outlier detection is an inherently unsuitable approach for this task due to major flaws in the assumptions it makes about the distribution of mislabeled images. We propose an alternative approach which makes no such assumptions. Rather than looking for outliers, we observe that mislabeled images can be identified by the detrimental impact they have on the performance of an image classifier. We introduce training-value as an objective measure of the contribution each training example makes to the validation loss. We then present the training-value approximation network (Training-ValueNet) which learns a mapping between each image and its training-value. We demonstrate that by simply discarding images with a negative training-value, Training-ValueNet is able to significantly improve classification performance on a held-out test set, outperforming the state of the art in outlier detection by a large margin.	en_GB
dc.identifier.citation	2019 Joint IEEE 9th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), 19-22 August 2019, Oslo, Norway	en_GB
dc.identifier.doi	10.1109/DEVLRN.2019.8850689
dc.identifier.uri	http://hdl.handle.net/10871/37408
dc.language.iso	en	en_GB
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)	en_GB
dc.rights	© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works	en_GB
dc.title	Training-ValueNet: data-driven label noise cleaning on weakly-supervised web images	en_GB
dc.type	Conference paper	en_GB
dc.date.available	2019-06-07T14:40:26Z
dc.description	This is the author accepted manuscript. The final version is available from IEEE via the DOI in this record	en_GB
dc.rights.uri	http://www.rioxx.net/licenses/all-rights-reserved	en_GB
dcterms.dateAccepted	2019-05-07
rioxxterms.version	AM	en_GB
rioxxterms.licenseref.startdate	2019-05-07
rioxxterms.type	Conference Paper/Proceeding/Abstract	en_GB
refterms.dateFCD	2019-06-07T13:48:14Z
refterms.versionFCD	AM
refterms.dateFOA	2019-11-04T14:06:01Z
refterms.panel	B	en_GB

Files in this item

Name: Conference_Paper (1).pdf
Size: 774.4Kb
Format: PDF

This item appears in the following Collection(s)

Computer Science
