
dc.contributor.author: Smyth, L
dc.contributor.author: Kangin, D
dc.contributor.author: Pugeault, N
dc.date.accessioned: 2019-06-07T14:40:26Z
dc.date.issued: 2019-09-30
dc.description.abstract: Manually labeling new datasets for image classification remains expensive and time-consuming. A promising alternative is to utilize the abundance of images on the web for which search queries or surrounding text offers a natural source of weak supervision. Unfortunately the label noise in these datasets has limited their use in practice. Several methods have been proposed for performing unsupervised label noise cleaning, the majority of which use outlier detection to identify and remove mislabeled images. In this paper, we argue that outlier detection is an inherently unsuitable approach for this task due to major flaws in the assumptions it makes about the distribution of mislabeled images. We propose an alternative approach which makes no such assumptions. Rather than looking for outliers, we observe that mislabeled images can be identified by the detrimental impact they have on the performance of an image classifier. We introduce training-value as an objective measure of the contribution each training example makes to the validation loss. We then present the training-value approximation network (Training-ValueNet) which learns a mapping between each image and its training-value. We demonstrate that by simply discarding images with a negative training-value, Training-ValueNet is able to significantly improve classification performance on a held-out test set, outperforming the state of the art in outlier detection by a large margin. [en_GB]
dc.identifier.citation: 2019 Joint IEEE 9th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), 19-22 August 2019, Oslo, Norway [en_GB]
dc.identifier.doi: 10.1109/DEVLRN.2019.8850689
dc.identifier.uri: http://hdl.handle.net/10871/37408
dc.language.iso: en [en_GB]
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE) [en_GB]
dc.rights: © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works [en_GB]
dc.title: Training-ValueNet: data-driven label noise cleaning on weakly-supervised web images [en_GB]
dc.type: Conference paper [en_GB]
dc.date.available: 2019-06-07T14:40:26Z
dc.description: This is the author accepted manuscript. The final version is available from IEEE via the DOI in this record [en_GB]
dc.rights.uri: http://www.rioxx.net/licenses/all-rights-reserved [en_GB]
dcterms.dateAccepted: 2019-05-07
rioxxterms.version: AM [en_GB]
rioxxterms.licenseref.startdate: 2019-05-07
rioxxterms.type: Conference Paper/Proceeding/Abstract [en_GB]
refterms.dateFCD: 2019-06-07T13:48:14Z
refterms.versionFCD: AM
refterms.dateFOA: 2019-11-04T14:06:01Z
refterms.panel: B [en_GB]

