How clumpy is my image? Evaluating crowdsourced annotation tasks
Institute of Electrical and Electronics Engineers (IEEE)
Copyright © 2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
The use of citizen science to obtain annotations from multiple annotators has been shown to be an effective method for annotating datasets in which computational methods alone are not feasible. The way in which the annotations are obtained is an important consideration which affects the quality of the resulting consensus estimates. In this paper, we examine three separate approaches to obtaining scores for instances rather than merely classifications. To obtain a consensus score annotators were asked to make annotations in one of three paradigms: classification, scoring and ranking. A web-based citizen science experiment is described which implements the three approaches as crowdsourced annotation tasks. The tasks are evaluated in relation to the accuracy and agreement among the participants using both simulated and real-world data from the experiment. The results show a clear difference in performance between the three tasks, with the ranking task obtaining the highest accuracy and agreement among the participants. We show how a simple evolutionary optimiser may be used to improve the performance by reweighting the importance of annotators.
13th UK Workshop on Computational Intelligence (UKCI), Guildford, UK, 9-11 September 2013
This is the author accepted manuscript. The final version is available from the publisher via the DOI in this record.
Proceedings of the 13th UK Workshop on Computational Intelligence (UKCI), pp. 136-143