New research sheds light on how to get the most out of crowdsourcing campaigns

In recent years, crowdsourcing, which involves recruiting members of the public to help collect data, has been extremely useful in providing researchers with unique and rich datasets, while engaging the public in the process of scientific discovery. In a new study, an international team of researchers explored how crowdsourcing projects can make the most of volunteer contributions.

Data collection through crowdsourcing ranges from field activities such as birdwatching to online activities such as image classification, as in the highly successful Galaxy Zoo, in which participants classify galaxy shapes, and Geo-Wiki, where satellite images are interpreted for land cover, land use, and socio-economic indicators. However, collecting responses from such a large number of participants analyzing a set of images raises questions about the actual accuracy of the submitted answers. While methods exist to ensure the accuracy of data collected in this way, they often carry consequences for the design of the crowdsourcing campaign, such as its sampling scheme and associated costs.

In their study, which has just been published in the journal PLOS ONE, IIASA researchers and international colleagues explored this accuracy issue by examining how many times a task must be completed before researchers can be reasonably certain of the correct answer.

“Many types of research with public participation involve asking volunteers to classify images that are difficult for computers to distinguish automatically. When a task needs to be repeated by many people, knowing when you can be confident in the answer makes assigning tasks much more efficient. This means volunteers or paid evaluators waste less time, and scientists or others requesting the tasks can make better use of their limited resources,” says Carl Salk, an alumnus of the IIASA Young Scientists Summer Program (YSSP) and long-time IIASA collaborator, currently associated with the Swedish University of Agricultural Sciences.

The researchers developed a system that estimates the probability that the majority answer to a task is wrong, and stops assigning the task to new volunteers once that probability drops low enough, or once the probability of ever getting a clear answer becomes too low. They demonstrated the process on a set of over 4.5 million unique classifications, made by 2,783 volunteers, of over 190,000 images rated for the presence or absence of cropland. The authors point out that if their system had been implemented in the original data collection campaign, it would have eliminated the need for 59.4% of the volunteer ratings, and that if the same effort had been applied to new tasks, it would have allowed more than twice as many images to be classified with the same amount of work. This shows how much further limited volunteer contributions can be stretched with such a method.
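The general idea of such a stopping rule can be sketched with a toy model. Note this is an illustrative assumption, not the authors' published method: it treats every volunteer as answering correctly with a fixed probability (`accuracy`, a made-up value here) and uses a uniform prior over the two possible labels, then stops collecting votes when the chance that the majority answer is wrong falls below a tolerance, or gives up when an image has absorbed too many votes without resolving.

```python
def posterior_wrong(yes, no, accuracy=0.8):
    """Probability that the current majority answer is wrong, under a
    toy model: uniform prior over the true label, and each volunteer
    answers correctly with probability `accuracy` (an illustrative
    assumption, not a value from the paper)."""
    like_yes = accuracy**yes * (1 - accuracy)**no   # P(votes | true label "yes")
    like_no = accuracy**no * (1 - accuracy)**yes    # P(votes | true label "no")
    p_yes = like_yes / (like_yes + like_no)
    # Whichever label the majority favors, the other side's posterior
    # mass is the chance the majority is wrong.
    return min(p_yes, 1 - p_yes)

def needs_more_votes(yes, no, tol=0.01, max_votes=15):
    """Keep assigning the task only while the majority might still be
    wrong and the image has not proven too ambiguous to resolve."""
    if yes + no >= max_votes:
        return False  # give up: image is too difficult to classify
    return posterior_wrong(yes, no) > tol
```

For example, five unanimous "yes" votes already push the error probability below 1%, so the task would be retired, while a 3-to-2 split leaves a 20% chance the majority is wrong and the task would go back into the pool. The real study fits volunteer reliability from the data rather than assuming a single fixed accuracy.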

According to the researchers, this method can be applied to almost any situation requiring a yes-or-no (binary) classification where the answer may not be obvious. Examples include classifying other land use types (“Is there forest in this image?”); species identification (“Is there a bird in this photo?”); or even the kind of “ReCaptcha” tasks we perform to convince websites that we are human (“Is there a stoplight in this picture?”). The work can also help better answer questions important to policymakers, such as how much of the world’s land is used to grow crops.

“As data scientists increasingly turn to machine learning techniques for image classification, the use of crowdsourcing to create training image libraries continues to grow in importance. This study describes how to optimize the use of the crowd for this purpose, providing clear guidance on when to redirect effort once the necessary level of confidence is reached or a particular image proves too difficult to classify,” concludes study co-author Ian McCallum, who leads the Novel Data Ecosystems for Sustainability research group at IIASA.

More information:

Carl Salk et al, How many people should classify the same image? A method for optimizing volunteer contributions in binary geographic classifications., PLOS ONE (2022). DOI: 10.1371/journal.pone.0267114

Provided by
International Institute for Applied Systems Analysis

New research sheds light on how to get the most out of crowdsourcing campaigns (2022, May 19)
retrieved 19 May 2022

This document is subject to copyright. Apart from fair use for purposes of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.
