Introducing Crowd Science Seminars: a series of international online meetups highlighting scientific research related to crowdsourcing, organized bi-weekly by the Toloka team. We'll be using the blog to share the main ideas and key takeaways from each event. This article is the first in an ongoing series where we'll zero in on the crowdsourcing aspects of each topic in the Crowd Science Seminars.
In our first post, we cover a recent talk by Ivan Stelmakh, who presented his research with Nihar Shah and Aarti Singh on the importance of "principled design of human decision-making systems". Stelmakh examined crowdsourcing as one example of a human decision-making system, which is what we'll focus on here.
A human decision-making system uses input from a group of individuals to solve complex problems. Here are some specific examples:
All of these systems are beneficial because they capitalize on shared knowledge and insights. However, because human decisions are prone to bias and error, these systems also have inherent pitfalls that need to be accounted for.
Human systems tend to have systematic problems. In a 2016 article for Nature, Drummond Rennie makes a strong point:
It may sound surprising that the system responsible for validating the rigor of scientific research is itself unscientific. And if academic peer review is not scientific, how can the results of performance reviews or crowdsourcing be considered reliable?
Stelmakh's research focuses on principled design of human decision-making systems: he works on designing tools and techniques to address the problems that inevitably arise in large systems where humans make decisions. At the seminar, he presented three main problems to tackle:
Let's look at how these issues can be managed in the crowdsourcing context.
Problem: People are not always careful and often make mistakes.
Solution: Rely on the accuracy of the group, not individuals. The Golden Rule of crowdsourcing is to assign each task to multiple performers and aggregate their responses. The crowdsourcing industry has put a lot of effort into developing aggregation methods that take results from multiple people and combine them to boost the overall quality of data. One of the simplest approaches is majority vote: examine multiple responses for the same task and assume that the most popular answer is correct. Even a basic approach like this is a good start on eliminating noise.
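The majority-vote idea above can be sketched in a few lines of Python. This is a minimal illustration, not Toloka's actual aggregation pipeline; the task IDs and labels are hypothetical.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most common label among performer responses for one task."""
    counts = Counter(labels)
    winner, _ = counts.most_common(1)[0]
    return winner

# Hypothetical example: three performers label each task
responses = {
    "task_1": ["cat", "cat", "dog"],
    "task_2": ["dog", "dog", "dog"],
}
aggregated = {task: majority_vote(answers) for task, answers in responses.items()}
# aggregated == {"task_1": "cat", "task_2": "dog"}
```

In practice, more sophisticated aggregation methods weight each performer's vote by an estimate of their skill, but even this unweighted version already filters out a lot of random noise.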
Problem: In crowdsourcing, the performer's goal is often to earn as much money as possible, which might not match up with the requester's goal to get high-quality data. Performers sometimes generate spam, use scripts to do their tasks, or click through tasks randomly to speed up the process or "game the system".
Solution: Design a system where the incentives for performers are aligned with the goals of the organizers. A simple approach is to check the performer's accuracy by using a set of questions that you know the correct answers to (called a golden set). The performer's quality rating can be directly connected to their pay rate, which motivates them to put more effort into the task.
Problem: All people have biases, whether they are aware of them or not. Importantly, manifestations of these biases are often too subtle to be detected, but can hurt the quality of collected data. For example, performers can be influenced by certain aspects of the task design without realizing it — missing information that causes doubt, preference for options that are given first, similarity between response options — and these issues can decrease the quality of collected data.
Solution: Crowdsourcing projects can take a practical approach to eliminating the opportunity for bias. Careful instructions can reduce uncertainty and simplify the task for performers. To avoid social influence, performers are usually not allowed to see answers given by other people. Another common strategy is to use a randomized interface that shuffles the order of response options for each performer.
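The randomized-interface strategy can be sketched like this: each (task, performer) pair gets its own reproducible shuffle of the response options, so ordering bias averages out across the crowd while any individual performer always sees a stable order. The seeding scheme here is an assumption for illustration, not a real platform's implementation.

```python
import random

def shuffled_options(task_id, performer_id, options):
    """Shuffle response options deterministically per (task, performer).

    Different performers see different orders (averaging out position bias),
    but the same performer always sees the same order for a given task.
    """
    rng = random.Random(f"{task_id}:{performer_id}")
    order = list(options)
    rng.shuffle(order)
    return order

options = ["cat", "dog", "bird", "fish"]
view_a = shuffled_options("task_1", "performer_a", options)
view_b = shuffled_options("task_1", "performer_b", options)
# view_a and view_b contain the same options, typically in different orders
```

Seeding from the task and performer IDs also makes the interface reproducible for debugging: re-rendering the task shows the performer exactly what they saw before.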
Human error in data labeling can have far-reaching consequences — if you use crowdsourcing to collect a large dataset and then train algorithms with it, any problems in the data (like bias or noise) will also appear in the machine learning models. However, there are many ways to control quality and compensate for human error in crowdsourcing, with ongoing research in this area.
This post has offered only a brief overview of the crowdsourcing aspects of Ivan Stelmakh's research. For the full analysis and more applications of these problems and solutions, watch the presentation: