Remoteness, Fairness, and Mechanisms as Challenges of Data Supply by Humans for Automation
Fri, December 11th, 2020 | 08:00 PST
Events | Crowd Science Tutorial | Full-day crowd science workshop at NeurIPS 2020


Despite the clear advantages of AI, automation driven by machine learning carries pitfalls that affect the lives of millions of people. The negative repercussions include the disappearance of many well-established mass professions and a growing demand for human-labeled data. This data is not always obtained in a healthy environment: data suppliers are often managed in an old-fashioned way and have to work full-time on routine, pre-assigned tasks, leading to job dissatisfaction. Crowdsourcing is a modern and effective alternative, as it gives task executors flexibility and freedom in terms of place, time, and the type of task they want to work on. However, many potential stakeholders of crowdsourcing processes hesitate to use this technology because of doubts that have circulated over the past decade. To address these issues, our workshop targets the research and industry communities and covers three important aspects of data supply: Remoteness, Fairness, and Mechanisms.


Data labeling requesters (data consumers for ML systems) doubt the effectiveness and efficiency of remote work. They need trustworthy quality control techniques and ways to guarantee reliable results on time. Crowdsourcing is a viable solution for effective remote work. However, despite its rapid growth and a substantial body of literature on the topic, crowdsourcing is still in its infancy and, to a large extent, remains an art. It lacks clear guidelines and accepted practices for both requesters and performers (also known as workers), which makes it much harder to reach the full potential of this technology. We intend to change this and achieve a breakthrough by turning the art into a science.
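To illustrate the kind of quality control technique requesters rely on, here is a minimal sketch of majority-vote label aggregation, one of the simplest and most widely used ways to turn redundant crowd answers into a single result (the task IDs and labels below are hypothetical):

```python
from collections import Counter

def aggregate_majority(labels_by_task):
    """Aggregate crowd labels per task by majority vote.

    labels_by_task: dict mapping task_id -> list of labels collected
    from different workers. Returns dict task_id -> winning label.
    """
    return {task: Counter(labels).most_common(1)[0][0]
            for task, labels in labels_by_task.items()}

# Hypothetical labels from three workers for two image tasks
labels = {
    "img_001": ["cat", "cat", "dog"],
    "img_002": ["dog", "dog", "dog"],
}
print(aggregate_majority(labels))  # {'img_001': 'cat', 'img_002': 'dog'}
```

In practice, platforms layer more sophisticated models (worker skill estimation, golden tasks) on top of this baseline, but majority vote remains the standard starting point.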


Crowd workers (data suppliers) doubt the availability and choice of tasks. They need fair and ethical task assignment, fair compensation, and growth opportunities. We believe that the working environment (e.g., a crowdsourcing platform) can help meet these needs: it should provide flexibility in choosing tasks and working hours, and access to tasks should be fair and ethical. We also aim to address bias in task design and execution, which can skew results in ways that data requesters do not anticipate. Since quality, fairness, and growth opportunities for performers are central to our workshop, we have invited a diverse group of performers from a global public crowdsourcing platform to our panel-led discussion.


Matchmakers (the working environment, usually a crowdsourcing platform) doubt the effectiveness of the economic mechanisms that underlie their two-sided market. They need a mechanism design that guarantees proper incentives for both sides: flexibility and fairness for workers, and quality and efficiency for data requesters. We stress that economic mechanisms are key to successfully addressing the issues of remoteness and fairness. Our intention is to deepen the interaction between and within the communities that work on mechanisms and crowdsourcing.
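One simple incentive mechanism of this kind is accuracy-based bonus pay: a worker earns a base rate per task, plus a bonus if their accuracy on hidden "golden" (ground-truth) tasks clears a threshold. The sketch below is illustrative; the rates, threshold, and data are assumptions, not the defaults of any particular platform:

```python
def compute_payout(n_tasks, golden_answers, worker_answers,
                   base_rate=0.02, bonus_rate=0.01, threshold=0.9):
    """Pay base_rate per task, plus bonus_rate per task if the worker's
    accuracy on golden tasks meets the threshold.

    golden_answers / worker_answers: dicts mapping task_id -> label.
    All rates and the threshold here are hypothetical examples.
    """
    correct = sum(worker_answers.get(t) == ans
                  for t, ans in golden_answers.items())
    accuracy = correct / len(golden_answers)
    rate = base_rate + (bonus_rate if accuracy >= threshold else 0)
    return round(n_tasks * rate, 2)

golden = {"g1": "cat", "g2": "dog"}            # hidden ground truth
answers = {"g1": "cat", "g2": "dog", "t3": "cat"}
print(compute_payout(100, golden, answers))    # 3.0 (base + bonus)
```

Tying pay to verifiable quality rather than raw volume is one way a platform can align worker incentives with requester needs, which is exactly the mechanism-design question this part of the workshop examines.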

Invited Speakers

Lora Aroyo
Google Research, USA
Gianluca Demartini
University of Queensland
Praveen Paritosh
Google Research, USA
Matt Lease
University of Texas at Austin
Seid Muhie Yimam
Universität Hamburg

Panel Discussion

Our panel discussion will gather all stakeholders: researchers, representatives of global crowd platforms like Toloka and Amazon MTurk, performers, and requesters who work with the crowd on a large scale. We hope to stimulate a fruitful conversation, shed light on what is not often discussed, develop solutions to problems and find new growth points for crowdsourcing.
  • Perspectives

    What is the future of crowdsourcing in terms of science, business, and the professions of the future?

  • Trust

    What are the mechanisms that can strengthen the trust between performers and requesters?

How can we increase the confidence of the entire IT industry in using crowdsourcing?

  • Ethics

    What ethical issues exist in the crowdsourcing community?

What are the problem areas, and what should be done to address them?


Olga Megorskaya, Toloka

Marcos Baez, Université Claude Bernard Lyon 1
Pranjal Chutia, Contributor on Toloka
Sara Dolph, Contributor on Amazon Mechanical Turk
Morgan Dutton, Amazon Web Services (AWS)
Olga Masyutina, Contributor on Toloka
Michael Meehan, Contributor on Amazon Mechanical Turk
Sam Zhong, Microsoft


08:00 – 08:15
Introduction & Icebreakers
08:15 – 08:45
Data Excellence: Better Data for Better AI
— Lora Aroyo (invited talk)
08:45 – 09:05
A Gamified Crowdsourcing Framework for Data-Driven Co-creation of Policy Making and Social Foresight
— Andrea Tocchetti and Marco Brambilla (contributed talk)
09:05 – 09:25
Conversational Crowdsourcing
— Sihang Qiu, Ujwal Gadiraju, Alessandro Bozzon and Geert-Jan Houben (contributed talk)
09:25 – 09:35
Coffee break
09:35 – 10:05
Quality Control in Crowdsourcing
— Seid Muhie Yimam (invited talk)
10:05 – 10:25
What Can Crowd Computing Do for the Next Generation of AI Technology?
— Ujwal Gadiraju and Jie Yang (contributed talk)
10:25 – 10:45
Real-Time Crowdsourcing of Health Data in a Low-Income Country: A Case Study of Human Data Supply on Malaria First-Line Treatment Policy Tracking in Nigeria
— Olubayo Adekanmbi, Wuraola Fisayo Oyewusi and Ezekiel Ogundepo (contributed talk)
10:45 – 11:00
Coffee break
11:00 – 12:30
Panel discussion: "Successes and failures in crowdsourcing: experiences from work providers, performers and platforms"
12:30 – 13:00
Lunch break
13:00 – 13:30
Modeling and Aggregation of Complex Annotations via Annotation Distance
— Matt Lease (invited talk)
13:30 – 13:50
Active Learning from Crowd in Item Screening
— Evgeny Krivosheev, Burcu Sayin, Alessandro Bozzon and Zoltán Szlávik (contributed talk)
13:50 – 14:10
Human Computation Requires and Enables a New Approach to Ethics
— Libuse Veprek, Patricia Seymour and Pietro Michelucci (contributed talk)
14:10 – 14:20
Coffee break
14:20 – 14:50
Bias in Human-in-the-Loop Artificial Intelligence
— Gianluca Demartini (invited talk)
14:50 – 15:10
VAIDA: An Educative Benchmark Creation Paradigm using Visual Analytics for Interactively Discouraging Artifacts
— Anjana Arunkumar, Swaroop Mishra, Bhavdeep Sachdeva, Chitta Baral and Chris Bryan (contributed talk)
15:10 – 15:40
Achieving Data Excellence
— Praveen Paritosh (invited talk)
15:40 – 16:00