Crowd Science Workshop at NeurIPS 2020

Remoteness, fairness, and mechanisms as challenges of data supply by humans for automation.



Despite the clear advantages of AI, automation driven by machine learning carries pitfalls that affect the lives of millions of people. The negative repercussions include the disappearance of many well-established mass professions and increased consumption of labeled data produced by humans. This data is not always obtained in a positive environment: data suppliers are often managed in an old-fashioned way and have to work full-time on routine pre-assigned tasks, leading to job dissatisfaction. Crowdsourcing is a modern and effective alternative, as it gives task executors the flexibility and freedom to choose where, when, and on what type of task they work. However, many potential stakeholders of crowdsourcing processes hesitate to use this technology because of doubts that have continued to circulate over the past decade. To address these issues, our workshop focuses on the research and industry communities and covers three important aspects of data supply: Remoteness, Fairness, and Mechanisms.


Data labeling requesters (data consumers for ML systems) doubt the effectiveness and efficiency of remote work. They need trustworthy quality control techniques and ways to guarantee reliable results on time. Crowdsourcing is one viable solution for effective remote work. However, despite its rapid growth and the body of literature on the topic, crowdsourcing is still in its infancy and, to a large extent, remains an art. It lacks clear guidelines and accepted practices for both requesters and performers (also known as workers), which makes it much harder to reach the full potential of this technology. We intend to change this and achieve a breakthrough by turning the art into a science.


Crowd workers (data suppliers) doubt the availability and choice of tasks. They need fair and ethical task assignment, fair compensation, and growth opportunities. We believe that the working environment (e.g. a crowdsourcing platform) may help meet these needs — it should provide flexibility in choosing tasks and working hours, and access to tasks should be fair and ethical. We also aim to address bias in task design and execution that can skew results in ways that data requesters don’t anticipate. Since quality, fairness and growth opportunities for performers are central to our workshop, we invite a diverse group of performers from a global public crowdsourcing platform to our panel-led discussion.


Matchmakers (the working environment, usually represented by a crowdsourcing platform) doubt the effectiveness of the economic mechanisms that underlie their two-sided market. They need a mechanism design that guarantees proper incentives for both sides: flexibility and fairness for workers, and quality and efficiency for data requesters. We stress that economic mechanisms are the key to addressing the issues of remoteness and fairness successfully. Our intention is to deepen the interaction between and within the communities that work on mechanisms and crowdsourcing.


Lora Aroyo
Google Research, USA
Gianluca Demartini
University of Queensland
Praveen Paritosh
Google Research, USA
Matt Lease
University of Texas at Austin
Seid Muhie Yimam
Universität Hamburg

Panel Discussion

Our panel discussion will gather all stakeholders: researchers, representatives of global crowd platforms like Toloka and Amazon MTurk, performers, and requesters who work with the crowd on a large scale. We hope to stimulate a fruitful conversation, shed light on what is not often discussed, develop solutions to problems and find new growth points for crowdsourcing.


  • Perspectives
    What is the future of crowdsourcing in terms of science, business, and the professions of the future?
  • Trust
    What are the mechanisms that can strengthen the trust between performers and requesters?
    How can we increase the confidence of the entire IT industry in using crowdsourcing in general?
  • Ethics
    What ethical issues exist in the crowdsourcing community?
    What are the problem areas and what should be done to change them?


Olga Megorskaya, Toloka
Marcos Baez, Université Claude Bernard Lyon 1
Pranjal Chutia, Contributor on Toloka
Sara Dolph, Contributor on Amazon Mechanical Turk
Morgan Dutton, Amazon Web Services (AWS)
Olga Masyutina, Contributor on Toloka
Michael Meehan, Contributor on Amazon Mechanical Turk
Sam Zhong, Microsoft


08:00 – 08:15

Introduction & Icebreakers

08:15 – 08:45

Data Excellence: Better Data for Better AI
— Lora Aroyo (invited talk)

08:45 – 09:05

A Gamified Crowdsourcing Framework for Data-Driven Co-creation of Policy Making and Social Foresight
— Andrea Tocchetti and Marco Brambilla (contributed talk)

09:05 – 09:25

Conversational Crowdsourcing
— Sihang Qiu, Ujwal Gadiraju, Alessandro Bozzon and Geert-Jan Houben (contributed talk)

09:25 – 09:35

Coffee break

09:35 – 10:05

Quality Control in Crowdsourcing
— Seid Muhie Yimam (invited talk)

10:05 – 10:25

What Can Crowd Computing Do for the Next Generation of AI Technology?
— Ujwal Gadiraju and Jie Yang (contributed talk)

10:25 – 10:45

Real-Time Crowdsourcing of Health Data in a Low-Income Country: A Case Study of Human Data Supply on Malaria First-Line Treatment Policy Tracking in Nigeria
— Olubayo Adekanmbi, Wuraola Fisayo Oyewusi and Ezekiel Ogundepo (contributed talk)

10:45 – 11:00

Coffee break

11:00 – 12:30

Panel discussion
"Successes and failures in crowdsourcing: experiences from work providers, performers and platforms"

12:30 – 13:00

Lunch break

13:00 – 13:30

Modeling and Aggregation of Complex Annotations via Annotation Distance
— Matt Lease (invited talk)

13:30 – 13:50

Active Learning from Crowd in Item Screening
— Evgeny Krivosheev, Burcu Sayin, Alessandro Bozzon and Zoltán Szlávik (contributed talk)

13:50 – 14:10

Human Computation Requires and Enables a New Approach to Ethics
— Libuse Veprek, Patricia Seymour and Pietro Michelucci (contributed talk)

14:10 – 14:20

Coffee break

14:20 – 14:50

Bias in Human-in-the-Loop Artificial Intelligence
— Gianluca Demartini (invited talk)

14:50 – 15:10

VAIDA: An Educative Benchmark Creation Paradigm using Visual Analytics for Interactively Discouraging Artifacts
— Anjana Arunkumar, Swaroop Mishra, Bhavdeep Sachdeva, Chitta Baral and Chris Bryan (contributed talk)

15:10 – 15:40

Achieving Data Excellence
— Praveen Paritosh (invited talk)

15:40 – 16:00


