Annotator empowerment and data excellence

Toloka social at NeurIPS 2022. From old biases to new opportunities: Annotator empowerment and data excellence

Dec 1, 2022, 00:00 UTC

Background

In recent years, we have observed a sharp increase in the popularity of artificial intelligence (AI) systems that affect people's lives, from face recognition in the streets to text-to-image generation systems like DALL-E that help people produce visual content. These systems are trained on data, and when the data are biased, the systems can amplify those biases. Regardless of its size, a dataset usually contains biases, both selection biases (due to sampling from a skewed distribution) and annotator biases (e.g., personal biases, confusion of observations, etc.).

Because of the latter, gold standard datasets are often difficult to obtain. How do we find ground-truth datasets backed by experts that can eliminate ambiguity and disagreement among annotators? Additionally, even a dataset free of annotator subjectivity may still be biased because its sampling lacks diversity. When machine learning (ML) algorithms are trained on such biased datasets, the AI product may act in ways that are irresponsible, offensive, or even life-threatening.

We'd like to welcome NeurIPS attendees to join our social and discuss how to build ML systems immune to biases, obtain representative datasets, and resolve disagreements among annotators in order to deliver high-quality AI products. Our goal is to bring together people from different backgrounds and hear various opinions on how imperfect AI systems impact people's lives and how the training data can be changed to improve the state of affairs.

Event Details

Toloka Social is an in-person two-part roundtable event with free snacks and beverages.

There will be 6 tables with up to 15 participants at each one, including 1 moderator who will distribute discussion questions to their group. Each discussion will last 30 minutes, after which the participants will change tables. There will be 3 such changes in this part of the event, allowing each participant to engage with different topics.
For the next 30 minutes, an open discussion will ensue, allowing the guests to address each other and the moderators with any thoughts or questions they may have related to the topics covered in the first part. This will facilitate a deeper exploration of the roundtable questions and ensure that all of the issues raised during the social are heard by everyone.

When & where

Location:
Room 394
New Orleans Ernest N. Morial Convention Center
900 Convention Center Blvd New Orleans, LA 70130

Date and time:
Wednesday, November 30, 2022
6:00 PM - 8:00 PM CST (UTC-06)

Discussion topics

I

What lessons can we learn from biased and unfair AI systems, and how can we prevent these mistakes in the future?

II

How can we empower digital workers as we strive to reach our AI potential and obtain large-scale multiculturally diverse and unbiased data?

III

Data labeling + Social sciences: the spillovers. We'll talk about some common issues for the two fields, such as data quality and objectivity, performer selection, motivation, and biases, and how researchers and practitioners can enrich each other while tackling them.

IV

How can we apply approaches from other disciplines to address weaknesses in AI systems caused by imperfect data?

V

How can the need for human-labeled data be channeled to help online annotators earn extra income in regions of the world where job opportunities are few and far between?