Toloka social at NeurIPS 2022. From old biases to new opportunities: Annotator empowerment and data excellence
In recent years, we have seen a sharp increase in the popularity of artificial intelligence (AI) systems that affect people's lives, from face recognition in the streets to text-to-image generation systems like DALL-E that help people produce visual content. These systems are trained on data, and when the data are biased, the systems can amplify those biases. Regardless of its size, a dataset usually contains biases, both selection biases (due to sampling from a skewed distribution) and annotator biases (e.g., personal biases or confusion of observations).
Because of the latter, gold standard datasets are often difficult to obtain. How do we find ground-truth datasets backed by experts that can eliminate ambiguity and disagreement among annotators? Additionally, even a dataset free of annotator subjectivity may still be biased because its sampling lacks diversity. When machine learning (ML) algorithms are trained on such biased datasets, the resulting AI product may act in ways that are irresponsible, offensive, or even life-threatening.
We'd like to invite NeurIPS attendees to join our social and discuss how to build ML systems immune to biases, obtain representative datasets, and resolve disagreements among annotators in order to deliver high-quality AI products. Our goal is to bring together people from different backgrounds and hear various opinions on how imperfect AI systems impact people's lives and how the training data can be changed to improve the state of affairs.
Toloka Social is an in-person two-part roundtable event with free snacks and beverages.
Location:
Room 394
New Orleans Ernest N. Morial Convention Center
900 Convention Center Blvd, New Orleans, LA 70130
Date and time:
Wednesday, November 30, 2022
6:00 PM - 8:00 PM CST (UTC-06)
Discussion topics:
What lessons can we learn from biased and unfair AI systems, and how can we prevent these mistakes in the future?
How can we empower digital workers as we strive to reach our AI potential and obtain large-scale multiculturally diverse and unbiased data?
Data labeling + Social sciences: the spillovers. We'll talk about some common issues for the two fields, such as data quality and objectivity, performer selection, motivation, and biases, and how researchers and practitioners can enrich each other while tackling them.
How can we apply approaches from other disciplines to address weaknesses in AI systems caused by imperfect data?
How can the need for human-labeled data be channeled to help online annotators earn extra income in regions of the world where job opportunities are few and far between?
How do we obtain representative datasets and establish ground truth without inter-annotator agreement?