6 Papers Not to Miss from NeurIPS

Toloka Team

March 31, 2021

News

Neural Information Processing Systems (NeurIPS) is an annual conference that featured more than 2,000 papers when it was last held in December 2020. Alexey Drutsa and Alexey Volobuev from the Toloka team have chosen six papers from the conference as essential reading in crowd science. At a seminar, they discussed their selection and shared their perspective on trends in crowdsourcing research. In this post we'll focus on some general crowdsourcing aspects; for the full details, see the video of the seminar.

Here are the top NeurIPS publications most relevant to crowd science:

Second Order PAC-Bayesian Bounds for the Weighted Majority Vote

Andres Masegosa, Stephan Lorenzen, Christian Igel, Yevgeny Seldin

Overview

It may come as a surprise that majority vote is a fundamental technique used in robust state-of-the-art machine learning systems. In fact, the winning strategy at most machine learning competitions employs weighted majority vote as a way to combine different prediction models. Majority vote performs so well because it reduces the effect of individual errors by averaging out the noise across predictions. When individual classifiers differ in quality, it is important to assign them different weights when aggregating their predictions. This paper focuses on the problem of weight selection and develops a practical technique that does not require individual classifiers to be independent.

Weighted Majority Vote

Key takeaway

Majority vote (MV) is a popular algorithm that is often used in crowdsourcing for aggregating results. However, it has a weakness: all responses are weighted equally when determining the majority, which doesn't reflect the huge variation in quality of performers. Poor performers may skew results. If we could give more weight to responses from highly-skilled performers, we could be more confident in the aggregated result. This paper offers a strategy for successfully selecting the weights to use in order to achieve better results.
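To make the difference concrete, here is a minimal sketch of plain versus weighted majority vote. The performer IDs, labels, and weights are hypothetical, and the weights are simply given as input: the sketch does not implement the paper's method for selecting them.

```python
from collections import defaultdict


def weighted_majority_vote(responses, weights):
    """Return the label with the largest total weight.

    responses: dict mapping performer id -> submitted label
    weights:   dict mapping performer id -> non-negative weight
    """
    totals = defaultdict(float)
    for performer, label in responses.items():
        totals[label] += weights[performer]
    # On a tie, the label that was seen first wins.
    return max(totals, key=totals.get)


def majority_vote(responses):
    """Plain majority vote: every performer gets the same weight."""
    return weighted_majority_vote(responses, {p: 1.0 for p in responses})


# Hypothetical example: one strong performer disagrees with two weak ones.
responses = {"p1": "cat", "p2": "dog", "p3": "dog"}
weights = {"p1": 0.9, "p2": 0.3, "p3": 0.4}  # assumed skill-based weights

print(majority_vote(responses))                    # dog
print(weighted_majority_vote(responses, weights))  # cat
```

With equal weights the two weaker performers outvote the stronger one; with skill-based weights the stronger performer's answer prevails.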

Adversarial Crowdsourcing Through Robust Rank-One Matrix Completion

Qianqian Ma, Alex Olshevsky

Overview

The famous Dawid-Skene model is often used in crowdsourcing, where the key aspect is performer skill. Many different algorithms have been developed for estimating performer skill levels and using these predictions when aggregating responses. However, the model's assumptions are not always true in practice (in other words, not all individual performers demonstrate the predicted skill in real tasks), which means that standard methods are often unreliable.

Dawid-Skene Model

This paper suggests a new method involving matrix completion in order to discover performer quality level from incomplete data. The authors believe their method is robust when performers deviate from the model, which means it can effectively improve the quality of skill predictions, and ultimately, aggregated results.
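The link between performer skill and a rank-one matrix can be illustrated with a small sketch. It assumes a simplified "one-coin" variant of the model with binary labels encoded as +1/-1, where performer i answers correctly with probability p_i and has skill s_i = 2p_i - 1. Under that assumption, the expected product of two different performers' answers is s_i * s_j, so the off-diagonal part of the agreement matrix is rank one and the skills can be read off from it. The function name and the averaging rule below are illustrative choices, and this is a plain, non-robust estimator rather than the authors' algorithm.

```python
import itertools
import math


def skills_from_rank_one(C):
    """Estimate skills s from a matrix whose off-diagonal entries are s_i * s_j.

    The diagonal is treated as missing and never read. For any three distinct
    performers i, j, k we have C[i][j] * C[i][k] / C[j][k] = s_i ** 2, so we
    average that quantity over all pairs (j, k) and take the square root.
    Assumes at least three performers, positive skills (better than random
    guessing), and non-zero off-diagonal entries.
    """
    n = len(C)
    skills = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        estimates = [
            C[i][j] * C[i][k] / C[j][k]
            for j, k in itertools.combinations(others, 2)
        ]
        mean_square = sum(estimates) / len(estimates)
        skills.append(math.sqrt(max(mean_square, 0.0)))
    return skills


# Hypothetical, noise-free example: build the matrix from known skills,
# leave the diagonal missing, and recover the skills from the rest.
true_skills = [0.8, 0.6, 0.4, 0.2]
C = [
    [true_skills[i] * true_skills[j] if i != j else None for j in range(4)]
    for i in range(4)
]
estimated = skills_from_rank_one(C)
print([round(s, 3) for s in estimated])
```

In real projects the matrix is estimated from a limited number of overlapping answers and may contain entries from performers who do not follow the model at all, which is where a robust completion method becomes relevant.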

Key takeaway

The proposed algorithm could be a successful method for aggregating results in scenarios with dishonest or low-quality performers. This method could be applied to crowdsourcing projects and used for creating new crowd science solutions.

Improving Natural Language Processing Tasks with Human Gaze-Guided Neural Attention

Ekta Sood, Simon Tannert, Philipp Mueller, Andreas Bulling

Overview

Modern computer vision models often use human gaze data as supervision to learn which parts of an image are most important, and then use this information to improve prediction quality. Could models related to text processing also benefit from gathering alternative types of data? Unfortunately, there isn't much gaze data available for texts. The existing corpora usually contain only small amounts of data showing which parts of a text receive more attention from a human reader and which receive less. Crowdsourcing is key to collecting new types of data to accelerate natural language processing.

The authors propose the following solutions:

A hybrid text saliency model that combines human gaze data with a cognitive model of reading known as E-Z Reader.
Joint training in which the parameters of the entire network are updated for the target task.

Supervision Using Human Attention

Key takeaway

Natural language processing is an important field in machine learning, yet it still suffers from a lack of human-labeled data. Models could be significantly improved by training them on larger quantities of data and on new types of data. In addition, future research could involve human-in-the-loop technology for training parameters.

Learning to Summarize with Human Feedback

Nisan Stiennon, Long Ouyang, Jeffrey Wu, Daniel Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul F. Christiano

Overview

This paper looks at the challenges of summarization, a natural language processing (NLP) task that takes a long and complicated text and produces a short summary. Large transformer networks like BERT and GPT-3 achieve good results, but optimization is limited because standard text quality metrics like ROUGE don’t accurately reflect the quality of a summary, and optimizing for them often leads to various artifacts.

The solution is to collect human feedback: crowd performers choose which summaries are better, and this information is used to train a reward model and a policy that will generate high-quality summaries.
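A reward model of this kind is commonly trained with a pairwise preference loss: for each comparison, the loss is the negative log-sigmoid of the difference between the reward assigned to the summary the performer preferred and the reward assigned to the other one. The sketch below shows only that loss on scalar rewards. The function name and the example values are hypothetical, and the reward model itself (a neural network in practice) is left out.

```python
import math


def preference_loss(reward_preferred, reward_rejected):
    """Pairwise loss -log(sigmoid(reward_preferred - reward_rejected)).

    The loss is small when the reward model scores the human-preferred
    summary higher than the rejected one, and large when it does not.
    Written as a softplus in a numerically stable form.
    """
    margin = reward_preferred - reward_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical reward scores for three comparisons.
agrees = preference_loss(2.0, 0.0)      # model agrees with the performer
undecided = preference_loss(0.0, 0.0)   # model has no preference: log(2)
disagrees = preference_loss(0.0, 2.0)   # model disagrees with the performer

print(agrees < undecided < disagrees)  # True
```

Minimizing this loss over many crowd comparisons pushes the reward model to rank summaries the way people do; the summarization policy is then optimized against the learned reward.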

Human feedback techniques

Key takeaway

Human feedback techniques can help models improve performance on practical NLP tasks. Crowdsourcing plays a crucial role in human-in-the-loop pipelines.

Disentangling Human Error from the Ground Truth in Segmentation of Medical Images

Le Zhang, Ryutaro Tanno, Mou-Cheng Xu, Chen Jin, Joseph Jacob, Olga Ciccarelli, Frederik Barkhof, Daniel Alexander

Overview

Tasks that require selecting areas in complex images, like medical scans, often show a great deal of variation in answers. It can be difficult to establish the ground truth because even professional experts are inconsistent, and this problem creates obstacles for training AI. In this paper the authors propose a solution for using human-provided answers to train automatic segmentation algorithms, without getting derailed by flawed data.

Proposed solution

Key takeaway

Jointly training a model and learning how to aggregate data looks very promising: doing this with a creative probabilistic model could achieve state-of-the-art results in the medical segmentation field. Extending this method to other tasks is a natural direction for future research and could lead to new cost-effective ways to train models on noisy data.

A Topological Filter for Learning with Label Noise

Pengxiang Wu, Songzhu Zheng, Mayank Goswami, Dimitris Metaxas, Chao Chen

Overview

If you send your data to the crowd for labeling, how will you know whether the resulting labeled dataset contains any errors? What if you could detect errors and fix them or remove them from the dataset before training? This paper offers a new way to do this with a robust method for filtering noise and collecting clean data.

Clean and corrupted data

Key takeaway

Deep learning models are good at finding noisy labels because such labels interfere with learning. Previous research mostly focused on exploiting uncertain labels. This method leverages the group behavior of data in the latent representation, which has been neglected by previous approaches that depend on classifier confidence.
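One way to picture "group behavior in the latent representation" is to connect each example to its nearest neighbors in feature space and, for every class, keep only the largest connected group of examples that share that label. The sketch below is a loose illustration of that idea and not the authors' algorithm: the k-nearest-neighbor graph, the largest-component rule, the function names, and the toy 2D points standing in for learned features are all assumptions.

```python
import math
from collections import defaultdict


def knn_graph(points, k):
    """Undirected k-nearest-neighbor graph as an adjacency dict of sets."""
    adjacency = defaultdict(set)
    for i, p in enumerate(points):
        by_distance = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (math.dist(p, points[j]), j),
        )
        for j in by_distance[:k]:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def largest_component(nodes, adjacency):
    """Largest connected component of the subgraph induced by `nodes`."""
    nodes, seen, best = set(nodes), set(), set()
    for start in sorted(nodes):
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:
            current = stack.pop()
            for neighbor in adjacency[current] & nodes:
                if neighbor not in component:
                    component.add(neighbor)
                    stack.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def filter_noisy(points, labels, k=2):
    """Indices kept as likely clean: the largest same-label group per class."""
    adjacency = knn_graph(points, k)
    keep = set()
    for label in set(labels):
        members = [i for i, current in enumerate(labels) if current == label]
        keep |= largest_component(members, adjacency)
    return sorted(keep)


# Toy data: two well-separated clusters; index 3 sits in the "a" cluster
# but carries the label "b", so it is isolated from the other "b" points.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
labels = ["a", "a", "a", "b", "b", "b", "b", "b"]

kept = filter_noisy(points, labels)
print(kept)  # [0, 1, 2, 4, 5, 6, 7]
```

The mislabeled point is dropped because none of its neighbors share its label, without consulting any classifier's confidence score.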
