6 Papers Not to Miss from NeurIPS

by Toloka Team on Mar 31st, 2021

The Neural Information Processing Systems (NeurIPS) conference featured more than 2,000 papers when it was last held in December 2020. Alexey Drutsa and Alexey Volobuev from the Toloka team have chosen six papers from the conference as essential reading in crowd science. They discussed their selection and shared their perspective on trends in crowdsourcing research. In this post we’ll focus on some general crowdsourcing aspects — for the full details, see the video of the seminar.

Here are the top NeurIPS publications most relevant to crowd science: 

Second Order PAC-Bayesian Bounds for the Weighted Majority Vote

Andres Masegosa, Stephan Lorenzen, Christian Igel, Yevgeny Seldin


It may come as a surprise that majority vote is a fundamental technique used in robust state-of-the-art machine learning systems. In fact, the winning strategy at most machine learning competitions employs weighted majority vote as a way to combine different prediction models. Majority vote performs so well because it removes the effect of errors in predictions by averaging out the noise in individual predictions. When individual classifiers are of different quality, it is important to assign different weights to them when aggregating their predictions. This paper focuses on the problem of weight selection and develops a practical technique that does not require individual classifiers to be independent.


Key takeaway

Majority vote (MV) is a popular algorithm that is often used in crowdsourcing for aggregating results. However, it has a weakness: all responses are weighted equally when determining the majority, which doesn’t reflect the huge variation in performer quality. Poor performers may skew results. If we could give more weight to responses from highly skilled performers, we could be more confident in the aggregated result. This paper offers a strategy for selecting those weights in order to achieve better results.
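As a toy illustration (not the paper's method), here is a minimal weighted majority vote in Python. The performer accuracies are made-up numbers; the classic log-odds weighting assigns a performer with accuracy p the weight log(p / (1 − p)):

```python
import numpy as np

# Hypothetical example: 5 performers label 4 items with binary answers.
# Rows = performers, columns = items.
votes = np.array([
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 1, 1],
])

# Assumed per-performer accuracies (in practice these must be estimated,
# e.g. from control tasks). Log-odds weights are the classic choice.
accuracy = np.array([0.9, 0.6, 0.7, 0.55, 0.85])
weights = np.log(accuracy / (1 - accuracy))

# Weighted vote per item: map votes to {-1, +1} and sum the weights.
scores = weights @ (2 * votes - 1)   # positive => label 1, negative => label 0
labels = (scores > 0).astype(int)
print(labels)                        # [1 0 1 1]
```

Note how item 1 resolves to 0 even though performer 1 voted 1: the combined weight of the more accurate performers outvotes them.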

Adversarial Crowdsourcing Through Robust Rank-One Matrix Completion

Qianqian Ma, Alex Olshevsky


The well-known Dawid-Skene model is widely used in crowdsourcing, and its key quantity is performer skill. Many algorithms have been developed for estimating performer skill levels and using these estimates when aggregating responses. However, the model’s assumptions do not always hold in practice (in other words, not all performers demonstrate the predicted skill in real tasks), which means that standard methods are often unreliable.


This paper suggests a new method based on matrix completion for recovering performer quality levels from incomplete data. The authors show that their method is robust when performers deviate from the model, which means it can improve the quality of skill estimates and, ultimately, of aggregated results.
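For intuition only, here is a simplified spectral sketch of the rank-one idea on simulated data. Under a one-coin Dawid-Skene model with skills s = 2p − 1, the centered pairwise agreement matrix is approximately the rank-one matrix s·sᵀ off the diagonal. The paper's contribution is a *robust* completion algorithm; this plain eigenvector estimate is the kind of non-robust baseline it improves on, and all numbers below are simulated:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate: 6 performers with hidden skills label 400 binary items.
true_skill = np.array([0.95, 0.9, 0.85, 0.7, 0.6, 0.55])  # P(correct)
truth = rng.integers(0, 2, size=400)
votes = np.where(rng.random((6, 400)) < true_skill[:, None],
                 truth, 1 - truth)

# Pairwise agreement rates A[i, j]; 2A - 1 is approximately s_i * s_j
# off-diagonal, i.e. rank one. Zero the diagonal and take the leading
# eigenvector as a rough skill estimate.
A = (votes[:, None, :] == votes[None, :, :]).mean(axis=2)
M = 2 * A - 1
np.fill_diagonal(M, 0)

eigval, eigvec = np.linalg.eigh(M)          # eigenvalues in ascending order
# abs() resolves the sign ambiguity, assuming all performers beat random.
s = np.sqrt(max(eigval[-1], 0)) * np.abs(eigvec[:, -1])
print(np.round((s + 1) / 2, 2))             # approximate accuracies p_i
```

The estimates are approximate (zeroing the diagonal introduces some bias), but the ranking of performers is recovered; the robust method in the paper additionally tolerates adversarial performers who would corrupt this spectral estimate.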

Key takeaway

The proposed algorithm could be a successful method for aggregating results in scenarios with dishonest or low-quality performers. This method could be applied to crowdsourcing projects and used for creating new crowd science solutions.

Improving Natural Language Processing Tasks with Human Gaze-Guided Neural Attention

Ekta Sood, Simon Tannert, Philipp Mueller, Andreas Bulling


Modern computer vision models often use supervision with human gaze data to understand which parts of an image are most important, and then use this information to improve prediction quality. Could text-processing models also benefit from this kind of data? Unfortunately, little gaze data is available for text. Existing corpora contain only small amounts of data about which parts of a text receive more human attention and which receive less. Crowdsourcing is key to collecting new types of data to accelerate natural language processing.

The authors propose the following solutions: 

  • A hybrid text saliency model where they combined human gaze data with a cognitive model known as EZ Reader. 
  • Joint training of the parameters of the entire network, fine-tuned for a target task. 

Key takeaway

Natural language processing is an important field in machine learning, yet it still experiences a lack of human-labeled data. Models could be significantly improved by providing them with large quantities of data and using new types of data for training. In addition, future research could involve human-in-the-loop technology for training parameters.

Learning to Summarize with Human Feedback

Nisan Stiennon, Long Ouyang, Jeffrey Wu, Daniel Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul F. Christiano


This paper looks at the challenges of summarization, which is a natural language processing (NLP) task that takes a long and complicated text and extracts a short summary. Large transformer networks like BERT and GPT-3 achieve good results, but optimization is limited because standard text quality metrics like ROUGE don’t accurately reflect the quality of a summary and often lead to various artifacts. 

The solution is to collect human feedback: crowd performers choose which summaries are better, and this information is used to train a reward model and a policy that will generate high-quality summaries. 
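Reward models in such pipelines are typically fit to pairwise comparisons with a logistic loss: the model is penalized when it scores the human-rejected summary above the human-preferred one. A minimal sketch of that loss, with hypothetical reward scores (not the paper's code):

```python
import numpy as np

def pairwise_preference_loss(r_chosen, r_rejected):
    """Logistic loss for fitting a reward model on human comparisons:
    -log sigmoid(r(chosen) - r(rejected)), averaged over pairs."""
    diff = np.asarray(r_chosen) - np.asarray(r_rejected)
    return float(np.mean(np.log1p(np.exp(-diff))))

# Three hypothetical comparison pairs. The middle pair is ranked the
# wrong way round (1.5 < 1.8), so it contributes most of the loss.
loss = pairwise_preference_loss([2.0, 1.5, 0.2], [0.5, 1.8, -0.1])
print(round(loss, 3))   # 0.537
```

Once trained, the reward model's score replaces metrics like ROUGE as the optimization target for the summarization policy.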

Key takeaway

Human feedback techniques can help models improve performance on practical NLP tasks. Crowdsourcing plays a crucial role in human-in-the-loop pipelines.

Disentangling Human Error from the Ground Truth in Segmentation of Medical Images

Le Zhang, Ryutaro Tanno, Mou-Cheng Xu, Chen Jin, Joseph Jacob, Olga Ciccarelli, Frederik Barkhof, Daniel Alexander


Tasks that require selecting areas in complex images, like medical scans, often show a great deal of variation in answers. It can be difficult to establish the ground truth because even professional experts are inconsistent, and this problem creates obstacles for training AI. In this paper the authors propose a solution for using human-provided answers to train automatic segmentation algorithms, without getting derailed by flawed data.
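The core modeling idea can be sketched without images: each annotator's labels are treated as the ground truth passed through a per-annotator confusion matrix, and these matrices are learned jointly with the segmentation model. A two-class forward step with made-up numbers:

```python
import numpy as np

# Simplified, image-free sketch of the paper's core idea for one pixel.
p_true = np.array([0.9, 0.1])   # model's probability of each class

# Hypothetical confusion matrix for one annotator: row = true class,
# column = observed class. This annotator flips class 0 to class 1
# 20% of the time (e.g. they tend to over-segment lesions).
confusion = np.array([[0.80, 0.20],
                      [0.05, 0.95]])

# Expected distribution of this annotator's label for the pixel.
p_observed = p_true @ confusion
print(np.round(p_observed, 3))   # [0.725 0.275]
```

Training fits the model so that `p_observed` matches each annotator's actual labels; the systematic errors are then absorbed by the confusion matrices, leaving `p_true` as a cleaner estimate of the ground truth.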

Key takeaway

Jointly training a model and learning how to aggregate data looks very promising — doing this with a creative probabilistic model could achieve state-of-the-art results in the medical segmentation field. Extending this method to other tasks may be a promising area of research, resulting in new cost-effective ways to train models while using noisy data. 

A Topological Filter for Learning with Label Noise

Pengxiang Wu, Songzhu Zheng, Mayank Goswami, Dimitris Metaxas, Chao Chen


If you send your data to the crowd for labeling, how will you know whether the resulting labeled dataset contains errors? What if you could detect errors and fix or remove them before training? This paper offers a new way to do this: a robust method for filtering noise and collecting clean data.

Key takeaway

Noisy labels interfere with learning, so detecting and filtering them matters for deep models. Previous research mostly relied on classifier confidence to identify uncertain labels. This method instead leverages the group behavior of data in the latent representation, a signal that confidence-dependent approaches have neglected.
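As a rough, simplified stand-in for the paper's filter (which builds a k-NN graph in the latent space and keeps the largest connected components of same-label points), here is a neighborhood-agreement check on toy 2-D features — all names and data below are illustrative:

```python
import numpy as np

def neighborhood_filter(features, labels, k=3):
    """Flag a label as likely noisy when it disagrees with the majority
    label among its k nearest neighbors in the (latent) feature space.
    A much-simplified proxy for the paper's topological filter."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    flags = []
    for i in range(len(labels)):
        dists = np.linalg.norm(features - features[i], axis=1)
        dists[i] = np.inf                      # exclude the point itself
        neighbors = np.argsort(dists)[:k]
        majority = np.bincount(labels[neighbors]).argmax()
        flags.append(labels[i] != majority)
    return np.array(flags)

# Two tight clusters; the point at index 2 carries the wrong label.
X = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]]
y = [0, 0, 1, 1, 1, 1]
print(neighborhood_filter(X, y, k=2))   # only index 2 is flagged
```

In practice the features would be the network's own latent representations, recomputed as training progresses, so the filter and the model improve together.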
