Toloka at NeurIPS 2022: discussions and insights

by Toloka Team

The Toloka team recently organized a social at NeurIPS 2022, a premier conference on machine learning held in New Orleans, Louisiana. Socials at NeurIPS are informal events with a panel of moderators who facilitate the discussion and make sure every participant is engaged.

During the social, our researchers joined other experts to discuss the following topic: "From old biases to new opportunities: Annotator empowerment and data excellence". Let's take a closer look at the key takeaways from Toloka's perspective.

Helping annotators earn extra income

Matteo Tafuro, a teaching assistant at the University of Amsterdam, facilitated a discussion on the topic of using the demand for human-labeled data to help online annotators earn extra income in regions with limited employment opportunities.

The discussion highlighted the crucial role online annotators play in the development of ML systems. However, when annotators are demotivated by poor working conditions or infrastructure problems, they can't produce consistently high-quality data to support ML models. We need greater recognition and support for the work of annotators to remove existing barriers to improving quality.

Key recommendations for improving the experience of annotators and the quality of their work:

  • Improve annotator training
  • Seek ways to improve working conditions and infrastructure
  • Apply more effective quality control measures
  • Offer incentives or bonuses to encourage high-quality work
  • Create annotator cooperatives to resolve issues through collective action

Participants were hopeful that through a combination of improved working conditions, better training, and collective action, online annotators can be empowered to earn a fair and sustainable income in even the most challenging economic environments.
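
To make the quality-control and bonus recommendations above more concrete, here is a minimal sketch of one common approach: scoring each annotator against a small "golden set" of control tasks with known answers, then deciding who earns a bonus and whose work needs review. The annotator IDs, thresholds, and data structures are hypothetical placeholders, not a description of any particular platform's implementation.

```python
from collections import defaultdict

# Hypothetical annotation records: (annotator_id, task_id, label) triples.
annotations = [
    ("anna", "t1", "cat"), ("anna", "t2", "dog"), ("anna", "t3", "cat"),
    ("bob",  "t1", "dog"), ("bob",  "t2", "dog"), ("bob",  "t3", "cat"),
]
# Control tasks ("golden set") whose correct labels are known in advance.
golden = {"t1": "cat", "t2": "dog"}

REVIEW_THRESHOLD = 0.8  # below this, the annotator's work is sent for review
BONUS_THRESHOLD = 0.95  # at or above this, the annotator earns a bonus

def control_task_accuracy(annotations, golden):
    """Per-annotator accuracy computed on control tasks only."""
    correct, total = defaultdict(int), defaultdict(int)
    for annotator, task, label in annotations:
        if task in golden:
            total[annotator] += 1
            correct[annotator] += int(label == golden[task])
    return {a: correct[a] / total[a] for a in total}

for annotator, acc in control_task_accuracy(annotations, golden).items():
    if acc >= BONUS_THRESHOLD:
        print(f"{annotator}: accuracy {acc:.2f} -> eligible for a bonus")
    elif acc < REVIEW_THRESHOLD:
        print(f"{annotator}: accuracy {acc:.2f} -> responses need review")
    else:
        print(f"{annotator}: accuracy {acc:.2f} -> ok")
```

A setup like this rewards consistently good work instead of only penalizing mistakes, which is exactly the kind of incentive the discussion called for.
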

Joining forces with social scientists

Similar issues arise in the social sciences, where data obtained from online respondents is often of inconsistent quality and requires both monitoring and incentives. Social scientists study how to evaluate respondent quality and how to motivate people to provide higher-quality data.

Elena Brandt, CEO of Besample, shared her insights as a social scientist:

"Social sciences have the same problem as annotation does: some people provide poor data. And just like it is for data labeling, the problem is that we have no idea which data points will be poor and must be discarded.

The difference, though, is how the problem is being assessed and addressed. In labeling, poor quality usually means failing a set of control tasks. In social sciences, poor data usually means some inner quality of the respondent, such as showing signs of not reading the instructions, lying, or lacking internal consistency in responding.

The solution in both worlds is often to block fraudsters. But in social science, there is a growing understanding that performance must be viewed with the history of prior behavior in mind. This helps make more valid assessments and accurate predictions of how each person will perform in the future, as well as consistently reward them for high performance. I think the two fields should join forces in tackling the problem of data quality and annotator motivation."

Dealing with problems caused by inaccurate training data

Another hot topic brought up by Peter Kocsis, a PhD candidate at the Technical University of Munich, was inaccurate data and the problems it can cause in ML and data labeling processes.

In medical applications, for instance, accurate predictions are crucial but large labeled datasets are rarely available, so errors in the limited data that does exist can have serious consequences.

The main problem here is that models and data acquisition do not accurately represent the data generation process. Models treat the data as ground truth, but in reality, data collection processes can introduce errors.

There are two possible ways to deal with these problems: fix the data, or make the model more robust to it. If we focus only on the model, there is no way to discover whether issues are caused by the approach or by the data, and further model development alone will not solve the problem. The best option is to keep improving the data and adopt a workflow that continuously validates both the training data and the model's predictions.
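
As a rough illustration of what such continuous validation could look like, the sketch below uses out-of-fold predictions to flag examples whose given label the model finds unlikely, so they can be sent back for re-annotation instead of being treated as ground truth. It assumes a scikit-learn-style classifier; the iris dataset stands in for a real labeled set, and the 0.3 threshold is an arbitrary placeholder.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Toy stand-in for a labeled training set; in practice these would be the
# features and (possibly noisy) labels collected from annotators.
X, y = load_iris(return_X_y=True)

# Out-of-fold predicted probabilities: each example is scored by a model
# that never saw it during training, so the check is not circular.
probs = cross_val_predict(
    LogisticRegression(max_iter=1000), X, y, cv=5, method="predict_proba"
)

# Probability the model assigns to the label the annotators provided.
given_label_prob = probs[np.arange(len(y)), y]

# Hypothetical threshold: examples whose given label looks unlikely are
# flagged for label review rather than silently trusted as ground truth.
suspect = np.where(given_label_prob < 0.3)[0]
print(f"{len(suspect)} of {len(y)} examples flagged for label review")
```

Run periodically as new annotations arrive, a check like this keeps data problems visible instead of letting them surface only as unexplained model errors.
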

Avoiding bias in AI

Of all the issues that affect AI performance, bias is one of the most widely discussed. This includes gender bias, skin color bias, or even poor distinction between animals that look alike, such as huskies and wolves. Many widely used systems, like Amazon's Alexa and Apple's Siri, have demonstrated at least one of these biases.

Max Kunakov from Toloka moderated a discussion exploring how we can learn from biased and unfair AI systems and prevent these mistakes in the future. Here are some key recommendations.

First, foster diversity within the development team. Teams that regularly communicate with people from different backgrounds during development and testing are more likely to spot and avoid issues early on.

Next, balance product development between two extreme types of models: a general model that works well in most cases but is prone to bias and unfairness, and a model for each class of users/tasks that provides a less biased answer but creates challenging "bubbles" around those users/tasks.

And lastly, introduce proactive approaches like red teams to mitigate bias (a red team is a group that plays the role of an "enemy" to provide security feedback). There is great potential for companies to add new roles or teams that proactively identify scenarios and cases where an ML model behaves in a biased or unfair way.
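
As one concrete example of what such a red-team check could look like, here is a minimal sketch that swaps gendered terms in an input and flags cases where the model's output changes materially. The scoring function, word list, and tolerance are hypothetical placeholders; the model here is deliberately biased so the audit has something to catch.

```python
# Counterfactual check: does the output change when only a protected
# attribute (here, gendered wording) changes?
SWAPS = {"he": "she", "his": "her", "him": "her"}

def swap_terms(text: str) -> str:
    """Replace gendered terms with their counterparts (toy version)."""
    return " ".join(SWAPS.get(w, w) for w in text.lower().split())

def score_resume(text: str) -> float:
    """Placeholder model: biased on purpose to illustrate the audit."""
    score = 0.5 + 0.1 * text.lower().count("python")
    return score + (0.2 if "he" in text.lower().split() else 0.0)

def audit(texts, tolerance=0.05):
    """Flag inputs whose score shifts by more than `tolerance` after the swap."""
    flagged = []
    for text in texts:
        delta = abs(score_resume(text) - score_resume(swap_terms(text)))
        if delta > tolerance:
            flagged.append((text, round(delta, 3)))
    return flagged

examples = ["he writes python daily", "she writes python daily"]
print(audit(examples))  # flags the first example: its score shifts by 0.2
```
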

However, the goal is not 100% immunity to bias for every AI system. It is nearly impossible to avoid bias in fields such as healthcare, where it is extremely difficult to predict the side effects a new drug will have for every patient. A drug that is effective for 99.9% of patients may still have severe side effects for the remaining 0.1%, which is why clinical trials are still required.

Where we go from here

It's always exciting to exchange ideas for solving common issues in ML and data labeling. At Toloka, we are focusing on ways to improve annotator training and data quality as a means to empower annotators. Our diverse global crowd also helps prevent bias from creeping into data. We look forward to digging deeper and working on new and better solutions with the ML/AI community.
