Can LLMs eliminate toxicity in human and AI-generated content? What multilingual research shows


Guest post by Daryna Dementieva, Postdoc at the Technical University of Munich, School of Computation, Information and Technology, and Lead Organizer of the TextDetox competition.

New research: 2024 TextDetox competition

Challenge: Detect and correct toxic phrases in 9 languages
Results: 17 submitted solutions succeeded only in languages with large corpora available
Conclusion: Text detoxification models require extensive training data and human evaluation

This article looks at how researchers collected parallel corpora and evaluated the solutions submitted to the TextDetox competition with the help of Toloka.

Why do we need text detoxification?

If you’ve spent any time online, you’ve been exposed to offensive comments at one point or another.

Online toxicity, whether generated by AI or humans, remains a major challenge and poses serious safety concerns for users. Chatbots trained on open data, such as user comments, can unintentionally generate offensive responses. This not only frustrates users but can also harm a company's reputation. Although large language models (LLMs) are trained on vast datasets and may even be reinforced against generating harmful responses, they occasionally reproduce toxic language because of biases embedded in their training data. These biases can surface during interactions and lead to unintended offensive replies.

When users post aggressive or inappropriate messages on social networks, the toxic content is usually flagged and deleted. However, a more proactive strategy is to offer users a neutral version of their statement, giving them a chance to try again. This is particularly helpful in "parent mode" settings that guide children to communicate respectfully.

Examples of chatbot toxicity (left) and human toxicity (right)

In light of widespread issues with toxicity, detoxifying text has become a hot topic in natural language processing (NLP) research, particularly for languages other than English. 

Text style transfer (TST): neutralizing toxic language with LLMs 

One promising solution is automatic text detoxification, a technique that uses text style transfer (TST) to rephrase offensive language into a less toxic or neutral tone, rendering it harmless. For example, an insulting complaint can be rewritten as a polite statement of the same criticism.

Current LLMs cannot perform this task well out of the box without being exposed to parallel corpora of toxic and non-toxic phrases. When asked to detoxify phrases, ChatGPT produces overly polite, unnatural responses.
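To illustrate what out-of-the-box behavior looks like, here is a minimal sketch that asks a general-purpose instruction-tuned model to rewrite a toxic sentence. The model choice (flan-t5-base) and the prompt wording are illustrative assumptions, not the competition setup:

```python
# Minimal sketch of zero-shot detoxification with an instruction-tuned model.
# Without fine-tuning on parallel toxic/non-toxic data, the output tends to be
# overly polite rather than a faithful neutral paraphrase.
from transformers import pipeline

detox = pipeline("text2text-generation", model="google/flan-t5-base")

def detoxify(text: str) -> str:
    prompt = (
        "Rewrite the following message so it is polite and non-toxic, "
        f"but keep its meaning:\n{text}"
    )
    return detox(prompt, max_new_tokens=64)[0]["generated_text"]

print(detoxify("This code is complete garbage, did a child write it?"))
```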

To find the best approaches to text detoxification with TST, an international group of researchers launched the TextDetox Shared Task, a competition challenging participants to test current LLMs on text detoxification. To make the task multilingual, researchers required solutions to cover 9 languages from different parts of the world: English, Chinese, Arabic, Hindi, Ukrainian, Spanish, German, Russian, and Amharic.

Setting up the competition involved collecting parallel data and evaluating model output in all 9 languages, a daunting task. The research team overcame this obstacle by crowdsourcing the data on the Toloka platform. Let's look at how we set up an efficient data collection project.

Gathering data for parallel corpora

To provide training data for the models, we initially collected a corpus of non-toxic paraphrases for over 10,000 toxic English sentences.

The pipeline was designed on the Toloka platform with three stages:

1. Paraphrasing texts to eliminate toxicity.
2. Checking the paraphrased texts to make sure the meaning is close enough to the original.
3. Checking the final versions to verify that all offensive language was removed.
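A sketch of the gating logic behind stages 2 and 3 is below. In practice each stage was a separate crowdsourcing project with human annotators; the vote aggregation here is a simplified stand-in for Toloka's real quality controls:

```python
from dataclasses import dataclass

@dataclass
class Pair:
    toxic: str
    paraphrase: str

def majority(votes: list[bool], threshold: float = 0.5) -> bool:
    """Aggregate annotator votes. Real Toloka pipelines add more quality
    controls (exams, task overlap, skill-based weighting)."""
    return sum(votes) / len(votes) > threshold

def accept(pair: Pair, meaning_votes: list[bool],
           clean_votes: list[bool]) -> Pair | None:
    """Stages 2 and 3: keep the pair only if annotators agree the meaning
    is preserved AND no offensive language remains."""
    if majority(meaning_votes) and majority(clean_votes):
        return pair  # accepted into the parallel corpus
    return None      # rejected; the source sentence can be re-queued

# Stage 1 output (a worker's rewrite), checked by 5 annotators in stages 2-3.
pair = Pair(toxic="<toxic comment>", paraphrase="<neutral rewrite>")
print(accept(pair, meaning_votes=[True, True, True, False, True],
             clean_votes=[True, True, True, True, True]))
```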

The collected datasets contain pairs of toxic sentences and their non-toxic paraphrases.

The challenge of limited language data

We used the parallel corpora in the TextDetox competition as an additional resource for training LLMs to perform better. However, data is scarce for some of the languages covered in the challenge, and parallel corpora like the one gathered with the pipeline above remain very limited.

The biggest challenge of the competition was to perform unsupervised cross-lingual detoxification. We wanted to find models that perform detoxification well across multiple languages, even for those with limited data, like Amharic.

Evaluating the contest submissions

Participants submitted 17 different detoxification systems to the competition. The next step was to evaluate each system on all 9 languages: English, Spanish, German, Chinese, Arabic, Hindi, Ukrainian, Russian, and Amharic.

We used the Toloka platform to build a human evaluation pipeline that checked three parameters:

(i) whether the new paraphrase is less toxic;

(ii) whether the meaning is preserved;

(iii) whether the text is fluent.
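These three judgments can be combined into a single score per sample. A minimal sketch, assuming the multiplicative "joint" score (style accuracy × content similarity × fluency) commonly used in detoxification evaluation; the competition's exact weighting may differ:

```python
# Sketch: combine the three per-sample judgments into one joint score.
# A sample scores high only if it is less toxic AND meaning-preserving
# AND fluent; failing any one criterion zeroes it out.
def joint_score(samples: list[dict]) -> float:
    """Each sample holds three scores in [0, 1]: 'sta' (is it less toxic?),
    'sim' (is the meaning preserved?), 'fl' (is it fluent?)."""
    per_sample = [s["sta"] * s["sim"] * s["fl"] for s in samples]
    return sum(per_sample) / len(per_sample)

samples = [
    {"sta": 1.0, "sim": 1.0, "fl": 1.0},  # perfect detoxification
    {"sta": 1.0, "sim": 0.0, "fl": 1.0},  # non-toxic but meaning lost -> 0
]
print(joint_score(samples))  # 0.5
```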

To maintain high-quality annotations, we made sure that only native speakers were involved in evaluating texts. Each evaluator completed training and passed an exam to learn how to do each type of task correctly.

Can LLMs solve text detoxification?

The final results of the competition showed that LLMs fine-tuned for the detoxification task can reach human performance levels for resource-rich languages like English, Spanish, German, Russian, and Arabic. For these languages, large amounts of hate speech and toxicity detection data are readily available to use as training data.

Best-performing models across resource-rich languages in comparison to human references

For low-resource languages like Amharic and Ukrainian, text detoxification is still a challenging task. Even the best automated solutions could not outperform human references.

Best-performing models across resource-scarce languages in comparison to human references

We think there are several possible explanations for these results. For Amharic, encoders still struggle to embed texts properly. For Ukrainian, no toxicity classification corpus was previously available; we collected the first dataset of this kind for Ukrainian and released it for public use.
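To make the encoder point concrete: content preservation is typically measured as the cosine similarity between multilingual sentence embeddings of the original and the detoxified text. A minimal sketch, assuming the LaBSE encoder (an assumption; the competition's exact automatic metric setup may differ):

```python
# Sketch: content-preservation score via a multilingual sentence encoder.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/LaBSE")

def content_similarity(original: str, paraphrase: str) -> float:
    emb = model.encode([original, paraphrase], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

# If the encoder represents a language poorly (as noted for Amharic),
# this score becomes unreliable, making detoxification hard to evaluate.
print(content_similarity("The weather is awful today.",
                         "Today's weather is terrible."))
```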

What we learned 

As we discovered, none of the competing models were successful in every language. Some solutions achieved high results for some languages but struggled in others. Unfortunately, multilingual and cross-lingual solutions for text detoxification are not yet a reality — success in this field will require the involvement of more experts and native speakers.

This challenge revealed that, just like any LLM, models designed for text detoxification depend heavily on high-quality training data. Toloka was a key resource for this research, providing access to domain experts and annotators across many languages, with scalable pipelines to support seamless model training.

To learn more about the project, read our research paper and visit the HuggingFace space. We are currently preparing next year's edition of the competition, so stay tuned.


Read our latest case study about creating a vast multilingual and multi-domain dataset that helped our client’s model outperform leading LLMs.
As we continue our commitment to Responsible AI, projects focused on ethical LLM usage and training are especially meaningful to us. Connect with the Toloka team to collect multilingual training data or assess the ethical standards of your model.

Article written by: Daryna Dementieva

Updated: Nov 26, 2024

