Evgeniya Sukhodolskaya
Oct 20, 2023

Insights

Learning to unlearn


A data-driven approach to machine unlearning for generative language models

In today’s tech landscape, you’d be hard-pressed to find someone who hasn’t heard of machine learning. Over the last decade, the field has been so prominent that even those outside the industry are now familiar with terms such as Artificial Intelligence (AI), Neural Networks (NNs), and Machine Learning (ML).

However, when it comes to machine unlearning, it seems the legal industry has heard more about it than the tech community. The recent boom in Large Language Models (LLMs), which in the fast-paced world of IT feels like a decade even though it has only been a year or two, has unearthed hundreds of unresolved ethical and legal issues related to AI development. Novelists are suing OpenAI for using their texts to train GPT models without consent. Twitter is abuzz with critical comments from artists who believe their works were used in violation of copyright law. Complying with "the right to be forgotten" has become extremely challenging.

Much like AI alignment, machine unlearning appears to be an overlooked field, given the few open-source solutions available. I believe machine unlearning research should be encouraged and popularized, especially since the current laws and ethical norms surrounding AI usage are underdeveloped and severely lacking in data-protection mechanisms. In this article, I suggest some practical improvements to one of the first applied unlearning techniques for generative language models.

What is machine unlearning?

The term "machine unlearning" or "machine forgetting" means exactly what it sounds like: it includes techniques designed to erase requested information from a machine learning model’s "knowledge storage". However, it’s far from intuitive when you need to consider actual methods to achieve this efficiently in terms of time, computational resources, and model performance on the "not unlearned" data. An obvious solution is to retrain models from scratch using the initial dataset while excluding the "forget set" — but this would be an extremely impractical approach to deep neural network unlearning.
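The naive baseline mentioned above, retraining from scratch on everything except the forget set, can be sketched in a few lines. The function and data here are illustrative only; the expensive part (the retraining itself) is noted in a comment.

```python
# Naive "exact unlearning": keep only the data outside the forget set,
# then retrain from scratch. Correct by construction, but prohibitively
# expensive for a deep network. Names and data are illustrative.

def retained_data(dataset, forget_set):
    """Return every training example that is not in the forget set."""
    forget = set(forget_set)
    return [example for example in dataset if example not in forget]

corpus = ["moby dick", "harry potter 1", "war and peace", "harry potter 2"]
forget = ["harry potter 1", "harry potter 2"]

print(retained_data(corpus, forget))  # ['moby dick', 'war and peace']
# A full retrain on the retained data would then follow -- the step
# that makes this approach impractical for large models.
```

This is why approximate unlearning methods, which modify the trained model directly, are the focus of current research.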

"Machine Unlearning Framework" from Survey of Machine Unlearning

The core research findings in the field of machine unlearning are concisely compiled in "A Survey of Machine Unlearning". Another article that covers the basics with accessible explanations is "Machine unlearning: The duty of forgetting". While I personally recommend these resources, you can find a multitude of other quality research materials on the subject. Yet in terms of practical applications, there remains much to be done.

A promising initiative that might shift this field from theoretical exploration to practical application is the ongoing NeurIPS 2023 Machine Unlearning challenge. Here, participants compete to create an unlearning algorithm for the ResNet18 Convolutional Neural Network.

NeurIPS 2023 Machine Unlearning challenge on Kaggle

Machine unlearning for generative language models

Considering the widespread accessibility and promotion of generative language models to the vast majority of internet users, there’s a critical need for unlearning mechanisms. One of the first successful techniques was recently published as open source; you can find the details in “Who’s Harry Potter? Approximate Unlearning in LLMs” by Ronen Eldan and Mark Russinovich.

Image generated with StableDiffusion

The authors use a data augmentation approach for machine unlearning on the Llama 2 7b chat model released this summer by Meta. The chosen unlearning target, also known as the “forget set”, is the Harry Potter saga (ingenious, these muggles!), which is a perfect example of machine unlearning due to the possible violation of copyright law. They show that with just one GPU hour of fine-tuning, the resulting model is unable to recall most of the Harry Potter-related content, while its performance on common benchmarks remains almost unaffected.

Overview of the data augmentation approach

The main goal of the approach is to make Llama 2 7b forget the links between entities from a defined forget set ("Harry" <is friends with> "Hermione") by giving the model plausible generic alternatives ("Harry" <is friends with> "Sally"). To provide these alternatives as target labels in a fine-tuning dataset, idiosyncratic terms from the "domain to be forgotten" should be heavily penalized during the generation of targets. This penalization is achieved in equation (1) by combining the logits generated by a reinforced model on the original input (the Harry Potter books) with the logits generated by a baseline model on a generic translation of that input.

Equation (1) from "Who’s Harry Potter? Approximate Unlearning in LLMs"

The reinforced model is Llama 2 7b additionally fine-tuned on the Harry Potter novels; the baseline model is untuned Llama 2 7b. To shift the baseline model’s output distribution away from the Harry Potter theme, the authors replace idiosyncratic terms in the original input with generic ones, so the model predicts the next word from a context unrelated to the saga. To automate these replacements, the authors introduce a dictionary of anchor terms (terms specific to “Harry Potter”) mapped onto generic translations. The dictionary was compiled entirely by GPT-4.
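A toy version of this substitution step can be written as a dictionary lookup. The anchor-term entries below are invented for illustration; the paper's actual dictionary was compiled by GPT-4.

```python
# Toy anchor-term dictionary: Harry Potter-specific terms mapped onto
# generic translations. Entries are invented for illustration.
ANCHOR_TERMS = {
    "Harry": "Jon",
    "Hermione": "Beth",
    "Hogwarts": "the academy",
    "wand": "stick",
}

def generic_translation(text, dictionary=ANCHOR_TERMS):
    """Replace each anchor term with its generic counterpart."""
    for anchor, generic in dictionary.items():
        text = text.replace(anchor, generic)
    return text

print(generic_translation("Harry raised his wand at Hogwarts"))
# -> "Jon raised his stick at the academy"
```

A production version would need tokenization-aware matching: plain `str.replace` would also rewrite substrings of longer words.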

'Anchor Terms': 'Generic translations' from "Who’s Harry Potter? Approximate Unlearning in LLMs"

The resulting fine-tuning dataset consists of tokenized blocks of text from the Harry Potter books, mapped one-to-one to target labels: the tokens corresponding to the maximal entries of v_generic from equation (1).
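Assuming equation (1) takes the form v_generic = v_baseline − α · ReLU(v_reinforced − v_baseline), as described in the paper, the target-construction step can be sketched in pure Python; the toy logit values are invented.

```python
# Sketch of target construction, assuming equation (1) has the form
# v_generic = v_baseline - alpha * relu(v_reinforced - v_baseline):
# tokens that the reinforced (Harry Potter-tuned) model boosts relative
# to the baseline are penalized, and the argmax becomes the target label.

def generic_target(v_baseline, v_reinforced, alpha=1.0):
    """Combine per-token logits and return the index of the maximal entry."""
    v_generic = [
        b - alpha * max(r - b, 0.0)
        for b, r in zip(v_baseline, v_reinforced)
    ]
    return v_generic.index(max(v_generic))

# Toy 4-token vocabulary: the reinforced model strongly boosts token 2
# (say, "Hermione"), so that token is penalized and the generic choice wins.
v_base = [0.1, 0.9, 0.8, 0.0]
v_reinf = [0.1, 0.9, 3.0, 0.0]
print(generic_target(v_base, v_reinf))  # -> 1
```

In the real pipeline these vectors are full vocabulary-sized logit rows from the two Llama 2 7b variants, computed per token position.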

A sample of the fine-tuning dataset from "Who’s Harry Potter? Approximate Unlearning in LLMs"

To summarize, the authors describe four steps in the unlearning process:

Machine Unlearning Algorithm from "Who’s Harry Potter? Approximate Unlearning in LLMs"

Leveraging the approach: key challenges

The results of the data augmentation approach are promising, encouraging further application in similar tasks. Yet, the authors left some room for improvement in several application stages.

Dependency on GPT-4’s existing knowledge: The algorithm to some extent depends on GPT-4’s prior understanding of the Harry Potter series to generate generic translations. While the model is expected to have extensive knowledge of the Harry Potter realm, a reassessment by fans of the series could provide invaluable insights.

Challenges with idiosyncratic terms: Penalizing all unique terms related to the series poses an issue. For instance, replacing every instance of 'Harry' with a common name like 'John' disrupts the model's grasp of natural language, leading to sentences like, "Harry went up to him and said, 'Hi, my name is John'." To address this, the authors employ the following strategy:

  • Excluding repeated instances of anchored terms from contributing to the loss function beyond their initial occurrence.

  • Lowering the likelihood of logits connected to translations of terms that have appeared before.

However, this strategy also affects the model’s general language comprehension. A plausible alternative useful for the fine-tuning dataset would be, for example, "Harry went up to him and said, 'Hi, my name is Harold'."

Evaluation techniques: The team used GPT-4 for an initial evaluation comprising 300 Harry Potter prompt completions, followed by further analysis of the completions. Nonetheless, they acknowledged its limited accuracy and opted for manual inspection of the results for more thorough verification in their final training run. The authors do not explain how to set up such a manual inspection.

Overcoming the challenges

A more effective way to address the key challenges would be a hybrid approach that combines human insight and Large Language Models (LLMs).

To harness the collective strengths of human intuition and large language models, I have designed three Toloka project interfaces that facilitate collaborative labeling by LLMs and the crowd. Each interface is tailored to one of the challenges listed above.

Project 1. Dependency on GPT-4’s existing knowledge

Image by author

Use the Named Entity Recognition (NER) Project Template to correct GPT-4 NER choices for a dictionary of anchor terms. As input, provide the text and GPT-4’s selection of terms (you can ask the model to return positions in the text directly), and instruct the crowd to correct and complement the selected entities.
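One way to feed such a project is to bundle each passage with the spans GPT-4 proposed, so annotators only correct and extend them. The JSON field names below are illustrative, not Toloka's actual input specification.

```python
import json

def make_ner_task(text, gpt4_spans):
    """Pair a passage with GPT-4's proposed anchor-term spans (start, end).

    Field names are hypothetical; adapt them to the NER template you use.
    """
    return {
        "input_values": {
            "text": text,
            "entities": [
                {"start": start, "end": end, "label": "anchor_term"}
                for start, end in gpt4_spans
            ],
        }
    }

# "Harry" occupies characters 0-5, "Hogwarts" characters 14-22.
task = make_ner_task("Harry flew to Hogwarts", [(0, 5), (14, 22)])
print(json.dumps(task, indent=2))
```

Annotators then see the pre-highlighted spans and fix or complement them, which is much faster than labeling from scratch.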

Project 2. Challenges with idiosyncratic terms

Image by author

With the help of the baseline model, check the linguistic correctness of completions produced by the baseline model on generic translations of the original input. All examples where the baseline model is unsure of an answer (the probability of output tokens falls below a threshold you choose empirically) should be sent to a crowdsourcing project with the interface shown in the image. You can easily create it using Toloka’s template builder.
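The routing rule above can be sketched as a simple threshold check on token probabilities; the threshold and the log-probability values are illustrative.

```python
import math

def needs_review(token_logprobs, threshold=0.5):
    """True if any generated token's probability falls below the threshold,
    i.e. the baseline model is unsure and the example should go to the crowd.
    The 0.5 default is an arbitrary placeholder, to be tuned empirically."""
    return any(math.exp(lp) < threshold for lp in token_logprobs)

confident = [-0.05, -0.10, -0.20]  # token probabilities ~0.95, 0.90, 0.82
unsure = [-0.05, -1.60, -0.10]     # middle token probability ~0.20

print(needs_review(confident), needs_review(unsure))  # -> False True
```

Most inference APIs and libraries can return per-token log-probabilities alongside the generated text, which is all this check needs.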

Project 3. Evaluation techniques

Image by author

Manual inspection of the evaluation performed by GPT-4 can be designed as in the image above. This is simple to set up using the text classification template in Toloka.

Conclusion

At Toloka, we've implemented various data-driven approaches to both machine learning and unlearning. For instance, we secured Personally Identifiable Information (PII) in Large Language Models while assisting the BigCode project, and we're ready to power your next AI product.

Whether you’re building your own LLM or need access to high-quality data, we’re here to help. Reach out or learn more by visiting: LLM-powered zero-code platform for text classification or Data labeling for Generative AI and LLM.

Together, we can advance the field of machine unlearning!

Article written by:

Evgeniya Sukhodolskaya

Updated:

Oct 20, 2023

