Retrieval-augmented fine-tuning: extending retrieval-augmented generation through fine-tuning

Toloka Team

November 8, 2024

Essential ML Guide

In recent years, advancements in natural language processing (NLP) have opened doors to techniques like Retrieval-Augmented Generation (RAG) and its powerful successor, Retrieval-Augmented Fine-Tuning (RAFT). RAG has made many language models more dynamic by adding a retrieval system to pull in relevant external information. RAFT takes this a step further by combining retrieval with fine-tuning. This combination creates smarter, more adaptable models that can learn new information over time. Let's explore how RAFT builds on RAG, why it's useful, and how it can improve models for different applications.

What is retrieval-augmented generation (RAG)?

To understand RAFT, it's helpful to examine RAG and the challenges it was designed to address. Traditional large language models like GPT or BERT are trained on massive amounts of data but rely solely on that static, built-in knowledge. They don’t pick up new information on their own because their knowledge is frozen once training ends.

Retrieval-augmented generation, or RAG, addresses this by adding a retrieval component. In RAG, an information retrieval mechanism is directly integrated into the generation process. When given input, the model searches external sources, like a website or relevant documents, for necessary information.

Once it has pulled in the relevant information, the language model generates its response. It is not relying only on its built-in knowledge; it uses up-to-date, targeted information to make its response as accurate as possible. This two-part process of retrieving and then generating is what sets RAG apart from traditional models that rely only on their internal, frozen knowledge.
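The retrieve-then-generate flow can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the word-overlap scoring stands in for a real retriever, and the function names (tokenize, retrieve, build_prompt), the sample corpus, and the prompt layout are all hypothetical rather than taken from any particular RAG system.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, corpus, top_k=2):
    """Rank documents by how many words they share with the query (toy retriever)."""
    query_words = tokenize(query)
    ranked = sorted(
        corpus,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(query, documents):
    """Place the retrieved documents ahead of the question (hypothetical layout)."""
    context = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(documents))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Hypothetical three-document corpus.
corpus = [
    "RAFT fine-tunes a model on questions paired with retrieved documents.",
    "Bananas are rich in potassium.",
    "RAG retrieves external documents before the model generates an answer.",
]
query = "How does RAG use external documents?"
top_docs = retrieve(query, corpus, top_k=1)
prompt = build_prompt(query, top_docs)
print(prompt)
```

In a real pipeline the prompt would then be passed to a language model for the generation step, and the overlap score would typically be replaced by a stronger retriever such as embedding-based search.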

What is retrieval-augmented fine-tuning (RAFT)?

Once a model is pre-trained, it can be further updated with new information that might not have been available during its initial training. For instance, if there's new scientific research or domain-specific knowledge, this can be added to the model to improve its responses. Two common methods to add this new information are:

RAG-based prompting. As mentioned, with RAG, a model can retrieve fresh, external information from sources like databases or websites when answering questions. This helps it stay up-to-date without having to relearn everything;
Fine-tuning. Also referred to as supervised fine-tuning, this process trains the model further on specific new data, so that it learns the new information and stores it in its weights.

The best way to add new knowledge to these models is still being researched. The paper RAFT: Adapting Language Model to Domain Specific RAG introduces Retrieval-Augmented Fine-Tuning to help language models handle new information more effectively.

RAFT extends RAG by bringing retrieved documents into the model's supervised fine-tuning (SFT). This means the model doesn’t just receive new information at answer time; it is trained on examples that pair questions with retrieved documents, so it learns both the domain knowledge and how to use retrieved context. The model's internal knowledge is updated during fine-tuning, which makes it better prepared for whatever the retriever returns later.

The essence of RAFT

The authors of the RAFT: Adapting Language Model to Domain Specific RAG paper explain the approach with an analogy. Think of a language model as a student taking an exam. A model that is fine-tuned on a set of training data and then answers without access to external resources is like a student taking a closed-book exam: it can rely only on what it has learned and remembered. Using RAG is like taking an open-book exam, as the model can look up information in real time from external documents to help answer questions.

The RAFT approach combines both: imagine a student who has thoroughly studied the course material and can also use it as a reference during an open-book exam. This student is likely to perform better because they’ve memorized key information and can consult the books to confirm details.

How RAFT training works

Creating the training dataset

RAFT starts with an input document, which could be any text containing knowledge or information, such as an article or a chapter from a book. From this document, RAFT then creates a training dataset. Each entry in this dataset includes three key elements:

A question. This is a question that could be answered using information from the input document;
A set of documents. These are several documents pulled together. Some of them have helpful information for answering the question, and some don’t;
A chain-of-thought answer. This is an answer to the question, written as a series of logical steps that connect information from the documents to form a complete answer.
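As a rough illustration, one dataset entry with these three elements could be represented as a small Python dataclass. The class name RaftExample, its field names, and the sample texts are assumptions made for this sketch; they are not the schema used by the paper's authors.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RaftExample:
    """One training entry (hypothetical schema, for illustration only)."""
    question: str          # answerable from the input document
    documents: List[str]   # a mix of helpful and unhelpful documents
    cot_answer: str        # step-by-step answer grounded in the documents

example = RaftExample(
    question="What does RAFT add to RAG?",
    documents=[
        "RAFT fine-tunes the model on questions paired with retrieved documents.",
        "The Eiffel Tower is located in Paris.",
    ],
    cot_answer=(
        "The first document says RAFT fine-tunes the model on questions paired "
        "with retrieved documents. Therefore, RAFT adds a fine-tuning step to RAG."
    ),
)
print(example.question)
```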

Two types of documents in the dataset

Not all documents are equally helpful for answering the question. The authors of RAFT divide the documents into two types:

Golden documents. These are documents that contain the information needed to answer the question accurately;
Distractor documents. These extra documents don’t help answer the question, acting as distractions.

Splitting the training data

To make the model more balanced in its learning, RAFT splits the training data into two parts:

First Part. This part includes both the valid or golden documents and the irrelevant or distractor documents. It teaches the model to focus on useful information while ignoring distractions;
Second Part. This part only includes distractor documents that don’t contain the answer. This part of the training encourages the model to rely more on what it has memorized rather than just depending on the documents in front of it.

This split is significant because it helps the model improve in two ways:

Memorization. When the model is trained on questions with only distractor documents, it learns to rely on its memory for essential details, just like a student who memorizes key information;
Selective Reasoning. With a mix of valid and distractor documents, the model learns to identify and use relevant information while ignoring what doesn’t matter.
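The two-part split can be sketched as follows. This is a minimal example under stated assumptions: golden_fraction (the share of entries that keep their golden document), num_docs, the function names, and the sample data are all hypothetical choices for illustration, and the distractor pool is assumed not to contain any golden document.

```python
import random

def build_context(golden_doc, distractor_pool, num_docs, include_golden, rng):
    """Return a shuffled list of num_docs documents for one training entry."""
    if include_golden:
        # First part: the golden document plus distractors.
        docs = [golden_doc] + rng.sample(distractor_pool, num_docs - 1)
    else:
        # Second part: distractors only, so the answer is not in the context.
        docs = rng.sample(distractor_pool, num_docs)
    rng.shuffle(docs)
    return docs

def build_dataset(qa_pairs, distractor_pool, num_docs=3, golden_fraction=0.8, seed=0):
    """Attach a document set to each (question, golden_doc, answer) triple."""
    rng = random.Random(seed)
    dataset = []
    for question, golden_doc, answer in qa_pairs:
        # golden_fraction is an assumed value, not a recommendation.
        include_golden = rng.random() < golden_fraction
        dataset.append({
            "question": question,
            "documents": build_context(
                golden_doc, distractor_pool, num_docs, include_golden, rng
            ),
            "answer": answer,
            "has_golden": include_golden,
        })
    return dataset

# Hypothetical sample data.
qa_pairs = [(
    "What does RAFT fine-tune on?",
    "RAFT fine-tunes on questions paired with documents.",
    "Questions paired with documents.",
)]
distractors = [
    "Bananas are rich in potassium.",
    "The Eiffel Tower is in Paris.",
    "Octopuses have three hearts.",
    "Honey does not spoil.",
]
dataset = build_dataset(qa_pairs, distractors)
print(len(dataset[0]["documents"]))
```

Shuffling the documents keeps the golden document from always appearing in the same position, so the model cannot learn to rely on position instead of content.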

Training the model

The model is then fine-tuned on this training dataset using supervised training, where the model learns from examples of questions and answers. After training, this fine-tuned model can effectively operate in a so-called open-book environment with access to a set of documents. It has been trained to pick out relevant information from these documents and cross-check it with what it already knows, like a well-prepared student consulting their textbook during an exam.

The model becomes proficient at selectively using information from the documents so it doesn’t get confused by irrelevant content and can deliver accurate, high-quality responses.
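To make the fine-tuning input concrete, here is one way a dataset entry might be turned into a prompt and target pair for supervised fine-tuning. The function name format_sft_example, the DOCUMENT tags, and the prompt layout are assumptions for this sketch; an actual training setup would follow whatever template its fine-tuning framework expects.

```python
def format_sft_example(question, documents, cot_answer):
    """Turn one dataset entry into a prompt/completion pair (hypothetical template)."""
    # Every document, golden or distractor, goes into the context the same way.
    context = "\n".join(f"<DOCUMENT>{doc}</DOCUMENT>" for doc in documents)
    prompt = f"{context}\n\nQuestion: {question}\nAnswer:"
    # The chain-of-thought answer is the text the model is trained to produce.
    return {"prompt": prompt, "completion": " " + cot_answer}

record = format_sft_example(
    question="What does RAFT add to RAG?",
    documents=[
        "The Eiffel Tower is located in Paris.",
        "RAFT fine-tunes the model on questions paired with retrieved documents.",
    ],
    cot_answer=(
        "The second document says RAFT fine-tunes the model on questions paired "
        "with retrieved documents. Therefore, RAFT adds a fine-tuning step to RAG."
    ),
)
print(record["prompt"])
print(record["completion"])
```

Because the prompt does not mark which document is the golden one, the model has to work that out from the content, which is the skill the training is meant to build.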

Why RAFT is important for LLMs

Traditional LLMs rely either on their pre-trained knowledge or external documents, but RAFT enables models to combine both approaches. It fine-tunes the model to draw from what it already knows while retrieving new information only when relevant.

This makes a big difference in fields where information changes constantly, like medical research or news. RAFT helps the model stay current and accurate without being retrained from scratch every time there’s an update, while still relying on its foundational understanding.

Benefits of RAFT

Improves domain expertise

RAFT enables models to handle specialized fields such as medicine, law, and technical domains with greater depth and accuracy. Fine-tuning on relevant, in-domain documents improves the model's ability to answer complex questions that require field-specific knowledge.

Combines memory and real-time retrieval

With RAFT, models can draw on what they’ve already learned and combine it with real-time information when needed. This lets the model answer more accurately and stay up-to-date without constant retraining.

Good performance on benchmarks

The RAFT method has been tested on a variety of datasets, like PubMed (for medical knowledge) and HotpotQA (for complex question answering), consistently improving the performance of fine-tuned models. In these cases, RAFT shows that it can help models answer domain-specific questions more accurately than traditional methods.

Conclusion

The RAFT method combines RAG and fine-tuning. In RAFT, the model is fine-tuned on examples where it learns to answer questions with the help of documents retrieved from a database, similar to RAG. However, RAFT doesn't just retrieve information; it also trains the model to distinguish between relevant and irrelevant documents in the retrieved set. This fine-tuning stage helps the model focus on the documents that help answer the question and filter out the rest.

While RAG allows a model to search for outside information, RAFT goes further by training the model to focus only on the most useful information and ignore irrelevant details. In a broader sense, RAFT is a step toward AI that’s not just smart but actually engaged in finding the best answers: an AI that not only knows things but also knows where to look and how to process information to give the best response possible. This may open up exciting possibilities for AI that is more responsive, more relevant, and ready to deal with a broader range of challenges.
