Toloka Team

Jun 12, 2024

Essential ML Guide

Transfer Learning: Harnessing the Power of Pre-Trained Models for Business Success

Transfer learning is a powerful machine learning (ML) methodology that reuses models pre-trained on one task to solve related tasks. This approach significantly reduces the time and computational resources required to train models for specific projects. In this article, we explore the mechanics, applicability, and challenges of transfer learning, examine its foundations, and review practical cases to determine when adopting it is reasonable.

Transfer learning (TL) has gained popularity in deep learning projects because it enables training deep neural networks with relatively small amounts of data. In data science, this is particularly helpful since real-world problems often lack the millions of labeled data points needed to train complex models from scratch. Access to pre-trained models that can be fine-tuned for specific tasks makes deep learning more accessible and efficient.

TL is primarily used in computer vision and natural language processing tasks, addressing the substantial computational demands of these fields. As a methodology, transfer learning can be successfully combined with active learning to enhance model performance.

Transfer Learning vs. Training a Specific Machine Learning Model

Transfer learning refers to reusing a previously trained model to solve a different problem. This approach allows a model to leverage the knowledge and insights it has acquired from one task to improve its predictions on a different yet related task.

For instance, a convolutional neural network (CNN) trained for general object classification on a commonly available dataset can be fine-tuned to analyze X-ray images and detect particular diseases.
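
As a minimal sketch of that idea, the snippet below loads a torchvision ResNet-18 pre-trained on ImageNet and swaps its classification head for a new two-class task. The model choice and class count are illustrative assumptions, not a prescribed recipe.

```python
import torch.nn as nn
from torchvision import models

# Load a ResNet-18 with ImageNet weights (torchvision >= 0.13 weights API)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Replace the final fully connected layer with a head for the target task,
# e.g. two classes such as "disease" vs. "no disease" (illustrative only)
model.fc = nn.Linear(model.fc.in_features, 2)
```

Every layer except the new head keeps the weights learned on ImageNet, so training on the X-ray data starts from a far better point than random initialization.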

Founding principles of classic Machine Learning vs. Transfer Learning. Source: Modern Approaches in NLP


Instead of starting from scratch, the new model builds upon the pre-existing model’s parameters, significantly speeding up the training process and enhancing performance, especially in scenarios with limited data.

TL principles can be applied to a variety of machine learning models. However, it is commonly associated with deep learning and neural networks due to their ability to learn and transfer complex representations.

Comparison between traditional machine learning models (a) requiring manual feature extraction and modern deep learning structures (b). Source: Energies


Transfer Learning Mechanics

Transfer learning was originally conceived to address the limitations of traditional machine learning models: it is computationally efficient and achieves better results with smaller datasets. Although the features learned by a pre-trained model may not be directly applicable to a specific task or project, they can be fine-tuned for similar goals, even in a different domain.

Applying transfer learning on ImageNet, a dataset of more than 14 million pictures distributed over 1000 classes, for medical image analysis. Source: Sensors 2023, 23(2)


In their early layers, deep neural networks trained on images usually learn low-level features such as edges, colors, shapes, and intensity variations. These features are not task-specific: they appear across different image processing tasks, whether the goal is detecting a traffic light or a can of soda.

In the field of NLP, transfer learning has been instrumental in improving the performance of various text-related tasks, including sentiment analysis. The latter involves determining the emotional tone behind bodies of text, such as customer reviews, social media posts, or product feedback. Traditional NLP models require large, accurately labeled datasets specific to one task, which may be time-consuming and costly to create.

Overview of transfer learning benefits: training starts at a higher point because the source model provides an initial skill, improves at a faster rate, and converges to a higher final skill. Source: Handbook of Research on Machine Learning Applications and Trends


Here’s where pre-trained language models come into play, forming the foundation of transfer learning in NLP. Initially, these models are trained on vast text corpora that may contain billions of words. This allows them to grasp the intricacies of language, including semantics and syntax, and capture nuances, context, and linguistic patterns that traditional ML models may struggle with.

Transfer Learning Key Stages

Leveraging a pre-trained model to address new tasks includes several primary stages. Each is critical for the ultimate efficiency of the transfer learning process.

1. Selecting a Pre-Trained Model

The first stage is to choose a model already trained on a large and diverse dataset. Commonly used pre-trained models span various domains, such as ResNet for image classification, BERT for natural language processing, and OpenAI's GPT for generative tasks. These models have learned general features and patterns from extensive training data, providing a solid foundation for new tasks.

2. Base Model Utilization

The base or source model consists of layers that have learned hierarchical feature representations from the training data. These layers serve as the foundation for further task-specific learning, offering a robust starting point. For instance, convolutional layers in a CNN trained on ImageNet can capture textures useful for various image-related tasks.

3. Identifying Transfer Layers

Certain layers of the base model capture generic information relevant to both the original and the new task. The early layers, located near the input of the network, are particularly adept at learning low-level features. Identifying these layers is crucial, as they form the basis for transferring knowledge to the new task.

You can freeze the initial layers of the pre-trained model to preserve the learned information and train a new model with the remaining layers. Source: Machine Learning Nuggets
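
Continuing the illustrative ResNet-18 example, freezing the transfer layers might look like the sketch below; the ten-class head is an assumption.

```python
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze all pre-trained parameters to preserve the learned generic features
for param in model.parameters():
    param.requires_grad = False

# A freshly created head is trainable by default; 10 target classes is an assumption
model.fc = nn.Linear(model.fc.in_features, 10)
```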


4. Feature Extraction

Using the identified layers, the pre-trained model is employed to extract features from the new dataset. This process leverages the general representations learned during the pre-training phase, providing a head start in understanding the new data. For example, features extracted from a pre-trained CNN can be used for tasks like object detection, segmentation, or even different types of image classification.
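
One way to implement this, again assuming a torchvision ResNet-18, is to strip the classifier and use the remaining network as a fixed feature extractor; the random input batch below is a placeholder for the new dataset.

```python
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()  # drop the classifier; the network now emits 512-d features
backbone.eval()

images = torch.randn(4, 3, 224, 224)  # placeholder batch standing in for new data
with torch.no_grad():
    features = backbone(images)  # shape: (4, 512), usable by any downstream classifier
```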

5. Fine-Tuning

This step adjusts the model's weights and parameters to better suit the new task's specific requirements. It may involve unfreezing some of the initially frozen layers, with the aim of preserving the valuable knowledge gained during pre-training while optimizing the model for the new challenge.
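
As a rough sketch of that balance, the snippet below (assuming the same illustrative ResNet-18 setup as in the earlier examples) unfreezes only the last residual block and the new head, training them at a deliberately small learning rate.

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)  # new task head (2 classes assumed)

# Freeze everything, then selectively unfreeze the last block and the head
for param in model.parameters():
    param.requires_grad = False
for param in model.layer4.parameters():
    param.requires_grad = True
for param in model.fc.parameters():
    param.requires_grad = True

# A small learning rate helps avoid overwriting the valuable pre-trained knowledge
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```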

Fine-tuning (b) and feature extraction (c). Source: Learning without Forgetting


This structured approach allows for efficient adaptation and enhanced performance in various machine learning domains, from image processing and natural language understanding to more specialized tasks like medical diagnosis or financial forecasting.

Transfer learning does not always require the use of a third-party pre-trained model. In some cases, the initial model can be trained from scratch on an available related dataset. This approach is particularly helpful when the business has access to relevant data that is not publicly available.

Types of Transfer Learning

Transfer learning encompasses various methods of knowledge adaptation and performance enhancement. Here are some of the main types of transfer learning and their key characteristics.

Inductive Transfer Learning

The source and target tasks are different, but the source model helps improve the target task's performance. For example, a model trained for object detection can be used to improve performance on image segmentation or another related task.

Transductive Transfer Learning

The source and target tasks are the same, but the domains differ. For example, a spam detection model trained on one dataset can be adapted to work effectively on a different one.

Unsupervised Transfer Learning

Both the source and target tasks are unsupervised, and knowledge transfer aims to enhance feature learning. For example, unsupervised learning techniques can be used on a large text corpus to improve feature extraction for clustering tasks on a different, smaller text corpus.

Domain Adaptation

The source and target domains have the same feature spaces but different distributions. A straightforward example involves adapting a speech recognition model trained in American English to recognize British English.

Multi-task Learning

Several tasks from the same domain are learned simultaneously without distinguishing between source and target tasks. For instance, we may train a model to perform both language translation and part-of-speech tagging using the same text data, improving performance through shared knowledge.
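
To make the shared-knowledge idea concrete, here is a toy sketch of a multi-task model with one shared encoder and two task heads; every dimension and vocabulary size is a made-up placeholder.

```python
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    """Shared text encoder with separate heads for two tasks (toy example)."""
    def __init__(self, vocab_size=10_000, hidden=128, n_pos_tags=17):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)  # shared layers
        self.pos_head = nn.Linear(hidden, n_pos_tags)    # part-of-speech tagging
        self.trans_head = nn.Linear(hidden, vocab_size)  # per-token translation logits

    def forward(self, tokens):
        states, _ = self.encoder(self.embed(tokens))
        return self.pos_head(states), self.trans_head(states)

model = MultiTaskModel()
pos_logits, trans_logits = model(torch.randint(0, 10_000, (2, 6)))  # dummy batch
```

Gradients from both task losses flow into the shared encoder, which is how each task benefits from what the other learns.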

One-shot Learning

A classification task where only one or a few examples are available to learn from, with the goal of classifying many new examples in the future. For example, recognizing a new person's face based on just one photo.

Zero-shot Learning

Transfer learning using zero labeled instances of a class, relying on auxiliary information learned during training to make sense of unseen data. For example, leveraging semantic relationships between known and unknown classes can help the model classify images of animals it has never seen before.
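
The example above is visual, but the same principle is easy to demonstrate on text. The sketch below uses the Hugging Face zero-shot classification pipeline, which repurposes a model pre-trained for natural language inference; the sentence and candidate labels are arbitrary placeholders.

```python
from transformers import pipeline

# An NLI model repurposed to score labels it was never explicitly trained on
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "The striped animal grazed quietly near the watering hole.",
    candidate_labels=["zebra", "okapi", "dolphin"],  # classes unseen during training
)
print(result["labels"][0])  # label the model considers most likely
```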

Distinction between the usual machine learning setting and transfer learning, and positioning of domain adaptation. Source: Semantic Scholar


Each of these types of transfer learning addresses different scenarios of reusing previously acquired knowledge to improve performance on new tasks or in new domains.

Transfer Learning Practical Cases

Transfer learning has proven practical in diverse fields, reducing training time and resource requirements. It is particularly beneficial in the following scenarios:

  • Lack of Data: when the target task has insufficient labeled data for training a model from scratch.

  • Similar Domains: when the source and target domains are similar or share common features.

  • Complex Models: when you need a deep learning model with a large architecture that is too expensive to train from scratch.

Transfer learning is mainly associated with computer vision and NLP tasks, although it’s applied to various projects across multiple domains.

Transfer Learning in Natural Language Processing

Transfer learning has been transformative in this field with pre-trained language models like BERT, GPT, and RoBERTa, which have set new benchmarks in various NLP tasks:

  • Text Classification: Fine-tuning BERT on news categorization datasets (see the sketch after this list).

  • Named Entity Recognition (NER): Using pre-trained models for entity extraction with minimal training data.

  • Machine Translation: Leveraging models like mBERT for multilingual translation tasks.
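
As a hedged sketch of the first item, the snippet below attaches an untrained classification head to bert-base-uncased via Hugging Face Transformers; the four-category label count is an assumption standing in for a news categorization dataset.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=4  # e.g. world, sports, business, sci/tech
)

# One forward pass; in practice the head (and optionally BERT) is then fine-tuned
inputs = tokenizer("Stocks rallied after the earnings report.", return_tensors="pt")
logits = model(**inputs).logits  # shape: (1, 4)
```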

The platform StartUs Insights listed transfer learning among its top nine NLP trends for 2023. In particular, their research mentions the startup QuillBot, which makes an AI-powered paraphrasing tool. Transfer learning powers its text slider and its thesaurus, which suggests synonyms. The tool also checks grammar, creates summaries, generates citations, and detects plagiarism.

A figure summarizing some of the QuillBot experiments. Source: QuillBlog


Transfer Learning in Computer Vision

Computer vision is another field where transfer learning has made significant impacts:

  • Image Classification: Using models like VGG, ResNet, and EfficientNet pre-trained on ImageNet for various classification tasks.

  • Object Detection: Fine-tuning models like YOLO or Faster R-CNN on specific detection tasks (see the sketch after this list).

  • Semantic Segmentation: Applying models like U-Net pre-trained on medical image datasets for different segmentation tasks.
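
For the object detection item above, a common torchvision pattern is to load a Faster R-CNN pre-trained on COCO and replace its box predictor; the three-class count below is an illustrative assumption.

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Faster R-CNN with a ResNet-50 FPN backbone, pre-trained on COCO
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Swap the pre-trained box predictor for one sized to the new task
num_classes = 3  # background + 2 custom classes (assumption)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
```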

Data characteristics overview from the literature review on transfer learning for medical image classification. Source: BMC Medical Imaging


In 2024, a group of scientists from the University of Nottingham proposed a method for real-time heating optimization based on classifying clothing insulation levels. They applied transfer learning to teach their model to distinguish between the light, medium, and warm clothing people wear indoors.

The suggested method for detecting and classifying clothing levels for indoor thermal environment control. Source: Building and Environment


Transfer Learning in Speech Recognition

AI assistant developers use transfer learning to improve their voice assistants’ speech recognition capabilities. Models pre-trained on vast amounts of general audio data are fine-tuned with specific voice commands and accents. This helps provide more accurate and context-aware responses, improving user experience in smart home environments.

Transfer Learning in Robotics

OpenAI developed a robotic hand that can manipulate objects with remarkable dexterity. Using reinforcement learning, they initially trained their model in a simulated environment and then transferred this knowledge to the physical robot.

The Dactyl system is trained in simulation and transfers its knowledge to reality, adapting to real-world physics. Source: OpenAI


This transfer from simulation to real-world application enables the robotic hand to perform complex tasks like solving a Rubik’s Cube, which requires fine motor skills and adaptability to various scenarios.

Transfer Learning in Financial Forecasting

Financial institutions use transfer learning to improve stock market prediction models. A model pre-trained on a large corpus of financial data across different markets can be fine-tuned on specific stocks or market conditions. This allows the model to leverage learned patterns and improve the accuracy of predictions, aiding in investment strategies and risk management.

The transfer learning process for models trained on the source task with 379 stock market indices by industry. Source: Expert Systems with Applications


Transfer Learning Limitations

For all its power, transfer learning still comes with several limitations and challenges.

Domain Discrepancy

One primary limitation is the issue of domain mismatch. Models pre-trained on a particular dataset might not transfer well to a different domain with distinct characteristics. For instance, a model successfully trained on general text data may not perform optimally when fine-tuned on legal or medical texts.

This domain discrepancy can lead to suboptimal performance, requiring large amounts of domain-specific data to achieve the desired accuracy.

Potential Bias

Additionally, the quality and size of the pre-trained model's data significantly impact the transferability. If the pre-trained data is biased or unrepresentative of the target task, the model might perpetuate these biases, leading to inaccurate or unfair outcomes.

In 2020, Barret Zoph and colleagues questioned the efficiency of transfer learning as the dominant approach in computer vision, showing that pre-training can hurt performance when stronger data augmentation is used. Source: Neural Information Processing Systems


Resource Consumption

Another challenge is the computational and resource requirements for transfer learning. While transfer learning can reduce the need for large training datasets, the initial models often require significant computational power and time to pre-train. Fine-tuning these models, although less intensive than training from scratch, still demands considerable computational resources, especially for large-scale models like BERT or GPT.

Final Thoughts

Due to its efficiency, transfer learning has become the dominant approach for many tasks in NLP, computer vision, and speech recognition. It excels in scenarios where large-scale annotated data is scarce or costly to obtain, making it invaluable for specialized applications like medical image analysis and financial forecasting.

However, it is essential to recognize that transfer learning is not a one-size-fits-all solution. Its success highly depends on the compatibility between the source and target domains, and a significant domain mismatch can limit its effectiveness. Additionally, the resource demands for fine-tuning large pre-trained models can be substantial, necessitating significant computational power and expertise.

Thus, while transfer learning offers impressive benefits, it should be considered alongside other approaches. Data scientists and engineers must carefully evaluate its suitability for their specific tasks, balancing the potential gains against the inherent limitations and resource requirements.

Article written by:

Toloka Team

Updated:

Jun 12, 2024
