Conference

Hands-On Tutorial: Labeling with LLM and Human-in-the-Loop

The 31st International Conference on Computational Linguistics (COLING)

Jan 19, 2025

Where: COLING 2025, Abu Dhabi, UAE

Date: Jan 19, 2025

Overview

Training and deploying machine learning models requires large amounts of human-annotated data. As human labeling becomes increasingly expensive and time-consuming, recent research has developed several strategies to speed up annotation and reduce both cost and human workload: generating synthetic training data, active learning, and hybrid labeling. This tutorial is oriented toward practical applications: we will present the basics of each strategy, highlight its benefits and limitations, and discuss real-life case studies in detail. We will also walk through best practices for managing human annotators and controlling the quality of the final dataset. The tutorial includes a hands-on workshop in which attendees will be guided through implementing a hybrid annotation setup. It is designed for NLP practitioners from both research and industry who are involved in, or interested in, optimizing data labeling projects.

Part 1: Introduction (20 min)
This section addresses the critical need for extensive labeled datasets and introduces key concepts.

Part 2: LM workflows (30 min)
This section will demonstrate best practices for common workflows involving language models (LMs) and large language models (LLMs). These workflows aim to (i) build efficient LMs, optimized for data labeling, that reach acceptable performance, and (ii) generate synthetic data for data augmentation.

Part 3: Active learning with LMs (40 min)
This section presents active learning (AL) for data annotation. We discuss key strategies for both generative and non-generative AL, along with their applications, advantages, and limitations.
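
To make the idea of non-generative AL concrete, here is a minimal sketch of one common query strategy, entropy-based uncertainty sampling: the model scores the unlabeled pool, and the items it is least sure about are sent to annotators first. This is an illustrative assumption, not necessarily a strategy covered in the tutorial; the function names, item IDs, and probabilities are hypothetical.

```python
import math
from typing import Dict, List


def entropy(probs: List[float]) -> float:
    """Shannon entropy of a predicted label distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(predictions: Dict[str, List[float]], budget: int) -> List[str]:
    """Hypothetical helper: return the `budget` items with the highest prediction entropy."""
    ranked = sorted(predictions, key=lambda item_id: entropy(predictions[item_id]), reverse=True)
    return ranked[:budget]


# Made-up class probabilities from a model over three unlabeled texts.
preds = {
    "doc-1": [0.98, 0.01, 0.01],  # confident prediction
    "doc-2": [0.34, 0.33, 0.33],  # near-uniform: very uncertain
    "doc-3": [0.60, 0.30, 0.10],  # moderately uncertain
}
print(select_for_annotation(preds, budget=2))
```

In a full AL loop, the selected items would be labeled by humans, the model retrained, and the pool re-scored until the labeling budget runs out.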

Part 4: Quality control and managing human workers (30 min)
This section focuses on quality control and best practices in working with human annotators.
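
As one illustration of quality control, the sketch below scores each annotator on golden (control) items with known answers, drops annotators below an accuracy threshold, and aggregates the remaining labels by majority vote. This is a simple assumed scheme for illustration, not the specific method presented in the tutorial; the 0.7 threshold and all data are made up.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (annotator_id, item_id, label) triples; all data here is hypothetical.
Annotation = Tuple[str, str, str]


def annotator_accuracy(annotations: List[Annotation], gold: Dict[str, str]) -> Dict[str, float]:
    """Share of golden (control) items each annotator labeled correctly."""
    hits: Counter = Counter()
    totals: Counter = Counter()
    for annotator, item, label in annotations:
        if item in gold:
            totals[annotator] += 1
            hits[annotator] += int(label == gold[item])
    return {a: hits[a] / totals[a] for a in totals}


def aggregate(annotations: List[Annotation], gold: Dict[str, str],
              min_accuracy: float = 0.7) -> Dict[str, str]:
    """Majority vote over non-golden items, skipping annotators below min_accuracy.

    Annotators who saw no golden items get accuracy 0.0 and are skipped too.
    """
    acc = annotator_accuracy(annotations, gold)
    votes: Dict[str, Counter] = {}
    for annotator, item, label in annotations:
        if item in gold or acc.get(annotator, 0.0) < min_accuracy:
            continue
        votes.setdefault(item, Counter())[label] += 1
    return {item: counts.most_common(1)[0][0] for item, counts in votes.items()}


gold = {"g1": "pos", "g2": "neg"}
annotations = [
    ("alice", "g1", "pos"), ("alice", "g2", "neg"), ("alice", "x1", "pos"), ("alice", "x2", "neg"),
    ("bob", "g1", "pos"), ("bob", "g2", "neg"), ("bob", "x1", "pos"),
    ("carol", "g1", "neg"), ("carol", "g2", "pos"), ("carol", "x1", "neg"), ("carol", "x2", "pos"),
    ("dave", "g1", "neg"), ("dave", "g2", "neg"), ("dave", "x2", "pos"),
]
print(aggregate(annotations, gold))
```

Here a plain majority vote would label x2 as "pos", but filtering out the two low-accuracy annotators leaves "neg", which shows why control items matter.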

Part 5: Hybrid pipelines (40 min)
This section covers how to develop hybrid pipelines that effectively combine human and model labeling to achieve the best balance of quality, cost, and speed.
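
A simple way to picture a hybrid pipeline is confidence-based routing: keep the model's label when its confidence clears a threshold, and send the remaining items to human annotators. The sketch below assumes this routing rule; `toy_model` and `toy_human` are hypothetical stand-ins for a real labeling model and a human annotation queue, and the 0.9 threshold is an arbitrary example value.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ModelLabel:
    label: str
    confidence: float  # model's probability for `label`, in [0, 1]


def hybrid_label(
    items: List[str],
    model: Callable[[str], ModelLabel],
    ask_human: Callable[[str], str],
    threshold: float = 0.9,
) -> Tuple[Dict[str, str], int]:
    """Keep confident model labels; route the rest to human annotators.

    Returns the final labels and the number of items sent to humans.
    """
    labels: Dict[str, str] = {}
    human_calls = 0
    for item in items:
        pred = model(item)
        if pred.confidence >= threshold:
            labels[item] = pred.label
        else:
            labels[item] = ask_human(item)
            human_calls += 1
    return labels, human_calls


# Hypothetical stand-ins: a keyword "model" and a human oracle.
def toy_model(text: str) -> ModelLabel:
    if "great" in text:
        return ModelLabel("pos", 0.95)
    return ModelLabel("neg", 0.55)


def toy_human(text: str) -> str:
    return "pos" if "fine" in text else "neg"


labels, n_human = hybrid_label(["great movie", "it was fine", "boring plot"], toy_model, toy_human)
print(labels, n_human)
```

The threshold is the main quality/cost knob: raising it routes more items to humans (higher expected quality, higher cost), while lowering it lets the model label more of the data on its own.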

Part 6: Limitations (20 min)
This section addresses the challenges of labeling tasks with LMs, the reasons behind these difficulties, and future research directions for overcoming these limitations.

Part 7: Hands-on session: Hybrid data annotation (30 min)
In this hands-on session, we will implement a hybrid approach on a real-world dataset and demonstrate improvements in annotation quality.


Speakers

Natalia Fedorova

Toloka Partnership Manager, Toloka

Boris Obmoroshev

Toloka AI R&D and Analytics Director, Toloka

Sergei Tilga

Head of R&D, Toloka

Ekaterina Artemova

Machine Learning Researcher, Toloka

Konstantin Chernyshev

Machine Learning Researcher, Toloka

Akim Tsvigun

Natural Language Processing Lead @ Nebius AI, University of Amsterdam

Dominik Schlechtweg

Research Group Lead, Universität Stuttgart
