Hands-On Tutorial: Labeling with LLM and Human-in-the-Loop
January 19-24, 2025 · COLING 2025, Abu Dhabi, UAE
Speaking session
Sunday, January 19, 14:00-17:30
Part 1: Introduction (20 min)
This section addresses the critical need for extensive labeled datasets and introduces key concepts.
Part 2: LM workflows (30 min)
This section will demonstrate best practices for common workflows involving language models (LMs) and large language models (LLMs). These workflows aim to (i) create efficient LMs with acceptable performance optimized for labeling data, and (ii) generate synthetic data for data augmentation.
Part 3: Active learning with LMs (40 min)
This section presents active learning (AL) for data annotation. We discuss key strategies for both generative and non-generative AL, along with their applications, advantages, and limitations.
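As a rough illustration of one common AL strategy, the sketch below shows least-confidence uncertainty sampling: the items whose top predicted class probability is lowest are sent for annotation first. It is not taken from the tutorial materials; the function name `least_confident` and the toy probabilities are assumptions made for this example.

```python
from typing import List, Sequence


def least_confident(probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k pool items the model is least sure about.

    `probs` holds one class-probability vector per unlabeled item
    (hypothetical model output, assumed to be already computed).
    """
    # Uncertainty = 1 - max class probability; higher means less confident.
    uncertainty = [1.0 - max(p) for p in probs]
    # Rank item indices from most to least uncertain.
    ranked = sorted(range(len(probs)), key=lambda i: uncertainty[i], reverse=True)
    return ranked[:k]


# Toy pool of four items with binary class probabilities (made-up values).
pool_probs = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.70, 0.30],
    [0.51, 0.49],
]
to_label = least_confident(pool_probs, k=2)
print(to_label)
```

In a full loop, the selected items would be labeled, added to the training set, and the model retrained before the next selection round.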
Part 4: Quality control and managing human workers (30 min)
This section focuses on quality control and best practices in working with human annotators.
Part 5: Hybrid pipelines (40 min)
This section covers how to build hybrid pipelines that combine human and model labeling to achieve the best balance of quality, cost, and speed.
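One simple way to combine human and model labeling is confidence-based routing: model labels above a threshold are accepted automatically, and the rest go to human annotators. The sketch below is a minimal illustration under that assumption, not the pipeline presented in the tutorial; `ModelLabel`, `route`, and the 0.9 threshold are hypothetical names and values.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ModelLabel:
    item_id: str
    label: str
    confidence: float  # model's self-reported confidence in [0, 1] (assumed available)


def route(
    preds: List[ModelLabel], threshold: float = 0.9
) -> Tuple[List[ModelLabel], List[ModelLabel]]:
    """Split predictions into auto-accepted labels and items for human review."""
    auto, human = [], []
    for p in preds:
        # Accept confident model labels; send everything else to annotators.
        (auto if p.confidence >= threshold else human).append(p)
    return auto, human


# Made-up predictions for illustration only.
preds = [
    ModelLabel("a", "positive", 0.97),
    ModelLabel("b", "negative", 0.62),
    ModelLabel("c", "positive", 0.91),
]
auto, human = route(preds, threshold=0.9)
print([p.item_id for p in auto], [p.item_id for p in human])
```

Raising the threshold sends more items to humans, which typically increases cost and turnaround time in exchange for quality; the right value depends on how well calibrated the model's confidence scores are.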
Part 6: Limitations (20 min)
This section addresses the challenges of labeling tasks with LMs, the reasons behind these difficulties, and future research directions for overcoming these limitations.
Part 7: Hands-on session: Hybrid data annotation (30 min)
In this hands-on session, we will implement a hybrid approach on a real-world dataset and demonstrate improvements in annotation quality.
Materials & slides
Hosts
Natalia Fedorova
Toloka Partnership Manager, Toloka
Boris Obmoroshev
R&D Analytics Director, Toloka
Sergei Tilga
Head of R&D
Ekaterina Artemova
ML Researcher, Toloka
Konstantin Chernyshev
Machine Learning Researcher
Akim Tsvigun
Natural Language Processing Lead @ Nebius AI, University of Amsterdam
Dominik Schlechtweg
Research Group Lead, Universität Stuttgart







