Conference

January 19 to January 24, 2025 · COLING 2025, Abu Dhabi, UAE

Speaking session

Sunday, January 19, 14:00-17:30

Part 1: Introduction (20 min)
This section addresses the critical need for extensive labeled datasets and introduces key concepts.

Part 2: LM workflows (30 min)
This section demonstrates best practices for common workflows involving language models (LMs) and large language models (LLMs). These workflows aim to (i) create efficient LMs, optimized for labeling data, that reach acceptable performance, and (ii) generate synthetic data for data augmentation.

Part 3: Active learning with LMs (40 min)
This section presents active learning (AL) for data annotation. We discuss key strategies for both generative and non-generative AL, along with their applications, advantages, and limitations.
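
As a rough illustration of one common non-generative AL strategy, the sketch below ranks unlabeled items by the entropy of the model's predicted class distribution and sends the most uncertain ones to annotators. It is not taken from the tutorial materials; the function names and the `pool_predictions` values are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_annotation(predictions, k):
    """Return indices of the k items whose predictions have the highest entropy."""
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: entropy(predictions[i]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical model outputs: class probabilities for four unlabeled texts.
pool_predictions = [
    [0.98, 0.02],
    [0.55, 0.45],
    [0.70, 0.30],
    [0.50, 0.50],
]

# Pick the two items the model is least sure about.
queried = select_for_annotation(pool_predictions, k=2)
```

In practice the selected items would be labeled, added to the training set, and the model retrained before the next selection round.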

Part 4: Quality control and managing human workers (30 min)
This section focuses on quality control and best practices in working with human annotators.
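
One widely used quality-control signal is inter-annotator agreement. The sketch below computes Cohen's kappa for two annotators who labeled the same items. It is only an illustrative example, not the tutorial's own procedure, and the annotator labels are made up.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item set.")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical labels from two annotators on the same six items.
annotator_1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg", "pos", "pos"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

A kappa near 1 indicates strong agreement, while a value near 0 means the annotators agree no more often than chance would predict.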

Part 5: Hybrid pipelines (40 min)
This section covers how to develop hybrid pipelines, that is, pipelines that combine human and model labeling to achieve the best balance of quality, cost, and speed.
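
A minimal sketch of one possible hybrid design is confidence-based routing: the model's label is accepted when its confidence clears a threshold, and the remaining items go to human annotators. The threshold value, the function name, and the example predictions are assumptions for illustration, not details from the tutorial.

```python
def route_items(predictions, threshold=0.9):
    """Split items into model-labeled and human-review groups by confidence."""
    auto_labeled = {}
    needs_human = []
    for item_id, (label, confidence) in predictions.items():
        if confidence >= threshold:
            auto_labeled[item_id] = label  # trust the model's label
        else:
            needs_human.append(item_id)    # send to a human annotator
    return auto_labeled, needs_human

# Hypothetical model predictions: item id -> (label, confidence).
model_predictions = {
    "doc-1": ("spam", 0.97),
    "doc-2": ("ham", 0.62),
    "doc-3": ("ham", 0.93),
}
auto_labeled, needs_human = route_items(model_predictions, threshold=0.9)
```

Raising the threshold shifts more items to humans, which tends to raise quality and cost together; lowering it does the opposite.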

Part 6: Limitations (20 min)
This section addresses the challenges of labeling tasks with LMs, the reasons behind these difficulties, and future research directions for overcoming these limitations.

Part 7: Hands-on session: Hybrid data annotation (30 min)
In this hands-on session, we will implement a hybrid approach on a real-world dataset and demonstrate improvements in annotation quality.

Materials & slides

Presentation

Practical part