ICML 2023

Reinforcement Learning from Human Feedback: A Tutorial

Reinforcement learning from human feedback (RLHF) has dramatically improved the real-world performance and user experience of large machine learning models. However, this approach has remained largely out of reach for most academic researchers.

Our tutorial covers the two essential parts of a successful RLHF project: the core machine learning techniques behind RLHF, and the human-in-the-loop methodology for collecting human feedback data. As a take-home assignment, you will run your first RLHF experiment on a simple creative task: generating top-5 lists of items the user requests (for instance, top ideas for a Thanksgiving meal).

Download our ICML 2023 presentation

Agenda

  • July 24, 12:30 pm: Introduction
  • July 24, 12:40 pm: Technical overview
  • July 24, 1:30 pm: Break
  • July 24, 2:00 pm: Human annotation for RLHF
  • July 24, 2:45 pm: Conclusion and Q&A

Technical overview

The technical overview will break down the RLHF process into three primary stages: training the base language model (the initial policy), modeling human preferences, and optimizing the policy with RL. This part of the tutorial will be technical, covering potential research questions, common pitfalls, and tips for successful projects.
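
To make the preference-modeling stage concrete, here is a minimal sketch of the pairwise (Bradley-Terry style) reward-model loss that this stage typically optimizes; the function and variable names are illustrative and not part of the tutorial's code.

import torch
import torch.nn.functional as F

def reward_model_loss(chosen_rewards: torch.Tensor,
                      rejected_rewards: torch.Tensor) -> torch.Tensor:
    # Pairwise loss: push the reward of the human-preferred completion
    # above the reward of the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Dummy scalar rewards for a batch of four comparisons.
chosen = torch.tensor([1.2, 0.3, 0.8, 2.0])
rejected = torch.tensor([0.5, 0.1, 1.0, 0.7])
print(reward_model_loss(chosen, rejected))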

Presented by Nathan Lambert, Research Scientist, Hugging Face

Human annotation for RLHF

In this section, we will address challenges in collecting human-produced texts and performing human preference scoring. We will focus on three aspects of human annotation for RLHF:

  • Basics of data labeling. An introduction to data labeling with crowdsourcing and to integrating human input into ML systems. We'll discuss how to avoid common pitfalls and write clear instructions for the crowd.
  • Collecting human-augmented texts. We'll address the challenges of gathering human-augmented texts, which are crucial for training an initial language model for instruction tasks.
  • Collecting human scores. We'll show three approaches to obtaining human scores for prompt completions, which are used to train a reward model (see the example record after this list).
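
For concreteness, a single pairwise comparison collected from annotators could be stored as a record like the following; the field names and values are purely illustrative, not a prescribed schema.

comparison = {
    "prompt": "Top 5 ideas for a Thanksgiving meal",
    "completion_a": "1. Roast turkey\n2. Mashed potatoes\n...",
    "completion_b": "1. Pumpkin soup\n2. Baked ham\n...",
    "preferred": "a",            # the annotator's choice between the two completions
    "annotator_id": "worker_042",
}
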
Presented by Dmitry Ustalov, Head of Ecosystem Development, Toloka

Practical assignment

To apply what you learn about RLHF methods, you'll have the opportunity to complete a short practical assignment after the tutorial. The assigned task will involve generating top-5 lists for a user's query. You'll use example lists as human demonstration data at the supervised fine-tuning stage, then run preference modeling and RL fine-tuning to make the model's outputs more helpful and faithful to the prompt.
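
As a rough illustration of the first (supervised fine-tuning) step, the sketch below fine-tunes a small causal language model on prompt/list demonstrations; the base model, example data, and hyperparameters are placeholders rather than the assignment's actual setup.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Each demonstration pairs a user prompt with a human-written top-5 list.
demonstrations = [
    ("Top 5 ideas for a Thanksgiving meal:",
     "1. Roast turkey\n2. Mashed potatoes\n3. Green bean casserole\n"
     "4. Cranberry sauce\n5. Pumpkin pie"),
]

model.train()
for prompt, completion in demonstrations:
    batch = tokenizer(prompt + "\n" + completion, return_tensors="pt")
    # Standard causal LM objective: predict every next token in the sequence.
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()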

Tutorial team

  • Nathan Lambert, Research Scientist, Hugging Face
  • Dmitry Ustalov, Head of Ecosystem Development, Toloka
  • Nikita Pavlichenko, Machine Learning Researcher, Toloka
  • Max Ryabinin, Senior Research Scientist, Yandex
  • Tristan Thrush, Research Engineer, Hugging Face
  • Nazneen Rajani, Research Lead, Hugging Face
  • Lewis Tunstall, Machine Learning Engineer, Hugging Face
  • Sergey Koshelev, Researcher, Toloka
  • Natalia Fedorova, Academic Program Manager, Toloka