Reinforcement learning from human feedback (RLHF) has dramatically improved the real-world performance and user experience of large machine learning models. However, this approach has remained largely out of reach for most academic researchers.
Our tutorial covers the two essential parts of a successful RLHF project: the core machine learning techniques behind RLHF, and the human-in-the-loop methodology for collecting human feedback data. As a take-home assignment, you will run your first RLHF experiment on a simple creative task: generating top-5 lists of items the user requests (for instance, top ideas for a Thanksgiving meal).
Introduction
Technical overview
Break
Human annotation for RLHF
Conclusion and Q&A
The technical overview will break down the RLHF process into three primary stages: language modeling to obtain the base policy, modeling human preferences, and optimizing the policy with RL. This part of the tutorial will describe potential research questions, pitfalls, and tips for successful projects.
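To make the second and third stages concrete, the sketch below (plain PyTorch; the function names, tensor shapes, and the KL coefficient beta are illustrative assumptions rather than the tutorial's prescribed setup) shows the pairwise Bradley-Terry loss commonly used for preference modeling and the KL-penalized reward that the RL stage typically optimizes.

```python
import torch
import torch.nn.functional as F

def preference_loss(chosen_rewards: torch.Tensor, rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Pairwise Bradley-Terry loss: push the reward model to score the
    human-preferred completion above the rejected one."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

def kl_penalized_reward(reward: torch.Tensor,
                        policy_logprob: torch.Tensor,
                        ref_logprob: torch.Tensor,
                        beta: float = 0.1) -> torch.Tensor:
    """Reward used during RL fine-tuning: the learned reward minus a
    KL penalty that keeps the policy close to the base model."""
    return reward - beta * (policy_logprob - ref_logprob)

# Toy example with made-up reward-model scores for two prompt/completion pairs.
chosen = torch.tensor([1.2, 0.3])     # scores for the preferred completions
rejected = torch.tensor([0.4, -0.1])  # scores for the rejected completions
loss = preference_loss(chosen, rejected)  # scalar; smaller when chosen > rejected
print(loss)
```

The KL term is what distinguishes RLHF fine-tuning from naive reward maximization: without it, the policy can drift into degenerate outputs that exploit the learned reward model.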
In this section, we will address challenges in collecting human-produced texts and performing human preference scoring, and we will focus on three aspects of human annotation for RLHF.
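As an illustration of what preference scoring produces, a single pairwise comparison can be stored as a small record like the one below. The field names and values are hypothetical; the tutorial does not prescribe a particular schema.

```python
# A hypothetical pairwise preference record; field names are illustrative only.
comparison = {
    "prompt": "Give me a top-5 list of Thanksgiving side dishes.",
    "completion_a": "1. Mashed potatoes\n2. Stuffing\n3. Green beans\n4. Cranberry sauce\n5. Rolls",
    "completion_b": "Turkey is great. Also pie.",
    "preferred": "a",            # the annotator's choice
    "annotator_id": "anno_042",  # useful for measuring inter-annotator agreement
}
```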
To apply what you learn about RLHF methods, you'll have the opportunity to complete a short practical assignment after the tutorial. The assigned task will involve generating top-5 lists for a user's query. You'll use example lists as human demonstration data at the supervised fine-tuning stage, then run preference modeling and RL fine-tuning to make the model's outputs more helpful and faithful to the prompt.
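If you want a head start on the assignment, the supervised fine-tuning stage amounts to standard causal-language-model training on prompt-plus-list demonstrations. The sketch below is a minimal, single-example illustration using Hugging Face transformers and PyTorch; the base model name, demonstration text, and hyperparameters are placeholder assumptions, not the assignment's required setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder base model; the assignment may use a different checkpoint.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# One human demonstration: a prompt followed by the desired top-5 list.
demonstration = (
    "List the top 5 ideas for a Thanksgiving meal.\n"
    "1. Roast turkey\n2. Stuffing\n3. Mashed potatoes\n"
    "4. Cranberry sauce\n5. Pumpkin pie"
)

# Standard causal-LM objective: predict each token of the demonstration.
# (In practice you would batch many demonstrations and often mask the
# prompt tokens so the loss is computed only on the list itself.)
inputs = tokenizer(demonstration, return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"SFT loss on one demonstration: {outputs.loss.item():.3f}")
```

After this stage, the same demonstration-and-comparison data format feeds the preference-modeling and RL fine-tuning steps described in the technical overview.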