Reinforcement Learning from Human Feedback: A Tutorial
Reinforcement learning from human feedback (RLHF) has dramatically improved the real-world performance and user experience of large machine learning models. However, this approach has remained largely out of reach for most academic researchers.
Our tutorial covers the two essential parts of a successful RLHF project: the core machine learning techniques behind RLHF, and the human-in-the-loop methodology for collecting human feedback data. As a take-home assignment, you will run your first RLHF experiment on a simple creative task: generating top-5 lists of items the user requests (for instance, top ideas for Thanksgiving meals).
Download our ICML 2023 presentation
Agenda
Technical overview
The technical overview will break down the RLHF process into three primary stages: language modeling for the base policy, preference modeling from human feedback, and policy optimization with RL. This is the most technical part of the tutorial and will cover open research questions, common pitfalls, and tips for successful projects.
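To give a flavor of the second stage, human preferences are typically modeled by training a reward model on pairwise comparisons with a Bradley-Terry-style loss; the resulting scalar reward is then maximized in the third stage with an RL algorithm such as PPO. The following is a minimal sketch in PyTorch, with an assumed reward head over pooled text representations and placeholder data, not the tutorial's actual code:

```python
import torch
import torch.nn.functional as F

# Assumption for this sketch: a reward head that maps a pooled text
# representation to a scalar score. In practice it sits on top of a
# pretrained language model's final hidden states.
hidden_size = 768
reward_head = torch.nn.Linear(hidden_size, 1)

def preference_loss(chosen_repr: torch.Tensor, rejected_repr: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry-style pairwise loss: push the reward of the
    human-preferred completion above that of the rejected one."""
    r_chosen = reward_head(chosen_repr)      # shape: (batch, 1)
    r_rejected = reward_head(rejected_repr)  # shape: (batch, 1)
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy batch: pooled representations of two completions of the same prompts.
chosen = torch.randn(4, hidden_size)
rejected = torch.randn(4, hidden_size)

loss = preference_loss(chosen, rejected)
loss.backward()  # gradients flow into the reward head (and, in a full setup, the LM backbone)
```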
Human Annotation for RLHF
In this section, we will address the challenges of collecting human-produced texts and human preference scores. We will focus on three aspects of human annotation for RLHF:
Basics of data labeling. An introduction to data labeling with crowdsourcing and to integrating human input into ML systems. We'll discuss how to avoid common pitfalls and how to write clear instructions for the crowd.
Collecting human-augmented texts. We'll address the challenges of gathering human-augmented texts, which are crucial for training an initial language model for instruction tasks.
Collecting human scores. We'll show you three approaches to obtaining human scores for prompt completions, which are used to train the reward model (one common data format is sketched below).
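As one illustration, pairwise comparisons from annotators can be stored as simple records and converted into (chosen, rejected) training pairs for the reward model. The schema and example data below are hypothetical, not the format the tutorial prescribes:

```python
from dataclasses import dataclass

@dataclass
class Comparison:
    """One annotator judgment: which of two completions better answers the prompt."""
    prompt: str
    completion_a: str
    completion_b: str
    preferred: str  # "a" or "b"

# Hypothetical example record (placeholder data).
comparisons = [
    Comparison(
        prompt="Top 5 Thanksgiving meal ideas",
        completion_a="1. Roast turkey 2. Pumpkin pie ...",
        completion_b="Turkey is a bird.",
        preferred="a",
    ),
]

def to_training_pairs(records):
    """Convert raw judgments into (prompt, chosen, rejected) triples for reward-model training."""
    for r in records:
        chosen = r.completion_a if r.preferred == "a" else r.completion_b
        rejected = r.completion_b if r.preferred == "a" else r.completion_a
        yield r.prompt, chosen, rejected

for prompt, chosen, rejected in to_training_pairs(comparisons):
    print(prompt, "| chosen:", chosen[:30], "| rejected:", rejected[:30])
```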
Practical assignment
To apply what you learn about RLHF methods, you'll have the opportunity to complete a short practical assignment after the tutorial. The assigned task will involve generating top-5 lists for a user's query. You'll use example lists as human demonstration data at the supervised fine-tuning stage, then run preference modeling and RL fine-tuning to make the model's outputs more helpful and faithful to the prompt.
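As a starting point, the supervised fine-tuning stage can be as simple as concatenating each prompt with its demonstrated top-5 list and minimizing the standard language-modeling loss. The sketch below uses Hugging Face transformers with gpt2 purely as a placeholder; the actual base model, data, and training loop for the assignment may differ:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder base model; the assignment may use a different one.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# One hypothetical human demonstration: a prompt and a human-written top-5 list.
prompt = "Make a top-5 list of Thanksgiving meal ideas.\n"
demonstration = (
    "1. Roast turkey with stuffing\n"
    "2. Mashed potatoes and gravy\n"
    "3. Green bean casserole\n"
    "4. Cranberry sauce\n"
    "5. Pumpkin pie\n"
)

# Supervised fine-tuning step: maximize the likelihood of the demonstration
# given the prompt. For brevity the loss here covers the full sequence;
# masking the prompt tokens out of the loss is a common refinement.
inputs = tokenizer(prompt + demonstration, return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])
outputs.loss.backward()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
optimizer.step()
```

After this stage, the reward model trained on your preference data and an RL step (for example, PPO) are used to further steer the fine-tuned model toward helpful, on-prompt top-5 lists.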