Reinforcement Learning from Human Feedback

Reinforcement Learning from Human Feedback

Name: Reinforcement Learning from Human Feedback
Rating: 4.6875 (6 reviews)

Instructor: Nikita Namjoshi

Access provided by FutureX

3,153 already enrolled

Project

Build in-demand job skills with step-by-step instructions

4.7

(32 reviews)

Intermediate level

Recommended experience

1 hour

Learn at your own pace

Hands-on learning

Learn more

Project

Build in-demand job skills with step-by-step instructions

4.7

(32 reviews)

Intermediate level

Recommended experience

1 hour

Learn at your own pace

Hands-on learning

Learn more

What you'll learn

Get a conceptual understanding of Reinforcement Learning from Human Feedback (RLHF), as well as the datasets needed for this technique.
Fine-tune the Llama 2 model using RLHF with the open source Google Cloud Pipeline Components Library.
Evaluate tuned model performance against the base model with evaluation methods.

Skills you'll practice

Details to know

Taught in English

No downloads or installation required

Only available on desktop

See how employees at top companies are mastering in-demand skills

Learn more about Coursera for Business

logos of Petrobras, TATA, Danone, Capgemini, P&G and L'Oreal

Learn, practice, and apply job-ready skills in less than 2 hours

Receive training from industry experts
Gain hands-on experience solving real-world job tasks

About this project

Large language models (LLMs) are trained on human-generated text, but additional methods are needed to align an LLM with human values and preferences.

Reinforcement Learning from Human Feedback (RLHF) is currently the main method for aligning LLMs with human values and preferences. RLHF is also used for further tuning a base LLM to align with values and preferences that are specific to your use case. In this course, you will gain a conceptual understanding of the RLHF training process, and then practice applying RLHF to tune an LLM. You will: 1. Explore the two datasets that are used in RLHF training: the “preference” and “prompt” datasets. 2. Use the open source Google Cloud Pipeline Components Library, to fine-tune the Llama 2 model with RLHF. 3. Assess the tuned LLM against the original base model by comparing loss curves and using the “Side-by-Side (SxS)” method.

Instructor

Instructor ratings

(8 ratings)

Nikita Namjoshi

DeepLearning.AI

3 Courses 7,456 learners

Offered by

DeepLearning.AI

How you'll learn

Hands-on, project-based learning
Practice new skills by completing job-related tasks with step-by-step instructions.
No downloads or installation required
Access the tools and resources you need in a cloud environment.
Available only on desktop
This project is designed for laptops or desktop computers with a reliable Internet connection, not mobile devices.