Generative AI Advanced Fine-Tuning for LLMs

This course is part of multiple programs.

Instructors: Joseph Santarcangelo

22,680 already enrolled

2 modules

Gain insight into a topic and learn the fundamentals.

130 reviews

Intermediate level

9 hours to complete

Flexible schedule

Learn at your own pace

What you'll learn

In-demand generative AI engineering skills in fine-tuning LLMs that employers are actively seeking
Instruction tuning and reward modeling using Hugging Face, plus understanding LLMs as policies and applying RLHF techniques
Direct preference optimization (DPO) with the partition function and Hugging Face, including how to define optimal solutions to DPO problems
Using proximal policy optimization (PPO) with Hugging Face to build scoring functions and tokenize datasets for fine-tuning

Skills you'll gain

Generative AI

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

5 assignments

Taught in English

Build your subject-matter expertise

This course is available as part of

When you enroll in this course, you'll also be asked to select a specific program.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate

There are 2 modules in this course

Fine-tuning large language models (LLMs) is essential for aligning them with specific business needs, improving accuracy, and optimizing performance. In today’s AI-driven world, organizations rely on fine-tuned models to generate precise, actionable insights that drive innovation and efficiency. This course equips aspiring generative AI engineers with the in-demand skills employers are actively seeking.

You’ll explore advanced fine-tuning techniques for causal LLMs, including instruction tuning, reward modeling, and direct preference optimization. Learn how LLMs act as probabilistic policies for generating responses and how to align them with human preferences using tools such as Hugging Face. You’ll dive into reward calculation, reinforcement learning from human feedback (RLHF), proximal policy optimization (PPO), the PPO trainer, and optimal strategies for direct preference optimization (DPO). The hands-on labs in the course will provide real-world experience with instruction tuning, reward modeling, PPO, and DPO, giving you the tools to confidently fine-tune LLMs for high-impact applications. Build job-ready generative AI skills in just two weeks! Enroll today and advance your career in AI!
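
To make the idea of an LLM acting as a probabilistic policy more concrete, here is a minimal sketch in plain Python. It is not taken from the course materials: the three-word vocabulary, the logit values, and the function names are all hypothetical. It only illustrates that a model's raw scores become a probability distribution over next tokens, and that sampling from that distribution is what "acting" under the policy means.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_action(vocab, logits, rng):
    """Sample one token (an 'action') from the policy's distribution."""
    probs = softmax(logits)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical toy vocabulary and logits for one prompt (the 'state').
vocab = ["good", "bad", "okay"]
logits = [2.0, 0.5, 1.0]

probs = softmax(logits)
token = sample_action(vocab, logits, random.Random(0))
print(dict(zip(vocab, probs)), token)
```

In a real LLM the logits come from the network's forward pass and depend on its parameters, so changing the parameters during fine-tuning changes the policy.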

In this module, you will explore advanced techniques for fine-tuning large language models (LLMs) through instruction tuning and reward modeling. You’ll begin by defining instruction tuning and learning its process, including dataset loading, text generation pipelines, and training arguments using Hugging Face. You’ll then delve into reward modeling, where you’ll preprocess datasets, apply low-rank adaptation (LoRA) configurations, and quantify response quality to guide model optimization and alignment with human preferences. You’ll also describe and use reward trainers and reward model loss functions. In addition, the hands-on labs will reinforce your learning with practical experience in instruction tuning and reward modeling, empowering you to effectively customize LLMs for targeted tasks.
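
The course page does not say which reward model loss function is taught. As an illustration only, the sketch below assumes a common pairwise (Bradley-Terry style) loss, where the model is penalized when it scores the human-preferred response lower than the rejected one. The function name and the example scores are hypothetical.

```python
import math

def pairwise_reward_loss(r_chosen, r_rejected):
    """Compute -log(sigmoid(r_chosen - r_rejected)) in a stable way.

    r_chosen:   scalar reward for the human-preferred response
    r_rejected: scalar reward for the rejected response
    """
    diff = r_chosen - r_rejected
    # -log(sigmoid(d)) equals log(1 + exp(-d)); branch to avoid overflow.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))

# Hypothetical scores: the loss is small when the preferred response
# is ranked higher and large when the ranking is reversed.
print(pairwise_reward_loss(2.0, 0.0))
print(pairwise_reward_loss(0.0, 2.0))
```

Minimizing this quantity over many preference pairs pushes the reward model to assign higher scores to the responses people preferred.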

What's included

6 videos, 4 readings, 2 assignments, 2 app items, 3 plugins

6 videos (total 36 minutes)

Course Introduction (3 minutes)
Basics of Instruction-Tuning (7 minutes)
Instruction-Tuning with Hugging Face (7 minutes)
Reward Modeling: Response Evaluation (5 minutes)
Reward Model Training (7 minutes)
Reward Modeling with Hugging Face (8 minutes)

4 readings (total 18 minutes)

Course Overview (3 minutes)
Specialization Overview (10 minutes)
Best Practices for Instruction-Tuning Large Language Models (3 minutes)
Summary and Highlights (2 minutes)

2 assignments (total 30 minutes)

Practice Quiz: Instruction-Tuning and Reward Modeling (9 minutes)
Different Approaches to Instruction-Tuning (21 minutes)

2 app items (total 150 minutes)

Instruction Fine-Tuning LLMs (90 minutes)
Lab: Reward Modeling (60 minutes)

3 plugins (total 35 minutes)

Helpful tips for Course Completion (5 minutes)
Instruction Tuning (15 minutes)
Reward Modeling & Response Evaluation (15 minutes)

In this module, you will explore advanced techniques for fine-tuning large language models (LLMs) using reinforcement learning from human feedback (RLHF), proximal policy optimization (PPO), and direct preference optimization (DPO). You’ll begin by describing how LLMs function as probability distributions and how these can be transformed into policies that generate responses based on input text. You’ll examine the relationship between policies and language models as a function of parameters, such as omega, and how rewards can be calculated using human feedback. This includes training on response samples, evaluating agent performance, and defining scoring functions for tasks like sentiment analysis using PPO. You’ll also be able to explain PPO configuration, learning rates, and the PPO trainer’s role in optimizing chatbot responses using Hugging Face tools. The module further introduces DPO, a more direct and efficient way to align models with human preferences. While complex topics like PPO and reinforcement learning are introduced, you are not expected to understand them in depth for this course. The hands-on labs in this module will allow you to practice applying RLHF and DPO. To support your learning, a cheat sheet and glossary are included for quick reference.
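
As a rough illustration of why DPO is described as more direct, the sketch below computes the standard DPO loss for a single preference pair in plain Python. It is not taken from the course labs: the function names, the log-probability values, and the choice of beta = 0.1 are assumptions made for the example. The loss compares how much more the fine-tuned policy favors the preferred response over the rejected one, relative to a frozen reference model, with no separate reward model or PPO loop.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response
    under the policy (pol_*) or the frozen reference model (ref_*).
    beta (assumed value) controls how far the policy may drift
    from the reference.
    """
    chosen_ratio = pol_chosen - ref_chosen        # log pi/pi_ref, preferred
    rejected_ratio = pol_rejected - ref_rejected  # log pi/pi_ref, rejected
    return -log_sigmoid(beta * (chosen_ratio - rejected_ratio))

# Hypothetical log-probabilities. If the policy equals the reference,
# the loss is log(2); it drops as the policy favors the chosen response.
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

In practice a library trainer would compute these log-probabilities from model outputs over batches of preference pairs; the arithmetic per pair is the part shown here.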