Efficiently Serving LLMs

Instructor: Travis Addair

Project

Build in-demand job skills with step-by-step instructions

Intermediate level

Recommended experience

1 hour

Learn at your own pace

Hands-on learning

What you'll learn

Learn how Large Language Models (LLMs) repeatedly predict the next token, and how techniques like KV caching can greatly speed up text generation.
Write code to serve LLM applications efficiently, balancing the speed of model output against the need to serve many users at once.
Explore the fundamentals of Low Rank Adapters (LoRA) and see how Predibase built its LoRAX framework inference server to serve many fine-tuned models at once.
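
To make the first point above concrete, here is a minimal sketch of token-by-token generation with and without a KV cache. It is a toy under stated assumptions, not course material: the embedding, projection, and decoding functions (`embed`, `project`, `attend`, `next_token`) are hypothetical stand-ins for learned weights, and the model is a single attention step with greedy decoding. It only illustrates why caching keys and values avoids recomputing them for every earlier token at each step.

```python
import math

VOCAB, DIM = 8, 4  # toy sizes, chosen only for illustration

def embed(token):
    # Hypothetical deterministic embedding standing in for learned weights.
    return [math.sin((token + 1) * (i + 1)) for i in range(DIM)]

def project(vec, shift):
    # Stand-in for a learned linear projection (rotates the components).
    return [vec[(i + shift) % DIM] for i in range(DIM)]

def attend(query, keys, values):
    # Scaled dot-product attention of one query over all cached positions.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(DIM) for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(DIM)]

def next_token(context_vec):
    # Greedy decoding: pick the vocabulary entry with the highest score.
    logits = [sum(c * e for c, e in zip(context_vec, embed(t))) for t in range(VOCAB)]
    return max(range(VOCAB), key=lambda t: logits[t])

def generate(prompt, steps, use_cache):
    tokens = list(prompt)
    keys, values = [], []
    projections = 0  # counts key/value projections, a proxy for work done
    for _ in range(steps):
        if use_cache:
            new = tokens[len(keys):]   # only tokens not yet in the cache
        else:
            keys, values = [], []      # throw everything away and recompute
            new = tokens
        for t in new:
            x = embed(t)
            keys.append(project(x, 1))
            values.append(project(x, 2))
            projections += 1
        query = project(embed(tokens[-1]), 0)
        tokens.append(next_token(attend(query, keys, values)))
    return tokens, projections

tokens_cached, work_cached = generate([1, 2, 3], 5, use_cache=True)
tokens_plain, work_plain = generate([1, 2, 3], 5, use_cache=False)
print(tokens_cached == tokens_plain, work_cached, work_plain)
```

Both runs produce the same tokens, but the uncached run recomputes keys and values for the whole sequence at every step, so its work grows quadratically with the sequence length, while the cached run does one new projection per generated token after the prompt.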

Details to know

Taught in English

No downloads or installation required

Only available on desktop

Learn, practice, and apply job-ready skills in less than 2 hours

Receive training from industry experts
Gain hands-on experience solving real-world job tasks

About this project

Join our new short course, Efficiently Serving Large Language Models, to build a ground-up understanding of how to serve LLM applications from Travis Addair, CTO at Predibase. Whether you’re ready to launch your own application or just getting started building it, the topics you’ll explore in this course will deepen your foundational knowledge of how LLMs work, and help you better understand the performance trade-offs you must consider when building LLM applications that will serve large numbers of users.

You’ll walk through the most important optimizations that allow LLM vendors to efficiently serve models to many customers, including strategies for working with multiple fine-tuned models at once. In this course, you will:

1. Learn how auto-regressive large language models generate text one token at a time.
2. Implement the foundational elements of a modern LLM inference stack in code, including KV caching, continuous batching, and model quantization, and benchmark their impacts on inference throughput and latency.
3. Explore the details of how LoRA adapters work, and learn how batching techniques allow different LoRA adapters to be served to multiple customers simultaneously.
4. Get hands-on with Predibase’s LoRAX framework inference server to see these optimization techniques implemented in a real world LLM inference server.

Knowing more about how LLM servers operate under the hood will greatly enhance your understanding of the options you have to increase the performance and efficiency of your LLM-powered applications.
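
As a rough illustration of serving different LoRA adapters in one batch, the sketch below applies a shared base weight matrix to every request and adds a per-request low-rank update, y = W x + scale * B (A x). Everything in it is an assumption made for the example: the weights, the adapter names `customer_a` and `customer_b`, and the helper functions are hypothetical, and the plain Python loop is not how LoRAX or any production server implements batching.

```python
def matvec(matrix, vec):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def lora_forward(base_w, adapter, x, scale=1.0):
    """Compute y = W x + scale * B (A x); A is r x d_in, B is d_out x r."""
    y = matvec(base_w, x)
    if adapter is None:          # request that uses the base model only
        return y
    a, b = adapter
    delta = matvec(b, matvec(a, x))   # low-rank path: project down, then up
    return [yi + scale * di for yi, di in zip(y, delta)]

def batched_forward(base_w, adapters, batch):
    """batch holds (adapter_id, input_vector) pairs that share one base model."""
    return [lora_forward(base_w, adapters.get(adapter_id), x)
            for adapter_id, x in batch]

# Shared base weights (d_out=2, d_in=3) and two rank-1 adapters.
BASE_W = [[1.0, 0.0, 2.0],
          [0.0, 1.0, -1.0]]
ADAPTERS = {
    "customer_a": ([[1.0, 0.0, 0.0]], [[0.5], [0.0]]),
    "customer_b": ([[0.0, 1.0, 1.0]], [[0.0], [2.0]]),
}

batch = [
    ("customer_a", [1.0, 2.0, 3.0]),
    ("customer_b", [1.0, 2.0, 3.0]),
    (None, [1.0, 2.0, 3.0]),
]
outputs = batched_forward(BASE_W, ADAPTERS, batch)
print(outputs)  # [[7.5, -1.0], [7.0, 9.0], [7.0, -1.0]]
```

The base weights are stored once and reused for every request, while each adapter adds only two small matrices. That is the property that makes it plausible to serve many fine-tuned variants from a single deployed model.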

Instructor

Travis Addair

DeepLearning.AI

1 Course, 618 learners

Offered by

DeepLearning.AI

How you'll learn

Hands-on, project-based learning
Practice new skills by completing job-related tasks with step-by-step instructions.
No downloads or installation required
Access the tools and resources you need in a cloud environment.
Available only on desktop
This project is designed for laptops or desktop computers with a reliable Internet connection, not mobile devices.

Frequently asked questions

In Projects, you'll complete an activity or scenario by following a set of instructions in an interactive hands-on environment. Projects are completed in a real cloud environment and within real instances of various products as opposed to a simulation or demo environment.

By purchasing a Project, you'll get everything you need to complete the Project including temporary access to any product required to complete the Project.

Even though Projects are technically available on mobile devices, we highly recommend that you complete Projects on a laptop or desktop only.