What Is Relationship Management? And Why Do Businesses Use It?
November 21, 2024
Article
This course is part of Reinforcement Learning Specialization
Instructors: Martha White
Instructor ratings
We asked all learners to give feedback on our instructors based on the quality of their teaching style.
34,649 already enrolled
Included with
(1,236 reviews)
Recommended experience
Intermediate level
Probabilities & Expectations, basic linear algebra, basic calculus, Python 3.0 (at least 1 year), implementing algorithms from pseudocode
(1,236 reviews)
Recommended experience
Intermediate level
Probabilities & Expectations, basic linear algebra, basic calculus, Python 3.0 (at least 1 year), implementing algorithms from pseudocode
Add to your LinkedIn profile
5 assignments
Add this credential to your LinkedIn profile, resume, or CV
Share it on social media and in your performance review
In this course, you will learn about several algorithms that can learn near optimal policies based on trial and error interaction with the environment---learning from the agent’s own experience. Learning from actual experience is striking because it requires no prior knowledge of the environment’s dynamics, yet can still attain optimal behavior. We will cover intuitively simple but powerful Monte Carlo methods, and temporal difference learning methods including Q-learning. We will wrap up this course investigating how we can get the best of both worlds: algorithms that can combine model-based planning (similar to dynamic programming) and temporal difference updates to radically accelerate learning.
By the end of this course you will be able to: - Understand Temporal-Difference learning and Monte Carlo as two strategies for estimating value functions from sampled experience - Understand the importance of exploration, when using sampled experience rather than dynamic programming sweeps within a model - Understand the connections between Monte Carlo and Dynamic Programming and TD. - Implement and apply the TD algorithm, for estimating value functions - Implement and apply Expected Sarsa and Q-learning (two TD methods for control) - Understand the difference between on-policy and off-policy control - Understand planning with simulated experience (as opposed to classic planning strategies) - Implement a model-based approach to RL, called Dyna, which uses simulated experience - Conduct an empirical study to see the improvements in sample efficiency when using Dyna
Welcome to the second course in the Reinforcement Learning Specialization: Sample-Based Learning Methods, brought to you by the University of Alberta, Onlea, and Coursera. In this pre-course module, you'll be introduced to your instructors, and get a flavour of what the course has in store for you. Make sure to introduce yourself to your classmates in the "Meet and Greet" section!
2 videos2 readings1 discussion prompt
This week you will learn how to estimate value functions and optimal policies, using only sampled experience from the environment. This module represents our first step toward incremental learning methods that learn from the agent’s own interaction with the world, rather than a model of the world. You will learn about on-policy and off-policy methods for prediction and control, using Monte Carlo methods---methods that use sampled returns. You will also be reintroduced to the exploration problem, but more generally in RL, beyond bandits.
11 videos3 readings1 assignment1 programming assignment1 discussion prompt
This week, you will learn about one of the most fundamental concepts in reinforcement learning: temporal difference (TD) learning. TD learning combines some of the features of both Monte Carlo and Dynamic Programming (DP) methods. TD methods are similar to Monte Carlo methods in that they can learn from the agent’s interaction with the world, and do not require knowledge of the model. TD methods are similar to DP methods in that they bootstrap, and thus can learn online---no waiting until the end of an episode. You will see how TD can learn more efficiently than Monte Carlo, due to bootstrapping. For this module, we first focus on TD for prediction, and discuss TD for control in the next module. This week, you will implement TD to estimate the value function for a fixed policy, in a simulated domain.
6 videos2 readings1 assignment1 programming assignment1 discussion prompt
This week, you will learn about using temporal difference learning for control, as a generalized policy iteration strategy. You will see three different algorithms based on bootstrapping and Bellman equations for control: Sarsa, Q-learning and Expected Sarsa. You will see some of the differences between the methods for on-policy and off-policy control, and that Expected Sarsa is a unified algorithm for both. You will implement Expected Sarsa and Q-learning, on Cliff World.
9 videos3 readings1 assignment1 programming assignment1 discussion prompt
Up until now, you might think that learning with and without a model are two distinct, and in some ways, competing strategies: planning with Dynamic Programming verses sample-based learning via TD methods. This week we unify these two strategies with the Dyna architecture. You will learn how to estimate the model from data and then use this model to generate hypothetical experience (a bit like dreaming) to dramatically improve sample efficiency compared to sample-based methods like Q-learning. In addition, you will learn how to design learning systems that are robust to inaccurate models.
11 videos4 readings2 assignments1 programming assignment1 discussion prompt
We asked all learners to give feedback on our instructors based on the quality of their teaching style.
The University of Alberta is considered among the world’s leading public research- and teaching-intensive universities, known for excellence across the humanities, sciences, creative arts, business, engineering and health sciences. As one of Canada’s top universities, we are investing in purpose-built online post-secondary education—rooted in innovative digital pedagogies, world-class faculty, exceptional design, and a championed student experience.
The Alberta Machine Intelligence Institute (Amii) is home to some of the world’s top talent in machine intelligence. We’re an Alberta-based research institute that pushes the bounds of academic knowledge and guides business understanding of artificial intelligence and machine learning.
University of Alberta
Course
Illinois Tech
Build toward a degree
Course
University of Alberta
Course
University of Washington
Specialization
1,236 reviews
82.29%
13.33%
2.82%
0.56%
0.97%
Showing 3 of 1236
Reviewed on Feb 27, 2020
Itwasgoodinsubstane but there is plenty of issues with the automated grader. you spend most time dealing with the letter not on actual learning of the matter.
Reviewed on Mar 13, 2022
The videos are very clear and do a good job explaining the material from the textbook. The assignments are relevant and just right in terms of length and difficulty.
Reviewed on Feb 14, 2021
Excellent course that naturally extends the first specialization course. The application examples in programming are very good and I loved how RL gets closer and closer to how a living being thinks.
Unlimited access to 10,000+ world-class courses, hands-on projects, and job-ready certificate programs - all included in your subscription
Earn a degree from world-class universities - 100% online
Upskill your employees to excel in the digital economy
Access to lectures and assignments depends on your type of enrollment. If you take a course in audit mode, you will be able to see most course materials for free. To access graded assignments and to earn a Certificate, you will need to purchase the Certificate experience, during or after your audit. If you don't see the audit option:
The course may not offer an audit option. You can try a Free Trial instead, or apply for Financial Aid.
The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.
When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.
If you subscribed, you get a 7-day free trial during which you can cancel at no penalty. After that, we don’t give refunds, but you can cancel your subscription at any time. See our full refund policy.
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If fin aid or scholarship is available for your learning program selection, you’ll find a link to apply on the description page.