In this capstone project course, we'll compare genome sequences of COVID-19 mutations to identify regions that a drug therapy could target. The first step in drug discovery is identifying target subsequences of the virus's genome. We'll start by comparing the genomes of virus mutations to look for similarities. Then, we'll perform PCA to reduce the number of dimensions and identify the most important features. Next, we'll use K-means clustering in Python to find the optimal number of groups and trace the lineage of the virus. Finally, we'll predict similarity between the sequences and use this to pick a target subsequence. Throughout the course, each section consists of a programming assignment paired with a guide video and helpful hints. By the end, you'll be well on your way to discovering ways to combat disease with genome sequencing.
Capstone Project: Advanced AI for Drug Discovery
This course is part of AI for Scientific Research Specialization
Instructors: Rajvir Dua
2,329 already enrolled
What you'll learn
Analyzing genome sequences to find similarities and identify target subsequences using predictive models.
There are 4 modules in this course
In this module, we'll start to get familiar with our dataset by performing some basic EDA and comparing genome sequences. By analyzing the mutations of the COVID-19 virus, we'll be able to identify some common properties of the genome that our drug should look to target.
What's included
4 videos, 1 programming assignment, 1 discussion prompt, 1 ungraded lab
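As a rough illustration of the kind of sequence comparison this first module describes, the sketch below computes GC content and a k-mer Jaccard similarity between short sequences. The function names, the choice of k-mer overlap as the similarity measure, and the toy sequences are all assumptions made for this example; the course's own lab may use a different dataset and method.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq, k=3):
    """Count every overlapping substring of length k in the sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def jaccard_similarity(a, b, k=3):
    """Share of distinct k-mers the two sequences have in common."""
    kmers_a = set(kmer_counts(a, k))
    kmers_b = set(kmer_counts(b, k))
    if not kmers_a and not kmers_b:
        return 1.0  # both too short to contain any k-mer
    return len(kmers_a & kmers_b) / len(kmers_a | kmers_b)


def gc_content(seq):
    """Fraction of bases that are G or C, a common EDA summary."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical toy sequences, not real viral genomes.
sequences = {
    "variant_a": "ATGCGTACGTTAGCATGC",
    "variant_b": "ATGCGTACGTTAGGATGC",
    "variant_c": "TTTTAAAACCCCGGGGTT",
}

for name, seq in sequences.items():
    print(name, "GC content:", round(gc_content(seq), 3))

for (name_a, seq_a), (name_b, seq_b) in combinations(sequences.items(), 2):
    print(name_a, name_b, round(jaccard_similarity(seq_a, seq_b), 3))
```

K-mer overlap is used here only because it needs no alignment step; an alignment-based score would be a reasonable alternative.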
In this module, we'll continue to work with our genome sequence data, using PCA to identify groups and delineate the most important features. After reducing the number of dimensions in the dataset, we'll be able to use K-means to form clusters and visualize the different areas in 2-D space.
What's included
2 videos, 1 reading, 1 programming assignment, 1 ungraded lab
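The dimensionality reduction described in this second module can be sketched with a small PCA built on NumPy's singular value decomposition. The `pca_project` helper and the random feature matrix are assumptions for illustration only; in the course, the features would presumably be derived from the genome sequences, and a library implementation of PCA may be used instead.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project rows of X onto the top principal components.

    Returns the projected data, the component vectors, and the fraction
    of total variance explained by each kept component.
    """
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)  # PCA requires mean-centered features
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    variance = singular_values ** 2
    explained_ratio = variance / variance.sum()
    return X_centered @ components.T, components, explained_ratio[:n_components]


# Made-up feature matrix: 20 samples, 6 features (an assumption, not course data).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))
# Make one feature nearly redundant so the leading component dominates.
X[:, 1] = 2 * X[:, 0] + 0.01 * rng.normal(size=20)

projected, components, explained_ratio = pca_project(X, n_components=2)
print("projected shape:", projected.shape)
print("explained variance ratio:", np.round(explained_ratio, 3))
```

The two projected columns are what one would plot to view the samples in 2-D space.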
In this module, we'll cluster the genome sequences using the K-means algorithm. We'll optimize the number of clusters by comparing silhouette scores across a wide range of cluster counts to identify the greatest drop-off. Finally, we'll set ourselves up to use prediction pipelines to predict bit scores and drug therapies in the last module.
What's included
2 videos, 1 programming assignment, 1 discussion prompt, 1 ungraded lab
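The cluster-count search described in this third module might look something like the sketch below, which fits K-means with scikit-learn for several values of k and records the silhouette score of each. The `silhouette_by_k` helper and the synthetic 2-D blobs are assumptions for this example. It simply picks the k with the highest score, which may differ from the drop-off criterion the course uses.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, k_values, seed=0):
    """Fit K-means for each k and return {k: mean silhouette score}."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        labels = model.fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores


# Synthetic stand-in for PCA-reduced data: three well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

scores = silhouette_by_k(X, range(2, 7))
best_k = max(scores, key=scores.get)

for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print("k with the highest silhouette score:", best_k)
```

The silhouette score is only defined for at least two clusters, which is why the search starts at k=2.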
In this module, we'll test a variety of regressors to see which one performs best in predicting bit scores for each genome sequence. Then, we'll use our chosen model to find the genome sequences that are most closely related and trace out a possible subsequence to target with a combative drug.
What's included
2 videos, 1 programming assignment, 1 ungraded lab
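One way to compare regressors, as this last module describes, is cross-validated scoring with scikit-learn. The sketch below is a minimal example under stated assumptions: the three candidate models, the `compare_regressors` helper, and the synthetic target standing in for bit scores are all chosen for illustration and are not taken from the course materials.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor


def compare_regressors(models, X, y, cv=5):
    """Return {model name: mean cross-validated R^2 score}."""
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
        for name, model in models.items()
    }


# Synthetic data: features are random, and the target is a noisy linear
# combination standing in for bit scores (an assumption for this sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=120)

models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "knn": KNeighborsRegressor(n_neighbors=5),
}

results = compare_regressors(models, X, y)
best_model = max(results, key=results.get)

for name, score in results.items():
    print(f"{name}: mean R^2 = {score:.3f}")
print("best model on this synthetic data:", best_model)
```

Because the synthetic target is linear by construction, the linear model wins here; on real bit-score data the ranking could be quite different.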