Statistical Inference and Hypothesis Testing in Data Science Applications

This course is part of Data Science Foundations: Statistical Inference Specialization

Instructor: Jem Corcoran

6,555 already enrolled

6 modules

Gain insight into a topic and learn the fundamentals.

4.7

(47 reviews)

Intermediate level

36 hours to complete

3 weeks at 12 hours a week

Flexible schedule

Learn at your own pace

What you'll learn

Define a composite hypothesis and the level of significance for a test with a composite null hypothesis.
Define a test statistic, level of significance, and the rejection region for a hypothesis test. Give the form of a rejection region.
Perform tests concerning a true population variance.
Compute the sampling distributions for the sample mean and sample minimum of the exponential distribution.

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

1 quiz, 5 assignments

Taught in English

When you enroll in this course, you'll also be enrolled in the Data Science Foundations: Statistical Inference Specialization.

There are 6 modules in this course

This course will focus on theory and implementation of hypothesis testing, especially as it relates to applications in data science. Students will learn to use hypothesis tests to make informed decisions from data. Special attention will be given to the general logic of hypothesis testing, error and error rates, power, simulation, and the correct computation and interpretation of p-values. Attention will also be given to the misuse of testing concepts, especially p-values, and the ethical implications of such misuse.

This course can be taken for academic credit as part of CU Boulder’s Master of Science in Data Science (MS-DS) degree offered on the Coursera platform. The MS-DS is an interdisciplinary degree that brings together faculty from CU Boulder’s departments of Applied Mathematics, Computer Science, Information Science, and others. With performance-based admissions and no application process, the MS-DS is ideal for individuals with a broad range of undergraduate education and/or professional experience in computer science, information science, mathematics, and statistics. Learn more about the MS-DS program at https://www.coursera.org/degrees/master-of-science-data-science-boulder.

Welcome to the course! This module contains logistical information to get you started!

What's included

3 readings, 1 discussion prompt, 1 ungraded lab

In this module, we will define a hypothesis test and develop the intuition behind designing a test. We will learn the language of hypothesis testing, which includes definitions of a null hypothesis, an alternative hypothesis, and the level of significance of a test. We will walk through a very simple test.
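
As an illustration of what a very simple test can look like, here is a minimal sketch of a one-sided z-test for a normal mean with known standard deviation. It is not taken from the course materials: the function name `z_test_upper`, the data values, and the choices `mu0 = 5.0`, `sigma = 0.5`, and `alpha = 0.05` are all assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_test_upper(sample, mu0, sigma, alpha=0.05):
    """One-sided z-test of H0: mu = mu0 against H1: mu > mu0.

    Assumes the data are normal with KNOWN standard deviation sigma.
    """
    n = len(sample)
    xbar = fmean(sample)
    # Test statistic: standardized distance of the sample mean from mu0.
    z = (xbar - mu0) / (sigma / sqrt(n))
    # Rejection region: z greater than the upper-alpha standard normal quantile.
    critical = NormalDist().inv_cdf(1 - alpha)
    return {"xbar": xbar, "z": z, "critical": critical, "reject": z > critical}


# Made-up measurements, used only to exercise the function.
data = [5.1, 4.9, 5.6, 5.8, 5.3, 5.4, 5.7, 5.2, 5.5, 5.0]
result = z_test_upper(data, mu0=5.0, sigma=0.5)
print(result)
```

With these made-up numbers the sample mean is 5.35, the statistic is about 2.21, and the critical value is about 1.645, so the sketch rejects the null hypothesis at the 5% level.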

What's included

6 videos, 11 readings, 1 quiz, 1 programming assignment, 2 ungraded labs

6 videos (total 69 minutes)

What is Hypothesis Testing? (3 minutes)
Types of Hypotheses (14 minutes)
Normal Computations (23 minutes)
Errors in Hypothesis Testing (7 minutes)
Test Statistics and Significance (14 minutes)
A First Test (4 minutes)

11 readings (total 105 minutes)

What is Hypothesis Testing? (5 minutes)
Types of Hypotheses (10 minutes)
Video Slides for Types of Hypotheses (10 minutes)
Normal Computations (10 minutes)
Video Slides for Normal Computations (10 minutes)
Errors in Hypothesis Testing (10 minutes)
Video Slides for Errors in Hypothesis Testing (10 minutes)
Test Statistics and Significance (10 minutes)
Video Slides for Test Statistics and Level of Significance (10 minutes)
A First Test (10 minutes)
Video Slides for A First Test (10 minutes)

1 quiz (total 30 minutes)

Introduction to Hypothesis Testing (30 minutes)

1 programming assignment (total 180 minutes)

Intro to Hypothesis Testing Lab (180 minutes)

2 ungraded labs (total 120 minutes)

An Introduction to R and Jupyter Notebooks (60 minutes)
Visualizing Errors in Hypothesis Testing (60 minutes)

In this module, we will extend the lessons of Module 1 to composite hypotheses for both one- and two-tailed tests. We will define the “power function” for a test and discuss its interpretation and how it can lead to the idea of a “uniformly most powerful” test. We will discuss and interpret “p-values” as an alternative approach to hypothesis testing.
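
To make the power function and the p-value concrete, the sketch below computes both for a one-sided z-test of H0: mu <= mu0 against H1: mu > mu0 with known standard deviation. The setting and the helper names `power_upper_z` and `p_value_upper_z` are assumptions chosen for illustration, not the course's own code.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def power_upper_z(mu, mu0, sigma, n, alpha=0.05):
    """Probability of rejecting H0 when the true mean is mu.

    The test rejects when (xbar - mu0) / (sigma / sqrt(n)) exceeds the
    upper-alpha standard normal quantile.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    shift = (mu - mu0) / (sigma / sqrt(n))
    return 1 - STD_NORMAL.cdf(z_alpha - shift)


def p_value_upper_z(xbar, mu0, sigma, n):
    """Probability, computed at mu = mu0, of a sample mean at least as large as xbar."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return 1 - STD_NORMAL.cdf(z)


# Hypothetical settings: mu0 = 5.0, sigma = 0.5, n = 10.
for mu in (4.8, 5.0, 5.2, 5.5):
    print(mu, round(power_upper_z(mu, 5.0, 0.5, 10), 4))
print("p-value:", round(p_value_upper_z(5.35, 5.0, 0.5, 10), 4))
```

At mu = mu0 the power equals the level of significance, and it increases as mu moves further into the alternative. Because the power is largest over the null set at the boundary mu0, the test has level alpha for the composite null hypothesis.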

What's included

7 videos, 7 readings, 1 assignment, 1 programming assignment, 1 ungraded lab

7 videos (total 124 minutes)

Composite Hypotheses and Level of Significance (16 minutes)
One-Tailed Tests (20 minutes)
Power Functions (13 minutes)
Hypothesis Testing with P-Values (21 minutes)
Two Tailed Tests (12 minutes)
CLT: A Brief Review (16 minutes)
Hypothesis Tests for Proportions (23 minutes)

7 readings (total 70 minutes)

Video Slides for Composite Hypotheses and Level of Significance (10 minutes)
Video Slides for One-Tailed Tests (10 minutes)
Video Slides for Power Functions (10 minutes)
Video Slides for Hypothesis Testing with P-Values (10 minutes)
Video Slides for Two-Tailed Tests (10 minutes)
Video Slides for CLT: A Brief Review (10 minutes)
Video Slides for Hypothesis Tests for Proportions (10 minutes)

1 assignment (total 30 minutes)

Constructing Tests (30 minutes)

1 programming assignment (total 180 minutes)

The Basics of Hypothesis Testing (180 minutes)

1 ungraded lab (total 60 minutes)

Distributions of P-Values (60 minutes)

In this module, we will learn about the chi-squared and t distributions and their relationships to sampling distributions. We will learn to identify when hypothesis tests based on these distributions are appropriate. We will review the concept of sample variance and derive the “t-test”. Additionally, we will derive our first two-sample test and apply it to make some decisions about real data.
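
The following sketch shows how the statistics behind a one-sample t-test and Welch's two-sample t-test might be computed. It uses only the Python standard library, which has no t distribution, so it returns each statistic together with its degrees of freedom and leaves the comparison against a t critical value to a table or a statistics package. The function names and the data are hypothetical.

```python
from math import sqrt
from statistics import fmean, variance


def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for H0: mu = mu0 (normal data, unknown variance)."""
    n = len(sample)
    # statistics.variance is the sample variance with the n - 1 divisor.
    t = (fmean(sample) - mu0) / sqrt(variance(sample) / n)
    return t, n - 1


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Tests H0: mu_x = mu_y without assuming equal population variances.
    """
    nx, ny = len(x), len(y)
    vx, vy = variance(x) / nx, variance(y) / ny
    t = (fmean(x) - fmean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


# Made-up samples, used only to exercise the functions.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 3.0, 4.0, 5.0, 6.0]
print(one_sample_t(x, mu0=3.0))
print(welch_t(x, y))
```

When the two samples have equal sizes and equal sample variances, as in this example, the Welch degrees of freedom reduce to nx + ny - 2, matching the pooled two-sample t-test.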

What's included

7 videos, 7 readings, 1 assignment, 1 programming assignment, 1 ungraded lab

7 videos (total 139 minutes)

The t and Chi-Squared Distributions (41 minutes)
The Sample Variance for the Normal Distribution (23 minutes)
t-Tests (18 minutes)
Two Sample Tests for Means (15 minutes)
Two Sample t-Tests for a Difference of Means (17 minutes)
Welch's t-Test and Paired Data (13 minutes)
Comparing Population Proportions (8 minutes)

7 readings (total 70 minutes)

Video Slides for the t and Chi-Squared Distributions (10 minutes)
Video Slides for the Sample Variance and the Normal Distribution (10 minutes)
Video Slides for t-Tests (10 minutes)
Video Slides for Two Sample Tests for Means (10 minutes)
Video Slides for Differences in Population Means (10 minutes)
Video Slides for Welch's Test and Paired Data (10 minutes)
Video Slides for Comparing Population Proportions (10 minutes)

1 assignment (total 30 minutes)

More Hypothesis Tests! (30 minutes)

1 programming assignment (total 180 minutes)

t-Tests (180 minutes)

1 ungraded lab (total 60 minutes)

t-Tests and Two Sample Tests (60 minutes)

In this module, we will consider some problems where the assumption of an underlying normal distribution is not appropriate and will expand our ability to construct hypothesis tests for this case. We will define the concept of a “uniformly most powerful” (UMP) test, determine whether such a test exists for specific problems, and revisit some of our earlier tests from Modules 1 and 2 through the UMP lens. We will also introduce the F-distribution and its role in testing whether two population variances are equal.
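
As one example of a test outside the normal setting, the sketch below gives an exact test for the rate of an exponential distribution. It relies on the fact that the sum of n independent Exponential(rate) variables has an Erlang (gamma) distribution with a closed-form CDF. For H0: rate = rate0 against H1: rate > rate0, small sums are evidence for the alternative, and a test of this form is uniformly most powerful because the exponential family has a monotone likelihood ratio in the sum. The function names and data are assumptions made for illustration.

```python
from math import exp, factorial


def erlang_cdf(s, n, rate):
    """P(S <= s), where S is the sum of n i.i.d. Exponential(rate) variables."""
    if s <= 0:
        return 0.0
    x = rate * s
    # Closed form: 1 - exp(-x) * sum_{k=0}^{n-1} x^k / k!
    return 1.0 - exp(-x) * sum(x ** k / factorial(k) for k in range(n))


def exp_rate_test_p_value(sample, rate0):
    """Exact p-value for H0: rate = rate0 against H1: rate > rate0.

    A larger rate means smaller observations, so the p-value is the
    probability under H0 of a sum at least as small as the one observed.
    """
    return erlang_cdf(sum(sample), len(sample), rate0)


# Made-up waiting times; under H0 (rate0 = 1) the mean would be 1.
waits = [0.2, 0.1, 0.4, 0.3, 0.25, 0.15, 0.35, 0.05]
print(exp_rate_test_p_value(waits, rate0=1.0))
```

Rejecting H0 when this p-value is at most alpha gives a level-alpha test.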

What's included

6 videos, 6 readings, 2 assignments

6 videos (total 117 minutes)

Properties of the Exponential Distribution (13 minutes)
Two Tests (27 minutes)
Best Tests (22 minutes)
UMP Tests (10 minutes)
A Test for the Variance of the Normal Distribution (12 minutes)
The F-Distribution and a Ratio of Variances (31 minutes)

6 readings (total 60 minutes)

Video Slides for Properties of the Exponential Distribution (10 minutes)
Video Slides for Two Hypothesis Tests for the Exponential (10 minutes)
Video Slides for Best Tests (10 minutes)
Video Slides for UMP Tests (10 minutes)
Video Slides for a Normal Variance Test (10 minutes)
Video Slides for an F-Distribution and a Ratio of Variances (10 minutes)

2 assignments (total 60 minutes)

Best Tests and Some General Skills (30 minutes)
Uniformly Most Powerful Tests and F-Tests (30 minutes)

In this module, we develop a formal approach to hypothesis testing, based on a “likelihood ratio”, that can be applied more generally than any of the tests we have discussed so far. We will pay special attention to the large-sample properties of the likelihood ratio, especially Wilks’ Theorem, which will allow us to come up with approximate (but easy) tests when we have a large sample size. We will close the course with two chi-squared tests that can be used to test whether the distributional assumptions we have been making throughout this course are valid.
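
A minimal sketch of a likelihood ratio test that uses Wilks' Theorem follows. The example model is an assumption chosen for illustration: an exponential sample with H0: rate = rate0 against H1: rate != rate0. Under H0, and for a large sample, -2 log(Lambda) is approximately chi-squared with 1 degree of freedom, whose upper tail probability can be written as erfc(sqrt(x / 2)). The function name and data are hypothetical.

```python
from math import erfc, log, sqrt


def wilks_exponential_test(sample, rate0):
    """Approximate likelihood ratio test of H0: rate = rate0 vs H1: rate != rate0.

    Returns (-2 log Lambda, approximate p-value from the chi-squared(1) limit).
    """
    n = len(sample)
    total = sum(sample)
    rate_hat = n / total  # maximum likelihood estimate of the rate
    # Log-likelihood is n*log(rate) - rate*total, so
    # -2 log Lambda = 2 * [loglik(rate_hat) - loglik(rate0)].
    stat = 2.0 * (n * log(rate_hat / rate0) - n + rate0 * total)
    stat = max(stat, 0.0)  # guard against tiny negative values from rounding
    # Chi-squared(1) upper tail: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value


# Made-up sample of size 50 with mean 2, tested against rate0 = 1 (mean 1).
sample = [2.0] * 50
print(wilks_exponential_test(sample, rate0=1.0))
```

The approximation improves with sample size; for small samples an exact test based on the distribution of the sum may be preferable.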