Data Processing and Manipulation

This course is part of the Data Wrangling with Python Specialization.

Instructor: Di Wu

4 modules

Gain insight into a topic and learn the fundamentals.

Intermediate level

Recommended experience

3 weeks to complete

at 10 hours a week

Flexible schedule

Learn at your own pace

What you'll learn

Understand the importance of data processing and manipulation in the data analysis pipeline.
Learn techniques for handling missing values and outliers, reducing data, and scaling and discretizing data.
Understand the concept of a data cube and perform multidimensional aggregation for exploratory analysis.

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

6 assignments

Taught in English

Build your subject-matter expertise

This course is part of the Data Wrangling with Python Specialization

When you enroll in this course, you'll also be enrolled in this Specialization.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate

There are 4 modules in this course

The "Data Processing and Manipulation" course provides students with a comprehensive understanding of various data processing and manipulation concepts and tools. Participants will learn how to handle missing values, detect outliers, perform sampling and dimension reduction, apply scaling and discretization techniques, and explore data cube and pivot table operations. This course equips students with essential skills for efficiently preparing and transforming data for analysis and decision-making.

Learning Objectives:
1. Understand the importance of data processing and manipulation in the data analysis pipeline.
2. Learn techniques to handle missing values in datasets, including imputation and exclusion strategies.
3. Detect outliers and assess their impact on data analysis and decision-making.
4. Explore sampling methods and dimension reduction techniques for large datasets and high-dimensional data.
5. Apply data scaling techniques to normalize and standardize variables for meaningful comparisons.
6. Use discretization to transform continuous data into categorical representations, simplifying analysis.
7. Understand the concept of a data cube and perform multidimensional aggregation for exploratory analysis.
8. Create pivot tables to summarize and reshape data and draw insights from complex datasets.

Throughout the course, students will work through practical exercises and projects that apply data processing and manipulation techniques to real-world datasets. By the end of the course, participants will be equipped to prepare, clean, and transform data for subsequent analysis and data-driven decision-making.

The "Missing Values and Outliers" week focuses on how to handle missing values and detect outliers using the Pandas library. You will learn essential techniques to identify and address missing data effectively, as well as methods to detect and manage outliers in datasets.
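
As a rough illustration of the identify / exclude / impute workflow described above, here is a minimal pandas sketch. The DataFrame, its column names (`age`, `city`), and the choice of mean and mode as fill values are hypothetical choices for this example, not necessarily the strategies used in the course demos.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with gaps (NaN / None mark missing values)
df = pd.DataFrame({
    "age": [25, np.nan, 31, 47, np.nan],
    "city": ["Austin", "Boston", None, "Austin", "Denver"],
})

# Identify: count missing values per column
missing_counts = df.isna().sum()

# Exclusion: drop every row that has at least one missing value
dropped = df.dropna()

# Imputation: fill numeric gaps with the column mean and
# categorical gaps with the most frequent value (the mode)
imputed = df.copy()
imputed["age"] = imputed["age"].fillna(imputed["age"].mean())
imputed["city"] = imputed["city"].fillna(imputed["city"].mode()[0])

print(missing_counts)
print(dropped)
print(imputed)
```

Dropping rows is the simplest option but discards information (here, three of five rows); imputation keeps every row at the cost of introducing estimated values.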

What's included

3 videos, 5 readings, 2 assignments, 1 discussion prompt

3 videos Total 33 minutes

Missing Values 20 minutes
Outliers Detection using Statistics 6 minutes
Outliers Detection using IQR 8 minutes

5 readings Total 220 minutes

Assessment Strategy 30 minutes
Activity Strategy 10 minutes
Missing Values Demo 60 minutes
Outliers Detection using Statistics Demo 60 minutes
Outliers Detection using IQR 60 minutes

2 assignments Total 60 minutes

Missing Values Quiz 30 minutes
Outliers Detection Quiz 30 minutes

1 discussion prompt Total 120 minutes

Missing Value and Outliers Detection Exploration Exercise 120 minutes
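
The two outlier-detection approaches named in this module (statistics-based and IQR-based) can be sketched in pandas roughly as follows. The series values are hypothetical, and the thresholds (k = 2 standard deviations, 1.5 × IQR) are common conventions assumed here, not values taken from the course.

```python
import pandas as pd

# Hypothetical measurements with one suspiciously large value
s = pd.Series([10, 12, 11, 13, 12, 11, 10, 95], name="value")

# Statistics-based rule: flag points more than k standard deviations from the mean.
# k = 2 is used because the sample is tiny; k = 3 is also a common choice.
k = 2
z_scores = (s - s.mean()) / s.std()
stat_outliers = s[z_scores.abs() > k]

# IQR rule: flag points below Q1 - 1.5*IQR or above Q3 + 1.5*IQR
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = s[(s < lower) | (s > upper)]

print(stat_outliers)
print(iqr_outliers)
```

Both rules flag 95 here, but the extreme value inflates the standard deviation, so its z-score is only about 2.5 and a k = 3 rule would miss it. The quartile-based IQR rule is far less affected by the outlier itself, which is why it is often preferred for small or skewed samples.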

The "Data Reduction" week focuses on how to reduce data through sampling and dimensionality reduction using the Pandas library. You will learn essential techniques to obtain manageable subsets of data while preserving meaningful information for analysis and visualization.
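
A minimal sketch of both reduction ideas is shown below, assuming a synthetic dataset (the columns `x1`, `x2`, `x3`, and `group` are hypothetical). Sampling uses pandas' `sample` (per-group sampling via `groupby(...).sample` needs pandas 1.1 or later). For dimensionality reduction it computes principal component analysis (PCA) directly with NumPy's SVD; the course may instead use a library implementation such as scikit-learn's.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical dataset: 1,000 rows, three numeric features and a category label
df = pd.DataFrame({
    "x1": rng.normal(size=1000),
    "group": rng.choice(["A", "B"], size=1000, p=[0.8, 0.2]),
})
df["x2"] = 2 * df["x1"] + rng.normal(scale=0.1, size=1000)  # nearly redundant with x1
df["x3"] = rng.normal(size=1000)

# Simple random sampling: keep 10% of the rows (random_state makes it reproducible)
simple = df.sample(frac=0.1, random_state=42)

# Stratified sampling: keep 10% of each group so group proportions are preserved
stratified = df.groupby("group").sample(frac=0.1, random_state=42)

# PCA via SVD: center the features, then project onto the top principal directions
X = df[["x1", "x2", "x3"]].to_numpy()
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
explained = S**2 / np.sum(S**2)      # share of total variance per component
X_reduced = X_centered @ Vt[:2].T    # 3 features -> 2 components

print(simple.shape, stratified["group"].value_counts())
print(explained.round(3), X_reduced.shape)
```

Because `x2` is almost a multiple of `x1`, the first component captures most of the variance, so dropping the third component loses little information.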

What's included

2 videos, 3 readings, 1 assignment, 1 discussion prompt

The "Scaling and Discretization" week focuses on the importance of data scaling and discretization in data preprocessing. You will learn why and how to scale data so that variables measured on different scales can be normalized and compared. You will also explore data discretization and how it transforms continuous data into categorical representations.
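
The two most common scaling techniques, min-max normalization and z-score standardization, can be written directly with pandas column arithmetic. This is a minimal sketch; the `income` and `age` columns and their values are hypothetical.

```python
import pandas as pd

# Hypothetical features on very different scales
df = pd.DataFrame({
    "income": [30_000, 45_000, 60_000, 120_000],
    "age": [22, 35, 41, 58],
})

# Min-max normalization: rescale each column to the range [0, 1]
min_max = (df - df.min()) / (df.max() - df.min())

# Z-score standardization: center each column at 0 with unit standard deviation
# (pandas' std() uses the sample standard deviation, ddof=1, by default)
z_score = (df - df.mean()) / df.std()

print(min_max)
print(z_score.round(3))
```

After either transformation, `income` no longer dominates `age` simply because its raw numbers are larger. Note that scikit-learn's `StandardScaler` divides by the population standard deviation (ddof=0), so its output differs slightly from this sketch on small samples.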

What's included

2 videos, 3 readings, 1 assignment, 1 discussion prompt

2 videos Total 24 minutes

Data Scaling 12 minutes
Data Discretization 12 minutes

3 readings Total 240 minutes

Data Scaling Demo 60 minutes
Data Discretization Demo 60 minutes
Scaling and Discretization Case Study 120 minutes

1 assignment Total 30 minutes

Scaling and Discretization Quiz 30 minutes

1 discussion prompt Total 120 minutes

Scaling and Discretization Exploration Exercise 120 minutes
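
For the discretization part of this module, pandas provides `pd.cut` (bins defined by edges or by a count of equal-width intervals) and `pd.qcut` (equal-frequency bins based on quantiles). The ages, bin edges, and labels below are hypothetical choices for this sketch.

```python
import pandas as pd

ages = pd.Series([5, 17, 23, 35, 42, 58, 64, 71, 80, 90], name="age")

# Custom bins with readable labels; right=False makes each bin [left, right)
age_group = pd.cut(ages, bins=[0, 18, 65, 120],
                   labels=["minor", "adult", "senior"], right=False)

# Equal-width binning: 3 intervals of equal length over the data range
equal_width = pd.cut(ages, bins=3)

# Equal-frequency binning: quartiles, so each bin holds roughly the same number of rows
equal_freq = pd.qcut(ages, q=4, labels=["Q1", "Q2", "Q3", "Q4"])

print(pd.DataFrame({"age": ages, "group": age_group, "quartile": equal_freq}))
print(equal_width.value_counts().sort_index())
```

Equal-width bins are easy to interpret but can end up unevenly filled when data is skewed; quantile bins balance the counts at the cost of irregular bin edges.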

The "Data Warehouse" week focuses on the concepts and methodologies of organizing data using data cubes and pivot tables in Pandas. You will learn the importance of data warehousing for efficient data management and analysis, as well as how to construct data cubes and pivot tables to facilitate multidimensional data exploration.
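
A minimal sketch of both ideas on a hypothetical sales table follows (the `region`, `product`, `quarter`, and `revenue` columns are invented for illustration). `pivot_table` produces a two-dimensional slice of the cube with totals. Because pandas has no direct counterpart to SQL's `GROUP BY CUBE`, the full cube is built here by looping over every subset of the dimensions, one group-by per cuboid.

```python
from itertools import combinations

import pandas as pd

# Hypothetical sales records: three dimensions (region, product, quarter), one measure
sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "East", "West"],
    "product": ["A", "B", "A", "B", "A", "A"],
    "quarter": ["Q1", "Q1", "Q1", "Q2", "Q2", "Q2"],
    "revenue": [100, 150, 200, 120, 130, 90],
})

# Pivot table: a 2-D slice of the cube (region x quarter) with row/column totals
pivot = sales.pivot_table(index="region", columns="quarter", values="revenue",
                          aggfunc="sum", margins=True, margins_name="Total")

# Data cube: aggregate the measure over every subset of the dimensions (each subset
# is one "cuboid"), from the full 3-D group-by down to the grand total
dims = ["region", "product", "quarter"]
cuboids = {}
for r in range(len(dims), -1, -1):
    for combo in combinations(dims, r):
        if combo:
            cuboids[combo] = sales.groupby(list(combo))["revenue"].sum()
        else:
            cuboids[combo] = sales["revenue"].sum()  # apex cuboid: grand total

print(pivot)
print(cuboids[("region", "product")])
```

With three dimensions there are 2^3 = 8 cuboids; rolling up (for example from region x product to region alone) simply means reading a cuboid with fewer dimensions.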