AWS Data Processing and Analysis

This course is part of the AWS Certified Data Analytics Specialty (2023) Hands-on Specialization

Instructor: Packt - Course Instructors

2 modules

Gain insight into a topic and learn the fundamentals.

Intermediate level

9 hours to complete

Flexible schedule

Learn at your own pace

What you'll learn

Integrate and scale data pipelines using AWS Lambda and Glue for efficient data processing
Analyze real-time data streams with Kinesis Analytics and OpenSearch to gain actionable insights
Implement security measures and manage data workflows for high-performance analysis

Skills you'll gain

Data Lakes
Data Warehousing
Apache Hadoop
Real Time Data
Apache Spark
Serverless Computing
Query Languages
Amazon S3
Apache Hive
Data Processing
Data Visualization
AWS Kinesis
Extract, Transform, Load
Data Pipelines
Amazon Web Services

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

2 assignments

Taught in English

Build your subject-matter expertise

This course is part of the AWS Certified Data Analytics Specialty (2023) Hands-on Specialization

When you enroll in this course, you'll also be enrolled in this Specialization.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate

There are 2 modules in this course

Updated in May 2025.

This course now features Coursera Coach, a smarter way to learn through interactive, real-time conversations that help you test your knowledge, challenge assumptions, and deepen your understanding as you progress through the course.

This course takes you through the complete process of data handling, starting with AWS data processing services. You’ll begin with AWS Lambda, learning how to integrate serverless functions and manage scalable data pipelines. With practical exercises, you’ll explore how AWS Glue helps automate data preparation and manage complex ETL jobs, making data lake partitioning and modification of the Glue Data Catalog easy to understand. Hands-on experience with Glue Studio and DataBrew will further enhance your knowledge in preparing data for analysis.

The course also delves into processing large datasets using Amazon EMR, where you’ll work with Apache Spark, Hive, and other tools in the Hadoop ecosystem. You’ll learn to optimize data processing with EMR, partition and store data efficiently, and integrate it with AWS services like Kinesis and Redshift. Exercises in Apache Spark will show you how to analyze data streams and deliver actionable insights in real time.

Lastly, you’ll focus on the analysis aspect using services like Kinesis Analytics, OpenSearch, and Athena. The course will guide you through setting up advanced analytics using Kinesis, creating real-time monitoring applications, and visualizing data using OpenSearch and QuickSight. By the end of this course, you’ll be well-equipped to build, process, and analyze data pipelines at scale using AWS’s powerful tools.

This course is ideal for data engineers, IT professionals, and data analysts aiming to leverage AWS for data processing and analysis. Some familiarity with AWS services is recommended.

In this module, we will delve into AWS processing services, beginning with an introduction to AWS Lambda and Glue. You’ll learn how to integrate these tools for serverless and ETL workflows. We will also explore advanced topics such as Glue ETL job execution, Lambda’s cost optimization strategies, and EMR’s integration with other AWS services and with open-source tools like Apache Spark, Hive, and Hadoop. Hands-on exercises will cover using Spark with Kinesis and Redshift, and how to process data lakes with EMR.

What's included

35 videos, 2 readings

35 videos, total 214 minutes

Section Introduction: Processing 1 minute
What Is AWS Lambda? 5 minutes
Lambda Integration - Part 1 5 minutes
Lambda Integration - Part 2 7 minutes
Lambda Costs, Promises, and Anti-Patterns 5 minutes
(Exercise) AWS Lambda 9 minutes
What Is Glue? + Partitioning Your Data Lake 6 minutes
Glue, Hive, and ETL 14 minutes
Modifying the Glue Data Catalog from ETL Scripts 2 minutes
Glue ETL: Developer Endpoints, Running ETL Jobs with Bookmarks 4 minutes
Glue Costs and Anti-Patterns 3 minutes
AWS Glue Studio 5 minutes
AWS Glue Data Quality 3 minutes
AWS Glue DataBrew 9 minutes
AWS Lake Formation 9 minutes
AWS Lake Security 4 minutes
Elastic MapReduce (EMR) Architecture and Usage 9 minutes
EMR, AWS integration, and Storage 8 minutes
EMR Promises; Introduction to Hadoop 8 minutes
EMR Serverless, EMR, and EKS 12 minutes
Introduction to Apache Spark 9 minutes
Spark Integration with Kinesis and Redshift 4 minutes
Spark integration with Athena 3 minutes
Hive on EMR 8 minutes
Pig on EMR 2 minutes
HBase on EMR 4 minutes
Presto on EMR 3 minutes
Zeppelin and EMR Notebooks 5 minutes
Hue, Splunk, and Flume 4 minutes
S3DistCP and Other Services 5 minutes
EMR Security and Instance Types 6 minutes
(Exercise) Elastic MapReduce, Part 1 17 minutes
(Exercise) Elastic MapReduce, Part 2 10 minutes
AWS Data Pipeline 5 minutes
AWS Step Functions 4 minutes

2 readings, total 20 minutes

Introduction to the Course 'AWS Data Processing and Analysis' 10 minutes
Full Specialization Resources 10 minutes

In this module, we will focus on analyzing and querying data using AWS’s powerful analytics services. We begin with an introduction to Kinesis Analytics, OpenSearch, and Athena, followed by performance tuning and security best practices. Through hands-on exercises, you’ll build real-world applications to monitor data streams, optimize queries using Glue and Athena, and perform data warehousing with Redshift. Additionally, we’ll explore Redshift's durability, distribution styles, and newer features like AQUA and serverless options to improve large-scale data analytics.

What's included

32 videos, 1 reading, 2 assignments

32 videos, total 220 minutes

Section Introduction: Analysis 1 minute
Introduction to Kinesis Analytics 8 minutes
Kinesis Analytics Costs; RANDOM_CUT_FOREST 2 minutes
(Exercise) Kinesis Analytics, Part 1 7 minutes
(Exercise) Kinesis Analytics, Part 2 10 minutes
(Exercise) Kinesis Analytics, Part 3 17 minutes
(Exercise) Kinesis Analytics, Part 4 5 minutes
Introduction to OpenSearch (formerly Elasticsearch) 11 minutes
Amazon OpenSearch Service 7 minutes
OpenSearch Index Management and Designing for Stability 11 minutes
Amazon OpenSearch Service Performance 2 minutes
Amazon OpenSearch Serverless 2 minutes
(Exercise) Amazon OpenSearch Service 26 minutes
Introduction to Athena 4 minutes
Athena and Glue, Costs, and Security 8 minutes
Athena Performance 2 minutes
Athena ACID Transactions 3 minutes
(Exercise) AWS Glue and Athena 13 minutes
Redshift Introduction and Architecture 9 minutes
Redshift Spectrum and Performance Tuning 5 minutes
Redshift Durability and Scaling 4 minutes
Redshift Distribution Styles 3 minutes
Redshift Sort Keys 3 minutes
Redshift Data Flows and the COPY command 8 minutes
Redshift Integration / WLM / Vacuum / Anti-Patterns 11 minutes
Redshift Resizing (Elastic Versus Classic) and New Redshift Features in 2020 4 minutes
Newer Redshift Features, AQUA 6 minutes
Redshift Security Concerns 2 minutes
Redshift Serverless 7 minutes
(Exercise) Redshift Spectrum, Part 1 8 minutes
(Exercise) Redshift Spectrum, Part 2 6 minutes
Amazon Relational Database Service (RDS) and Aurora 4 minutes