This course introduces big data processing for students with SQL experience who want to take the next step on their data journey by learning distributed computing with Apache Spark. You will gain a thorough understanding of this open-source standard for working with large datasets, along with the fundamentals of data analysis using SQL on Spark. That sets the foundation for combining data with advanced analytics at scale and in production environments. The four modules build on one another. By the end of the course you will understand the Spark architecture, queries within Spark, common ways to optimize Spark SQL, and how to build reliable data pipelines.
Distributed Computing with Spark SQL
This course is part of Learn SQL Basics for Data Science Specialization
Instructors: Brooke Wenig
49,194 already enrolled
(687 reviews)
What you'll learn
Use the collaborative Databricks workspace to write scalable Spark SQL code that executes against a cluster of machines
Inspect the Spark UI to analyze query performance and identify bottlenecks
Create an end-to-end pipeline that reads data, transforms it, and saves the result
Build a medallion (bronze, silver, gold) lakehouse architecture with Delta Lake to ensure the reliability, scalability, and performance of your data
Details to know
4 assignments
There are 4 modules in this course
In this module, you will discuss the core concepts of distributed computing and learn to recognize when and where to apply them. You'll identify the basic data structure of Apache Spark™, known as a DataFrame. Additionally, you'll review the collaborative Databricks workspace.
What's included
6 videos, 3 readings, 1 assignment, 1 discussion prompt
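To make the core idea of distributed computing from this module concrete, here is a minimal sketch in plain Python. It is not Spark code and does not use a cluster: threads stand in for worker machines, and the helper names (`split_into_partitions`, `count_by_key`, `distributed_count`) and the sample rows are invented for illustration. It shows the general pattern of splitting data into partitions, aggregating each partition independently, and merging the partial results, which is roughly the shape of a grouped count over partitioned data.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(rows, num_partitions):
    """Deal rows round-robin into num_partitions lists (hypothetical helper)."""
    return [rows[i::num_partitions] for i in range(num_partitions)]


def count_by_key(partition):
    """Partial aggregate: count rows per key inside a single partition."""
    return Counter(key for key, _ in partition)


def distributed_count(rows, num_partitions=4):
    """Count rows per key by aggregating partitions in parallel, then merging."""
    partitions = split_into_partitions(rows, num_partitions)
    # Threads stand in for worker machines here; this is an assumption of the
    # sketch, not how a real cluster schedules work.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(count_by_key, partitions))
    # Merge step: combine the per-partition counts into one result.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


# Made-up sample data: (call_type, call_id) pairs.
rows = [
    ("fire", 1), ("medical", 2), ("fire", 3),
    ("alarm", 4), ("medical", 5), ("fire", 6),
]
result = distributed_count(rows, num_partitions=3)
print(result)
```

The result does not depend on how many partitions you choose, which is the property that lets the same query run on one machine or many.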
In this module, you will be able to explain the core concepts of Spark. We'll discuss common ways to increase query performance by caching data and modifying Spark configurations. We'll also review the Spark UI to analyze performance and identify bottlenecks, as well as optimize queries with Adaptive Query Execution.
What's included
6 videos, 1 reading, 1 assignment
In this module, you will be able to identify and discuss the general demands of data applications. You'll review data in a variety of formats and compare the tradeoffs between them. You will explore semi-structured JSON data (common in big data environments) as well as schemas and parallel data writes. You will come to understand an end-to-end pipeline that reads data, transforms it, and saves the result.
What's included
7 videos, 1 reading, 1 assignment
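The read, transform, and save pipeline described in this module can be sketched in a few lines of standard-library Python. This is an illustration only, not the course's Spark SQL code: the `SCHEMA` mapping, the field names, the file names, and every function below are assumptions made up for the example. It reads semi-structured JSON lines, applies an explicit schema by casting each field, derives a new column, and writes the result back out.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical schema: field name -> Python type each value is cast to.
SCHEMA = {"id": int, "city": str, "temp_c": float}


def read_json_lines(path):
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def apply_schema(record, schema):
    """Keep only schema fields, casting each; missing fields become None."""
    return {
        name: (cast(record[name]) if record.get(name) is not None else None)
        for name, cast in schema.items()
    }


def transform(records):
    """Drop rows without a temperature and add a Fahrenheit column."""
    return [
        {**r, "temp_f": r["temp_c"] * 9 / 5 + 32}
        for r in records
        if r["temp_c"] is not None
    ]


def write_json_lines(records, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for r in records:
            f.write(json.dumps(r) + "\n")


def run_pipeline(src, dst):
    """Read raw JSON lines, enforce the schema, transform, save, and re-read."""
    typed = [apply_schema(r, SCHEMA) for r in read_json_lines(src)]
    write_json_lines(transform(typed), dst)
    return read_json_lines(dst)


# Made-up input: inconsistent types, an extra field, and a missing field.
workdir = Path(tempfile.mkdtemp())
src, dst = workdir / "raw.jsonl", workdir / "clean.jsonl"
write_json_lines(
    [
        {"id": "1", "city": "Oslo", "temp_c": "10"},
        {"id": 2, "city": "Lima", "extra": True},
    ],
    src,
)
saved = run_pipeline(src, dst)
print(saved)
```

Declaring the schema up front, rather than inferring it from the data, is what lets the pipeline cope with the stray `extra` field and the string-typed numbers in the sample input.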
In this module, you will identify the key characteristics of data lakes, data warehouses, and lakehouses. Lakehouses combine the scalability and low-cost storage of data lakes with the speed and ACID transactional guarantees of data warehouses. You will review a production grade lakehouse combined with Spark in an open-source project, Delta Lake. Whoever said time travel isn't possible hasn't been to a lakehouse!
What's included
8 videos, 1 reading, 1 assignment, 1 discussion prompt
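If the time travel remark in this module sounds mysterious, the following toy model may help. It is a deliberately simplified sketch in plain Python and is not Delta Lake's implementation or API: the `VersionedTable` class and its methods are hypothetical. The idea it illustrates is that when every change is recorded as a commit in an append-only log, any earlier version of the table can be rebuilt by replaying the log up to that commit.

```python
class VersionedTable:
    """Toy table backed by an append-only commit log (illustrative only)."""

    def __init__(self):
        # Each entry is ("append", rows) or ("delete", predicate).
        self._log = []

    def append(self, rows):
        """Commit new rows and return the resulting version number."""
        self._log.append(("append", list(rows)))
        return len(self._log) - 1

    def delete_where(self, predicate):
        """Commit a delete of all rows matching predicate; return the version."""
        self._log.append(("delete", predicate))
        return len(self._log) - 1

    def read(self, version=None):
        """Rebuild the table as of `version` (latest if omitted) by replay."""
        if version is None:
            version = len(self._log) - 1  # -1 means no commits yet
        elif not 0 <= version < len(self._log):
            raise ValueError(f"unknown version {version}")
        rows = []
        for op, payload in self._log[: version + 1]:
            if op == "append":
                rows.extend(payload)
            else:
                rows = [r for r in rows if not payload(r)]
        return rows


# Made-up data: three commits produce versions 0, 1, and 2.
table = VersionedTable()
v0 = table.append([{"id": 1, "status": "new"}, {"id": 2, "status": "new"}])
v1 = table.append([{"id": 3, "status": "new"}])
v2 = table.delete_where(lambda r: r["id"] == 2)

latest = table.read()
as_of_v1 = table.read(version=v1)
print([r["id"] for r in latest], [r["id"] for r in as_of_v1])
```

Because the delete is just another commit, the row it removed is still visible when reading the earlier version.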
Learner reviews
687 reviews
- 5 stars: 64.38%
- 4 stars: 23.11%
- 3 stars: 6.39%
- 2 stars: 2.32%
- 1 star: 3.77%
Frequently asked questions
Access to lectures and assignments depends on your type of enrollment. If you take a course in audit mode, you will be able to see most course materials for free. To access graded assignments and to earn a Certificate, you will need to purchase the Certificate experience, during or after your audit. If you don't see the audit option:
- The course may not offer an audit option. You can try a Free Trial instead, or apply for Financial Aid.
- The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.
When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.
If you subscribed, you get a 7-day free trial during which you can cancel at no penalty. After that, we don’t give refunds, but you can cancel your subscription at any time. See our full refund policy.