University of California, Davis
Distributed Computing with Spark SQL

Instructors: Brooke Wenig, Conor Murphy

49,194 already enrolled

Included with Coursera Plus

Gain insight into a topic and learn the fundamentals.
4.4 (687 reviews)
Intermediate level: some related experience required
Flexible schedule: approx. 8 hours, learn at your own pace
86% of learners liked this course

What you'll learn

  • Use the collaborative Databricks workspace to write scalable Spark SQL code that executes against a cluster of machines

  • Inspect the Spark UI to analyze query performance and identify bottlenecks

  • Create an end-to-end pipeline that reads data, transforms it, and saves the result

  • Build a medallion (bronze, silver, gold) lakehouse architecture with Delta Lake to ensure the reliability, scalability, and performance of your data

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

4 assignments

Taught in English

Build your subject-matter expertise

This course is part of the Learn SQL Basics for Data Science Specialization
When you enroll in this course, you'll also be enrolled in this Specialization.
  • Learn new concepts from industry experts
  • Gain a foundational understanding of a subject or tool
  • Develop job-relevant skills with hands-on projects
  • Earn a shareable career certificate

There are 4 modules in this course

In this module, you will discuss the core concepts of distributed computing and learn to recognize when and where to apply them. You'll identify the DataFrame, the basic data structure of Apache Spark™, and review the collaborative Databricks workspace.
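One core idea of distributed computing is to split data into partitions, compute a partial result on each partition independently, and then combine the partial results. The sketch below illustrates that pattern with the Python standard library only. It is not Spark code: threads stand in for the machines of a cluster, and the function names (`partition`, `partial_sum_count`, `distributed_mean`) are hypothetical names chosen for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, num_partitions):
    """Split rows into num_partitions roughly equal slices."""
    return [rows[i::num_partitions] for i in range(num_partitions)]


def partial_sum_count(rows):
    """Work done independently on one partition: a partial sum and a count."""
    return sum(rows), len(rows)


def distributed_mean(rows, num_partitions=4):
    """Compute a mean by combining per-partition partial results."""
    parts = partition(rows, num_partitions)
    # Each worker thread plays the role of one machine (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(partial_sum_count, parts))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


values = list(range(1, 101))
print(distributed_mean(values))
```

The mean is computed from partial sums and counts rather than from per-partition means, because averaging the averages would give a wrong answer whenever partitions differ in size.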

What's included

6 videos, 3 readings, 1 assignment, 1 discussion prompt

In this module, you will be able to explain the core concepts of Spark. We'll discuss common ways to increase query performance by caching data and modifying Spark configurations. We'll also review the Spark UI to analyze performance and identify bottlenecks, as well as optimize queries with Adaptive Query Execution.

What's included

6 videos, 1 reading, 1 assignment

In this module, you will identify and discuss the general demands of data applications. You'll review data in a variety of formats and compare the tradeoffs between them. You will explore semi-structured JSON data (common in big data environments) as well as schemas and parallel data writes. You will also walk through an end-to-end pipeline that reads data, transforms it, and saves the result.
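The read, transform, and save stages of such a pipeline can be sketched in a few lines. The example below uses only the Python standard library rather than Spark SQL, so it shows the shape of the pipeline, not how the course implements it. The sample records, the `SCHEMA` mapping, and the function names are all hypothetical and exist only for this illustration.

```python
import csv
import io
import json

# Hypothetical semi-structured input: one JSON object per line,
# with a nested field and an occasionally missing value.
RAW_JSON_LINES = """\
{"id": 1, "user": {"name": "ana", "country": "BR"}, "amount": "12.50"}
{"id": 2, "user": {"name": "li", "country": "CN"}, "amount": null}
{"id": 3, "user": {"name": "sam", "country": "US"}, "amount": "7.25"}
"""

# Explicit schema: output column -> (path into the record, type to cast to).
SCHEMA = {
    "id": (("id",), int),
    "country": (("user", "country"), str),
    "amount": (("amount",), float),
}


def read(text):
    """Read stage: parse each non-empty line as a JSON object."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def transform(records):
    """Transform stage: flatten nested fields, cast types, drop incomplete rows."""
    rows = []
    for record in records:
        row = {}
        for column, (path, cast) in SCHEMA.items():
            value = record
            for key in path:
                value = value.get(key) if isinstance(value, dict) else None
            row[column] = None if value is None else cast(value)
        if all(v is not None for v in row.values()):
            rows.append(row)
    return rows


def save(rows, sink):
    """Save stage: write the rows as CSV to any file-like object."""
    writer = csv.DictWriter(sink, fieldnames=list(SCHEMA))
    writer.writeheader()
    writer.writerows(rows)


sink = io.StringIO()
cleaned = transform(read(RAW_JSON_LINES))
save(cleaned, sink)
print(sink.getvalue())
```

Declaring the schema up front, instead of inferring it from the data, makes the casts and the handling of missing values explicit. The record with a null `amount` is dropped in the transform stage, so only two rows reach the sink.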

What's included

7 videos, 1 reading, 1 assignment

In this module, you will identify the key characteristics of data lakes, data warehouses, and lakehouses. Lakehouses combine the scalability and low-cost storage of data lakes with the speed and ACID transactional guarantees of data warehouses. You will review a production-grade lakehouse built with Spark and the open-source project Delta Lake. Whoever said time travel isn't possible hasn't been to a lakehouse!
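The "time travel" mentioned above refers to reading a table as it existed at an earlier version. The toy model below, written in plain Python, sketches one way that idea can work: every commit is appended to a log, and any version can be rebuilt by replaying the log up to that point. It is not Delta Lake's implementation or API, and the `VersionedTable` class and its methods are hypothetical names for this illustration. In Delta Lake itself, earlier versions are queried through SQL syntax such as `VERSION AS OF`.

```python
class VersionedTable:
    """Toy table that records every commit so earlier versions stay readable."""

    def __init__(self):
        # Append-only log: entry i holds the changes committed as version i.
        self._log = []

    def commit(self, upserts=None, deletes=None):
        """Record one batch of changes and return its version number."""
        self._log.append((dict(upserts or {}), set(deletes or ())))
        return len(self._log) - 1

    def read(self, version=None):
        """Rebuild the table as of `version` (default: latest) by replaying the log."""
        latest = len(self._log) - 1
        if version is None:
            version = latest
        elif not 0 <= version <= latest:
            raise ValueError(f"version {version} does not exist")
        snapshot = {}
        for upserts, deletes in self._log[: version + 1]:
            snapshot.update(upserts)
            for key in deletes:
                snapshot.pop(key, None)
        return snapshot


table = VersionedTable()
v0 = table.commit(upserts={"a": 10, "b": 20})
v1 = table.commit(upserts={"c": 30}, deletes={"a"})

print(table.read())            # latest version
print(table.read(version=v0))  # "time travel" back to the first commit
```

Because the log is append-only, a later commit never overwrites the information needed to rebuild an earlier version.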

What's included

8 videos, 1 reading, 1 assignment, 1 discussion prompt

Instructors

Instructor ratings
4.6 (151 ratings)
Brooke Wenig
University of California, Davis
1 course, 49,194 learners

Offered by University of California, Davis

Learner reviews

4.4 average rating (687 reviews)

  • 5 stars: 64.38%

  • 4 stars: 23.11%

  • 3 stars: 6.39%

  • 2 stars: 2.32%

  • 1 star: 3.77%

Showing 3 of 687

  • OD: 5 stars, reviewed on Mar 25, 2020

  • CG: 4 stars, reviewed on May 30, 2022

  • SK: 5 stars, reviewed on Jun 12, 2022
