This course is all about big data. It’s for students with SQL experience that want to take the next step on their data journey by learning distributed computing using Apache Spark. Students will gain a thorough understanding of this open-source standard for working with large datasets. Students will gain an understanding of the fundamentals of data analysis using SQL on Spark, setting the foundation for how to combine data with advanced analytics at scale and in production environments. The four modules build on one another and by the end of the course you will understand: the Spark architecture, queries within Spark, common ways to optimize Spark SQL, and how to build reliable data pipelines.
Offrez à votre carrière le cadeau de Coursera Plus avec $160 de réduction, facturé annuellement. Économisez aujourd’hui.
Distributed Computing with Spark SQL
Ce cours fait partie de Spécialisation Learn SQL Basics for Data Science
Instructeurs : Brooke Wenig
49 194 déjà inscrits
Inclus avec
(687 avis)
Ce que vous apprendrez
Use the collaborative Databricks workspace to write scalable Spark SQL code that executes against a cluster of machines
Inspect the Spark UI to analyze query performance and identify bottlenecks
Create an end-to-end pipeline that reads data, transforms it, and saves the result
Build a medallion (bronze, silver, gold) lakehouse architecture with Delta Lake to ensure the reliability, scalability, and performance of your data
Compétences que vous acquerrez
- Catégorie : Data Science
- Catégorie : SQL
- Catégorie : Apache Spark
- Catégorie : Delta Lake
Détails à connaître
Ajouter à votre profil LinkedIn
4 devoirs
Découvrez comment les employés des entreprises prestigieuses maîtrisent des compétences recherchées
Élaborez votre expertise du sujet
- Apprenez de nouveaux concepts auprès d'experts du secteur
- Acquérez une compréhension de base d'un sujet ou d'un outil
- Développez des compétences professionnelles avec des projets pratiques
- Obtenez un certificat professionnel partageable
Obtenez un certificat professionnel
Ajoutez cette qualification à votre profil LinkedIn ou à votre CV
Partagez-le sur les réseaux sociaux et dans votre évaluation de performance
Il y a 4 modules dans ce cours
In this module, you will be able to discuss the core concepts of distributed computing and be able to recognize when and where to apply them. You'll be able to identify the basic data structure of Apache Spark™, known as a DataFrame. Additionally, you'll review the collaborative Databricks workspace.
Inclus
6 vidéos3 lectures1 devoir1 sujet de discussion
In this module, you will be able to explain the core concepts of Spark. We'll discuss common ways to increase query performance by caching data and modifying Spark configurations. We'll also review the Spark UI to analyze performance and identify bottlenecks, as well as optimize queries with Adaptive Query Execution.
Inclus
6 vidéos1 lecture1 devoir
In this module, you will be able to identify and discuss the general demands of data applications. You'll be able to review data in a variety of formats and compare and contrast the tradeoffs between these formats. You will explore and examine semi-structured JSON data (common in big data environments) as well as schemas and parallel data writes. You will be able to understand an end-to-end pipeline that reads data, transforms it, and how it saves the result.
Inclus
7 vidéos1 lecture1 devoir
In this module, you will identify the key characteristics of data lakes, data warehouses, and lakehouses. Lakehouses combine the scalability and low-cost storage of data lakes with the speed and ACID transactional guarantees of data warehouses. You will review a production grade lakehouse combined with Spark in an open-source project, Delta Lake. Whoever said time travel isn't possible hasn't been to a lakehouse!
Inclus
8 vidéos1 lecture1 devoir1 sujet de discussion
Instructeurs
Offert par
Recommandé si vous êtes intéressé(e) par Data Analysis
Coursera Project Network
Coursera Project Network
University of California, Davis
Pour quelles raisons les étudiants sur Coursera nous choisissent-ils pour leur carrière ?
Avis des étudiants
Affichage de 3 sur 687
687 avis
- 5 stars
64,38 %
- 4 stars
23,11 %
- 3 stars
6,39 %
- 2 stars
2,32 %
- 1 star
3,77 %
Ouvrez de nouvelles portes avec Coursera Plus
Accès illimité à plus de 7 000 cours de renommée internationale, à des projets pratiques et à des programmes de certificats reconnus sur le marché du travail, tous inclus dans votre abonnement
Faites progresser votre carrière avec un diplôme en ligne
Obtenez un diplôme auprès d’universités de renommée mondiale - 100 % en ligne
Rejoignez plus de 3 400 entreprises mondiales qui ont choisi Coursera pour les affaires
Améliorez les compétences de vos employés pour exceller dans l’économie numérique
Foire Aux Questions
Access to lectures and assignments depends on your type of enrollment. If you take a course in audit mode, you will be able to see most course materials for free. To access graded assignments and to earn a Certificate, you will need to purchase the Certificate experience, during or after your audit. If you don't see the audit option:
The course may not offer an audit option. You can try a Free Trial instead, or apply for Financial Aid.
The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.
When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.
If you subscribed, you get a 7-day free trial during which you can cancel at no penalty. After that, we don’t give refunds, but you can cancel your subscription at any time. See our full refund policy.