ETL and Data Pipelines with Shell, Airflow and Kafka


Rating: 4.5 (363 reviews)

This course is part of multiple programs.

Instructors: Jeff Grossman

49,605 already enrolled

Included with Coursera Plus

5 modules

Gain insight into a topic and learn the fundamentals.

Level: Intermediate (recommended experience)

Flexible schedule: approx. 17 hours, learn at your own pace

87% of learners liked this course


What you'll learn

Describe and contrast Extract, Transform, Load (ETL) processes and Extract, Load, Transform (ELT) processes.
Explain batch versus concurrent modes of execution.
Implement an ETL workflow using Bash and Python functions.
Describe data pipeline components, processes, tools, and technologies.
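As a minimal sketch of an ETL workflow built from Python functions, the example below splits the work into separate extract, transform, and load steps using only the standard library. The CSV content, column names, and cleaning rules are hypothetical, and the "target file" is an in-memory string so the example stays self-contained.

```python
import csv
import io

# Hypothetical raw export with messy whitespace, casing, and a missing value.
RAW_CSV = """id,name,amount
1, alice ,10.50
2,BOB,7
3, carol,
"""


def extract(text: str) -> list[dict[str, str]]:
    """Extract: parse CSV text into one dictionary per row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict[str, str]]) -> list[dict[str, object]]:
    """Transform: tidy names, cast amounts to float, drop rows without an amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records instead of loading bad data
        cleaned.append({
            "id": int(row["id"]),
            "name": row["name"].strip().title(),
            "amount": float(amount),
        })
    return cleaned


def load(rows: list[dict[str, object]]) -> str:
    """Load: write the cleaned rows as CSV (a string stands in for the target file)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["id", "name", "amount"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(load(transform(extract(RAW_CSV))))
```

Keeping each stage a separate function makes it easy to test stages in isolation or to swap the load target later (for example, a real file or a database) without touching extraction or transformation.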

Skills you'll gain

Extract Transform and Load (ETL)
Data Engineer
Apache Kafka
Apache Airflow
Data Pipelines

Details to know

Shareable certificate: add to your LinkedIn profile

Assessments: 11 assignments

Taught in English


Build subject-matter expertise

This course is available as part of multiple programs. When you enroll in this course, you'll also be asked to select a specific program.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate

Earn a career certificate

Add this credential to your LinkedIn profile or résumé.

Share it on social media and in your performance review.

There are 5 modules in this course

Delve into the two approaches to converting raw data into analytics-ready data. One approach is the Extract, Transform, Load (ETL) process; the contrasting approach is the Extract, Load, Transform (ELT) process. ETL processes apply to data warehouses and data marts. ELT processes apply to data lakes, where the data is transformed on demand by the requesting application.
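The practical difference is where the transformation runs. The sketch below uses Python's built-in sqlite3 module as a stand-in for both a warehouse and a data lake (an assumption for illustration only). The ETL path converts hypothetical Fahrenheit readings before loading; the ELT path loads the raw readings untouched and leaves the conversion to a SQL query issued by the consuming application.

```python
import sqlite3

# Hypothetical source records: (city, temperature in Fahrenheit).
RAW = [("berlin", 68.0), ("oslo", 50.0), ("cairo", 95.0)]


def etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in the pipeline, then load only analytics-ready rows."""
    shaped = [(city.title(), round((f - 32) * 5 / 9, 1)) for city, f in RAW]
    conn.execute("CREATE TABLE warehouse_temps (city TEXT, celsius REAL)")
    conn.executemany("INSERT INTO warehouse_temps VALUES (?, ?)", shaped)
    conn.commit()


def elt(conn: sqlite3.Connection) -> None:
    """ELT: load the raw rows untouched; transformation is deferred."""
    conn.execute("CREATE TABLE lake_raw (city TEXT, fahrenheit REAL)")
    conn.executemany("INSERT INTO lake_raw VALUES (?, ?)", RAW)
    conn.commit()


def elt_on_demand(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    """The requesting application transforms raw data at query time, in SQL."""
    return conn.execute(
        "SELECT upper(substr(city, 1, 1)) || substr(city, 2), "
        "round((fahrenheit - 32) * 5.0 / 9, 1) "
        "FROM lake_raw ORDER BY city"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    etl(conn)
    elt(conn)
    print(conn.execute("SELECT * FROM warehouse_temps ORDER BY city").fetchall())
    print(elt_on_demand(conn))
```

Both paths produce the same rows here. The difference is that the ELT path still holds the raw Fahrenheit values, so a different consumer could apply a different transformation later without re-extracting from the source.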

In this course, you will learn about the tools and techniques used with ETL and data pipelines. Both ETL and ELT extract data from source systems, move the data through the data pipeline, and store the data in destination systems. During this course, you will see how ELT and ETL processing differ and identify use cases for both. You will identify methods and tools for extracting data, for merging extracted data either logically or physically, and for loading data into data repositories. You will also define transformations that make source data credible, contextual, and accessible to data users. You will be able to outline several methods for loading data into the destination system, verifying data quality, monitoring load failures, and using recovery mechanisms when a load fails. By the end of this course, you will know how to use Apache Airflow to build data pipelines and understand the advantages of this approach. You will also learn how to use Apache Kafka to build streaming pipelines, along with Kafka's core components: brokers, topics, partitions, replication, producers, and consumers. Finally, you will complete a shareable final project that demonstrates the skills you acquired in each module.
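One common way to combine data-quality verification with a recovery mechanism is to load each batch inside a single database transaction: if a quality check or a constraint fails, the whole batch is rolled back and the target is left unchanged, ready for a retry. The sketch below assumes a hypothetical `orders` table and a made-up quality rule (amounts must be present and non-negative), with sqlite3 as the target store.

```python
import sqlite3


def load_batch(conn: sqlite3.Connection, rows: list[tuple]) -> bool:
    """Load (order_id, amount) rows all-or-nothing; return False if rejected."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            for order_id, amount in rows:
                # Hypothetical quality rule: amount must be present and >= 0.
                if amount is None or amount < 0:
                    raise ValueError(f"order {order_id}: invalid amount {amount!r}")
                conn.execute(
                    "INSERT INTO orders (order_id, amount) VALUES (?, ?)",
                    (order_id, amount),
                )
    except (ValueError, sqlite3.IntegrityError) as exc:
        # Monitoring hook: a real pipeline would send this to logs or alerts.
        print(f"load failed, batch rolled back: {exc}")
        return False
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL NOT NULL)")
    print(load_batch(conn, [(1, 9.5), (2, 3.0)]))   # True
    print(load_batch(conn, [(3, 1.0), (4, -2.0)]))  # False: bad amount
    print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 2
```

Row 3 in the second batch is valid on its own but is still rolled back, because a partially loaded batch is harder to recover from than a rejected one.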

Data Processing Techniques

ETL, or Extract, Transform, and Load, processes are used for cases where flexibility, speed, and scalability of data are important. You will explore key differences between the similar ETL and ELT processes, including where the transformation takes place, flexibility, Big Data support, and time-to-insight. You will learn that increasing demand for access to raw data drives the evolution from ETL to ELT. Data extraction relies on techniques ranging from database querying to web scraping and APIs. You will also learn that data transformation is about formatting data to suit the application, and that data is either loaded in batches or streamed continuously.
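"Formatting data to suit the application" often means normalizing values that arrive in different shapes from different sources. Here is a small sketch, assuming a hypothetical target application that wants ISO 8601 dates, title-cased names, and money stored as integer cents. The list of accepted date layouts is also an assumption.

```python
from datetime import datetime

# Hypothetical date layouts used by different upstream systems.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%m/%d/%Y")


def to_iso_date(value: str) -> str:
    """Return the date in ISO 8601 form (YYYY-MM-DD), trying each known layout."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # not this layout; try the next one
    raise ValueError(f"unrecognized date: {value!r}")


def transform_record(record: dict) -> dict:
    """Reshape one raw record into the format a hypothetical report expects."""
    return {
        "customer": record["customer"].strip().title(),
        "order_date": to_iso_date(record["order_date"]),
        "total_cents": round(float(record["total"]) * 100),
    }


if __name__ == "__main__":
    raw = {"customer": "  ada lovelace", "order_date": "01.02.2024", "total": "19.99"}
    print(transform_record(raw))
```

The order of `DATE_FORMATS` matters for ambiguous inputs: a value like `01/02/2024` is read as US month/day because that is the only slash layout listed. Rounding to integer cents sidesteps floating-point drift (19.99 * 100 is not exactly 1999.0 in binary floating point).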

What's included

7 videos, 2 readings, 2 assignments, 1 plug-in

7 videos (32 minutes total)

Course Intro video (5 minutes)
ETL Fundamentals (5 minutes)
ELT Basics (4 minutes)
Comparing ETL and ELT (4 minutes)
Data Extraction Techniques (4 minutes)
Introduction to Data Transformation Techniques (4 minutes)
Data Loading Techniques (3 minutes)

2 readings (7 minutes total)

Course Introduction (4 minutes)
Summary & Highlights (3 minutes)

2 assignments (40 minutes total)

ETL and ELT Processes (10 minutes)
Graded Quiz: ETL and ELT Processes (30 minutes)

1 plug-in (5 minutes total)

Interactivity: Tell the Difference between ETL and ELT (5 minutes)

ETL & Data Pipelines: Tools and Techniques

Extract, transform, and load (ETL) pipelines can be created with Bash scripts that run on a schedule using cron. Data pipelines move data from one place, or form, to another. Data pipeline processes include scheduling or triggering, monitoring, maintenance, and optimization. Batch pipelines extract and operate on batches of data, whereas streaming data pipelines ingest data packets one by one in rapid succession. In this module, you will learn that streaming pipelines apply when the most current data is needed. You will explore how parallelization and I/O buffers help mitigate bottlenecks. You will also learn how to describe data pipeline performance in terms of latency and throughput.
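To make the latency and throughput trade-off concrete, here is a toy simulation under an assumed cost model: every run pays a fixed setup overhead plus a per-record cost, the batch pipeline processes everything at one scheduled time, and the streaming pipeline processes each record as it arrives. The arrival times, costs, and overheads are made-up numbers, not measurements, and the function names are hypothetical.

```python
from statistics import mean

# Toy cost model (assumed): each run pays a fixed overhead, plus a cost per record.


def batch_run(arrivals, run_at, cost, overhead):
    """Batch: records wait for one scheduled run (run_at >= every arrival).

    Overhead is paid once per run. Returns (per-record latencies, busy seconds).
    """
    done = run_at + overhead
    latencies = []
    for arrived in arrivals:
        done += cost
        latencies.append(done - arrived)
    return latencies, overhead + cost * len(arrivals)


def streaming_run(arrivals, cost, overhead):
    """Streaming: each record is processed as soon as the worker is free.

    Overhead is paid per record. Returns (per-record latencies, busy seconds).
    """
    free_at, latencies = 0.0, []
    for arrived in arrivals:
        start = max(arrived, free_at)  # queue behind the previous record if needed
        free_at = start + overhead + cost
        latencies.append(free_at - arrived)
    return latencies, (overhead + cost) * len(arrivals)


def throughput(records, busy_seconds):
    """Records processed per second of processing time."""
    return records / busy_seconds


if __name__ == "__main__":
    arrivals = [0.0, 10.0, 20.0, 30.0]
    b_lat, b_busy = batch_run(arrivals, run_at=60.0, cost=1.0, overhead=2.0)
    s_lat, s_busy = streaming_run(arrivals, cost=1.0, overhead=2.0)
    print("batch:     mean latency", mean(b_lat), "throughput", throughput(4, b_busy))
    print("streaming: mean latency", mean(s_lat), "throughput", throughput(4, s_busy))
```

With these numbers, the mean latency is 49.5 s for batch versus 3.0 s for streaming, while batch throughput (4 records in 6 busy seconds) is twice that of streaming (4 records in 12 busy seconds), because batch amortizes the overhead across the whole run.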

What's included

5 videos, 4 readings, 4 assignments, 1 app item, 1 plug-in

5 videos (25 minutes total)

ETL Using Shell Scripting (4 minutes)
Introduction to Data Pipelines (4 minutes)
Key Data Pipeline Processes (4 minutes)
Batch versus Streaming Data Pipeline Use Cases (4 minutes)
Data Pipeline Tools and Technologies (6 minutes)

4 readings (15 minutes total)

Linux Commands and Shell Scripting (2 minutes)
ETL Techniques (10 minutes)
Summary & Highlights (1 minute)
Summary & Highlights (2 minutes)

4 assignments (80 minutes total)

Practice Quiz: ETL using Shell Scripts (10 minutes)
Practice Quiz: An Introduction to Data Pipelines (10 minutes)
Graded Quiz: ETL using Shell Scripts (30 minutes)
Graded Quiz: An Introduction to Data Pipelines (30 minutes)

1 app item (30 minutes total)

Hands-On Lab: ETL using Shell Scripts (30 minutes)

1 plug-in (10 minutes total)

Interactivity: Differentiate between Batch Processing and Stream Processing (10 minutes)

Building Data Pipelines using Airflow

The key advantage of Apache Airflow's approach to representing data pipelines as DAGs (directed acyclic graphs) is that they are expressed as code, which makes your data pipelines more maintainable, testable, and collaborative. Tasks, the nodes in a DAG, are created by instantiating Airflow's built-in operators. In this module, you will learn about Apache Airflow's rich UI, which simplifies working with data pipelines. You will explore how to visualize your DAG in graph or tree mode. You will also learn about the key components of a DAG definition file, and you will learn that Airflow logs are saved to the local file system and can then be sent to cloud storage, search engines, and log analyzers.
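In an Airflow DAG definition file, dependencies between operator tasks are typically declared with the `>>` operator, for example `extract >> [clean, enrich] >> load`. The sketch below does not use Airflow at all; it uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to illustrate the underlying idea of "pipeline as code": a dependency graph that can be versioned, tested, and checked for cycles. The task names and the `run_dag` / `is_valid_dag` helpers are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task name maps to the set of upstream tasks it
# depends on. "clean" and "enrich" both need "extract"; "load" needs both.
PIPELINE = {
    "extract": set(),
    "clean": {"extract"},
    "enrich": {"extract"},
    "load": {"clean", "enrich"},
}


def run_dag(dag, run_task):
    """Run each task only after all of its upstream tasks have finished."""
    order = []
    for task in TopologicalSorter(dag).static_order():
        run_task(task)  # stand-in for executing an operator
        order.append(task)
    return order


def is_valid_dag(dag):
    """A pipeline must be acyclic; a cycle would make it impossible to schedule."""
    try:
        list(TopologicalSorter(dag).static_order())
    except CycleError:
        return False
    return True


if __name__ == "__main__":
    print(run_dag(PIPELINE, lambda task: print(f"running {task}")))
```

Because the graph is plain data, a unit test can assert properties such as "load always runs last" or "the definition has no cycles" before the pipeline is ever deployed, which is the testability benefit the paragraph above describes.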

What's included

5 videos, 1 reading, 2 assignments, 4 app items, 1 plug-in

5 videos (25 minutes total)

Apache Airflow Overview (6 minutes)
Advantages of Representing Data Pipelines as DAGs in Apache Airflow (6 minutes)
Apache Airflow UI (3 minutes)
Build a DAG Using Airflow (4 minutes)
Airflow Logging and Monitoring (4 minutes)

1 reading (3 minutes total)

Summary & Highlights (3 minutes)

2 assignments (40 minutes total)

Practice Quiz: Building Data Pipelines using Airflow (10 minutes)
Graded Quiz: Building Data Pipelines using Airflow (30 minutes)

4 app items (120 minutes total)

Hands-on Lab: Getting Started with Apache Airflow (20 minutes)
Hands-on Lab: Create a DAG for Apache Airflow with PythonOperator (40 minutes)
Hands-on Lab: Create a DAG for Apache Airflow with BashOperator (40 minutes)
Hands-on Lab: Monitoring a DAG (20 minutes)

1 plug-in (15 minutes total)

Reading: DAG Structure and Operators (15 minutes)

Building Streaming Pipelines using Kafka

Apache Kafka is a very popular open source event streaming platform. An event is a type of data that describes an entity's observable state updates over time. Popular Kafka service providers include Confluent Cloud, IBM Event Streams, and Amazon MSK. Additionally, the Kafka Streams API is a client library that helps you process data in event streaming pipelines. In this module, you will learn that the core components of Kafka are brokers, topics, partitions, replication, producers, and consumers. You will explore two special types of processors in a Kafka Streams API stream-processing topology: the source processor and the sink processor. You will also learn how to build event streaming pipelines using Kafka.
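The relationship between topics, partitions, producers, consumers, and offsets can be sketched with a toy in-memory model. This is not the Kafka client API, and it ignores brokers and replication entirely. The class names are hypothetical, and the partition choice uses `zlib.crc32` purely for illustration (Kafka's default Java partitioner hashes keys with murmur2). What the sketch does show is why records with the same key land in the same partition and keep their order, and how per-partition offsets let a consumer resume where it left off.

```python
import zlib
from collections import defaultdict


class ToyTopic:
    """A topic modeled as one append-only log (list) per partition."""

    def __init__(self, name: str, partitions: int):
        self.name = name
        self.logs: list[list[tuple[str, str]]] = [[] for _ in range(partitions)]

    def produce(self, key: str, value: str) -> tuple[int, int]:
        """Append a record and return (partition, offset), like a producer ack."""
        # Assumed hash for illustration; the same key always maps to the same partition.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.logs)
        self.logs[partition].append((key, value))
        return partition, len(self.logs[partition]) - 1


class ToyConsumer:
    """A consumer that remembers the next offset to read in each partition."""

    def __init__(self, topic: ToyTopic):
        self.topic = topic
        self.offsets: dict[int, int] = defaultdict(int)

    def poll(self) -> list[tuple[int, int, str, str]]:
        """Return every unread record as (partition, offset, key, value)."""
        records = []
        for partition, log in enumerate(self.topic.logs):
            for offset in range(self.offsets[partition], len(log)):
                key, value = log[offset]
                records.append((partition, offset, key, value))
            self.offsets[partition] = len(log)  # "commit" the new position
        return records


if __name__ == "__main__":
    topic = ToyTopic("events", partitions=3)
    for i in range(3):
        print(topic.produce("sensor-a", f"reading-{i}"))
    topic.produce("sensor-b", "reading-0")
    consumer = ToyConsumer(topic)
    print(consumer.poll())
    print(consumer.poll())  # nothing new, so an empty list
```

Ordering is only guaranteed within a partition, which is why choosing a message key (here, the sensor name) matters when downstream logic depends on per-entity order.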

What's included

4 videos, 1 reading, 2 assignments, 3 app items, 1 plug-in

4 videos (26 minutes total)

Distributed Event Streaming Platform Components (5 minutes)
Apache Kafka Overview (6 minutes)
Building Event Streaming Pipelines using Kafka (9 minutes)
Kafka Streaming Process (5 minutes)

1 reading

Summary & Highlights (0 minutes)

2 assignments (40 minutes total)

Practice Quiz: Building Streaming Pipelines using Kafka (10 minutes)
Graded Quiz: Building Streaming Pipelines using Kafka (30 minutes)

3 app items (90 minutes total)

Hands-on Lab: Working with Streaming Data using Kafka (20 minutes)
[Optional] Hands-on Lab: Kafka Message Keys and Offset (40 minutes)
[Optional] Hands-on Lab: Kafka Python Client (30 minutes)

1 plug-in (30 minutes total)

Kafka Python Client (30 minutes)

Final Assignment

In this final assignment module, you will apply your newly gained knowledge in two hands-on labs: “Creating ETL Data Pipelines using Apache Airflow” and “Creating Streaming Data Pipelines using Kafka”. You will build these pipelines using real-world scenarios. You will extract, transform, and load data into a CSV file. You will also create a topic named “toll” in Apache Kafka, download and customize a streaming data consumer, and verify that the streaming data has been collected in the database table.
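The final check in the streaming lab, confirming that consumed events actually reached a database table, can be sketched without Kafka. Below, a hypothetical list of JSON-encoded messages stands in for what a consumer would receive from the topic, sqlite3 stands in for the database, and the table name, field names, and helper functions are invented for illustration. Malformed messages are counted and skipped rather than stopping the pipeline.

```python
import json
import sqlite3

# Hypothetical messages as raw bytes, the way a consumer receives them.
MESSAGES = [
    b'{"ts": "2024-01-01T08:00:00", "plaza": "P1", "vehicle": "car"}',
    b'{"ts": "2024-01-01T08:00:05", "plaza": "P2", "vehicle": "truck"}',
    b"not json",
    b'{"ts": "2024-01-01T08:00:09", "plaza": "P1", "vehicle": "van"}',
]


def consume_into_db(messages, conn: sqlite3.Connection) -> int:
    """Decode, transform, and insert each message; return how many were skipped."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS toll_events (ts TEXT, plaza TEXT, vehicle TEXT)"
    )
    skipped = 0
    for raw in messages:
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            skipped += 1  # malformed message: count it and keep consuming
            continue
        with conn:  # commit each insert so rows become visible right away
            conn.execute(
                "INSERT INTO toll_events VALUES (?, ?, ?)",
                (event["ts"], event["plaza"], event["vehicle"].upper()),
            )
    return skipped


def verify(conn: sqlite3.Connection) -> int:
    """Verification step: count the rows that actually landed in the table."""
    return conn.execute("SELECT COUNT(*) FROM toll_events").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print("skipped:", consume_into_db(MESSAGES, conn))  # skipped: 1
    print("rows in table:", verify(conn))               # rows in table: 3
```

Comparing the row count against the number of well-formed messages is a simple end-to-end check; a production consumer would also record which offsets it has processed so it can resume safely after a restart.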

What's included

4 readings, 1 assignment, 1 peer review, 3 app items

4 readings (24 minutes total)

Project Overview (10 minutes)
Graded Timed Final Exam Instructions (10 minutes)
Congrats & Next Steps (2 minutes)
Thanks from the Course Team (2 minutes)

1 assignment (90 minutes total)

Timed Final Quiz (90 minutes)

1 peer review (60 minutes total)

Peer Review: Project Submission and Peer Review (60 minutes)

3 app items (225 minutes total)

Hands-on Lab: Build ETL Data Pipelines with BashOperator using Apache Airflow (90 minutes)
[Optional] Hands-on Lab: Build an ETL Pipeline using PythonOperator with Apache Airflow (90 minutes)
[Optional] Hands-on Lab: Build a Streaming ETL Pipeline using Kafka (45 minutes)

Instructors

Instructor ratings: 4.7 (100 ratings)

Jeff Grossman
IBM
2 courses, 62,199 learners

Yan Luo
IBM
7 courses, 316,242 learners

Offered by IBM


Learner reviews

4.5 (363 reviews)

5 stars: 70.02%
4 stars: 17.16%
3 stars: 7.08%
2 stars: 2.99%
1 star: 2.72%


Frequently asked questions

When will I have access to the lectures and assignments?

Access to lectures and assignments depends on your type of enrollment. If you take a course in audit mode, you will be able to see most course materials for free. To access graded assignments and to earn a Certificate, you will need to purchase the Certificate experience, during or after your audit. If you don't see the audit option:

The course may not offer an audit option. You can try a Free Trial instead, or apply for Financial Aid.
The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.

What will I get if I subscribe to this Certificate?

When you enroll in the course, you get access to all of the courses in the Certificate, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.

What is the refund policy?

If you subscribed, you get a 7-day free trial during which you can cancel at no penalty. After that, we don’t give refunds, but you can cancel your subscription at any time. See our full refund policy.

More questions

Visit the learner help center.



