
Learner Reviews & Feedback for Source Systems, Data Ingestion, and Pipelines by DeepLearning.AI

4.7 stars (50 ratings)

About the Course

In this course, you will explore various types of source systems, learn how they generate and update data, and troubleshoot common issues you might encounter when trying to connect to these systems in the real world. You’ll dive into the details of common ingestion patterns and implement batch and streaming pipelines. You’ll automate and orchestrate your data pipelines using infrastructure as code and pipelines as code tools. You’ll also explore AWS and open source tools for monitoring your data systems and data quality....

Top reviews

IH

Nov 15, 2024

Excellent course, with up to date technology, interesting labs and challenging quizzes. Highly recommended.

TS

Nov 23, 2024

Really valuable, and I got an idea of data-related concepts and infrastructure management.


1 - 12 of 12 Reviews for Source Systems, Data Ingestion, and Pipelines

By Syd F • Nov 1, 2024

You will not write code in this course. You will perform the most rudimentary code replacements in functions (e.g., replace "None" with "table_name"; this isn't even a joke, see Week 1 Lab 2) and run Jupyter Notebook cells that are already written for you. Truly disappointed.

By Ronobir D • Dec 1, 2024

Great course from Joe and the team at DL. The interviews were great, and the content itself is really good, especially paired with the readings from the textbook. The interviews and the resources are fantastic and greatly appreciated. The quizzes were helpful, if a bit easy, as a comprehensive review of the weekly material. I'd recommend that everyone watch the lectures, read each chapter assigned for that week, and look through the links both in the weekly resources section AND in the accompanying book chapter.
That said, the biggest weakness would be the labs. The lab material is, at face value, actually very realistic, if we were given unlimited time to work on each component, that is. Basically, each lab is a self-contained, spun-up environment on AWS with 99% of the Python code provided, and you just need to fill in the blanks. Realistically doing the content of each lab on AWS would take beginners multiple hours at minimum, if not tens of hours, and I don't know how they could set AWS instances to run that long in a sandbox. So I understand why the limitation is there. If you look at Google, their BI Coursera certificate just has you doing the "labs" with your own credits or free trial, and those are pretty weak; Google Cloud Skills Boost is also just a timed environment. I think this course strikes the best balance between those two, but all of them are ineffectual in my opinion.
Ideally, you would just stumble through the lab, do the readings provided in the lab, and then afterwards look at material like the API docs on your own time outside of the lab. There's no way to absorb everything in the lab itself. They do, however, provide great worked (and well-functioning) examples, including best practices, so when you do EVENTUALLY learn data engineering and the AWS stack well enough and look back, you'll have some great examples, probably better than most stuff you can google, of how to deploy good end-to-end pipelines.
That's no small feat; this isn't some garbage code thrown together by who knows who. These are multicomponent projects, infra and pipeline included, that follow best practices, whether it's logging, data quality checks, or embracing multiple features of each technology, and I can promise you most beginner tutorials do not cover anywhere close to that. And over the course of 4 courses, it is incredibly comprehensive. Joe doesn't shy away from things like Data Quality, Data Governance, or Security, which most "isolated" technology tutorials barely give a passing mention. This is the real deal.
The ideal way to do this would be to give a worked-example walkthrough lab, which these courses already provide. THEN, for the assignment, all other parts of the pipeline and infra would be done besides what you're learning, and you would have to implement that part yourself, with no Python provided. Or nothing would be provided besides the source dataset, and you would have to set everything up for each assignment. It would take MUCH longer to do that, but that's exactly what you'll be doing on the job. I'd probably recommend this entire course series to those who are either completely new to data engineering or are reviewing before applying to a junior DE role. Probably not in between.
For example, the section on Airflow? A fantastic summary with great examples in lecture, and Joe distilled the content of Airflow down well. Honestly, Joe's lectures were better than a lot of the lectures I found from Astronomer's Airflow Academy and YouTube: clear and concise, with code explanations and good summary writeups. That being said, the final lab is a lightning round that captures a majority of Airflow's features. To give a comparison, if you were to work through the learning modules on Astronomer's Academy, it would cover about 20 modules of material!!! That's probably 5 hours of video lecture material and then at least 5 hours, if not more, of practicing and reading the documents on your own Airflow environment.
That's 10 HOURS (not including the time spent debugging Airflow, which for beginners is multiple hours) covered by a handful of lectures, summary notes, and 99% worked-through labs. You can see what I mean: great if you've never heard of orchestration in your life, or if you've already learned Airflow and need a review. I know because I have done just that. I was already familiar with Airflow before looking at the last week's material. Joe did a fantastic job teaching Airflow, in my opinion, between the lectures and the readings. That said, even with that combined with the AWS MWAA deployment, there is no way you'd be able to do the lab if everything wasn't already provided for you. To reiterate, Joe is a fantastic teacher and the content is great. The rigour, though, is definitely missing.
I would personally work through this certificate either at the very beginning of your data engineering journey, to get a very broad introduction combined with the book and all the many resources, or at the very end, once you've already learned about modern databases, Docker, Terraform, at least one Data Quality framework (Soda, GX, etc.), Airflow, Spark, dbt, and at least one of the Big 3 cloud stacks. It provides a great review, a great set of best practices, and a good practical introduction to AWS. But let's say you're in the middle of learning: you know SQL, Python, and pandas, and you've learned dbt, but you haven't hit Spark or Airflow yet. Then it will be incredibly overwhelming to go through this material, as it's going to be both a review of things you know and an intro to material you don't, without going into the depth you need. So I recommend using it as either an intro or a review, not both.

By Younes A • Nov 17, 2024

The course theory is good, but it focuses on general concepts. There is a big jump from the material to the labs, where you are given ready-made Jupyter files with missing variable names that you fill in to complete the exercise. This approach does not stimulate learning. It's too passive.

By Francisco Z L • Oct 24, 2024

Amazing course, very complete. I learned and tuned many new techniques and good practices to improve my data engineering activities. Really liked the content, and the Labs will take you at least 15 hours to complete, but you will understand how to apply this new knowledge. Learned a few new things about Terraform, and Airflow will be a very valuable tool for my future projects.

By Iain N H • Nov 15, 2024

Excellent course, with up to date technology, interesting labs and challenging quizzes. Highly recommended.

By Tharuka V S • Nov 24, 2024

Really valuable, and I got an idea of data-related concepts and infrastructure management.

By rashid a • Nov 20, 2024

All concepts related to these topics are explained clearly.

By Yosef A W • Oct 30, 2024

Great balance between theory and practice.

By Pavel N • Nov 16, 2024

Outstanding course. Labs are great!

By Juan C P A • Nov 29, 2024

Excellent course. Thanks!

By Jose A • Dec 1, 2024

Good

By Héctor M • Nov 28, 2024

These labs are SO frustrating. There is a TON of copying and pasting, and if you make a mistake, you won't realize it until the very end, so good luck identifying where the issue is. If time runs out, you have to start over again. I've already lost 2 hours on the last lab. I'm getting tired of this course, TBH.