
Learner Reviews & Feedback for Introduction to Big Data with Spark and Hadoop by IBM

4.4 stars, 406 ratings

About the Course

This self-paced IBM course will teach you all about big data! You will become familiar with the characteristics of big data and its application in big data analytics. You will also gain hands-on experience with big data processing tools like Apache Hadoop and Apache Spark. Bernard Marr defines big data as the digital trace that we are generating in this digital era. You will start the course by understanding what big data is and exploring how insights from big data can be harnessed for a variety of use cases. You’ll also explore how big data uses technologies like parallel processing, scaling, and data parallelism. Next, you will learn about Hadoop, an open-source framework that allows for the distributed processing of large data sets, and its ecosystem. You will discover important applications that go hand in hand with Hadoop, like the Hadoop Distributed File System (HDFS), MapReduce, and HBase. You will become familiar with Hive, data warehouse software that provides an SQL-like interface to efficiently query and manipulate large data sets. You’ll then gain insights into Apache Spark, an open-source processing engine that provides users with new ways to store and use big data. In this course, you will discover how to leverage Spark to deliver reliable insights. The course provides an overview of the platform, going into the components that make up Apache Spark. You’ll learn about DataFrames, perform basic DataFrame operations, and work with SparkSQL. You’ll explore how Spark processes and monitors the requests your application submits and how you can track work using the Spark Application UI. This course has several hands-on labs to help you apply and practice the concepts you learn. You will complete Hadoop and Spark labs using various tools and technologies, including Docker, Kubernetes, Python, and Jupyter Notebooks....

Top reviews

KK

Jan 30, 2024

This is a well-packaged course that lets you create big data applications. You can download the hands-on practice exercises as PDF files, follow them, and adapt them to your own application.

TT

Jan 17, 2025

I have learned a lot from this course, and hopefully it will help me throughout my career. It is a very well-designed course; I like the teaching style and the structured modules.


51 - 75 of 88 Reviews for Introduction to Big Data with Spark and Hadoop

By Rubens T

Jul 12, 2022

Excellent!

By Juan C

Dec 20, 2023

Excellent

By Tùng N P

Feb 27, 2024

Great!

By Sadhana G M

Oct 14, 2023

USEFUL

By Le V T K

Feb 22, 2024

good

By PREM C N

Nov 7, 2023

good

By Nguyen V T F

Jul 24, 2023

Nice

By 321910304004 g

Apr 4, 2023

good

By BARIGE S

Mar 24, 2023

good

By Sumit K

Oct 15, 2022

wow

By chukka A

Jan 25, 2022

good

By Shazib S

Sep 9, 2024

Excellent. But I would have liked a more detailed walkthrough (not just a reading) on how to set things up on my own computer and practice the labs there. Also, too many of the slides were just bullet points and could have been more visual, so that architectural themes are easily remembered. For example, the monitoring module was just textual, and its diagrams could have been more specific, especially by linking more closely with the Spark UI rather than showing a few images after many bullets, vis-a-vis the nicely explained job/stage/task hierarchy and how it relates to client mode setup.

By Rafael B

Oct 25, 2024

This is a great course in which I was able to reinforce fundamental concepts in both Hadoop and Spark. If it were not so completely tied to and dependent on IBM's cloud platform, it would be a perfect course. I would like the first part of the course to teach how to set up a localhost platform, and then move on to the advantages of setting everything up quickly on IBM's very good platform.

By Stefan U

Oct 23, 2024

+ great IDE for hands-on labs
+ good intelligibility of computer-generated voice reading the texts
- slightly boring data sets
- some factual errors in the quizzes (e.g. radiobuttons instead of checkboxes for questions with multiple correct answers)

By Merouane E

Aug 23, 2022

I liked the content about Spark; it was well organised and demonstrated with hands-on labs, but the same was not true for Hadoop (HDFS, HBase, etc.). Plus, the long videos with a robotic voice make it hard to concentrate. In general, it was a good course.

By Michal B

Jun 12, 2023

Synth voice narration quality is truly annoying. I'd expect better from IBM. Course materials are quite superficial, which I guess is acceptable for an introductory course.

By Durr-e- S

Jan 11, 2025

I found the course to be a great foundation for understanding how to work with large datasets using Hadoop and Spark, with clear explanations and practical examples.

By SunjuYi

Nov 12, 2022

This is really helpful for me to understand Big Data and Apache Spark!

By jijo s

May 2, 2022

The hands-on labs and quizzes at the end of each session were very helpful.

By Purvesh V

Aug 10, 2023

It's a very good introduction with hands-on labs.

By prahal m

Apr 14, 2023

Should include more practical knowledge.

By dhananjay k

Nov 11, 2022

Good course

By fredy a

Jul 27, 2023

GOOD

By Gorana B

Dec 7, 2024

There are three major concerns I have with this course:

1) Content Depth and Structure: The course content feels overly basic, even for an introductory level. The lab exercises are too simplistic and fail to provide meaningful hands-on experience. There is no technical final assessment; the concluding quiz is entirely theoretical. Questions about IBM products or statistics like "the projected growth of data" seem irrelevant and out of place.

2) Lack of Conceptual Clarity and Practical Application: Key concepts like shuffling, grouping, and filtering are not explained or demonstrated in sufficient depth. Including practical examples to showcase their impact would significantly enhance understanding. The course neglects to explain the execution plan in Spark, particularly how it operates and its implications for application performance. This is a critical topic that deserves proper attention. The explanation of differences between RDDs and DataFrames is confusing, even for someone with basic knowledge. Similarly, the coverage of Spark SQL and functions lacks clarity and structure. A more straightforward approach (e.g., showing three ways to accomplish a task, comparing them, and contextualizing their usage in real-world scenarios) would be far more effective. The inclusion of Pandas is unexpected. While it’s noted that Spark RDDs/DataFrames can be created from Pandas DataFrames, there was too much emphasis on it, and at the same stage of the course foundational functions like read.csv are not even mentioned. This omission contributes to a sense of disorganization and a lack of a coherent teaching strategy.

3) Presentation and Delivery: The video materials are AI-generated, and this detracts from the learning experience. Personally, I found the videos too short, with unnecessary and repetitive intro/outro segments that quickly became irritating. This format undermines engagement and suggests a lack of thoughtful design. A human instructor narrating the content could provide a more engaging and dynamic learning experience. A human presenter might also recognize and address the lack of substance in the explanations, resulting in clearer and more effective teaching.

Overall, the course feels like a collection of loosely connected topics rather than a carefully designed curriculum. A greater focus on depth, practical application, and a more personalized delivery would significantly improve the learning experience.

By Manoj K C

Jun 9, 2024

Good theoretical knowledge on big data, Hadoop, and Apache Spark. However, the labs were of very limited scope and didn't provide full hands-on exposure to the tools. Concepts were really difficult to grasp, and I had to go through the course content a few times to clear the final assignment. I was not very impressed with the articulation of the concepts discussed in this course, so 3 stars from me.