Big data is the area of informatics concerned with datasets too large for typical database and software tools to capture, store, analyze, and manage. This course provides a rapid immersion into big data and the technologies that have recently emerged to manage it.
What you'll learn
Understand and identify use cases and domains of Big Data problems
Select and implement technical solutions involving Big Data systems
Develop and use various open source software systems (Apache) in the Big Data tech stack
Operate and run various cloud computing software services (AWS) in the Big Data infrastructure space
Skills you'll gain
- Data Lakes
- Cloud Computing
- Stream Processing
- NoSQL
- Big Data
Details to know
October 2024
54 assignments
There are 9 modules in this course
Welcome to Big Data Technologies! In Module 1, students will develop a foundational understanding of analytic data, its inherent value, and the methods to transform raw data into valuable insights. This module covers the challenges of handling large datasets, including their collection, processing, and analysis, while providing a comprehensive overview of Big Data's origins, properties, and real-world applications. Additionally, students will explore the economic, logistical, and ethical concerns associated with Big Data, alongside the professional advantages for data scientists proficient in Big Data analysis.
Included
16 videos, 10 readings, 8 assignments, 1 discussion prompt
Module 2 introduces students to the challenges of building and managing distributed systems for big data storage and processing. It covers Hadoop’s origins, concepts, core components, and key characteristics, while exploring the Hadoop ecosystem's tools and services. Students will gain an understanding of distributed file systems, specifically HDFS, YARN's resource management, and various technologies for effective big data storage and organization.
Included
13 videos, 7 readings, 6 assignments
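To make the distributed file system idea from Module 2 concrete, the following is a minimal sketch of how a file is split into blocks and replicated. It assumes a 128 MB block size and a replication factor of 3, which are common HDFS defaults but are configurable per cluster; the function name `hdfs_footprint` is hypothetical and not part of any Hadoop API.

```python
import math

BLOCK_SIZE_MB = 128      # assumption: a common HDFS default block size
REPLICATION_FACTOR = 3   # assumption: a common HDFS default replication factor


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB,
                   replication=REPLICATION_FACTOR):
    """Return (blocks, block replicas, raw storage in MB) for one file."""
    # The file is cut into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replication` times on different nodes.
    replicas = blocks * replication
    # A partial last block only occupies its actual size, so raw storage
    # is the file size multiplied by the replication factor.
    raw_storage_mb = file_size_mb * replication
    return blocks, replicas, raw_storage_mb


blocks, replicas, raw_mb = hdfs_footprint(1000)
print(blocks, replicas, raw_mb)  # 8 24 3000
```

A 1000 MB file therefore occupies 8 blocks, stored as 24 block replicas across the cluster.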
In Module 3, students will explore the differences between processing small to moderate versus massive data volumes through distributed computing. This module covers the key concepts of the MapReduce framework, including how it breaks down large data processing tasks into smaller, parallel tasks for efficient execution. Students will also learn about the phases of MapReduce, the role of map and reduce functions, optimization patterns, and the benefits and limitations of various development approaches, including Java-based MapReduce and Hadoop Streaming.
Included
18 videos, 8 readings, 7 assignments
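The map, shuffle, and reduce phases described in Module 3 can be sketched in plain Python with a word count. This is a single-process illustration only, not Hadoop code: a real job runs many map and reduce tasks in parallel across a cluster, and the function names here are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line could be mapped independently, which is what makes the
    # map phase easy to parallelize.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big ideas", "data lakes"])
print(counts)  # {'big': 2, 'data': 2, 'ideas': 1, 'lakes': 1}
```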
In Module 4, students will explore Apache Spark as a powerful distributed processing framework for interactive, batch, and streaming tasks. This module covers Spark's core functionalities, including machine learning, graph processing, and handling structured and unstructured data, while highlighting its in-memory processing potential and unified nature. Students will compare Spark with MapReduce, learn about Spark's primary components, execution architecture, Resilient Distributed Datasets (RDDs), DataFrames, Datasets, and the various methods for creating and optimizing DataFrames for efficient data processing.
Included
25 videos, 7 readings, 6 assignments
In Module 5, students will delve deeper into Spark's capabilities for data manipulation and transformation. The module covers essential operations such as selecting, filtering, and sorting data, as well as joining DataFrames and performing aggregations. Students will also learn about handling null values, using Spark SQL for data queries, and optimizing performance with caching. Practical applications include creating and manipulating DataFrames, executing transformations and actions, and efficiently writing data to various formats.
Included
19 videos, 11 readings, 10 assignments
Module 6 introduces students to the limitations of batch processing and the significance of real-time data processing. It covers essential aspects of stream processing, including data ingestion and analysis, with a focus on tools like Apache Kafka for stream ingestion and Spark Structured Streaming for scalable and fault-tolerant data processing. Students will also explore various design patterns for organizing big data clusters, the concept of data lakes, and the Lambda Architecture for unifying real-time and batch data processing in modern data environments.
Included
16 videos, 6 readings, 6 assignments
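One basic building block of the stream processing covered in Module 6 is windowed aggregation: events are grouped into fixed time windows as they arrive, rather than analyzed in one large batch. The sketch below shows a tumbling window count in plain Python. It does not use Kafka or Spark Structured Streaming; the 60-second window, the event layout, and the function name are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 60  # assumption: hypothetical window length


def tumbling_window_counts(events, window_seconds=WINDOW_SECONDS):
    """Count events per key in fixed, non-overlapping time windows.

    `events` is an iterable of (timestamp_seconds, key) pairs.
    Returns a dict mapping each window start time to a Counter of keys.
    """
    windows = defaultdict(Counter)
    for timestamp, key in events:
        # Each event belongs to exactly one window, found by rounding
        # its timestamp down to a multiple of the window length.
        window_start = (timestamp // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return dict(windows)


events = [(5, "click"), (42, "view"), (61, "click"), (119, "click"), (120, "view")]
result = tumbling_window_counts(events)
for start in sorted(result):
    print(start, dict(result[start]))
```

A production stream processor also has to handle late and out-of-order events, which this sketch ignores.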
In Module 7, students will explore the benefits and limitations of relational databases in big data contexts and the concept of distributed database systems. This module covers NoSQL databases, their diverse data models, and their scalability and flexibility advantages. Students will also learn about real-world use cases, data partitioning, consistency models, and the CAP Theorem, gaining a comprehensive understanding of how NoSQL databases manage large datasets across clusters while ensuring scalability and availability.
Included
18 videos, 6 readings, 6 assignments
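The data partitioning mentioned in Module 7 can be illustrated with a simple hash-based scheme: a key is hashed to pick a node, and copies are placed on the following nodes. This is a simplified sketch under stated assumptions. It uses modulo hashing, whereas production systems often use consistent hashing to limit data movement when nodes join or leave; the node names and function names are hypothetical.

```python
import hashlib


def partition_for(key, num_partitions):
    """Map a key to a partition index using a stable hash."""
    # hashlib gives the same result in every process, unlike the built-in
    # hash(), which is randomized per interpreter run for strings.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def replica_nodes(key, nodes, replication=2):
    """Pick the primary node for a key plus the next nodes as replicas."""
    start = partition_for(key, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(replication)]


nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster
for key in ["user:1", "user:2", "user:3"]:
    print(key, replica_nodes(key, nodes))
```

Because placement depends only on the key and the node list, any client can locate a record without consulting a central index.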
In Module 8, students will explore specific NoSQL database types, namely key-value, wide-column, and document databases. Two similar systems, HBase and Cassandra, will be studied and contrasted in the context of the CAP theorem and the associated CP/AP trade-offs. Consistency and availability will be discussed in the context of specific usage scenarios for both HBase and Cassandra, and general application domains of both systems will be highlighted. Finally, the document database MongoDB will be reviewed in the context of natural language and text processing use cases, and MongoDB's usage and architecture will be analyzed in comparison with a traditional RDBMS.
Included
9 videos, 4 readings, 4 assignments
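The consistency versus availability trade-off from Module 8 is often explained with a quorum rule for replicated stores: with N replicas, a read that contacts R replicas is guaranteed to overlap the latest acknowledged write to W replicas when R + W > N. The sketch below only checks that arithmetic. The function names are hypothetical, and it does not model any particular database's behavior.

```python
def quorum_size(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1


def reads_see_latest_write(n, r, w):
    """True when every read set must overlap every write set (R + W > N)."""
    return r + w > n


N = 3  # assumption: replication factor of 3
q = quorum_size(N)
print(q)                                 # 2
print(reads_see_latest_write(N, q, q))   # True: quorum reads and writes overlap
print(reads_see_latest_write(N, 1, 1))   # False: faster, but reads may be stale
```

Lower values of R and W favor availability and latency; higher values favor consistency.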
This module contains the summative course assessment that has been designed to evaluate your understanding of the course material and assess your ability to apply the knowledge you have acquired throughout the course.
Included
1 assignment
Build toward a degree
This course is part of the following degree program(s) offered by Illinois Tech. If you are admitted and enroll, the courses you have completed can count toward your degree, and your progress can transfer with you.
Frequently Asked Questions
Access to lectures and assignments depends on your type of enrollment. If you take a course in audit mode, you will be able to see most course materials for free. To access graded assignments and to earn a Certificate, you will need to purchase the Certificate experience, during or after your audit. If you don't see the audit option:
The course may not offer an audit option. You can try a Free Trial instead, or apply for Financial Aid.
The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.
When you purchase a Certificate you get access to all course materials, including graded assignments. Upon completing the course, your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.
You will be eligible for a full refund until two weeks after your payment date, or (for courses that have just launched) until two weeks after the first session of the course begins, whichever is later. You cannot receive a refund once you’ve earned a Course Certificate, even if you complete the course within the two-week refund period. See our full refund policy.