Ecosystem for Hadoop

Target Audience
Technical personnel with a background in Linux, SQL, and programming who intend to join a Hadoop engineering team in roles such as Hadoop developer, data architect, or data engineer, or in roles related to technical project management, cluster operations, or data analysis

Prerequisite
None

Expected Duration
95 minutes

Description
Hadoop’s HDFS is a highly fault-tolerant distributed file system and, like Hadoop in general, is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is suitable for applications that have large data sets. This course examines the Hadoop ecosystem by demonstrating the commonly used open-source software components. You’ll explore a Big Data model to understand how these tools combine to create a supercomputing platform. You’ll also learn how the principles of supercomputing apply to Hadoop and how this yields an affordable supercomputing environment. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.

Objectives

A Map for Big Data

  • start the course
  • describe supercomputing
  • recall three major functions of data analytics

Key Terminology for Big Data

  • define Big Data
  • describe the two different types of data

Ecosystem for Hadoop

  • describe the components of the Big Data stack
  • identify the data repository components
  • identify the data refinery components
  • identify the data factory components

Theory for Hadoop

  • recall the design principles of Hadoop
  • describe the design principles of sharing nothing
  • describe the design principles of embracing failure

Data Repository for Hadoop

  • describe the components of the Hadoop Distributed File System (HDFS)
  • describe the four main HDFS daemons

Data Refinery for Hadoop

  • describe Hadoop YARN
  • describe the roles of the ResourceManager daemon
  • describe the YARN NodeManager and ApplicationMaster daemons
  • define MapReduce and describe its relationship to YARN

Data Analytics

  • describe data analytics

Hadoop Ecosystem Complexities

  • describe the reasons for the complexities of the Hadoop Ecosystem

Practice: Ecosystem for Hadoop

  • describe the components of the Hadoop ecosystem
