Hadoop Cluster Availability

Audience
Developers interested in expanding their knowledge of Hadoop from the operations perspective

Prerequisite
None

Expected Duration
168 minutes

Description
When examining Hadoop availability, it’s important not to focus solely on the NameNode. The tendency to do so is understandable, since the NameNode is the single point of failure for HDFS and many components in the ecosystem rely on HDFS, but Hadoop availability is a broader issue. This course examines availability and failure recovery for the NameNode, DataNode, HDFS, and YARN. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.

Objective

Availability of Hadoop

  • start the course
  • describe how Hadoop leverages fault tolerance
  • recall the most common causes for NameNode failure
  • recall the uses for the Checkpoint node
  • test the availability for the NameNode
  • describe the operation of the NameNode during a recovery
  • swap to a new NameNode
  • recall the most common causes for DataNode failure
  • test the availability for the DataNode
  • describe the operation of the DataNode during a recovery
  • set up the DataNode for replication

High Availability for HDFS

  • identify and recover from a missing data block scenario
  • describe the functions of Hadoop high availability
  • edit the Hadoop configuration files for high availability
  • set up a high availability solution for NameNode
  • recall the requirements for enabling an automated failover for the NameNode
  • create an automated failover for the NameNode

YARN Containers

  • recall the most common causes for YARN task failure
  • describe the functions of YARN containers
  • test YARN container reliability

YARN Jobs

  • recall the most common causes of YARN job failure
  • test application reliability

High Availability for YARN

  • describe the system view of the Resource Manager configurations set for high availability
  • set up high availability for the Resource Manager

Practice: Managing Availability

  • move the Resource Manager HA to alternate master servers
