Designing Hadoop Clusters

Target Audience
Developers interested in expanding their knowledge of Hadoop from the operations perspective

Prerequisite
None

Expected Duration
133 minutes

Description
Hadoop is an Apache Software Foundation project and an open-source software platform for scalable, distributed computing. Hadoop can provide fast and reliable analysis of both structured and unstructured data. In this course, you will learn about the design principles, the cluster architecture, considerations for servers and operating systems, and how to plan for a deployment. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.

Objective

Big Data Engineering

  • start the course
  • describe the principles of supercomputing
  • recall the roles and skills needed for the Hadoop engineering team
  • recall the advantages and shortcomings of using Hadoop as a supercomputing platform

Principles of Hadoop Clusters

  • describe the three axioms of supercomputing
  • describe the "dumb hardware, smart software" and "share nothing" design principles
  • describe the "move processing, not data", "embrace failure", and "build applications, not infrastructure" design principles

Architecture of a Hadoop Cluster

  • describe the different rack architectures for Hadoop
  • describe the best practices for scaling a Hadoop cluster

Network for the Hadoop Cluster

  • recall the best practices for different types of network clusters

Hardware for the Hadoop Cluster

  • recall the primary responsibilities for the master, data, and edge servers
  • recall some of the recommendations for a master server and edge server
  • recall some of the recommendations for a data server

Operating Systems for the Hadoop Cluster

  • recall some of the recommendations for an operating system
  • recall some of the recommendations for hostnames and DNS entries

Storage for the Hadoop Cluster

  • describe the recommendations for HDD
  • calculate the correct number of disks required for a storage solution
  • compare the use of commodity hardware with enterprise disks

Deployment of an Admin Server

  • plan for the deployment of a Hadoop cluster
  • set up flash drives as boot media
  • set up a kickstart file as boot media
  • set up a network installer

Practice: Design a Hadoop Cluster

  • identify the hardware and networking recommendations for a Hadoop cluster
