Data Refinery with YARN and MapReduce

Target Audience
Technical personnel with a background in Linux, SQL, and programming who intend to join a Hadoop engineering team in roles such as Hadoop developer, data architect, or data engineer, or in roles related to technical project management, cluster operations, or data analysis

Prerequisite
None

Expected Duration
97 minutes

Description
The core of Hadoop consists of a storage part, HDFS, and a processing part, MapReduce. Hadoop splits files into large blocks and distributes the blocks among the nodes in the cluster. To process the data, Hadoop ships the job's code to the nodes that hold the required blocks, and those nodes then process the data in parallel. This approach takes advantage of data locality: moving computation to the data lets a distributed cluster process it faster and more efficiently than a more conventional supercomputer architecture, which relies on a parallel file system where computation and data are connected via high-speed networking. In this course, you’ll learn about the theory of YARN as a parallel processing framework for Hadoop. You’ll also learn about the theory of MapReduce as the backbone of parallel processing jobs. Finally, this course demonstrates MapReduce in action by explaining the pertinent classes and then walking through a MapReduce program step by step. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.

Objective

Theory for YARN

  • start the course
  • describe parallel processing in the context of supercomputing
  • list the components of YARN and identify their primary functions
  • diagram YARN Resource Manager and identify its key components
  • diagram YARN Node Manager and identify its key components
  • diagram YARN ApplicationMaster and identify its key components
  • describe the operations of YARN
  • identify the standard configuration parameters to be changed for YARN

Theory for Key-value Pairs

  • define the principal concepts of key-value pairs and list the rules for key-value pairs
  • describe how MapReduce transforms key-value pairs
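
As a rough illustration of how MapReduce transforms key-value pairs, the sketch below models the usual shape of the transformation: a map function turns each input pair (k1, v1) into zero or more intermediate pairs (k2, v2), the values are grouped by k2, and a reduce function turns each (k2, list of v2) into output pairs. It is plain Python rather than Hadoop code, and the `year,temperature` records and the names `map_fn`, `reduce_fn`, and `run` are assumptions made up for this example.

```python
from collections import defaultdict

def map_fn(offset, line):
    """(k1, v1) -> (k2, v2): the key is a byte offset, the value a 'year,temp' line."""
    year, temp = line.split(",")
    yield year, int(temp)

def reduce_fn(year, temps):
    """(k2, [v2, ...]) -> (k3, v3): keep the highest temperature per year."""
    yield year, max(temps)

def run(records):
    # Group every intermediate value under its key, as the framework would.
    groups = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in map_fn(k1, v1):
            groups[k2].append(v2)
    # Reduce each key once, visiting keys in sorted order.
    output = []
    for k2 in sorted(groups):
        output.extend(reduce_fn(k2, groups[k2]))
    return output

# Hypothetical input: (byte offset, line) pairs, as a text input format would supply.
records = [(0, "1949,111"), (9, "1950,0"), (16, "1950,22"), (24, "1949,78")]
result = run(records)
print(result)
```

Note that the input key (the offset) is ignored by this map function, which is common: the keys that matter to the job are the intermediate ones it emits.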

Operations for MapReduce

  • load a large textbook and then run WordCount to count the number of words in the textbook
  • label all of the functions for MapReduce on a diagram
  • match the phases of MapReduce to their definitions
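
The phases that WordCount passes through can be sketched in a few lines of plain Python. This is only a single-process model of the map, shuffle/sort, and reduce phases, not the Hadoop implementation; the function names and the sample sentences are assumptions chosen for the illustration.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Map: emit (word, 1) for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def shuffle_sort(pairs):
    """Shuffle/sort: order pairs by key and collect the values of each key."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]

def reduce_phase(grouped):
    """Reduce: sum the list of counts collected for each word."""
    for word, counts in grouped:
        yield word, sum(counts)

def word_count(lines):
    return dict(reduce_phase(shuffle_sort(map_phase(lines))))

# Hypothetical stand-in for the text of a large book.
text = ["the quick brown fox", "the lazy dog", "the fox"]
counts = word_count(text)
print(counts)
```

On a cluster, the map and reduce functions run as many parallel tasks and the shuffle moves data between nodes; the data flow between the phases is the same as in this model.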

First Program for MapReduce

  • set up the classpath and test WordCount
  • build a JAR file and run WordCount

APIs for MapReduce

  • describe the base Mapper class of the MapReduce Java API and describe how to override its methods
  • describe the base Reducer class of the MapReduce Java API and describe how to override its methods
  • describe the function of the MapReduceDriver Java class
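
To give a feel for what overriding the base class methods means, here is a loose Python stand-in for the lifecycle of the Java base Mapper class, whose `run` method calls `setup` once, `map` once per input pair, and `cleanup` once, and whose default `map` passes each pair through unchanged. The Python classes, the use of a plain list as the context, and `UpperCaseMapper` are assumptions for illustration only; real jobs subclass the Java class and write to its context object.

```python
class Mapper:
    """Simplified model of the base Mapper lifecycle (not the Hadoop API)."""

    def setup(self, context):
        """Called once before any input is processed; does nothing by default."""

    def map(self, key, value, context):
        """Called once per input pair; the default is an identity mapping."""
        context.append((key, value))

    def cleanup(self, context):
        """Called once after all input is processed; does nothing by default."""

    def run(self, records, context):
        """Drive the lifecycle: setup, then map for each pair, then cleanup."""
        self.setup(context)
        for key, value in records:
            self.map(key, value, context)
        self.cleanup(context)


class UpperCaseMapper(Mapper):
    """Hypothetical subclass that overrides only map()."""

    def map(self, key, value, context):
        context.append((key, value.upper()))


output = []  # stands in for the job's output collector
UpperCaseMapper().run([(0, "hello"), (6, "world")], output)
print(output)
```

A reducer follows the same pattern, with a `reduce` method that receives a key and all of the values grouped under it.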

Second Program for MapReduce

  • set up the classpath and test a MapReduce job

Streaming for MapReduce

  • identify the concept of streaming for MapReduce
  • stream a Python job
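
With streaming, the mapper and reducer are ordinary programs that read lines from standard input and write tab-separated key and value lines to standard output, and the framework sorts the mapper output by key before it reaches the reducer. The sketch below shows what a Python word-count job of that kind might look like. The single-script layout and the `map` and `reduce` command-line arguments are assumptions for this example, not a prescribed structure.

```python
import sys
from itertools import groupby

def mapper(lines):
    """Emit 'word<TAB>1' for every word on every input line."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(lines):
    """Sum the counts per word; the input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"

# Hypothetical command-line convention: run the script with 'map' or 'reduce'.
if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    stage = mapper if sys.argv[1] == "map" else reducer
    for record in stage(sys.stdin):
        print(record)
```

The job can be tried locally without a cluster by piping a text file through the map step, `sort`, and the reduce step, which mimics the sort that the framework performs between the two stages.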

Practice: Using YARN and MapReduce

  • understand YARN features and components, as well as MapReduce and its classes
