Apache Hadoop

This path is designed for developers, database developers, managers, and anyone interested in learning the basics of Hadoop or of cloud computing in general.


Expected Duration
119 minutes

Apache Hadoop is an open-source software framework for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. This course introduces the basic concepts of Hadoop, cloud computing, and Big Data, along with the development tools involved.


Fundamentals of Hadoop

  • describe the basics of Hadoop
  • identify major users of Hadoop, their end-user applications, and the results
  • identify the characteristics of Big Data
  • compare and contrast the traditional data sources and Big Data sources
  • describe the clustering and distributed computing concepts of Hadoop
  • specify how low-cost commodity servers are used in Big Data and configured as nodes in small- and large-scale Hadoop installations

Installing Hadoop

  • describe Hadoop installation requirements
  • troubleshoot Hadoop installation issues
  • configure Hadoop installation
  • identify the features of third-party Hadoop distributions
  • describe the creation and evolution of Hadoop and its related projects
  • describe the use of YARN in Hadoop cluster management

File Storage/Tools

  • describe the components and functions of Hadoop
  • compare and contrast the different types of Hadoop data
  • describe the four types of NoSQL databases used in the cloud
  • describe the basics of the Hadoop Distributed File System
  • describe HDFS and basic HDFS navigation operations
  • perform file operations such as add and delete within HDFS

Introduction to MapReduce

  • describe the basic principles of MapReduce and general mapping issues
  • specify the use of Pig and Hive in Hadoop MapReduce jobs
  • describe the use of MapReduce, including the MapReduce lifecycle, job client, JobTracker, TaskTracker, map tasks, and reduce tasks
  • describe how Hadoop MapReduce handles and processes data, and the vocabulary of the MapReduce dataflow process
  • describe the process of mapping and reducing
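
As a rough illustration of the mapping and reducing process named in the objectives above, the following is a minimal single-machine sketch in plain Python. It does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for this example. It assumes the classic word-count task: the map step emits (word, 1) pairs, a shuffle step groups the pairs by key, and the reduce step sums each group.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster"]))
```

In a real cluster the map and reduce steps would run as separate tasks on different nodes and the grouping would be handled by the framework; here everything runs in one process purely to show the dataflow.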

Practice: Introduction to Apache Hadoop

  • describe the basic principles and uses of Hadoop




