Introduction to Hadoop

This course is intended for individuals who are new to big data, Hadoop, and data modeling and who want to understand the key concepts and features of Hadoop and its tools.


Expected Duration
88 minutes

Hadoop is an open-source, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. This course introduces Hadoop, its key tools, and their applications.


Introduction to Hadoop

  • start the course
  • recognize what Big Data is, its sources and types, its evolution and characteristics, and its use cases
  • identify Big Data infrastructure issues and explain the benefits of Hadoop
  • recognize the basics of Hadoop, including its history, milestones, and core components
  • set up a virtual machine
  • install Linux on a virtual machine

UNIX and Java Modeling

  • recognize the basic and most useful UNIX commands

Hadoop Data Internals and Interactions

  • identify Hadoop components
  • define HDFS components
  • recognize how to read and write data in HDFS
  • use HDFS

MapReduce and YARN

  • recognize the basics of YARN
  • define the basics of MapReduce
  • identify how MapReduce processes information
  • use code that runs on Hadoop

Ecosystem and Data Type Handling

  • define Pig, Hive, and HBase
  • define Sqoop, Flume, Mahout, and Oozie
  • recognize how to store and model data in Hadoop
  • identify available commercial distributions for Hadoop
  • recognize Spark and its benefits over traditional MapReduce

Practice: Filtering in Hadoop

  • filter information in Hadoop




