Data Repository with HDFS and HBase

Target Audience
Technical personnel with a background in Linux, SQL, and programming who intend to join a Hadoop engineering team in roles such as Hadoop developer, data architect, or data engineer, or in roles related to technical project management, cluster operations, or data analysis


Expected Duration
127 minutes

Hadoop is an open-source Java framework for processing and querying vast amounts of data on large clusters of commodity hardware. It relies on an active community of contributors from all over the world for its success. In this course, you’ll explore the server architecture for Hadoop and learn about the functions and configuration of the daemons that make up the Hadoop Distributed File System (HDFS). You’ll also learn about the HDFS command line interface and the common HDFS administration issues that end users face. Finally, you’ll explore the theory of HBase as another data repository built alongside or on top of HDFS, along with basic HBase commands. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.


Theory of HDFS

  • start the course
  • configure the replication of data blocks
  • configure the default file system scheme and authority
  • describe the functions of the NameNode
  • recall how the NameNode operates
  • recall how the DataNode maintains data integrity
  • describe the purpose of the Checkpoint Node
  • describe the role of the Backup Node
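The replication and default file system objectives above both come down to Hadoop's XML configuration files. The sketch below, written in Python only for illustration, renders properties in the <configuration><property><name>/<value> layout used by Hadoop's *-site.xml files. The property names fs.defaultFS (core-site.xml) and dfs.replication (hdfs-site.xml) are standard Hadoop settings. The helper name hadoop_config_xml and the host namenode.example.com:8020 are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET


def hadoop_config_xml(properties: dict[str, str]) -> str:
    """Render name/value pairs in the <configuration><property> layout used by
    Hadoop's *-site.xml files (hypothetical helper, for illustration only)."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# core-site.xml: default file system scheme (hdfs) and authority (placeholder host:port)
core_site = hadoop_config_xml({"fs.defaultFS": "hdfs://namenode.example.com:8020"})

# hdfs-site.xml: default number of replicas kept for each data block
hdfs_site = hadoop_config_xml({"dfs.replication": "3"})

print(core_site)
print(hdfs_site)
```

With dfs.replication set to 3, every block of a file is stored on three DataNodes, so the file consumes roughly three times its size in raw cluster storage.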

Operations for HDFS

  • recall the syntax of the file system shell commands
  • use shell commands to manage files
  • use shell commands to provide information about the file system
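File system shell commands such as hdfs dfs -ls or hdfs dfs -put take path arguments in the URI form scheme://authority/path. The scheme and authority are optional. When they are omitted, the default file system from fs.defaultFS is used, and relative paths resolve against the user's home directory, /user/<username>. The simplified Python sketch below shows that resolution. The function qualify_path is a hypothetical helper, not part of Hadoop, and it ignores edge cases such as scheme-only URIs.

```python
import posixpath
from urllib.parse import urlparse


def qualify_path(path: str, default_fs: str, user: str) -> str:
    """Resolve a shell path argument to a fully qualified URI (simplified sketch)."""
    if urlparse(path).scheme:
        # Already of the form scheme://authority/path; use it unchanged.
        return path
    if not path.startswith("/"):
        # Relative paths are resolved against the home directory /user/<user>.
        path = posixpath.join("/user", user, path)
    # Prepend the default scheme and authority taken from fs.defaultFS.
    return default_fs.rstrip("/") + posixpath.normpath(path)


print(qualify_path("data.txt", "hdfs://namenode.example.com:8020", "alice"))
print(qualify_path("/tmp/a.txt", "hdfs://namenode.example.com:8020", "alice"))
print(qualify_path("file:///etc/hosts", "hdfs://namenode.example.com:8020", "alice"))
```

The same rules explain why `hdfs dfs -ls data.txt` and `hdfs dfs -ls hdfs://namenode.example.com:8020/user/alice/data.txt` list the same file for user alice, assuming that is the configured default file system.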

Troubleshooting of HDFS

  • perform common administration functions
  • configure parameters for NameNode and DataNode
  • troubleshoot HDFS errors

Theory of NoSQL and RDBMS

  • describe key attributes of NoSQL databases

Overview of HBase and ZooKeeper

  • describe the roles of HBase and ZooKeeper
  • install and configure ZooKeeper
  • install and configure HBase

Operations for HBase

  • use the HBase command line to create tables and insert data
  • manage tables and view the web interface
  • create and change HBase data
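HBase shell commands such as `create 'users', 'info'`, `put 'users', 'row1', 'info:name', 'Alice'`, and `get 'users', 'row1'` operate on HBase's logical data model. In that model, rows are identified by a row key, cells are addressed by family:qualifier, and each cell can keep several timestamped versions. The toy in-memory Python model below only illustrates that model; it is not the HBase client API. The class ToyHBaseTable and its methods are hypothetical. Real HBase deletes write tombstone markers rather than removing data immediately, and that behavior is simplified away here.

```python
from collections import defaultdict


class ToyHBaseTable:
    """Hypothetical in-memory model of HBase's logical data model."""

    def __init__(self, families: dict[str, int]):
        # Column family name -> maximum number of versions kept per cell
        self.families = families
        # row key -> "family:qualifier" -> [(timestamp, value), ...], newest first
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, row: str, column: str, value: str, ts: int) -> None:
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        versions = self.rows[row][column]
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)
        # Keep only the newest versions allowed by the column family
        del versions[self.families[family]:]

    def get(self, row: str, column: str):
        """Return the newest value of a cell, or None if it does not exist."""
        versions = self.rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def delete(self, row: str, column: str) -> None:
        """Remove all versions of a cell (real HBase writes tombstones instead)."""
        self.rows.get(row, {}).pop(column, None)


users = ToyHBaseTable({"info": 2})
users.put("row1", "info:name", "Alice", ts=1)
users.put("row1", "info:name", "Alicia", ts=2)
print(users.get("row1", "info:name"))
```

Keying the version limit by column family mirrors HBase, where the number of retained versions is a column family setting rather than a table-wide one.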

Practice: Hadoop Distributed File System

  • provide a basic understanding of how the Hadoop Distributed File System functions