Data Flow for the Hadoop Ecosystem

Technical personnel with a background in Linux, SQL, and programming who intend to join a Hadoop engineering team in roles such as Hadoop developer, data architect, or data engineer, or in roles related to technical project management, cluster operations, or data analysis

Prerequisite
None

Expected Duration
114 minutes

Description
Hadoop is a framework written in Java for running applications on large clusters of commodity hardware. It incorporates features similar to those of the Google File System (GFS) and the MapReduce computing paradigm. You’ll explore a demonstration that uses Sqoop and Hive with Hadoop to flow and fuse data. The demonstration includes preprocessing, partitioning, and joining data. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.

Objective

The World of Data

  • start the course
  • describe data life cycle management

Flowing Data with Sqoop

  • recall the parameters that must be set in the Sqoop import statement
  • create a table and load data into MySQL
  • use Sqoop to import data into Hive
  • recall the parameters that must be set in the Sqoop export statement
  • use Sqoop to export data from Hive
  • recall the three most common date datatypes and which systems support each
  • use casting to import datetime stamps into Hive
  • export datetime stamps from Hive into MySQL
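
As a rough illustration of the Sqoop import objectives above, the sketch below assembles a typical `sqoop import` command line for loading a MySQL table into Hive. The JDBC URL, user name, password file path, and table names are hypothetical placeholders, and the set of flags shown is one common combination rather than the exact statement used in the course.

```python
import shlex


def sqoop_import_args(jdbc_url, username, password_file, table,
                      hive_table, num_mappers=1):
    """Build the argument list for a `sqoop import` that lands in Hive.

    All values passed in are illustrative; adjust them to your cluster.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC connection string of the source database
        "--username", username,            # database user
        "--password-file", password_file,  # file holding the password
        "--table", table,                  # source table to import
        "--hive-import",                   # load the imported data into Hive
        "--hive-table", hive_table,        # destination Hive table
        "--num-mappers", str(num_mappers), # number of parallel map tasks
    ]


# Hypothetical values, for illustration only.
args = sqoop_import_args(
    jdbc_url="jdbc:mysql://localhost/sales",
    username="etl_user",
    password_file="/user/etl_user/.mysql.password",
    table="orders",
    hive_table="orders_raw",
)

# Print a shell-safe command line that could be reviewed before running it.
print(shlex.join(args))
```

Building the command as a list rather than a single string keeps each parameter visible and avoids quoting mistakes if the list is later passed to a process launcher.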

Flowing Data with Hive

  • describe dirty data and how it should be preprocessed
  • use Hive to create tables outside the warehouse
  • use Pig to sample data

Administration for the Ecosystem

  • recall some other popular components for the Hadoop Ecosystem
  • recall some best practices for pseudo-mode implementation
  • write custom scripts to assist with administrative tasks
  • troubleshoot classpath errors
  • create complex configuration files

Practice: Data Flow for Sqoop and Hive

  • use Sqoop and Hive for data flow and fusion in the Hadoop ecosystem
