Data Factory with Hive

Technical personnel with a background in Linux, SQL, and programming who intend to join a Hadoop engineering team in roles such as Hadoop developer, data architect, or data engineer, or in roles related to technical project management, cluster operations, or data analysis

Prerequisite
None

Expected Duration
125 minutes

Description
Apache Hadoop is an open-source framework for distributed storage and distributed processing of big data on clusters built from commodity hardware. All the modules in Hadoop are designed on the assumption that hardware failures are commonplace and should be handled automatically in software by the framework. In this course, you'll explore Hive as a SQL-like tool for interfacing with Hadoop. The course demonstrates the installation and configuration of Hive, followed by a demonstration of Hive in action. Finally, you'll learn about extracting and loading data between Hive and an RDBMS. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.

Objectives

The Purpose of Hive

  • start the course
  • recall the key attributes of Hive

Setup of Hive

  • describe the configuration files
  • install and configure Hive
  • create a table in Derby using Hive
  • create a table in MySQL using Hive

Details of Hive

  • recall the unique delimiter that Hive uses
  • describe the different operators in Hive
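
As a small illustration of the delimiter objective above: unless a table declares otherwise, Hive's default text format separates fields with the Ctrl-A character (byte value 1). The Python sketch below parses one such line. The sample record and the function name are made up for illustration and do not come from the course material.

```python
# Hive's default field delimiter for text tables is Ctrl-A (\x01).
HIVE_DEFAULT_FIELD_DELIMITER = "\x01"


def split_hive_row(line):
    """Split one line of a default-delimited Hive text file into fields."""
    return line.rstrip("\n").split(HIVE_DEFAULT_FIELD_DELIMITER)


# Hypothetical sample record: id, name, city
sample = "42\x01Ada\x01London\n"
print(split_hive_row(sample))
```

Because Ctrl-A rarely appears in ordinary text, values containing commas or tabs need no quoting in this format.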

Operations for Hive

  • use basic SQL commands in Hive
  • use SELECT statements in Hive
  • use more complex HiveQL
  • write and use Hive scripts

Joins and Views for Hive

  • recall what types of joins Hive can support
  • use Hive to perform joins

Partitions and Buckets for Hive

  • recall that a Hive partition schema must be created before loading the data
  • write a Hive partition script
  • recall how buckets are used to improve performance
  • create Hive buckets
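
The idea behind the bucketing objectives above can be sketched briefly. Bucketing assigns each row to one of a fixed number of buckets by hashing the bucketing column and taking the remainder modulo the bucket count, so rows with the same key always land in the same bucket. The Python sketch below is a simplification: it assumes integer keys and uses the key itself as a stand-in hash, which is not necessarily what Hive does for other column types. The function names and sample IDs are hypothetical.

```python
from collections import defaultdict


def bucket_for(key, num_buckets):
    """Return the bucket index for an integer key.

    Assumption: the key itself stands in for the hash value.
    """
    return key % num_buckets


def assign_buckets(keys, num_buckets):
    """Group keys by bucket index, mimicking how rows are spread over bucket files."""
    buckets = defaultdict(list)
    for key in keys:
        buckets[bucket_for(key, num_buckets)].append(key)
    return dict(buckets)


# Hypothetical user IDs spread over 4 buckets
user_ids = [1, 2, 3, 4, 5, 6, 7, 8]
print(assign_buckets(user_ids, 4))
```

Since equal keys always map to the same bucket, a join or sample on the bucketing column can read only the matching buckets instead of the whole table.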

User-defined Functions for Hive

  • recall some best practices for user-defined functions
  • create a user-defined function for Hive

Troubleshooting for Hive

  • recall the standard error code ranges and what they mean
  • use a Hive explain plan

Practice: Hive features, loading and querying

  • understand configuration options, data loading, and querying
