Data Repository with Flume

Target Audience
Technical personnel with a background in Linux, SQL, and programming who intend to join a Hadoop Engineering team in roles such as Hadoop developer, data architect, or data engineer, or in roles related to technical project management, cluster operations, or data analysis.

Prerequisite
None

Expected Duration
120 minutes

Description
Hadoop is an open-source software project that enables distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. Rather than relying on high-end hardware, the resiliency of these clusters comes from the software’s ability to detect and handle failures at the application layer. In this course, you’ll learn the theory behind Flume as a tool for extracting and loading unstructured data. You’ll explore a detailed explanation of Flume agents and see a demonstration of them in action. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.

Objective

The Purpose of Flume

  • start the course
  • describe the three key attributes of Flume
  • recall some of the protocols cURL supports
  • use cURL to download web server data

Setup of Flume

  • recall some best practices for the Agent Conf files
  • install and configure Flume

Operations for Flume

  • create a Flume agent
  • describe a Flume agent in detail
  • use a Flume agent to load data into HDFS

Sources, Sinks, and Channels

  • identify popular sources
  • identify popular sinks
  • describe Flume channels
  • describe what is happening during a file roll

Serializing Data with Avro

  • recall that Avro can be used as both a sink and a source
  • use Avro to capture a remote file
  • create multiple-hop Flume agents

Multiplex Agents for Flume

  • describe interceptors
  • create a Flume agent with a TimestampInterceptor
  • describe multifunction Flume agents
  • configure Flume agents for multiflow
  • create multi-source Flume agents
  • compare replicating to multiplexing
  • create a Flume agent for multiple data sinks

Troubleshooting of Flume

  • recall some common reasons for Flume failures
  • use the logger to troubleshoot Flume agents

Practice: Recognize Data Repository with Flume

  • configure the various Flume agents
