Data Repository with Sqoop

Target Audience
Technical personnel with a background in Linux, SQL, and programming who intend to join a Hadoop engineering team in roles such as Hadoop developer, data architect, or data engineer, or in roles related to technical project management, cluster operations, or data analysis


Expected Duration
86 minutes

Hadoop is an open-source software framework for storing and processing big data in a distributed fashion on large clusters of commodity hardware. Essentially, it accomplishes two tasks: massive data storage and faster processing. This course explains the theory of Sqoop as a tool for extracting structured data from an RDBMS and loading it into Hadoop. You'll see Sqoop imports and exports demonstrated against MySQL, HDFS, and HBase. This course can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.


Setup of MySQL

  • start the course
  • describe MySQL
  • install MySQL
  • create a database in MySQL
  • create MySQL tables and load data
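The MySQL setup steps above can be sketched as a short script. This is a hedged example: the database name (sqoopdemo), table definition, and CSV path are hypothetical, not part of the course materials.

```shell
# Hypothetical example: create a sample database and table, then load data.
mysql -u root -p <<'SQL'
CREATE DATABASE IF NOT EXISTS sqoopdemo;
USE sqoopdemo;
CREATE TABLE employees (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,  -- numeric primary key helps Sqoop split work later
  name VARCHAR(100),
  salary DECIMAL(10,2)
);
-- load rows from a local CSV file (path is an example)
LOAD DATA LOCAL INFILE '/tmp/employees.csv'
  INTO TABLE employees
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n';
SQL
```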

The Purpose of Sqoop

  • describe Sqoop
  • describe Sqoop’s architecture
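As a rough illustration of the architecture covered above: Sqoop compiles an import into a map-only MapReduce job, with each mapper pulling its slice of the table over JDBC. The connection string, credentials, and table name below are hypothetical.

```shell
# Hypothetical example: 4 mappers each import a range of rows in parallel.
sqoop import \
  --connect jdbc:mysql://localhost/sqoopdemo \
  --username sqoopuser -P \
  --table employees \
  --num-mappers 4
```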

Setup of Sqoop

  • recall the dependencies for Sqoop installation
  • install Sqoop
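A minimal sketch of the installation steps, assuming a Sqoop 1.4.x tarball and a MySQL source database; version numbers and paths are examples, not a prescription.

```shell
# Unpack the Sqoop release and put it on the PATH (version/path are examples)
tar -xzf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /usr/local
export SQOOP_HOME=/usr/local/sqoop-1.4.7.bin__hadoop-2.6.0
export PATH=$PATH:$SQOOP_HOME/bin

# Sqoop needs the JDBC driver for the source database on its classpath
cp mysql-connector-java-*.jar "$SQOOP_HOME/lib/"

# Sanity check the install
sqoop version
```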

Operations for Sqoop

  • recall why it’s important for the primary key to be numeric
  • perform a Sqoop import from MySQL into HDFS
  • recall the concerns developers should be aware of
  • perform a Sqoop export from HDFS into MySQL
  • recall that you must execute a Sqoop import statement for each data element
  • perform a Sqoop import from MySQL into HBase
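The three operations above can be sketched as follows. These are hedged examples: the database, table, directory, and column-family names are hypothetical placeholders.

```shell
# Import from MySQL into HDFS; a numeric key lets Sqoop split rows evenly across mappers
sqoop import \
  --connect jdbc:mysql://localhost/sqoopdemo \
  --username sqoopuser -P \
  --table employees \
  --target-dir /user/hadoop/employees \
  --split-by id

# Export from HDFS back into MySQL; the target table must already exist
sqoop export \
  --connect jdbc:mysql://localhost/sqoopdemo \
  --username sqoopuser -P \
  --table employees_copy \
  --export-dir /user/hadoop/employees

# Import from MySQL into HBase; each table needs its own import statement
sqoop import \
  --connect jdbc:mysql://localhost/sqoopdemo \
  --username sqoopuser -P \
  --table employees \
  --hbase-table employees \
  --column-family cf \
  --hbase-create-table
```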

Troubleshooting of Sqoop

  • recall how to use chain troubleshooting to resolve Sqoop issues
  • use the log files to identify common Sqoop errors and their resolutions
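The troubleshooting steps above might look like the following in practice; the application ID and connection details are hypothetical examples.

```shell
# Re-run the failing job with verbose logging to surface the underlying exception
sqoop import --verbose \
  --connect jdbc:mysql://localhost/sqoopdemo \
  --username sqoopuser -P \
  --table employees

# Inspect the map task logs, where JDBC and record-parsing errors usually appear
yarn logs -applicationId application_1234567890123_0001 | less
```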

Practice: Data Repository with Sqoop

  • use Sqoop to extract data from an RDBMS and load the data into HDFS
