MLlib, GraphX, and R

Target Audience
Programmers and developers familiar with Apache Spark who wish to expand their skill sets

Prerequisite
None

Expected Duration
176 minutes

Description
MLlib is Spark’s machine learning library. GraphX is Spark’s API for graphs and graph-parallel computation. SparkR exposes the Spark API to R and allows users to run jobs on a cluster from the R shell. In this course, you will learn how to work with each of these libraries.

Objective

Machine Learning with MLlib

  • start the course
  • describe data types
  • recall basic statistics
  • describe linear SVMs
  • perform logistic regression
  • use naïve Bayes
  • create decision trees
  • use collaborative filtering with ALS
  • perform clustering with K-means
  • perform clustering with LDA
  • perform analysis with frequent pattern mining

GraphX

  • describe the property graph
  • describe the graph operators
  • perform analytics with neighborhood aggregation
  • perform messaging with the Pregel API
  • build graphs
  • describe vertex and edge RDDs
  • optimize representation through partitioning
  • measure vertices with PageRank

R and Spark

  • install SparkR
  • run SparkR
  • use existing R packages
  • expose RDDs as distributed lists
  • convert existing RDDs into DataFrames
  • read and write Parquet files
  • run SparkR on a cluster

Practice: Use MLlib

  • use the algorithms and utilities in MLlib
