MLlib, GraphX, and R

Target Audience
Programmers and developers familiar with Apache Spark who wish to expand their skill sets


Expected Duration
176 minutes

MLlib is Spark’s machine learning library. GraphX is Spark’s API for graphs and graph-parallel computation. SparkR exposes the Spark API to R and lets users run jobs on a cluster from the R shell. In this course, you will learn how to work with each of these libraries.


Machine Learning with MLlib

  • start the course
  • describe data types
  • recall the basic statistics
  • describe linear SVMs
  • perform logistic regression
  • use naïve Bayes
  • create decision trees
  • use collaborative filtering with ALS
  • perform clustering with K-means
  • perform clustering with LDA
  • perform analysis with frequent pattern mining

GraphX

  • describe the property graph
  • describe the graph operators
  • perform analytics with neighborhood aggregation
  • perform messaging with the Pregel API
  • build graphs
  • describe vertex and edge RDDs
  • optimize representation through partitioning
  • measure vertices with PageRank

R and Spark

  • install SparkR
  • run SparkR
  • use existing R packages
  • expose RDDs as distributed lists
  • convert existing RDDs into DataFrames
  • read and write Parquet files
  • run SparkR on a cluster

Practice: Use MLlib

  • use the algorithms and utilities in MLlib




