Programming and Deploying Apache Spark Applications

Target Audience
Developers familiar with Scala, Python, or Java who want to learn how to program and deploy Spark applications


Expected Duration
180 minutes

Apache Spark is a cluster computing framework for fast, large-scale data processing, including data stored in Hadoop. In this course, you will learn how to develop Spark applications in Scala, Java, or Python, and how to test them and deploy them to a cluster. You will also learn how to monitor clusters and applications, and how to schedule resources both across a cluster and within an individual application.


Getting Started with Apache Spark

  • start the course
  • describe Apache Spark and the main components of a Spark application
  • download and install Apache Spark on Windows 8.1 Pro N
  • download and install Apache Spark on Mac OS X Yosemite
  • download and install the Java Development Kit or JDK 8 and build Apache Spark using the Simple Build Tool or SBT on Mac OS X Yosemite
  • use the Spark shell for analyzing data interactively
  • link an application to Spark
  • create a SparkContext to initialize Apache Spark
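
The objectives above culminate in a basic application skeleton. The following is a minimal sketch of that skeleton; the application name, master URL, and object name are illustrative placeholders, and "local[2]" simply runs Spark locally with two threads for development.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]): Unit = {
    // A SparkConf holds application settings; setMaster("local[2]")
    // runs Spark locally with two worker threads while developing.
    val conf = new SparkConf()
      .setAppName("Simple Application")
      .setMaster("local[2]")

    // The SparkContext is the entry point to Spark functionality.
    val sc = new SparkContext(conf)

    // ... work with the context here ...

    sc.stop() // release resources when finished
  }
}
```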

Resilient Distributed Datasets (RDDs)

  • introduce Resilient Distributed Datasets or RDDs and create a parallelized collection to generate an RDD
  • load external datasets to create Resilient Distributed Datasets or RDDs
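
As a sketch of the two creation paths listed above, assuming an existing SparkContext `sc` (as provided automatically by the Spark shell) and a placeholder input path:

```scala
// Create an RDD by parallelizing an in-memory collection.
val numbers = sc.parallelize(Seq(1, 2, 3, 4, 5))

// Create an RDD by loading an external dataset (path is a placeholder).
val lines = sc.textFile("data/input.txt")
```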

RDD Operations

  • distinguish between transformations and actions, describe some of the transformations supported by Spark, and use those transformations
  • describe some of the actions supported by Spark and use the actions
  • use anonymous function syntax and use static methods in a global singleton to pass functions to Spark
  • work with key-value pairs
  • persist Spark RDDs
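
The objectives above can be sketched together in one short pipeline. This assumes an existing SparkContext `sc`; the input path is a placeholder.

```scala
import org.apache.spark.storage.StorageLevel

val lines = sc.textFile("data/input.txt")

// Transformations are lazy: they describe a new RDD without computing it.
val words = lines.flatMap(line => line.split(" "))
val pairs = words.map(word => (word, 1))

// Key-value RDDs gain extra operations such as reduceByKey.
val counts = pairs.reduceByKey(_ + _)

// Persist an RDD that will be reused, so it is not recomputed each time.
counts.persist(StorageLevel.MEMORY_ONLY)

// Actions trigger computation and return results to the driver.
val total = counts.count()
counts.take(10).foreach(println)
```

The anonymous functions passed to flatMap and map above could equally be defined as static methods in a global singleton object and passed by name, which is the other function-passing style this section covers.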

Shared Variables

  • use broadcast variables in a Spark operation
  • use accumulators in Spark operations
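
A brief sketch of both shared-variable types, assuming an existing SparkContext `sc` and the Spark 1.x accumulator API; the values are illustrative.

```scala
// A broadcast variable ships one read-only copy of a value to each executor.
val lookup = sc.broadcast(Map("a" -> 1, "b" -> 2))

// An accumulator is added to from tasks; only the driver reads its value.
val blankLines = sc.accumulator(0, "Blank Lines")

val data = sc.parallelize(Seq("a", "", "b", ""))
data.foreach { s =>
  if (s.isEmpty) blankLines += 1
}

println(lookup.value("a"))   // read the broadcast value on the driver
println(blankLines.value)    // read the accumulated total on the driver
```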

Working with Data

  • use different formats for loading and saving Spark data
  • use basic Spark SQL for data queries in a Spark application
  • use basic Spark GraphX to work with graphs in a Spark application
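
As a sketch of the loading, querying, and saving objectives above, using Spark 1.x-style Spark SQL; the file paths, table name, and schema are placeholders.

```scala
import org.apache.spark.sql.SQLContext

// Assumes an existing SparkContext `sc`.
val sqlContext = new SQLContext(sc)

// Load a JSON dataset and query it with SQL.
val people = sqlContext.read.json("data/people.json")
people.registerTempTable("people")
val adults = sqlContext.sql("SELECT name FROM people WHERE age >= 18")
adults.show()

// Save the result in a different format.
adults.write.parquet("data/adults.parquet")
```

GraphX follows the same pattern of building on the SparkContext, constructing its property graphs from RDDs of vertices and edges.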

Deployment and Testing

  • describe how Spark applications run in a cluster
  • deploy a Spark application to a cluster
  • unit test a Spark application
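
For the unit-testing objective, the usual approach is to run a short-lived local SparkContext inside each test. The following sketch assumes ScalaTest is on the classpath; the suite and test names are placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.FunSuite

class WordCountSuite extends FunSuite {
  test("counts words") {
    // A local-mode context lets the test run without a cluster.
    val conf = new SparkConf().setAppName("test").setMaster("local")
    val sc = new SparkContext(conf)
    try {
      val counts = sc.parallelize(Seq("a", "b", "a"))
        .map(word => (word, 1))
        .reduceByKey(_ + _)
        .collectAsMap()
      assert(counts("a") === 2)
    } finally {
      sc.stop() // always stop the context, even if the test fails
    }
  }
}
```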

Monitoring and Scheduling

  • describe how to monitor a Spark application or cluster with Web UIs
  • describe options for scheduling resources across applications in a Spark cluster
  • describe how to enable a fair scheduler for fair sharing within an application in a Spark cluster
  • configure fair scheduler pool properties for a Spark context within a cluster
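
A sketch of the fair-scheduling objectives above, assuming an existing SparkContext `sc`; the allocation-file path and pool name are placeholders.

```scala
// Enable fair scheduling when building the SparkConf:
//   conf.set("spark.scheduler.mode", "FAIR")
//   conf.set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")

// Assign jobs submitted from the current thread to a named pool,
// whose weight and minimum share are defined in the allocation file.
sc.setLocalProperty("spark.scheduler.pool", "production")

// Clear the property to return to the default pool.
sc.setLocalProperty("spark.scheduler.pool", null)
```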

Practice: Programming and Deploying a Spark App

  • practice programming and deploying a Spark application to a cluster




