Data Mining Techniques: Theory and Practice

  • Business analysts and their managers
  • Statisticians

Please contact us for information about prerequisites.

Expected Duration
3 days


In this course, you will learn a data mining methodology that is a superset of the SAS SEMMA methodology around which SAS Enterprise Miner is organized. You will also learn about a wide range of data mining algorithms, gaining both theoretical knowledge and practical skills. In this class, you will work through all the steps of a data mining project, beginning with problem definition and data selection, and continuing through data exploration, data transformation, sampling, partitioning, modeling, and assessment.


1. Introduction to Data Mining

  • What is data mining
  • Directed and undirected data mining
  • Models
  • Profiling and prediction

2. Data Mining Methodology

  • Why have a methodology
  • How data miners can inadvertently learn things that are not true
  • Translating business problems into data mining problems
  • The importance of model stability
  • Finding the right input variables
  • Sampling to create balanced model sets
  • Partitioning to create training, validation, and test sets
  • Data preparation
  • Model assessment
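
As an illustration of the partitioning topic in this module, the following is a minimal Python sketch of splitting a model set into training, validation, and test sets. It is not SAS Enterprise Miner code. The function name `partition`, the 60/20/20 split, and the fixed seed are assumptions chosen for the example, not values prescribed by the course.

```python
import random


def partition(rows, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle rows and split them into training, validation, and test sets.

    The 60/20/20 default is an illustrative assumption, not a course value.
    """
    shuffled = list(rows)
    # A seeded generator makes the split reproducible from run to run.
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = round(n * fractions[0])
    n_valid = round(n * fractions[1])
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    # Whatever remains becomes the test set, so no row is lost to rounding.
    test = shuffled[n_train + n_valid:]
    return train, valid, test


train, valid, test = partition(range(100))
```

Each row lands in exactly one of the three sets, so the test set stays untouched while the model is trained on the first set and tuned on the second.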

3. Data Exploration

  • Developing intuition about data
  • Data structure
  • Data types
  • Data values
  • Exploring distributions
  • Summary statistics
  • Histograms
  • Using SAS Enterprise Miner for data exploration

4. Regression Models

  • The null hypothesis
  • Statistical significance
  • Confidence bounds
  • Variance and standard deviation
  • Standardized values
  • Correlation
  • Linear regression
  • Logistic regression
  • Using SAS Enterprise Miner to build regression models

5. Decision Trees

  • Decision trees as data exploration and classification tools
  • Decision trees for modeling and scoring
  • Decision trees for variable selection
  • Alternate representations of decision trees
  • Algorithms used to build decision trees
  • Splitting criteria
  • Recognizing instability and overfitting in decision tree models
  • Capturing interactions between variables
  • Using SAS Enterprise Miner to build decision trees

6. Neural Networks

  • Origins of neural networks
  • Neural networks compared with regression
  • Algorithms used to train neural networks
  • Data preparation requirements for neural networks
  • Picking appropriate inputs for neural networks
  • Creating neural network models using SAS Enterprise Miner

7. Memory-Based Reasoning

  • Similarity and distance
  • Distance metrics appropriate for different kinds of data
  • The role of the training set in memory-based reasoning (MBR)
  • Combining the votes of several neighbors
  • Other k-nearest neighbor techniques
  • Collaborative filtering
  • Using the SAS Enterprise Miner MBR node

8. Clustering

  • More on similarity and distance
  • The k-means algorithm
  • Divisive clustering
  • Agglomerative clustering
  • Data preparation for clustering
  • Interpreting clusters
  • Finding clusters with SAS Enterprise Miner

9. Survival Analysis

  • Origins of survival analysis
  • How business data is different from clinical data
  • Hazards and hazard charts
  • Retention curves and survival curves
  • Calculating survival from retention
  • Calculating hazards empirically
  • Parametric hazard models
  • Censoring
  • Competing risks
  • Survival-based forecasting
  • Using SAS code in SAS Enterprise Miner to create survival curves
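
As an illustration of the hazard and survival topics in this module, the following is a minimal Python sketch, not the SAS code used in class. It assumes discrete tenures, takes the empirical hazard at each tenure as the number of customers who stopped divided by the number at risk, and uses the convention that survival at tenure 0 is 1. The function names and the sample counts are hypothetical.

```python
def empirical_hazards(stops, at_risk):
    """Hazard at each tenure: customers who stopped / customers at risk."""
    return [s / n if n else 0.0 for s, n in zip(stops, at_risk)]


def survival_curve(hazards):
    """Survival at tenure t is the product of (1 - hazard) over earlier tenures."""
    curve = []
    surviving = 1.0  # assumed convention: everyone survives to tenure 0
    for h in hazards:
        curve.append(surviving)
        surviving *= 1.0 - h
    return curve


# Hypothetical counts for tenures 0, 1, and 2.
at_risk = [100, 80, 60]
stops = [20, 20, 15]

hazards = empirical_hazards(stops, at_risk)
survival = survival_curve(hazards)
```

Because each hazard is computed only from customers still at risk at that tenure, censored customers can be left out of later denominators without biasing earlier hazards.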

10. Association Rules

  • Market basket analysis
  • Association rules
  • Sequential pattern analysis
  • Using SAS Enterprise Miner to discover associations in retail data

11. Link Analysis

  • Background on graph theory
  • Sphere of influence
  • Using link analysis to generate derived variables
  • Graph-coloring algorithm
  • Kleinberg’s algorithm

12. Genetic Algorithms

  • Optimization techniques and problems (SAS/OR software)
  • Other algorithms
  • Linear programming problems
  • Genetic algorithms