Data Filtering

Target Audience
Individuals with some programming and math experience working toward implementing data science in their everyday work


Expected Duration
62 minutes

Once data is gathered for data science, it is often in an unstructured or raw format and must be filtered for content and validity. In this course, you’ll explore examples of practical tools and techniques for data filtering.


Introduction to Data Filtering

  • start the course
  • identify common filtering techniques and tools
  • extract date elements from common date formats
  • parse content types in HTTP headers
  • use csvcut to filter CSV data
  • use sed to replace values in a text data stream
  • drop duplicate records from data
  • extract headers from a JPEG image
  • use pdfgrep to extract data from searchable PDF files
  • detect invalid or impossible data combinations
  • parse robots.txt from a website to decide what should and shouldn’t be crawled or indexed
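
As a rough illustration of three of the objectives above (extracting date elements, parsing content types in HTTP headers, and dropping duplicate records), here is a minimal Python sketch using only the standard library. It is not taken from the course materials. The function names date_parts, content_type, and drop_duplicates are hypothetical, and the default date format is an assumption.

```python
from datetime import datetime
from email.message import Message


def date_parts(text, fmt="%Y-%m-%d"):
    """Return (year, month, day) from a date string in the given format.

    The default format is an assumption; pass fmt for other layouts.
    """
    parsed = datetime.strptime(text, fmt)
    return parsed.year, parsed.month, parsed.day


def content_type(header_value):
    """Split a Content-Type header value into (media type, charset or None)."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_content_charset()


def drop_duplicates(records):
    """Keep the first occurrence of each record, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record)  # lists are unhashable, so compare as tuples
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

For example, date_parts("09/03/2016", "%d/%m/%Y") reads a day-first date, and content_type("text/html; charset=UTF-8") separates the media type from the character set.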

Practice: Filtering Dates

  • drop records from a CSV file based on date range
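
One possible way to approach the practice exercise is sketched below in Python, assuming the CSV has a header row and a single date column in a known format. The function name filter_by_date, the column name visited, and the sample data are all hypothetical. Rows with missing or unparseable dates are dropped as well, which is a design choice rather than a requirement of the exercise.

```python
import csv
import io
from datetime import date, datetime


def filter_by_date(csv_text, column, start, end, fmt="%Y-%m-%d"):
    """Return the rows whose date in `column` falls within [start, end]."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = []
    for row in reader:
        try:
            day = datetime.strptime(row[column], fmt).date()
        except (ValueError, KeyError, TypeError):
            continue  # drop rows with missing or unparseable dates
        if start <= day <= end:
            kept.append(row)
    return kept


# Hypothetical sample data, for illustration only.
SAMPLE = "id,visited\n1,2016-01-15\n2,2016-03-02\n3,not-a-date\n4,2016-02-28\n"

rows = filter_by_date(SAMPLE, "visited", date(2016, 2, 1), date(2016, 3, 31))
kept_ids = [row["id"] for row in rows]
```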
