Data Gathering

Target Audience
Individuals with some programming and math experience who are working toward applying data science in their everyday work


Expected Duration
74 minutes

To carry out data science, you need to gather data. Extracting, parsing, and scraping data from various sources, both internal and external, is a critical first step in the data science pipeline. In this course, you'll explore examples of practical tools for data gathering.


Data Extraction

  • start the course
  • describe problems and software tools associated with data gathering
  • use curl to gather data from the Web
  • use in2csv to convert spreadsheet data to CSV format
  • use agate to extract data from spreadsheets
  • use agate to extract tabular data from DBF files
  • extract data from particular tags in an HTML document
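Pulling data out of particular tags in an HTML document can be sketched with Python's standard-library `html.parser`. This is a minimal illustration rather than the course's own method. The `TagTextExtractor` class, the `extract_tag_text` helper, and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text content of every occurrence of one tag (e.g. <h2>).

    Hypothetical helper class for illustration only.
    """

    def __init__(self, target_tag):
        super().__init__()
        self.target_tag = target_tag
        self.depth = 0      # greater than 0 while inside a target tag
        self.current = []   # text fragments of the tag being read
        self.results = []   # finished text for each matched tag

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase
        if tag == self.target_tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.target_tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Join text from the tag and any nested tags, e.g. <em>
                self.results.append("".join(self.current).strip())
                self.current = []

    def handle_data(self, data):
        # Only keep text that appears inside the target tag
        if self.depth > 0:
            self.current.append(data)


def extract_tag_text(html, tag):
    """Return a list with the text of each <tag> element in html."""
    parser = TagTextExtractor(tag)
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical sample document
sample = """
<html><body>
  <h2>Sales</h2><p>Q1 figures</p>
  <h2>Returns <em>(draft)</em></h2>
</body></html>
"""
print(extract_tag_text(sample, "h2"))  # ['Sales', 'Returns (draft)']
```

Counting depth rather than using a simple on/off flag means text inside nested inline tags, such as the `<em>` above, is still attributed to the enclosing tag.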

Metadata

  • distinguish between metadata and data
  • work with metadata in HTTP headers
  • work with Linux log files
  • work with metadata in email headers
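Header blocks make the split between metadata and data easy to see. Everything before the first blank line describes the message, and what follows is the data itself. The sketch below parses a made-up email with `email.parser.Parser` and a made-up HTTP response header block with `http.client.parse_headers`, both from Python's standard library. The sample messages are hypothetical, and the course may use other tools for these tasks.

```python
import io
from email.parser import Parser
from http.client import parse_headers

# Hypothetical raw email: headers (metadata), a blank line, then the body (data).
raw_email = (
    "From: Alice <alice@example.com>\n"
    "To: bob@example.com\n"
    "Subject: Quarterly report\n"
    "Date: Mon, 01 Jan 2024 09:00:00 +0000\n"
    "\n"
    "The body starts after the first blank line.\n"
)

# headersonly=True parses the header block and leaves the body as an unparsed payload.
msg = Parser().parsestr(raw_email, headersonly=True)
email_meta = {name: msg[name] for name in ("From", "Subject", "Date")}

# Hypothetical HTTP response headers (the status line is assumed to be already read).
raw_http = (
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Length: 1024\r\n"
    b"\r\n"
)

# parse_headers reads "Name: value" lines up to the blank line and returns an
# HTTPMessage, which supports case-insensitive header lookups.
http_headers = parse_headers(io.BytesIO(raw_http))

print(email_meta["Subject"])               # Quarterly report
print(http_headers["content-length"])      # 1024
print(http_headers.get_content_charset())  # utf-8
```

Both parsers return `email.message.Message` objects, so the same lookup style works for email and HTTP metadata.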

Remote Data

  • open a Secure Shell (SSH) connection to a remote server
  • copy remote data using secure copy (scp)
  • synchronize data from a remote server

Practice: Curl and HTML

  • download an HTML file and explore table data
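Once a page has been saved locally (for example with `curl -o page.html` followed by the page's URL), its table data can be explored with a small parser. The following is a minimal sketch using Python's standard-library `html.parser`. It assumes well-formed tables with explicit closing `</td>`, `</th>`, and `</tr>` tags and no nested tables. The `TableExtractor` class, the `read_tables` helper, the file name `page.html`, and the sample table values are hypothetical.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Turn every <table> into a list of rows, each a list of cell strings.

    Hypothetical helper; assumes explicit closing tags and no nested tables.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table>: a list of rows
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Whitespace between tags is ignored; only cell text is kept
        if self._cell is not None:
            self._cell.append(data)


def read_tables(html):
    """Return all tables in html as lists of rows."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


# To parse the hypothetical file saved by curl instead:
# with open("page.html", encoding="utf-8") as f:
#     html = f.read()
sample = """
<table>
  <tr><th>Region</th><th>Units</th></tr>
  <tr><td>North</td><td>120</td></tr>
  <tr><td>South</td><td>95</td></tr>
</table>
"""

header, *rows = read_tables(sample)[0]
total = sum(int(units) for _, units in rows)
print(header)  # ['Region', 'Units']
print(rows)    # [['North', '120'], ['South', '95']]
print(total)   # 215
```

Cell values come back as strings, so numeric columns need an explicit conversion such as `int()` before any arithmetic.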