Data Sources

Overview

The SAS Data Surveyor for Clickstream Data can process data from standard Web logs, SAS page tag logs, and user-defined logs. Before you process data, you must prepare the data source that you intend to process.

Standard Web Logs

Standard Web logs are logs that are generated by one of the following Web servers:
  • Microsoft Internet Information Server
  • Apache HTTP Server
  • Sun iPlanet
  • any other Web server that generates a Web log in one of the following file formats: Extended Log Format (ELF), Common Log Format (CLF), or Common Log Format Extended (CLFE)
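To illustrate how these three formats differ on disk, the following Python sketch classifies a single log line. It is a hypothetical helper, not the recognition logic used by the SAS Data Surveyor for Clickstream Data. It assumes that CLFE means a CLF record followed by quoted referrer and user-agent fields (the layout Apache calls "combined"), and it recognizes ELF only by its `#Fields:` or `#Version:` directive lines, because ELF data lines take their layout from the `#Fields:` directive rather than from a fixed pattern.

```python
import re
from typing import Optional

# Common Log Format (CLF): host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

# Assumed CLFE layout: a CLF record followed by quoted referrer and user-agent
CLFE_PATTERN = re.compile(
    CLF_PATTERN.pattern + r' "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def classify_log_line(line: str) -> Optional[str]:
    """Return 'ELF', 'CLFE', 'CLF', or None for one log line (hypothetical helper)."""
    line = line.rstrip("\n")
    # ELF files declare their field layout in directive lines such as #Fields:.
    if line.startswith("#Fields:") or line.startswith("#Version:"):
        return "ELF"
    # Check the longer pattern first, because every CLFE line also starts
    # with a valid CLF record.
    if CLFE_PATTERN.match(line):
        return "CLFE"
    if CLF_PATTERN.match(line):
        return "CLF"
    return None
```

For example, `127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326` is classified as CLF. The same line with `"http://www.example.com/start.html" "Mozilla/4.08"` appended is classified as CLFE.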

SAS Page Tag Logs

SAS page tag logs are logs that are collected by a clickstream collection server (an Apache HTTP Server with the SAS Data Surveyor for Clickstream Data Mid-Tier Components installed). The content of these logs is generated by JavaScript page tag code that is inserted into the pages of interest on a Web site. SAS page tag log files use a custom log format that captures richer information than a standard Web log contains.
When you first deploy the SAS Data Surveyor for Clickstream Data for a site, you will likely already have standard Web log data. As you incorporate the SAS page tag into your Web sites, you begin collecting richer data.
Identifying and understanding the data source that you need to process helps you decide which tutorial or template jobs to select in later chapters. If you intend to collect SAS page tag data, then you must perform additional tasks to insert JavaScript page tag code into your Web pages. You also need to prepare one or more clickstream collection servers. (These tasks are described in Chapter 5: Preparing SAS Page Tag Data.) You do not have to perform these tasks if your data source is a standard Web log.

User-Defined Logs

The Clickstream Log transformation enables you to create user-defined log types. With this feature, you can customize how the transformation recognizes and processes data sources that the product does not support by default.