Stages in the Subsite Template Job

Overview

The Subsite Template Job can be divided into three stages: Load Data and Apply Global Rules, Generate Subsite Sessions, and Generate Data from Site-Wide Data.

Load Data and Apply Global Rules

The Read Me First note in the job flow contains information needed for the initial setup and modification of this job. The only value described in this note is EMAILADDRESS, which supplies the e-mail address in the Checkpoint transformations in the template. This address is used for failure notification.
The first stage of the subsite template process locates the data and applies global rules to it. For example, you can apply default rules to filter requests for image files or requests made by spiders and robots that identify themselves as such in their user agent data. For more information, see Managing Non-Human Visitor Detection.
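As a rough illustration of what such a global rule does, the following Python sketch drops requests for image files and requests whose user agent identifies itself as a spider or robot. It is not the SAS implementation: the record fields (`uri`, `user_agent`), the extension list, and the keyword list are assumptions chosen for the example.

```python
from urllib.parse import urlparse

# Hypothetical rule data; in the job, the rules are configured in the transformation.
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".ico")
BOT_KEYWORDS = ("bot", "spider", "crawler")


def is_image_request(uri: str) -> bool:
    """Return True when the requested path ends in an image extension."""
    path = urlparse(uri).path.lower()  # ignores any query string
    return path.endswith(IMAGE_EXTENSIONS)


def is_self_identified_bot(user_agent: str) -> bool:
    """Return True when the user agent string names itself as a robot."""
    agent = user_agent.lower()
    return any(keyword in agent for keyword in BOT_KEYWORDS)


def apply_global_rules(records):
    """Keep only the records that pass both global rules."""
    return [
        r for r in records
        if not is_image_request(r["uri"])
        and not is_self_identified_bot(r["user_agent"])
    ]


records = [
    {"uri": "/products/index.html", "user_agent": "Mozilla/5.0"},
    {"uri": "/images/logo.gif?v=2", "user_agent": "Mozilla/5.0"},
    {"uri": "/services/faq.html", "user_agent": "Googlebot/2.1"},
]
kept = apply_global_rules(records)
```

Only the first record survives here. Robots that do not identify themselves in the user agent string pass this kind of rule and need separate detection.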
The transformations and tables in this stage are described in the following table:
Load Data and Apply Global Rules Transformations

| Name | Description | Inputs from | Outputs to |
| --- | --- | --- | --- |
| Clickstream Log transformation | Extracts data from the Web log in the specified file location. | Specified file location for Web log | Checkpoint - Can we recognize the log? transformation; Clickstream Parse - Global Rules transformation |
| Checkpoint - Can we recognize the log? transformation | Evaluates the return code from Clickstream Log; sends e-mail to specified address if the log step fails. | Clickstream Log transformation | Clickstream Parse - Global Rules transformation |
| Clickstream Parse - Global Rules transformation | Parses the data and applies global rules that apply to all of the subsites; filters out graphics files, non-pages, and spiders that identify themselves in their user agent strings. Also see Managing Non-Human Visitor Detection. | Clickstream Log transformation; Checkpoint - Can we recognize the log? transformation | Checkpoint - Can we parse the log? transformation; Clickstream Parse - PRD transformation; Clickstream Parse - SVCS transformation; Clickstream Parse - GEN transformation; Clickstream Parse - ALL transformation |
| Checkpoint - Can we parse the log? transformation | Evaluates the return code from Clickstream Parse - Global Rules; sends e-mail to specified address if the parse step fails. | Clickstream Parse - Global Rules transformation | Clickstream Parse - PRD transformation |
The following display shows the portion of the template job that runs this stage:
[Figure: Global Rules Stage Process Flow]
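
The Checkpoint transformations in this stage share one pattern: evaluate the return code of the preceding step and notify the configured address on failure. The Python sketch below illustrates that pattern only. The function names, the convention that a return code of 0 means success, the decision to stop at the first failure, and the injected `notify` callable are all assumptions; the sketch sends no real e-mail and does not reflect how the transformation is implemented.

```python
from typing import Callable, List, Tuple

EMAILADDRESS = "admin@example.com"  # placeholder; in the job, this value is set during setup


def checkpoint(step_name: str, return_code: int,
               notify: Callable[[str, str], None]) -> bool:
    """Return True when the step succeeded; otherwise notify and return False.

    Assumption: a return code of 0 means success.
    """
    if return_code == 0:
        return True
    notify(EMAILADDRESS, f"{step_name} failed with return code {return_code}")
    return False


def run_stage(steps: List[Tuple[str, Callable[[], int]]],
              notify: Callable[[str, str], None]) -> bool:
    """Run steps in order, stopping at the first failed checkpoint (assumed behavior)."""
    for name, step in steps:
        if not checkpoint(name, step(), notify):
            return False
    return True


sent = []  # collects (address, message) pairs instead of sending e-mail
ok = run_stage(
    [("Clickstream Log", lambda: 0),
     ("Clickstream Parse - Global Rules", lambda: 8)],
    notify=lambda address, message: sent.append((address, message)),
)
```

In this run the log step succeeds, the parse step returns a nonzero code, and one notification is recorded for the failed parse step.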

Generate Subsite Sessions

The second stage of the subsite template process uses a Clickstream Parse transformation to limit the data to a selected subsite. Then, a Clickstream Sessionize transformation identifies the sessions in that subsite. Assigning a session ID effectively identifies the sessions that are present within the data.
The template job performs this operation for three distinct subsites: PRD, SVCS, and GEN. You do not have to process exactly this set of subsites; the template serves only as an example. You can filter the data for as many or as few subsites as needed. Add or remove sets of transformations to match the number of subsites that you have, and then change the names to appropriate values.
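The per-subsite pattern (filter to one subsite, then sessionize) can be sketched in Python as a loop over a configurable list of subsites. Everything specific here is an assumption made for illustration: the URL prefixes that define each subsite, the record fields (`visitor`, `uri`, `time`), and the 30-minute inactivity timeout used to split sessions. The Clickstream Sessionize transformation may identify sessions differently.

```python
from collections import defaultdict

# Hypothetical subsite definitions: name -> URL path prefix.
SUBSITES = {"PRD": "/products/", "SVCS": "/services/", "GEN": "/general/"}
SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that end a session (assumed)


def parse_subsite(records, prefix):
    """Keep only the records whose URI belongs to the subsite."""
    return [r for r in records if r["uri"].startswith(prefix)]


def sessionize(records, subsite):
    """Assign a session ID per visitor, starting a new one after a long gap."""
    last_seen = {}
    counters = defaultdict(int)
    out = []
    for r in sorted(records, key=lambda rec: (rec["visitor"], rec["time"])):
        visitor = r["visitor"]
        previous = last_seen.get(visitor)
        if previous is None or r["time"] - previous > SESSION_TIMEOUT:
            counters[visitor] += 1  # first request or long gap: new session
        last_seen[visitor] = r["time"]
        out.append({**r, "session_id": f"{subsite}-{visitor}-{counters[visitor]}"})
    return out


def build_subsite_tables(records, subsites=SUBSITES):
    """Produce one sessionized output table per configured subsite."""
    return {
        f"{name}_SUBSITES": sessionize(parse_subsite(records, prefix), name)
        for name, prefix in subsites.items()
    }


log = [
    {"visitor": "a", "uri": "/products/x.html", "time": 0},
    {"visitor": "a", "uri": "/products/y.html", "time": 600},
    {"visitor": "a", "uri": "/products/z.html", "time": 4000},
    {"visitor": "b", "uri": "/services/faq.html", "time": 100},
]
tables = build_subsite_tables(log)
```

In this sketch, adding or removing a subsite amounts to adding or removing an entry in `SUBSITES`, which mirrors adding or removing a set of transformations in the job.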
The transformations and tables in the template for this stage are described in the following table:
Generate Subsite Sessions Transformations and Tables

| Name | Description | Inputs from | Outputs to |
| --- | --- | --- | --- |
| PRD subsite | | | |
| Clickstream Parse - PRD transformation | Parses the data for the PRD subsite; all other data is filtered out. | Clickstream Parse - Global Rules transformation; Checkpoint - Can we parse the log? transformation | Checkpoint - Can we parse PRD Subsite data? transformation; Clickstream Sessionize - PRD transformation |
| Checkpoint - Can we parse PRD Subsite data? transformation | Evaluates the return code from Clickstream Parse - PRD; sends e-mail to specified address if the parse step fails. | Clickstream Parse - PRD transformation | Clickstream Sessionize - PRD transformation |
| Clickstream Sessionize - PRD transformation | Identifies sessions within PRD subsite data. | Checkpoint - Can we parse PRD Subsite data? transformation; Clickstream Parse - PRD transformation | Checkpoint - Can we sessionize PRD Subsite data? transformation; PRD_SUBSITES table |
| Checkpoint - Can we sessionize PRD Subsite data? transformation | Evaluates the return code from Clickstream Sessionize - PRD; sends e-mail to specified address if the sessionize step fails. | Clickstream Sessionize - PRD transformation | Clickstream Parse - SVCS transformation |
| PRD_SUBSITES table | Contains the output from the PRD subsite. | Clickstream Sessionize - PRD transformation | None |
| SVCS subsite | | | |
| Clickstream Parse - SVCS transformation | Parses the data for the SVCS subsite; all other data is filtered out. | Clickstream Parse - Global Rules transformation; Checkpoint - Can we sessionize PRD Subsite data? transformation | Checkpoint - Can we parse the SVCS Subsite data? transformation |
| Checkpoint - Can we parse the SVCS Subsite data? transformation | Evaluates the return code from Clickstream Parse - SVCS; sends e-mail to specified address if the parse step fails. | Clickstream Parse - SVCS transformation | Clickstream Sessionize - SVCS transformation |
| Clickstream Sessionize - SVCS transformation | Identifies sessions within SVCS subsite data. | Checkpoint - Can we parse the SVCS Subsite data? transformation | Checkpoint - Can we sessionize SVCS subsite data? transformation; SVCS_SUBSITES table |
| Checkpoint - Can we sessionize SVCS subsite data? transformation | Evaluates the return code from Clickstream Sessionize - SVCS; sends e-mail to specified address if the sessionize step fails. | Clickstream Sessionize - SVCS transformation | Clickstream Parse - GEN transformation |
| SVCS_SUBSITES table | Contains the output from the SVCS subsite. | Clickstream Sessionize - SVCS transformation | None |
| GEN subsite | | | |
| Clickstream Parse - GEN transformation | Parses the data for the GEN subsite; all other data is filtered out. | Clickstream Parse - Global Rules transformation; Checkpoint - Can we sessionize SVCS subsite data? transformation | Checkpoint - Can we parse GEN subsite data? transformation |
| Checkpoint - Can we parse GEN subsite data? transformation | Evaluates the return code from Clickstream Parse - GEN; sends e-mail to specified address if the parse step fails. | Clickstream Parse - GEN transformation | Clickstream Sessionize - GEN transformation |
| Clickstream Sessionize - GEN transformation | Identifies sessions within GEN subsite data. | Checkpoint - Can we parse GEN subsite data? transformation | Checkpoint - Can we sessionize GEN subsite data? transformation; GEN_SUBSITES table |
| Checkpoint - Can we sessionize GEN subsite data? transformation | Evaluates the return code from Clickstream Sessionize - GEN; sends e-mail to specified address if the sessionize step fails. | Clickstream Sessionize - GEN transformation | Clickstream Parse - ALL transformation |
| GEN_SUBSITES table | Contains the output from the GEN subsite. | Clickstream Sessionize - GEN transformation | None |
The following display shows the portion of the template job that runs this stage:
[Figure: Subsites Stage Process Flow]

Generate Data from Site-Wide Data

The third stage of the subsite template processes the data from the Web log without splitting it into subsites. This stage enables you to create an output table that covers all the data in the Web log. Although no subsite filters are applied to this data, it can be thought of as a subsite of everything. For example, the ALL output data might be of interest to those responsible for the entire company's site, while the PRD data might be of interest to those in charge of the PRD department's site.
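One way to picture a "subsite of everything" is as a subsite filter that accepts every request. The short Python sketch below assumes, purely for illustration, that a subsite is defined by a URL path prefix; the prefix `/products/` and the sample URIs are hypothetical.

```python
def make_subsite_filter(prefix: str):
    """Return a predicate that accepts URIs under the given path prefix."""
    return lambda uri: uri.startswith(prefix)


# ALL uses an empty prefix, which every URI matches.
FILTERS = {
    "PRD": make_subsite_filter("/products/"),  # hypothetical prefix
    "ALL": make_subsite_filter(""),
}

uris = ["/products/x.html", "/services/faq.html", "/index.html"]
selected = {
    name: [u for u in uris if accept(u)]
    for name, accept in FILTERS.items()
}
```

Here `selected["ALL"]` contains every URI, while `selected["PRD"]` contains only the one under `/products/`.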
The transformations and tables in this stage are described in the following table:
Generate Data from Site-Wide Data Transformations and Tables

| Name | Description | Inputs from | Outputs to |
| --- | --- | --- | --- |
| Clickstream Parse - ALL transformation | Parses the data for the entire Web log; no subsite data is filtered out. | Clickstream Parse - Global Rules transformation; Checkpoint - Can we sessionize GEN subsite data? transformation | Checkpoint - Can we parse ALL Subsite data? transformation |
| Checkpoint - Can we parse ALL Subsite data? transformation | Evaluates the return code from Clickstream Parse - ALL; sends e-mail to specified address if the parse step fails. | Clickstream Parse - ALL transformation | Clickstream Sessionize - ALL transformation |
| Clickstream Sessionize - ALL transformation | Identifies sessions within the undivided Web log data. | Checkpoint - Can we parse ALL Subsite data? transformation | Checkpoint - Can we sessionize ALL subsites? transformation; ALL_SUBSITES table |
| Checkpoint - Can we sessionize ALL subsites? transformation | Evaluates the return code from Clickstream Sessionize - ALL; sends e-mail to specified address if the sessionize step fails. | Clickstream Sessionize - ALL transformation | None |
| ALL_SUBSITES table | Contains the output that has not been divided into subsites. | Clickstream Sessionize - ALL transformation | None |
The following display shows the portion of the template job that runs this stage:
[Figure: Site-Wide Data Stage Process Flow]
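
Taken together, the inputs and outputs listed for the three stages describe a dependency graph in which each subsite's parse step waits on the previous subsite's sessionize checkpoint. The Python sketch below encodes a simplified version of those links and derives one valid run order with the standard-library `graphlib` module (Python 3.9 or later). The node names are abbreviated, some direct links are omitted, and the helper `build_dependencies` is hypothetical; the sketch models the documented flow rather than the job itself.

```python
from graphlib import TopologicalSorter  # Python 3.9+

SUBSITES = ["PRD", "SVCS", "GEN", "ALL"]


def build_dependencies(subsites):
    """Map each node to the set of nodes that it takes input from."""
    deps = {
        "Checkpoint: recognize log": {"Clickstream Log"},
        "Parse - Global Rules": {"Clickstream Log", "Checkpoint: recognize log"},
        "Checkpoint: parse log": {"Parse - Global Rules"},
    }
    gate = "Checkpoint: parse log"  # checkpoint that releases the next subsite
    for name in subsites:
        parse, sess = f"Parse - {name}", f"Sessionize - {name}"
        deps[parse] = {"Parse - Global Rules", gate}
        deps[f"Checkpoint: parse {name}"] = {parse}
        deps[sess] = {f"Checkpoint: parse {name}"}
        deps[f"Checkpoint: sessionize {name}"] = {sess}
        deps[f"{name}_SUBSITES"] = {sess}  # output table
        gate = f"Checkpoint: sessionize {name}"
    return deps


# static_order() returns one valid ordering; nodes without mutual
# dependencies (such as an output table and the next checkpoint) may
# appear in either order.
order = list(TopologicalSorter(build_dependencies(SUBSITES)).static_order())
```

In any order that this produces, `Clickstream Log` comes first and the subsites are processed in sequence: PRD, then SVCS, then GEN, then ALL.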