Stages in Template Jobs

Overview

Note: This topic describes the stages in the standard Web log basic template job. The page-tagging version of the template contains the Parse Tagged Data Items transformation in the Loop One: Recognize, Decode, Parse, and Group Data stage of the job. This transformation is a renamed Clickstream Parse transformation.

Prepare Data and Parameter Values to Pass to Loop 1

The Read Me First note in the job flow contains information that is necessary for the initial setup and modification of this job. You might need to edit the following values in the Parameters tab for the job:
EMAILADDRESS
supplies the e-mail address in the Checkpoint transformations in the template. This address is used for failure notification.
NUMPARSEPATHS
determines the number of folders that are created for holding output for the parallel executions of the Clickstream Parse transformation in the first loop. Set this value to match the default value that was used in the setup job. If that value was changed in the setup job, then it should also be updated here.
NUMGROUPS
determines how many groups of data are created by the Clickstream Parse transformation during the first loop. Therefore, it also determines the maximum number of parallel executions for Clickstream Sessionize during the second loop. Set this parameter value to match the default value that was used in the setup job. If that value was changed in the setup job, then it should also be updated here.
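Because NUMPARSEPATHS and NUMGROUPS must agree with the values used in the setup job, a quick comparison before a run can catch a mismatch. The following is a minimal sketch in Python rather than SAS. The function name find_parameter_mismatches and the dictionary layout are hypothetical, and the numbers are illustrative only, not the product defaults.

```python
def find_parameter_mismatches(setup_values, job_values,
                              names=("NUMPARSEPATHS", "NUMGROUPS")):
    """Return {name: (setup_value, job_value)} for parameters that differ."""
    mismatches = {}
    for name in names:
        if setup_values.get(name) != job_values.get(name):
            mismatches[name] = (setup_values.get(name), job_values.get(name))
    return mismatches


# Illustrative values only; the real values come from the setup job
# and from the Parameters tab of the template job.
setup_values = {"NUMPARSEPATHS": 4, "NUMGROUPS": 8}
job_values = {"NUMPARSEPATHS": 4, "NUMGROUPS": 6}
print(find_parameter_mismatches(setup_values, job_values))
# prints {'NUMGROUPS': (8, 6)}
```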
The first stage of the standard Web log basic template process locates the data and parses it.
The transformations and tables in this stage are described in the following table:
Locate and Parse Transformations and Tables
Name
Description
Inputs from and Outputs to
LOG_PATHS table
Contains a list of folder paths to scan for clickstream logs.
From: None
To: Directory Contents transformation
Directory Contents transformation
Generates a data table that contains a list of the files found in the directories that are listed in the LOG_PATHS data table. The output table contains the following columns:
  • FILENUM: a unique sequence number for each file (such as 1, 2, 3, 4)
  • FILENAME: the name of the file
  • FULLNAME: a combination of path and filename
From: LOG_PATHS table
To: Build Loop Parameters (reused SAS Extract) transformation
Abort If No Files Found
Evaluates the return code from Directory Contents; stops the run if no files are found.
From: Directory Contents transformation
To: Build Loop Parameters (reused SAS Extract) transformation
Build Loop Parameters (reused SAS Extract) transformation
Passes through the columns from the Directory Contents transformation and creates two additional columns. LIBRARYNUMBER is a number from 1 to n where n is the number of output locations that have been defined on the file system for the first loop (the Clickstream Parse transformation). This column's value is used to ensure that when running in parallel, the output from the jobs is spread across the different folders. PARSEOUTMEMBER uses the incoming FILENUM value to create a unique suffix for the parse output tables. This ensures that when two streams use the same folder, the output from one does not overwrite the output from the other.
From: Directory Contents transformation
To: Set Output Library Locations (reused Lookup) transformation
PARSE_GRID_PATHS table
Contains a list of paths to folders where the outputs from multiple Clickstream Parse transformation calls are distributed. The paths specified in this table are accessed simultaneously by parallel processes. To optimize performance, specify paths that reside on different physical disks or network locations.
From: None
To: Set Output Library (reused SAS Extract) transformation
Set Output Library (reused SAS Extract) transformation
Uses the output library locations that are listed in the PARSE_GRID_PATHS configuration table. This transformation uses the LIBRARYNUMBER column to associate each log file with an output location (PARMLIBPATH) and an output LIBNAME (PARMLIBNAME). These values provide a different input file and output library for each iteration of the loop that follows.
From: Build Loop Parameters (reused SAS Extract) transformation, and PARSE_GRID_PATHS table
To: Loop 1 (Recognize and Parse) transformation
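The rows in the preceding table can be read as a small data pipeline. The sketch below, written in Python rather than SAS, illustrates one way the columns could be derived. It is not the code that the transformations generate. The round-robin rule for LIBRARYNUMBER, the suffix format for PARSEOUTMEMBER, and the PARSEn libref names are assumptions made for illustration; the source states only that LIBRARYNUMBER ranges from 1 to n and that the suffix is built from FILENUM.

```python
import os


def directory_contents(log_paths):
    """List the files under each path as rows with FILENUM, FILENAME, FULLNAME."""
    rows = []
    for folder in log_paths:
        for name in sorted(os.listdir(folder)):
            full = os.path.join(folder, name)
            if os.path.isfile(full):
                rows.append({"FILENUM": len(rows) + 1,
                             "FILENAME": name,
                             "FULLNAME": full})
    return rows


def abort_if_no_files(rows):
    """Stop the run when the directory scan found nothing."""
    if not rows:
        raise SystemExit("No log files found; stopping the run.")
    return rows


def build_loop_parameters(rows, parse_grid_paths):
    """Add LIBRARYNUMBER, PARSEOUTMEMBER, PARMLIBPATH, and PARMLIBNAME."""
    n = len(parse_grid_paths)
    out = []
    for row in rows:
        library_number = (row["FILENUM"] - 1) % n + 1      # assumed round-robin
        out.append({**row,
                    "LIBRARYNUMBER": library_number,
                    "PARSEOUTMEMBER": f"_{row['FILENUM']}",   # assumed suffix format
                    "PARMLIBPATH": parse_grid_paths[library_number - 1],
                    "PARMLIBNAME": f"PARSE{library_number}"})  # hypothetical libref
    return out


# Hand-built rows stand in for a directory scan; the paths are illustrative.
rows = [{"FILENUM": i, "FILENAME": f"f{i}.log", "FULLNAME": f"/logs/f{i}.log"}
        for i in (1, 2, 3)]
for row in build_loop_parameters(abort_if_no_files(rows), ["/grid/p1", "/grid/p2"]):
    print(row["FILENUM"], row["LIBRARYNUMBER"], row["PARMLIBPATH"])
```

With three files and two output folders, files 1 and 3 share a folder. The FILENUM-based suffix is what keeps their parse output tables from overwriting each other.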
The following display shows the locate and parse data stage of the template job.
Locate and Parse Process Flow

Loop One: Recognize, Decode, Parse, and Group Data

The second stage contains the first loop job. The transformations in the first loop job represent the subjob, which is the job that is run in parallel. For a standard Web log, each stream consists of a Clickstream Log transformation, a Clickstream Parse transformation, and two checkpoints, which are created by renaming the Return Code transformation and which enable you to configure how errors are processed.
The transformations in this stage are described in the following table:
Loop One Transformations
Name
Description
Inputs from and Outputs to
Loop 1 (Recognize and Parse) transformation
Passes the appropriate parameters through to the job flows that are executed in parallel. Each parallel stream should have the following parameters set:
  • INPUTFILE is supplied by the FULLNAME source column
  • OUTLIBPATH is supplied by the PARMLIBPATH source column
  • INFILENUM is supplied by the FILENUM source column
From: Set Output Library (reused SAS Extract) transformation
To: Clickstream Log transformation
To: Filter - Only properly parsed logs (SAS Extract) transformation
Clickstream Log transformation
Extracts and decodes (URL and character) data from a single log for each pass through the loop; determines the raw Web log type and creates a SAS DATA step view that is used to read the raw data.
From: Loop 1 (Recognize and Parse) transformation
To: RC Check - Log transformation
To: Clickstream Parse transformation
RC Check - Log transformation
Evaluates the return code from Clickstream Log; sends e-mail to specified address if the log step fails.
From: Clickstream Log transformation
To: Clickstream Parse transformation
Clickstream Parse transformation
Parses this data and generates n output tables, where n is the number of groups expected by the Sessionize loop (the second loop).
For customer intelligence template jobs, this step also identifies the campaign and customer who clicked on a specific treatment.
Campaign information is denoted by these columns:
  • EntrySource: ID of the entity that originated access to the landing page
  • EntryActionID: ID that represents the Entry Source
  • S1 through S4: identify the subject of an Entry Action, either alone or with other Subject ID parameters
Clickstream Parse populates EntrySource with a value of “SDM” if there is a value in the EntryActionID and S1 columns.
From: RC Check - Log transformation
To: RC Check - Parse transformation
RC Check - Parse transformation
Evaluates the return code from Clickstream Parse; sends e-mail to specified address if the parse step fails.
From: Clickstream Parse transformation
To: Loop End transformation
Loop End transformation
Ends loop processing; returns to the beginning of the loop.
From: RC Check - Parse transformation
To: Filter - Only properly parsed logs (reused SAS Extract) transformation
The following display shows the first loop stage of the template job.
Loop 1 Process Flow for Standard Web Logs
When you process SAS page tag logs, an additional Clickstream Parse transformation and its associated RC Check transformation are inserted after the RC Check - Log transformation and before the Clickstream Parse transformation. This additional transformation, named Parse Tagged Data Items, processes the data elements collected by the SAS page tag. For a partial list of the data elements that it processes, see SAS Page Tag Predefined Data Elements Reference.
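The control flow of the first loop (one subjob per log file, run in parallel, with a checkpoint after each step) can be sketched as follows. This is an illustrative Python outline, not the generated SAS code. The callables log_step, parse_step, and notify are hypothetical stand-ins for the Clickstream Log transformation, the Clickstream Parse transformation, and the e-mail notification. Stopping a stream at its first failed checkpoint is an assumption, because the checkpoints are configurable.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subjob(params, log_step, parse_step, notify):
    """Run the log step, then the parse step, checking each return code."""
    status = {"INFILENUM": params["INFILENUM"], "RC": 0, "FAILED_STEP": None}
    for step_name, step in (("log", log_step), ("parse", parse_step)):
        rc = step(params)
        if rc != 0:
            notify(f"{step_name} step failed for {params['INPUTFILE']} (rc={rc})")
            status.update(RC=rc, FAILED_STEP=step_name)
            break  # assumed: a failed checkpoint stops this stream only
    return status


def loop_one(param_rows, log_step, parse_step, notify, max_parallel=4):
    """Run one subjob per input file in parallel; return the status table."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [pool.submit(run_subjob, row, log_step, parse_step, notify)
                   for row in param_rows]
        # Collect results in submission order so the status table lines up
        # with the parameter rows.
        return [future.result() for future in futures]
```

The returned list plays the role of the status table that the Filter - Only properly parsed logs transformation reads in the next stage.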

Combine Groups

The third stage prepares the groups used in the sessionizing process in the second loop. This stage contains transformations that filter for properly parsed logs, create groups, build loop parameters, and prepare paths and output locations for the upcoming loop.
The following figure illustrates this stage of the process:
Combine Groups
The transformations and tables in this stage are described in the following table:
Grouping Transformations
Name
Description
Inputs from and Outputs to
Filter - Only properly parsed logs (SAS Extract) transformation
Uses the status table generated by the Loop transformations to determine which subjobs were successful and should therefore be processed further.
From: Loop 1 (Recognize and Parse) transformation
From: Loop End transformation
To: Clickstream Create Groups transformation
Clickstream Create Groups transformation
Constructs a table that contains information that is used in the sessionize loop; aggregates the parse output groups so that all of the Group 1 session IDs are together, all the Group 2 IDs are together, and so on; prepares views that are ready for the Clickstream Sessionize transformation.
From: Filter - Only properly parsed logs (SAS Extract) transformation
To: Build Loop 2 Parameters (SAS Extract) transformation
Build Loop 2 Parameters (SAS Extract) transformation
Builds a data table that supplies the parameter values for the loop transformation.
From: Clickstream Create Groups transformation
To: Set Sessionize Output Library Locations (Lookup) transformation
SESSIONIZE_GRID_PATHS table
Contains a list of paths to folders where the outputs from the parallel Clickstream Sessionize transformation calls are distributed.
From: None
To: Set Sessionize Output Library Locations (Lookup) transformation
Set Sessionize Output Library Locations (Lookup) transformation
Assigns each group of tables from the Parse loop to a sessionize output location.
From: Build Loop 2 Parameters (SAS Extract) transformation and SESSIONIZE_GRID_PATHS table
To: Loop 2 (Identify Sessions) transformation
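The filtering and regrouping described in this table can be sketched in a few lines. The Python below is an illustration under stated assumptions, not the product's implementation: the key names FILENUM, GROUP, TABLE, and TABLES are hypothetical, a return code of 0 is taken to mean success, and groups are assigned to sessionize output locations round-robin.

```python
from collections import defaultdict


def create_groups(status_rows, parse_outputs, sessionize_grid_paths):
    """Collect the parse outputs of successful subjobs by group number."""
    succeeded = {row["INFILENUM"] for row in status_rows if row["RC"] == 0}
    by_group = defaultdict(list)
    for out in parse_outputs:
        if out["FILENUM"] in succeeded:      # drop outputs of failed subjobs
            by_group[out["GROUP"]].append(out["TABLE"])
    n = len(sessionize_grid_paths)
    return [{"INPUTMEMBER": group,
             "TABLES": sorted(tables),
             "OUTLIBPATH": sessionize_grid_paths[(group - 1) % n]}  # assumed round-robin
            for group, tables in sorted(by_group.items())]
```

Each returned row corresponds to one iteration of the second loop: all Group 1 tables are processed together, all Group 2 tables together, and so on.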
The following display shows the combine groups stage of the template job.
Combine Groups Process Flow

Loop Two: Sessionize

The fourth stage consists of the second loop. This stage contains transformations and tables that run the loop and sessionize the data.
The following figure illustrates this stage of the process:
Loop Two: Sessionize
The transformations and tables in this stage are described in the following table:
Sessionize Transformations
Name
Description
Inputs from and Outputs to
Loop 2 (Identify Sessions) transformation
Sets the parameters that are passed through to the subjobs. The following parameters are set:
  • INPUTLIBNAME is the SAS LIBNAME value used to reference all of the output SAS tables from the Clickstream Parse loop.
  • INPUTPATHS is a string formatted for use in the SAS LIBNAME statement. This string specifies the physical paths that contain the SAS tables created by the Clickstream Parse loop.
  • INPUTMEMBER is the group of data that is to be processed.
  • OUTMEMBER and OUTLIBPATH define the locations of the Sessionize output.
  • PERMLIBPATH is the path location for the PERMLIB= option. PERMLIB retains data from sessions that were still active when processing of the last Web log ended, so that those sessions can be continued later. Using PERMLIB enables the Clickstream Sessionize transformation to recognize a spanned session, one that was cut when a Web log file ended and a new log file began, as a single session.
From: Set Sessionize Output Library Locations (Lookup) transformation
To: Clickstream Sessionize transformation
To: Filter Failed Jobs (SAS Extract) transformation
PARAM_PARSE_RESULTS table
A parameterized table for receiving the output from the Clickstream Parse transformation and passing it into the Clickstream Sessionize transformation. (See Propagating Columns in Jobs that use the Loop Transformation if you have defined User Columns that need to be propagated to the final detail table.)
From: None
To: Clickstream Sessionize transformation
Clickstream Sessionize transformation
Identifies sessions in the grouped data.
From: Loop 2 (Identify Sessions) transformation and PARAM_PARSE_RESULTS table
To: RC Check - Sessionize transformation and CLICKSTREAM_SESSIONIZE table
CLICKSTREAM_SESSIONIZE table
Stores the Clickstream Sessionize output and ensures that the sort sequence of the output data is correct. (See Backing Up PERMLIB.)
From: Clickstream Sessionize transformation
To: None
RC Check - Sessionize transformation
Evaluates the return code from the Clickstream Sessionize transformation; sends e-mail to specified address if the sessionize step fails.
From: Clickstream Sessionize transformation
To: Loop End transformation
Loop End transformation
Ends loop processing; returns to the beginning of the loop.
From: RC Check - Sessionize transformation
To: Filter Failed Jobs (SAS Extract) transformation
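The role of PERMLIB in reconnecting spanned sessions can be illustrated with a toy sessionizer. The Python sketch below is not the Clickstream Sessionize algorithm. It assumes that a session is identified by a visitor key, that records arrive sorted by time, and that a session ends after 1800 seconds of inactivity; all three are assumptions for illustration. The open_sessions dictionary stands in for the data that PERMLIB retains between Web logs.

```python
def sessionize(records, open_sessions, next_id, timeout=1800):
    """Assign session IDs; open_sessions plays the role of PERMLIB.

    records: (visitor, timestamp_seconds) tuples sorted by timestamp.
    open_sessions: {visitor: (session_id, last_seen)} carried between logs.
    Returns (rows, open_sessions, next_id).
    """
    rows = []
    for visitor, ts in records:
        session = open_sessions.get(visitor)
        if session is None or ts - session[1] > timeout:
            session_id = next_id          # start a new session
            next_id += 1
        else:
            session_id = session[0]       # continue the carried-over session
        open_sessions[visitor] = (session_id, ts)
        rows.append((visitor, ts, session_id))
    return rows, open_sessions, next_id


# The first log ends while visitor "a" is still active.
rows1, state, next_id = sessionize([("a", 0), ("b", 10), ("a", 1000)], {}, 1)
# The second log starts 500 seconds later; the carried state reconnects "a".
rows2, state, next_id = sessionize([("a", 1500), ("b", 5000)], state, next_id)
print(rows2)
# prints [('a', 1500, 1), ('b', 5000, 3)]
```

If the second call received an empty dictionary instead of the carried state, visitor "a" would be given a new session ID and the spanned session would be split in two.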
The following display shows the second loop stage of the template job.
Loop 2 Process Flow

Create Detail and Generate Output

The fifth stage combines the outputs from multiple Clickstream Sessionize transformations to create a single detail table.
The following display illustrates this stage of the process:
Create Detail and Generate Output
The transformations and tables in this stage are described in the following table:
Detail and Output Transformations
Name
Description
Inputs from and Outputs to
Filter - Only properly parsed logs transformation
Uses the status table generated by the Loop transformation to determine which subjobs were successful and should therefore be processed further.
From: Loop 2 (Identify Sessions) transformation
From: Loop End transformation
To: Clickstream Create Detail transformation
Clickstream Create Detail transformation
Combines the output from multiple Clickstream Sessionize transformations and creates a single data table.
From: Filter - Only properly parsed logs transformation
To: WEBLOG_DETAIL table
WEBLOG_DETAIL table
Contains the output from multiple Clickstream Sessionize transformations.
From: Clickstream Create Detail transformation
To: None
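Combining the sessionize outputs into one detail table amounts to a filtered concatenation. The short Python sketch below shows the idea. The dictionary that maps each INPUTMEMBER to its output rows is a hypothetical stand-in for the sessionize output tables, and a return code of 0 is assumed to mean that the subjob succeeded.

```python
from itertools import chain


def create_detail(status_rows, sessionize_outputs):
    """Concatenate the outputs of successful sessionize subjobs."""
    succeeded = [row["INPUTMEMBER"] for row in status_rows if row["RC"] == 0]
    return list(chain.from_iterable(sessionize_outputs[m] for m in succeeded))
```

Rows keep the order in which the successful groups appear in the status table; any further sorting of the detail table is outside this sketch.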
The following display shows the create detail and generate output stage of the template job.
Create Detail and Generate Output Process Flow