Stages in the Customer Integration Template Job

Prepare Data and Parameter Values to Pass to Loop 1

The Read Me First note in the job flow contains information that is recommended for the initial setup and modification of this job. You might also need to edit the following values in the Parameters tab for the job:
EMAILADDRESS
supplies the e-mail address that is used by the Checkpoint transformations in the template. Failure notifications are sent to this address.
NUMPARSEPATHS
determines the number of folders that are created for holding output for the parallel executions of the Clickstream Parse transformation in the first loop. Set this value to match the default value that was used in the setup job. If that value was changed in the setup job, then it should also be updated here.
NUMGROUPS
determines how many groups of data are created by the Clickstream Parse transformation during the first loop. Therefore, it also determines the maximum number of parallel executions for the Clickstream Sessionize transformation during the second loop. Set this parameter value to match the default value that was used in the setup job. If that value was changed in the setup job, then it should also be updated here.
The first stage of the campaign information template process locates the data and parses it.
The transformations and tables in this stage are described in the following table:
Locate and Parse Transformations and Tables
Name
Description
Inputs from and Outputs to
LOG_PATHS table
Contains a list of folder paths to scan for clickstream logs.
From: None
To: Directory Contents transformation
Directory Contents transformation
Generates a data table that contains a list of the files found in the directories that are listed in the LOG_PATHS data table. The output table contains the following columns:
  • FILENUM: a unique sequence number for that file (such as 1, 2, 3, 4)
  • FILENAME: the name of the file
  • FULLNAME: a combination of path and filename
From: LOG_PATHS table
To: Build Loop Parameters (reused SAS Extract) transformation
Build Loop Parameters (reused SAS Extract) transformation
Passes through the columns from the Directory Contents transformation and creates two additional columns. LIBRARYNUMBER is a number from 1 to n where n is the number of output locations that have been defined on the file system for the first loop (the Clickstream Parse transformation). This column's value is used to ensure that when running in parallel, the output from the jobs is spread across the different folders. PARSEOUTMEMBER uses the incoming FILENUM value to create a unique suffix for the parse output tables. This ensures that when two streams use the same folder, the output from one does not overwrite the output from the other.
From: Directory Contents transformation
To: Set Output Library Locations (reused Lookup) transformation
PARSE_GRID_PATHS table
Contains a list of paths to folders where the outputs from multiple Clickstream Parse transformation calls are distributed. The paths specified in this table are accessed simultaneously by parallel processes. To optimize performance, specify paths that reside on different physical disks or network locations.
From: None
To: Set Output Library (reused SAS Extract) transformation
Set Output Library (reused SAS Extract) transformation
Uses the output library locations that are listed in the PARSE_GRID_PATHS configuration table. This transformation uses the LIBRARYNUMBER column to associate each log file with an output location (PARMLIBPATH) and an output LIBNAME (PARMLIBNAME). These values provide a different input file and output library for each iteration of the loop that follows.
From: Build Loop Parameters (reused SAS Extract) transformation, and PARSE_GRID_PATHS table
To: Loop 1 (Recognize and Parse) transformation
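The rows in the table above can be read as a small data pipeline. The following Python sketch is illustrative only and is not the SAS code that the template generates. It builds the FILENUM, FILENAME, and FULLNAME columns, adds LIBRARYNUMBER and PARSEOUTMEMBER, and then looks up PARMLIBPATH and PARMLIBNAME. The round-robin assignment of LIBRARYNUMBER, the PARSEOUT_ prefix, the PARSE<n> libref pattern, and all function names are assumptions made for the example.

```python
import os


def directory_contents(log_paths):
    """List the files under each path as FILENUM / FILENAME / FULLNAME rows."""
    rows = []
    for path in log_paths:
        for name in sorted(os.listdir(path)):
            full = os.path.join(path, name)
            if os.path.isfile(full):
                rows.append({"FILENUM": len(rows) + 1,
                             "FILENAME": name,
                             "FULLNAME": full})
    return rows


def build_loop_parameters(rows, num_parse_paths):
    """Add LIBRARYNUMBER (1..n) and a unique PARSEOUTMEMBER to each row."""
    out = []
    for row in rows:
        new = dict(row)
        # Assumption: files are spread over the output folders round-robin.
        new["LIBRARYNUMBER"] = (row["FILENUM"] - 1) % num_parse_paths + 1
        # Assumption: the member name is a fixed prefix plus FILENUM.
        new["PARSEOUTMEMBER"] = f"PARSEOUT_{row['FILENUM']}"
        out.append(new)
    return out


def set_output_library(rows, parse_grid_paths):
    """Look up the output path and libref for each row by LIBRARYNUMBER."""
    out = []
    for row in rows:
        new = dict(row)
        n = row["LIBRARYNUMBER"]
        new["PARMLIBPATH"] = parse_grid_paths[n - 1]
        new["PARMLIBNAME"] = f"PARSE{n}"  # hypothetical libref pattern
        out.append(new)
    return out
```

Because PARSEOUTMEMBER is derived from FILENUM, two files that share the same LIBRARYNUMBER still write to differently named output tables, which is the overwrite protection that the Build Loop Parameters description refers to.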
The following display shows the locate and parse data stage of the template job.
Locate and Parse Process Flow

Loop One: Recognize, Parse, and Group Data

The second stage contains the first loop job. The transformations in the first loop job represent the subjob, which is the job that is run in parallel. Each stream consists of a Clickstream Log transformation, a Clickstream Parse transformation, and two checkpoints. The checkpoints are renamed Return Code transformations that enable you to configure how errors are processed.
The transformations in this stage are described in the following table:
Loop One Transformations
Name
Description
Inputs from and Outputs to
Loop 1 (Recognize and Parse) transformation
Passes the appropriate parameters through to the job flows that are executed in parallel. Each parallel stream should have the following parameters set:
  • INPUTFILE is supplied by the FULLNAME source column
  • OUTLIBPATH is supplied by the PARMLIBPATH source column
  • INFILENUM is supplied by the FILENUM source column
From: Set Output Library (reused SAS Extract) transformation
To: Clickstream Log transformation
To: Filter - Only properly parsed logs (SAS Extract) transformation
Clickstream Log transformation
Extracts data from a single log for each pass through the loop; determines the raw Web log type and creates a SAS DATA step view that is used to read the raw data.
From: Loop 1 (Recognize and Parse) transformation
To: Checkpoint - Can we recognize the log? transformation
To: Clickstream Parse transformation
Checkpoint - Can we recognize the log? transformation
Evaluates the return code from the Clickstream Log transformation; sends e-mail to the specified address if the log step fails.
From: Clickstream Log transformation
To: Clickstream Parse transformation
Clickstream Parse transformation
Parses this data, identifies the campaign and customer who clicked on a specific treatment, and generates n output tables, where n is the number of groups expected by the Sessionize loop (the second loop).
Campaign information is denoted by these columns:
  • EntrySource - ID of the entity that originated access to the landing page
  • EntryActionID - ID that represents the Entry Source
  • S1 through S4 - identify the subject of an Entry Action either alone or with other Subject ID parameters
Clickstream Parse populates EntrySource with a value of “SDM” if there is a value in the EntryActionID and S1 columns.
From: Checkpoint - Can we recognize the log? transformation
To: Checkpoint - Parse OK? transformation
Checkpoint - Parse OK? transformation
Evaluates the return code from the Clickstream Parse transformation; sends e-mail to the specified address if the parse step fails.
From: Clickstream Parse transformation
To: Loop End transformation
Loop End transformation
Ends loop processing; returns to the beginning of the loop.
From: Checkpoint - Parse OK? transformation
To: Filter - Only properly parsed logs (reused SAS Extract) transformation
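The control flow of one stream in this loop can be sketched in Python. This is only an illustration of the pattern (run a step, check its return code at a checkpoint, notify on failure, record a status row). The real Loop transformation runs SAS subjobs rather than threads, and the stub functions, the return code values 0 and 8, and the status row layout are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def recognize_log(input_file):
    """Hypothetical stand-in for Clickstream Log: 0 means the log was recognized."""
    return 0 if input_file.endswith(".log") else 8


def parse_log(input_file, out_lib_path, in_file_num):
    """Hypothetical stand-in for Clickstream Parse: 0 means success."""
    return 0


def notify(address, message, sent):
    """Record a failure notification instead of sending real e-mail."""
    sent.append((address, message))


def run_subjob(params, email_address, sent):
    """Run one stream: recognize the log, checkpoint, parse, checkpoint."""
    num = params["INFILENUM"]
    rc = recognize_log(params["INPUTFILE"])
    if rc != 0:
        # Checkpoint - Can we recognize the log?
        notify(email_address, f"log step failed for file {num}", sent)
        return {"INFILENUM": num, "STEP": "log", "RC": rc}
    rc = parse_log(params["INPUTFILE"], params["OUTLIBPATH"], num)
    if rc != 0:
        # Checkpoint - Parse OK?
        notify(email_address, f"parse step failed for file {num}", sent)
    return {"INFILENUM": num, "STEP": "parse", "RC": rc}


def run_loop(param_rows, email_address, max_workers=4):
    """Run every stream in parallel and return (status rows, notifications)."""
    sent = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        status = list(pool.map(lambda p: run_subjob(p, email_address, sent),
                               param_rows))
    return status, sent
```

The list of status rows plays the role of the status table that a later filter step reads to decide which subjobs succeeded.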
The following display shows the first loop stage of the template job.
Loop 1 Process Flow

Combine Groups

The third stage prepares the groups used in the sessionizing process in the second loop. This stage contains transformations that filter for properly parsed logs, create groups, build loop parameters, and prepare paths and output locations for the upcoming loop.
The transformations and tables in this stage are described in the following table:
Grouping Transformations
Name
Description
Inputs from and Outputs to
Filter - Only properly parsed logs (SAS Extract) transformation
Uses the status table generated by the Loop transformations to determine which subjobs were successful and therefore should be processed further.
From: Loop 1 (Recognize and Parse) transformation
From: Loop End transformation
To: Clickstream Create Groups transformation
Clickstream Create Groups transformation
Constructs a table that contains information that is used in the sessionize loop; aggregates the parse output groups so that all of the Group 1 session IDs are together, all the Group 2 IDs are together, and so on; prepares views that are ready for the Clickstream Sessionize transformation.
From: Filter - Only properly parsed logs (SAS Extract) transformation
To: Build Loop 2 Parameters (SAS Extract) transformation
Build Loop 2 Parameters (SAS Extract) transformation
Builds a data table that supplies the parameter values for the loop transformation.
From: Clickstream Create Groups transformation
To: Set Sessionize Output Library Locations (Lookup) transformation
SESSIONIZE_GRID_PATHS table
Contains a list of folder paths that serve as output locations for the Clickstream Sessionize transformation.
From: None
To: Set Sessionize Output Library Locations (Lookup) transformation
Set Sessionize Output Library Locations (Lookup) transformation
Assigns each group of tables from the Parse loop to a sessionize output location.
From: Build Loop 2 Parameters (SAS Extract) transformation and SESSIONIZE_GRID_PATHS table
To: Loop 2 (Identify Sessions) transformation
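The filtering, grouping, and location assignment in this stage can be sketched as follows. The Python code is illustrative only. The status row layout, the _G<n> table naming, the success code 0, and the round-robin assignment of groups to paths are assumptions, not details taken from the template.

```python
from collections import defaultdict


def filter_parsed_ok(status_rows):
    """Keep only the subjobs whose return code indicates success (assumed 0)."""
    return [row for row in status_rows if row["RC"] == 0]


def create_groups(ok_rows, num_groups):
    """Map each group number to the parse output tables that hold that group."""
    groups = defaultdict(list)
    for row in ok_rows:
        for g in range(1, num_groups + 1):
            # Hypothetical naming: one table per group for each parsed log.
            groups[g].append(
                f"{row['PARMLIBNAME']}.{row['PARSEOUTMEMBER']}_G{g}")
    return dict(groups)


def assign_sessionize_locations(groups, sessionize_grid_paths):
    """Assign each group to a sessionize output path, cycling through the paths."""
    count = len(sessionize_grid_paths)
    return {g: sessionize_grid_paths[(g - 1) % count] for g in sorted(groups)}
```

With this layout, every table for Group 1 ends up in one list, every table for Group 2 in another, and so on, so that a single sessionize subjob can read all of the data for its group.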
The following display shows the combine groups stage of the template job.
Combine Groups Process Flow

Loop Two: Sessionize

The fourth stage consists of the second loop. This stage contains transformations and tables that run the loop and sessionize the data.
The transformations and tables in this stage are described in the following table:
Sessionize Transformations
Name
Description
Inputs from and Outputs to
Loop 2 (Identify Sessions) transformation
Sets the parameters that are passed through to the subjobs. The following parameters are set:
  • INPUTLIBNAME is the SAS LIBNAME value used to reference all of the output SAS tables from the Clickstream Parse loop.
  • INPUTPATHS is a string formatted for use in the SAS LIBNAME statement. This string specifies the physical paths that contain the SAS tables created by the Clickstream Parse loop.
  • INPUTMEMBER is the group of data that is to be processed.
  • OUTMEMBER and OUTLIBPATH define the locations of the Sessionize output.
  • PERMLIBPATH is the path location for the PERMLIB= option. PERMLIB retains data from sessions that were still active when processing of the last Web log ended, so that those sessions can be continued later. This enables the Clickstream Sessionize transformation to reconnect spanned sessions (sessions that were cut when one Web log file ended and a new log file began) and to recognize each of them as a single session.
From: Set Sessionize Output Library Locations (Lookup) transformation
To: Clickstream Sessionize transformation
To: Filter Failed Jobs (SAS Extract) transformation
PARAM_PARSE_RESULTS table
A parameterized table for receiving the output from the Clickstream Parse transformation and passing it into the Clickstream Sessionize transformation. (See Understanding the Propagation of Columns in the Multiple Log Template Job if you have defined User Columns that need to be propagated to the final detail table.)
This table contains the columns that support campaign information.
From: None
To: Clickstream Sessionize transformation
Clickstream Sessionize transformation
Identifies sessions in the grouped data.
From: Loop 2 (Identify Sessions) transformation and PARAM_PARSE_RESULTS table
To: Checkpoint - Can we identify sessions? transformation and CLICKSTREAM_SESSIONIZE table
CLICKSTREAM_SESSIONIZE table
Stores CLICKSTREAM_SESSIONIZE output and ensures the sort sequence of the output data is correct. (See Backing Up PERMLIB.)
From: Clickstream Sessionize transformation
To: None
Checkpoint - Can we identify sessions? transformation
Evaluates the return code from the Clickstream Sessionize transformation; sends e-mail to the specified address if the sessionize step fails.
From: Clickstream Sessionize transformation
To: Loop End transformation
Loop End transformation
Ends loop processing; returns to the beginning of the loop.
From: Checkpoint - Can we identify sessions? transformation
To: Filter Failed Jobs (SAS Extract) transformation
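As an illustration of the INPUTPATHS parameter, the following Python sketch builds a string of quoted paths that fits the concatenated-library form of the SAS LIBNAME statement, libname libref ('path-1' 'path-2');. Whether the template formats INPUTPATHS in exactly this way is an assumption, and the function names and the parsein libref are hypothetical.

```python
def format_input_paths(paths):
    """Quote each distinct path and join the results with spaces."""
    distinct = []
    for path in paths:
        if path not in distinct:
            distinct.append(path)
    # A single quote inside a single-quoted SAS string is written twice.
    return " ".join("'" + path.replace("'", "''") + "'" for path in distinct)


def libname_statement(libref, paths):
    """Build a LIBNAME statement that concatenates all of the given paths."""
    return f"libname {libref} ({format_input_paths(paths)});"
```

For example, libname_statement("parsein", ["/grid/a", "/grid/b"]) produces a statement that lets one libref reference the parse output tables in both folders.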
The following display shows the second loop stage of the template job.
Loop 2 Process Flow

Create Detail and Generate Output

The fifth stage combines the outputs from multiple Clickstream Sessionize transformations to create a single detail table.
The transformations and tables in this stage are described in the following table:
Detail and Output Transformations
Name
Description
Inputs from and Outputs to
Filter Failed Jobs (SAS Extract) transformation
Uses the status table generated by the Loop transformation to determine which subjobs were successful and therefore should be processed further.
From: Loop 2 (Identify Sessions) transformation
From: Loop End transformation
To: Clickstream Create Detail transformation
Clickstream Create Detail transformation
Combines the output from multiple Clickstream Sessionize transformations and creates a single data table.
From: Filter Failed Jobs (SAS Extract) transformation
To: CI_DDS_OUTPUT table
CI_DDS_OUTPUT table
Contains the output from the Clickstream Sessionize transformations.
From: Clickstream Create Detail transformation
To: None
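A minimal Python sketch of this stage follows. It drops the failed subjobs and appends the remaining sessionize outputs into one detail table. The status row layout, the success code 0, and the read_table callback are assumptions made for the example. The real transformation may also sort or restructure the data, which is not shown here.

```python
def filter_failed_jobs(status_rows):
    """Keep only the sessionize subjobs that succeeded (assumed return code 0)."""
    return [row for row in status_rows if row["RC"] == 0]


def create_detail(ok_rows, read_table):
    """Append the output of each successful subjob into one detail table."""
    detail = []
    for row in ok_rows:
        # read_table is a hypothetical callback that returns a table's rows.
        detail.extend(read_table(row["OUTLIBPATH"], row["OUTMEMBER"]))
    return detail
```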
The following display shows the create detail and generate output stage of the template job.
Create Detail and Generate Output Process Flow