The Read Me First note in the job flow contains recommendations for
the initial setup and modification of this job. You might also need
to edit the following values in the Parameters tab for the job:
- The e-mail address parameter supplies the address that is used in
  the Checkpoint transformations in the template. This address is
  used for failure notification.
- The output folder count parameter determines the number of folders
  that are created to hold output from the parallel executions of the
  Clickstream Parse transformation in the first loop. Set this value
  to match the value that was used in the setup job. If the default
  was changed in the setup job, make the same change here.
- The group count parameter determines how many groups of data are
  created by the Clickstream Parse transformation during the first
  loop. It therefore also determines the maximum number of parallel
  executions of the Clickstream Sessionize transformation during the
  second loop. Set this value to match the value that was used in the
  setup job. If the default was changed in the setup job, make the
  same change here.
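The relationship between the number of groups and the degree of parallelism in the second loop can be illustrated with a small Python sketch. This is not the template's implementation: the function names, the CRC-based grouping rule, and the thread pool are all assumptions made for illustration. The only point it demonstrates is that splitting data into a fixed number of groups caps the number of workers that can usefully run at once.

```python
from concurrent.futures import ThreadPoolExecutor
import zlib


def assign_group(key: str, num_groups: int) -> int:
    """Map a key to a group number in 1..num_groups.

    Assumption: a stable hash is used. The rule that the template
    actually applies is not described in this section.
    """
    return zlib.crc32(key.encode("utf-8")) % num_groups + 1


def split_into_groups(records, num_groups):
    """Stand-in for the first loop: place every record in exactly one group."""
    groups = {g: [] for g in range(1, num_groups + 1)}
    for key, payload in records:
        groups[assign_group(key, num_groups)].append((key, payload))
    return groups


def process_group(group):
    """Stand-in for the per-group work in the second loop (returns a row count)."""
    return len(group)


def run_second_loop(groups):
    """Run one worker per group, so the group count bounds the parallelism."""
    with ThreadPoolExecutor(max_workers=len(groups)) as pool:
        counts = list(pool.map(process_group, groups.values()))
    return dict(zip(groups.keys(), counts))
```

If the group count used here differed from the one used when the groups were created, some groups would be left unprocessed or some workers would have no input, which is why the value has to match the setup job.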
The first stage of the campaign information template process locates
the data and parses it. The transformations and tables in this stage
are described in the following table:
Locate and Parse Transformations and Tables
| Name | Description | Inputs from and Outputs to |
|---|---|---|
| LOG_PATHS table | Contains a list of folder paths to scan for clickstream logs. | To: Directory Contents transformation |
| Directory Contents transformation | Generates a data table that contains a list of the files found in the directories that are listed in the LOG_PATHS data table. The output table contains the following columns: FILENUM, a unique sequence number for the file (such as 1, 2, 3, 4); FILENAME, the name of the file; FULLNAME, the path combined with the filename. | To: Build Loop Parameters (reused SAS Extract) transformation |
| Build Loop Parameters (reused SAS Extract) transformation | Passes through the columns from the Directory Contents transformation and creates two additional columns. LIBRARYNUMBER is a number from 1 to n, where n is the number of output locations that have been defined on the file system for the first loop (the Clickstream Parse transformation). This value ensures that, when the jobs run in parallel, their output is spread across the different folders. PARSEOUTMEMBER uses the incoming FILENUM value to create a unique suffix for the parse output tables. This ensures that when two streams use the same folder, the output from one does not overwrite the output from the other. | From: Directory Contents transformation. To: Set Output Library Locations (reused Lookup) transformation |
| PARSE_GRID_PATHS table | Contains a list of paths to folders where the outputs from multiple Clickstream Parse transformation calls are distributed. The paths specified in this table are accessed simultaneously by parallel processes. To optimize performance, specify paths that reside on different physical disks or network locations. | To: Set Output Library (reused SAS Extract) transformation |
| Set Output Library (reused SAS Extract) transformation | Uses the output library locations that are listed in the PARSE_GRID_PATHS configuration table. This transformation uses the LIBRARYNUMBER column to associate each log file with an output location (PARMLIBPATH) and an output LIBNAME (PARMLIBNAME). These values provide a different input file and output library for each iteration of the loop that follows. | From: Build Loop Parameters (reused SAS Extract) transformation and PARSE_GRID_PATHS table. To: Loop 1 (Recognize and Parse) transformation |
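The flow of columns through the stage described in the table can be sketched in Python. This is an illustration under stated assumptions, not the code that the transformations generate: the function names are hypothetical, the round-robin rule for LIBRARYNUMBER is assumed (the table says only that the value runs from 1 to n), and the `PARSEOUT_` and `PARSE` prefixes are invented naming schemes. Only the column names come from the table.

```python
from pathlib import Path


def directory_contents(log_paths):
    """List the files in each folder, one row per file."""
    rows = []
    for folder in log_paths:
        for path in sorted(Path(folder).iterdir()):
            if path.is_file():
                rows.append({
                    "FILENUM": len(rows) + 1,   # unique sequence number
                    "FILENAME": path.name,      # name of the file
                    "FULLNAME": str(path),      # path plus filename
                })
    return rows


def build_loop_parameters(rows, n_locations):
    """Pass the columns through and add LIBRARYNUMBER and PARSEOUTMEMBER."""
    out = []
    for row in rows:
        out.append({
            **row,
            # Assumption: round-robin over 1..n spreads files across folders.
            "LIBRARYNUMBER": (row["FILENUM"] - 1) % n_locations + 1,
            # Assumption: hypothetical prefix; FILENUM makes the suffix unique.
            "PARSEOUTMEMBER": f"PARSEOUT_{row['FILENUM']}",
        })
    return out


def set_output_library(rows, parse_grid_paths):
    """Look up each row's output location by its LIBRARYNUMBER."""
    out = []
    for row in rows:
        n = row["LIBRARYNUMBER"]
        out.append({
            **row,
            "PARMLIBPATH": parse_grid_paths[n - 1],
            "PARMLIBNAME": f"PARSE{n}",  # Assumption: hypothetical libref name
        })
    return out
```

Each resulting row carries one input file (FULLNAME) and one output library (PARMLIBPATH, PARMLIBNAME), which is what one iteration of the following loop needs. Because PARSEOUTMEMBER is derived from FILENUM, two files that land in the same folder still write to different tables.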
The following display
shows the locate and parse data stage of the template job.
Locate and Parse Process Flow