Collecting Campaign Information in a Customer Integration Job

Problem

You want to process the data contained in one or more clickstream logs in a single job while filling the marketing campaign information into all records for each job. The campaign information values that you need to capture are the Response Tracking Code (RTC) and the Subject ID (Sn) fields.

Solution

You can collect campaign information by using the Customer Integration Template job. This template is virtually identical to the Basic (Multiple) Web Log Template. It differs in three related ways: new columns have been added to contain campaign information, Clickstream Parse rules have been added to extract the values from the raw Web log, and the Fill Column options are used to copy the values for these new columns into all records for a session.
If you have not done so already, you should run a copy of the setup job for the Customer Integration Template job, which is named clk_0010_setup_basic_ci_job. When you actually process the data, you should run a copy of the Customer Integration Template job, which is named clk_0200_load_ci_dds. By running a copy, you protect the original template. For information about running the setup job and creating a copy of the original job, see Copying the Customer Integration Template Folder.
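To make the extraction step concrete, the following Python sketch shows the general idea of pulling campaign values out of the query string of a requested URL. It is only an illustration, not the Clickstream Parse rule syntax. The query parameter names (rtc, sid) and the column names are hypothetical; the actual names depend on how your campaign links are generated and on the parse rules defined in your job.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical mapping of query parameter names to output column names.
# Neither side of this mapping comes from the template job itself.
CAMPAIGN_PARAMS = {"rtc": "response_tracking_code", "sid": "subject_id"}


def extract_campaign_values(requested_url):
    """Return the campaign values found in a requested URL's query string."""
    query = parse_qs(urlsplit(requested_url).query)
    values = {}
    for param, column in CAMPAIGN_PARAMS.items():
        found = query.get(param)
        # Keep the first occurrence; leave the column empty when absent.
        values[column] = found[0] if found else None
    return values


if __name__ == "__main__":
    print(extract_campaign_values("/landing.html?rtc=ABC123&sid=42"))
    print(extract_campaign_values("/products/index.html"))
```

Most records in a session carry no campaign parameters at all, which is why the values found on one record have to be filled into the other records for that session.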

Tasks

Review and Prepare the Job

You can examine the Customer Integration Template job on the Diagram tab of the SAS Data Integration Studio Job Editor before you run it. You can also configure the job to change the list of logs that you process, set the number of groups that are used in the sessionizing loop, and specify parallel and multiple processing options.
Perform the following steps to make these adjustments:
  1. Open your renamed copy of the Customer Integration Template job.
  2. Scroll through the job on the Diagram tab.
    Note the following components:
    • the two loops and the connections between them
    • the transformations that prepare the clickstream logs and groups for loop processing
    • the output table that collects the results from the job
    For information about how the job is processed, see About the Customer Integration Template Job.
  3. Right-click the Log_Paths table and select Open from the pop-up menu. Review the list of log paths contained in the table. If you need to modify this list, click the Switch to edit mode icon in the toolbar and make any needed changes.
  4. Open the Loop Options tabs in the properties windows for the two Loop transformations and make sure that the appropriate parallel processing settings are specified. Be particularly careful to ensure that the path specified in the Location on host for log and output files field is correct.
    For information about the prerequisites for parallel processing, see the “About Parallel Processing” topic in the Working with Iterative Jobs and Parallel Processing chapter in the SAS Data Integration Studio: User's Guide. The job fails if parallel processing has been enabled but the parallel processing prerequisites have not been satisfied.
  5. Open the Parameters tab in the properties window for the template job and review the two parameters Number of Distinct Clickstream Parse Output Paths and Number of Groups into which data should be divided for the job. To review a value, select the parameter and click Edit to open the Edit Prompt window. Then, click Prompt Type and Values to see the value specified in the Default value field. Click OK as necessary to close the dialog boxes and return to the Diagram tab.
    Note: The values of these parameters must match the values entered for the setup job. The setup job values are entered on the Options tab in the properties window for the Setup transformation in the setup job. If you change either of these values in the template job, you need to rerun the setup job to make sure that the settings match and that the supporting file system structure is generated.

Set Campaign Information Options

Perform the following steps to set options that enable you to capture campaign information:
  1. Open the properties window of the Clickstream Sessionize transformation.
  2. Review the Forward fill columns and Complete fill columns options to verify that they are set appropriately for your needs.
  3. Click OK to save the option settings and close the properties window.
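The difference between the two fill options can be easier to see in a small example. The following Python sketch illustrates one plausible reading of the two behaviors for the records of a single session, in time order: a forward fill carries a value from the record where it appears to later records, and a complete fill copies it to every record in the session. This is an assumption made for illustration, not the transformation's actual implementation. The column name rtc and the sample records are hypothetical.

```python
def forward_fill(records, column):
    """Carry the last seen non-empty value forward to later records."""
    filled, last = [], None
    for record in records:
        value = record.get(column)
        if value is None:
            value = last      # reuse the most recent value, if any
        else:
            last = value      # remember this value for later records
        filled.append({**record, column: value})
    return filled


def complete_fill(records, column):
    """Copy the first non-empty value to every record that lacks one."""
    present = [r[column] for r in records if r.get(column) is not None]
    if not present:
        return [dict(r) for r in records]
    return [
        {**r, column: r[column] if r.get(column) is not None else present[0]}
        for r in records
    ]


if __name__ == "__main__":
    # Hypothetical session: the campaign code appears only on the second click.
    session = [
        {"page": "/home", "rtc": None},
        {"page": "/landing", "rtc": "ABC123"},
        {"page": "/checkout", "rtc": None},
    ]
    print([r["rtc"] for r in forward_fill(session, "rtc")])
    print([r["rtc"] for r in complete_fill(session, "rtc")])
```

Under this reading, a forward fill leaves the first record empty because the value has not been seen yet, whereas a complete fill populates all three records.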

Run the Job and Examine the Output

Perform the following steps to run a Customer Integration Template job and examine its output:
  1. Run the job.
    The following display shows a successfully completed sample job.
    Completed Customer Integration Template Job
  2. If the job completes without error, right-click the CI_DDS_OUTPUT table at the end of the job and select Open from the pop-up menu.
    The View Data window appears, as shown in the following display.
    Customer Integration Template Job Output
    The campaign-specific fields are found at the end of the field list as shown in the following display.
    Campaign-Specific Fields
    If the job does not complete successfully, examine the logs for each loop in the job. Because most of the processing is done in the loop portion of the job, this is where most errors occur. Examine the Status tab to determine where the error occurred and refer to the log for that part of the job. A SAS log is saved for each pass through the loops in the Customer Integration Template job. These logs are placed in a folder called ProcessLogs under the Loop1 and Loop2 folders in the structure that is created by the template setup job.
    To find the right file, you need to understand the naming convention for these log files. The files in the ProcessLogs folder are named Lnn_x.log, where nn is a unique number for this particular Loop transformation and x is a number that represents the iteration of the current loop. For example, if you process 200 Web logs, then the ProcessLogs folder for Loop1 (Clickstream Log transformation and Clickstream Parse transformation) contains 200 logs named Lnn_1.log to Lnn_200.log (where nn is some constant number).
    The ProcessLogs folder for Loop2 (Clickstream Sessionize transformation) has the same naming convention. However, the log folder for Loop2 contains one log for each group. For example, if the Clickstream Parse transformation in the first loop generated five groups, then the logs are named Lnn_1.log to Lnn_5.log (where nn is a constant number).
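The naming convention described above can be sketched in Python as follows. This is a minimal illustration that assumes only the Lnn_x.log pattern stated in this topic. The loop number "12" used in the example is hypothetical, and nn is kept as text because its width is not specified here.

```python
import re

# Lnn_x.log: nn identifies the Loop transformation, x is the iteration.
LOG_NAME = re.compile(r"^L(\d+)_(\d+)\.log$")


def parse_log_name(filename):
    """Return (loop_number, iteration) for a loop log name, else None."""
    match = LOG_NAME.match(filename)
    if match is None:
        return None
    # Keep nn as text to preserve any leading zeros.
    return match.group(1), int(match.group(2))


def expected_log_names(loop_number, iterations):
    """List the log names expected for a loop that ran `iterations` passes."""
    return [f"L{loop_number}_{i}.log" for i in range(1, iterations + 1)]


if __name__ == "__main__":
    # Hypothetical loop number "12" with five groups in Loop2.
    print(expected_log_names("12", 5))
    print(parse_log_name("L12_200.log"))
```

Comparing the expected names with the files actually present in a ProcessLogs folder is one way to spot an iteration that did not run or did not write a log.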