Managing Subsite Flow Segments

Problem

The subsite template job contains transformations that isolate three separate subsites from the clickstream log. Your own data, however, will often contain more or fewer subsites. Even if you need to process exactly three subsites, they are unlikely to be named PRD, SVCS, and GEN. Fortunately, you can add, delete, and modify the subsite flow segments in your jobs.

Solution

Tasks

Adding Subsite Flow Segments

Perform the following steps to add one or more subsite flow segments to your job:
  1. Find the folders for the subsite job in the Folders pane on the SAS Data Integration Studio desktop. Then, add an Additional Output folder and a Permanent Library folder under the Data Sources folder, which is located within the Sub Site Template folder. If you are adding a segment to locate a techsupp subsite, you might call these folders Additional Output TECHSUPP and Permanent Library TECHSUPP. The names that you choose are not used during the processing of the job. They serve only as a visual cue to help you edit the job.
  2. Add a Clickstream Parse transformation to the job flow on the Diagram tab of the Job Editor window.
  3. Connect the temporary output table port of the Clickstream Parse - Global transformation to the input port of the Clickstream Parse transformation that you just added.
  4. Open the General tab in the properties window for the Clickstream Parse transformation. Then, rename the transformation to document the subsite that you need to add. For example, you can rename the transformation to Clickstream Parse - TECHSUPP if you are adding a techsupp subsite.
  5. Click Input Mapping. Then, click Map all columns in the toolbar.
  6. Click Rules. Disable the Filter graphics files, Filter non-pages, and Filter spiders by user agent rules, which are enabled by default. To disable the rules, click No in the drop-down menu in the Enable column for those rows.
  7. Right-click a blank space on the Rules tab and click New in the pop-up menu. A row is added to the table.
  8. Enter the following values for the columns in the new row:
    • Enable: Yes (via drop-down menu)
    • Group: Subsite
    • Name: Filter by subsite
    • When: After Input (via drop-down menu)
    • Condition Type: SAS expression
    • Action Type: Delete
  9. Right-click the row for the Filter by subsite rule. Then, click Properties in the pop-up menu to access the Rules Properties window.
  10. Select the SAS expression radio button and click Build to access the SAS Expression Builder window. Modify a SAS expression from one of the other subsite flow segments to isolate the data from the desired subsite. For example, you can modify (CLK_cs_URI_Stem,'/prd','ti') from the PRD segment to (CLK_cs_URI_Stem,'/techsupp','ti') for the TECHSUPP segment.
    Note: This code is an example of how you can create and modify rules to subset the data records for a subsite. You can also use multiple rules if you need them to obtain the desired result.
  11. Close the row and transformation properties windows to return to the process flow.
  12. Click Control Flow in the Details panel of the Job Editor window. Drag the subsite parse transformation for the new subsite to the position after the Checkpoint - Can we sessionize subsite data? transformation for the previous subsite in the flow. In the techsupp example, Checkpoint - Can we sessionize GEN subsite data? precedes Clickstream Parse - TECHSUPP.
  13. Add a Return Code Check transformation to the job flow. Open the General tab in the properties window for the transformation and rename the transformation to match the parse transformation for the subsite that you just added. For example, if you are adding a techsupp subsite, you might rename the transformation to Checkpoint - Can we parse TECHSUPP subsite data?
  14. Click Status Handling. Then, click the Send Email action and click Action Options to open the Action Options window.
  15. Specify an e-mail address in the Value column for the e-mail address option. If the step fails when the subsite job is run, an e-mail message is sent to the specified address. Then, close the windows.
  16. Click Control Flow in the Details panel of the Job Editor window. Drag the checkpoint transformation for the new subsite to the position after the Clickstream Parse - TECHSUPP transformation.
  17. Add a Clickstream Sessionize transformation to the job flow. Open the General tab in the properties window for the transformation and rename the transformation to match the subsite that you are adding. For example, you might rename the transformation to Clickstream Sessionize - TECHSUPP.
  18. Click Options and click Tables in the list at the left side of the tab.
  19. Click Browse adjacent to the Additional output library field to locate the output library for the subsite. The path and name for the techsupp subsite are /Shared Data/Sub Site Template/Data Sources/Additional Output TECHSUPP/Additional Output TECHSUPP(Library).
  20. Click Browse adjacent to the Permanent library path field to locate the permanent library for the subsite. The path and name for the techsupp subsite are /Shared Data/Sub Site Template/Data Sources/Permanent Library TECHSUPP/Permanent Library TECHSUPP(Library).
  21. Click Control Flow in the Details panel of the Job Editor window. Drag the sessionize transformation for the new subsite to the position after the checkpoint transformation for the parse transformation. Then, drag from the temporary output table port of the parse transformation to the input port of the sessionize transformation to connect them. The sessionize transformation now has inputs from both the checkpoint transformation and the parse transformation.
  22. Right-click the temporary output table port for the sessionize transformation, and select the output table for the subsite flow segment from the Table Selector window.
  23. Add another Return Code Check transformation to the job flow. Open the General tab in the properties window for the transformation and rename the transformation to match the sessionize transformation for the subsite that you just added. For example, if you are adding a techsupp subsite, rename the transformation to Checkpoint - Can we sessionize TECHSUPP subsite data?
  24. Click Status Handling. Click the Send Email action, and then click Action Options.
  25. Specify an e-mail address in the Value column for the e-mail address option. If the step fails when the subsite job is run, an e-mail message is sent to the specified address. Then, close the properties windows.
  26. Click Control Flow in the Details panel of the Job Editor window. Drag the checkpoint transformation for the new subsite to the position after the sessionize transformation.
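
The control flow that these steps build for a new subsite (parse, checkpoint, sessionize, checkpoint) can be pictured with a short Python sketch. It is an illustration only and is not code that SAS Data Integration Studio generates. The function names, the convention that a return code of 0 means success, the e-mail address, and the assumption that a failed checkpoint stops the rest of the segment are all hypothetical.

```python
def run_segment(steps, notify):
    """Run (name, step) pairs in order, checking the return code after each one.

    Each step is a callable that returns 0 on success (assumed convention).
    On the first nonzero return code, notify() is called and the segment stops.
    Returns the names of the steps that completed successfully.
    """
    completed = []
    for name, step in steps:
        rc = step()
        if rc != 0:
            notify(f"Step failed: {name} (return code {rc})")
            break
        completed.append(name)
    return completed


sent = []  # stands in for the Send Email status-handling action


def send_email(message, address="admin@example.com"):  # hypothetical address
    sent.append((address, message))


techsupp_segment = [
    ("Clickstream Parse - TECHSUPP", lambda: 0),
    ("Clickstream Sessionize - TECHSUPP", lambda: 4),  # simulated failure
]

completed = run_segment(techsupp_segment, send_email)
```

In this sketch the parse step completes, the simulated sessionize failure triggers one notification, and nothing after the failed step runs.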

Deleting Existing Subsite Flow Segments

Perform the following steps to delete existing subsite flow segments:
  1. Select the transformations and tables that make up the subsite flow segment that you need to delete.
  2. Right-click one of the selected objects and click Delete in the pop-up menu.

Modifying Existing Subsite Flow Segments

Perform the following steps to change the set of subsites isolated during the job:
  1. Open the Rules tab in the properties window for the Clickstream Parse subsite transformation that you need to modify. Note that only the Filter by subsite rule is enabled.
  2. Right-click the row for the Filter by subsite rule. Then, click Properties in the pop-up menu to access the Rule Properties window.
  3. Modify the value in the Expression field under the SAS expression radio button to find the subsite that you need. For example, the expression (CLK_cs_URI_Stem,'/prd','ti') locates a subsite identified by PRD. Because the action specified for the rule is Delete, the rule isolates the subsite by filtering out all other data in the clickstream log. You can also add any additional rules that you need to control the filtering of data from the output table for this subsite.
  4. Close the properties windows.
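
The net effect of a Filter by subsite rule with the Delete action can be sketched outside the product. The following Python sketch is an illustration only and is not the code that the Clickstream Parse transformation runs. It assumes that the rule ends up deleting every record whose URI stem does not contain the subsite path, and that the 'ti' argument means the comparison trims trailing blanks and ignores case. The function names and the sample records are hypothetical.

```python
def matches_subsite(uri_stem, subsite_path):
    """Return True when uri_stem contains subsite_path.

    Trailing blanks are trimmed and case is ignored (assumed meaning of 'ti').
    """
    return subsite_path.rstrip().lower() in uri_stem.rstrip().lower()


def filter_by_subsite(records, subsite_path):
    """Keep only the records for one subsite; every other record is deleted."""
    return [r for r in records
            if matches_subsite(r["CLK_cs_URI_Stem"], subsite_path)]


# Hypothetical clickstream records; only the URI stem column is shown.
log = [
    {"CLK_cs_URI_Stem": "/prd/catalog/index.html"},
    {"CLK_cs_URI_Stem": "/TechSupp/faq.html   "},
    {"CLK_cs_URI_Stem": "/svcs/contact.html"},
    {"CLK_cs_URI_Stem": "/techsupp/downloads/"},
]

techsupp = filter_by_subsite(log, "/techsupp")  # keeps the two techsupp records
prd = filter_by_subsite(log, "/prd")            # keeps the single prd record
```

Because this sketch uses a plain substring test, a path such as '/prd' would also match a URI stem that contains that text deeper in the path. That is one situation in which additional rules can help narrow the output for a subsite.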