Examining Web Log Data

Building the Process Flow Diagram

This example uses the same diagram workspace that you created in Chapter 2. You can create a new diagram for this example instead, but instructions for doing so are not provided here. First, add the SAMPSIO.WEBPATH data source to your project.
  1. In the Project Panel, right-click Data Sources and click Create Data Source.
  2. In the Data Source Wizard — Metadata Source window, click Next.
  3. In the Data Source Wizard — Select a SAS Table window, enter SAMPSIO.WEBPATH in the Table field. Click Next.
  4. In the Data Source Wizard — Table Information window, click Next.
  5. In the Data Source Wizard — Metadata Advisor Options window, click Advanced. Click Next.
  6. In the Data Source Wizard — Column Metadata window, make the following changes:
    • For the variable REFERRER, set the Role to Input.
    • For the variable REQUESTED_FILE, set the Role to Target.
    • For the variable SESSION_ID, set the Role to ID.
    • For the variable SESSION_SEQUENCE, set the Role to Sequence.
    Click Next.
  7. In the Data Source Wizard — Decision Configuration window, click Next.
  8. In the Data Source Wizard — Create Sample window, click Next.
  9. In the Data Source Wizard — Data Source Attributes window, set the Role of the data source to Transaction. Click Next.
  10. In the Data Source Wizard — Summary window, click Finish.
In the Project Panel, drag the WEBPATH data source to your diagram workspace. From the Explore tab, drag a Link Analysis node to your diagram workspace. Connect the WEBPATH data source to the Link Analysis node.
Example PFD
Select the Link Analysis node. Set the value of the Association Support Type property to Count. Set the value of the Association Support Count property to 1. This ensures that the Link Analysis node captures all paths, including those of visitors who requested just a single page. Set the Minimum Confidence (%) property to 50.
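To make the support count and minimum confidence settings more concrete, here is a minimal Python sketch. It is not the Link Analysis node's implementation. It assumes one common definition of the two measures: the support count of a link is the number of sessions that contain it, and the confidence is that count as a percentage of the sessions that contain the starting page. The session data and the function name link_rules are hypothetical and are not taken from SAMPSIO.WEBPATH.

```python
from collections import Counter

# Hypothetical sessions (not rows from SAMPSIO.WEBPATH): each list holds the
# pages requested in one session, in the order they were requested.
sessions = {
    "s1": ["/Cart.jsp", "/Shipping.jsp", "/Billing.jsp"],
    "s2": ["/Cart.jsp", "/Shipping.jsp"],
    "s3": ["/Cart.jsp", "/Summary.jsp"],
    "s4": ["/Home.jsp"],  # a visitor who requested just a single page
}


def link_rules(sessions, min_count=1, min_confidence=50.0):
    """Keep links whose support count and confidence meet the thresholds.

    Assumed definitions (illustrative only):
      support count = number of sessions containing the link src -> dst
      confidence    = 100 * support count / number of sessions containing src
    """
    page_sessions = Counter()  # sessions that contain each page
    link_sessions = Counter()  # sessions that contain each consecutive link
    for pages in sessions.values():
        page_sessions.update(set(pages))
        link_sessions.update(set(zip(pages, pages[1:])))

    rules = {}
    for (src, dst), count in link_sessions.items():
        confidence = 100.0 * count / page_sessions[src]
        if count >= min_count and confidence >= min_confidence:
            rules[(src, dst)] = (count, confidence)
    return rules


rules = link_rules(sessions, min_count=1, min_confidence=50.0)
for (src, dst), (count, confidence) in sorted(rules.items()):
    print(f"{src} -> {dst}: count={count}, confidence={confidence:.1f}%")
```

In this toy data, a support count of 1 lets every observed link through the first filter, so the minimum confidence of 50 is what removes the link from /Cart.jsp to /Summary.jsp, which appears in only one of the three sessions that contain /Cart.jsp.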

Running the Link Analysis Node

In your diagram workspace, right-click the Link Analysis node and click Run. In the Confirmation window, click Yes. Click Results in the Run Status window.
Results Window
Maximize the Items Constellation Plot window. The Items Constellation Plot shows all of the links to and from each page. The arrows indicate the direction of travel. In the upper left corner, set the Node value on the drop-down menu to /Cart.jsp. This puts the /Cart.jsp node in the center of the diagram. The diagram now contains only the nodes that are connected to /Cart.jsp.
Constellation Plot
The thickness of each arrow represents the relative frequency of that link. In this example, the most frequent links appear to be from /Cart.jsp to /Confirm.jsp, /Summary.jsp, /Billing.jsp, and /Shipping.jsp.
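The following Python sketch illustrates one way that directed links and their relative frequencies could be derived from transaction rows that have an ID, a sequence number, and a requested page. It is an illustration under stated assumptions, not the calculation that the Link Analysis node performs. The rows and the function name directed_link_counts are hypothetical, and scaling each count by the largest count is only an assumed stand-in for arrow thickness.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

# Hypothetical rows shaped like (SESSION_ID, SESSION_SEQUENCE, REQUESTED_FILE).
# These values are made up for illustration and may arrive in any order.
rows = [
    ("s1", 1, "/Cart.jsp"), ("s1", 2, "/Confirm.jsp"),
    ("s2", 2, "/Confirm.jsp"), ("s2", 1, "/Cart.jsp"),
    ("s3", 1, "/Cart.jsp"), ("s3", 2, "/Billing.jsp"),
]


def directed_link_counts(rows):
    """Count how often each page is followed directly by another page."""
    counts = Counter()
    # Sort by session, then by sequence, so each session's pages are in order.
    ordered = sorted(rows, key=itemgetter(0, 1))
    for _, group in groupby(ordered, key=itemgetter(0)):
        pages = [page for _, _, page in group]
        counts.update(zip(pages, pages[1:]))
    return counts


counts = directed_link_counts(rows)

# Scale each count by the largest count to get a relative frequency in (0, 1].
largest = max(counts.values())
relative = {link: n / largest for link, n in counts.items()}

for (src, dst), weight in sorted(relative.items()):
    print(f"{src} -> {dst}: relative frequency {weight:.2f}")
```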
Because SESSION_SEQUENCE is a sequence variable, these links are directed. If you set the role of SESSION_SEQUENCE to Rejected, then the links are undirected. Undirected links do not distinguish the starting page from the ending page. Instead, travel in either direction between two pages is treated as the same path. For comparison, the undirected Items Constellation Plot is shown below.
Constellation Plot
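To picture the difference between directed and undirected links, consider this minimal Python sketch. It assumes that an undirected link simply combines the counts of both directions between two pages. That assumption is made for illustration and is not confirmed as the method that the Link Analysis node uses. The counts and the function name to_undirected are hypothetical.

```python
from collections import Counter

# Hypothetical directed link counts, keyed by (from page, to page).
directed = Counter({
    ("/Cart.jsp", "/Shipping.jsp"): 3,
    ("/Shipping.jsp", "/Cart.jsp"): 1,
    ("/Cart.jsp", "/Billing.jsp"): 2,
})


def to_undirected(directed):
    """Merge both directions of travel between two pages into one link."""
    undirected = Counter()
    for (src, dst), n in directed.items():
        # A frozenset ignores order, so (A, B) and (B, A) share one key.
        undirected[frozenset((src, dst))] += n
    return undirected


undirected = to_undirected(directed)
for pair, n in sorted(undirected.items(), key=lambda item: sorted(item[0])):
    print(" -- ".join(sorted(pair)), n)
```

In the directed counts, travel from /Cart.jsp to /Shipping.jsp and travel back are separate links. After merging, they become a single link whose count is the sum of the two.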
In addition to creating undirected links, the Link Analysis node performed item-cluster detection because there was no sequence variable. The color of each node in the Items Constellation Plot indicates the cluster that the node belongs to. Minimize the Items Constellation Plot.
Maximize the Node Frequency Histogram (by Item-cluster) window.
Node Frequency Histogram
Notice that the first cluster contains the page /Cart.jsp. This suggests that the other pages in the first cluster are closely linked to /Cart.jsp.
Close the Results window.