About the Subsite Template Job

The subsite template job enables you to identify one or more subsites within a Web log. Then, you can identify and session the data for only those subsites that you need to analyze. All other data in the Web log is filtered out.
Subsites are commonly identified in the following ways:
  • subsite specification via URI: http://www.abc.com/marketing and http://www.abc.com/techsupp, where the organization identifies the subsites with part of the URI (marketing and techsupp in these URIs)
  • subsite specification via subdomains: http://mkt.abc.com and http://ts.abc.com, where mkt and ts are considered subdomains of the abc.com domain
  • user-defined subsites, where the user identifies subsites by applying a user-defined algorithm to the Web log data (such as the information stored in a cookie string)
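The first two identification methods can be illustrated with a minimal Python sketch. This is not how the Clickstream Parse transformations are implemented. The function names, the default abc.com domain, and the choice to ignore the www host label are assumptions made for illustration only.

```python
from urllib.parse import urlsplit


def subsite_from_path(url):
    """Return the first path segment of the URI (for example, 'marketing'), or None."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return segments[0] if segments else None


def subsite_from_subdomain(url, domain="abc.com", ignore=("www",)):
    """Return the host label in front of the domain (for example, 'mkt'), or None.

    The 'ignore' tuple is an assumption: it treats www as the main site
    rather than as a subsite.
    """
    host = (urlsplit(url).hostname or "").lower()
    suffix = "." + domain
    if not host.endswith(suffix):
        return None
    label = host[: -len(suffix)]
    if not label or label in ignore:
        return None
    return label
```

With these helpers, http://www.abc.com/marketing maps to marketing by its path, and http://mkt.abc.com maps to mkt by its subdomain. A user-defined method would replace either function with custom logic, such as a lookup against a cookie value.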
The subsite job uses the following Clickstream Parse transformations that are configured for specialized purposes and renamed accordingly:
  • Clickstream Parse - Global Rules, which filters out superfluous data such as graphic files, non-pages, and spiders that identify themselves in their user agent string
  • Clickstream Parse - Subsite, which isolates subsites
  • Clickstream Parse - ALL, which generates output that includes the content from all subsites
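The division of work among these three transformations can be sketched in Python as follows. This is a conceptual illustration only, not the product's implementation. The record layout (a dictionary with path and user_agent keys), the list of graphic file extensions, the spider markers, and the use of the first path segment as the subsite name are all assumptions.

```python
# Assumed examples of graphic file extensions and spider markers.
GRAPHIC_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".ico")
SPIDER_MARKERS = ("bot", "spider", "crawler")


def passes_global_rules(record):
    """Global rules step: drop graphic files and self-identified spiders."""
    if record["path"].lower().endswith(GRAPHIC_EXTENSIONS):
        return False
    agent = record["user_agent"].lower()
    return not any(marker in agent for marker in SPIDER_MARKERS)


def split_by_subsite(records, wanted):
    """Return (per_subsite, all_records) for the records that pass the global rules.

    per_subsite holds one list per requested subsite (the Subsite step);
    all_records holds every surviving record (the ALL step).
    """
    per_subsite = {name: [] for name in wanted}
    all_records = []
    for record in records:
        if not passes_global_rules(record):
            continue
        all_records.append(record)
        # Assumption: the subsite is the first segment of the URI path.
        segments = [s for s in record["path"].split("/") if s]
        subsite = segments[0] if segments else None
        if subsite in per_subsite:
            per_subsite[subsite].append(record)
    return per_subsite, all_records
```

In this sketch, each per-subsite list and the combined list would then be passed on separately for sessionizing.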
The subsite job also includes a series of Clickstream Sessionize transformations to split the sets of data into sessions.