The subsite template job enables you to identify one or more subsites
within a Web log. You can then isolate and sessionize the data for only
those subsites that you need to analyze. All other data in the Web log
is filtered out.
Subsites are commonly identified in the following ways:
- Subsite specification via URI: http://www.abc.com/marketing and
  http://www.abc.com/techsupp, where the organization identifies the
  subsites with part of the URI (marketing and techsupp in these URIs)
- Subsite specification via subdomains: http://mkt.abc.com and
  http://ts.abc.com, where mkt and ts are considered subdomains of the
  abc.com domain
- User-defined subsites, where the user identifies subsites by applying
  a user-defined algorithm to the Web log data (such as the information
  stored in a cookie string)
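The three identification approaches above can be illustrated with a minimal Python sketch. This is not the logic of the subsite job itself, only an illustration under stated assumptions: the function names, the `KNOWN_PATH_SUBSITES` and `KNOWN_HOST_SUBSITES` sets, and the `subsite` cookie key are hypothetical, and the subsite names are taken from the example URLs.

```python
from urllib.parse import urlsplit

# Hypothetical configuration based on the example URLs in the text.
BASE_DOMAIN = "abc.com"
KNOWN_PATH_SUBSITES = {"marketing", "techsupp"}
KNOWN_HOST_SUBSITES = {"mkt", "ts"}


def subsite_from_path(url):
    """Subsite via URI: use the first path segment if it is a known subsite."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if segments and segments[0].lower() in KNOWN_PATH_SUBSITES:
        return segments[0].lower()
    return None


def subsite_from_host(url):
    """Subsite via subdomain: use the label in front of the base domain."""
    host = (urlsplit(url).hostname or "").lower()
    suffix = "." + BASE_DOMAIN
    if host.endswith(suffix):
        label = host[: -len(suffix)]
        if label in KNOWN_HOST_SUBSITES:
            return label
    return None


def subsite_from_cookie(cookie_string, key="subsite"):
    """User-defined rule (assumed): read the subsite from a cookie value."""
    for part in cookie_string.split(";"):
        name, _, value = part.strip().partition("=")
        if name == key and value:
            return value
    return None
```

A record for which the chosen function returns `None` belongs to no subsite of interest and would be filtered out.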
The subsite job uses the following Clickstream Parse transformations,
which are configured for specialized purposes and renamed accordingly:
- Clickstream Parse - Global Rules, which filters out superfluous data
  such as graphic files, non-pages, and spiders that identify themselves
  in their user agent string
- Clickstream Parse - Subsite, which isolates subsites
- Clickstream Parse - ALL, which generates output that includes the
  content from all subsites
The subsite job also includes a series of Clickstream
Sessionize transformations to split the sets of data into sessions.
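As a rough illustration of the filtering and sessionizing stages, the following Python sketch drops superfluous records and then splits the remaining records into sessions. It does not reproduce the Clickstream Parse or Clickstream Sessionize transformations. The record fields, the file suffixes, the spider markers, and the 30-minute inactivity timeout are all assumptions made for the example.

```python
from datetime import timedelta

# Assumed rules; the actual global rules are configured in the job.
STATIC_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")
SPIDER_MARKERS = ("bot", "spider", "crawler")
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold


def passes_global_rules(record):
    """Reject graphic files and self-identified spiders."""
    if record["path"].lower().endswith(STATIC_SUFFIXES):
        return False
    agent = record["user_agent"].lower()
    return not any(marker in agent for marker in SPIDER_MARKERS)


def sessionize(records):
    """Label each record with a per-visitor session ID.

    A new session starts when a visitor is first seen or when the gap
    since that visitor's previous record exceeds SESSION_TIMEOUT.
    """
    last_seen = {}
    counters = {}
    labeled = []
    for rec in sorted(records, key=lambda r: (r["visitor"], r["time"])):
        visitor = rec["visitor"]
        previous = last_seen.get(visitor)
        if previous is None or rec["time"] - previous > SESSION_TIMEOUT:
            counters[visitor] = counters.get(visitor, 0) + 1
        last_seen[visitor] = rec["time"]
        labeled.append({**rec, "session": f"{visitor}-{counters[visitor]}"})
    return labeled
```

Each record here is a dictionary with `visitor`, `time` (a `datetime`), `path`, and `user_agent` keys. Filtering before sessionizing keeps requests for graphics and spider traffic from extending or creating sessions.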