Spiders, robots,
crawlers, pingers, and any other computer program that might generate
traffic to a Web site are referred to as non-human visitors (NHV).
Spiders (search engine bots, for example) surf the Web site, following
its links to determine the contents of all of the Web pages. Spiders
and other NHVs have behavioral characteristics that make it
possible to identify them. These characteristics include clicking
at a rate faster than humanly possible or pinging at an exact interval.
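The two characteristics named above can be illustrated with a minimal Python sketch. It is not the product's detection logic. The threshold values and the function names (gaps, clicks_too_fast, pings_at_exact_interval, looks_non_human) are assumptions chosen for illustration, and the input is assumed to be a list of click times in seconds for one visitor.

```python
from statistics import median, pstdev

# Hypothetical thresholds; tune them for your own traffic.
MIN_HUMAN_GAP_SECONDS = 0.5       # shorter typical gaps look faster than a person can click
MAX_INTERVAL_JITTER = 0.05        # near-zero spread in gaps suggests a fixed-interval pinger
MIN_CLICKS_FOR_INTERVAL_TEST = 4  # too few clicks cannot establish a pattern


def gaps(timestamps):
    """Return the gaps, in seconds, between consecutive click times."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def clicks_too_fast(timestamps):
    """True if the median gap between clicks is below the human threshold."""
    g = gaps(timestamps)
    return bool(g) and median(g) < MIN_HUMAN_GAP_SECONDS


def pings_at_exact_interval(timestamps):
    """True if the gaps between clicks are almost perfectly regular."""
    if len(timestamps) < MIN_CLICKS_FOR_INTERVAL_TEST:
        return False
    return pstdev(gaps(timestamps)) <= MAX_INTERVAL_JITTER


def looks_non_human(timestamps):
    """Combine both checks into a single yes/no answer."""
    return clicks_too_fast(timestamps) or pings_at_exact_interval(timestamps)
```

For example, looks_non_human([0, 60, 120, 180, 240]) flags a visitor that requests a page exactly once a minute, while irregular click times such as [0, 7, 31, 95, 140] are not flagged.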
Activity from NHVs is handled in two locations. The first is the Clickstream
Parse transformation, which uses the Filter Spiders by User Agent rule.
This rule matches commonly known strings found in the user agent of
well-behaved NHVs that identify themselves as NHVs. By default, this
rule deletes activity for these NHVs. The purpose of this detection
is to eliminate NHV clicks as early as possible.
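The idea behind user-agent filtering can be sketched in a few lines of Python. This is an illustration only, not the Filter Spiders by User Agent rule itself. The token list, the record layout (a dictionary with a "user_agent" key), and the function names are all assumptions.

```python
# Hypothetical substrings; the strings that the actual rule matches are not reproduced here.
KNOWN_NHV_TOKENS = ("bot", "spider", "crawler", "slurp")


def is_self_identified_nhv(user_agent):
    """True if the user agent contains a token that well-behaved NHVs announce."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in KNOWN_NHV_TOKENS)


def filter_spiders(records):
    """Keep only the records whose user agent does not match a known NHV token."""
    return [r for r in records if not is_self_identified_nhv(r.get("user_agent"))]
```

Substring matching of this kind catches only NHVs that announce themselves. An NHV that sends a browser-like user agent passes through, which is why a second, behavior-based check is useful.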
The second location is the Clickstream Sessionize
transformation. The transformation uses a proprietary behavioral detection
approach that examines the behavior of the visitor within a session
and decides whether the behavior is likely to be that of a human or
a non-human visitor. This process is known as Behavioral Identification
of Non-Human Sessions (BINS), and is configured using the spider-related
options on the Clickstream Sessionize transformation. See the Help for the
Clickstream Sessionize Options tab for details about
how to configure this functionality.
If you have already filtered and removed the NHVs found by the Clickstream
Parse transformation using the rule that examines the User Agent string,
you might want to analyze the visitor behavior to ensure that none
of the remaining sessions were created by NHVs. To perform this analysis,
set the options in the Clickstream Sessionize properties window to
detect any NHVs.