Spiders, robots, crawlers, pingers, and any other computer programs
that access a Web site are referred to as non-human visitors (NHVs). A spider
(a search engine bot, for example) surfs the Web site by following its links
to determine the contents of all of the Web pages. All NHVs, including spiders,
have certain behavioral characteristics that make it possible to identify them,
such as clicking at a rate faster than humanly possible or pinging at an
exact interval.

Activity from NHVs is handled in two locations. The first is the Clickstream
Parse transformation, which uses the Filter Spiders by User Agent rule.
This rule matches commonly known strings found in the user agent of
well-behaved NHVs that identify themselves as NHVs. By default, this
rule deletes the activity for these NHVs. The purpose of this detection
is to eliminate NHV clicks as early as possible in processing.
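
The idea behind user-agent matching can be sketched in Python as follows. This is only an illustration, not the rule as it is implemented in the Clickstream Parse transformation: the marker strings, the record layout, and the function names are all assumptions made for the example.

```python
# Hypothetical marker strings. They are NOT the list that ships with the
# Filter Spiders by User Agent rule.
NHV_AGENT_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_self_identified_nhv(user_agent):
    """Return True when the user agent contains a known NHV marker."""
    agent = (user_agent or "").lower()
    return any(marker in agent for marker in NHV_AGENT_MARKERS)


def filter_spiders_by_user_agent(records):
    """Drop records from self-identified NHVs (mirrors the default delete action)."""
    return [r for r in records if not is_self_identified_nhv(r.get("user_agent"))]


# Assumed record layout: one dictionary per click.
records = [
    {"url": "/index.html",
     "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
    {"url": "/robots.txt",
     "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"},
]
kept = filter_spiders_by_user_agent(records)
```

A substring match like this catches only NHVs that announce themselves in the user agent, which is why a second, behavior-based check is still needed.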

The second location where NHV activity is handled is the Clickstream Sessionize
transformation, which uses a proprietary behavioral detection approach
that examines the behavior of the visitor within a session and decides
whether that behavior is more likely to be that of a human or of an NHV.
This process is known as Behavioral Identification of Non-Human Sessions
(BINS), and it is configured using the spider-related options on the
Clickstream Sessionize transformation. For details about how to configure
this functionality, see the Help for the Clickstream Sessionize Options tab.
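
BINS itself is proprietary, so the following Python sketch is not the BINS algorithm. It only illustrates the two behavioral signals named earlier, clicking faster than humanly possible and pinging at an exact interval. The thresholds, the input format, and the function name are hypothetical.

```python
from statistics import pstdev

# Hypothetical thresholds chosen for illustration only.
MIN_HUMAN_GAP_SECONDS = 0.5  # a mean gap shorter than this is treated as too fast
MAX_INTERVAL_JITTER = 0.05   # almost no spread in the gaps suggests a fixed-interval pinger
MIN_CLICKS = 4               # sessions with fewer clicks are not judged


def looks_non_human(click_times):
    """Flag a session from its click timestamps (seconds, sorted ascending)."""
    if len(click_times) < MIN_CLICKS:
        return False
    # Time elapsed between each pair of consecutive clicks.
    gaps = [later - earlier for earlier, later in zip(click_times, click_times[1:])]
    too_fast = sum(gaps) / len(gaps) < MIN_HUMAN_GAP_SECONDS
    fixed_interval = pstdev(gaps) < MAX_INTERVAL_JITTER
    return too_fast or fixed_interval
```

With these assumed thresholds, a session that requests a page exactly every 60 seconds is flagged because its gaps show no variation, while a session with irregular gaps of several seconds is not.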