The value entered in the Hold Buffer Size field, in the Input pane on the
Options tab of the Clickstream Parse transformation,
can have a significant effect on the performance of the transformation.
When Web servers write raw data to the logs, the records are typically
written in chronological order. The hold buffer size option represents
the amount of this data that is held in memory before it is written
to the output table.
For example, the default value of 120 causes all records
that have a timestamp within the last 120 seconds of the latest timestamp
to be held in memory. With this value, any records that have a date-and-time
stamp that is not within that 120-second range are added to the output
table. This hold buffer usually allows incoming records that
are slightly out of chronological order to be written to the output table
in the correct order. Thus, a subsequent sort of the data can generally
be avoided.
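
The buffering behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the transformation's actual implementation: the function name hold_buffer, the (timestamp, payload) record layout, and the use of a heap are all assumptions made for the example, and timestamps are assumed to be plain numbers of seconds.

```python
import heapq

def hold_buffer(records, hold_seconds=120):
    """Yield (timestamp, payload) records through a time-based hold buffer.

    Hypothetical sketch: records is an iterable of (timestamp_seconds, payload).
    """
    if hold_seconds <= 0:
        # A value of 0 switches the buffer off: records pass through unchanged.
        yield from records
        return
    heap = []      # held records, smallest timestamp first
    latest = None  # latest timestamp seen so far
    seq = 0        # tie-breaker so payloads are never compared
    for ts, payload in records:
        latest = ts if latest is None else max(latest, ts)
        heapq.heappush(heap, (ts, seq, payload))
        seq += 1
        # Release every record that has fallen outside the hold window.
        while heap and heap[0][0] < latest - hold_seconds:
            t, _, p = heapq.heappop(heap)
            yield (t, p)
    # End of input: flush whatever is still held, in timestamp order.
    while heap:
        t, _, p = heapq.heappop(heap)
        yield (t, p)

slightly_late = [(0, "a"), (10, "b"), (5, "c"), (200, "d"), (190, "e"), (400, "f")]
print([t for t, _ in hold_buffer(slightly_late)])

very_late = [(0, "a"), (300, "b"), (500, "c"), (100, "d")]
print([t for t, _ in hold_buffer(very_late)])
```

In this sketch, the first input comes out in chronological order because every late record is within 120 seconds of the latest timestamp. In the second input, the record with timestamp 100 arrives 400 seconds late, so it is written after records that have already left the buffer, and a subsequent sort would still be needed.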
However,
the default hold buffer size is not always large enough. If
you find that records in your incoming data are out of chronological
order by more than this 120-second threshold, you can increase the hold
buffer size. Note that a larger hold buffer increases the memory used by
the Clickstream Parse transformation because more data is held in
the buffer before it is sent to the output table.
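
One way to judge whether a larger value is worth trying is to measure how far out of order a sample of the log actually is. The following is a minimal sketch, assuming that the timestamps have already been extracted from the log as numbers of seconds; the function name max_lateness is hypothetical and is not part of the transformation.

```python
def max_lateness(timestamps):
    """Return the largest gap, in seconds, between a record's timestamp
    and the latest timestamp that was seen before it (0 if none is late)."""
    latest = None
    worst = 0
    for ts in timestamps:
        if latest is not None and ts < latest:
            # This record is out of order; measure how late it is.
            worst = max(worst, latest - ts)
        latest = ts if latest is None else max(latest, ts)
    return worst

sample = [0, 10, 5, 200, 190, 400]
print(max_lateness(sample))  # 10
```

Under the behavior described above, a hold buffer size at least as large as this figure would hold the late records in memory long enough to reorder them. If the figure is very large, the extra memory might cost more than a subsequent sort.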
If the
hold buffer functionality is consistently unable to prevent a sort,
it can be switched off with a value of 0.
This setting can result in a subsequent sort being required. However,
it removes some of the processing overhead that occurs in managing
the buffer.