Whether you intend to use
textual data for descriptive purposes, predictive purposes, or both,
the same processing steps take place, as shown in the following table:
| Step | Description | Tool |
|---|---|---|
| | Creates a single SAS data set from your document collection. The SAS data set is used as input for the Text Miner node and might contain the actual text or paths to the actual text. | %TMFILTER macro: a SAS macro for extracting text from documents and creating a predefined SAS data set with a text variable |
| | Decomposes textual data and generates a quantitative representation suitable for data mining purposes. | |
| Transformation (dimension reduction) | Transforms the quantitative representation into a compact and informative format. | |
| | Performs clustering, classification, prediction, or concept linking of the document collection. | Text Miner node or SAS Enterprise Miner predictive modeling nodes |
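To make the "quantitative representation" concrete, the following is a minimal sketch in Python of how a small document collection could be decomposed into a term-by-document frequency table. It is a generic illustration only and does not reproduce how the Text Miner node parses text. The function name `parse_documents`, the simple letters-only tokenizer, and the two sample sentences are assumptions made for this example.

```python
import re
from collections import Counter


def parse_documents(docs):
    """Return (vocabulary, term-by-document counts) for a list of strings.

    Hypothetical helper: lowercases each document and splits it into runs
    of ASCII letters. A real parser would also handle stemming, stop lists,
    part-of-speech tagging, and so on.
    """
    tokenized = [re.findall(r"[a-z]+", doc.lower()) for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    # One row per term, one column per document.
    matrix = [[doc_counts[term] for doc_counts in counts] for term in vocab]
    return vocab, matrix


docs = ["The cat sat.", "The dog sat down."]  # made-up sample collection
vocab, matrix = parse_documents(docs)

for term, row in zip(vocab, matrix):
    print(term, row)
```

Each row holds the frequency of one term in each document. A table of this kind is what a later dimension-reduction step would compress into a more compact form.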
Finally, the rules for clustering or prediction can be used to score a new
collection of documents at any time.
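As a rough illustration of what scoring means, the sketch below assigns new documents to previously derived clusters by comparing each document's term counts with stored cluster centroids. It is an assumption-laden example in Python, not the scoring code that SAS Enterprise Miner generates. The vocabulary, the centroid values, the cluster labels, and the function names are all hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text, vocab):
    """Count how often each vocabulary term occurs in the text."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[term] for term in vocab]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score(text, vocab, centroids):
    """Return the label of the centroid most similar to the document."""
    vec = vectorize(text, vocab)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical results saved from an earlier analysis of a training collection.
vocab = ["engine", "goal", "match", "wheel"]
centroids = {
    "cars": [1.0, 0.0, 0.0, 1.0],
    "sports": [0.0, 1.0, 1.0, 0.0],
}

new_docs = ["A late goal decided the match.", "The engine and one wheel failed."]
labels = [score(doc, vocab, centroids) for doc in new_docs]
print(labels)  # ['sports', 'cars']
```

The point of the design is that the vocabulary and centroids are fixed once the analysis is done, so new documents can be scored later without repeating the analysis.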
You might
not need to include all of these steps in your analysis, and it might
be necessary to try a different combination of text-parsing options
before you are satisfied with the results.