Whether
you intend to use textual data for descriptive purposes, predictive
purposes, or both, the same processing steps take place, as shown
in the following table:
| Step | Description | Tools |
|---|---|---|
| File preprocessing | Creates a single SAS data set from your document collection. The SAS data set is used as input for the Text Miner node or the Text Parsing node, and might contain the actual text or paths to the actual text. | %TMFILTER macro, a SAS macro that extracts text from documents and creates a predefined SAS data set with a text variable |
| Text parsing | Decomposes textual data and generates a quantitative representation suitable for data mining. | Text Miner node, Text Parsing node |
| Transformation (dimension reduction) | Transforms the quantitative representation into a compact and informative format. | Text Miner node, Text Filter node |
| Document analysis | Performs classification, prediction, or concept linking of the document collection. Creates clusters or topics from the data. | Text Miner node, Text Topic node, or SAS Enterprise Miner predictive modeling nodes |
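The parsing and transformation steps in the table can be illustrated outside SAS with a small, language-neutral sketch. The Python code below builds a term-by-document count matrix from a tiny in-memory collection and then reduces it by keeping only the terms that occur in the most documents. The tokenizer, the stop list, and the functions `parse`, `term_document_matrix`, and `reduce_terms` are hypothetical. They are not how the Text Parsing, Text Filter, or Text Miner nodes work internally. In particular, the document-frequency filter is a simple stand-in for dimension reduction, which SAS Text Miner performs with techniques such as singular value decomposition.

```python
import re
from collections import Counter

def parse(text, stop_words=frozenset()):
    """Hypothetical tokenizer: lowercase alphabetic runs, minus an optional stop list."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stop_words]

def term_document_matrix(docs, stop_words=frozenset()):
    """Return (terms, matrix) where matrix[i][j] counts term i in document j."""
    counts = [Counter(parse(d, stop_words)) for d in docs]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for c in counts] for t in terms]
    return terms, matrix

def reduce_terms(terms, matrix, k):
    """Keep the k terms that occur in the most documents (ties broken alphabetically)."""
    doc_freq = [sum(1 for x in row if x > 0) for row in matrix]
    ranked = sorted(range(len(terms)), key=lambda i: (-doc_freq[i], terms[i]))
    keep = sorted(ranked[:k])
    return [terms[i] for i in keep], [matrix[i] for i in keep]

docs = [
    "The engine failed after the oil leak",
    "Oil leak found near the engine",
    "Customer praised the friendly service",
]
terms, tdm = term_document_matrix(docs, stop_words={"the"})
small_terms, small_tdm = reduce_terms(terms, tdm, k=3)
print(small_terms)  # → ['engine', 'leak', 'oil']
```

The reduced matrix keeps 3 of the 11 parsed terms. A decomposition such as SVD would instead replace the terms with a smaller number of dense derived dimensions. Changing the stop list changes which terms survive, which is one reason to experiment with parsing options.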
Finally,
the resulting clustering rules or predictive models can be used to score a new
collection of documents at any time.
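Scoring works only if new documents are processed exactly as the training collection was. The Python sketch below fixes the vocabulary learned from a small labeled training collection, drops terms in new documents that are not in that vocabulary, and assigns each new document the label whose centroid has the highest cosine similarity. The functions `tokens`, `vectorize`, `train_centroids`, and `score` and the labels are hypothetical. Nearest-centroid classification is a simple stand-in chosen for illustration, not the modeling method used by SAS Enterprise Miner.

```python
import math
import re
from collections import Counter

def tokens(text):
    """Hypothetical tokenizer: lowercase alphabetic runs."""
    return re.findall(r"[a-z]+", text.lower())

def vectorize(text, vocab):
    """Count only terms in the fixed training vocabulary; unseen terms are dropped."""
    c = Counter(tokens(text))
    return [c[t] for t in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def train_centroids(labeled_docs):
    """labeled_docs: list of (text, label). Returns (vocab, {label: centroid})."""
    vocab = sorted({t for text, _ in labeled_docs for t in tokens(text)})
    sums, sizes = {}, Counter()
    for text, label in labeled_docs:
        acc = sums.setdefault(label, [0.0] * len(vocab))
        for i, x in enumerate(vectorize(text, vocab)):
            acc[i] += x
        sizes[label] += 1
    centroids = {lab: [x / sizes[lab] for x in acc] for lab, acc in sums.items()}
    return vocab, centroids

def score(new_docs, vocab, centroids):
    """Assign each new document the label of its most similar centroid."""
    results = []
    for text in new_docs:
        vec = vectorize(text, vocab)
        results.append(max(centroids, key=lambda lab: cosine(vec, centroids[lab])))
    return results

training = [
    ("engine oil leak", "mechanical"),
    ("engine failed to start", "mechanical"),
    ("friendly helpful service", "service"),
    ("rude service at the desk", "service"),
]
vocab, centroids = train_centroids(training)
print(score(["oil dripping from the engine", "the service was friendly"], vocab, centroids))
# → ['mechanical', 'service']
```

Because `vocab` and `centroids` are saved after training, the same `score` call can be applied to any later collection without retraining.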
You might
not need all of these steps in your analysis. You might also
need to try several combinations of text-parsing
options before you are satisfied with the results.