This chapter shows how you can create topics and rules from filtered
terms using the Text Topic node and the Text Rule Builder node.
The Text Topic node enables you to explore the document collection
by automatically associating terms and documents according to both
discovered and user-defined topics. Topics are collections of terms
that describe and characterize a main theme or idea. The goal in creating
a list of topics is to establish combinations of words that you are
interested in analyzing. Combining individual terms into topics can
improve your text mining analysis because it narrows the text that is
subject to analysis to the specific groupings of words that interest
you. For more information
about the
Text Topic node, see the SAS Text
Miner Help.
The Text Rule Builder node generates an ordered set of rules from
small subsets of terms that together are useful in describing and
predicting a target variable. Each rule in the set is associated with
a specific target category and consists of a conjunction that indicates
the presence or absence of one term or a small subset of terms (for
example, “term1” AND “term2” AND (NOT “term3”)).
A particular document matches this rule if and only if it contains
at least one occurrence of term1 and of term2 but no occurrences of
term3. This set of derived rules creates a model that is both descriptive
and predictive. To categorize a new document, the model proceeds through
the ordered set and chooses the target that is associated with the
first rule that matches that document. The rules are provided in the
syntax that can be used within SAS Content Categorization Studio
and can be deployed there. For more information about the
Text Rule Builder node, see the SAS Text Miner Help.
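The first-match behavior of an ordered rule set can be illustrated with a minimal Python sketch. This is not how the Text Rule Builder node is implemented, and it does not use the SAS Content Categorization Studio rule syntax. The names Rule and classify, the category labels, the whitespace tokenization, and the "unassigned" fallback are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """One conjunctive rule (hypothetical structure, for illustration).

    A document matches when every term in `present` occurs at least once
    and no term in `absent` occurs at all.
    """
    target: str
    present: frozenset = field(default_factory=frozenset)
    absent: frozenset = field(default_factory=frozenset)

    def matches(self, doc_terms: set) -> bool:
        return self.present <= doc_terms and self.absent.isdisjoint(doc_terms)


def classify(text: str, rules: list, default: str = "unassigned") -> str:
    """Return the target of the first rule that matches the document."""
    # Assumption: naive lowercase whitespace tokenization stands in for
    # real text parsing and term filtering.
    doc_terms = set(text.lower().split())
    for rule in rules:  # the list order is the rule order; first match wins
        if rule.matches(doc_terms):
            return rule.target
    return default


# "term1" AND "term2" AND (NOT "term3") is evaluated before the second rule.
rules = [
    Rule("category_a", frozenset({"term1", "term2"}), frozenset({"term3"})),
    Rule("category_b", frozenset({"term3"})),
]

print(classify("term1 term2 other", rules))  # category_a
print(classify("term1 term2 term3", rules))  # category_b
print(classify("nothing relevant", rules))   # unassigned
```

Because the rules are evaluated in order, moving a broad rule ahead of a narrow one changes which category a document receives, which is why the set is described as ordered.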