About the Tasks That You Will Perform

This chapter shows how to create topics and rules from filtered terms by using the Text Topic node and the Text Rule Builder node.
The Text Topic node enables you to explore the document collection by automatically associating terms and documents according to both discovered and user-defined topics. Topics are collections of terms that describe and characterize a main theme or idea. The goal in creating a list of topics is to establish combinations of words that you are interested in analyzing. The ability to combine individual terms into topics can improve your text mining analysis. By combining terms, you can narrow the text that is subject to analysis to the specific groupings of words that interest you. For more information about the Text Topic node, see the SAS Text Miner Help.
The Text Rule Builder node generates an ordered set of rules from small subsets of terms that together are useful in describing and predicting a target variable. Each rule in the set is associated with a specific target category and consists of a conjunction that indicates the presence or absence of one term or a small subset of terms (for example, “term1” AND “term2” AND (NOT “term3”)). A particular document matches this example rule if and only if it contains at least one occurrence of term1 and of term2 but no occurrences of term3. This set of derived rules creates a model that is both descriptive and predictive. When a new document is categorized, the model proceeds through the ordered set and chooses the target that is associated with the first rule that matches that document. The rules are provided in the syntax that can be used within SAS Content Categorization Studio, and can be deployed there. For more information about the Text Rule Builder node, see the SAS Text Miner Help.
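The first-match behavior described above can be illustrated with a short Python sketch. This is not SAS code and not the rule syntax of SAS Content Categorization Studio; it only mirrors the matching logic. The Rule class, the categorize function, the target names "category_a" and "category_b", and the choice to return a default value when no rule matches are all assumptions made for illustration. Each document is assumed to be already reduced to a set of its terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a conjunction of required and forbidden terms."""
    target: str                       # target category the rule predicts
    present: frozenset = frozenset()  # terms that must occur at least once
    absent: frozenset = frozenset()   # terms that must not occur at all

    def matches(self, doc_terms: set) -> bool:
        # All required terms are in the document and no forbidden term is.
        return self.present <= doc_terms and self.absent.isdisjoint(doc_terms)


def categorize(doc_terms: set, rules: list, default=None):
    """Walk the ordered rules and return the target of the first match."""
    for rule in rules:
        if rule.matches(doc_terms):
            return rule.target
    # Assumption: the source does not say what happens when nothing matches.
    return default


# Ordered rule set; the first rule is "term1" AND "term2" AND (NOT "term3").
RULES = [
    Rule("category_a", frozenset({"term1", "term2"}), frozenset({"term3"})),
    Rule("category_b", frozenset({"term4"})),
]

first = categorize({"term1", "term2", "term5"}, RULES)
second = categorize({"term1", "term2", "term3", "term4"}, RULES)
unmatched = categorize({"term9"}, RULES)
```

In this sketch, the second document contains term3, so it fails the first rule and falls through to the second rule. The order of the rules therefore matters: swapping the two rules could change which target a document receives.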