About the Text Topic Node

The Text Topic node enables you to explore the document collection by automatically associating terms and documents according to both discovered and user-defined topics. Topics are collections of terms that describe and characterize a main theme or idea. The goal in creating a list of topics is to establish combinations of words that you are interested in analyzing. Combining individual terms into topics can improve your text mining analysis because it narrows the text that is subject to analysis to the specific groupings of words that you are interested in.
For example, you might be interested in mining articles that discuss the activities of a "company president." One way to approach this task is to look at all articles that contain the term "company" and all articles that contain the term "president." The Text Topic node instead enables you to combine the terms "company" and "president" into the topic "company president."
This approach differs from clustering. Clustering assigns each document to exactly one group, whereas the Text Topic node assigns each document and each term a score for every topic. Thresholds are then used to determine whether the association is strong enough for the document or term to be considered part of the topic. As a result, documents and terms can belong to more than one topic or to none at all. The number of topics that you request should be directly related to the size of the document collection (for example, a large number for a large collection).
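The thresholding idea described above can be illustrated with a minimal Python sketch. This is not how SAS Text Miner computes topic scores. The topic names, term weights, cutoff value, and the functions topic_scores and assign_topics are all hypothetical and chosen only to show how a score plus a cutoff lets a document belong to several topics or to none. The sketch covers document scores only; term scores would be thresholded in the same way.

```python
from collections import Counter

# Hypothetical topics: each maps terms to illustrative weights (not SAS output).
TOPICS = {
    "company president": {"company": 1.0, "president": 1.0},
    "quarterly earnings": {"earnings": 1.0, "quarter": 1.0, "company": 0.5},
}

# Hypothetical cutoff: a document belongs to a topic when its score reaches this value.
DOCUMENT_CUTOFF = 1.5


def topic_scores(text):
    """Return a score per topic: the sum of weight * term count over the topic's terms."""
    counts = Counter(text.lower().split())
    return {
        name: sum(weight * counts[term] for term, weight in weights.items())
        for name, weights in TOPICS.items()
    }


def assign_topics(text, cutoff=DOCUMENT_CUTOFF):
    """Return every topic whose score meets the cutoff (possibly several, possibly none)."""
    return [name for name, score in topic_scores(text).items() if score >= cutoff]


documents = [
    "the company president discussed earnings for the quarter",
    "the president visited the company",
    "weather was mild this week",
]

for doc in documents:
    print(assign_topics(doc), "<-", doc)
```

In this toy data, the first document meets the cutoff for both topics, the second for only one, and the third for none, which is the behavior that distinguishes topic assignment from clustering.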
For more information about the Text Topic node, see the SAS Text Miner Help.