The Text Topic node enables you to explore the document collection
by automatically associating terms and documents according to both
discovered and user-defined topics. Topics are collections of terms
that describe and characterize a main theme or idea. The goal in creating
a list of topics is to establish combinations of words that you are
interested in analyzing. Combining individual terms into topics can
improve your text mining analysis because it narrows the text that is
subject to analysis to the specific groupings of words that you are
interested in.
For example, you might
be interested in mining articles that discuss the activities of a
"company president." One way to approach this task is to look at all
articles that have the term "company," and all articles that have
the term "president." The Text Topic node
enables you to combine the terms "company" and "president" into the
topic "company president." The approach is different from clustering.
Clustering assigns each document to a unique group, while the Text Topic
node assigns a score for each document and term
to each topic. Then thresholds are used to determine whether the association
is strong enough to consider that the document or term belongs to
the topic. As a result, documents and terms can belong to more than
one topic or to none at all. The number of topics that you request
should be directly related to the size of the document collection
(for example, a large number for a large collection).
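
The difference between threshold-based topic membership and clustering can be sketched in a few lines of Python. This is only an illustration, not how the Text Topic node is implemented: the scores, the cutoff values, the second topic name, and the helper functions topics_for and single_cluster are all hypothetical.

```python
# Hypothetical document-by-topic scores; the values are made up for illustration.
scores = {
    "doc1": {"company president": 0.82, "quarterly earnings": 0.64},
    "doc2": {"company president": 0.10, "quarterly earnings": 0.71},
    "doc3": {"company president": 0.05, "quarterly earnings": 0.12},
}

# Hypothetical per-topic cutoffs (thresholds) for membership.
cutoffs = {"company president": 0.5, "quarterly earnings": 0.5}


def topics_for(doc_scores, cutoffs):
    """Return every topic whose score meets or exceeds that topic's cutoff."""
    return [topic for topic, score in doc_scores.items() if score >= cutoffs[topic]]


def single_cluster(doc_scores):
    """Clustering-style assignment: pick exactly one group per document."""
    return max(doc_scores, key=doc_scores.get)


# Topic membership: a document can land in several topics or in none.
membership = {doc: topics_for(s, cutoffs) for doc, s in scores.items()}

# Clustering-style assignment: every document is forced into one group.
clusters = {doc: single_cluster(s) for doc, s in scores.items()}

for doc in scores:
    print(doc, "topics:", membership[doc], "| cluster:", clusters[doc])
```

In this sketch, doc1 belongs to both topics and doc3 belongs to none, whereas the clustering-style rule still places doc3 in a group even though its scores are weak.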
For more information about the Text Topic node, see the SAS Text
Miner Help.