Cluster Data

The Text Cluster node groups documents into disjoint sets and reports the descriptive terms for each cluster. Two algorithms are available. The Expectation Maximization algorithm produces a flat set of clusters, and the Hierarchical clustering algorithm groups clusters into a tree hierarchy. Both approaches rely on the singular value decomposition (SVD) to transform the original weighted term-document frequency matrix into a dense, low-dimensional representation. For more information about the Text Cluster node, see the SAS Text Miner help.
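To make the SVD step more concrete, here is a minimal Python sketch of the general idea, not the node's internal implementation. It builds a small weighted term-document matrix and keeps only the leading SVD dimensions. The toy corpus, the log-frequency and inverse-document-frequency weighting, the choice of k = 2, and the `cosine` helper are all assumptions made for illustration, and NumPy is assumed to be installed.

```python
import math
from collections import Counter

import numpy as np

# Toy corpus (hypothetical); in the tutorial the input comes from the Text Filter node.
docs = [
    "engine oil leak engine",
    "oil leak under engine",
    "brake pedal soft brake",
    "soft brake pedal noise",
]
tokens = [d.split() for d in docs]
vocab = sorted({t for doc in tokens for t in doc})
n_docs = len(docs)

# Document frequency: number of documents that contain each term.
df = Counter(t for doc in tokens for t in set(doc))

# Weighted term-document matrix: rows are terms, columns are documents.
# The weight log(1 + tf) * log(N / df) is one common choice (an assumption here).
A = np.zeros((len(vocab), n_docs))
for j, doc in enumerate(tokens):
    for term, tf in Counter(doc).items():
        A[vocab.index(term), j] = math.log(1 + tf) * math.log(n_docs / df[term])

# Thin SVD: A = U @ diag(s) @ Vt, with singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2  # number of SVD dimensions to keep (hypothetical choice)
# One dense k-dimensional row per document.
doc_vectors = (np.diag(s[:k]) @ Vt[:k, :]).T


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


print(doc_vectors.shape)
print(cosine(doc_vectors[0], doc_vectors[1]))  # two documents that share terms
print(cosine(doc_vectors[0], doc_vectors[2]))  # two documents with no shared terms
```

In the reduced space, documents that share vocabulary end up pointing in nearly the same direction, which is what lets a clustering algorithm such as Expectation Maximization or hierarchical clustering separate them using only a few dimensions.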
To cluster the data:
  1. Select the Text Mining tab on the node toolbar, and drag a Text Cluster node into the diagram workspace.
  2. Connect the Text Filter node to the Text Cluster node.
    Process flow diagram
  3. Select the Text Cluster node.
  4. Set the Descriptive Terms property to 12 to make the clusters easier to label.
  5. Right-click the Text Cluster node in the diagram workspace, and select Run.
  6. Click Yes in the Confirmation dialog box when you are asked whether you want to run the path.
  7. Click OK in the Run Status dialog box that appears after the Text Cluster node has finished running.
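
The Descriptive Terms value set in step 4 determines how many terms are listed for each cluster. As a rough illustration of the idea only, and not the selection method that the Text Cluster node actually uses (see the SAS Text Miner help for that), the following Python sketch ranks terms by how much more often they occur in documents inside a cluster than outside it. The toy corpus, the cluster labels, the scoring rule, and the `descriptive_terms` helper are all hypothetical.

```python
from collections import Counter


def descriptive_terms(tokenized_docs, labels, n_terms):
    """Return {cluster: terms}, ranked by in-cluster minus out-of-cluster document rate."""
    result = {}
    for cluster in sorted(set(labels)):
        inside = [set(d) for d, l in zip(tokenized_docs, labels) if l == cluster]
        outside = [set(d) for d, l in zip(tokenized_docs, labels) if l != cluster]
        in_df = Counter(t for d in inside for t in d)
        out_df = Counter(t for d in outside for t in d)

        def score(term):
            # Share of documents containing the term, inside versus outside the cluster.
            in_rate = in_df[term] / len(inside)
            out_rate = out_df[term] / len(outside) if outside else 0.0
            return in_rate - out_rate

        # Highest score first; ties are broken alphabetically for a stable result.
        ranked = sorted(in_df, key=lambda t: (-score(t), t))
        result[cluster] = ranked[:n_terms]
    return result


# Hypothetical documents and cluster assignments.
docs = [
    "engine oil leak engine",
    "oil leak under engine",
    "brake pedal soft brake",
    "soft brake pedal noise",
]
labels = [0, 0, 1, 1]

# The tutorial uses 12 terms per cluster; 3 keeps the toy output short.
terms = descriptive_terms([d.split() for d in docs], labels, n_terms=3)
print(terms)
# → {0: ['engine', 'leak', 'oil'], 1: ['brake', 'pedal', 'soft']}
```

A longer term list per cluster, such as the 12 used in this example flow, gives more context when you assign a human-readable label to each cluster.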