The Text Mining Process

Whether you intend to use textual data for descriptive purposes, predictive purposes, or both, the same processing steps take place, as shown in the following table:

Action	Result	Tool
File preprocessing	Creates a single SAS data set from your document collection. The SAS data set is used as input for the Text Parsing node, and might contain the actual text or paths to the actual text.	Text Import node; %TMFILTER macro (a SAS macro for extracting text from documents and creating a predefined SAS data set with a text variable)
Text parsing	Decomposes textual data and generates a quantitative representation suitable for data mining purposes.	Text Parsing node
Transformation (dimension reduction)	Transforms the quantitative representation into a compact and informative format.	Text Filter node
Document analysis	Performs classification, prediction, or concept linking of the document collection. Creates clusters, topics, or rules from the data.	Text Cluster node; Text Topic node; Text Rule Builder node; SAS Enterprise Miner predictive modeling nodes
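
The parsing and transformation steps in the table can be pictured with a small sketch. The Python below is an illustration only and is not how the Text Parsing or Text Filter nodes are implemented: the function names (parse, term_document_counts, filter_terms), the min_docs threshold, and the sample documents are all hypothetical. It turns each document into term counts (a quantitative representation) and then keeps only terms that occur in at least two documents (a crude form of dimension reduction).

```python
import re
from collections import Counter


def parse(text):
    """Lowercase a document and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_counts(docs):
    """Return one Counter of term frequencies per document."""
    return [Counter(parse(d)) for d in docs]


def filter_terms(counts, min_docs=2):
    """Keep only terms that occur in at least `min_docs` documents.

    Returns the kept terms (sorted) and a document-by-term count matrix.
    """
    # Iterating a Counter yields each distinct term once, so this
    # counts the number of documents that contain each term.
    doc_freq = Counter(term for c in counts for term in c)
    kept = sorted(t for t, n in doc_freq.items() if n >= min_docs)
    matrix = [[c[t] for t in kept] for c in counts]
    return kept, matrix


# Hypothetical sample collection, not taken from any SAS data set.
docs = [
    "The engine failed after the oil leak.",
    "Oil leak reported near the engine.",
    "Customer praised the friendly staff.",
]
terms, matrix = filter_terms(term_document_counts(docs))
print(terms)
for row in matrix:
    print(row)
```

A real analysis would typically also remove very common words such as "the" and weight the remaining counts; this sketch leaves those refinements out to stay short.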

Note: The Text Miner node is not available from the Text Mining tab in SAS Text Miner 12.1. Its functionality is now provided by other SAS Text Miner nodes. You can import diagrams from a previous release of SAS Text Miner that had a Text Miner node in the process flow diagram. However, new Text Miner nodes can no longer be created, and property values cannot be changed in imported Text Miner nodes. For more information, see the Converting SAS Text Miner Diagrams from a Previous Version topic in the SAS Text Miner Help.

Finally, after the document analysis step is complete, the resulting clustering or prediction rules can be used to score a new collection of documents at any time.
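
As a rough illustration of what scoring means here, the following Python sketch assigns a new document to the most similar of two previously derived clusters. It is a sketch under stated assumptions, not the scoring logic used by SAS Text Miner: the cluster names, the centroid term weights, and the functions vectorize, cosine, and score are hypothetical, and cosine similarity to a centroid is just one simple way to apply saved clustering results to unseen text.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn a document into a bag-of-words term count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def score(text, centroids):
    """Return the name of the cluster whose centroid is most similar."""
    vec = vectorize(text)
    return max(centroids, key=lambda name: cosine(vec, centroids[name]))


# Hypothetical centroids, standing in for results saved from an
# earlier analysis of a training collection.
centroids = {
    "mechanical": {"engine": 1.0, "oil": 1.0, "leak": 1.0},
    "service": {"staff": 1.0, "friendly": 1.0, "customer": 1.0},
}

print(score("Oil is leaking from the engine", centroids))
print(score("The staff were friendly", centroids))
```

Because the centroids are stored separately from the documents that produced them, the same score function can be applied to any later collection without repeating the analysis.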

You might not need to include all of these steps in your analysis, and you might need to try several combinations of options before you are satisfied with the results.