SAS Text Miner 4.2 includes the following new features and enhancements:
Three nodes have been added:
SAS Text Miner 4.2 provides additional options to control how you perform text mining. Each of the new nodes focuses on a specific text mining step. You can still use the Text Miner node if you want to collapse all of the text mining steps into a single node on your process flow diagram.
Text Parsing Node
The Text Parsing node enables you to parse a document collection in order to quantify information about the terms that are contained therein. The Text Parsing node provides a standard parsing facility, and enables you to import custom entities as defined in SAS Content Categorization.
In addition to new functionality, the Text Parsing node offers improved parsing performance. When you use a Text Parsing node in a process flow diagram, you need to parse a document collection only once. This can lead to performance improvements beyond what you can obtain with the Text Miner node. For example, a change to the filtering settings requires the Text Miner node to reparse all of the documents. Similarly, when you use the Text Topic node, you are not required to reparse the document collection.
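The parse-once behavior described above can be illustrated with a minimal Python sketch. This is not SAS Text Miner code: `parse_collection`, `filter_terms`, and the `min_docs` setting are hypothetical stand-ins, and whitespace tokenization is assumed purely for illustration. The point is only that the parsing step runs once and its output is reused when a filter setting changes.

```python
from collections import Counter

# Hypothetical stand-ins for illustration only; not SAS Text Miner APIs.
parse_calls = 0  # counts how often the (expensive) parsing step runs


def parse_collection(documents):
    """Tokenize every document and return per-document term counts."""
    global parse_calls
    parse_calls += 1
    return [Counter(doc.lower().split()) for doc in documents]


def filter_terms(parsed, min_docs):
    """Keep only terms that occur in at least min_docs documents."""
    doc_freq = Counter()
    for counts in parsed:
        doc_freq.update(counts.keys())  # each document counts a term once
    keep = {term for term, n in doc_freq.items() if n >= min_docs}
    return [{t: c for t, c in counts.items() if t in keep} for counts in parsed]


documents = [
    "the engine failed on startup",
    "engine noise after startup",
    "brake noise at low speed",
]

parsed = parse_collection(documents)       # parsing step, run once
loose = filter_terms(parsed, min_docs=1)   # first filter setting
strict = filter_terms(parsed, min_docs=2)  # changed setting, no reparse
```

Changing `min_docs` only reruns `filter_terms` on the stored parse result, so `parse_calls` stays at 1 no matter how many filter settings are tried.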
Text Filter Node
The Text Filter node enables you to reduce the total number of parsed terms or documents that are analyzed, in order to exclude extraneous information from your analysis. The Text Filter node enables you to perform spell checking using PROC TMSPELL, run full-text searches using integrated Teragram search capabilities, view and analyze results with concept linking, and manage subsets of terms and documents.
Text Topic Node
The Text Topic node enables you to combine terms into topics or provide your own topics that you want to analyze. With the Text Topic node, you can manage topics by mining for multiple topics per document, automatically creating single-word and multi-word topics, editing automatically generated topics, and defining your own topics. The Interactive Topic Viewer enables you to manage your topics. Results from the Text Topic node provide charts and tables that enable you to analyze, for example, the number of documents by topic and the number of terms by topic.
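As a rough illustration of user-defined topics and of a document belonging to more than one topic, consider the following Python sketch. It does not reflect how the Text Topic node is implemented: the topic definitions, term weights, scoring rule (weighted term counts), and `cutoff` value are all assumptions chosen for the example.

```python
from collections import Counter

# Hypothetical user-defined topics: each maps terms to illustrative weights.
user_topics = {
    "engine": {"engine": 1.0, "startup": 0.5},
    "noise": {"noise": 1.0, "brake": 0.5},
}


def topic_scores(text, topics):
    """Score a document against each topic as a weighted sum of term counts."""
    counts = Counter(text.lower().split())
    return {
        name: sum(weight * counts[term] for term, weight in terms.items())
        for name, terms in topics.items()
    }


def assign_topics(text, topics, cutoff=1.0):
    """Return every topic whose score reaches the (assumed) cutoff."""
    scores = topic_scores(text, topics)
    return sorted(name for name, score in scores.items() if score >= cutoff)


both = assign_topics("engine noise after startup", user_topics)
one = assign_topics("brake noise at low speed", user_topics)
none = assign_topics("the weather is fine", user_topics)
```

Because each topic is scored independently, a single document can be assigned to several topics, one topic, or none.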
SAS Text Miner now supports the SAX server platform.
In addition to the languages supported in previous releases (Chinese, English, French, German, Italian, Portuguese, and Spanish), SAS Text Miner 4.2 also supports the following languages: Arabic, Dutch, Japanese, Korean, Polish, and Swedish. Entity parsing is available for all supported languages.
You must use SAS Content Categorization with Teragram Contextual Extraction to define custom entities for use in SAS Text Miner. For more information about how to use custom entities in SAS Text Miner, see the SAS Teragram TK240 User's Guide.
The following new procedures support SAS Text Miner functionality: