SAS Text Miner

What's New in SAS Text Miner 4.2


Overview

SAS Text Miner 4.2 includes the following new features and enhancements:

  • new text mining nodes
  • new supported server platform
  • additional supported languages
  • new add-on support for custom entities
  • new procedures

New Text Mining Nodes

Three nodes have been added:

  • Text Parsing node
  • Text Filter node
  • Text Topic node

SAS Text Miner 4.2 provides additional options to control how you perform text mining. Each of the new nodes focuses on a specific text mining step. You can still use the Text Miner node if you want to collapse all of the text mining steps into a single node on your process flow diagram.

Text Parsing Node

The Text Parsing node enables you to parse a document collection in order to quantify information about the terms that it contains. The Text Parsing node provides a standard parsing facility and enables you to import custom entities as defined in SAS Content Categorization.

In addition to new functionality, the Text Parsing node offers improved parsing performance. When you use a Text Parsing node in a process flow diagram, you need to parse a document collection only once. This can lead to performance improvements beyond what you can obtain with the Text Miner node. For example, a change to the filtering settings requires the Text Miner node to reparse all of the documents, whereas a flow with a separate Text Parsing node reuses the existing parse results. Similarly, when you use the Text Topic node, you are not required to reparse the document collection.
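The benefit of separating parsing from filtering can be sketched outside SAS. The following Python fragment is an illustration only and is not SAS Text Miner code; the function names `parse_collection` and `filter_terms`, the tokenization rule, and the sample documents are all assumptions made for the example. It parses a small collection once and then applies two different filters to the stored result without reparsing.

```python
import re
from collections import Counter

def parse_collection(documents):
    """Tokenize each document once and return per-document term counts."""
    return [Counter(re.findall(r"[a-z]+", doc.lower())) for doc in documents]

def filter_terms(parsed, min_docs=1, stop_terms=()):
    """Keep terms that occur in at least min_docs documents and are not stop terms."""
    doc_freq = Counter()
    for counts in parsed:
        doc_freq.update(counts.keys())  # count each term once per document
    keep = {t for t, n in doc_freq.items() if n >= min_docs and t not in stop_terms}
    return [{t: c for t, c in counts.items() if t in keep} for counts in parsed]

# Hypothetical sample collection.
documents = [
    "The engine failed after the oil leak.",
    "Oil pressure dropped and the engine stalled.",
    "The brake pads were replaced.",
]

parsed = parse_collection(documents)  # the expensive step, run once

# Two filter settings applied to the same parse results.
loose = filter_terms(parsed, min_docs=1, stop_terms={"the", "and"})
strict = filter_terms(parsed, min_docs=2, stop_terms={"the", "and"})
```

Changing `min_docs` or the stop list only reruns `filter_terms`; `parse_collection` is not called again.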

Text Filter Node

The Text Filter node enables you to reduce the total number of parsed terms or documents that are analyzed, in order to exclude extraneous information from your analysis. With the Text Filter node, you can perform spell checking using PROC TMSPELL, run full-text searches using integrated Teragram search capabilities, view and analyze results with concept linking, and manage subsets of terms and documents.

Text Topic Node

The Text Topic node enables you to combine terms into topics or provide your own topics that you want to analyze. With the Text Topic node, you can mine for multiple topics per document, automatically create single-word and multi-word topics, edit automatically generated topics, and define your own topics. The Interactive Topic Viewer enables you to manage your topics. Results from the Text Topic node provide charts and tables that enable you to analyze, for example, the number of documents by topic and the number of terms by topic.
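To make the idea of multiple topics per document concrete, here is a minimal Python sketch. It is not how the Text Topic node is implemented; the topic definitions, the term weights, the `cutoff` value, and the function names `score_topics` and `assign_topics` are hypothetical. Each user-defined topic is a set of weighted terms, and a document receives every topic whose score reaches the cutoff.

```python
def score_topics(doc_terms, topics):
    """Return the summed (weight * term count) of each topic for one document."""
    return {
        name: sum(weight * doc_terms.get(term, 0) for term, weight in terms.items())
        for name, terms in topics.items()
    }

def assign_topics(doc_terms, topics, cutoff=1.0):
    """Assign every topic whose score reaches the cutoff; a document may get several."""
    scores = score_topics(doc_terms, topics)
    return sorted(name for name, score in scores.items() if score >= cutoff)

# Hypothetical user-defined topics: topic name -> {term: weight}.
topics = {
    "engine trouble": {"engine": 1.0, "oil": 0.5, "stalled": 0.5},
    "brakes": {"brake": 1.0, "pads": 0.5},
}

# Term counts for one document that touches both topics.
doc = {"engine": 1, "oil": 1, "brake": 1, "pads": 1}
assigned = assign_topics(doc, topics)
```

Because assignment is made per topic rather than by picking a single best match, one document can appear under several topics in a documents-by-topic table.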


New Supported Server Platform

SAS Text Miner now supports the AIX server platform.


Additional Supported Languages

In addition to the languages supported in previous releases (Chinese, English, French, German, Italian, Portuguese, and Spanish), SAS Text Miner 4.2 supports the following languages: Arabic, Dutch, Japanese, Korean, Polish, and Swedish. Entity parsing is available for all supported languages.


New Add-on Support for Custom Entities

To define custom entities for use in SAS Text Miner, you must use SAS Content Categorization with Teragram Contextual Extraction. For more information about how to use custom entities in SAS Text Miner, see the SAS Teragram TK240 User's Guide.


New Procedures

The following new procedures support SAS Text Miner functionality:

  • PROC TMSPELL performs integrated spell checking. You can use TMSPELL to automatically create a synonym list for misspellings, undetected stems, and shorthand, either on its own or for use with an input synonym data set.
  • PROC TMFACTOR performs rotated singular value decomposition (SVD). You can also use TMFACTOR, an interactive SAS procedure, to implement nonnegative matrix decomposition.
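The notion of a synonym list for misspellings can be illustrated with a short Python sketch. This is not PROC TMSPELL and makes no claim about its algorithm or syntax; the function name `build_synonyms`, the frequency thresholds, and the similarity cutoff are assumptions chosen for the example. Rare terms that closely resemble a frequent term are mapped to that term as (term, parent) pairs, using `difflib.get_close_matches` from the Python standard library.

```python
from difflib import get_close_matches

def build_synonyms(term_freq, min_parent_freq=5, max_child_freq=2, cutoff=0.75):
    """Map each rare term to the most similar frequent term, as (term, parent) pairs."""
    # Frequent terms are treated as correctly spelled candidates.
    parents = [t for t, n in term_freq.items() if n >= min_parent_freq]
    synonyms = []
    for term, n in sorted(term_freq.items()):
        if n > max_child_freq:
            continue  # only rare terms are treated as possible misspellings
        match = get_close_matches(term, parents, n=1, cutoff=cutoff)
        if match:
            synonyms.append((term, match[0]))
    return synonyms

# Hypothetical term frequencies from a parsed collection.
term_freq = {"engine": 40, "engin": 1, "brake": 25, "braek": 2, "oil": 30, "valve": 1}
synonyms = build_synonyms(term_freq)
```

Here `engin` and `braek` are paired with `engine` and `brake`, while `valve` is left alone because no frequent term is similar enough. A list of this shape could then be merged with a manually maintained synonym list.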