What’s New in SAS Text Miner 5.1

Overview

SAS Text Miner 5.1 includes the following new features and enhancements:
  • new text mining nodes
  • replacement of the original Text Miner node
  • additional supported languages
  • new functionality for text mining nodes
  • procedure change

New Text Mining Nodes

Overview of the New Text Mining Nodes

Two new nodes have been added to SAS Text Miner:
  • Text Cluster Node
  • Text Import Node

Text Cluster Node

The Text Cluster node replaces the clustering functionality of the original Text Miner node and takes over the creation of the singular value decomposition (SVD). With the new node, you can cluster documents and experiment with different cluster settings without reparsing the collection to see the updated results.

Text Import Node

The Text Import node enables you to create data sets from your own document collections or from a Web crawl, all from within the context of a SAS Enterprise Miner diagram.

Replacement of the Original Text Miner Node

The Text Miner node that was available in previous releases of SAS Text Miner has been replaced by the functionality in other SAS Text Miner nodes.
You can still import process flow diagrams from a previous release of SAS Text Miner that contain a Text Miner node. However, you can no longer create new Text Miner nodes, and you cannot change property values in imported Text Miner nodes.

Additional Supported Languages

In addition to the languages supported in previous releases (Arabic, Chinese, Dutch, English, French, German, Italian, Japanese, Korean, Polish, Portuguese, Spanish, and Swedish), SAS Text Miner 5.1 also supports these languages: Czech, Danish, Finnish, Greek, Hebrew, Hungarian, Indonesian, Norwegian, Romanian, Russian, Slovak, Thai, Turkish, and Vietnamese.
Note: Although custom entities are supported for the new languages, these languages do not come prepackaged with default entities. You can use SAS Concept Creation for SAS Text Miner to define, extract, and manage custom entities for inclusion in text mining projects and analysis.

New Functionality for Text Mining Nodes

Export Synonyms from the Text Filter Node

You can create synonym data sets as you specify synonyms in the Interactive Filter Viewer.

Import Synonyms to Use in the Text Filter Node

You can import synonyms into the Text Filter node using the Import Synonyms property.
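As a rough illustration of what exporting and importing synonyms accomplishes, the following Python sketch writes a small synonym list to CSV text, reads it back, and uses it to roll term counts up to a parent term. It is a conceptual sketch only: the column names term and parent, the function names, and the sample counts are assumptions made for this example and do not describe the layout of the SAS synonym data set.

```python
import csv
import io
from collections import Counter

# Hypothetical synonym list: each row maps a term to the parent term
# that it should be treated as.
synonyms = [
    {"term": "automobile", "parent": "car"},
    {"term": "auto", "parent": "car"},
]


def export_synonyms(rows):
    """Serialize synonym rows to CSV text (stands in for an exported data set)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["term", "parent"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def import_synonyms(text):
    """Read CSV text back into a term -> parent lookup."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["term"]: row["parent"] for row in reader}


def apply_synonyms(term_counts, mapping):
    """Fold the count of each synonym into the count of its parent term."""
    merged = Counter()
    for term, count in term_counts.items():
        merged[mapping.get(term, term)] += count
    return merged


exported = export_synonyms(synonyms)
mapping = import_synonyms(exported)
counts = Counter({"car": 4, "auto": 2, "automobile": 1, "truck": 3})
merged = apply_synonyms(counts, mapping)
print(dict(merged))
```

Keeping the synonym list in a separate, reusable table is what allows the same mappings to be applied in another project without respecifying them interactively.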

Improvements to Table Editing and Creating

Improvements include the ability to:
  • sort columns
  • insert and delete multiple rows
When a new row is added for user topics, a default weight is used.

Text Filter Node and Text Topic Node Improvements

You can now edit any existing subset documents filter in the Text Filter node.
The viewers for both the Text Filter node and the Text Topic node enable you to find text and to step to the next match, so that you can cycle through all occurrences.

Improvements to the Text Topic Viewer

The Text Topic Viewer includes the following improvements:
  • Creates exactly the number of topics that you request (rather than that number or fewer).
  • Exports raw rotated SVD topic values, which are automatically set to be used by any predictive modeling node.
  • Continues to export the 1/0 topic variables, which are automatically set to be used by the Segment Profiler node.
  • Automatically generates document cutoff values that assign far fewer than half of the documents to any given topic.
  • Remembers user-specified term and document cutoff values whenever the Text Topic node is rerun.
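The relationship between raw topic values, document cutoff values, and 1/0 topic variables can be sketched as follows. This Python example is an illustration under stated assumptions, not the node's actual rule: it assumes that a document is flagged for a topic when its raw value meets or exceeds that topic's cutoff, and the document names, values, and cutoffs are invented for the example.

```python
# Hypothetical raw topic values: one list per document, one value per topic.
raw_topic_values = {
    "doc1": [0.82, 0.05],
    "doc2": [0.10, 0.64],
    "doc3": [0.31, 0.29],
    "doc4": [0.02, 0.03],
}

# Hypothetical document cutoff values, one per topic.
document_cutoffs = [0.40, 0.50]


def topic_flags(values, cutoffs):
    """Return a 1/0 flag per topic (assumed rule: 1 when value >= cutoff)."""
    return [1 if value >= cutoff else 0 for value, cutoff in zip(values, cutoffs)]


flags = {
    doc: topic_flags(values, document_cutoffs)
    for doc, values in raw_topic_values.items()
}

# Share of documents assigned to each topic under these cutoffs.
share = [
    sum(doc_flags[t] for doc_flags in flags.values()) / len(flags)
    for t in range(len(document_cutoffs))
]
print(flags, share)
```

In this sketch the raw values keep their full numeric detail, which suits a predictive model, while the 1/0 flags give a simple membership indicator of the kind that profiling by segment relies on.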

Procedure Change

The DOCPARSE procedure has been replaced by the TGPARSE procedure. If you currently use the DOCPARSE procedure, you must modify your code to use the TGPARSE procedure instead.