What's New in SAS Text Miner 3.1

Overview

SAS Text Miner 3.1 includes the following new features and enhancements:


New Supported Languages

The new supported languages are Japanese, Korean, Norwegian Bokmal, Simplified Chinese, and Traditional Chinese.


New Encoding Support

SAS Text Miner 3.1 supports UTF-8, an encoding of the Unicode standard, as well as the standard encodings for Chinese, Japanese, and Korean.


New Noun Group Extraction Support

SAS Text Miner 3.1 supports noun group extraction for all supported languages: English, Danish, Dutch, Finnish, French, German, Italian, Japanese, Korean, Norwegian Bokmal, Portuguese, Simplified Chinese, Spanish, Swedish, and Traditional Chinese.


New Entity Types

SAS Text Miner 3.1 supports these new entity types: LANGUAGE (German and Spanish), PEOPLES (German and Spanish), PUBLICATION (German), TICKER (English), and VEHICLE (English).


Enhanced %TMFILTER Macro Features

New Parameters

SAS Text Miner 3.1 includes the following new parameters:

New Document Formats

The %TMFILTER macro supports new document formats such as Microsoft Outlook and Outlook Express e-mail files and files that are created with the Open Office suite.

Automatic Transcoding of Documents

The encoding of each document is detected automatically, and each document is transcoded automatically to the session encoding. If a document cannot be transcoded correctly, the process removes unsupported characters so that the %TMFILTER macro can process the transcoded document.
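
The detect-then-transcode behavior can be illustrated with a minimal Python sketch. This is not the %TMFILTER implementation: the function name `transcode`, the candidate encoding list, and the default session encoding are assumptions chosen for illustration only.

```python
def transcode(raw, session_encoding="latin-1",
              candidates=("utf-8", "shift_jis")):
    """Guess the source encoding, then re-encode for the session.

    Hypothetical helper: the candidate list and the default session
    encoding are illustrative assumptions, not SAS settings.
    """
    for name in candidates:
        try:
            # The first candidate that decodes cleanly is taken as the guess.
            text = raw.decode(name)
            break
        except UnicodeDecodeError:
            continue
    else:
        # No candidate matched: decode as UTF-8 and drop undecodable bytes.
        text = raw.decode("utf-8", errors="ignore")
    # Characters that the session encoding cannot represent are removed.
    return text.encode(session_encoding, errors="ignore")
```

With a Latin-1 session encoding, a UTF-8 document that mixes accented Latin text and Japanese characters keeps the accented text and loses the Japanese characters, which mirrors the removal of unsupported characters described above.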

Additional %TMFILTER Macro Output

The %TMFILTER macro provides additional output:


UNIX Support

SAS Text Miner 3.1 supports Solaris and AIX. The exception is the %TMFILTER macro, which runs on Windows only.


Enhanced Synonym List Processing

In SAS Text Miner 3.1, you can define synonyms with different parts of speech. The synonym data set can have variables for both the parent role and child role. Previously, only one role was provided. You can define your synonym data set with the following variables:


Enhanced Parsing

In SAS Text Miner 3.1, parsing has been enhanced as follows:


New DOCPARSE Procedure

A new parsing procedure, PROC DOCPARSE, parses text documents and organizes the terms and their frequencies into data sets. The DOCPARSE procedure is portable to multiple platforms, and it does not require XCMD.


New DOCSCORE Function

The new DOCSCORE function is called inside DATA step code. It takes a textual variable (or a reference to a document that contains text) along with information from the training run and generates a compressed term-document frequency data set called OUT.
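
To make the idea of a compressed term-document frequency data set concrete, the following Python sketch scores one document against a term index saved from training. It is only an analogy for what DOCSCORE produces: the function `score_document`, the `term_index` mapping, and the whitespace tokenizer are assumptions, and the actual layout of the OUT data set is not described here.

```python
from collections import Counter


def score_document(doc_id, text, term_index):
    """Return sparse (term_id, doc_id, count) rows for one document.

    term_index is a hypothetical mapping from training terms to numeric
    IDs. Terms that were not seen in training are ignored.
    """
    counts = Counter(
        term_index[token]
        for token in text.lower().split()  # naive whitespace tokenizer
        if token in term_index
    )
    # Only nonzero counts are stored, which keeps the output compact.
    return [(term_id, doc_id, n) for term_id, n in sorted(counts.items())]
```

Storing only the nonzero term counts for each document is what makes this representation compressed relative to a full term-by-document matrix.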


Improved Performance

In SAS Text Miner 3.1, parsing speed is improved. Documents are processed faster than in previous Text Miner releases.


Eliminated XCMD Requirement

Previous versions of SAS Text Miner required that parsing be done through an XCMD call on the SAS server. In SAS Text Miner 3.1, this is no longer necessary when you run the Text Miner node. However, the %TMFILTER macro still requires XCMD, so it must be run in a SAS session that permits XCMD calls.


New Quick Find Functionality in the Terms Table of the Interactive Results Window

Quick find enables users to scroll quickly to a specific spot in a sorted column of the Terms table by typing a single character while the column is active. Quick find can be used in the Term, Freq, #Docs, Weight, Role, and Attribute columns.
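
The scrolling behavior resembles a binary search on the sorted column. The Python sketch below shows that idea for a text column sorted in ascending, case-insensitive order. The function `quick_find` and the sample terms are hypothetical, and the sketch does not claim to reproduce how the Interactive Results window is implemented.

```python
from bisect import bisect_left


def quick_find(sorted_column, typed_char):
    """Return the row index to scroll to, or None for an empty column.

    Assumes sorted_column holds text values already sorted in ascending,
    case-insensitive order.
    """
    if not sorted_column:
        return None
    keys = [value.lower() for value in sorted_column]
    # First row whose value sorts at or after the typed character.
    index = bisect_left(keys, typed_char.lower())
    # If every value sorts before the character, stay on the last row.
    return min(index, len(keys) - 1)
```

For a column containing apple, banana, cherry, mango, and melon, typing "m" lands on mango. Typing "d", which matches no term, also lands on mango because it is the first row that sorts after "d".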


Contains LinguistX® from Inxight Software, Inc. Copyright © 1996-2006. All rights reserved. www.inxight.com.

Contains ThingFinder™ Server from Inxight Software, Inc. Copyright © 1996-2006. All rights reserved. www.inxight.com.