SAS Text Miner
is a plug-in for the SAS Enterprise Miner environment. SAS Enterprise
Miner provides a rich set of data mining tools that facilitate the
prediction aspect of text mining. The integration of SAS Text Miner
within SAS Enterprise Miner combines textual data with traditional
data mining variables. Text mining nodes can be embedded into a SAS
Enterprise Miner process flow diagram. SAS Text Miner supports various
sources of textual data: local text files, text as observations in
SAS data sets or external databases, and files on the Web.

The Text Miner node encompasses the parsing and exploration aspects of text
mining and prepares data for predictive mining and further exploration
using other SAS Enterprise Miner nodes. The Text Miner node enables
you to analyze unstructured text information and to combine the structured
output of a Text Miner node with other structured data as desired.

The Text Miner node is highly customizable and enables you to choose
among a variety of parsing options. You can parse documents to obtain
detailed information about the terms, phrases, and other entities in the
collection. You can also cluster documents into meaningful groups and
report the concepts that you discover in the clusters. The Text Miner node
also provides an interactive environment for exploring the collection, in
which sorting, searching, filtering (subsetting), and finding similar terms
or documents all enhance the exploration process.
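
The idea behind finding similar documents can be illustrated outside SAS with a minimal Python sketch. It assumes a simple bag-of-words representation scored with cosine similarity, a common textbook approach that is not necessarily the method the Text Miner node uses; the functions `tokenize`, `cosine_similarity`, and `most_similar` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical helpers for illustration only; not part of SAS Text Miner.

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors (Counters)."""
    # A missing key in a Counter reads as 0, so unshared terms add nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(query, documents):
    """Return (index, score) pairs for all documents, most similar first."""
    q = Counter(tokenize(query))
    scores = [(i, cosine_similarity(q, Counter(tokenize(d))))
              for i, d in enumerate(documents)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

docs = [
    "The billing statement was wrong again",
    "Great service and friendly staff",
    "My bill shows a wrong charge",
]
print(most_similar("wrong charge on my bill", docs))
```

Because the sketch does no stemming, "billing" and "bill" are treated as unrelated terms, so the first document shares only the word "wrong" with the query.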

SAS Text Miner also provides the Text Parsing, Text Filter, and Text Topic
nodes, each of which performs a specific task in the text mining process.
The Text Parsing node performs the same parsing operations as the
Text Miner node and can be configured in much the same way. The Text
Filter node enables you to remove terms that are deemed to have low
information value or that occur in too few documents to be relevant. The
Text Topic node creates a set of topics based on the most highly correlated
terms in the document collection, which is similar to the document
clustering that the Text Miner node performs.
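
The frequency-based part of the Text Filter node's job can be sketched in Python as follows. The sketch assumes a document-frequency cutoff (`min_docs`) and a fixed set of low-value terms (`stop_terms`); the function `filter_terms` and both parameter names are hypothetical, and the node's actual weighting and filtering criteria are not reproduced here.

```python
from collections import Counter

# Hypothetical helper for illustration only; not part of SAS Text Miner.
def filter_terms(tokenized_docs, min_docs=2, stop_terms=frozenset()):
    """Keep terms that appear in at least `min_docs` documents and are not in
    `stop_terms` (a stand-in for terms judged to have low information value)."""
    # Document frequency: count each term at most once per document.
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))
    kept = {t for t, df in doc_freq.items()
            if df >= min_docs and t not in stop_terms}
    # Remove the filtered-out terms from every document, preserving order.
    return kept, [[t for t in doc if t in kept] for doc in tokenized_docs]

docs = [
    ["the", "printer", "jams", "often"],
    ["the", "printer", "is", "slow"],
    ["the", "scanner", "jams"],
]
kept, filtered = filter_terms(docs, min_docs=2, stop_terms={"the"})
print(sorted(kept))  # ['jams', 'printer']
print(filtered)      # [['printer', 'jams'], ['printer'], ['jams']]
```

Here "the" occurs in every document but is dropped as a low-value term, while "often", "is", "slow", and "scanner" each occur in only one document and fall below the cutoff.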

The extensive parsing capabilities of the Text Miner and Text Parsing nodes
include the following:
- automatic recognition of multi-word terms
- normalization of various entities such as dates, currencies,
  percentages, and years
- extraction of entities such as organizations, products, Social
  Security numbers, time, titles, and more
- language-specific analysis for Arabic, Chinese, Dutch, English,
  French, German, Italian, Japanese, Korean, Polish, Portuguese,
  Spanish, and Swedish
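
To make entity normalization concrete, here is a minimal Python sketch that maps several spellings of a percentage to one canonical form. The regular expression and the `normalize_percentages` function are hypothetical illustrations; they cover only a few English spellings and do not reflect how the Text Miner or Text Parsing node implements normalization.

```python
import re

# Hypothetical rule for illustration only: a number followed by "%",
# "percent", or "per cent" (case-insensitive) is treated as a percentage.
PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s*(?:%|percent\b|per cent\b)",
                     re.IGNORECASE)

def normalize_percentages(text):
    """Rewrite every matched percentage mention as '<number>%'."""
    return PERCENT.sub(lambda m: m.group(1) + "%", text)

print(normalize_percentages("Sales rose 5 percent, then 7.5 per cent, then 3%."))
# Sales rose 5%, then 7.5%, then 3%.
```

After normalization, all three mentions become the same kind of token, so later steps can count or compare them consistently.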

SAS Text Miner also provides the SAS macro %TMFILTER, which performs a
text preprocessing step: it creates SAS data sets from documents that
reside in your file system or on Web pages. These documents can be stored
in a number of proprietary formats.
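
The following Python sketch illustrates the general shape of this preprocessing step by walking a directory and producing one record per document. It reads only plain-text files, so it does not attempt the conversion of proprietary formats that %TMFILTER handles. The function `documents_to_rows`, its `max_chars` truncation option, and the record fields are all hypothetical and are not the macro's actual parameters or output variables.

```python
import tempfile
from pathlib import Path

# Hypothetical helper for illustration only; not the %TMFILTER macro.
def documents_to_rows(directory, extensions=(".txt",), max_chars=None):
    """Read every file with a matching extension under `directory`
    (recursively) and return one dict per document."""
    rows = []
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in extensions:
            continue  # skip subdirectories and files of other types
        text = path.read_text(encoding="utf-8", errors="replace")
        truncated = max_chars is not None and len(text) > max_chars
        rows.append({
            "name": path.name,
            "uri": str(path),
            "size": path.stat().st_size,  # size on disk, in bytes
            "text": text[:max_chars] if truncated else text,
            "truncated": truncated,
        })
    return rows

# Demo on a throwaway directory: two text files and one ignored image file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("Printer jams often.", encoding="utf-8")
    Path(tmp, "sub").mkdir()
    Path(tmp, "sub", "b.txt").write_text("Scanner is slow.", encoding="utf-8")
    Path(tmp, "image.png").write_bytes(b"\x89PNG")
    rows = documents_to_rows(tmp, max_chars=17)
    for row in rows:
        print(row["name"], row["truncated"], repr(row["text"]))
```

Each record pairs the document text with metadata about its source, which is the kind of one-row-per-document table that later text mining steps can consume.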

SAS Text Miner is a flexible tool that can solve a variety of problems.
Here are some examples of tasks that SAS Text Miner can accomplish:
- grouping documents by topic into predefined categories
- clustering analysis of research papers in a database
- clustering analysis of survey data
- clustering analysis of customer complaints and comments
- predicting stock market prices from business news announcements
- predicting customer satisfaction from customer comments
- predicting costs based on call center logs