What Is SAS Text Miner?

SAS Text Miner is a plug-in for the SAS Enterprise Miner environment. SAS Enterprise Miner provides a rich set of data mining tools that facilitate the prediction aspect of text mining. The integration of SAS Text Miner within SAS Enterprise Miner combines textual data with traditional data mining variables. Text mining nodes can be embedded into a SAS Enterprise Miner process flow diagram. SAS Text Miner supports various sources of textual data: local text files, text as observations in SAS data sets or external databases, and files on the Web.
The Text Miner node encompasses the parsing and exploration aspects of text mining and prepares data for predictive mining and further exploration using other SAS Enterprise Miner nodes. The Text Miner node converts text into structured information that you can analyze and combine with other structured data as desired. The Text Miner node is highly customizable and enables you to choose among a variety of parsing options. You can parse documents for detailed information about the terms, phrases, and other entities in the collection. You can also cluster documents into meaningful groups and report concepts that you discover in the clusters. The Text Miner node also provides an interactive environment for working with the collection: sorting, searching, filtering (subsetting), and finding similar terms or documents all enhance the exploration process.
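As an illustration of what "finding similar documents" can mean, here is a minimal Python sketch that ranks documents by cosine similarity of their term counts. It is a generic technique shown for intuition only, not a description of how the Text Miner node computes similarity. The function names (`cosine`, `most_similar`) and the sample documents are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def most_similar(query_idx, docs):
    """Return the index of the document most similar to docs[query_idx]."""
    # Whitespace tokenization is a simplification; real parsing is richer.
    vecs = [Counter(d.lower().split()) for d in docs]
    scores = [(cosine(vecs[query_idx], v), i)
              for i, v in enumerate(vecs) if i != query_idx]
    return max(scores)[1]

# Hypothetical sample collection.
docs = [
    "late delivery and damaged box",
    "box arrived damaged after late delivery",
    "great price and friendly staff",
]
best = most_similar(0, docs)
print(docs[best])
```

In this sketch the second document is selected because it shares four terms with the first, while the third shares only one.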
Also available are the Text Parsing, Text Filter, and Text Topic nodes. Each of these nodes performs a specific task of the text mining process. The Text Parsing node performs the same parsing operations as the Text Miner node and can be configured in much the same way. The Text Filter node enables you to remove terms that are deemed to have low information value or occur in too few documents to be relevant. The Text Topic node creates a set of topics based on the most highly correlated terms in the document collection. This is similar to the process of clustering the document collection that is done in the Text Miner node.
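To make the Text Filter idea concrete, the following Python sketch removes terms that occur in too few documents. It is only an approximation of the concept under stated assumptions: the function name `filter_rare_terms`, the `min_docs` threshold, and the sample data are hypothetical, and the Text Filter node itself is configured through SAS Enterprise Miner rather than through code like this.

```python
from collections import Counter

def filter_rare_terms(docs, min_docs=2):
    """Keep only terms that occur in at least `min_docs` documents."""
    # Document frequency: count each term at most once per document.
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    kept = {term for term, n in doc_freq.items() if n >= min_docs}
    # Preserve the original token order within each document.
    return [[t for t in tokens if t in kept] for tokens in docs]

# Hypothetical, already-tokenized documents.
docs = [
    ["battery", "drains", "fast"],
    ["battery", "replaced", "twice"],
    ["screen", "cracked", "fast"],
]
filtered = filter_rare_terms(docs, min_docs=2)
print(filtered)
```

Only "battery" and "fast" appear in at least two documents, so every other term is dropped.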
The Text Miner and Text Parsing nodes' extensive parsing capabilities include the following:
  • stemming
  • automatic recognition of multi-word terms
  • normalization of various entities such as dates, currencies, percentages, and years
  • part-of-speech tagging
  • extraction of entities such as organizations, products, Social Security numbers, time, titles, and more
  • support for synonyms
  • language-specific analysis for Arabic, Chinese, Dutch, English, French, German, Italian, Japanese, Korean, Polish, Portuguese, Spanish, and Swedish
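Two of the capabilities listed above, entity normalization and synonym support, can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the parsing that SAS Text Miner performs: the regular expressions, the placeholder tokens `_currency_` and `_percent_`, and the `SYNONYMS` table are all assumptions made for the example.

```python
import re

# Hypothetical synonym list mapping variant terms to one parent term.
SYNONYMS = {"automobile": "car", "autos": "car", "cars": "car"}

def parse(text):
    """Lowercase, tag simple entities, tokenize, and apply synonyms."""
    text = text.lower()
    # Replace currency amounts and percentages with placeholder tokens
    # so that different surface forms are treated as the same entity type.
    text = re.sub(r"\$\d+(?:,\d{3})*(?:\.\d+)?", " _currency_ ", text)
    text = re.sub(r"\d+(?:\.\d+)?%", " _percent_ ", text)
    tokens = re.findall(r"[a-z_]+", text)
    return [SYNONYMS.get(t, t) for t in tokens]

tokens = parse("Automobile prices rose 4.5% to $31,200.")
print(tokens)
```

Here "Automobile" is mapped to its parent term "car", and the percentage and dollar amount are each reduced to a single entity token.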
SAS Text Miner also provides a SAS macro called %TMFILTER. This macro performs a text preprocessing step: it creates SAS data sets from documents that reside in your file system or on Web pages. These documents can exist in a number of proprietary formats.
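The general shape of this preprocessing step, turning a directory of documents into a table with one row per document, can be sketched in Python. This is an analogy only and does not reproduce %TMFILTER: it reads plain `.txt` files rather than proprietary formats, and the function name `documents_to_rows`, the column names, and the `max_chars` limit are hypothetical.

```python
import tempfile
from pathlib import Path

def documents_to_rows(root, max_chars=200):
    """Build one record per .txt file found under `root` (recursively)."""
    rows = []
    for path in sorted(Path(root).rglob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        rows.append({
            "name": path.name,         # file name
            "path": str(path),         # full location of the source file
            "size": len(text),         # length of the text in characters
            "text": text[:max_chars],  # leading portion of the text
        })
    return rows

# Demonstration on a temporary directory with two small sample files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.txt").write_text("First complaint about billing.", encoding="utf-8")
    sub = Path(tmp) / "sub"
    sub.mkdir()
    (sub / "b.txt").write_text("Second note.", encoding="utf-8")
    rows = documents_to_rows(tmp)

for row in rows:
    print(row["name"], row["size"], row["text"])
```

Each record keeps a pointer back to the source file alongside a truncated copy of the text, so later steps can work from the table without reopening every document.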
SAS Text Miner is a flexible tool that can address a variety of problems. Here are some examples of tasks that can be accomplished using SAS Text Miner:
  • filtering e-mail
  • grouping documents by topic into predefined categories
  • routing news items
  • clustering analysis of research papers in a database
  • clustering analysis of survey data
  • clustering analysis of customer complaints and comments
  • predicting stock market prices from business news announcements
  • predicting customer satisfaction from customer comments
  • predicting costs, based on call center logs