What's New in SAS Content Categorization 5.2

SAS Content Categorization Studio

New and enhanced features in SAS Content Categorization Studio include the following:
  • SAS licensing replaces the Teragram license.
  • Graphical reporting enables you to view precision, recall, and document matching information.
  • Subcategory generation now uses data from Wikipedia.
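
For readers unfamiliar with the two metrics named above, the following Python sketch shows how precision and recall are conventionally computed for a single category. It is only an illustration of the standard definitions, not a description of how the product calculates or displays them; the function name `precision_recall` and the document IDs are assumptions made for this example.

```python
def precision_recall(assigned, relevant):
    """Return (precision, recall) for one category.

    assigned: set of document IDs the categorizer placed in the category.
    relevant: set of document IDs that truly belong to the category.
    Hypothetical helper for illustration; not part of any SAS product.
    """
    true_positives = len(assigned & relevant)
    # Precision: share of assigned documents that really belong.
    precision = true_positives / len(assigned) if assigned else 0.0
    # Recall: share of relevant documents that were actually assigned.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


assigned = {"doc1", "doc2", "doc3", "doc4"}
relevant = {"doc2", "doc3", "doc5"}
precision, recall = precision_recall(assigned, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four assigned documents are correct (precision 0.50), and two of the three relevant documents were found (recall about 0.67).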

SAS Content Categorization Collaborative Server

New and enhanced features in SAS Content Categorization Collaborative Server include the following:
  • Support for Oracle
  • Support for SAS Contextual Extraction Studio concepts
  • Improved support for Microsoft SQL Server
  • Improved user account management

SAS Content Categorization Server

The new features in SAS Content Categorization Server include the following:
  • Most of the configuration work that was previously required to upload a binary project to SAS Content Categorization Server is no longer necessary. The project's creator has the administrative permissions to perform this task.
  • SAS Contextual Extraction Studio is now supported in SAS Content Categorization Server.
  • SAS licensing replaces the Teragram license.

SAS Contextual Extraction Studio

Overview

New and enhanced features in SAS Contextual Extraction Studio include the following:
  • New coreference operators enable greater precision when you write rules.
  • XML fields can be specified for matches.
  • Additional operators enable greater rule matching precision.
  • Case-insensitive matching and comments in rules are now enabled.

Coreference Operators Added

Coreference refers to pronoun resolution. A pronoun is matched to the antecedent that it refers to when you use these operators in your contextual extraction concept rules:
  • Use the coreference operator (_ref) to link a matched string with its canonical form.
  • Use _coref with CLASSIFIER definitions.
  • Use the forward (_F) and the preceding (_P) symbols to restrict coreference matches.
  • Assign a new concept name for a match on a term specified by the _ref operator.

XML Field Specified for Matching

Limit matches to specific XML fields when you write these fields into rules and apply them to input XML documents.
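
The idea of field-restricted matching can be sketched in Python as follows. This is not SAS rule syntax and does not reflect how SAS Contextual Extraction Studio is implemented; the helper `match_in_fields`, the sample document, and the field names `title` and `body` are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET


def match_in_fields(xml_text, fields, term):
    """Return (field, text) pairs where `term` occurs inside one of `fields`.

    Hypothetical helper for illustration; not part of any SAS product.
    """
    root = ET.fromstring(xml_text)
    hits = []
    for element in root.iter():
        # Only elements whose tag is in the allowed set are searched.
        if element.tag in fields and element.text and term in element.text:
            hits.append((element.tag, element.text.strip()))
    return hits


doc = """<article>
  <title>Oracle support added</title>
  <body>The server now supports Oracle and Microsoft SQL Server.</body>
</article>"""

# Restricting the search to <title> ignores the occurrence in <body>.
print(match_in_fields(doc, {"title"}, "Oracle"))
```

Passing `{"title", "body"}` instead would report both occurrences, which shows how naming fields narrows the set of matches.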

Additional Operators for Precision

Additional operators enable greater rule matching precision. You can use them as follows:
  • Specify a stemming symbol to enable SAS Contextual Extraction Studio to match all word forms, or only all noun or verb forms.
  • Specify the paragraph symbol (PARA) to limit matches to terms that occur within the same paragraph.
  • Write a SENT_n operator into a rule to specify the maximum number of sentences within which a match can occur.
  • Use a SENTSTART_n operator to specify the number of words at the beginning of a sentence where a match can occur.
  • Use a SENTEND_n operator to specify the number of words at the end of a sentence where a match can occur.
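
The sentence-based operators can be modeled roughly in Python as shown below. This is a sketch of one plausible reading of the descriptions above, not SAS rule syntax and not the product's implementation. The function names `sent_n`, `sentstart_n`, and `sentend_n`, the naive sentence splitter, and the lowercase word comparison are all assumptions made for this illustration.

```python
import re


def split_sentences(text):
    """Naive splitter: breaks after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def words(sentence):
    """Lowercased word tokens of a sentence."""
    return re.findall(r"\w+", sentence.lower())


def sent_n(text, n, terms):
    """True if every term occurs within some window of n consecutive sentences."""
    sentences = [set(words(s)) for s in split_sentences(text)]
    wanted = {t.lower() for t in terms}
    for start in range(len(sentences)):
        window = set().union(*sentences[start:start + n])
        if wanted <= window:
            return True
    return False


def sentstart_n(text, n, term):
    """True if the term is among the first n words of some sentence."""
    return any(term.lower() in words(s)[:n] for s in split_sentences(text))


def sentend_n(text, n, term):
    """True if the term is among the last n words of some sentence."""
    return any(n > 0 and term.lower() in words(s)[-n:]
               for s in split_sentences(text))


text = ("Oracle is now supported. Account management was improved. "
        "Licensing changed.")

print(sent_n(text, 2, ["oracle", "account"]))    # terms in adjacent sentences
print(sent_n(text, 2, ["oracle", "licensing"]))  # terms three sentences apart
print(sentstart_n(text, 2, "oracle"))
print(sentend_n(text, 1, "improved"))
```

In this model, a larger n loosens the constraint for SENT_n (more sentences may separate the terms) and widens the window at the sentence boundary for SENTSTART_n and SENTEND_n.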

Case-Insensitive Matching and Comments

Case-insensitive matching occurs when you select the Case Insensitive Matching check box in the Data tab for a contextual extraction concept. (By default, all matching is case sensitive.)
You can also add comments to your rules using the pound character (#).
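
The following Python sketch illustrates both behaviors in a simplified form: lines that begin with the pound character are skipped as comments, and matching is case sensitive unless a flag requests otherwise. It is not SAS rule syntax. The one-term-per-line rule format, the assumption that a comment occupies a whole line, and the helpers `load_terms` and `find_matches` are hypothetical and exist only for this example.

```python
def load_terms(rule_text):
    """Return literal terms, skipping blank lines and '#' comment lines."""
    terms = []
    for line in rule_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        terms.append(stripped)
    return terms


def find_matches(text, terms, case_insensitive=False):
    """Return the terms found in text; case sensitive unless requested."""
    haystack = text.lower() if case_insensitive else text
    found = []
    for term in terms:
        needle = term.lower() if case_insensitive else term
        if needle in haystack:
            found.append(term)
    return found


rules = """# product names (this line is a comment)
Teragram
Wikipedia
"""
terms = load_terms(rules)
text = "Subcategories are generated from WIKIPEDIA data."

print(find_matches(text, terms))                         # default: case sensitive
print(find_matches(text, terms, case_insensitive=True))  # flag enabled
```

With the default setting, "Wikipedia" does not match "WIKIPEDIA"; with the flag enabled, it does, which mirrors the effect of selecting the check box.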