What's New in SAS Content Categorization 5.2
SAS Content Categorization Studio
New and enhanced features
in SAS Content Categorization Studio include the following:
-
SAS licensing replaces the Teragram
license.
-
Graphical reporting enables you
to view precision, recall, and document matching information.
-
Generate subcategories now uses
data from Wikipedia.
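As background for the precision and recall figures mentioned above, the following is a minimal sketch of how these two measures are conventionally computed for a single category. It is not taken from SAS Content Categorization Studio; the function name and the example counts are hypothetical.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for one category.

    precision = correct matches / all documents the category matched
    recall    = correct matches / all documents that belong to the category
    """
    predicted = true_positives + false_positives
    relevant = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / relevant if relevant else 0.0
    return precision, recall


# Hypothetical counts: 8 correct matches, 2 incorrect matches, 4 missed documents
precision, recall = precision_recall(8, 2, 4)
```

High precision means few incorrect matches; high recall means few missed documents.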
SAS Content Categorization Collaborative Server
New and enhanced features
in SAS Content Categorization Collaborative Server include the following:
-
Support for SAS Contextual Extraction
Studio concepts
-
Improved support for Microsoft
SQL Server
-
Improved user account management
SAS Content Categorization Server
The new features in
SAS Content Categorization Server include the following:
-
Most of the configuration work that was previously required to upload a binary project to SAS Content Categorization Server is now unnecessary. The creator of the project has the administrative permissions needed to perform this task.
-
SAS Contextual Extraction Studio
is now supported in SAS Content Categorization Server.
-
SAS licensing replaces the Teragram
license.
SAS Contextual Extraction Studio
Overview
New and enhanced features
in SAS Contextual Extraction Studio include the following:
-
Added coreference operators facilitate
rule-writing precision.
-
XML fields can be specified for
matches.
-
Additional operators enable greater
rule matching precision.
-
Case-insensitive matching and comments
in rules are now enabled.
Coreference Operators Added
Coreference refers
to pronoun resolution. A pronoun is matched to the antecedent that
it refers to when you use these operators in your contextual extraction
concept rules:
-
Use the coreference operator (_ref) to link a matched string with its canonical form.
-
Use _coref with CLASSIFIER definitions.
-
Use the forward (_F) and the preceding (_P) symbols to restrict coreference matches.
-
Assign a new concept name for a match on a term specified by the _ref operator.
XML Field Specified for Matching
You can limit matches to specific XML fields by writing those fields into your rules and applying the rules to input XML documents.
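The idea of restricting matches to named XML fields can be illustrated outside the product. The following Python sketch is not SAS rule syntax; the function `match_in_fields`, the field names, and the sample document are all hypothetical and only show what "match only inside these fields" means.

```python
import xml.etree.ElementTree as ET


def match_in_fields(xml_text, fields, term):
    """Return the tags of the listed fields whose text contains term."""
    root = ET.fromstring(xml_text)
    hits = []
    for element in root.iter():  # visits elements in document order
        if element.tag in fields and element.text and term in element.text:
            hits.append(element.tag)
    return hits


# Hypothetical input document with two fields
doc = (
    "<doc>"
    "<title>SAS release notes</title>"
    "<body>SAS licensing replaces the old license.</body>"
    "</doc>"
)

title_hits = match_in_fields(doc, {"title"}, "SAS")          # body is ignored
all_hits = match_in_fields(doc, {"title", "body"}, "SAS")    # both fields checked
```

Text outside the listed fields is never examined, so a term that appears only in `body` produces no match when the rule is limited to `title`.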
Additional Operators for Precision
Additional operators enable greater rule-matching precision. Use these operators as follows:
-
Specify a stemming symbol to enable SAS Contextual Extraction Studio to match all forms of a word, or only its noun forms or only its verb forms.
-
Specify the paragraph symbol (PARA) to restrict matches to terms that occur within the same paragraph.
-
Write a SENT_n operator into a rule to specify the maximum number of sentences within which a match can occur.
-
Use a SENTSTART_n operator to specify the number of words at the beginning of a sentence where a match can occur.
-
Use a SENTEND_n operator to specify the number of words at the end of a sentence where a match can occur.
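The sentence-based constraints described above can be illustrated with a short Python sketch. This is not SAS rule syntax and not how the product is implemented; the helper names, the naive sentence splitter, and the sample text are assumptions chosen only to show what "within n sentences" and "within the first or last n words of a sentence" mean.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def words(sentence):
    return re.findall(r"\w+", sentence)


def in_sentence_start(sentence, term, n):
    """True if term is among the first n words (the SENTSTART_n idea)."""
    return term in words(sentence)[:n]


def in_sentence_end(sentence, term, n):
    """True if term is among the last n words (the SENTEND_n idea)."""
    tokens = words(sentence)
    return term in (tokens[-n:] if n > 0 else [])


def within_sentences(text, terms, n):
    """True if all terms occur inside some window of n consecutive sentences
    (the SENT_n idea)."""
    sentences = split_sentences(text)
    for i in range(len(sentences)):
        window = set(words(" ".join(sentences[i:i + n])))
        if all(term in window for term in terms):
            return True
    return False


# Hypothetical sample text
text = "Revenue rose sharply. The company hired staff. Analysts were pleased."
first = split_sentences(text)[0]
```

For example, `within_sentences(text, ["Revenue", "company"], 2)` holds because both terms fall inside two adjacent sentences, while `"Revenue"` and `"Analysts"` are three sentences apart and need `n=3`.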
Case-Insensitive Matching and Comments
Case-insensitive matching occurs when you select the Case Insensitive Matching check box on the Data tab for a contextual extraction concept. (By default, all matching is case sensitive.)
You can also add comments to your rules using the pound character (#).
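The effect of these two options can be sketched in Python. This is an illustration only, not SAS rule syntax: the helpers `load_terms` and `matches` are hypothetical, and treating everything after a # on a line as a comment is an assumption, since the text above does not say whether trailing comments are allowed.

```python
def load_terms(rule_text):
    """Return the non-empty terms in rule_text, ignoring # comments.

    Assumption: everything from '#' to the end of a line is a comment.
    """
    terms = []
    for line in rule_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            terms.append(line)
    return terms


def matches(term, text, case_insensitive=False):
    """Substring match that is case sensitive unless told otherwise."""
    if case_insensitive:
        return term.casefold() in text.casefold()
    return term in text


# Hypothetical rule text with one full-line and one trailing comment
rules = "# product names\nTeragram\n\nSAS  # vendor"
terms = load_terms(rules)
```

With the default setting, `matches("sas", "SAS licensing")` fails; passing `case_insensitive=True` makes it succeed, which mirrors the behavior of the check box described above.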
Copyright © SAS Institute Inc. All rights reserved.