What’s New in SAS Contextual Extraction Studio 5.2
Overview
New and enhanced features in SAS Contextual Extraction Studio include the following:
- Added coreference operators facilitate rule-writing precision.
- XML fields can be specified for matches.
- Additional operators enable greater rule-matching precision.
- Case-insensitive matching and comments in rules are now enabled.
Coreference Operators Added
Coreference refers to pronoun resolution. A pronoun is matched to the antecedent that it refers to when you use these operators in your contextual extraction concept rules:
- Use the coreference operator (_ref) to link a matched string with its canonical form.
- Use _coref with CLASSIFIER definitions.
- Use the forward (_F) and the preceding (_P) symbols to restrict coreference matches.
- Assign a new concept name for a match on a term specified by the _ref operator.
XML Field Specified for Matching
To limit matches to specific XML fields, write those fields into your rules and apply the rules to input XML documents.
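The idea of field-limited matching can be illustrated outside the product. The following Python sketch is not SAS rule syntax; the function name matches_in_field and the sample document are hypothetical, and the sketch assumes a field corresponds to an XML element name.

```python
import xml.etree.ElementTree as ET

def matches_in_field(xml_text, field, term):
    """Return the text of every <field> element that contains term.

    Illustration only: 'field' is assumed to be an XML element name.
    """
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(field) if el.text and term in el.text]

# Hypothetical input document with two fields.
doc = "<doc><title>Acme earnings</title><body>Acme shares rose.</body></doc>"

title_hits = matches_in_field(doc, "title", "Acme")     # searches <title> only
body_hits = matches_in_field(doc, "body", "earnings")   # "earnings" is not in <body>
```

Here the term "earnings" appears in the document but not in the body field, so restricting the search to that field returns no matches.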
Additional Operators for Precision
Additional operators enable greater rule-matching precision. Use them as follows:
- Specify a stemming symbol to enable SAS Contextual Extraction Studio to match all word forms, or only all noun or verb forms.
- Specify the paragraph symbol (PARA) to limit matches to terms that occur within the same paragraph.
- Write a SENT_n operator into a rule to specify the maximum number of sentences within which a match can occur.
- Use a SENTSTART_n operator to specify the number of words at the beginning of a sentence where a match can occur.
- Use a SENTEND_n operator to specify the number of words at the end of a sentence where a match can occur.
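The sentence-based restrictions described above can be sketched in Python to show the intended effect. This is an illustration of the concepts, not SAS rule syntax, and it rests on several assumptions: sentences are split naively on end punctuation, words are runs of word characters, and n counts words or sentences as described in the list. All function names are hypothetical.

```python
import re

def split_sentences(text):
    # Naive splitter (assumption): break after ., !, or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def words(sentence):
    # Lowercased word tokens; punctuation is dropped.
    return re.findall(r"\w+", sentence.lower())

def in_sentence_start(sentence, term, n):
    """True if term is among the first n words of the sentence."""
    return term.lower() in words(sentence)[:n]

def in_sentence_end(sentence, term, n):
    """True if term is among the last n words of the sentence."""
    return n > 0 and term.lower() in words(sentence)[-n:]

def within_sentences(text, terms, n):
    """True if all terms occur inside some run of at most n consecutive sentences."""
    sents = [set(words(s)) for s in split_sentences(text)]
    wanted = {t.lower() for t in terms}
    for i in range(len(sents)):
        window = set().union(*sents[i:i + n])
        if wanted <= window:
            return True
    return False

text = ("The merger was announced Monday. Analysts were surprised. "
        "Shares of Acme rose sharply.")
first, _, last = split_sentences(text)
```

With this sample text, "merger" falls within the first two words of the first sentence, "rose" falls within the last two words of the third sentence, and "merger" and "surprised" occur within two consecutive sentences, whereas "merger" and "acme" do not.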
Case-Insensitive Matching and Comments
To enable case-insensitive matching, select the Case Insensitive Matching check box in the Data tab for a contextual extraction concept. (By default, all matching is case sensitive.) You can also add comments to your rules using the pound character (#).
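The effect of both features can be sketched in Python. This is an illustration, not the product's implementation: it assumes that a comment runs from the first pound character to the end of the line, the sample rule line is a plain term rather than real rule syntax, and the function names are hypothetical.

```python
def strip_comment(rule_line):
    # Assumption: a comment runs from the first '#' to the end of the line.
    return rule_line.split("#", 1)[0].rstrip()

def term_matches(term, text, case_insensitive=False):
    # Case-sensitive by default, mirroring the default described above.
    if case_insensitive:
        return term.casefold() in text.casefold()
    return term in text

# Hypothetical rule line: a plain term followed by a comment.
term = strip_comment("Acme Corp  # company name")

sensitive = term_matches(term, "Shares of ACME CORP rose.")
insensitive = term_matches(term, "Shares of ACME CORP rose.", case_insensitive=True)
```

The comment is removed before matching, and the uppercase occurrence is found only when case-insensitive matching is turned on.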
Copyright © SAS Institute Inc. All rights reserved.