Language Processing Concepts

Stemming

Stemming identifies the possible root form of an inflected word. For example, the word talk is the stem of the words talk, talks, talking, and talked. In this case talk is the parent, and talk, talks, talking, and talked are its children.
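
The parent and child relationship can be illustrated with a small sketch. The suffix list and the function names toy_stem and group_by_stem are assumptions made for this illustration only. They are a naive suffix-stripping stand-in, not the stemming method that the software uses.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny suffix list (an assumption for this sketch).
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word):
    """Strip one known suffix, keeping at least three leading characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def group_by_stem(words):
    """Map each stem (parent) to the list of words (children) that reduce to it."""
    groups = defaultdict(list)
    for word in words:
        groups[toy_stem(word)].append(word)
    return dict(groups)


groups = group_by_stem(["talk", "talks", "talking", "talked"])
print(groups)
```

A rule list this small fails on irregular forms (for example, "ran" and "run"), which is one reason that production stemmers typically rely on dictionaries or richer rule sets.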

Tagging

Tagging disambiguates the grammatical category of a word by analyzing it in context. For example, consider the following sentence:
I like to bank at the local branch of my bank.
The first bank is tagged as a verb and the second bank is tagged as a noun. The possible part-of-speech tags are as follows:
Tag           Description
ABBR          Abbreviation
ADJ           Adjective
ADV           Adverb
AUX           Auxiliary or modal term
CONJ          Conjunction
DET           Determiner
INTERJ        Interjection
NOUN          Noun
NOUN_GROUP    Compound noun
NUM           Number or numeric expression
PART          Infinitive marker, negative participle, or possessive marker
PREF          Prefix
PREP          Preposition
PROP          Proper noun
PUNCT         Punctuation
VERB          Verb
VERBADJ       Verbal adjective
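
As a rough illustration of disambiguation by context, the following sketch tags each occurrence of an ambiguous word by looking only at the word that precedes it. The rule, the DETERMINERS set, the UNKNOWN placeholder, and the function name tag_ambiguous are all assumptions made for this example. A real tagger weighs far more context than a single neighboring word.

```python
import re

# Hypothetical list of words that usually precede a noun (an assumption).
DETERMINERS = {"the", "a", "an", "my", "your", "his", "her", "its", "our", "their"}


def tag_ambiguous(sentence, target):
    """Return one tag per occurrence of `target`, based on the preceding word."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    tags = []
    for i, token in enumerate(tokens):
        if token != target:
            continue
        previous = tokens[i - 1] if i > 0 else ""
        if previous == "to":
            tags.append("VERB")      # infinitive marker, so a verb follows
        elif previous in DETERMINERS:
            tags.append("NOUN")      # determiner or possessive, so a noun follows
        else:
            tags.append("UNKNOWN")   # placeholder: this toy rule cannot decide
    return tags


tags = tag_ambiguous("I like to bank at the local branch of my bank.", "bank")
print(tags)
```

The first occurrence follows "to" and the second follows "my", so the sketch assigns a verb tag and then a noun tag, which matches the tagging described for the example sentence.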

Noun Group Extraction

Noun groups provide more relevant information than simple nouns. A noun group is defined as a sequence of nouns and their modifiers. Noun group extraction uses part-of-speech tagging to identify nouns and their related words that together form a noun group. Examples of noun groups are "week-long cruises" and "Middle Eastern languages."
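
One simple way to picture this step is to scan tokens that already carry part-of-speech tags and collect each run of adjectives and nouns that ends in a noun. The sketch below does that. The tag sets, the hand-assigned tags in the sample sentence, and the function name extract_noun_groups are assumptions for illustration, not the extraction rules that the software applies.

```python
# Tags allowed inside a noun group, and tags allowed as its final word
# (both sets are assumptions for this sketch).
GROUP_TAGS = {"ADJ", "NOUN", "PROP"}
HEAD_TAGS = {"NOUN", "PROP"}


def extract_noun_groups(tagged):
    """Return multiword runs of modifiers and nouns that end in a noun."""
    groups, run = [], []
    # A sentinel token closes any run that reaches the end of the input.
    for word, tag in list(tagged) + [("", "END")]:
        if tag in GROUP_TAGS:
            run.append((word, tag))
            continue
        # Drop trailing modifiers so that the group ends in a noun.
        while run and run[-1][1] not in HEAD_TAGS:
            run.pop()
        if len(run) >= 2:
            groups.append(" ".join(w for w, _ in run))
        run = []
    return groups


# Hand-tagged sample sentence (the tags are assigned manually for this example).
tagged = [
    ("The", "DET"), ("agency", "NOUN"), ("sells", "VERB"),
    ("week-long", "ADJ"), ("cruises", "NOUN"), ("and", "CONJ"),
    ("teaches", "VERB"), ("Middle", "ADJ"), ("Eastern", "ADJ"),
    ("languages", "NOUN"),
]
noun_groups = extract_noun_groups(tagged)
print(noun_groups)
```

The single noun "agency" is skipped because the sketch keeps only runs of two or more words.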

Entity Identification

Entity identification uses SAS linguistic technologies to classify sequences of words into predefined classes. These classes are assigned as roles for the corresponding sequences. For example, "Person," "Location," "Company," and "Measurement" are the classes identified for "George W. Bush," "Boston," "SAS Institute," and "2.5 inches," respectively. The following table lists the possible entities for English.
Entity          Description
ADDRESS         Postal address or number and street name
COMPANY         Company name
CURRENCY        Currency or currency expression
INTERNET        Email address or URL
LOCATION        City, county, state, political or geographical place or region
MEASURE         Measurement or measurement expression
NOUN_GROUP      Phrases that contain multiple words
ORGANIZATION    Government, legal, or service agency
PERCENT         Percentage or percentage expression
PERSON          Person’s name
PHONE           Telephone number
PROP_MISC       Proper noun with an ambiguous classification
SSN             Social Security number
TIME            Time or time expression
TIME_PERIOD     Measure of time expressions
TITLE           Person’s title or position
VEHICLE         Motor vehicle, including color, year, make, and model
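
A few of these entity types have a regular surface form, which makes them easy to illustrate with pattern matching. The sketch below finds INTERNET (email addresses only), PERCENT, and SSN entities with regular expressions. The patterns and the function name find_entities are assumptions for this example, and the sample values are made up. Entities such as PERSON or COMPANY require linguistic analysis that this sketch does not attempt.

```python
import re

# Simplified, hypothetical patterns; real-world formats vary more than this.
PATTERNS = {
    "INTERNET": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PERCENT": re.compile(r"\d+(?:\.\d+)?\s?%"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_entities(text):
    """Return (matched text, entity label) pairs in order of appearance."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), match.group(), label))
    hits.sort()
    return [(matched, label) for _, matched, label in hits]


sample = "Email jane@example.com about the 15% discount; SSN 123-45-6789 is fake."
entities = find_entities(sample)
for matched, label in entities:
    print(label, matched)
```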