As demonstrated in the previous chapter, SAS Text Miner
does a good job of finding themes that are clear in the data. However,
when the data needs cleaning, SAS Text Miner can be less effective
at uncovering useful themes. In this chapter, you will encounter manually
edited data that contains many misspellings and abbreviations, and
you will clean the data to get better results.
The README.TXT file
provided in the Getting Started with SAS Text Miner 12.1 zip file
contains a list of abbreviations that are commonly used in the adverse
event reports. SAS Text Miner enables you to specify a synonym list.
A VAER_ABBREV synonym list is provided for you in the Getting Started
with SAS Text Miner 12.1 zip file. To create this synonym list, the
abbreviations list from README.TXT was copied into a Microsoft Excel
file, manually edited there, and then imported into a SAS data set.
For example, CT was marked as equivalent to computerized axial tomography.
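To make the idea of a synonym data set concrete, here is a minimal sketch
that builds a one-row data set holding the CT example. The data set name
WORK.ABBREV_DEMO is hypothetical, and the variable names TERM and PARENT
are an assumption about the layout of a synonym data set. Check the
provided VAER_ABBREV data set for the actual variables.

```sas
/* Hypothetical one-row synonym data set (illustration only).       */
/* TERM holds the abbreviation; PARENT holds the term it maps to.   */
/* The variable names are an assumption, not taken from VAER_ABBREV. */
data work.abbrev_demo;
   length term $ 32 parent $ 64;
   infile datalines dsd truncover;   /* comma-delimited input rows */
   input term $ parent $;
   datalines;
CT,computerized axial tomography
;
run;

/* List the rows to confirm that the data set was built as intended. */
proc print data=work.abbrev_demo noobs;
run;
```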
For more information about importing data into a SAS
data set, see the following documentation resource:
http://support.sas.com/documentation/
You will perform the
following tasks to clean the text and examine the results:
- Use a synonym data set from the Getting Started with
  SAS Text Miner 12.1 zip file.
- Create a new synonym data set by using the SAS Code node and the
  %TEXTSYN macro. The %TEXTSYN macro runs through all the terms,
  automatically identifies which ones are misspellings, and creates
  synonyms that map the misspelled terms to their correctly spelled terms.
- Examine results using merged synonym data sets.
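The last task relies on combining an abbreviation synonym list with a
misspelling synonym list. One way this could be done is sketched below with
a DATA step that concatenates two small data sets. All data set names here
are hypothetical, the TERM and PARENT variable names are an assumption, and
the misspelling row is a made-up example rather than output from the
%TEXTSYN macro.

```sas
/* Abbreviation synonyms (stand-in for a list such as VAER_ABBREV). */
data work.syn_abbrev;
   length term $ 32 parent $ 64;
   term = 'CT'; parent = 'computerized axial tomography'; output;
run;

/* Misspelling synonyms (stand-in for %TEXTSYN output; made-up row). */
data work.syn_spelling;
   length term $ 32 parent $ 64;
   term = 'tomografy'; parent = 'tomography'; output;
run;

/* Concatenate both lists into one merged synonym data set. */
data work.syn_merged;
   set work.syn_abbrev work.syn_spelling;
run;

/* Drop exact duplicate term/parent pairs, if any. */
proc sort data=work.syn_merged nodupkey;
   by term parent;
run;
```

Declaring the same lengths for TERM and PARENT in both inputs avoids
truncation when the data sets are concatenated.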