About the Tasks That You Will Perform

As demonstrated in the previous chapter, SAS Text Miner does a good job of finding themes that are clear in the data. When the data needs cleaning, however, SAS Text Miner can be less effective at uncovering useful themes. In this chapter, you will work with manually edited data that contains many misspellings and abbreviations, and you will clean the data to get better results.
The README.TXT file provided in the Getting Started with SAS Text Miner 12.1 zip file contains a list of abbreviations that are commonly used in the adverse event reports. SAS Text Miner enables you to specify a synonym list, and a VAER_ABBREV synonym list is provided for you in the same zip file. To create this synonym list, the abbreviations list from README.TXT was copied into a Microsoft Excel file, manually edited there, and then imported into a SAS data set. For example, CT was marked as equivalent to computerized axial tomography.
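To make the idea of a synonym list concrete, the following Python sketch treats the list as a table of term and parent pairs and replaces each abbreviation with its parent term. It is an illustration only, not SAS Text Miner code. Only the CT entry comes from the text above; the other entries, the function name expand_abbreviations, and the sample sentence are assumptions made for this example.

```python
import re

# Term/parent pairs in the spirit of a synonym list. Only the CT entry comes
# from the text above; the other rows are hypothetical examples.
SYNONYMS = {
    "ct": "computerized axial tomography",
    "pt": "patient",
    "temp": "temperature",
}

def expand_abbreviations(text, synonyms=SYNONYMS):
    """Replace each whole-word abbreviation with its parent term."""
    def lookup(match):
        word = match.group(0)
        # Look up the lowercased word; keep the original word if it is not listed.
        return synonyms.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", lookup, text)

print(expand_abbreviations("Pt had a CT after temp rose."))
# prints: patient had a computerized axial tomography after temperature rose.
```

After the abbreviations are mapped to a single parent term, documents that use different spellings of the same concept can contribute to the same theme.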
For more information about importing data into a SAS data set, see the following documentation resource:
http://support.sas.com/documentation/
You will perform the following tasks to clean the text and examine the results:
  1. Use a synonym data set from the Getting Started with SAS Text Miner 12.1 zip file.
  2. Create a new synonym data set by using the SAS Code node and the %TEXTSYN macro. The %TEXTSYN macro processes all of the terms, automatically identifies which ones are misspellings, and creates synonyms that map the misspelled terms to the correctly spelled terms.
  3. Examine results using merged synonym data sets.
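The general idea behind tasks 2 and 3 can be sketched in Python as follows. This is not the algorithm that the %TEXTSYN macro uses. It is a simplified stand-in that assumes frequent terms are spelled correctly and maps each rare term to a sufficiently similar frequent term by using the standard difflib module. The term counts, the thresholds min_parent_count and cutoff, and the function name find_misspellings are all hypothetical.

```python
from difflib import get_close_matches

def find_misspellings(term_counts, min_parent_count=5, cutoff=0.85):
    """Return a term -> parent mapping for rare terms that resemble a frequent term."""
    # Assumption: any term seen at least min_parent_count times is spelled correctly.
    parents = [t for t, n in term_counts.items() if n >= min_parent_count]
    synonyms = {}
    for term, count in term_counts.items():
        if count >= min_parent_count:
            continue  # frequent terms are candidate parents, not misspellings
        match = get_close_matches(term, parents, n=1, cutoff=cutoff)
        if match:
            synonyms[term] = match[0]
    return synonyms

# Hypothetical term frequencies from a small collection of reports.
counts = {"fever": 120, "feaver": 2, "injection": 80,
          "injecton": 1, "rash": 40, "zoster": 1}

spelling = find_misspellings(counts)

# Merge the misspelling synonyms with an abbreviation list (CT is from the text).
abbreviations = {"ct": "computerized axial tomography"}
merged = {**abbreviations, **spelling}
print(merged)
```

In this sketch, a rare but correctly spelled term such as zoster is left alone because no frequent term is similar enough to it. An automatically generated list can still contain incorrect mappings, so it is worth reviewing before you rely on it.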