Create a New Synonym Data Set

You can use the SAS Text Miner %TEXTSYN macro to create a new synonym data set. The %TEXTSYN macro evaluates all the terms, automatically identifies which terms are likely misspellings, and creates synonyms that map each misspelled term to its correctly spelled parent term.
To create a new synonym data set:
  1. Select the Utility tab and drag a SAS Code node into the diagram workspace. Connect the Text Miner — Symptom Text node to the SAS Code node. Right-click the SAS Code node, and select Rename. Type SAS Code — %TEXTSYN in the Node Name box. Click OK.
    Process flow diagram
  2. Select the arrow that connects the Text Miner — Symptom Text node to the SAS Code — %TEXTSYN node. Note the value of the Terms export Table property. You will use this value in the TERMDS= parameter in the next step.
    Note: The libref EMWS in the Terms export Table property depends on the diagram number within your SAS Enterprise Miner project. If your diagram is the first one created, the libref is EMWS. For the second diagram it is EMWS1, for the third it is EMWS2, and so on.
    Property panel
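    If you are unsure which libref applies to your diagram, a query such as the following can list candidate terms tables. This is a minimal sketch, not part of the original procedure. It assumes that it runs inside the SAS Enterprise Miner session (for example, in a SAS Code node) so that the EMWS librefs are assigned, and that the terms table name ends in TERMS, as TEXT2_TERMS does in this example.

    ```sas
    /* Sketch: list tables whose libref starts with EMWS and whose name  */
    /* ends in TERMS. DICTIONARY.TABLES stores LIBNAME and MEMNAME in    */
    /* uppercase, so the patterns are uppercase too.                     */
    /* Assumption: the naming pattern is inferred from this example.     */
    proc sql;
      select libname, memname
        from dictionary.tables
        where libname like 'EMWS%'
          and memname like '%TERMS';
    quit;
    ```

    The property panel remains the authoritative source for the value. The query is only a convenience for cross-checking it.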
  3. Select the SAS Code — %TEXTSYN node, and click the selector button for the Code Editor property in the Properties panel.
  4. Enter the following code in the Code Editor:
    %textsyn( termds=emws.text2_terms
             , docds=&em_import_data
             , outds=&em_import_transaction
             , textvar=symptom_text
             , mnpardoc=8
             , mxchddoc=10
             , synds=mylib.vaerextsyns
             , dict=mylib.engdict
             , maxsped=15
             ) ;
    
    Code editor dialog box
    Note: For details about the %TEXTSYN macro, see the SAS Text Miner Help documentation.
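    Because the macro call refers to several data sets by name, it can help to confirm that the inputs exist before you run the node. The following is a minimal sketch under stated assumptions: the helper macro name check_ds is hypothetical, and the data set names are the ones used in the call above. Adjust the libref if your diagram does not use EMWS.

    ```sas
    /* Hypothetical helper (not part of SAS Text Miner): write a log    */
    /* message that says whether a data set exists. The EXIST function  */
    /* checks for member type DATA by default, so a view is reported    */
    /* as not found.                                                    */
    %macro check_ds(ds);
      %if %sysfunc(exist(&ds)) %then %put NOTE: Data set &ds was found.;
      %else %put WARNING: Data set &ds was not found.;
    %mend check_ds;

    /* Inputs named in the %TEXTSYN call in this example. */
    %check_ds(emws.text2_terms)
    %check_ds(mylib.engdict)
    ```

    If you place these lines ahead of the %TEXTSYN call in the Code Editor, the messages appear in the node's log.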
  5. Click the Save button to save the changes.
  6. Click the Run button to run the SAS Code — %TEXTSYN node. Click Yes in the Confirmation dialog box.
  7. Click OK in the dialog box that indicates that the node has finished running.
  8. Close the Training Code — Code Node window.
  9. From the SAS Enterprise Miner window, select View > Explorer. The Explorer window opens.
  10. Click Mylib, and then select Vaerextsyns.
    Note: If the Mylib library is already selected and you do not see the Vaerextsyns data set, you might need to click Show project data or refresh the Explorer window to see the Vaerextsyns data set.
  11. Double-click the Mylib.Vaerextsyns table to examine it.
    Vaerextsyns data set
    Here is a list of what the Vaerextsyns columns provide:
    • Example1 and example2 are two examples of the term in a document.
    • Term is the misspelled word.
    • Parent is a guess at the word that was meant.
    • Childndocs is the number of documents that contained that term.
    • # Documents is the number of documents that contained the parent.
    • Minsped is an indication of how close the term and its parent are in spelling.
    • Dict indicates whether the term is a legitimate English word. Legitimate words can still be deemed misspellings, but only if they occur rarely and are very close in spelling to a frequent target term.
    For example, observation 44 shows abdomin to be a misspelling of abdominal. Three documents contain abdomin, 77 documents contain the parent, and abdomin is not a legitimate English word. One example text that contains the misspelling is "20 mins later, upper !!abdomin!!". Double exclamation marks (!!) precede and follow the child term in the example text so that you can see the term in context.
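    The same inspection can be done in code rather than in the Explorer window. The following sketch assumes that the Mylib libref is assigned in your session and that the table has the same contents and row order as in this example, so observation 44 might differ in your own run.

    ```sas
    /* List variable names, labels, and lengths. This shows, for        */
    /* example, which variable carries the "# Documents" label.         */
    proc contents data=mylib.vaerextsyns;
    run;

    /* Print only observation 44, using labels as column headings. */
    proc print data=mylib.vaerextsyns(firstobs=44 obs=44) label;
    run;
    ```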
  12. Examine the Vaerextsyns table to see whether you disagree with any of the choices that the macro made. For this example, however, assume that the %TEXTSYN macro has done a good enough job of detecting misspellings.
    Note: The Vaerextsyns table can be edited using any SAS table editor. You cannot edit this table in the SAS Enterprise Miner GUI. You can change a parent for any misspellings that appear incorrect or delete a row if the Term column contains a valid term.
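    As one way to make such edits, the table can be changed with PROC SQL. This is a sketch only. The term and parent values shown are hypothetical placeholders, not rows from the Vaerextsyns table, and the backup data set name is also an assumption.

    ```sas
    /* Keep a copy first so that the edits can be undone. */
    data work.vaerextsyns_backup;
      set mylib.vaerextsyns;
    run;

    proc sql;
      /* Delete a row whose Term value is actually a valid term.        */
      /* 'validterm' is a hypothetical placeholder value.               */
      delete from mylib.vaerextsyns
        where term = 'validterm';

      /* Assign a different parent to a misspelling.                    */
      /* 'mispeled' and 'misspelled' are hypothetical placeholders.     */
      /* A new value longer than the Parent column would be truncated.  */
      update mylib.vaerextsyns
        set parent = 'misspelled'
        where term = 'mispeled';
    quit;
    ```

    The comparisons are case-sensitive, so each literal must match the stored Term value exactly.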
  13. Close the Mylib.Vaerextsyns table and the Explorer window.