Create a New Synonym Data Set

You can use the SAS Text Miner %TEXTSYN macro to create a new synonym data set. The %TEXTSYN macro evaluates all the terms, automatically identifies which terms are misspellings, and creates synonyms that map each misspelled term to its correctly spelled parent term.
To create a new synonym data set:
  1. Select the Utility tab on the node toolbar and drag a SAS Code node into the diagram workspace.
  2. Right-click the SAS Code node, and select Rename.
  3. Enter SAS Code — %TEXTSYN in the Node Name field, and then click OK.
  4. Connect the Text Filter — Symptom Text node to the SAS Code — %TEXTSYN node.
    Process flow diagram
  5. Select the SAS Code — %TEXTSYN node, and then click the Ellipses icon for the Code Editor property in the Properties Panel.
    The Code Editor window appears.
  6. Enter the following code in the Code Editor:
    %textsyn( termds=<libref>.<nodeID>_terms
             , docds=&em_import_data
             , outds=&em_import_transaction
             , textvar=symptom_text
             , mnpardoc=8
             , mxchddoc=10
             , synds=mylib.vaerextsyns
             , dict=mylib.engdict
             , maxsped=15
             ) ;
    
    Note: You will need to replace <libref> and <nodeID> in the first line in the above code with the correct library name and node ID. To determine what these values are, close the Code Editor window, and then select the arrow that connects the Text Filter — Symptom Text node to the SAS Code — %TEXTSYN node. The value for <libref> will be the first part of the table name that appears in the Properties panel, such as emws, emws2, and so on. The node ID will appear after the value for <libref>, and will be TextFilter, TextFilter2, and so on. After you determine the value for <libref> and <nodeID>, a possible first line might be termds=emws2.textfilter2_terms. Your libref and node ID values could differ depending on how many Text Filter nodes and diagrams have been created in your workspace.
    For details about the %TEXTSYN macro, see the SAS Text Miner Help documentation.
  7. After you have added the %TEXTSYN macro code to the Code Editor window, and modified it to add values for <libref> and <nodeID>, click the Save icon to save the changes.
  8. Click the Run Node icon to run the SAS Code — %TEXTSYN node.
  9. Click Yes in the Confirmation dialog box.
  10. Click OK in the dialog box that indicates that the node has finished running.
  11. Close the Code Editor window.
  12. Select View, and then select Explorer from the main menu.
    The Explorer window appears.
  13. Click Mylib in the SAS Libraries tree, and then select Vaerextsyns.
    Note: If the Mylib library is already selected and you do not see the Vaerextsyns data set, you might need to click Show Project Data or refresh the Explorer window.
  14. Double-click Vaerextsyns to see its contents.
    Vaerextsyns data set
    Here is a list of what the Vaerextsyns columns provide:
    • term is the misspelled word.
    • parent is a guess at the word that was meant.
    • example1 and example2 are two examples of the term in a document.
    • childndocs is the number of documents that contained that term.
    • numdocs is the number of documents that contained the parent.
    • minsped is a spelling-distance measure that indicates how close the spelling of the term is to the spelling of the parent.
    • dict indicates whether the term is a legitimate English word. Legitimate words can still be deemed misspellings, but only if they occur rarely and are very close in spelling to a frequent target term.
    For example, Observation 117 shows antibotics to be a misspelling of antibiotics. Four documents contain antibotics, and 745 documents contain the parent. Note that double exclamation marks (!!) both precede and follow the child term in the example text so that you can see the term in context.
  15. Examine the Vaerextsyns table to see whether you disagree with any of the parent terms that were chosen. For this example, however, assume that the %TEXTSYN macro has done a good enough job of detecting misspellings.
    Note: You cannot edit the Vaerextsyns table in the SAS Enterprise Miner GUI, but you can edit it using any SAS table editor. You can change the parent of any misspelling whose parent appears incorrect, or delete a row if the Term column contains a valid term.
  16. Close the Mylib.Vaerextsyns table and the Explorer window.
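The review and editing described in step 15 can also be done in code rather than in a table editor. The following is a minimal sketch, assuming that the mylib libref is already assigned and that the columns are named as listed in step 14 (term, parent, childndocs, numdocs, minsped). The term values 'exampleterm' and 'exampelterm', the replacement parent 'example term', and the output tables work.syn_review and work.vaerextsyns_edited are hypothetical placeholders, not values from the example data.

```sas
/* Sketch only: review and edit the synonym table in code.              */
/* Assumes the mylib libref is already assigned in this SAS session.    */

/* Sort a copy so that the most frequent child terms come first.        */
/* A flagged term that occurs in many documents may deserve a closer    */
/* look before it is accepted as a misspelling.                         */
proc sort data=mylib.vaerextsyns out=work.syn_review;
   by descending childndocs;
run;

/* Print the first 20 rows for manual inspection. */
proc print data=work.syn_review(obs=20);
   var term parent childndocs numdocs minsped;
run;

/* Apply corrections to a copy of the table.                            */
/* Both term values below are hypothetical placeholders.                */
data work.vaerextsyns_edited;
   set mylib.vaerextsyns;
   /* Drop a row whose term is actually a valid word. */
   if lowcase(term) = 'exampleterm' then delete;
   /* Replace a parent that was guessed incorrectly. */
   if lowcase(term) = 'exampelterm' then parent = 'example term';
run;
```

Writing to a copy in the work library leaves mylib.vaerextsyns untouched. To use the edited table as the synonym data set, copy it back over mylib.vaerextsyns once the changes look right. Note that a replacement parent longer than the defined length of the parent column would be truncated.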