Use the COSTRING Variable to Model

To use the COSTRING variable to create a model:
  1. Select the Explore tab on the toolbar and drag and drop a Text Miner node into the diagram workspace. Connect the Data Partition node to the Text Miner node.
  2. Right-click the new Text Miner node and select Rename. Type Text Miner — COSTART in the Node Name box, and click OK.
    Process flow diagram
  3. Select the VAEREXT node in the diagram workspace. Click the selector button for the Variables property in the Properties Panel for the VAEREXT node.
    Recall that there were two text variables, COSTRING and SYMPTOM_TEXT, from the initial data source. By default, SAS Text Miner will use the longer text variable, SYMPTOM_TEXT. In this chapter, you want to mine the COSTRING variable.
    Click OK to close the Variables window.
  4. Select the Text Miner — COSTART node. Set the following properties in the Properties Panel for the Text Miner — COSTART node:
    • Click the selector button for the Variables property. In the Variables window, set the Use value for the SYMPTOM_TEXT variable to No, the Use value for the COSTRING variable to Yes, and the Use value for the serious variable to Yes. Click OK to save your changes.
      Variables window
    • Click the selector button to the right of the Stop List property. Select the No data set to be specified check box in the Select a SAS Table dialog box. This removes the entry for the stop list so that no stop list is used. Click OK.
    • Set Different Parts of Speech to No.
  5. Right-click the Text Miner — COSTART node, and select Run. Click Yes in the Confirmation dialog box. Click OK in the Run Status dialog box when the node has finished running.
  6. In the Properties Panel, make sure that the Parse Variable property of the Text Miner — COSTART node is set to costring.
  7. Click the selector button for the Interactive property to open the Interactive Results window. One problem with COSTART is that it does not always use the same keyword for the same term or for equivalent terms. For example, abdomen appears in COSTART as both ab and abdo. Some terms also carry modifiers that you do not need. You could run the %TEXTSYN macro, but because these keywords are abbreviations, the macro will probably not find all of the correct spellings. You need to clean some terms manually.
  8. Sort the terms in the Terms window by clicking on the Term column heading. Select ab and abdo from the TERM column. Right-click and select Treat as Equivalent Terms.
    Terms window
    Select abdo from the Create Equivalent Terms dialog box. Click OK.
    Look through the data set and create synonyms by holding down the Ctrl or Shift key and clicking the terms that you consider to be the same. Then right-click the selected terms and select Treat as Equivalent Terms.
  9. Repeat this process as many times as you need. It might be helpful to filter the terms so that you can view the full text of COSTART before combining terms.
  10. Select File > Save Synonyms from the Interactive Results window menu. Select Mylib in the drop-down menu for the library field, and type COSTARTSYNS in the Data Set Name field. Click OK.
  11. Close the Text Miner — Interactive window.
  12. Note that the Synonyms property in the Properties Panel has been set to the new MYLIB.COSTARTSYNS synonym data set.
  13. COSTART terms should represent keywords, so you want to create variables for each keyword. Set the following Transform properties in the Properties Panel:
    • Set Compute SVD to No.
    • Set Term Weight to Mutual Information.
    • Set Roll up Terms to Yes.
    • Set No. of Rolled-up Terms to 400.
    • Set Drop Other Terms to Yes.
  14. Right-click the Text Miner — COSTART node, and select Run. Click Yes in the Confirmation dialog box. Click OK in the Run Status dialog box when the node has finished running.
  15. Click the selector button for the Interactive property to open the Text Miner — Interactive window and view the Terms window.
  16. Click the TERM column heading to sort the column, and repeat until the arrow on the column heading points up.
    Note: A plus (+) sign next to a term indicates a parent term for synonyms that you have specified. Click the plus (+) sign to expand the child terms underneath the parent term.
    Terms window
  17. Scroll down until you see terms that do not have a check mark in the Keep column. No separate variable is created for these terms, because they did not rank among the 400 terms that were rolled up. Recall that you set the Roll up Terms property to Yes and the No. of Rolled-up Terms property to 400. When you roll up terms, the terms are sorted in descending order of the term weight multiplied by the square root of the number of documents. The 400 highest-ranked terms are then used as variables in the document collection.
  18. Close the Text Miner — Interactive window.
  19. From the Model tab, drag and drop a Decision Tree node into the diagram workspace. Connect the Text Miner — COSTART node to the Decision Tree node. Right-click the Decision Tree node, and select Rename. Type Decision Tree — CT in the Node Name box (CT stands for COSTART Terms), and click OK.
    Process flow diagram
  20. Right-click the Decision Tree — CT node and select Run. Click Yes in the Confirmation dialog box. Recall that when you created the VAEREXT data set, you set serious as the target variable.
  21. Click Results in the Run Status dialog box after the node has finished running.
  22. Select View > Assessment > Classification Chart: serious from the Results window menu to view the Classification Chart.
    Note: Blue indicates correct classification and red indicates incorrect classification.
  23. Close the Results window.
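
The synonym merging in step 8 and the roll-up rule in step 17 can be illustrated outside the tool with a small Python sketch. This is not SAS Text Miner's implementation. The function names, the sample keyword lists, and the term weights are all hypothetical. In particular, the weights are made-up stand-ins and are not computed with the Mutual Information setting, and "number of documents" is assumed here to mean the number of documents that contain the term.

```python
import math
from collections import Counter


def apply_synonyms(docs, synonyms):
    """Replace each child term with its parent term (for example, ab -> abdo)."""
    return [[synonyms.get(term, term) for term in doc] for doc in docs]


def document_counts(docs):
    """Count, for each term, how many documents contain it at least once."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc))  # set() so a term counts once per document
    return counts


def roll_up(weights, doc_counts, n_terms):
    """Rank terms by weight * sqrt(document count) and keep the top n_terms."""
    scores = {t: weights[t] * math.sqrt(doc_counts[t]) for t in doc_counts}
    # Sort by descending score; break ties alphabetically for a stable result.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:n_terms]


# Hypothetical COSTART-style keyword lists, one list per report.
docs = [
    ["ab", "pain", "fever"],
    ["abdo", "pain"],
    ["fever", "rash"],
    ["abdo", "nausea"],
]
synonyms = {"ab": "abdo"}  # child term -> parent term, as in step 8

# Made-up term weights, assumed to be given (not derived from the data here).
weights = {"abdo": 0.9, "pain": 0.5, "fever": 0.4, "rash": 0.8, "nausea": 0.2}

merged = apply_synonyms(docs, synonyms)
counts = document_counts(merged)
kept = roll_up(weights, counts, n_terms=3)
print(kept)
```

In this toy data, merging ab into abdo raises the document count for abdo to 3, which lifts its rank. A rare term with a high weight (rash) can still outrank a more common term with a lower weight (pain), because the document count enters only through its square root.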