Long before text mining existed, researchers needed to analyze text. In the
field of drug trials, the need was acute enough that coding systems
were developed to automatically identify keywords that could be analyzed to understand adverse events. The COSTART coding
system was one such attempt. COSTART terms consist of one to three
tokens: a symptom, an optional body part, and an optional subpart.
One initial task is to find what factors influence whether a reaction
becomes serious and how well these factors are captured by the COSTART
terms. One way of doing this is to use SAS Text Miner to see how well
the COSTART terms predict the seriousness of the adverse event. This
chapter explores an example of predictive modeling in SAS Text Miner.
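
The COSTART term structure described above (a symptom, an optional body part, and an optional subpart) can be illustrated with a short Python sketch. This is not part of SAS Text Miner. It assumes that the tokens of a term are separated by whitespace or underscores, which this chapter does not specify. The `CostartTerm` class, the `parse_costart` function, and the sample terms are hypothetical and are not drawn from the COSTART dictionary.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CostartTerm:
    """Hypothetical container for a COSTART-style term of one to three tokens."""
    symptom: str
    body_part: Optional[str] = None
    subpart: Optional[str] = None


def parse_costart(term: str) -> CostartTerm:
    """Split a term into symptom, optional body part, and optional subpart.

    Assumption: tokens are separated by whitespace or underscores.
    """
    tokens = [t for t in re.split(r"[\s_]+", term.strip()) if t]
    if not 1 <= len(tokens) <= 3:
        raise ValueError(f"expected 1 to 3 tokens, got {len(tokens)}: {term!r}")
    return CostartTerm(*tokens)


# Made-up sample terms, for illustration only.
for sample in ["RASH", "PAIN ARM", "EDEMA INJECT SITE"]:
    print(parse_costart(sample))
```

Treating the body part and subpart as optional fields mirrors the one-to-three-token structure and makes it straightforward to check which terms carry location detail.
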
To analyze texts with predictive models, you will perform the following tasks:
- Use the COSTRING variable and the Decision Tree node to create a model.
- Use the SYMPTOM_TEXT variable and the Decision Tree node to create a model.
- Compare the models using the Model Comparison node.