Working with Nodes That Assess
Overview
The final step in most data mining problems is to create scoring code that you can use to score new data. For example, now that you have a good predictive model for profitable donations, you can apply that model to raw data that does not include the target variable TARGET_B. Thus you can automate the model scoring process of deciding which individuals are likely to donate.
There are several types of scoring including interactive, batch, and on-demand. Typically, interactive scoring is performed on smaller tables, while batch scoring is performed on larger tables. On-demand scoring includes single transactions that can be performed in a real-time setting. There are also different types of score code. By default, the score code that SAS Enterprise Miner creates is SAS code. It is also possible to create Java or C code for integration into environments other than SAS. Additionally, SAS Enterprise Miner can create PMML code, which is an XML representation of the model for scoring in other databases. In this topic, you will perform interactive scoring that produces SAS code.
You use the Score node to manage, edit, export, and execute scoring code that is generated from a trained model or models. The Score node generates and manages scoring formulas in the form of a single SAS DATA step, which can be used in most SAS environments even without the presence of SAS Enterprise Miner.
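As a rough illustration of using the score code without SAS Enterprise Miner, the sketch below applies an exported score-code file to the raw score table in an ordinary SAS session. The file paths and the output table name are hypothetical. The sketch also assumes that the exported file contains only DATA step statements (no DATA or SET statement of its own), so that it can be pulled into a surrounding DATA step with %INCLUDE.

```sas
/* Sketch only: the paths and the output table name below are hypothetical. */
libname donor "/path/to/donor/library";        /* assumed location of the DONOR library */

data work.donor_scored;
    set donor.donor_score_data;                /* raw data without TARGET_B */
    %include "/path/to/exported/score.sas";    /* assumed: DATA step statements saved from the Score node */
run;
```

If the exported file is instead a complete DATA step, it would be submitted on its own rather than included inside another step.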
In this example, you use the Score node to score the DONOR_SCORE_DATA data set within the process flow.
Define Data Source for Scoring
Before you can add a data set to a SAS Enterprise Miner process flow diagram, you must define it as a data source.
In this task, you define the DONOR_SCORE_DATA table as a data source.
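Before you define the data source, it can help to confirm in a SAS session that the table looks as expected. The following is a minimal sketch that assumes the DONOR libref is already assigned. It lists the variables so that you can check that CONTROL_NUMBER is present and TARGET_B is absent, and then prints a few rows.

```sas
/* Assumes the DONOR libref is already assigned in the current session. */
proc contents data=donor.donor_score_data varnum;   /* list variables in creation order */
run;

proc print data=donor.donor_score_data (obs=5);     /* preview the first five rows */
run;
```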
Right-click the Data Sources folder in the Project Panel and select Create Data Source. The Data Source wizard opens.
In the Source box, select SAS Table. Click Next.
In the Data Source Wizard - Select a SAS Table window, click Browse.
In the Select a SAS Table window, double-click the DONOR library folder to expand it. Select the DONOR_SCORE_DATA table and click OK.
DONOR.DONOR_SCORE_DATA appears in the Table box of the Select a SAS Table window. Click Next.
Click Next in the Table Information window.
Select Basic in the Metadata Advisor Options window. Click Next. The Column Metadata window opens.
Redefine the metadata by setting the Role for the CONTROL_NUMBER variable to ID.
Click Next. The Data Source Attributes window opens.
In the Role box, select Score from the list to indicate that this data set contains Score data.
Click Finish. The DONOR_SCORE_DATA data source appears in the Data Sources folder in the Project panel.
Add Score Data and Score Node to Diagram
Drag the DONOR_SCORE_DATA data source from the DONOR library folder in the Project panel into the Diagram Workspace. Place it near the Model Comparison node.
Drag a Score node from the Assess tab of the node toolbar into the Diagram Workspace. Connect both the Model Comparison node and the DONOR_SCORE_DATA data source to the Score node.
Run the Score node in order to apply the SAS scoring code to the new data source.
View the Score node results when the node has finished running.
As you examine the results, notice these details:
The SAS Code window displays code that was generated by the entire process flow diagram. The SAS score code can be used outside the Enterprise Miner environment for custom applications. The results also contain C and Java translations of the score code that can be used for external deployment.
The Output window displays summary statistics for class and interval variables. You can also view lists of the score input and output variables.
Select View > Scoring in order to view the SAS, C, and Java score code.
Select View > Graphs > Bar Chart in order to display a bar chart of the values of the target variable for classification, decision, and segment output types, if applicable, for each data set.
Select View > Graphs > Histogram in order to display a histogram of the values of the predicted, probability, and profit output types for each of the data sets.
Close the Results window.
Click the ellipsis button to the right of the Exported Data property in the Score node Properties panel in order to view the scored data.
Select the SCORE port table in the Exported Data - Score window.
Open the selected table and examine the SCORE table. Values for predicted profit, expected profit, and other variables were generated by the Score node for export.
Close the SCORE table, and then close the Exported Data window.
Add a SAS Code Node
Drag a SAS Code node from the Utility tab of the nodes toolbar into the Diagram Workspace, and connect it to the Score node.
Right-click the SAS Code node and select Rename.
Type Best Potential Donors in the Node name box.
In the SAS Code node Properties panel, click the ellipsis button to the right of the Variables property to open the Variables table. The name of the average profit variable is EM_PROFIT and the name of the decision variable is EM_DECISION.
Scroll to the right to see the Label column. You can widen the column to view its entire contents.
Close the Variables window.
In the SAS Code node Properties panel, click the ellipsis button to the right of the Code Editor property to open the Code Editor window.
The Code Editor window has three tabs: Macros, Macro Variables, and Variables. The Macros tab lists the macros that SAS provides, and the Macro Variables tab lists the system-defined macro variables along with any values that are already assigned to them.
Select the Macro Variables tab. The tab holds a list of macro variables that you can reference in your SAS code. Examine the Imports section of the Macro Variables list. A macro variable named EM_IMPORT_SCORE appears in this section. You can use the EM_IMPORT_SCORE macro variable to reference the scored data set that is imported into the SAS Code node.
In the Training Code pane, enter the following code:
proc means data=&em_import_score n min mean median max;
   class em_decision;
   var em_profit;
run;

proc print data=&em_import_score noobs;
   var control_number em_profit;
   where em_profit gt .60;
run;
Note: The PROC MEANS step calculates descriptive statistics for expected profit, and the PROC PRINT step generates a list of donors that exceed an expected profit threshold.
Click the Save All icon to save the code.
Close the Code Editor window.
Run the SAS Code node named Best Potential Donors and view the results.
Select View > SAS Results > Log in order to view the SAS log.
Close the Results window.
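As a possible variation on the training code used in this task, the sketch below keeps the profit threshold in a macro variable and writes the selected donors to a table sorted by expected profit. The macro variable name profit_cutoff and the output table work.best_donors are illustrative names that do not appear in the original example. The code is meant to run inside the SAS Code node, where EM_IMPORT_SCORE resolves to the imported score data.

```sas
/* Illustrative variation: profit_cutoff and work.best_donors are hypothetical names. */
%let profit_cutoff = 0.60;                     /* same threshold as the PROC PRINT step in this task */

proc sort data=&em_import_score
          out=work.best_donors (keep=control_number em_profit em_decision);
    where em_profit gt &profit_cutoff;         /* keep only donors above the threshold */
    by descending em_profit;                   /* highest expected profit first */
run;

proc print data=work.best_donors noobs;
    var control_number em_profit em_decision;
run;
```

Keeping the threshold in one macro variable makes it easier to rerun the node with a different cutoff without editing the WHERE clause itself.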
Copyright © 2008 by SAS Institute Inc., Cary, NC, USA. All rights reserved.