Artificial neural networks are one of the predictive modeling capabilities of SAS® In-Memory Statistics. Neural networks are among the most widely used machine learning algorithms because they are powerful tools for modeling nonlinear relationships in high-dimensional data. They are also useful when the relationships among the input variables (including interactions) are only vaguely understood. They were originally motivated by the idea of imitating how the human brain works. Neural networks are seeing a resurgence, primarily because of improved in-memory big data computing capabilities.
Neural networks are general, flexible predictive models that can be applied to a wide range of problems, including classification, dimension reduction, and regression. Real-world business applications include image processing, medical diagnosis, financial services, and fraud detection. This sample shows how to use the NEURAL statement in SAS® In-Memory Statistics to build an artificial neural network model that identifies spam email. The data set used in the example is the classic Spambase data set from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets/Spambase). SAS® In-Memory Statistics can load data into memory directly from a URL without first saving it to disk, as the sample shows. The sample also demonstrates how to perform the following tasks:
1. Pre-train several 'shallow' neural networks that start from different points to avoid creating a neural network that is ineffective due to poor initial weights.
2. Select the best neural network from the pre-trained neural networks and resume the analysis to train a much deeper neural network as the final model.
3. Score a validation data set using the final neural network model.
4. Perform model assessment using the scored results with the ASSESS statement.
5. Plot lift and ROC curves from the results of the ASSESS statement.
Note that the "pre-train and train" strategy is not required. However, SAS® recommends it to avoid poor initial values and to reduce the risk of becoming stuck in a local minimum.
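The "pre-train and train" idea can be illustrated outside SAS with a minimal Python sketch. It is not the NEURAL statement's algorithm: it assumes a toy XOR data set, a tiny one-hidden-layer network, finite-difference gradients, and plain gradient descent with a backtracking line search. Every name in it (`train`, `pretrain_then_resume`, `HIDDEN`) is hypothetical and chosen only for illustration.

```python
import math
import random

# Toy XOR data (an assumption for illustration): not linearly separable,
# so different random starts can make very different progress.
DATA = [((0.0, 0.0), 0), ((0.0, 1.0), 1), ((1.0, 0.0), 1), ((1.0, 1.0), 0)]
HIDDEN = 3  # hypothetical number of hidden units


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp()
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, x):
    # Weight layout: [w1, w2, bias] per hidden unit, then HIDDEN output
    # weights, then one output bias.
    h = [sigmoid(w[3 * j] * x[0] + w[3 * j + 1] * x[1] + w[3 * j + 2])
         for j in range(HIDDEN)]
    out = sum(w[3 * HIDDEN + j] * h[j] for j in range(HIDDEN)) + w[4 * HIDDEN]
    return sigmoid(out)


def loss(w):
    # Mean binary cross-entropy (entropy error) over the toy data.
    eps = 1e-12
    total = 0.0
    for x, y in DATA:
        p = predict(w, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(DATA)


def gradient(w, h=1e-5):
    # Central finite differences keep the sketch short; a real
    # implementation would use backpropagation.
    g = []
    for i in range(len(w)):
        up, dn = w[:], w[:]
        up[i] += h
        dn[i] -= h
        g.append((loss(up) - loss(dn)) / (2 * h))
    return g


def train(w, iters):
    """Gradient descent with backtracking; the loss never increases."""
    w = w[:]
    current = loss(w)
    for _ in range(iters):
        g = gradient(w)
        step, improved = 1.0, False
        while step > 1e-8 and not improved:
            cand = [wi - step * gi for wi, gi in zip(w, g)]
            cand_loss = loss(cand)
            if cand_loss < current:
                w, current, improved = cand, cand_loss, True
            step /= 2
        if not improved:
            break  # no step size reduces the loss any further
    return w, current


def pretrain_then_resume(num_tries=5, pre_iters=10, final_iters=50, seed=12345):
    rng = random.Random(seed)
    n_weights = 4 * HIDDEN + 1
    # Pre-train: several short runs from different random starting weights.
    tries = []
    for _ in range(num_tries):
        start = [rng.uniform(-1.0, 1.0) for _ in range(n_weights)]
        tries.append(train(start, pre_iters))
    # Select the best pre-trained network and resume training from it.
    best_w, best_loss = min(tries, key=lambda t: t[1])
    _, final_loss = train(best_w, final_iters)
    return [t[1] for t in tries], best_loss, final_loss


if __name__ == "__main__":
    pre_losses, best, final = pretrain_then_resume()
    print("pre-train losses:", [round(v, 4) for v in pre_losses])
    print("best pre-train loss:", round(best, 4), "final loss:", round(final, 4))
```

The defaults mirror the sample's numtries=5, maxiter=10, and maxiter=50 settings only loosely. The point of the sketch is the control flow: short runs from several starts, selection by error, then a longer run that resumes from the selected weights.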
The model assessment portion of the sample applies to predictive models in general, provided that you change the names of the target, model event, and predicted columns.
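To show why the assessment step depends only on an actual target and a predicted event probability, here is a small Python sketch that computes ROC points and cumulative lift from those two columns. It is an illustration under stated assumptions, not the ASSESS statement's implementation: the function names and the way rows are binned are hypothetical, and the defaults only loosely echo the sample's step = 0.05 and nbins = 20.

```python
def roc_points(y_true, p_event, n_cutoffs=20):
    """Return (cutoff, sensitivity, specificity) at evenly spaced cutoffs.

    n_cutoffs=20 gives cutoffs 0.00, 0.05, ..., 1.00.
    """
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    points = []
    for k in range(n_cutoffs + 1):
        cutoff = k / n_cutoffs
        # A row is classified as the event when its probability >= cutoff.
        tp = sum(1 for y, p in zip(y_true, p_event) if y == 1 and p >= cutoff)
        tn = sum(1 for y, p in zip(y_true, p_event) if y == 0 and p < cutoff)
        points.append((cutoff, tp / pos, tn / neg))
    return points


def cumulative_lift(y_true, p_event, nbins=20):
    """Return (depth, cumulative lift) after ranking rows by probability."""
    ranked = sorted(zip(p_event, y_true), key=lambda t: -t[0])
    base_rate = sum(y_true) / len(y_true)
    curve = []
    for b in range(1, nbins + 1):
        # Take the top b/nbins share of the ranked rows (at least one row).
        top = ranked[: max(1, round(len(ranked) * b / nbins))]
        event_rate = sum(y for _, y in top) / len(top)
        curve.append((b / nbins, event_rate / base_rate))
    return curve


if __name__ == "__main__":
    # Made-up scores for eight rows, used only to exercise the functions.
    y = [1, 1, 1, 0, 1, 0, 0, 0]
    p = [0.97, 0.88, 0.81, 0.72, 0.63, 0.41, 0.33, 0.12]
    for cutoff, sens, spec in roc_points(y, p):
        print(f"cutoff={cutoff:.2f} sensitivity={sens:.2f} 1-specificity={1 - spec:.2f}")
    for depth, lift in cumulative_lift(y, p, nbins=4):
        print(f"depth={depth:.2f} cumulative lift={lift:.2f}")
```

Swapping in a different model only changes where `y` and `p` come from, which is the sense in which the assessment code is model-agnostic.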
For more details about the NEURAL statement, see the SAS® LASR™ Analytic Server 2.5 Reference Guide.
These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.
/**************************************************************************************/
/* */
/* 0. Load data from a URL directly into the memory of the LASR server */
/* */
/**************************************************************************************/
libname mylasr sasiola host="grid001.example.com" port=10010 tag='hps';
%let base = http://archive.ics.uci.edu/ml/machine-learning-databases;
data mylasr.spambase;
infile "&base/spambase/spambase.data" device=url dsd dlm=',';
input Make Address All _3d Our Over Remove Internet Order Mail Receive
Will People Report Addresses Free Business Email You Credit Your Font
_000 Money Hp Hpl George _650 Lab Labs Telnet _857 Data _415 _85
Technology _1999 Parts Pm Direct Cs Meeting Original Project Re Edu
Table Conference Semicol Paren Bracket Bang Dollar Pound Cap_Avg
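Cap_Long Cap_Total Class;
/* Assumption: PART is a uniform random number that the WHERE clauses below */
/* use to split the rows roughly 75/25 into training and validation data. */
part = ranuni(12345);
run;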
proc imstat;
/**************************************************************************************/
/* */
/* 1. Pre-train several 'shallow' neural networks starting from different points to */
/* avoid creating a neural network that is ineffective due to poor initial values.*/
/* */
/**************************************************************************************/
table mylasr.spambase;
where part <= .75;
NEURAL class/ seed=12345 details temptable
/*input */ input=(make--cap_total) std=std
/*target*/ targetact=softmax targetcomb=linear error=entropy nominal=class
/*hidden*/ hiddens=(10) act=(logistic) combine=(linear)
/*prelim*/ numtries=5 maxiter=10 tech=congra
/*NLOP */ maxfunc=1000000 linesearch=2 fconv=1e-4 lower=-20 upper=20;
run;
/**************************************************************************************/
/* */
/* 2. Select the best neural network from the pre-trained neural networks and resume */
/* the analysis to train a much deeper neural network as the final model. */
/* */
/**************************************************************************************/
table mylasr.spambase;
where part <= .75;
NEURAL class/ seed=12345 details temptable
resume LASRANN=mylasr.&_templast_
/*input */ input=(make--cap_total) std=std
/*target*/ targetact=softmax targetcomb=linear error=entropy nominal=class
/*hidden*/ hiddens=(10) act=(logistic) combine=(linear)
/*train */ tech=congra maxiter=50
/*NLOP */ maxfunc=1000000 linesearch=2 fconv=1e-4 lower=-20 upper=20;
run;
/**************************************************************************************/
/* */
/* 3. Score validation set using trained neural network model. The ASSESS option */
/* specifies to add predicted probabilities to the scored data for all the levels */
/* of the nominal target variable. In this example, two levels are created */
/* because the variable named class has two values, 0 or 1. The scored data are */
/* stored in a temporary table. */
/* */
/**************************************************************************************/
table mylasr.spambase;
where part > .75;
NEURAL class / lasrANN=mylasr.&_templast_
input = (make--cap_total) nominal=class temptable assess vars = (class);
run;
/**************************************************************************************/
/* */
/* 4. Perform model assessment using the scoring result. The probabilities of all */
/* levels are in output, but we need the probabilities of the event level only. */
/* The WHERE clause is used to select the rows with event level only. The strip */
/* function is applied to remove the blanks in character variable _NN_Level_. */
/* */
/**************************************************************************************/
table mylasr.&_templast_;
where strip(_NN_Level_) eq '1';
assess _NN_P_/ y = class event = '1' nbins = 20 step = 0.05;
ods output ROCInfo=work.rocdata LiftInfo=work.liftdata;
run;
quit;
/* Stop the LASR server; the port matches the LIBNAME statement above. */
proc lasr term port=10010;
run;
/**************************************************************************************/
/* */
/* 5. Plot lift and ROC curves from the result of ASSESS statement. */
/* */
/**************************************************************************************/
proc sgplot data=liftdata;
series x = depth y = Cumlift / markers markerattrs=(symbol=circlefilled);
series x = depth y = CumliftBest;
yaxis label=' ' grid;
run;
data endpoint;
Sensitivity=0;
Specificity=1;
run;
data rocdata;
set rocdata endpoint;
One_minus_Specificity = 1 - Specificity;
run;
proc sort data=rocdata;
by One_minus_Specificity;
run;
proc sgplot data=rocdata;
series x = one_minus_Specificity y = Sensitivity / lineattrs=(color=blue);
series x = one_minus_Specificity y = one_minus_Specificity / lineattrs=(color=black);
yaxis grid;
run;
quit;
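The event-level selection in steps 3 and 4 can be pictured with a short Python sketch. It assumes a long scored layout with one row per observation per target level, which is how the comments above describe the scored table. The dictionary keys, the padded level values, and the helper names are all hypothetical and are not the actual LASR column layout.

```python
import math


def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def score_long(observations, levels=("0", "1")):
    """Emit one row per observation per target level (hypothetical layout)."""
    rows = []
    for obs_id, (actual, raw_scores) in enumerate(observations):
        for level, prob in zip(levels, softmax(raw_scores)):
            rows.append({
                "id": obs_id,
                "class": actual,
                # Pad the level to mimic a fixed-width character column.
                "level": level.rjust(4),
                "p": prob,
            })
    return rows


def event_rows(rows, event="1"):
    """Keep only the rows for the event level, ignoring padding blanks."""
    return [r for r in rows if r["level"].strip() == event]


if __name__ == "__main__":
    # Two made-up observations: (actual class, raw scores for levels 0 and 1).
    observations = [(1, [0.2, 1.5]), (0, [2.0, -1.0])]
    for row in event_rows(score_long(observations)):
        print(row["id"], row["class"], round(row["p"], 4))
```

Without the `strip()` call the comparison with `'1'` would fail on the padded values, which is the same reason the WHERE clause in step 4 applies the STRIP function before comparing.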
Output (figures not shown): the model information table, the score information table, the lift curve, and the ROC curve.
Type: | Sample |
Topic: | Analytics ==> Data Mining; Analytics ==> Scoring; Analytics ==> Statistical Graphics |
Date Modified: | 2015-03-12 15:05:35 |
Date Created: | 2015-01-28 14:14:38 |
Product Family | Product | Host | Product Release (Starting) | Product Release (Ending) | SAS Release (Starting) | SAS Release (Ending) |
SAS System | SAS In-Memory Statistics | Microsoft® Windows® for x64 | 2.5 | | 9.4 TS1M2 | |
SAS System | SAS In-Memory Statistics | Linux for x64 | 2.5 | | 9.4 TS1M2 | |