In SAS® Enterprise Miner™, the HPBNET procedure uses the Additive Smoothing technique to smooth categorical data. The formula is shown below and you can get details from the following Wikipedia page: https://en.wikipedia.org/wiki/Additive_smoothing.
Given an observation x = (x1, …, xd) from a multinomial distribution with N trials and parameter vector θ = (θ1, …, θd), a "smoothed" version of the data gives this estimator:
θ̂i = (xi + α) / (N + αd), for i = 1, …, d
The pseudo count α (> 0) is the smoothing parameter.
PROC HPBNET performs Add-one Smoothing (α=1) for the calculation of the conditional probability. This calculation addresses the zero-frequency problem. Without an adjustment, the conditional probability is zero when there is a zero entry in the contingency table.
The Full Code example generates output that demonstrates how Add-one Smoothing is performed in PROC HPBNET. The conditional probability table network_A is created from a Naive Bayes network using PROC HPBNET. In addition, a PROC FREQ step creates a contingency table of the target variable Pain by the binned input variable Age (Group) for treatment group A.
The table created by the FREQ procedure contains a cell with a frequency count of zero (Group=1 with Pain=Yes). This is the zero-frequency problem: without an adjustment, the conditional probability P(Group=1 | Pain=Yes) is estimated as 0/5 = 0. Add-one Smoothing adjusts the estimate so that it is not zero. With the pseudo count smoothing parameter set to α=1, the conditional probabilities (3 Age groups) and the target probabilities (2 Pain levels) are calculated as follows:
P(Group=1 | Pain=Yes) = (0+1)/(5+1*3) = 0.12500
P(Group=2 | Pain=Yes) = (2+1)/(5+1*3) = 0.37500
P(Group=3 | Pain=Yes) = (3+1)/(5+1*3) = 0.50000
P(Group=1 | Pain=No) = (7+1)/(15+1*3) = 0.44444
P(Group=2 | Pain=No) = (7+1)/(15+1*3) = 0.44444
P(Group=3 | Pain=No) = (1+1)/(15+1*3) = 0.11111
P(Pain=Yes) = (5+1)/(20+1*2) = 0.27273
P(Pain=No) = (15+1)/(20+1*2) = 0.72727
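The arithmetic above can be reproduced with a short DATA step. This is a minimal sketch, not code from PROC HPBNET: the data set name addone_check and the variables Count, Total, Levels, Raw, and Smoothed are hypothetical names chosen for illustration, and the counts are typed in by hand from the PROC FREQ table for treatment group A.

```sas
/* Illustrative sketch only; this is not how PROC HPBNET is implemented. */
/* Counts are copied by hand from the PROC FREQ table (treatment A).     */
data addone_check;                  /* hypothetical data set name         */
   alpha = 1;                       /* pseudo count for add-one smoothing */
   input Pain $ Group Count Total Levels;
   Raw      = Count / Total;                             /* unsmoothed estimate */
   Smoothed = (Count + alpha) / (Total + alpha*Levels);  /* additive smoothing  */
   format Raw Smoothed 7.5;
   datalines;
Yes 1 0  5 3
Yes 2 2  5 3
Yes 3 3  5 3
No  1 7 15 3
No  2 7 15 3
No  3 1 15 3
Yes . 5 20 2
No  . 15 20 2
;
run;

/* The last two rows (Group missing) are the target probabilities P(Pain). */
proc print data=addone_check noobs;
   var Pain Group Count Total Levels Raw Smoothed;
run;
```

In this sketch, Raw is 0 for Group=1 with Pain=Yes, whereas Smoothed reproduces the hand-calculated values listed above.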
These results match the probabilities in the network_A table, which the PROC PRINT step in the Full Code example prints.
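Product Family | Product | System | Product Release Reported | Product Release Fixed* | SAS Release Reported | SAS Release Fixed*
SAS System | SAS Enterprise Miner | Microsoft® Windows® for x64 | 14.2 | | 9.4 TS1M4 |
SAS System | SAS Enterprise Miner | Microsoft Windows 8 Enterprise 32-bit | 14.2 | | 9.4 TS1M4 |
SAS System | SAS Enterprise Miner | Microsoft Windows 8 Enterprise x64 | 14.2 | | 9.4 TS1M4 |
SAS System | SAS Enterprise Miner | Microsoft Windows 8 Pro 32-bit | 14.2 | | 9.4 TS1M4 |
SAS System | SAS Enterprise Miner | Microsoft Windows 8 Pro x64 | 14.2 | | 9.4 TS1M4 |
SAS System | SAS Enterprise Miner | Microsoft Windows 8.1 Enterprise 32-bit | 14.2 | | 9.4 TS1M4 |
SAS System | SAS Enterprise Miner | Microsoft Windows 8.1 Enterprise x64 | 14.2 | | 9.4 TS1M4 |
SAS System | SAS Enterprise Miner | Microsoft Windows 8.1 Pro 32-bit | 14.2 | | 9.4 TS1M4 |
SAS System | SAS Enterprise Miner | Microsoft Windows 8.1 Pro x64 | 14.2 | | 9.4 TS1M4 |
SAS System | SAS Enterprise Miner | Microsoft Windows 10 | 14.2 | | 9.4 TS1M4 |
SAS System | SAS Enterprise Miner | Microsoft Windows Server 2008 | 14.2 | | 9.4 TS1M4 |
SAS System | SAS Enterprise Miner | Microsoft Windows Server 2008 R2 | 14.2 | | 9.4 TS1M4 |
SAS System | SAS Enterprise Miner | Microsoft Windows Server 2008 for x64 | 14.2 | | 9.4 TS1M4 |
SAS System | SAS Enterprise Miner | Microsoft Windows Server 2012 Datacenter | 14.2 | | 9.4 TS1M4 |
SAS System | SAS Enterprise Miner | Microsoft Windows Server 2012 R2 Datacenter | 14.2 | | 9.4 TS1M4 |
SAS System | SAS Enterprise Miner | Microsoft Windows Server 2012 R2 Std | 14.2 | | 9.4 TS1M4 |
SAS System | SAS Enterprise Miner | Microsoft Windows Server 2012 Std | 14.2 | | 9.4 TS1M4 |
SAS System | SAS Enterprise Miner | Windows 7 Enterprise 32 bit | 14.2 | | 9.4 TS1M4 |
SAS System | SAS Enterprise Miner | Windows 7 Enterprise x64 | 14.2 | | 9.4 TS1M4 |
SAS System | SAS Enterprise Miner | Windows 7 Home Premium 32 bit | 14.2 | | 9.4 TS1M4 |
SAS System | SAS Enterprise Miner | Windows 7 Home Premium x64 | 14.2 | | 9.4 TS1M4 |
SAS System | SAS Enterprise Miner | Windows 7 Professional 32 bit | 14.2 | | 9.4 TS1M4 |
SAS System | SAS Enterprise Miner | Windows 7 Professional x64 | 14.2 | | 9.4 TS1M4 |
SAS System | SAS Enterprise Miner | Windows 7 Ultimate 32 bit | 14.2 | | 9.4 TS1M4 |
SAS System | SAS Enterprise Miner | Windows 7 Ultimate x64 | 14.2 | | 9.4 TS1M4 |
SAS System | SAS Enterprise Miner | 64-bit Enabled AIX | 14.2 | | 9.4 TS1M4 |
SAS System | SAS Enterprise Miner | 64-bit Enabled Solaris | 14.2 | | 9.4 TS1M4 |
SAS System | SAS Enterprise Miner | HP-UX IPF | 14.2 | | 9.4 TS1M4 |
SAS System | SAS Enterprise Miner | Linux for x64 | 14.2 | | 9.4 TS1M4 |
SAS System | SAS Enterprise Miner | Solaris for x64 | 14.2 | | 9.4 TS1M4 |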
/* Full Code: read the neuralgia example data. */
data Neuralgia;
   input Treatment $ Sex $ Age Duration Pain $ @@;
   datalines;
P F 68 1 No B M 74 16 No P F 67 30 No
P M 66 26 Yes B F 67 28 No B F 77 16 No
A F 71 12 No B F 72 50 No B F 76 9 Yes
A M 71 17 Yes A F 63 27 No A F 69 18 Yes
B F 66 12 No A M 62 42 No P F 64 1 Yes
A F 64 17 No P M 74 4 No A F 72 25 No
P M 70 1 Yes B M 66 19 No B M 59 29 No
A F 64 30 No A M 70 28 No A M 69 1 No
B F 78 1 No P M 83 1 Yes B F 69 42 No
B M 75 30 Yes P M 77 29 Yes P F 79 20 Yes
A M 70 12 No A F 69 12 No B F 65 14 No
B M 70 1 No B M 67 23 No A M 76 25 Yes
P M 78 12 Yes B M 77 1 Yes B F 69 24 No
P M 66 4 Yes P F 65 29 No P M 60 26 Yes
A M 78 15 Yes B M 75 21 Yes A F 67 11 No
P F 72 27 No P F 70 13 Yes A M 75 6 Yes
B F 65 7 No P F 68 27 Yes P M 68 11 Yes
P M 67 17 Yes B M 70 22 No A M 65 15 No
P F 67 1 Yes A M 67 10 No P F 72 11 Yes
A F 74 1 No B M 80 21 Yes A F 69 3 No
;
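/* Keep only the observations for treatment group A. */
data Trt_A;
   set Neuralgia;
   if Treatment = 'A';
run;

/* Fit a Naive Bayes network; NUMBIN=3 bins the interval input Age into 3 levels. */
ods listing close;
proc hpbnet data=Trt_A numbin=3 structure=Naive maxparents=1 prescreening=0 varselect=0;
   target Pain;
   input Age / level=INT;
   output network=network_A parameter=parameter_A pred=predictions_A;
run;
ods listing;

/* Recreate the 3 Age bins by hand. The cutoffs split the observed   */
/* Age range in group A (62 to 78) into 3 intervals of equal width.  */
title "data=Trt_A";
data Trt_A;
   set Trt_A;
   if age < 67.333333 then group = 1;
   else if age < 72.666667 then group = 2;
   else group = 3;
run;

/* Contingency table of Age group by Pain; one cell has a count of zero. */
proc freq data=Trt_A;
   table group*pain / norow nocol nopercent;
run;

/* Print the smoothed probabilities stored in the network table. */
title "data=network_A";
proc print data=network_A(where=( _TYPE_ eq 'PROBABILITY')) noobs;
   var _CHILDCOND_ _PARENTCOND_ _CHILDCONDID_ _VALUE_;
run;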
Type: | Usage Note |
Priority: |
Date Modified: | 2017-08-17 09:38:54 |
Date Created: | 2017-08-14 14:52:54 |