
Usage Note 60922: PROC HPBNET computes conditional probability using the Additive Smoothing technique to smooth categorical data


In SAS® Enterprise Miner™, the HPBNET procedure uses the Additive Smoothing technique to smooth categorical data when it computes conditional probabilities. The formula is shown below; for details, see the following Wikipedia page: https://en.wikipedia.org/wiki/Additive_smoothing.

Given an observation x = (x1, …, xd) from a multinomial distribution with N trials and parameter vector θ = (θ1, …, θd), a "smoothed" version of the data gives this estimator:

          θ̂i = (xi + α) / (N + αd),   i = 1, …, d

The pseudo count α (> 0) is the smoothing parameter.

 

PROC HPBNET performs Add-one Smoothing (α=1) for the calculation of the conditional probability.  This calculation addresses the zero-frequency problem. Without an adjustment, the conditional probability is zero when there is a zero entry in the contingency table.
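
The effect of the adjustment can be illustrated outside SAS. The following is a minimal Python sketch, not SAS code and not the HPBNET implementation; it assumes the standard additive smoothing estimator (x_i + α) / (N + α·d), and the function names and the counts are hypothetical, chosen only to show one empty category.

```python
def raw_proportions(counts):
    """Unsmoothed estimates x_i / N."""
    n = sum(counts)
    return [x / n for x in counts]


def additive_smoothing(counts, alpha=1):
    """Smoothed estimates (x_i + alpha) / (N + alpha * d).

    counts : observed frequency of each of the d categories
    alpha  : pseudo count (> 0); alpha = 1 is Add-one Smoothing
    """
    n = sum(counts)   # N, the total number of trials
    d = len(counts)   # d, the number of categories
    return [(x + alpha) / (n + alpha * d) for x in counts]


# Hypothetical counts with one zero-frequency category.
counts = [0, 4, 6]

print(raw_proportions(counts))     # the first estimate is exactly zero
print(additive_smoothing(counts))  # every estimate is strictly positive
```

The smoothed estimates still sum to 1, because α is added d times to the numerators and α·d is added once to the denominator.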

Click the Full Code tab to see an example that generates output demonstrating how Add-one Smoothing is performed in PROC HPBNET. In the example, PROC HPBNET creates the conditional probability table network_A from a Naive Bayes network. PROC FREQ is also used to create a contingency table of the target variable Pain by the input variable Age for treatment group A.

 

 

As shown in the table created by the FREQ procedure, one cell has a frequency count of zero. This is the zero-frequency problem: without an adjustment, the estimated conditional probability P(Group=1 | Pain=Yes) is 0. Add-one Smoothing adjusts the estimate so that it is not zero. With the pseudo count smoothing parameter set to α=1, the probabilities are calculated as follows, where the 3 in the denominators is the number of Group levels and the 2 is the number of Pain levels:

          P(Group=1 | Pain=Yes) = (0+1) / (5+1*3) = 0.12500

          P(Group=2 | Pain=Yes) = (2+1) / (5+1*3) = 0.37500

          P(Group=3 | Pain=Yes) = (3+1) / (5+1*3) = 0.50000

          P(Group=1 | Pain=No) = (7+1) / (15+1*3) = 0.44444

          P(Group=2 | Pain=No) = (7+1) / (15+1*3) = 0.44444

          P(Group=3 | Pain=No) = (1+1) / (15+1*3) = 0.11111

          P(Pain=Yes) = (5+1) / (20+1*2) = 0.27273

          P(Pain=No) = (15+1) / (20+1*2) = 0.72727
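
The arithmetic above can be checked with a short script. This is a minimal Python sketch, not the code from the Full Code tab; the cell counts are read off the numerators in the calculations above, and the names `add_one_smoothing`, `group_counts`, and `pain_counts` are hypothetical.

```python
def add_one_smoothing(counts, alpha=1):
    """Return {level: (count + alpha) / (total + alpha * number_of_levels)}."""
    total = sum(counts.values())
    levels = len(counts)
    return {level: (count + alpha) / (total + alpha * levels)
            for level, count in counts.items()}


# Cell counts of Group within each level of Pain, taken from the
# numerators of the worked calculations (assumed to match the PROC FREQ table).
group_counts = {
    "Yes": {1: 0, 2: 2, 3: 3},
    "No": {1: 7, 2: 7, 3: 1},
}

# Marginal counts of Pain: 5 for Yes and 15 for No.
pain_counts = {pain: sum(groups.values()) for pain, groups in group_counts.items()}

conditional = {pain: add_one_smoothing(groups) for pain, groups in group_counts.items()}
marginal = add_one_smoothing(pain_counts)

for pain, probs in conditional.items():
    for group, p in probs.items():
        print(f"P(Group={group} | Pain={pain}) = {p:.5f}")
for pain, p in marginal.items():
    print(f"P(Pain={pain}) = {p:.5f}")
```

Rounded to five decimal places, the printed values agree with the eight probabilities listed above.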

 

These results match those in the network_A table, as shown below:

 

 

 



Operating System and Release Information

Product Family: SAS System
Product: SAS Enterprise Miner

System | Product Release (Reported) | Product Release (Fixed*) | SAS Release (Reported) | SAS Release (Fixed*)
Microsoft® Windows® for x64 | 14.2 | | 9.4 TS1M4 |
Microsoft Windows 8 Enterprise 32-bit | 14.2 | | 9.4 TS1M4 |
Microsoft Windows 8 Enterprise x64 | 14.2 | | 9.4 TS1M4 |
Microsoft Windows 8 Pro 32-bit | 14.2 | | 9.4 TS1M4 |
Microsoft Windows 8 Pro x64 | 14.2 | | 9.4 TS1M4 |
Microsoft Windows 8.1 Enterprise 32-bit | 14.2 | | 9.4 TS1M4 |
Microsoft Windows 8.1 Enterprise x64 | 14.2 | | 9.4 TS1M4 |
Microsoft Windows 8.1 Pro 32-bit | 14.2 | | 9.4 TS1M4 |
Microsoft Windows 8.1 Pro x64 | 14.2 | | 9.4 TS1M4 |
Microsoft Windows 10 | 14.2 | | 9.4 TS1M4 |
Microsoft Windows Server 2008 | 14.2 | | 9.4 TS1M4 |
Microsoft Windows Server 2008 R2 | 14.2 | | 9.4 TS1M4 |
Microsoft Windows Server 2008 for x64 | 14.2 | | 9.4 TS1M4 |
Microsoft Windows Server 2012 Datacenter | 14.2 | | 9.4 TS1M4 |
Microsoft Windows Server 2012 R2 Datacenter | 14.2 | | 9.4 TS1M4 |
Microsoft Windows Server 2012 R2 Std | 14.2 | | 9.4 TS1M4 |
Microsoft Windows Server 2012 Std | 14.2 | | 9.4 TS1M4 |
Windows 7 Enterprise 32 bit | 14.2 | | 9.4 TS1M4 |
Windows 7 Enterprise x64 | 14.2 | | 9.4 TS1M4 |
Windows 7 Home Premium 32 bit | 14.2 | | 9.4 TS1M4 |
Windows 7 Home Premium x64 | 14.2 | | 9.4 TS1M4 |
Windows 7 Professional 32 bit | 14.2 | | 9.4 TS1M4 |
Windows 7 Professional x64 | 14.2 | | 9.4 TS1M4 |
Windows 7 Ultimate 32 bit | 14.2 | | 9.4 TS1M4 |
Windows 7 Ultimate x64 | 14.2 | | 9.4 TS1M4 |
64-bit Enabled AIX | 14.2 | | 9.4 TS1M4 |
64-bit Enabled Solaris | 14.2 | | 9.4 TS1M4 |
HP-UX IPF | 14.2 | | 9.4 TS1M4 |
Linux for x64 | 14.2 | | 9.4 TS1M4 |
Solaris for x64 | 14.2 | | 9.4 TS1M4 |
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.