
Problem Note 67609: Posterior probabilities are not adjusted for event-based sampling when you replace the data source in a project


When you create a SAS® Visual Data Mining and Machine Learning project with event-based sampling enabled, the score code of a modeling node adjusts the posterior probabilities for the prior probabilities (priors). However, when you replace the data source for the project, either by using the Model Studio interface or by retraining the project with the downloaded project API code, the adjustment in the score code is generated incorrectly. Consequently, the predicted probabilities that the model generates are incorrect.

There are no errors or warnings to indicate that the priors are not adjusted in the score code.

To determine whether the priors are used, examine the code in the Node Score Code of the node results. If priors are used, the score code should adjust the posterior probabilities as in the example below. Note that 0.8005033557 represents the prior when the target level is 0 and 0.1994966443 represents the prior when the target level is 1.

*------------------------------------------------------------*;
* Adjust Posterior Probabilities;
*------------------------------------------------------------*;
'P_BAD0'n = 'P_BAD0'n * 0.8005033557/0.5;
'P_BAD1'n = 'P_BAD1'n * 0.1994966443/0.5;
drop _sum;
_sum = 'P_BAD0'n + 'P_BAD1'n;
'P_BAD0'n = 'P_BAD0'n/_sum;
'P_BAD1'n = 'P_BAD1'n/_sum;
drop _P_;
_P_= 0.0 ;
if 'P_BAD1'n > _P_ then do;
_P_ = 'P_BAD1'n;
'I_BAD'n = '1';
end;
if 'P_BAD0'n > _P_ then do;
_P_ = 'P_BAD0'n;
'I_BAD'n = '0';
end;  

However, when the project's data source is replaced, the score code uses 0.5 in place of both priors, so the posterior probabilities are effectively left unadjusted, as shown below:

*------------------------------------------------------------*;
* Adjust Posterior Probabilities;
*------------------------------------------------------------*;
'P_BAD0'n = 'P_BAD0'n * 0.5/0.5;
'P_BAD1'n = 'P_BAD1'n * 0.5/0.5;
drop _sum;
_sum = 'P_BAD0'n + 'P_BAD1'n;
'P_BAD0'n = 'P_BAD0'n/_sum;
'P_BAD1'n = 'P_BAD1'n/_sum;
drop _P_;
_P_= 0.0 ;
if 'P_BAD1'n > _P_ then do;
_P_ = 'P_BAD1'n;
'I_BAD'n = '1';
end;
if 'P_BAD0'n > _P_ then do;
_P_ = 'P_BAD0'n;
'I_BAD'n = '0';
end;
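
To make the difference concrete, the following DATA step is a minimal sketch that applies both versions of the adjustment to one pair of unadjusted posteriors. The input values 0.4 and 0.6 and all variable names are hypothetical and chosen only for illustration. The 0.5 divisor is taken from the score code above and presumably reflects the event proportion in the sampled data. The classification step is simplified and ignores ties.

```sas
data _null_;
   /* Hypothetical unadjusted posteriors (assumed values, not from the note) */
   raw_bad0 = 0.4;
   raw_bad1 = 0.6;

   /* Expected adjustment: scale by prior / 0.5, then renormalize */
   ok_bad0 = raw_bad0 * 0.8005033557 / 0.5;
   ok_bad1 = raw_bad1 * 0.1994966443 / 0.5;
   ok_sum  = ok_bad0 + ok_bad1;
   ok_bad0 = ok_bad0 / ok_sum;
   ok_bad1 = ok_bad1 / ok_sum;

   /* Faulty adjustment: 0.5/0.5 is a factor of 1, so nothing changes */
   bug_bad0 = raw_bad0 * 0.5 / 0.5;
   bug_bad1 = raw_bad1 * 0.5 / 0.5;
   bug_sum  = bug_bad0 + bug_bad1;
   bug_bad0 = bug_bad0 / bug_sum;
   bug_bad1 = bug_bad1 / bug_sum;

   /* Simplified classification: pick the level with the larger probability */
   length ok_i_bad bug_i_bad $1;
   if ok_bad1  > ok_bad0  then ok_i_bad  = '1'; else ok_i_bad  = '0';
   if bug_bad1 > bug_bad0 then bug_i_bad = '1'; else bug_i_bad = '0';

   /* Write both results to the log for comparison */
   put ok_bad0= 8.4 ok_bad1= 8.4 ok_i_bad=;
   put bug_bad0= 8.4 bug_bad1= 8.4 bug_i_bad=;
run;
```

Under these assumed inputs, the expected adjustment raises the probability of level 0 to roughly 0.73 and classifies the row as '0'. The faulty adjustment leaves the posteriors at 0.4 and 0.6 and classifies the row as '1'.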

Note: This issue also affects the results when you retrain your project in SAS® Model Manager.

There is no workaround within the affected project. The only way to obtain correct results is to create a new project. To avoid rebuilding your pipeline from the start, follow these steps:

  1. Save the current pipeline to The Exchange as a template.
  2. Create a new pipeline by selecting the saved template.

 



Operating System and Release Information

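| Product Family | Product | System | Product Release (Reported) | Product Release (Fixed*) | SAS Release (Reported) | SAS Release (Fixed*) |
|---|---|---|---|---|---|---|
| SAS System | SAS Visual Data Mining and Machine Learning | Microsoft® Windows® for x64 | 8.5 | 2020.1.5 | Viya | Viya |
| | | Linux for x64 | 8.5 | 2020.1.5 | Viya | Viya |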
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.