67609 - Posterior probabilities are not adjusted for event-based sampling when you replace the data source in a project

Problem Note 67609: Posterior probabilities are not adjusted for event-based sampling when you replace the data source in a project

When you create a SAS® Visual Data Mining and Machine Learning project where event-based sampling is enabled, the score code of a modeling node adjusts the posterior probabilities for the prior probabilities (priors). However, when you replace the data source for the project, either in the Model Studio interface or by retraining the project with the downloaded project API code, the score code no longer applies the correct priors. Consequently, the predicted probabilities that the model generates are incorrect.

There are no errors or warnings to indicate that the priors are not adjusted in the score code.

To determine whether the priors are used, examine the Node Score Code in the node results. If priors are used, the score code adjusts the posterior probabilities as in the example below. Note that 0.8005033557 is the prior for target level 0, and 0.1994966443 is the prior for target level 1.

*------------------------------------------------------------*;
* Adjust Posterior Probabilities;
*------------------------------------------------------------*;
'P_BAD0'n = 'P_BAD0'n * 0.8005033557/0.5;
'P_BAD1'n = 'P_BAD1'n * 0.1994966443/0.5;
drop _sum;
_sum = 'P_BAD0'n + 'P_BAD1'n;
'P_BAD0'n = 'P_BAD0'n/_sum;
'P_BAD1'n = 'P_BAD1'n/_sum;
drop _P_;
_P_ = 0.0;
if 'P_BAD1'n > _P_ then do;
   _P_ = 'P_BAD1'n;
   'I_BAD'n = '1';
end;
if 'P_BAD0'n > _P_ then do;
   _P_ = 'P_BAD0'n;
   'I_BAD'n = '0';
end;

However, after the project's data source is replaced, the score code uses 0.5 in place of the original priors. Each posterior is multiplied by 0.5/0.5, so the adjustment has no effect, as shown below:

*------------------------------------------------------------*;
* Adjust Posterior Probabilities;
*------------------------------------------------------------*;
'P_BAD0'n = 'P_BAD0'n * 0.5/0.5;
'P_BAD1'n = 'P_BAD1'n * 0.5/0.5;
drop _sum;
_sum = 'P_BAD0'n + 'P_BAD1'n;
'P_BAD0'n = 'P_BAD0'n/_sum;
'P_BAD1'n = 'P_BAD1'n/_sum;
drop _P_;
_P_ = 0.0;
if 'P_BAD1'n > _P_ then do;
   _P_ = 'P_BAD1'n;
   'I_BAD'n = '1';
end;
if 'P_BAD0'n > _P_ then do;
   _P_ = 'P_BAD0'n;
   'I_BAD'n = '0';
end;
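
To gauge how much the missing adjustment can matter, the following DATA step is a minimal sketch that applies both versions of the adjustment to one observation. The input posteriors (0.40 and 0.60) and the variable names p0_raw, p1_raw, p0, p1, q0, and q1 are made up for illustration and do not come from any real model or from the generated score code. Only the two priors and the 0.5 sampling proportions are taken from the score code shown in this note.

```sas
/* Illustration only: the input posteriors are hypothetical values. */
data _null_;
   /* Assumed posteriors from a model trained on the 50/50 sample */
   p0_raw = 0.40;
   p1_raw = 0.60;

   /* Adjustment with the original priors (expected score code) */
   p0 = p0_raw * 0.8005033557 / 0.5;
   p1 = p1_raw * 0.1994966443 / 0.5;
   _sum = p0 + p1;
   p0 = p0 / _sum;
   p1 = p1 / _sum;
   put "Priors applied:     " p0= 6.4 p1= 6.4;

   /* Adjustment after the data source is replaced: 0.5/0.5 = 1 */
   q0 = p0_raw * 0.5 / 0.5;
   q1 = p1_raw * 0.5 / 0.5;
   _sum = q0 + q1;
   q0 = q0 / _sum;
   q1 = q1 / _sum;
   put "Priors not applied: " q0= 6.4 q1= 6.4;
run;
```

Under these assumed inputs, the correctly adjusted posterior for level 0 is roughly 0.73, so the predicted level is 0. Without the priors, the posteriors stay at 0.40 and 0.60 and the predicted level is 1. The size of the difference depends on the actual posteriors and priors in your project.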

Note: This issue also affects the results when you retrain your project in SAS® Model Manager.

There is no workaround for this problem. The only way to obtain correct results is to create a new project. To speed up the process (so you do not have to rebuild your pipeline from the start), follow these steps:

1. Save the current pipeline to The Exchange as a template.
2. Create a new pipeline by selecting the saved template.

Operating System and Release Information

Product Family	Product	System	Product Release (Reported)	Product Release (Fixed*)	SAS Release (Reported)	SAS Release (Fixed*)
SAS System	SAS Visual Data Mining and Machine Learning	Microsoft® Windows® for x64	8.5	2020.1.5	Viya	Viya
SAS System	SAS Visual Data Mining and Machine Learning	Linux for x64	8.5	2020.1.5	Viya	Viya

* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.

Type:	Problem Note
Priority:	medium

Date Modified:	2021-03-18 07:30:59
Date Created:	2021-03-16 19:33:58
