SAS® Visual Data Mining and Machine Learning enables you to make changes to data by using a Manage Variables node or by invoking the %dmcas_metaChange macro. Using DATA step code to make such changes is not recommended, because it can cause incorrect results in an analytics store (astore). This note documents one example of what can go wrong when you change data with a DATA step.
Example:
You build a pipeline that contains a SAS Code node followed by a modeling node that generates an analytics store (astore), such as Gradient Boosting, Forest, or SVM:
Data → SAS Code → Gradient Boosting
In the SAS Code node, you create a new variable, CLAGE_BANDED, from CLAGE and then drop the original variable with a DROP statement. In other words, you are changing the data:
filename _score "&dm_file_scorecode";
/* Write DATA step score code to the score code file of the SAS Code node */
data _null_;
file _score;
/* Create a banded version of CLAGE */
put "length clage_banded $8;";
put "if clage <50 then clage_banded = '0-49';";
put "else if 50<= clage <=100 then clage_banded = '50-100';";
put "else if 101<= clage <=200 then clage_banded = '101-200';";
put "else if 201<= clage <=300 then clage_banded = '201-300';";
put "else if 301<= clage <=400 then clage_banded = '301-400';";
put "else if clage>400 then clage_banded = '400+';";
put "else clage_banded = '9999';";
/* Drop the original variable: this is the statement that causes the problem */
put "DROP clage;";
run;
When you run the pipeline, the NODEOUTPUT table in the results of the modeling node might contain incorrect values. In this example, CLAGE_BANDED contains only one value, '0-49', although it should contain several. As a result, the predicted-output columns also contain incorrect values. No errors or warnings indicate a problem.
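To see what the banding logic should produce, you can run the same IF-THEN/ELSE rules in an ordinary DATA step outside the pipeline. The sketch below uses a handful of made-up CLAGE values (hypothetical, not taken from any real table) and only Base SAS DATA step and PROC PRINT code. It also illustrates one possible explanation for the symptom, offered as an assumption rather than a confirmed diagnosis: SAS treats a missing numeric value as smaller than any number, so if CLAGE were missing when the score code runs, every row would satisfy `clage <50` and receive '0-49'.

```sas
/* Hypothetical sample values for CLAGE; illustrative only, */
/* not taken from any real data set.                         */
data clage_sample;
  input clage;
  datalines;
12
75
100.5
150
250
350
450
.
;
run;

/* Apply the same banding rules that the SAS Code node writes to its score code */
data clage_check;
  set clage_sample;
  length clage_banded $8;
  if clage <50 then clage_banded = '0-49';  /* also true when CLAGE is missing */
  else if 50<= clage <=100 then clage_banded = '50-100';
  else if 101<= clage <=200 then clage_banded = '101-200';
  else if 201<= clage <=300 then clage_banded = '201-300';
  else if 301<= clage <=400 then clage_banded = '301-400';
  else if clage>400 then clage_banded = '400+';
  else clage_banded = '9999';
run;

/* List each CLAGE value next to the band that it was assigned */
proc print data=clage_check noobs;
run;
```

With nonmissing input, the rows spread across several bands, and only the missing value lands in '0-49'. Because CLAGE is not restricted to integers, a value such as 100.5 falls between the 50-100 and 101-200 ranges and is assigned '9999'.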
To avoid the problem, use either of the following options:
- Instead of using a DROP statement in the DATA step, call the %dmcas_metaChange macro in the SAS Code node. Revise the code as follows:
filename _score "&dm_file_scorecode";
/* Write DATA step score code to the score code file of the SAS Code node */
data _null_;
file _score;
put "length clage_banded $8;";
put "if clage <50 then clage_banded = '0-49';";
put "else if 50<= clage <=100 then clage_banded = '50-100';";
put "else if 101<= clage <=200 then clage_banded = '101-200';";
put "else if 201<= clage <=300 then clage_banded = '201-300';";
put "else if 301<= clage <=400 then clage_banded = '301-400';";
put "else if clage>400 then clage_banded = '400+';";
put "else clage_banded = '9999';";
/* put "DROP clage;"; */
run;
/* Register the new variable as a nominal input and reject the original */
/* variable through metadata instead of dropping it from the data       */
%dmcas_metaChange(NAME=clage_banded, ROLE=INPUT, LEVEL=NOMINAL);
%dmcas_metaChange(NAME=clage, ROLE=REJECTED);
- Instead of using a DROP statement in the DATA step, add a Manage Variables node directly after the SAS Code node, and assign the REJECTED role to the variable that you want to omit from subsequent analysis.
Operating System and Release Information
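Product Family | Product | System | Product Release Reported | Product Release Fixed* | SAS Release Reported | SAS Release Fixed*
SAS System | SAS Visual Data Mining and Machine Learning | Microsoft® Windows® for x64 | 8.3 | | Viya |
SAS System | SAS Visual Data Mining and Machine Learning | Linux for x64 | 8.3 | | Viya |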
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.