In SAS® Enterprise Miner™, the model-fit statistics for HP modeling nodes might be computed using both sampled data and distributed data. This dual computation occurs when all of the following conditions are true:
- Your flow uses an HP Data Partition node.
- You are using a distributed environment (grid mode).
- The data is distributed.
The HP Data Partition node partitions the sample and the distributed data independently of each other. As a result, the training and validation portions of the sample might not be representative of the corresponding partitions of the distributed data. The model-fit statistics might seem to be incorrect when you compare results from non-HP modeling nodes to results from HP modeling nodes. However, the results differ because they are computed on different partitions.
Click the Hot Fix tab in this note to access the hot fix for this issue.
After you apply the hot fix, new functionality is available. You can specify the following statement in the Enterprise Miner "project start code", or in an autoexec.sas file:
%let hpdm_partition_resample=Y;
After specifying that statement, when you add and run an HP Data Partition node, the node samples from the distributed-data partition (instead of partitioning the sample as described below). Note: because HP sampling is not deterministic, there is no guarantee that the sample is the same size as a non-HP sample.
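The effect of sampling from the distributed-data partition can be pictured with a small simulation. This is only an illustrative Python sketch, not Enterprise Miner code: the 70/30 split, the 10% sampling rate, and the names `full_roles` and `sample_within` are all hypothetical. It fixes a seed so that it can be rerun, whereas HP sampling itself is not deterministic.

```python
import random

rng = random.Random(0)
all_ids = list(range(100_000))  # stand-in for rows of the distributed table

# Step 1: partition the full table (hypothetical 70/30 split).
full_roles = {i: ("train" if rng.random() < 0.7 else "validate") for i in all_ids}

def sample_within(role, fraction):
    """Draw a sample only from rows that already carry the given role."""
    members = [i for i in all_ids if full_roles[i] == role]
    return rng.sample(members, round(len(members) * fraction))

# Step 2: sample inside each partition instead of partitioning a sample.
train_sample = sample_within("train", 0.1)
validate_sample = sample_within("validate", 0.1)

# Every sampled row keeps the role it has in the full table.
mismatches = sum(1 for i in train_sample if full_roles[i] != "train")
mismatches += sum(1 for i in validate_sample if full_roles[i] != "validate")
```

Because each sample is drawn inside a partition, `mismatches` is zero by construction: no sampled training row is a validation row in the full table.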
Background
When the data is in a distributed table, a sample is created when the data source is created. If the target is a class variable, then the sample is stratified. This sample is used by all the non-HP nodes. If your flow uses a Data Partition node or an HP Data Partition node, then that sample is partitioned. The non-HP modeling nodes use that partitioned sample for training and assessment. The non-HP nodes use the sample so that Enterprise Miner does not need to download large volumes of data to the SAS client.
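As a rough picture of what a stratified sample means here, the following Python sketch draws the same fraction of rows from each target class, so that the class proportions in the sample match those in the table. It is an assumption-laden illustration only: the function `stratified_sample`, the 10% fraction, and the toy data are hypothetical and do not describe how Enterprise Miner builds its sample internally.

```python
import random
from collections import defaultdict

def stratified_sample(rows, target_key, fraction, seed=0):
    """Draw about `fraction` of the rows from each target class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target_key]].append(row)
    sample = []
    for members in by_class.values():
        k = max(1, round(len(members) * fraction))  # keep at least one row per class
        sample.extend(rng.sample(members, k))
    return sample

# Toy table: 1,000 rows with a 10% event rate.
rows = [{"id": i, "target": 1 if i % 10 == 0 else 0} for i in range(1000)]
sample = stratified_sample(rows, "target", 0.1)
event_rate = sum(r["target"] for r in sample) / len(sample)
```

With this toy data the sample contains 100 rows, and `event_rate` equals the 10% event rate of the full table.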
The HP nodes use the entire distributed table, not the sample. As a result, HP modeling nodes train on the entire (partitioned) distributed data, and the assessment results are produced on that data.
Enterprise Miner enables you to compare non-HP modeling nodes to HP modeling nodes. To enable that comparison, Enterprise Miner scores the sample and computes a separate set of assessments that are based on the sample. However, the partitioned sample might not closely represent the partitioned distributed table. Some training observations in the partitioned sample might be in the validation portion of the distributed table, and vice versa. The training set is not necessarily a sample of the training observations in the distributed table (and the validation set is not necessarily a sample of the validation observations).
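The overlap between the two partitions can be estimated with a short simulation. The Python sketch below is illustrative only and assumes a 70/30 split, a 10% sample, and simple independent random assignment. The `partition` helper and the seeds are hypothetical, and the actual partitioning logic in Enterprise Miner may differ.

```python
import random

def partition(ids, train_fraction, seed):
    """Assign each id to 'train' or 'validate' at random."""
    rng = random.Random(seed)
    return {i: ("train" if rng.random() < train_fraction else "validate") for i in ids}

all_ids = range(100_000)                              # stand-in for the distributed table
sample_ids = random.Random(1).sample(all_ids, 10_000)  # stand-in for the data source sample

# The table and the sample are partitioned independently of each other.
full_roles = partition(all_ids, 0.7, seed=2)
sample_roles = partition(sample_ids, 0.7, seed=3)

# Training rows of the sample that are validation rows in the full table.
sample_train = [i for i in sample_ids if sample_roles[i] == "train"]
crossed = sum(1 for i in sample_train if full_roles[i] == "validate")
crossed_rate = crossed / len(sample_train)
```

Under these assumptions, roughly 30% of the sample's training rows fall in the validation portion of the full table, which is why sample-based and table-based assessments of the same model can differ.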
The difference is usually not noticeable when you are comparing different models. However, it might be noticeable when you are comparing (for the same model) the assessment results based on the sample to the assessment results based on the entire table.
Operating System and Release Information
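Product Family: SAS System
Product: SAS Enterprise Miner
System | Product Release (Reported) | Product Release (Fixed*) | SAS Release (Reported) | SAS Release (Fixed*)
Microsoft® Windows® for x64 | 12.3 | | 9.4 TS1M0 | |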
Microsoft Windows 8 Enterprise x64 | 12.3 | | 9.4 TS1M0 | |
Microsoft Windows 8 Pro x64 | 12.3 | | 9.4 TS1M0 | |
Microsoft Windows 8.1 Enterprise 32-bit | 12.3 | | 9.4 TS1M0 | |
Microsoft Windows 8.1 Enterprise x64 | 12.3 | | 9.4 TS1M0 | |
Microsoft Windows 8.1 Pro 32-bit | 12.3 | | 9.4 TS1M0 | |
Microsoft Windows 8.1 Pro x64 | 12.3 | | 9.4 TS1M0 | |
Microsoft Windows 10 | 12.3 | | 9.4 TS1M0 | |
Microsoft Windows Server 2008 R2 | 12.3 | | 9.4 TS1M0 | |
Microsoft Windows Server 2008 for x64 | 12.3 | | 9.4 TS1M0 | |
Microsoft Windows Server 2012 Datacenter | 12.3 | | 9.4 TS1M0 | |
Microsoft Windows Server 2012 R2 Datacenter | 12.3 | | 9.4 TS1M0 | |
Microsoft Windows Server 2012 R2 Std | 12.3 | | 9.4 TS1M0 | |
Microsoft Windows Server 2012 Std | 12.3 | | 9.4 TS1M0 | |
Windows 7 Enterprise x64 | 12.3 | | 9.4 TS1M0 | |
Windows 7 Professional x64 | 12.3 | | 9.4 TS1M0 | |
64-bit Enabled AIX | 12.3 | | 9.4 TS1M0 | |
64-bit Enabled Solaris | 12.3 | | 9.4 TS1M0 | |
HP-UX IPF | 12.3 | | 9.4 TS1M0 | |
Linux for x64 | 12.3 | | 9.4 TS1M0 | |
Solaris for x64 | 12.3 | | 9.4 TS1M0 | |
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.