Process Flow Diagram Logic
Use the Input Data Source node to access SAS data sets and other types of data. This node reads a data source and defines its variable attributes for processing by Enterprise Miner. Meta information (the metadata sample) is automatically created for each variable when you import a data set with the Input Data Source node, and initial values are set for each variable's measurement level and model role. You can change these values if you are not satisfied with the automatic selections that the node makes. Summary statistics are displayed for interval and class variables.
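The same kind of information that the node surfaces can be inspected in code. The sketch below is only an illustration, assuming a hypothetical data set WORK.CUSTOMERS with interval inputs INCOME and AGE and class variables REGION and GOOD_BAD; it is not the node's internal implementation.

   /* Variable attributes, as the Input Data Source node would report them */
   proc contents data=work.customers;
   run;

   /* Summary statistics for interval (numeric) variables */
   proc means data=work.customers n nmiss min mean max std;
      var income age;
   run;

   /* Frequency counts for class variables, including missing levels */
   proc freq data=work.customers;
      tables region good_bad / missing;
   run;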
Use the Sampling node to take simple random samples, stratified random samples, and cluster samples of data sets. Sampling is recommended for extremely large databases because it can significantly decrease model training time. If the random sample sufficiently represents the source data set, then data relationships that Enterprise Miner finds in the sample can be extrapolated to the complete source data set. The Sampling node writes the sampled observations to an output data set and saves the seed values that are used to generate the random numbers for the samples so that you can replicate the samples.
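For readers who prefer code, the following is a minimal sketch of simple random and stratified random sampling with a fixed seed, comparable in spirit to the Sampling node's output. The data set WORK.CUSTOMERS, the stratification variable GOOD_BAD, and the 10% rate are assumptions, not part of the node's definition.

   /* Simple random sample; the fixed seed lets the sample be replicated */
   proc surveyselect data=work.customers
                     out=work.sample_srs
                     method=srs
                     samprate=0.10
                     seed=12345;
   run;

   /* Stratified random sample; input must be sorted by the strata variable */
   proc sort data=work.customers out=work.customers_sorted;
      by good_bad;
   run;

   proc surveyselect data=work.customers_sorted
                     out=work.sample_str
                     method=srs
                     samprate=0.10
                     seed=12345;
      strata good_bad;
   run;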
Use the Data Partition node to partition data sets into training, validation, and test data sets. The training data set is used for preliminary model fitting. The validation data set is used to monitor and tune the model weights during estimation and is also used for model assessment. The test data set is an additional hold-out data set that you can use for model assessment. This node uses simple random sampling, stratified random sampling, or user-defined partitions to create the partitioned data sets.
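A simple random partition can also be sketched directly in a DATA step. The 60/20/20 proportions, the seed, and the data set names below are assumptions chosen for illustration only.

   /* Sketch of a 60/20/20 train/validation/test split by simple random sampling */
   data work.train work.validate work.test;
      set work.customers;
      if _n_ = 1 then call streaminit(12345);   /* seed for a reproducible split */
      u = rand('uniform');
      if u < 0.60 then output work.train;
      else if u < 0.80 then output work.validate;
      else output work.test;
      drop u;
   run;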