Process Flow Diagram Logic
Use the Data Set Attributes node to modify data set attributes, such as data set names, descriptions, and roles. You can also use the Data Set Attributes node to modify the metadata sample that is associated with a data set, and to specify target profiles for a target. An example of a useful Data Set Attributes application is to generate a data set in the SAS Code node and then modify its metadata sample with this node.
Use the Transform Variables node to transform variables. For example, you can take the square root or the natural logarithm of a variable. Additionally, the Transform Variables node supports user-defined formulas for transformations and provides a visual interface for grouping interval-valued variables into buckets or quantiles. Transforming variables to a similar scale and variability may improve the fit of models and, in turn, the classification and prediction precision of the fitted models.
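The transformations named above can be sketched outside the node in a few lines of Python. This is only an illustration of the arithmetic, not the node's implementation: the function names and the sample values are hypothetical, and the quantile cut points come from Python's statistics.quantiles, which may not match the cut points that the node computes.

```python
import math
import statistics

def log_transform(values):
    """Natural logarithm of each value; assumes every value is strictly positive."""
    return [math.log(v) for v in values]

def sqrt_transform(values):
    """Square root of each value; assumes every value is non-negative."""
    return [math.sqrt(v) for v in values]

def quantile_bins(values, n_bins=4):
    """Return a bin index (0 .. n_bins - 1) for each value.

    The cut points are the n_bins - 1 quantiles of the data itself,
    so each bin receives roughly the same number of observations.
    """
    cuts = statistics.quantiles(values, n=n_bins)
    # A value's bin index is the number of cut points that lie below it.
    return [sum(v > c for c in cuts) for v in values]

# Hypothetical, strongly right-skewed input variable.
incomes = [12_000, 18_000, 25_000, 40_000, 75_000, 250_000]
log_incomes = log_transform(incomes)   # compresses the long right tail
income_bins = quantile_bins(incomes)   # quartile membership per observation
```

The log transform is the usual choice for a right-skewed variable such as the made-up incomes here, because it pulls extreme values closer to the bulk of the data.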
Use the Filter Outliers node to identify and remove outliers or "noise" from data sets. Checking for outliers is recommended because outliers can greatly affect modeling results and, in turn, the classification and prediction precision of fitted models.
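As a rough illustration of what removing outliers can mean in practice, the sketch below drops values that lie far from the median, measured in units of the median absolute deviation (MAD). The rule, the threshold k, and the function name are assumptions chosen for the example; the text above does not say which filtering methods the node offers.

```python
import statistics

def filter_outliers(values, k=3.0):
    """Split values into (kept, removed) using a median/MAD rule.

    A value is treated as an outlier when its distance from the median
    exceeds k times the median absolute deviation. Both the rule and the
    default k are illustrative assumptions.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # No spread to measure against, so nothing is flagged.
        return list(values), []
    kept = [v for v in values if abs(v - med) <= k * mad]
    removed = [v for v in values if abs(v - med) > k * mad]
    return kept, removed

# Hypothetical measurements with one obvious recording error.
kept, removed = filter_outliers([10, 12, 11, 13, 12, 11, 95])
```

A median-based rule is used here instead of a mean and standard deviation rule because a single extreme value inflates the standard deviation and can hide itself.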
Use the Replacement node to impute (fill in) values for observations that have missing values. You can replace missing values for interval variables with the mean, median, midrange, or mid-minimum spacing, or with a distribution-based replacement. Alternatively, you can use a replacement M-estimator such as Tukey's biweight, Huber's, or Andrew's Wave. You can also estimate the replacement values for each interval input by using a tree-based imputation method. Missing values for class variables can be replaced with the most frequently occurring value, distribution-based replacement, tree-based imputation, or a constant.
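The simplest of these replacement rules are easy to state in code. The following sketch covers mean, median, and midrange replacement for an interval variable and most-frequent-value replacement for a class variable. It assumes that missing values are represented as None, and the function names are hypothetical; the M-estimator, distribution-based, and tree-based methods are not shown.

```python
import statistics

def impute_interval(values, method="mean"):
    """Replace None in an interval variable with a statistic of the non-missing values."""
    present = [v for v in values if v is not None]
    if method == "mean":
        fill = statistics.mean(present)
    elif method == "median":
        fill = statistics.median(present)
    elif method == "midrange":
        # Midrange: halfway between the smallest and largest observed values.
        fill = (min(present) + max(present)) / 2
    else:
        raise ValueError(f"unsupported method: {method}")
    return [fill if v is None else v for v in values]

def impute_class(values):
    """Replace None in a class variable with the most frequently occurring value."""
    present = [v for v in values if v is not None]
    fill = statistics.mode(present)
    return [fill if v is None else v for v in values]
```

For example, impute_interval([1, None, 3, 8], "midrange") fills the gap with 4.5, the midpoint of 1 and 8.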
Use the Clustering node to segment your data so that you can identify data observations that are similar in some way. When displayed in a plot, observations that are similar tend to be in the same cluster, and observations that are different tend to be in different clusters. The cluster identifier for each observation can be passed to other nodes for use as an input, ID, or target variable. It can also be passed as a group variable that enables you to automatically construct separate models for each group.
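To make the idea of a cluster identifier concrete, here is a minimal k-means sketch. k-means is used only as a familiar stand-in; the text above does not state which algorithm the Clustering node runs. The points, the starting centers, and the function name are made up for the example.

```python
import math

def kmeans(points, centers, iterations=10):
    """Plain k-means: returns (labels, centers).

    labels[i] is the cluster identifier of points[i], which is the value
    that could be handed to later steps as an input, ID, or group variable.
    """
    centers = list(centers)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centers

# Hypothetical two-dimensional observations forming two loose groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
labels, centers = kmeans(points, centers=[(0, 0), (10, 10)])
```

Splitting the observations by label is the simplest way to mimic the group variable use case, in which a separate model is fitted to each cluster.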
Use the SOM/Kohonen node to generate self-organizing maps, Kohonen networks, and vector quantization networks. The SOM/Kohonen node performs unsupervised learning in which it attempts to learn the structure of the data. As with the Clustering node, after the network maps have been created, their characteristics can be examined graphically by using the SOM/Kohonen Results Browser. The node presents the analysis results as an interactive map that illustrates the characteristics of the clusters. The Results Browser also provides a report that indicates the importance of each variable.
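For readers who have not met self-organizing maps before, the toy sketch below shows the core update rule on a one-dimensional map with one-dimensional inputs. It is deliberately crude: the learning rate is constant, the neighborhood is fixed (full update for the winning unit, half update for its immediate neighbors), and all names and numbers are assumptions for illustration, not the node's settings.

```python
def best_unit(x, weights):
    """Index of the map unit whose weight is closest to the input x."""
    return min(range(len(weights)), key=lambda i: abs(x - weights[i]))

def train_som(data, weights, epochs=20, lr=0.5):
    """Train a 1-D self-organizing map on scalar inputs and return the unit weights."""
    weights = list(weights)
    for _ in range(epochs):
        for x in data:
            win = best_unit(x, weights)
            for i in range(len(weights)):
                grid_dist = abs(i - win)
                # Neighborhood function: winner moves fully, adjacent units move half as far.
                h = 1.0 if grid_dist == 0 else 0.5 if grid_dist == 1 else 0.0
                weights[i] += lr * h * (x - weights[i])
    return weights

# Hypothetical data with a low group and a high group, mapped onto three units.
trained = train_som([0.1, 0.9], weights=[0.0, 0.5, 1.0], epochs=1)
```

After training, best_unit(x, trained) plays the role of the cluster identifier. Because neighboring units are pulled along with the winner, nearby units end up representing similar inputs, which is what distinguishes a self-organizing map from plain vector quantization.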
The experimental Time Series node converts transactional data to time series data and performs seasonal and trend analysis on an interval target variable.
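Converting transactional data to time series data amounts to accumulating irregularly timed records into equally spaced periods. The sketch below sums hypothetical (date, amount) transactions into monthly totals and fills months that have no transactions with zero. The monthly interval, the use of a sum, and the zero fill are assumptions for the example; the seasonal and trend analysis itself is not shown.

```python
from collections import defaultdict
from datetime import date

def to_monthly_series(transactions):
    """Accumulate (date, amount) records into a gap-free monthly series.

    Returns a list of ((year, month), total) pairs in time order.
    Months without any transaction appear with a total of 0.0.
    """
    totals = defaultdict(float)
    for day, amount in transactions:
        totals[(day.year, day.month)] += amount
    if not totals:
        return []
    (year, month), last = min(totals), max(totals)
    series = []
    while (year, month) <= last:
        series.append(((year, month), totals.get((year, month), 0.0)))
        # Step to the next calendar month, rolling over at December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return series

# Made-up transactions: two in November 2005, none in December, one in January 2006.
transactions = [
    (date(2005, 11, 3), 20.0),
    (date(2005, 11, 20), 5.0),
    (date(2006, 1, 9), 12.5),
]
monthly = to_monthly_series(transactions)
```

Filling the empty month matters: seasonal and trend methods generally expect one value per period, so a missing December would otherwise shift every later observation.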
Use the Interactive Grouping node to interactively group variable values into classes. The Interactive Grouping functionality is required to create a specific type of predictive model that is called a score card. Statistical and plotted information can be interactively rearranged as you explore various variable groupings. Score card models are frequently used for application and behavioral scoring in the credit scoring industry.
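One way to picture what is being explored when groupings are rearranged is to compute a per-group statistic for a candidate set of cut points and then compare it with another candidate. The sketch below uses weight of evidence (WOE), a statistic widely used in credit score cards. Its use here is an assumption, as are the sign convention (log of the share of goods over the share of bads), the function names, and the sample data; the text above does not say which statistics the node reports.

```python
import bisect
import math

def group_index(value, cut_points):
    """Group number for a value, given sorted cut points (a cut point belongs to the higher group)."""
    return bisect.bisect_right(cut_points, value)

def weight_of_evidence(values, is_bad, cut_points):
    """WOE per group: ln((good_g / total_good) / (bad_g / total_bad)).

    Groups that contain no goods or no bads get None, because the
    logarithm is undefined there; regrouping is the usual remedy.
    """
    n_groups = len(cut_points) + 1
    good = [0] * n_groups
    bad = [0] * n_groups
    for v, b in zip(values, is_bad):
        g = group_index(v, cut_points)
        if b:
            bad[g] += 1
        else:
            good[g] += 1
    total_good, total_bad = sum(good), sum(bad)
    woe = []
    for g in range(n_groups):
        if good[g] == 0 or bad[g] == 0:
            woe.append(None)
        else:
            woe.append(math.log((good[g] / total_good) / (bad[g] / total_bad)))
    return woe

# Hypothetical applicants: age and whether the account later went bad (1) or not (0).
ages = [22, 25, 28, 35, 40, 45, 55, 60]
went_bad = [1, 1, 0, 1, 0, 0, 0, 0]
woe_by_group = weight_of_evidence(ages, went_bad, cut_points=[30, 50])
```

Calling the function again with different cut points, for example [40], is the programmatic counterpart of dragging a group boundary and watching the statistics change.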
Copyright © 2006 by SAS Institute Inc., Cary, NC, USA. All rights reserved.