Data access and preparation
- Access to data sources (including SAS data sets, database tables, and Hadoop files) that are registered in SAS Metadata Server
- Interactive assignment of data source metadata (such as variable roles, levels, and order) or use of automated settings to share variable settings across projects
- Data segmentation for stratified modeling
- Automated data profiling and interactive variable distribution graphs for detecting data issues
- Data filtering
- Data transformations
- Data cleaning with statistical and machine learning imputation methods
Customizable supervised learning templates
Ability to interactively build custom templates that include models and the following processing steps:
- Filtering
- Principal components
- Imputation
- Transformations
- Supervised and unsupervised variable selection
- Creation of your own model templates
- Editing of any data preparation or model parameters, with the option to save the result as a customized template
- Ability to share model templates across projects and users
Self-service machine learning techniques
Build models that use the following techniques:
- Bayesian networks
- Decision trees
- Gradient boosting
- Neural networks
- Random forests
- Support vector machines
- Generalized linear models
- Linear regression
- Logistic regression
- Interactive visualization of model-specific results
Champion model identification
- Automatic identification for each segment by using selectable criteria
- Manual overrides of system-selected models
- Interactive comparison and assessment of models within a segment and across multiple segments
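As an illustration only, the per-segment champion selection listed above can be sketched in a few lines of Python. The segment names, model names, statistic values, and the `select_champions` helper are all hypothetical and are not part of the product. The sketch assumes that every candidate model has already been assessed on validation data for its segment.

```python
from typing import Dict, Optional

# Hypothetical assessment results: segment -> model -> statistic -> value.
RESULTS: Dict[str, Dict[str, Dict[str, float]]] = {
    "north": {
        "decision_tree": {"misclassification": 0.21, "auc": 0.78},
        "gradient_boosting": {"misclassification": 0.17, "auc": 0.84},
    },
    "south": {
        "decision_tree": {"misclassification": 0.19, "auc": 0.80},
        "logistic_regression": {"misclassification": 0.22, "auc": 0.82},
    },
}


def select_champions(
    results: Dict[str, Dict[str, Dict[str, float]]],
    criterion: str,
    higher_is_better: bool,
    overrides: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Pick one champion model per segment by the chosen criterion.

    A manual override, when given for a segment, replaces the
    automatically selected model for that segment.
    """
    overrides = overrides or {}
    pick = max if higher_is_better else min
    champions: Dict[str, str] = {}
    for segment, models in results.items():
        if segment in overrides:
            champions[segment] = overrides[segment]
        else:
            champions[segment] = pick(
                models, key=lambda name: models[name][criterion]
            )
    return champions


# Automatic selection by AUC, then the same with a manual override.
by_auc = select_champions(RESULTS, "auc", higher_is_better=True)
with_override = select_champions(
    RESULTS, "auc", higher_is_better=True,
    overrides={"south": "decision_tree"},
)
```

Changing the criterion (for example, to misclassification rate with `higher_is_better=False`) can change the champion in a segment, which is why the criterion is a parameter in this sketch.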
Model performance exception identification
- Reports that highlight model performance exceptions to enable easy identification of underperforming models and detail drill-down
Model tracking and reporting
- Summary reports that contain model results, significant variables, and model settings
- Reports in PDF and RTF for easy sharing
Model retraining
- Retraining of existing model templates on new data sets
- Tracking of assessment statistics across retraining iterations
- Longitudinal model performance degradation reports
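A minimal sketch of what tracking an assessment statistic across retraining iterations could look like, written in Python for illustration. The statistic values, the tolerance, and the `flag_degradation` helper are hypothetical; the product's own reports are not reproduced here. The sketch assumes a statistic for which higher values are better and compares each iteration with the first one.

```python
from typing import List


def flag_degradation(history: List[float], tolerance: float) -> List[int]:
    """Return the indices of iterations whose statistic has fallen
    more than `tolerance` below the first (baseline) iteration."""
    if not history:
        return []
    baseline = history[0]
    return [
        i for i, value in enumerate(history)
        if baseline - value > tolerance
    ]


# Hypothetical AUC values recorded after each retraining iteration.
auc_history = [0.84, 0.83, 0.79, 0.76]
degraded = flag_degradation(auc_history, tolerance=0.03)
```

Here iterations 2 and 3 fall more than 0.03 below the baseline of 0.84, so they would be the ones to surface in a degradation report.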
Flexible model management and deployment
- Automatic generation of SAS score code for all model templates
- Registration of models to SAS Model Manager for centralized model deployment and management (requires SAS Model Manager)
- Model deployment in database and in Hadoop using SAS Scoring Accelerator (requires SAS Scoring Accelerator)
Scalable processing for training models
- Multithreaded procedures that take advantage of multicore SAS servers
- Asynchronous processes via SAS Grid Manager for workload balancing and scheduling (requires SAS Grid Manager)
- In-memory processing by using SAS High-Performance Data Mining on database appliances (such as Oracle, Teradata, Greenplum, and SAP HANA) or on Hadoop (requires SAS High-Performance Data Mining)