Problem Note 65775: Tree-based procedures, actions, and nodes stop responding, or fail to complete
In SAS® Visual Data Mining and Machine Learning, training a tree-based model might stop responding or take an unexpectedly long time to complete. Tree-based models include forests, decision trees, and gradient boosting models. The problem can occur whether you train the model using nodes, CAS actions, or procedures.
One possible cause is a nominal input that has a large number of levels (high cardinality). High-cardinality inputs affect tree-based models in two ways:
- Split searches can take an unacceptably long time.
- Spurious splits that are not helpful for prediction are created.
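To give a sense of why split searches slow down, the following minimal Python sketch counts the distinct binary splits that exist for a nominal input with a given number of levels. It assumes an exhaustive search over all two-way groupings. This note does not describe the split-search algorithm that the software actually uses, so treat the numbers as an illustration of how quickly the search space grows rather than as a measurement. The function name `binary_partition_count` is hypothetical.

```python
def binary_partition_count(levels: int) -> int:
    """Count the ways to divide `levels` distinct categories into two non-empty groups.

    Assumption: an exhaustive search over binary splits. Real implementations
    typically use heuristics, so this is a worst-case illustration only.
    """
    if levels < 2:
        return 0  # a single level cannot be split
    # Each level goes left or right (2**levels assignments); discard the two
    # assignments with an empty side and halve the rest to ignore left/right order.
    return 2 ** (levels - 1) - 1


for k in (5, 20, 50, 500):
    count = binary_partition_count(k)
    print(f"{k} levels: a {len(str(count))}-digit number of candidate splits")
```

The count roughly doubles with every additional level, which is why even heuristic searches can become slow when an input has hundreds of levels.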
To avoid the problem, do not use nominal inputs that have a large number of levels. Here are some common strategies:
- Check to see whether the input should be interval instead of nominal.
- Check to see whether the variable should be an ID instead of an input.
- Check to see whether levels can be collapsed in a meaningful way.
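As one possible way to collapse levels, the sketch below groups infrequent levels into a single catch-all level before training. Frequency-based grouping is only one option and is an assumption here. A grouping based on domain knowledge (for example, rolling postal codes up to regions) is often more meaningful. The function name `collapse_rare_levels`, the `min_count` cutoff, and the `OTHER` label are all hypothetical.

```python
from collections import Counter


def collapse_rare_levels(values, min_count=50, other_label="OTHER"):
    """Replace levels that occur fewer than `min_count` times with `other_label`.

    Assumption: `values` is a list holding one nominal column's values.
    """
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


# Small demonstration with a low cutoff so that the effect is visible.
column = ["red"] * 3 + ["blue"] * 2 + ["teal"]
print(collapse_rare_levels(column, min_count=2))
```

The cutoff should be chosen with the data in mind. A cutoff that is too high discards levels that carry real signal.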
Click the Hot Fix tab in this note for a link to instructions about accessing and applying the software update.
After you apply the hot fix, a warning message is displayed when any nominal input has 500 or more levels.
WARNING: Columns with many levels often create spurious splits and take a long time in training.
WARNING: The column COLUMN_NAME contains XXX levels.
The first message is a generic warning, and the second reports the column name and the number of levels. However, depending on the client application that you are using, the warning message might not be displayed in real time. Examples:
- In Jupyter Notebook, the SWAT package does not display messages in real time. Instead, the message is displayed only after the cell finishes running.
- In Model Studio, messages are not displayed in real time. However, by default, Model Studio prevents high-cardinality inputs from being used, so you encounter this issue only when you override the default settings.
- In SAS® Studio using SAS® Viya® 3.5 and later, the SAS Studio log displays notes from CAS-enabled procedures in real time. If you use PROC CAS to run actions directly, then you must use the NOQUEUE option to display notes in real time.
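Because the warning might not appear until training has finished, it can help to check cardinality on the client side before you start training. The following minimal sketch is not how the software performs its check. It simply applies the same 500-level threshold that this note mentions to data that is already in memory. It assumes that the data is available as a list of dictionaries, one per row. The function name `high_cardinality_columns` and the sample column names are hypothetical.

```python
def high_cardinality_columns(rows, nominal_columns, threshold=500):
    """Return {column: level_count} for nominal columns with `threshold` or more levels.

    Assumptions: `rows` is an iterable of dicts keyed by column name, and
    `nominal_columns` lists the columns that will be used as nominal inputs.
    """
    seen = {col: set() for col in nominal_columns}
    for row in rows:
        for col in nominal_columns:
            seen[col].add(row[col])
    return {col: len(levels) for col, levels in seen.items() if len(levels) >= threshold}


# Hypothetical data: `customer_id` is unique per row, `region` has 4 levels.
rows = [{"customer_id": i, "region": i % 4} for i in range(1000)]
for name, n_levels in high_cardinality_columns(rows, ["customer_id", "region"]).items():
    print(f"Column {name} has {n_levels} levels; consider treating it as an ID or collapsing levels.")
```

For large tables, counting distinct values on the server is likely to be more practical than pulling the rows to the client.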
Operating System and Release Information
SAS System | SAS Visual Data Mining and Machine Learning | Linux for x64 | 8.1 | | Viya | |
Microsoft® Windows® for x64 | 8.3 | | Viya | |
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.
Type: | Problem Note |
Priority: | high |
Date Modified: | 2020-04-07 12:44:28 |
Date Created: | 2020-03-30 10:41:44 |