High-Cardinality Constraints

Introduction

High-cardinality data has one or more columns that contain a very large number of unique values. For example, user names, email addresses, and bank account numbers can be high-cardinality data items.
SAS Visual Analytics supports billions of values that are aggregated to thousands of values. If the billions of values in a table have millions of unique identifiers, then a column that contains those identifiers is a high-cardinality data item.
To help ensure that users get meaningful results in a timely fashion, the number of unique values that can be returned for certain visualizations and report objects is constrained. When a user selects a high-cardinality data item, the outcome is determined by any applicable thresholds, the number of unique values in the data, and the user’s selections.
The following topics provide information about two distinct levels of thresholds: client-side thresholds and middle-tier thresholds.

Client-Side Thresholds for High-Cardinality Data

Client-side thresholds are specific to an individual application (such as the explorer), or to a group of applications (such as the designer and the viewer). The outcome of a request that exceeds a client-side threshold depends on the request. For some requests, an error is displayed, and no results are returned. For other requests that exceed a client-side threshold, but do not exceed a middle-tier threshold, adapted results are returned.
Note: In general, client-side thresholds are fixed. An exception is that a user can select a low, medium, or high threshold level as a user preference in the explorer. On a computer that has low memory availability, setting the client-side threshold to Low can help prevent events such as system crashes.
Client-side thresholds for visualizations and report objects are documented in the Data Limits appendix in the SAS Visual Analytics: User’s Guide. The appendix explains the adapted responses that clients provide for certain requests that exceed a client-side threshold (but do not exceed a middle-tier threshold).
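The relationship between the two threshold levels can be sketched as follows. This is an illustrative model only, not product code: the function name, the outcome labels, and the limits used in the usage note are assumptions. Whether a request that exceeds only a client-side threshold produces an error or adapted results depends on the visualization or report object, so the sketch does not try to distinguish those two cases.

```python
from enum import Enum


class Outcome(Enum):
    FULL_RESULTS = "full results"
    CLIENT_ADAPTED_OR_ERROR = "adapted results or client-side error"
    MIDTIER_ERROR = "middle-tier error, no results"


def classify_request(unique_values: int, client_limit: int, midtier_limit: int) -> Outcome:
    """Classify a request by the thresholds it exceeds (hypothetical helper)."""
    # The middle-tier check comes first: exceeding it always yields an error.
    if unique_values > midtier_limit:
        return Outcome.MIDTIER_ERROR
    # Over the client-side limit only: error or adapted results,
    # depending on the object (see the Data Limits appendix).
    if unique_values > client_limit:
        return Outcome.CLIENT_ADAPTED_OR_ERROR
    return Outcome.FULL_RESULTS
```

For example, with an assumed client-side limit of 3,000 and a middle-tier limit of 50,000, a request for 10,000 unique values falls into the middle case, and a request for 60,000 unique values is rejected by the middle tier.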

Middle-Tier Thresholds for High-Cardinality Data

Middle-tier thresholds have a wider scope, affecting all instances of the specified visualization or report object. Compared to client-side thresholds, middle-tier thresholds are less granular and less restrictive. For requests that exceed a middle-tier threshold, an error message is displayed, and no results are returned. The default thresholds work in almost all environments. In general, users filter or group any high-cardinality data items, so requests rarely exceed a middle-tier threshold.
In the following table, the second column indicates the maximum number of unique values (not the maximum volume of data).
Middle-Tier Thresholds

Visualization or Report Object                                                       Rows (Maximum Unique Values)
Decision trees¹                                                                      10,000
Crosstabs                                                                            50,000
Tables (in the designer and the viewers)                                             50,000
Box plots: at least one measure, no categories²                                      50,000
Bar charts: single category                                                          50,000
Heat maps: single category                                                           50,000
Line charts: at least one measure, single category (numeric, date, time, or string)  50,000
Bubble plots: three measures, grouped                                                50,000
Bubble plots: three measures, grouped with animation category                        50,000
Bubble plots: three measures, not grouped, horizontal or vertical series (or both)   50,000
Bubble plots: three measures, no categories                                          100,000
Scatter plots                                                                        100,000
Tables (in the explorer)                                                             100,000

¹ There is also a time-out period for decision tree calls. See vae.DecisionTreeTimeout.
² If there is no category, one box is applied for each measure, up to 400 measures.
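As a quick reference, the default thresholds in the preceding table can be captured in a lookup. This is a minimal sketch: the dictionary keys and the function name are hypothetical labels chosen for the example, and treating a request as over the threshold only when it is strictly greater than the limit is an assumption.

```python
# Default middle-tier thresholds (maximum unique values), copied from the table.
# The keys are hypothetical labels, not product identifiers.
MIDDLE_TIER_DEFAULTS = {
    "decision_tree": 10_000,
    "crosstab": 50_000,
    "table_designer_viewer": 50_000,
    "box_plot_no_category": 50_000,
    "bar_chart_single_category": 50_000,
    "heat_map_single_category": 50_000,
    "line_chart_single_category": 50_000,
    "bubble_plot_grouped": 50_000,
    "bubble_plot_grouped_animation": 50_000,
    "bubble_plot_series_not_grouped": 50_000,
    "bubble_plot_no_category": 100_000,
    "scatter_plot": 100_000,
    "table_explorer": 100_000,
}


def exceeds_middle_tier(object_type: str, unique_values: int) -> bool:
    """Return True if the unique-value count is over the default threshold."""
    # Assumption: the limit itself is still allowed (strict comparison).
    return unique_values > MIDDLE_TIER_DEFAULTS[object_type]
```

Note that the comparison uses the number of unique values, not the number of rows of data in the table.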

Configuration Properties for High-Cardinality Thresholds

CAUTION:
Increasing a middle-tier threshold can affect performance and stability.
The default settings are appropriate in most environments. Do not set excessively high thresholds. If you have questions about adjusting the following properties, contact SAS Technical Support.
Note: For instructions, see How to Set Configuration Properties.
The following properties affect middle-tier thresholds:
va.DistinctCountServerLimit
sets the distinct count limit for graphs. The default is -1, which means that there is no distinct count limit for graphs.
Scope: Entire suite
va.DistinctCountDataPanelLimit
sets the distinct count limit for data that is displayed in a data panel. This property affects only the data panel, not the distinct count limits within graphs. The default is 5,000.
Scope: Entire suite
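The use of -1 as a "no limit" value for va.DistinctCountServerLimit, alongside a positive default such as 5,000 for va.DistinctCountDataPanelLimit, can be sketched as follows. The function name is hypothetical, and how the product reacts when a limit is reached is not modeled here.

```python
NO_LIMIT = -1  # documented default of va.DistinctCountServerLimit


def within_distinct_count_limit(distinct_count: int, limit: int) -> bool:
    """Return True if the distinct count is acceptable under the given limit."""
    # A limit of -1 disables the check entirely.
    if limit == NO_LIMIT:
        return True
    # Assumption: a count equal to the limit is still acceptable.
    return distinct_count <= limit
```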
va.CardinalityLimitForGroupByTempTable
for all high-cardinality rank requests that exceed the specified limit (number of unique values), prevents processing and returns an error. Set this property only in the unusual circumstance in which high-cardinality ranks cause the SAS LASR Analytic Server to hang. For example, to block rank requests against data that contains more than 2 million unique values, set this property to 2000000. If you choose to set this property, the suggested value is 3000000.
Scope: Entire suite
va.CardinalityLimitForGroupByCountDistinctTempTable
for only distinct count high-cardinality rank requests that exceed the specified limit (number of unique values), prevents processing and returns an error. Set this property only in the unusual circumstance in which distinct count high-cardinality ranks cause the SAS LASR Analytic Server to hang. (This property affects only distinct count requests, providing a narrower constraint than the va.CardinalityLimitForGroupByTempTable property.) If you choose to set this property, the suggested value is 1000000.
Scope: Entire suite
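One way to picture how va.CardinalityLimitForGroupByTempTable and va.CardinalityLimitForGroupByCountDistinctTempTable might combine is shown below. This is a sketch under stated assumptions: the function and parameter names are hypothetical, an unset property is represented by None, and applying both limits to a distinct count request is an interpretation of the descriptions above rather than confirmed behavior.

```python
from typing import Optional


def rank_request_blocked(
    unique_values: int,
    is_distinct_count: bool,
    group_by_limit: Optional[int] = None,        # va.CardinalityLimitForGroupByTempTable
    count_distinct_limit: Optional[int] = None,  # va.CardinalityLimitForGroupByCountDistinctTempTable
) -> bool:
    """Return True if a rank request would be rejected with an error."""
    # The general limit applies to every high-cardinality rank request.
    if group_by_limit is not None and unique_values > group_by_limit:
        return True
    # The narrower limit applies only to distinct count rank requests.
    if (
        is_distinct_count
        and count_distinct_limit is not None
        and unique_values > count_distinct_limit
    ):
        return True
    return False
```

With the suggested values (3000000 and 1000000), a distinct count rank request against 1.5 million unique values would be blocked in this model, whereas an ordinary rank request against the same data would be processed.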
va.SortResultServerLimit
sets the maximum number of values that can be returned for detail queries that are run with sorting. This property affects only results in list tables for which details are turned on.
Scope: Entire suite, except for the explorer
va.CategoryCardinalityServerLimit
sets the maximum number of values for category crossings. Only a fixed (and finite) number of category crossings are supported. For example, if you drag and drop "First name" and "Last name" onto the population of the United States, the server might generate 200 million different values. This property determines how high the cardinality can be and still allow the server to process and return results to the client. If the number of values for category crossings exceeds this limit, the query is not run.
Scope: Entire suite, except for the explorer
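To illustrate what a category crossing is in the description of va.CategoryCardinalityServerLimit, the following sketch counts the distinct combinations of category values in a small in-memory data set and compares the count with a limit. The function names and sample data are hypothetical, and the server does not necessarily compute crossings this way.

```python
from typing import Dict, List, Sequence


def count_category_crossings(rows: List[Dict[str, str]], categories: Sequence[str]) -> int:
    """Count the distinct combinations of values across the given categories."""
    return len({tuple(row[c] for c in categories) for row in rows})


def query_allowed(rows: List[Dict[str, str]], categories: Sequence[str], limit: int) -> bool:
    """Return True if the number of crossings does not exceed the limit."""
    return count_category_crossings(rows, categories) <= limit


# Hypothetical sample data: four rows, three distinct (first, last) pairs.
people = [
    {"first": "Ana", "last": "Lee"},
    {"first": "Ana", "last": "Kim"},
    {"first": "Bo", "last": "Lee"},
    {"first": "Ana", "last": "Lee"},
]
```

Here "first" alone has two distinct values, but crossing "first" with "last" yields three combinations. On real data, crossing two high-cardinality categories can multiply the number of values very quickly, which is why the limit exists.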
va.SummaryServerRowLimit
sets the maximum number of values that can be returned to the middle tier for further processing. For example, for high-cardinality data that is sorted by first name, the number of values computed could be very large.
Scope: Entire suite, except for the explorer (which uses vae.SummaryServerLimit)
va.MidtierCellLimit
sets the maximum size of a crosstab.
Scope: Entire suite, except for the explorer
va.maxPeriodCalculations
specifies the maximum number of calculated columns that are constructed for period calculations. If this limit is exceeded for a particular period measure, excess calculations are excluded, and existing calculations (for that particular period measure) are replaced with missing values. The user is prompted to apply a filter to reduce the number of calculations. The default is 800.
Note: Software optimizations reduce the number of calculations before this limit is applied, so this limit is rarely exceeded. An example of the effect of this property is a distinct count calculation with cumulative periods (the number of unique date values that are visible cannot exceed the specified limit).
Scope: designer, viewer, transport service
va.MaxSparkTables
sets the maximum number of spark tables. The default is 300.
Scope: Entire suite, except for the explorer
va.CheckCardinalityBeforeQuery
controls whether cardinality pre-checks occur. The default value is -1, which disables this constraint, so by default pre-checks do not occur.
Scope: Entire suite, except for the explorer
va.CheckCardinalityWithinQuery
controls whether SAS LASR Analytic Server enforces cardinality limits. By default, these checks do occur.
Scope: Entire suite, except for the explorer
vae.BoxPlotServerLimit
sets middle-tier thresholds for box plots that have at least one measure and no more than one category.
Scope: Explorer only
vae.DecisionTreeServerLimit
sets the middle-tier threshold for decision trees.
Scope: Explorer only
vae.FetchRowsServerLimit
sets middle-tier thresholds for tables.
Scope: Explorer only
vae.FrequencyServerLimit
sets middle-tier thresholds for bar charts that have a single category. This constraint is applied before a selection list of values is displayed.
Scope: Explorer only
vae.modeling.ClassCardinalityLimit
sets the maximum number of distinct levels in a model. This property limits the cumulative total of classification effects and interaction terms in a model. For example, if you set this property to 800, a user can neither specify an effect variable that contains more than 800 distinct levels nor add an effect variable that would cause the total number of distinct levels to exceed 800. The initial value is 2048.
Scope: SAS Visual Statistics add-on (if licensed)
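The two rules in the vae.modeling.ClassCardinalityLimit example (a per-variable cap and a cumulative cap) can be expressed as a small check. This is a minimal sketch: the function and parameter names are hypothetical, and how interaction terms contribute to the cumulative total is not modeled.

```python
def can_add_effect(current_total_levels: int, new_variable_levels: int, limit: int = 2048) -> bool:
    """Return True if an effect variable can be added under the cumulative limit."""
    # Rule 1: a single variable cannot have more distinct levels than the limit.
    if new_variable_levels > limit:
        return False
    # Rule 2: adding the variable cannot push the cumulative total over the limit.
    return current_total_levels + new_variable_levels <= limit
```

With the limit set to 800, a model that already has 500 distinct levels can accept a variable with 300 levels but not one with 301 levels.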
vae.modeling.DecisionTreePredictorBinsCardinalityLimit
sets the maximum number of bins for a measure variable in a decision tree. The initial value is 1024.
Scope: SAS Visual Statistics add-on (if licensed)
vae.modeling.DecisionTreePredictorCardinalityLimit
sets the maximum number of distinct levels for a category variable in a decision tree. The initial value is 1024.
Scope: SAS Visual Statistics add-on (if licensed)
vae.modeling.DecisionTreeResponseCardinalityLimit
sets the maximum number of distinct levels for the response category variable in a decision tree. In the initial configuration, this property is not set, so the default value (100) is in effect.
Scope: SAS Visual Statistics add-on (if licensed)
vae.modeling.GroupByCardinalityLimit
sets the maximum number of distinct levels for the group-by variables in a model. This property limits the cumulative total for group-by variables in a model. For example, if the value of this property is set to 800, users can neither specify a group-by variable that contains more than 800 distinct levels nor add a group-by variable that would cause the total number of distinct levels to exceed 800. The initial value is 1024.
Scope: SAS Visual Statistics add-on (if licensed)
vae.RealScatterServerLimit
sets middle-tier thresholds for scatter plots and bubble plots that have three measures and no categories.
Scope: Explorer only
vae.ScatterPlotServerLimit
sets middle-tier thresholds for heat maps that have exactly one category.
Scope: Explorer only
vae.SummaryServerLimit
sets middle-tier thresholds for the following visualization types:
  • crosstabs
  • line charts that have at least one measure and a single category (numeric, date, time, or string)
  • bubble plots that are grouped with no series, grouped with animation, or with series and not grouped
Scope: Explorer only (other applications use va.SummaryServerRowLimit)
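Because the explorer property names do not always match the visualizations that they govern (for example, vae.ScatterPlotServerLimit applies to heat maps), a lookup such as the following can help when you decide which property to review. The mapping is taken from the property descriptions above. The dictionary keys and the function name are hypothetical labels chosen for this sketch.

```python
from typing import Optional

# Explorer visualization -> governing middle-tier property, per the descriptions above.
EXPLORER_LIMIT_PROPERTIES = {
    "box plot": "vae.BoxPlotServerLimit",
    "decision tree": "vae.DecisionTreeServerLimit",
    "table": "vae.FetchRowsServerLimit",
    "bar chart": "vae.FrequencyServerLimit",
    "scatter plot": "vae.RealScatterServerLimit",
    "bubble plot (three measures, no categories)": "vae.RealScatterServerLimit",
    "heat map": "vae.ScatterPlotServerLimit",
    "crosstab": "vae.SummaryServerLimit",
    "line chart": "vae.SummaryServerLimit",
    "bubble plot (grouped or with series)": "vae.SummaryServerLimit",
}


def explorer_property_for(visualization: str) -> Optional[str]:
    """Return the property name for a visualization label, or None if unknown."""
    return EXPLORER_LIMIT_PROPERTIES.get(visualization.strip().lower())
```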
Last updated: December 18, 2018