The parallel group SELECT statement uses multiple threads up to the MAXWHTHREADS=
limit to perform parallel group aggregations. The threads share equally the memory
that GRPBYROWCACHE= specifies for caching groups: each thread receives
1/MAXWHTHREADS= of the cache.
When a thread accumulates enough distinct groups to fill its cache, the groups are
moved to secondary bins. When the parallel BY-group processing completes, the partial aggregations
in memory and in secondary bins are merged to produce the final sorted results. If
you omit the GRPBYROWCACHE= option, the default value is a 2-MB cache per thread.
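
The cache, spill, and merge sequence described above can be illustrated with a small Python sketch. This is not the server's implementation: the function names, the list-based "secondary bins", the row-count aggregate, and the sequential loop that stands in for threads are all assumptions made for illustration.

```python
from collections import Counter

def aggregate_with_spill(keys, cache_groups):
    """Count rows per group; when the cache holds cache_groups distinct
    groups and a new group arrives, move the cached groups to a bin."""
    cache = Counter()
    bins = []  # hypothetical stand-in for secondary bins
    for key in keys:
        if key not in cache and len(cache) >= cache_groups:
            bins.append(cache)   # spill the full cache
            cache = Counter()    # start again with an empty cache
        cache[key] += 1
    return cache, bins

def merge_partials(partials):
    """Merge partial aggregations into one result sorted by group key."""
    total = Counter()
    for partial in partials:
        total.update(partial)    # Counter.update adds counts together
    return sorted(total.items())

def parallel_group_count(rows, threads, cache_groups):
    """Simulate the threads one after another; each takes every
    threads-th row and aggregates it with its own cache."""
    partials = []
    for t in range(threads):
        cache, bins = aggregate_with_spill(rows[t::threads], cache_groups)
        partials.append(cache)
        partials.extend(bins)
    return merge_partials(partials)
```

A larger `cache_groups` value means fewer spills and fewer partial results to merge, which is the intuition behind raising the cache size when a query produces many groups.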
For queries that produce large numbers of groups, increasing the value above the
default can improve aggregation performance. However, any memory that you allocate
beyond what caching needs is unavailable for other work, so an oversized value reduces
the resources that are available for processing by the excess amount.
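
As a rough sizing aid, the equal split across threads can be worked through in Python. The helper names and the `bytes_per_group` figure are hypothetical; the text above does not state how much memory a cached group occupies, so treat the result as an order-of-magnitude estimate only.

```python
def per_thread_cache(total_cache_bytes, max_wh_threads):
    """Equal share of the cache for each thread (1/MAXWHTHREADS= of it)."""
    return total_cache_bytes // max_wh_threads

def groups_per_thread(total_cache_bytes, max_wh_threads, bytes_per_group):
    """Estimate how many distinct groups one thread can cache before
    spilling. bytes_per_group is an assumed, workload-specific figure."""
    share = per_thread_cache(total_cache_bytes, max_wh_threads)
    return share // bytes_per_group

# Example: a 64 MB cache shared by 8 threads, assuming 128 bytes per group.
MB = 1024 * 1024
share = per_thread_cache(64 * MB, 8)
estimate = groups_per_thread(64 * MB, 8, 128)
```

If the expected number of distinct groups per thread is well below such an estimate, the extra cache is unlikely to help and only removes memory from other processing.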