Memory Management

About Physical and Virtual Memory

The amount of memory on a machine is the physical memory. The amount of memory that can be used by an application can be larger, because the operating system can provide virtual memory. Virtual memory makes the machine appear to have more memory available than there actually is, by sharing physical memory between applications when they need it and by using disk space as memory.
Paging means writing some of the contents of memory to disk. When memory is not in use and other applications need to allocate memory, the operating system pages out the memory that is not currently needed so that it can support the other applications. When the paged-out memory is needed again, some other memory must be paged out to make room for it.
Paging does affect performance, but some amount of paging is acceptable. Using virtual memory enables you to access tables that exceed the amount of physical memory on the machine. So long as the time to write pages to the disk and read them from the disk is short, the server performance is good.
One advantage of SASHDAT tables that are read from HDFS is that the server performs the most efficient paging of memory.

How Does the Server Use Memory for Tables?

When you load a table to memory with the SAS LASR Analytic Server engine, the server allocates physical memory to store the rows of data. This applies to both distributed and non-distributed servers.
When a distributed server loads a table from HDFS to memory with the LASR procedure, the server defers reading the rows of data into physical memory. You can direct the server to use a more aggressive memory allocation scheme at load time by specifying the READAHEAD option in the PROC LASR statement.
Note: When a distributed server loads a table from either the Greenplum Data Computing Appliance or the Teradata Data Warehouse Appliance, physical memory is allocated for the rows of data. This is true even when the data provider is co-located.
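As a rough sketch of the two load paths described above, the following program first loads a small table through the SAS LASR Analytic Server engine and then loads a SASHDAT file from HDFS with the READAHEAD option. The host name, port, tag, and the HDFS location /hps/energy are placeholders. The HDFS(PATH=) form is an assumption about the PROC LASR syntax, so verify it against the LASR procedure documentation for your release before use.

```sas
/* Engine load: physical memory is allocated for the rows at load time */
libname example sasiola host="grid001.example.com" port=10010 tag=hps;

data example.prdsal3;
    set sashelp.prdsal3;
run;

/* HDFS load: reading rows is deferred unless READAHEAD is specified. */
/* The /hps/energy location is hypothetical.                          */
proc lasr add hdfs(path="/hps/energy") port=10010 readahead;
    performance host="grid001.example.com";
run;
```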

How Else Does the Server Use Memory?

Physical memory is used when the server performs analytic operations such as summarizing a table. The amount of memory that a particular operation requires typically depends on the cardinality of the data. In most cases, the cardinality of the data is not known until the analysis is requested. When the server performs in-memory analytics, the following characteristics affect the amount of physical memory that is used:
  • Operations that use group-by variables can use more memory than operations that do not. The amount of memory that is required is not known without knowing the number of group-by variable combinations that are in the data.
  • The memory utilization pattern on the worker nodes can change drastically depending on the distribution of the data across the worker nodes. The distribution of the data affects the size of intermediate result sets that are merged across the network.
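Because the memory for a group-by operation depends on the number of group-by combinations, it can help to estimate that number before you submit a request. The following sketch counts the distinct combinations of three variables in the SASHELP.PRDSAL3 sample table. The table and variables are chosen for illustration only, and for very large tables you might prefer to run the count against a sample.

```sas
/* Count the distinct group-by combinations before requesting the analysis */
proc sql;
    select count(*) as GroupByCombinations
    from (select distinct country, state, prodtype
          from sashelp.prdsal3);
quit;
```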
Some requests, especially with high-cardinality variables, can generate large result sets. To enable interactive near-real-time work with high cardinality problems, the server allocates memory for data structures that speed performance. The following list identifies some of these uses:
  • The performance for traversing and querying a decision tree is best when the tree is stored in the server.
  • Paging through group-by results when you have a million groups is best done by storing the group-by structure in a temporary table in the server. The temporary table is then used to look up groups for the next page of results to deliver to the client.

Monitoring Server Memory Use

SAS LASR Analytic Server 2.5 introduces the _T_LASRMEMORY and _T_TABLEMEMORY tables. These tables contain information about server memory usage and table memory usage. The tables are always available with any SASIOLA engine libref because each table is created dynamically when you access the table.
Column Descriptions for the _T_LASRMEMORY Table
| Column Name | Data Type | Description |
|---|---|---|
| Hostname | Character (64) | Identifies the machine. |
| CommitSize ¹ | Numeric | Amount of memory that the memory manager has committed for the server. |
| WorkingSet ¹ | Numeric | Amount of memory that is physically mapped to the process context for the server. |
| VirtualMemory ² | Numeric | Amount of virtual memory that is used by the server. |
| ResidentMemory ² | Numeric | Amount of physical memory currently in use by the server process. |
| AllocatedMemory | Numeric | Amount of memory that is used by the server, including memory that is used for tables. |
| TableAllocatedMemory | Numeric | Amount of memory that is used for table storage. |
| ChildSMPTableMemory | Numeric | Amount of memory that is used by a child non-distributed server for full copies of tables. |
| ChildSMPVirtualMemory | Numeric | Amount of virtual memory that is used by a child non-distributed server. |
| ChildSMPResidentMemory | Numeric | Amount of physical memory currently in use by a child non-distributed server. |
¹ Applies to non-distributed servers on Windows only. These column names align with terms used in the Microsoft Windows Resource Monitor.
² Applies to distributed and non-distributed servers on Linux only.
To view the table, you can use a program like the following:
libname example sasiola host="grid001.example.com" port=10010 tag=hps;

proc imstat;
    table example._T_LASRMEMORY;
    fetch;
quit;

/* Alternatively, use the PRINT procedure */
data lasrmemory;
  set example._T_LASRMEMORY;
run;

proc print data=lasrmemory;
    title "Non-distributed Server Memory Use";
    format _numeric_ sizekmg9.2;
run;
The previous program generates output like the following example.
[Figure: Contents of _T_LASRMEMORY for a Non-Distributed Server]
For a distributed server, you might want to sum the values from each machine. See the following example:
libname example sasiola host="grid001.example.com" port=10010 tag=hps;

data distributed;
  set example._T_LASRMEMORY;
run;

proc print data=distributed;
  title "Distributed Server Memory Usage";
  format _numeric_ sizekmg9.2;
  sum    _numeric_;
run;
In the following display, notice that the first machine uses much less memory than the others, for the following reasons:
  • The first machine is the root node of a distributed server. The root node does not store rows of data from tables that are loaded into a distributed server.
  • A child non-distributed server is started on the same machine as the root node for providing high-volume access to small tables. However, 0 KB is used for child non-distributed server table memory because the server did not place full copies of tables on that machine.
As more tables are added for high-volume access, this can lead to additional full copies of tables using memory on the root node as well as the worker nodes.
[Figure: Contents of _T_LASRMEMORY for a Distributed Server]

Monitoring Table Memory Use

The _T_TABLEMEMORY table provides information about the amount of memory that is used for tables. The table is always available with any SASIOLA engine libref because the table is created dynamically when you access the table.
Column Descriptions for the _T_TABLEMEMORY Table
| Column Name | Data Type | Description |
|---|---|---|
| Hostname | Character (64) | Identifies the machine. |
| Tablename | Character (64) | Identifies the table. |
| InMemorySize | Numeric | Amount of memory that is needed to store the table in memory. |
| UncompressedSize | Numeric | Amount of memory that is used by the table when it is not compressed. |
| CompressedSize | Numeric | Amount of memory that is used by the table when it is compressed. |
| TableAllocatedMemory | Numeric | Amount of memory that is used for table storage. |
| NumberRecords | Numeric | Number of rows from the table that are on the machine. |
| UseCount | Numeric | Number of processes that are using the table. When the value is zero and the table is dropped, the memory is immediately freed. If the count is greater than zero and the table is dropped, the memory is not freed until the count drops to zero. |
| RecordLength | Numeric | Amount of memory that is used to store one row of the table. |
| ComputedColLength | Numeric | Amount of memory that is used to store columns that are created with the COMPUTE statement of the IMSTAT procedure. |
| InMemoryMappedSize | Numeric | Amount of memory that is mapped to a SASHDAT table. |
| ChildSMPTableMemory | Numeric | Amount of memory that is used by a child non-distributed server for full copies of tables. This column is shown for distributed servers only. For more information, see High Volume Access to Smaller Tables. |
Memory that is used by temporary tables is not included in the calculations. They are excluded because temporary tables are typically either dropped after an analysis is performed or they are made available for general use with the PROMOTE statement. After a table is promoted, it is included in the memory use calculations.
For SASHDAT tables, the InMemoryMappedSize value matches the InMemorySize value. The TableAllocatedMemory value represents the internal memory structures for the table and, if the NOCLASS option was not specified, the classification levels.
To view the table, you can use a program like the following:
libname example sasiola host="grid001.example.com" port=10010 tag=hps;

%let sizecols = InMemorySize UncompressedSize 
                CompressedSize TableAllocatedMemory
                InMemoryMappedSize ChildSMPTableMemory;
%let countcols = NumberRecords UseCount RecordLength ComputedColLength;

data tablemem;
    set example._T_TABLEMEMORY;
run;

proc print data=tablemem;
    title "Non-distributed Server Table Memory Usage";
    format &sizecols.  sizekmg9.2;
    format &countcols. 8.;
    sum _numeric_;
run;
Note: Even though the example uses the TAG=HPS option in the LIBNAME statement, the contents of the _T_TABLEMEMORY table include the memory used by all tables in the server.
The previous program generates output like the following example. In this example, the server has two in-memory tables.
[Figure: Contents of _T_TABLEMEMORY for a Non-distributed Server]
For a distributed server, the output is similar, but the table includes a row for each machine. In the following example, a five-machine cluster has two in-memory tables, Energy and Prdsal3. The root node of the cluster is shown in the first two rows. The root node never holds rows of data and always shows 0 KB for TableAllocatedMemory.
[Figure: Contents of _T_TABLEMEMORY for a Distributed Server]
The Energy table is a SASHDAT table. As a result, the value for the TableAllocatedMemory column is much less than the value for the InMemorySize column because the memory is used only while the server operates on the table. In contrast, the Prdsal3 table is a small table and was loaded with the SASIOLA engine. As a result, the rows are in memory all the time and the overhead for the table structure makes the TableAllocatedMemory value greater than the InMemorySize value. There is some overhead for all tables, but it is more apparent with smaller tables.
The Prdsal3 table was also loaded with the FULLCOPYTO= data set option and a value of 3. The ChildSMPTableMemory column includes three rows with values that are a little more than 1.5 MB. These rows identify the three machines that were selected to hold full copies of the Prdsal3 table.
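Because a distributed server reports one row per table for each machine, per-table totals across the cluster can be easier to read. The following is a minimal sketch that copies _T_TABLEMEMORY to a local data set and sums selected columns by table name. The host, port, and tag values are the same placeholders used earlier, and the output column names are illustrative.

```sas
libname example sasiola host="grid001.example.com" port=10010 tag=hps;

/* Copy the dynamically created table to a local data set */
data tablemem;
    set example._T_TABLEMEMORY;
run;

/* One output row per table: totals across all machines */
title "Table Memory Totals Across Machines";
proc sql;
    select Tablename,
           sum(NumberRecords)        as TotalRecords,
           sum(InMemorySize)         as TotalInMemory       format=sizekmg9.2,
           sum(TableAllocatedMemory) as TotalTableAllocated format=sizekmg9.2,
           sum(ChildSMPTableMemory)  as TotalChildSMP       format=sizekmg9.2
    from tablemem
    group by Tablename;
quit;
title;
```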

Accessing the Memory Tables from Other Applications

The _T_LASRMEMORY and _T_TABLEMEMORY tables do not exist until you reference them with the IMSTAT procedure or through a libref that uses the SASIOLA engine. For example, the tables are not listed by PROC DATASETS or by the TABLEINFO statement in the IMSTAT procedure.
To access the information from other applications, especially SAS applications that rely on SAS metadata for tables, you can run a DATA step like the following example. The output table can be registered in SAS metadata and you can also manage the output table as part of an ETL process.
libname example sasiola host="grid001.example.com" port=10010 
    tag=hps signer="https://server.example.com/SASLASRAuthorization";

options dsaccel=any msglevel=i;
data example.servermem(append=yes);
    set example._T_LASRMEMORY;
    dttm = datetime();
run;
In the example, the output table, servermem, is stored in the server and can be registered in SAS metadata to become available for reporting with applications like SAS Visual Analytics. If the server is secured with SAS LASR Authorization Service, then the SIGNER= option is needed and the account that runs a program like the example must have metadata-layer permissions. For more information about the permissions, see SAS Visual Analytics: Administration Guide.
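After several snapshots have been appended, the history can be summarized per machine. The following sketch assumes that the servermem table from the previous program exists in the server and that the same connection options apply. It copies the table to the Work library and reports the peak and average values for two of the memory columns. The choice of columns and statistics is illustrative.

```sas
libname example sasiola host="grid001.example.com" port=10010
    tag=hps signer="https://server.example.com/SASLASRAuthorization";

/* Copy the accumulated snapshots to a local data set */
data work.servermem;
    set example.servermem;
run;

/* Peak and average memory use per machine across all snapshots */
proc means data=work.servermem max mean maxdec=0;
    class Hostname;
    var AllocatedMemory TableAllocatedMemory;
run;
```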

Managing Memory

The following list identifies some of the options that SAS provides for managing memory:
  • You can use the TABLEMEM= option to specify a threshold for physical memory utilization.
  • You can use the EXTERNALMEM= option to specify a threshold for memory utilization for SAS High-Performance Analytics procedures.
By default, whenever the amount of physical memory in use rises above 75% of the total memory available on a node of a distributed server, adding tables (including temporary ones), appending rows, or any other operation that consumes memory for storing data fails.
If the machine has already crossed the threshold, your requests to add data are immediately rejected. If you attempt to add a table and the server crosses the threshold as the data is added, the server removes the table that you attempted to add and frees the memory. Similarly, if you attempt to append rows and the server crosses the threshold during the request, the entire append request fails. The table remains as it was before the append was attempted.
You can specify the threshold when you start a server with the TABLEMEM= option in the PROC LASR statement or alter it for a running server with the SERVERPARM statement in the VASMP procedure. By default, TABLEMEM=75 (%).
Note: The memory that is consumed by tables loaded from HDFS does not count toward the TABLEMEM= limit.
Be aware that the TABLEMEM= option does not specify the percentage of memory that can be filled with tables. Instead, memory consumption is measured across all processes on a machine, so memory that is used by other processes also counts toward the threshold.
A separate memory setting can be applied to processes that extract data from a server on a worker node. SAS High-Performance Analytics procedures can do this. If you set the EXTERNALMEM= option in the PROC LASR statement or through the SERVERPARM statement in the VASMP procedure, then you are specifying the threshold of total memory (expressed as a percentage) at which the server stops sending data to the high-performance analytics procedure.
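The two thresholds described in this section might be set as in the following sketch. The percentages, host name, port, and signature file path are illustrative values only. The HOST= and PORT= options in the SERVERPARM statement are shown as they are commonly used to identify the running server; confirm the option names against the VASMP procedure documentation for your release.

```sas
/* Start a distributed server with explicit memory thresholds */
proc lasr create port=10010 path="/tmp" tablemem=80 externalmem=80;
    performance host="grid001.example.com" nodes=all;
run;

/* Later, lower the table memory threshold on the running server */
proc vasmp;
    serverparm tablemem=70 host="grid001.example.com" port=10010;
quit;
```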