Distributed Server: High-Volume Access to Smaller Tables

Introduction

This topic addresses the specialized situation where all of the following conditions apply:
  • You must support high-volume Read access to smaller tables.
    Note: Smaller is a relative concept. Tables that are less than 2 GB are good candidates. Tables that are between 2 GB and 20 GB might be good candidates, depending on factors such as server capacity, amount of free memory, and number of nodes.
  • High inter-machine network communication (relative to table size) is negatively impacting data retrieval performance.
  • You are willing to move your frequently accessed smaller tables into a dedicated LASR library.
For smaller tables, in-memory access is faster when data is consolidated rather than distributed. For example, if a smaller table serves as the data source for a report, retrieval of that report is faster if the table is available in its entirety on a single machine rather than distributed across multiple machines. For reports that are widely and frequently accessed, the difference in retrieval performance can be worth the effort of managing a separate library for smaller tables.
To optimize retrieval performance for smaller tables, a distributed SAS LASR Analytic Server can keep multiple consolidated (full, non-distributed) copies of each table. Each copy is written to and retrieved from a single machine. Each machine launches its own non-distributed server processes as needed to fulfill load and access requests. Load balancing and reuse of the non-distributed server processes further enhance performance.
For more information, see High Volume Access to Smaller Tables in the SAS LASR Analytic Server: Reference Guide.
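The size guidance in the note above can be expressed as a small screening helper. The following Python sketch is illustrative only and is not part of any SAS interface. The function name, the labels, and the use of binary gigabytes are assumptions. The 2 GB and 20 GB thresholds come from the note. Tables above 20 GB are not addressed by the note, so the sketch only flags them as outside the guidance.

```python
GB = 1024 ** 3  # bytes per gigabyte (binary convention; an assumption)


def classify_table(size_bytes: int) -> str:
    """Screen a table against the size guidance in this topic.

    Hypothetical helper: the labels are illustrative, not SAS terminology.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes < 2 * GB:
        # Less than 2 GB: described as a good candidate.
        return "good candidate"
    if size_bytes <= 20 * GB:
        # 2 GB to 20 GB: depends on server capacity, free memory,
        # and number of nodes.
        return "possible candidate"
    # Larger than 20 GB: not covered by the guidance.
    return "outside guidance"


if __name__ == "__main__":
    for size in (500 * 1024 ** 2, 5 * GB, 50 * GB):
        print(size, classify_table(size))
```

A result of "possible candidate" is only a prompt to review server capacity, free memory, and node count before you place the table in the library.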

Instructions

To optimize high-volume access to smaller tables in a distributed SAS LASR Analytic Server:
  1. Identify or create a LASR library that is exclusively for smaller tables.
    • Give the library a name that helps users recognize that they should never load or import large tables into it.
    • Associate the library with a distributed SAS LASR Analytic Server.
  2. Set the LASR library’s extended attribute VA.TableFullCopies to a positive integer. (You can use either SAS Management Console or SAS Environment Manager to set extended attributes for a LASR library.)
  3. To verify results, load a table to the LASR library. On the LASR Tables tab, verify the table’s status. See Get Table Information.

Extended Attribute

The following library-level extended attribute enables smaller-table optimization and controls the number of in-memory instances per table.
VA.TableFullCopies
specifies how many complete, in-memory, single-node instances are created for each loaded table. By default, no value is specified, so no full copy instances are created. If you have a LASR library that contains only smaller tables and is associated with a distributed server, set the value to a positive integer.
CAUTION:
If you specify a high value or if someone loads a large table to the library, server memory could be rapidly consumed.
Consider initially specifying a value less than 4 (and increasing the value incrementally if needed), setting a tables limit for the associated server, and limiting the Administer permission on the library.
Here are some additional details:
  • Autoload supports this attribute.
  • You cannot append data to tables that are loaded with additional full copies.
  • LASR star schemas, imports from Twitter, and imports from Facebook ignore this attribute.
  • Non-distributed SAS LASR Analytic Servers ignore this attribute.
  • In general, it is not beneficial to use compression for tables that are loaded with additional full copies.
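To see why a high value can consume memory quickly, it can help to estimate the footprint before you set the attribute. The following Python sketch is a rough estimate only. It assumes that each full copy occupies about the same memory as the table itself, that the table is also loaded once in the usual distributed manner, and that per-process overhead can be ignored. The function names and the memory budget parameter are hypothetical.

```python
def estimated_memory_gb(table_gb: float, full_copies: int,
                        include_distributed: bool = True) -> float:
    """Rough in-memory footprint of one table, in GB.

    Assumption: every instance (each full copy, plus the distributed
    instance if counted) costs about table_gb; overhead is ignored.
    """
    if table_gb < 0 or full_copies < 0:
        raise ValueError("table_gb and full_copies must be non-negative")
    instances = full_copies + (1 if include_distributed else 0)
    return table_gb * instances


def max_full_copies(table_gb: float, budget_gb: float) -> int:
    """Largest VA.TableFullCopies value whose estimate fits the budget.

    Hypothetical helper: reserves one instance for the distributed copy.
    """
    if table_gb <= 0:
        raise ValueError("table_gb must be positive")
    return max(0, int(budget_gb // table_gb) - 1)


if __name__ == "__main__":
    print(estimated_memory_gb(1.5, 3))   # 1.5 GB table, 3 full copies
    print(max_full_copies(1.5, 8.0))     # copies that fit in an 8 GB budget
```

Under these assumptions, a 1.5 GB table with a value of 3 occupies about 6 GB across the server, which is one reason to start with a value less than 4 and increase it incrementally.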

Example

Scenario

  • LibraryA is a LASR library that contains only smaller tables.
  • LibraryA is associated with ServerA, a distributed SAS LASR Analytic Server.
  • LibraryA’s Extended Attributes tab specifies a value of 3 for VA.TableFullCopies.

Results

  • When TableA is loaded to LibraryA, three of the nodes on ServerA each get a full copy of TableA.
  • When access to TableA is requested, one of those three nodes provides its full copy of TableA.
  • TableA is also loaded in the usual distributed manner. However, no access requests are fulfilled from the distributed instance of TableA.
  • You cannot append to TableA.
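The results above can be illustrated with a toy model. The following Python sketch does not reflect how SAS LASR Analytic Server is implemented. The class name, the node names, the choice of the first three nodes as copy holders, and the round-robin rotation are all assumptions made for illustration. The source states only that three nodes get a full copy, that one of those nodes serves each request, and that appends are not allowed.

```python
from itertools import cycle


class ToyFullCopyLibrary:
    """Toy model of a LASR library with VA.TableFullCopies set (hypothetical)."""

    def __init__(self, nodes, full_copies):
        if full_copies < 1 or full_copies > len(nodes):
            # Assumption of this toy model, not documented server behavior.
            raise ValueError("full_copies must be between 1 and the node count")
        self.nodes = list(nodes)
        self.full_copies = full_copies
        self.placement = {}   # table name -> nodes holding a full copy
        self._rotation = {}   # table name -> round-robin iterator over holders

    def load(self, table):
        # Assumption: the first N nodes receive the full copies.
        holders = self.nodes[:self.full_copies]
        self.placement[table] = holders
        self._rotation[table] = cycle(holders)
        return holders

    def access(self, table):
        # Each request is served by one node that holds a full copy.
        # Round-robin stands in for the server's load balancing.
        return next(self._rotation[table])

    def append(self, table, rows):
        # Tables loaded with additional full copies cannot be appended to.
        raise RuntimeError(f"cannot append to {table}: loaded with full copies")


if __name__ == "__main__":
    library_a = ToyFullCopyLibrary(
        ["node1", "node2", "node3", "node4", "node5"], full_copies=3)
    print(library_a.load("TableA"))
    print([library_a.access("TableA") for _ in range(4)])
```

In this model, node4 and node5 never serve a request for TableA, which mirrors the statement that no access requests are fulfilled from the distributed instance.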
Last updated: December 18, 2018