How Are Tables Used?

Input Tables

The fact table and dimension tables are used in the star schema after you drag and drop them onto the workspace. By default, the data builder considers the first table that you drop in the workspace as the fact table. Click Edit star schema tables from the toolbar to select a different fact table.
Because the star schema is formed in memory by the server, the first step is to make sure that the tables are loaded to memory. Be aware that the tables are transferred each time the star schema runs and this can reduce performance. Conversely, if the tables are already loaded to memory on the server, the star schema runs and generates the output faster.
When creating a star schema using tables from the same HDFS library, the star schema will fail to run if one or more of the tables in the star schema is encrypted, but the HDFS library is not encrypted. In this case, you will receive an error message indicating that a password is needed for the encrypted tables. This can occur when your administrator turns encryption off for a previously encrypted library. Any table that was in the library beforehand will remain encrypted. Your administrator can resolve the issue by turning encryption back on for the library. Contact your administrator for more information.

Column Prefixes

The first 15 characters of a dimension table’s name and the underscore character are initially set as a prefix for the column names from the dimension table. Column names for the output table are a combination of the prefix and the original column name.
However, you can specify a different value for the prefix after you select the table name from the menu at the top of the Inputs tab.
Column names from the fact table are not modified with a prefix.
Tip
A column name is limited to 32 characters. If you have a long column name, then reducing the prefix can help you keep more of the original column name.

Output Table and Conserving Memory

To use memory efficiently, the default output for the schema is a view. The Create output as a view check box on the Output tab is selected by default to create a view. If you clear the check box, then the output is a table.
When the output is a view, the rows are created from the original tables when the view is accessed. This is the main advantage of a view because it does not create the entire output table and hold it all in memory.
If you clear the Create output as a view check box to create a table from the star schema, then the system must have enough available memory to store the table. If the system runs out of memory while running the star schema, then the memory that was used for the output table is freed and you receive an error message.
If you use SAS LASR Analytic Server tables for input to the schema, then the output table for the star schema must use the same library.
When using tables from an HDFS library to create a star schema, the star schema will fail to save if the HDFS library path and the tag of the LASR output library do not match. In this case, you will receive an error message stating that the library path does not match the SAS LASR Analytic Server tag. Try changing the output table to a LASR library whose tag matches the HDFS library path. You can view the tag for the LASR library in SAS Management Console. For more information, contact your administrator.

Performance Considerations

Creating a LASR star schema as a view can be more convenient than creating a table, but accessing data through the view can impact performance negatively. The scale of the impact depends on the size of the dimension tables. However, the number of passes through the data is even more important. For example, requesting percentiles or box plots or fitting statistical models requires passing through the data multiple times. These requests are impacted more than a request for summary statistics.
Keep the following considerations in mind:
  • The initial creation of a view is faster than forming a table. However, accessing the data in a view is slower than accessing the data from a table.
  • Creating a table requires more physical memory than creating a view. If your system has sufficient memory capacity, then creating a table provides the best performance for accessing the data.