Planning Your Data Stores and Processes |
For each data warehousing project, you will create a hierarchy of groups and data stores in the SAS/Warehouse Administrator Explorer, such as the one shown in Warehouse Environment in the Explorer.
Warehouse Environment in the Explorer
A group is an element in the Explorer or the Process Editor that is used to organize other elements. For example, in Warehouse Environment in the Explorer, Toy Store Env is the top-level group, and Toy Store Whouse and Sales Source Data are subgroups.
A data store is a table, a view, or a file that is registered in a Warehouse Environment or in one of its Data Warehouses. For example, the Customer item in Warehouse Environment in the Explorer is a data store that contains source data about toy customers.
The Explorer enforces a certain hierarchy of groups and data stores. You can only add a Data Warehouse object to a Warehouse Environment object, for example. Hierarchy of Groups and Data Stores in the Explorer illustrates the hierarchy of objects in the Explorer.
Hierarchy of Groups and Data Stores in the Explorer
The sections that follow provide details about each kind of group and data store.
Data Warehouse Environments |
Each data warehousing project requires at least one Data Warehouse Environment. A Data Warehouse Environment is a metadata record that specifies the SAS library _MASTER. The _MASTER library is the metadata repository for host definitions and other global metadata that is shared among one or more Data Warehouses and ODD Groups.
Environments are added and opened from the SAS/Warehouse Administrator desktop, as described in Opening a Warehouse Environment in the Explorer. New Warehouse Environment in the Explorer shows a new Environment (Toy Store Env) that has been opened in the SAS/Warehouse Administrator Explorer.
New Warehouse Environment in the Explorer
Warehouse Environment in the Explorer shows an Environment in which a number of subgroups and data stores have been defined. For details, see Maintaining Environments.
After you open a Warehouse Environment in the Explorer, you can define metadata records that can be shared by data stores and other elements within that Environment. For example, if a number of data stores in an Environment will be stored on the same host, you could create a definition for that host and save it as part of the global metadata for the Environment. You could then include the host definition in the metadata for the data stores that reside on that host, without retyping the host information each time.
In each Warehouse Environment, you will define at least some of the global metadata types below:
SAS library definitions (including SAS/ACCESS LIBNAME definitions and Jobs Information library definitions)
As you create data stores and other SAS/Warehouse Administrator elements, you must specify hosts, libraries, and other global metadata. If you create the main global metadata items first, you can simply select them from a list, rather than having to stop and create a host definition in the middle of creating a data store. For details, see Maintaining Global Metadata.
Operational Data Definitions (ODDs) |
After you have created an Environment, you can register the source data for that Environment. To do that, you must define at least one Operational Data Definition Group (ODD Group) and a number of Operational Data Definitions (ODDs).
An ODD Group is a simple grouping element for ODDs. It can also contain one or more Information Marts, another kind of SAS/Warehouse Administrator group. In the SAS/Warehouse Administrator Explorer, an ODD Group can only be added to a Warehouse Environment. ODD Group with ODDs in the Explorer shows one ODD Group (Sales Source Data) that contains a number of ODDs (Customer, Drop, and so on).
ODD Group with ODDs in the Explorer
An Operational Data Definition (ODD) is a metadata record that provides access to data sources. The ODDs, in turn, are used as inputs to data stores in a Warehouse Environment.
At a minimum, in order for a data source to be visible in a Warehouse Environment, you must specify the location of that data source in an ODD. You can define an ODD that simply registers the location of a SAS table or view, or that registers the location of a DBMS table with the help of a SAS/ACCESS LIBNAME definition. You can also define an ODD that extracts information from a data source, saves the results to a SAS table or view, and then specifies the location of the extraction table or view.
In the SAS/Warehouse Administrator Explorer, an ODD can be added only to an ODD Group. For details about ODDs, see Registering Data Sources.
Data Files and External Files are inputs to ODDs. They cannot be added in the Explorer; they can only be added in the Process Editor. Data File and External File in a Process Flow shows a Process Flow that includes a Data File and an External File.
Data File and External File in a Process Flow
A Data File is a metadata record that specifies a SAS file that is an input to an ODD. You can define a Data File that simply registers the location of a SAS table or view, or one that registers the location of a DBMS table with the help of a SAS/ACCESS LIBNAME definition. You can also define a Data File that extracts information from a data source, saves the results to a SAS table or view, and then specifies the location of the extraction table or view.
An External File is an input to an ODD that extracts information from one or more sources that are not in SAS format. That is, an External File is an input to an ODD whose Load Step is a DATA step view.
If you are defining an ODD whose Load Step is a DATA step view or an SQL view (but not a Pass-Through view), you must define its inputs in the Process Editor. Even if your ODD does not meet the previously discussed conditions, you might want to specify a Process Flow for this Job for documentation purposes.
For details about these objects, see Registering Data Sources.
Data Warehouses and Subjects |
To support your data warehousing project, you will create at least one Data Warehouse and one or more Subjects within each Warehouse.
A Data Warehouse is a metadata record that specifies the SAS library _DWMD. The _DWMD library is the metadata repository for most of the groups and data stores in a data warehouse. In the Explorer, a Data Warehouse object can only be added to a Data Warehouse Environment. Data Warehouse and Subject in the Explorer shows one Data Warehouse (Toy Store Whouse) that contains one Subject (Toy Sales).
Data Warehouse and Subject in the Explorer
A Subject is a grouping element for data related to one topic within a Data Warehouse. For example, a Data Warehouse might have a Subject called Products (information related to products) or Sales (information related to sales). Each Subject can be composed of a number of different data collections: detail data, summary data, charts, reports, and graphs. In the Explorer, a Subject can only be added to a Data Warehouse.
For details about Data Warehouses and Subjects, see Maintaining Data Warehouses and Subjects.
A data mart is a limited data warehouse that is often designed to meet the needs of a particular department or individual. A data mart is more limited in scope than a data warehouse, which typically contains information used by more than one department. To implement a data mart in SAS/Warehouse Administrator, use an appropriate SAS/Warehouse Administrator object: either a Data Warehouse or a Data Group.
Detail Data Stores |
Detail data is information that is at or near the fact level in a database. It is data that has not been summarized or has only been lightly summarized after extraction from a source. In a data warehousing project, detail data stores are often used as inputs to summary data stores. They can also be exploited directly -- data mining operations are typically run against the detail data in a warehouse, for example.
In SAS/Warehouse Administrator, detail data can be stored in Data Tables, Detail Logical Tables, or Detail Tables. Typically, you will create these objects whenever the source data specified in an ODD needs to be transformed or merged in order to provide useful detail data for your project.
A Data Table is a metadata record that specifies a SAS table or view or a DBMS table or view that can serve multiple purposes. Data Tables are frequently used to define intermediate data stores, such as look-up tables included as part of a join. They can be used to define detail data stores, summary data stores (if you write your own summary code and register it as the Load Step for the Data Table), or tables that hold information that does not fit anywhere else. In the SAS/Warehouse Administrator Explorer, a Data Table can only be added to a Data Group. For more information about Data Tables, see Maintaining Data Tables.
A Detail Logical Table is a metadata record that specifies a SAS table or view that can serve multiple purposes. A Detail Logical Table is often used to implement a view on multiple, related Detail Tables. Detail Logical Table with Detail Tables in the Explorer shows a Detail Logical Table (Sales Detail Grp) with Detail Tables (Customer, Drop, and so on).
Detail Logical Table with Detail Tables in the Explorer
In the SAS/Warehouse Administrator Explorer, a Detail Logical Table can only be added to a Subject. A Subject can have only one Detail Logical Table. A Detail Logical Table can contain any number of Detail Tables. Detail Logical Tables in different Subjects can share (link to) the same Detail Table.
A Detail Table is a metadata record that specifies a SAS table or view or a DBMS table or view that serves as a detail data store. In the SAS/Warehouse Administrator Explorer, a Detail Table can be added only to a Detail Logical Table. For more information, see Maintaining Detail Logical Tables and Detail Tables.
Typically, you will create Data Tables, Detail Logical Tables, or Detail Tables whenever the source data specified in an ODD needs to be transformed or merged in order to provide useful detail data for your project.
A Data Table is a good choice when:
You want to create multiple groups of detail data stores in the same Subject. (A Subject can only have one Detail Logical Table.)
You need an intermediate data store, such as a look-up table included as part of a join in the Process Editor. For details, see Example: Creating a Data Table.
A Detail Logical Table and its Detail Tables are a good choice when you want to implement a view on multiple, related Detail Tables. For more information, see Example: Creating a Detail Logical Table as a View to Multiple Detail Tables.
Summary Data Stores |
Summary data is information that is derived from the facts in a database. It is data that has been summarized after extraction from a source. Many data warehousing projects require a number of summary data stores in order to support end-user reports, queries, and analysis.
In SAS/Warehouse Administrator, most summary data is stored in OLAP Tables and OLAP MDDBs. It is possible to store summarized data in Data Tables, but you must specify user-written summarization code in the Load Step for the Data Table.
Note: SAS/Warehouse Administrator 2.0 replaces Summary Groups, Summary Tables, and Summary MDDBs with OLAP Groups, OLAP Tables, and OLAP MDDBs. That is, in Release 2.0, you cannot create Summary Groups. Instead, you can create OLAP Groups. The new summary data stores better support OLAP (online analytical processing) and facilitate the different requirements of OLAP reporting for HOLAP (hybrid OLAP), MOLAP (multidimensional OLAP), and ROLAP (relational OLAP).
An OLAP Group (Online Analytical Processing Group) organizes related summary data, which is stored in OLAP Tables or OLAP MDDBs. The OLAP Group properties specify the logical structure of the summarized data and how they relate to the detail data in a data warehouse. OLAP Groups have a type attribute, which you specify as ROLAP (Relational OLAP), MOLAP (Multidimensional OLAP), HOLAP (Hybrid OLAP), or MIXED.
In the SAS/Warehouse Administrator Explorer, an OLAP Group can be added only to a Subject. OLAP Group in the Explorer shows an OLAP Group (HOLAP Group 2) that contains an OLAP MDDB (OG2 Sum10) and a OLAP Table (OG2 Sum 12).
An OLAP MDDB is a metadata record that specifies a SAS MDDB. A SAS MDDB is not a SAS table. It is a specialized storage format that stores derived summary data in a multidimensional form, which is a highly indexed and compressed format. To load an OLAP MDDB, SAS/Warehouse Administrator generates code for the MDDB procedure, which summarizes data similar to the SUMMARY procedure.
An OLAP Table is a metadata record that specifies a file to store derived summary data. This file can be a SAS table or view or a DBMS table or view. An OLAP Table can have multiple crossings. To load an OLAP Table, SAS/Warehouse Administrator generates code for the SUMMARY procedure, which summarizes data by computing descriptive statistics for columns across rows or within groups of rows.
For details, see Maintaining Detail Logical Tables and Detail Tables.
A Data Table can be used as a simple summary data store if you write your own summarization code. Otherwise, the OLAP objects might be more useful. More specifically, use an OLAP Group of type
ROLAP to support online analytical processing performed on a relational database, such as a SAS table or an ORACLE table
MOLAP to support online analytical processing performed on a SAS multidimensional database (MDDB)
HOLAP to support online analytical processing performed on a integrated data store consisting of one or more SAS MDDBs and one or more relational databases
MIXED to support online analytical processing performed on individual (unrelated) OLAP Tables and OLAP MDDBs.
Information Marts (optional) |
An InfoMart is a simple grouping element for InfoMart Items and InfoMart Files, as shown in Information Mart in the Explorer.
Information Mart in the Explorer
Unlike most objects in SAS/Warehouse Administrator, InfoMart Items and Files are used to display information rather than store it. For example, an InfoMart Item might be used to display a chart that summarizes sales information from a warehouse data store. An InfoMart File might be used to open a spreadsheet that contains information useful to the person managing a given Warehouse Environment. In the SAS/Warehouse Administrator Explorer, an InfoMart can be added only to a Subject, a Data Group, or an ODD Group.
For details, see Maintaining Information Marts.
Copyright © 2012 by SAS Institute Inc., Cary, NC, USA. All rights reserved.