SYSTEM 2000 Databases

Overview of Database Definition

A SYSTEM 2000 database is hierarchical because you can store and access data according to organized relationships in groups of associated data. When a SYSTEM 2000 database is created, a plan called a database definition is devised, in which the following happens:
  • the database has an assigned name
  • the data to be stored is labeled
  • the data is arranged into groups
  • relationships are established among the groups of data
Usually, a database is organized according to the types of data and the way you want to use the data. To create descriptor files for the SAS/ACCESS interface, you must understand and be familiar with the contents of the database and its organization in order to retrieve and update information accurately and efficiently. The output below shows an excerpt of the database definition for Employee, which is the output from a DESCRIBE statement in SYSTEM 2000. (For a complete listing of the database definition for Employee, see Example Programs.)
Database Definition for the Database Employee
  SYSTEM RELEASE NUMBER  12.1
  DATA BASE NAME IS      EMPLOYEE
  DEFINITION NUMBER             2
  DATA BASE CYCLE NUMBER       25
       1*  EMPLOYEE NUMBER (INTEGER NUMBER 9999)
       2*  LAST NAME (CHAR X(10) WITH  FEW FUTURE OCCURRENCES )
       3*  FORENAME (NON-KEY CHAR X(20))
               .
               .
               .
      16*  ZIP CODE (CHAR X(5) WITH  FEW FUTURE OCCURRENCES )
     100*  POSITION WITHIN COMPANY (RECORD)
       101*  POSITION TITLE (NON-KEY CHAR X(10) IN 100)
       102*  DEPARTMENT (CHAR X(14) IN 100 WITH SOME FUTURE OCCURRENCES )
       103*  MANAGER (CHAR XXX IN 100 WITH  FEW FUTURE OCCURRENCES )
       104*  POSITION TYPE (CHAR X(12) IN 100 WITH SOME FUTURE OCCURRENCES )
       105*  START DATE (DATE IN 100)
       106*  END DATE (NON-KEY DATE IN 100)
       110*  SALARY WITHIN POSITION (RECORD IN 100)
         111*  PAY RATE (MONEY $9999.99 IN 110)
         112*  PAY SCHEDULE (CHAR X(7) IN 110)
         113*  EFFECTIVE DATE (DATE IN 110)
         114*  CURRENT DEDUCTION (NON-KEY MONEY $9999.99 IN 110)
                .
                .
                .
     400*  EDUCATIONAL BACKGROUND (RECORD)
       410*  EDUCATION (RECORD IN 400)
         411*  SCHOOL (CHAR X(15) IN 410)
         412*  DEGREE/CERTIFICATE (CHAR X(7) IN 410 WITH  FEW FUTURE OCCUR
               RENCES )
         413*  DATE COMPLETED (DATE IN 410)
         414*  MAJOR FIELD (NON-KEY CHAR X(16) IN 410)
         415*  MINOR FIELD (NON-KEY CHAR X(12) IN 410)

Database Names

The database name is a unique name, from 1 to 16 characters in length, that is assigned to a specific SYSTEM 2000 database definition. Each database also has one or more passwords associated with it.
To create descriptor files for the SAS/ACCESS interface, you must know the name and password for the SYSTEM 2000 database that you want to access.

Labeling Data

A database definition consists of schema items and schema records, which describe a blueprint for the type of data to be stored. For example, in the database definition for Employee, EMPLOYEE NUMBER is a schema item and POSITION WITHIN COMPANY is a schema record.
A schema item names and defines the characteristics of a group of values. A schema item has a name, a type, and a picture (length). Each value stored in a SYSTEM 2000 database corresponds to a schema item. For example, the following schema item describes the numbers used as employee identification numbers. The four 9s indicate that each employee number can contain a maximum of four digits.
    1* EMPLOYEE NUMBER (INTEGER NUMBER 9999)
A schema record groups associated schema items. Schema records are explained in Grouping Schema Items.
Schema items and schema records are referred to as schema components, and each is identified by a component number and a component name, as shown in the following figure. (A component number can also be referred to as a C-number, such as C101.)
Schema Components
Schema Components
To access data stored in a SYSTEM 2000 database, you must specify either the component number or the component name. Both are unique in the database definition to avoid ambiguity. Each line in a database definition begins with the component number and the component name.
When you create descriptor files for SYSTEM 2000 databases, PROC ACCESS creates corresponding SAS variable names from the SYSTEM 2000 schema item names. You can then use the variable names in SAS procedures.

Grouping Schema Items

In a SYSTEM 2000 database, associated schema items are grouped by schema records. That is, different schema records store different groups of data, and a schema item belongs to only one schema record. Grouping associated schema items into schema records is similar to planning a form. A form is usually divided into sections with one section for each set of related data.
For example, look at the database definition for Employee in Database Definition for the Database Employee. Schema items C1 through C16 contain personal information about each employee. These items are grouped into one record, the ENTRY or C0 record. (The component number and name for the ENTRY record is not listed in a database definition unless it has been renamed.) The schema items C101 through C106, which contain information about an employee's position, are grouped in schema record C100, POSITION WITHIN COMPANY.

Grouping Schema Records

Schema record relationships are established by arranging the schema records into levels. Each schema record is placed at a specific level, which creates a hierarchical structure.
These levels are achieved by ranking schema items with values that occur only one time per employee (such as an employee's name and address) at a higher level than schema items with multiple values (such as an employee's job titles and salaries). That is, schema items having a one-to-many relationship with other schema items rank higher in the database hierarchy than the other schema items.
Look at the database definition for Employee shown in Levels in a Database Definition. Notice that schema items are indented under their parent schema record, and schema records are indented farther to the right. This reflects the relationships among the records:
  • Schema items C1 through C16 store values that occur only one time per employee and are grouped as the top level of the database in the ENTRY record or at level 0.
  • The ENTRY record (C1 to C16) has a one-to-many relationship with the POSITION WITHIN COMPANY record (C100) because each employee can have more than one position during their employment, so position title, department, and so on, can have multiple values. Because positions are associated with specific employees, the POSITION WITHIN COMPANY record is related to the ENTRY record. POSITION WITHIN COMPANY is below level 0. It is at level 1.
  • Positions have a one-to-many relationship with salary data because an employee can have more than one salary in a single position. Salary information is grouped in a record named SALARY WITHIN POSITION (C110), which is related to the POSITION WITHIN COMPANY record. SALARY WITHIN POSITION is below level 1. It is at level 2.
Levels in a Database Definition
Levels in a Database Definition
The next set of terms refers to the relationships between the levels, which are like relationships in a family.
  • A parent is the record immediately preceding a specified record. Each record can have only one parent, and no record is an orphan, except for the ENTRY record at level 0.
  • An ancestor is a record that exists on the level that precedes a specified record in the same path. The ENTRY record is an ancestor of all other records in the database.
  • A descendant is a record that exists at a lower level than a specified record in a family. C100 is a descendant of the ENTRY record; C110 is a descendant of the ENTRY record and C100.
  • Children are the records immediately following a specified record. C100 is a child of the ENTRY record; C110 is a child of C100.
  • A family consists of a record, all its ancestors, and all its descendants.
  • The path of a record is a record and all its ancestors. C110, C100, and the ENTRY record make up a path; C410, C400, and the ENTRY record make up another path.
Schema records are disjoint if their paths are different. When you create a view descriptor, you cannot include items that are from disjoint schema records. For example, items from C110 and items from C410 cannot be included in the same view descriptor.

Logical Entries

A database consists of groups of logically related data called logical entries. The database definition serves as a pattern to create logical entries for the database and to interpret them. A logical entry contains groups of related data called data records. A data record is an identifiable set of values that are treated as a unit and associated with a schema record.
For example, in the database Employee, logical entries contain data about employees; all data records that pertain to one employee make up a single logical entry. Each logical entry has a data record for personal data (such as the employee's name, address, and birthday), and a data record that pertains to the position that the employee holds in the company (such as title, department, manager, and pertinent dates). If the employee held several positions in the company, there is a data record for each position.
Using the layout of the database definition, the following figure shows the schema items with values for one employee. David Reid held two positions: programmer and assistant programmer. In addition, he has three salary data records for his programmer position.
Logical Entry
Logical Entry
When you use a view descriptor in a SAS program to access a SYSTEM 2000 database, you must be familiar with the database structure in order to understand how the interface view engine maps a SYSTEM 2000 logical entry into multiple SAS observations and back again. This process is explained in the next section.

Mapping Data between SAS and SYSTEM 2000

When you create a view descriptor to access data stored in a SYSTEM 2000 database, you define one path in the database. For example, using the database Employee, you can define a view descriptor that includes the items LAST NAME, FORENAME, POSITION TITLE, and PAY RATE. When you access the data using the view descriptor, the interface view engine maps the specified path for each logical entry into multiple observations. The following output shows the logical entry for David Reid mapped into SAS observations.
Logical Entry Mapped into SAS Observations
  
   LASTNAME   FORENAME   POSITION               PAYRATE

   REID       DAVID G.   ASSISTANT PROGRAMMER   $1,000.00
   REID       DAVID G.   PROGRAMMER             $1,100.00
   REID       DAVID G.   PROGRAMMER             $1,200.00
   REID       DAVID G.   PROGRAMMER             $1,300.00
When browsing SYSTEM 2000 data, such as with the FSVIEW procedure, the results would be similar to those shown in the previous output. (See Browsing and Updating SYSTEM 2000 Data for more information.)

Null Data (Missing Values)

A logical entry does not need data at every level of the database definition. A logical entry can contain nulls, that is, missing values or empty records.
  • A null item is a schema item that has no value in the data record. For example, in the logical entry shown in Logical Entry, because David Reid still works for the company, he does not have a value for the schema item END DATE for his programmer position. Therefore, that item is a null item.
  • A null record is a data record that consists entirely of null items. A null record can occur when there is data for a given data record but no data for its parent record. For example, position information exists but an employee has not been hired yet; there is data at level 1 but the ENTRY record is a null record. Another example is when salary information exists, but position information has not been entered; there is data at level 0 and level 2, but a null record exists at level 1. In both examples, the null record must be present in the database because a parent record must exist for all records except the ENTRY record.
  • A control node is a schema record that contains no schema items. A control node serves as a control record for descendant records. Look at the database definition for Employee in Example Programs, and you see that schema record 400 is a control node.
Note: In SAS, nulls are referred to as missing values. SYSTEM 2000 and SAS handle nulls (missing values) differently. However, the interface view engine takes care of the differences in a predictable, useful way. See Missing Values (Nulls) for a discussion of the differences.