space
Previous Page | Next Page

Creating and Loading SYSTEM 2000 Databases

DBLOAD Procedure in SAS and SYSTEM 2000


Using the DBLOAD Procedure

The DBLOAD procedure runs in batch and interactive line mode and enables you to create and load a SYSTEM 2000 database from a SAS data set. You can create the database definition only and execute one or more incremental loads at a later time.

PROC DBLOAD constructs SYSTEM 2000 statements to create the database definition. You can load new logical entries into the database, or you can insert new records into existing logical entries. The data can come from a SAS data file, from a view created by using the SQL procedure, or from a SYSTEM 2000 database or another DBMS (using a view created by using the ACCESS procedure).

PROC DBLOAD associates each SAS variable in the input data with a SYSTEM 2000 item and assigns a default name, number, and type to each item. By default, items are non-key at level 0. You can use the defaults or change the names, the status (key or non-key), and the level, as necessary. When you are finished customizing the items, PROC DBLOAD creates the SYSTEM 2000 database.


Compatibility with SAS 6

You can use SAS 6 and later SAS data files to create and load SYSTEM 2000 databases by using PROC DBLOAD. However, beginning with SAS 7 data files, variables with names that are longer than eight characters will have their names truncated to eight characters in the access and view descriptors created by PROC DBLOAD. The RENAME statement in PROC DBLOAD can be used to rename the variables in the SYSTEM 2000 database, but it will not change the variable names in the access and view descriptors. The truncated names must be used to access the data described by the view descriptors.


Creating a SYSTEM 2000 Database

In this section, PROC DBLOAD is used to create the database BANKING and load data into it. In this new database, each logical entry contains data for one customer.

The ENTRY schema record at level 0 contains two items: the customer name and customer ID. The schema record at level 1 contains information about the customer's checking and savings accounts. The schema record at level 2 contains information about the transactions for each account.

After you create the database BANKING, use the QUEST procedure to execute the DESCRIBE statement in SYSTEM 2000 to produce the following output, which shows the database definition.

BANKING Database Definition

  SYSTEM RELEASE NUMBER  12.1
  DATA BASE NAME IS      BANKING
  DEFINITION NUMBER             1
  DATA BASE CYCLE NUMBER       18
       1*  CUSTNAME (CHAR X(20))
       2*  CUSTID (CHAR X(7))
     100*  RECORD_LEVEL_1 (RECORD)
       101*  ACCOUNT NUMBER (INTEGER NUMBER 9999 IN 100)
       102*  ACCOUNT TYPE (CHAR X IN 100)
       200*  RECORD_LEVEL_2 (RECORD IN 100)
         201*  TRANS TYPE (CHAR X IN 200)
         202*  TRANS AMOUNT (NON-KEY MONEY $9(7).99 IN 200)
         203*  TRANS DATE (DATE IN 200)
Notice that the records are indented at three different levels to reflect the record relationships. That is, record C200 is a descendant of record C100, which is a descendant of the level 0 record.

Loading the Input Data File

The input SAS data file that is used for the examples given here is shown in the section Data File TRANS.BANKING. If you want to run the examples, make sure that you sort the observations before you use PROC DBLOAD. Sorting the observations groups them by accounts for each customer, which produces data in the sequence required for loading the three-level database BANKING.

Each observation in the input data file represents one transaction. For example, John Booker has four transactions, two for each of his accounts. His name and account numbers are repeated in each observation as shown in the following output.

The SAS Data File TRANS.BANKING

OBS  CUSTNAME       CUSTID   ACCTNUM  ACCTTYP  TRANSTYP  TRANSAMT  TRANSDAT

  1  BOOKER, JOHN   74-9838   8349       S         D       $40.00  05JUN89
  2  LOPEZ, PAT     38-7274   9896       S         D       $15.67  23JUN89
  3  JONES, APRIL   85-4941   4141       C         W      $213.78  29JUN89
  4  BOOKER, JOHN   74-9838   8349       S         I       $34.76  30JUN89
  5  MILLER, NANCY  07-6163   7890       S         I       $53.98  30JUN89
  6  LOPEZ, PAT     38-7274   9896       S         I       $16.43  30JUN89
  7  JONES, APRIL   85-4941   4141       C         W      $354.70  30JUN89
  8  MILLER, NANCY  07-6163   7890       S         D    $1,245.87  01JUL89
  9  JONES, APRIL   85-4941   4141       C         D    $2,298.65  01JUL89
 10  MILLER, NANCY  07-6163   3876       C         W       $45.98  08JUL89
 11  ROGERS, MIKE   96-5052   4576       C         D       $75.00  10JUL89
 12  BOOKER, JOHN   74-9838   3673       C         D      $150.00  10JUL89
 13  LOPEZ, PAT     38-7274   9896       S         D       $50.00  10JUL89
 14  BOOKER, JOHN   74-9838   3673       C         W       $65.43  13JUL89
 15  ROGERS, MIKE   96-5052   4576       C         W       $12.34  13JUL89
 16  ROGERS, MIKE   96-5052   4576       C         W       $45.67  13JUL89
 17  MILLER, NANCY  07-6163   3876       C         D       $56.79  14JUL89
 18  ROGERS, MIKE   96-5052   4576       C         W       $12.16  15JUL89

After you sort the input data file by customer name and account type, PROC DBLOAD loads data for each customer as a logical entry in the SYSTEM 2000 database. Redundant data is reduced, which saves storage space. The logical entry for John Booker would look like the following figure.

Sample Logical Entry in BANKING Database

[Sample Logical Entry in BANKING Database]

After you load the input data from TRANS.BANKING, run the following SYSTEM 2000 LIST statement in PROC QUEST.

    list c1, c101, c102, c201, c202;

Here are the results.

Output from LIST Statement Run on the Database BANKING

* CUSTNAME        ACCOUNT NUMBER   ACCOUNT TYPE   TRANS TYPE   TRANS AMOUNT
***
* BOOKER, JOHN              3673   C              D                 $150.00
*                                                 W                  $65.43
*                           8349   S              D                  $40.00
*                                                 I                  $34.76
* JONES, APRIL              4141   C              W                 $213.78
*                                                 W                 $354.70
*                                                 D               $2,298.65
* LOPEZ, PAT                9896   S              D                  $15.67
*                                                 I                  $16.43
*                                                 D                  $50.00
* MILLER, NANCY             3876   C              W                  $45.98
*                                                 D                  $56.79
*                           7890   S              I                  $53.98
*                                                 D               $1,245.87
* ROGERS, MIKE              4576   C              D                  $75.00
*                                                 W                  $12.34
*                                                 W                  $45.67
*                                                 W                  $12.16

Notice the values shown for John Booker. His name appears only one time, but he has two account numbers and four transactions. Because the examples that use PROC DBLOAD rank the data values into levels, you have a clear, logical view of the data.


Subsetting Input Data

To subset your input data, use the WHERE statement in SAS. Creating a subset of the input data is useful if you need to transfer only a portion of your SAS data to a SYSTEM 2000 database. For example, you might want to include only observations in which the value in a variable is greater than a specified number.

The following program subsets the input data to include only those observations in which the SAS variable ACCTNUM has a value greater than 4141. None of the items are renamed or indexed, and they are all at level 0.

Notice that you use the SAS variable name in the WHERE statement, not the SYSTEM 2000 item name. For information about the syntax of the WHERE statement, see SAS Language Reference: Dictionary.

  proc dbload dbms=s2k data=trans.banking;
    s2kpw=mine;
    dbn=banking;
    s2kmode=m;
    where acctnum > 4141;
  load;
  run;


Subsetting Input Data

To subset your input data, use the WHERE statement in SAS. Creating a subset of the input data is useful if you need to transfer only a portion of your SAS data to a SYSTEM 2000 database. For example, you might want to include only observations in which the value in a variable is greater than a specified number.

The following program subsets the input data to include only those observations in which the SAS variable ACCTNUM has a value greater than 4141. None of the items are renamed or indexed, and they are all at level 0.

Notice that you use the SAS variable name in the WHERE statement, not the SYSTEM 2000 item name. For information about the syntax of the WHERE statement, see SAS Language Reference: Dictionary.

  proc dbload dbms=s2k data=trans.banking;
    s2kpw=mine;
    dbn=banking;
    s2kmode=m;
    where acctnum > 4141;
  load;
  run;


Loading a SYSTEM 2000 Database

To create and load a SYSTEM 2000 database use PROC DBLOAD with options and statements that describe the SYSTEM 2000 database that you want to create and the data that you want to load into the database. To load the database BANKING, use the following program. The function of each statement is explained in the section that follows.

JCL statements;

proc dbload dbms=s2k data=trans.banking;
   s2kpw=mine;
   dbn=banking;
   accdesc=mylib.bank;
   viewdesc=vlib.myview;
   s2kmode=m;
   rename acctnum='account number' 4= 'account type'        
         5='trans type' 6='trans amount' 
         7='trans date';
   index 1=y 2=y 3=y 4=y transtyp=y 7=y;
   level 3=1 4=1 5=2 6=2 transdat=2;
   list all; 
 load;
 run;

JCL statements;

submit your statements for execution under SAS.

proc dbload dbms=s2k data=trans.banking;

invokes the DBLOAD procedure. The DBMS= option specifies the DBMS that you want to load data into. The DATA= option specifies the SAS data file that contains the data.

s2kpw=mine;

issues the password that will become the master password for the new database.

dbn=banking;

identifies the database that you want to create. In this example, you create a SYSTEM 2000 database named BANKING.

accdesc=mylib.bank;

specifies the access descriptor libref and member name. The access descriptor is created automatically by PROC DBLOAD. In this example, the specified name is MYLIB.BANK. The default access descriptor name is WORK.BANKING.ACCESS, where BANKING is the name of the database to be created. The access descriptor member name must not already exist when you are creating a new database.

viewdesc=vlib.myview;

specifies the view descriptor libref and member name. The view descriptor is created automatically by PROC DBLOAD. In this example, the specified name is VLIB.MYVIEW. The default view descriptor name is WORK.BANKING.VIEW, where BANKING is the name of the database to be created. The view descriptor member name must not already exist when you are creating a new database.

s2kmode=m;

creates the new database in a Multi-User environment. The default, S, specifies a single-user environment. For a Multi-User session, the new database files can be allocated when the session is initialized or dynamically allocated during execution by using the ALLOC command in SYSTEM 2000, Release 12.0 and later. For a single-user job, you must allocate the database files in the JCL for the job, or dynamically allocate the database files using the S2KDBCNT file.

rename acctnum='account number' 4= 'account type' 5='trans type' 6='trans amount' 7='trans date';

changes the names of the last five items. In a RENAME statement, always use a SAS variable on the left side of the equal sign (=). You can use either the SAS variable name or its positional equivalent as shown in the LIST statement output (Output 6.4), and you can rename as many items as you want in one RENAME statement.

index 1=y 2=y 3=y 4=y transtyp=y 7=y;

defines items as key (indexed). In this INDEX statement, all items except TRANS AMOUNT are key items. TRANS AMOUNT is not listed, so it defaults to non-key.

In an INDEX statement, always use a SAS variable on the left side of the equal sign (=). You can use either the SAS variable name or its positional equivalent as shown in the LIST statement output below, and you can index as many items as you want in one INDEX statement.

level 3=1 4=1 5=2 6=2 transdat=2;

changes the level number of an item. In this LEVEL statement, ACCOUNT NUMBER and ACCOUNT TYPE become items in a level 1 record; TRANS TYPE, TRANS AMOUNT, and TRANS DATE become items in a level 2 record. PROC DBLOAD automatically defines the schema record names RECORD_LEVEL_1 and RECORD_LEVEL_2, respectively, and assigns appropriate component numbers.

In a LEVEL statement, always use a SAS variable on the left side of the equal sign (=). You can use either the SAS variable name or its positional equivalent as shown in the LIST statement output below, and you can change the level number for as many items as you want in one LEVEL statement.

list all;

lists the items, levels, and index settings.

load; run;

executes PROC DBLOAD and creates and loads the database.

LIST Statement Output

 Command ===>
 
 PROC DBLOAD for SYSTEM 2000 - OPTIONS FOLLOW:
    Input data set=        TRANS    BANKING  DATA
    View descriptor=       VLIB     MYVIEW   VIEW
    Access descriptor=     MYLIB    BANK     ACCESS
    Database name=         BANKING
    S2KMODE=               M
    Label option=          N
    Create option=         N
    S2KLOAD=               N
 ------------SAS NAME---LEVEL---INDEX---COMPONENT NAME----
   1          CUSTNAME           YES     CUSTNAME
   2          CUSTID             YES     CUSTID
   3          ACCTNUM    1       YES     ACCOUNT NUMBER
   4          ACCTTYP    1       YES     ACCOUNT TYPE
   5          TRANSTYP   2       YES     TRANS TYPE
   6          TRANSAMT   2               TRANS AMOUNT
   7          TRANSDAT   2       YES     TRANS DATE

To load additional logical entries into an existing SYSTEM 2000 database, invoke PROC DBLOAD and specify the input data file and the appropriate view descriptor. The view descriptor contains the database name, the component names, levels, and so on. It also contains the password for the database and the access mode (single-user or Multi-User). You can use a WHERE clause in SAS to limit the input. However, a SYSTEM 2000 where-clause in the view descriptor does not affect an incremental load.

To perform an incremental load with PROC DBLOAD, use the following program. In this example, the data file is TRANS.INCLOAD. The function of each statement is explained immediately following the program.

JCL statements;

proc dbload dbms=s2k data=trans.incload;
  viewdesc=vlib.myview;
load;
run;

JCL statements;

submit your statements for execution under SAS.

proc dbload dbms=s2k data=trans.incload;

invokes the DBLOAD procedure. The DBMS= option specifies the DBMS that you want to load data into. The DATA= option specifies the SAS data file that contains the data.

viewdesc=vlib.myview;

specifies the existing view descriptor libref and member name.

load; run;

executes PROC DBLOAD and loads the database.


Adding New Logical Entries vs. Updating Existing Logical Entries

Incremental loads can either insert new logical entries or append new records to existing logical entries. Both types of incremental loading are performed the same way, as shown in the previous example. How does the SYSTEM 2000 engine know which action to perform?

If you issue an S2KLOAD statement, the input observations are treated as new logical entries. Several observations can be collected to form each logical entry, but they are all new entries. The observations must be sorted in order to achieve the correct result.

If you do not issue the S2KLOAD statement, your results are based on the order of the observations and whether the view descriptor contains a BY key. A BY key identifies the placement of inserted data records in an incremental load. See Using a BY Key. When using a BY key, it is best (less ambiguous) if your view descriptor and the BY key begin at level 0, even if you are loading records only at a lower level.

space
Previous Page | Next Page | Top of Page