Selecting and Combining CA-Datacom/DB Data

Using the WHERE Statement or the SQL Procedure

Many SAS programs select and combine data from various sources. The method that you use depends on the configuration of the data. The next examples show you how to select and combine data using two different methods. When choosing between these methods, consider the issues described in Performance Considerations.

Selecting Data with the WHERE Statement

Suppose you have two view descriptors, Vlib.UsaInv and Vlib.ForInv, that list the invoices for the USA and foreign countries, respectively. You could use the SET statement to concatenate these files into a single SAS data file. The WHERE statement specifies that you want a data file containing information about customers who have not paid their bills and whose bills amount to at least $300,000.
data notpaid(keep=invoicen billedto amtbille billedon);
   set vlib.usainv vlib.forinv;
   where paidon is missing and amtbille>=300000.00;
run;

proc print;
   title 'High Bills--Not Paid';
run;
In the SAS WHERE statement, be sure to use the SAS column names, not the CA-Datacom/DB field names. Both Vlib.UsaInv and Vlib.ForInv are based on the CA-Datacom/DB table Invoice. The following output shows the result of the new temporary data file, Work.NotPaid.
Results of Selecting Data with One WHERE Statement
                              High Bills--Not Paid                             1
           OBS    INVOICEN    BILLEDTO           AMTBILLE    BILLEDON

            1      12102      18543489        11063836.00    17NOV88
            2      11286      43459747        12679156.00    10OCT88
            3      12051      39045213      1340738760.90    02NOV88
            4      12471      39045213      1340738760.90    27DEC88
            5      12476      38763919        34891210.20    24DEC88
The first line of the DATA step uses the KEEP= data set option. This data set option works with SAS/ACCESS views just as it works with other SAS data sets. That is, the KEEP= option specifies that you want only the listed columns included in the new data file, NotPaid, although you can use the other columns within the DATA step.
Notice that the WHERE statement includes two conditions to be met. First, it selects only rows that have a missing value for the field PAIDON. As you can see, it is important to know how the CA-Datacom/DB data is configured before you use this data in a SAS program. The field PAIDON contains values that translate to missing values in SAS. (Also, each of the two view descriptors has its own WHERE clause.)
Second, the WHERE statement requires that the amount in each bill be higher than a certain figure. Again, you should be familiar with the CA-Datacom/DB data so that you can determine a reasonable figure for this expression.
When referencing a view descriptor in a SAS procedure or DATA step, it is more efficient to use a WHERE statement than a subsetting IF statement. A DATA step or SAS procedure passes the SAS WHERE statement as a WHERE clause to the interface view engine, which adds it (using a Boolean AND) to any WHERE clause defined in the view descriptor's selection criteria. The selection criteria are then passed to CA-Datacom/DB for processing. Processing CA-Datacom/DB data using a WHERE clause might reduce the number of records read from the database and therefore often improves performance.
For more information about the SAS WHERE statement, see the SAS Statements: Reference.

Combining Data with the SQL Procedure

This section provides two examples of using the SAS SQL procedure with CA-Datacom/DB data. PROC SQL implements the Structured Query Language (SQL) and is included in Base SAS software. The first example illustrates using PROC SQL to combine data from three sources. The second example shows how to use the PROC SQL GROUP BY clause to create a new column from data that is described by a view descriptor.

Combining Data from Various Sources

The SQL procedure provides a way to select and combine data from one or more database products. For example, suppose you have view descriptors Vlib.CusPhon and Vlib.CusOrdr based on the CA-Datacom/DB tables Customers and Order, respectively, and a SAS data file, MyData.OutOfStk, which contains product names and numbers that are out of stock. You can use the SQL procedure to join all these sources of data to form a single output file. A WHERE statement or a subsetting IF statement would not be appropriate in this case because you want to compare column values from several sources rather than simply merge or concatenate the data.
proc print data=vlib.cusphon;
   title 'Data Described by VLIB.CUSPHON';
run;

proc print data=vlib.cusordr;
   title 'Data Described by VLIB.CUSORDR';
run;

proc print data=mydata.outofstk;
   title 'SAS Data File MYDATA.OUTOFSTK';
run;
The following output shows the results of the PRINT procedure performed on the data that is described by the Vlib.CusPhon and Vlib.CusOrdr view descriptors and on the MyData.OutOfStk SAS data file.
Data That Is Described by the View Descriptor Vlib.CusPhon
                        Data Described by VLIB.CUSPHON                         1

      OBS    CUSTNUM     PHONE

        1    12345678    919/489-5682
        2    14324742    408/629-0589
        3    14569877    919/489-6792
        4    14898029    301/760-2541
        5    15432147    616/582-3906
        6    18543489    512/478-0788
        7    19783482    703/714-2900
        8    19876078    209/686-3953
        9    24589689    (012)736-202
       10    26422096    4268-54-72
       11    26984578    43-57-04
       12    27654351    02/215-37-32
       13    28710427    (021)570517
       14    29834248    (0552)715311
       15    31548901    406/422-3413
       16    38763919    244-6324
       17    39045213    012/302-1021
       18    43290587    (02)933-3212
       19    43459747    03/734-5111
       20    46543295    (03)022-2332
       21    46783280    3762855
       22    48345514    213445

      OBS    NAME

        1    DURHAM SCIENTIFIC SUPPLY COMPANY
        2    SANTA CLARA VALLEY TECHNOLOGY SPECIALISTS
        3    PRECISION PRODUCTS
        4    UNIVERSITY BIOMEDICAL MATERIALS
        5    GREAT LAKES LABORATORY EQUIPMENT MANUFACTURERS
        6    LONE STAR STATE RESEARCH SUPPLIERS
        7    TWENTY-FIRST CENTURY MATERIALS
        8    SAN JOAQUIN SCIENTIFIC AND INDUSTRIAL SUPPLY, INC.
        9    CENTAR ZA TECHNICKU I NAUCNU RESTAURIRANJE UMJETNINA
       10    SOCIETE DE RECHERCHES POUR DE CHIRURGIE ORTHOPEDIQUE
       11    INSTITUT FUR TEXTIL-FORSCHUNGS
       12    INSTITUT DE RECHERCHE SCIENTIFIQUE MEDICALE
       13    ANTONIE VAN LEEUWENHOEK VERENIGING VOOR MICROBIOLOGIE
       14    BRITISH MEDICAL RESEARCH AND SURGICAL SUPPLY
       15    NATIONAL COUNCIL FOR MATERIALS RESEARCH
       16    INSTITUTO DE BIOLOGIA Y MEDICINA NUCLEAR
       17    LABORATORIO DE PESQUISAS VETERNINARIAS DESIDERIO FINAMOR
       18    HASSEI SAIBO GAKKAI
       19    RESEARCH OUTFITTERS
       20    WESTERN TECHNOLOGICAL SUPPLY
       21    NGEE TECHNOLOGICAL INSTITUTE
       22    GULF SCIENTIFIC SUPPLIES
Data That Is Described by the View Descriptor Vlib.CusOrdr
                   Data Described by VLIB.CUSORDR                        1
                    OBS    STOCKNUM    SHIPTO

                      1      9870      19876078
                      2      1279      39045213
                      3      8934      18543489
                      4      3478      29834248
                      5      2567      19783482
                      6      4789      15432147
                      7      3478      29834248
                      8      1279      14324742
                      9      8934      31548901
                     10      2567      14898029
                     11      9870      48345514
                     12      1279      39045213
                     13      8934      18543489
                     14      2567      19783482
                     15      9870      18543489
                     16      3478      24589689
                     17      1279      38763919
                     18      8934      43459747
                     19      2567      15432147
                     20      9870      14324742
                     21      9870      19876078
                     22      1279      39045213
                     23      8934      18543489
                     24      3478      29834248
                     25      2567      19783482
                     26      4789      15432147
                     27      3478      29834248
                     28      1279      14324742
                     29      8934      31548901
                     30      2567      14898029
                     31      9870      48345514
                     32      1279      39045213
                     33      8934      18543489
                     34      2567      19783482
                     35      9870      18543489
                     36      3478      24589689
                     37      1279      38763919
                     38      8934      43459747
                     39      2567      15432147
                     40      9870      14324742
Data in the SAS Data File MyData.OutOfStk
             SAS Data File MYDATA.OUTOFSTK                         1

              OBS    FIBERNAM    FIBERNUM  


               1      olefin       3478  
               2      gold         8934 
               3      dacron       4789
The following SAS code selects and combines data from these three sources to create a view, SQL.BADORDRS. Note that you might want to store you PROC SQL views in a SAS library other than the one storing you view descriptors, because they both have member type VIEW. This view retrieves customer and product information so that the sales department can notify customers of products that are no longer available.
proc sql;
create view sql.badordrs as
   select cusphon.custnum, cusphon.name, cusphon.phone,
          cusordr.stocknum, outofstk.fibernam as product
      from vlib.cusphon, vlib.cusordr, mydata.outofstk
      where cusordr.stocknum=outofstk.fibernum and
            cusphon.custnum=cusordr.shipto
      order by cusphon.custnum, product;
   title 'Data Described by SQL.BADORDRS';
   select * from sql.badordrs;
The CREATE VIEW statement incorporates a WHERE clause as part of the SELECT statement, but it is not the same as the SAS WHERE statement illustrated earlier in this section. The last SELECT statement retrieves and displays the PROC SQL view, SQL.BADORDRS. To select all fields from the view, an asterisk (*) is used in place of field names. The fields are displayed in the same order as they were specified in the first SELECT clause.
The following output shows the data that is described by the SQL.BADORDRS view. Note that the SQL procedure uses the DBMS labels in the output by default.
Results of Combining DA-Datacom/DB Data
                         Data Described by SQL.BADORDRS                                           1

     CUSTOMER  NAME                                                  TELEPHONE     STOCKNUM  PRODUCT
     -----------------------------------------------------------------------------------------------
     15432147  GREAT LAKES LABORATORY EQUIPMENT MANUFACTURERS        616/582-3906      4789  dacron

     15432147  GREAT LAKES LABORATORY EQUIPMENT MANUFACTURERS        616/582-3906      4789  dacron

     18543489  LONE STAR STATE RESEARCH SUPPLIERS                    512/478-0788      8934  gold

     18543489  LONE STAR STATE RESEARCH SUPPLIERS                    512/478-0788      8934  gold

     18543489  LONE STAR STATE RESEARCH SUPPLIERS                    512/478-0788      8934  gold

     18543489  LONE STAR STATE RESEARCH SUPPLIERS                    512/478-0788      8934  gold

     24589689  CENTAR ZA TECHNICKU I NAUCNU RESTAURIRANJE UMJETNINA  (012)736-202      3478  olefin

     24589689  CENTAR ZA TECHNICKU I NAUCNU RESTAURIRANJE UMJETNINA  (012)736-202      3478  olefin

     29834248  BRITISH MEDICAL RESEARCH AND SURGICAL SUPPLY          (0552)715311      3478  olefin

     29834248  BRITISH MEDICAL RESEARCH AND SURGICAL SUPPLY          (0552)715311      3478  olefin

     29834248  BRITISH MEDICAL RESEARCH AND SURGICAL SUPPLY          (0552)715311      3478  olefin

     29834248  BRITISH MEDICAL RESEARCH AND SURGICAL SUPPLY          (0552)715311      3478  olefin

     31548901  NATIONAL COUNCIL FOR MATERIALS RESEARCH               406/422-3413      8934  gold

     31548901  NATIONAL COUNCIL FOR MATERIALS RESEARCH               406/422-3413      8934  gold

     43459747  RESEARCH OUTFITTERS                                   03/734-5111       8934  gold

     43459747  RESEARCH OUTFITTERS                                   03/734-5111       8934  gold
The view SQL.BADORDRS lists entries for all customers who have ordered out-of-stock products. However, it contains duplicate rows because some companies have ordered the same product more than once. To make the data more readable for the sales department, you can create a final SAS data file, MyData.BadNews, using the SET statement and the special variable FIRST.PRODUCT. This variable identifies the first row in a particular BY group. You need a customer's name associated only once to notify that customer that a product is out of stock, regardless of the number of times the customer has placed an order for it.
data mydata.badnews;
   set sql.badordrs;
   by custnum product;
   if first.product;
run;

proc print;
   title 'MYDATA.BADNEWS Data File';
run;
The data file MyData.BadNews contains a row for each unique combination of customer and out-of-stock product. The following output displays this data file.
Results of Sub-setting Data with the FIRST Variable
                                 MYDATA.BADNEWS Data File                     1
    OBS    CUSTNUM     NAME
     1     15432147    GREAT LAKES LABORATORY EQUIPMENT MANUFACTURERS
     2     18543489    LONE STAR STATE RESEARCH SUPPLIERS
     3     24589689    CENTAR ZA TECHNICKU I NAUCNU RESTAURIRANJE UMJETNINA
     4     29834248    BRITISH MEDICAL RESEARCH AND SURGICAL SUPPLY
     5     31548901    NATIONAL COUNCIL FOR MATERIALS RESEARCH
     6     43459747    RESEARCH OUTFITTERS

    OBS   PHONE           STOCKNUM    PRODUCT
     1    616/582-3906      4789      dacron
     2    512/478-0788      8934      gold
     3    (012)736-202      3478      olefin
     4    (0552)715311      3478      olefin
     5    406/422-3413      8934      gold
     6    03/734-5111       8934      gold
For more information about the special variable FIRST, see "By Statement" in the SAS Statements: Reference.

Creating New Fields with the PROC SQL GROUP BY Clause

It is often useful to create new fields with summary or aggregate functions, such as AVG or SUM. Although you cannot use the ACCESS procedure to create new fields, you can easily use the SQL procedure with data that is described by a view descriptor to display output containing new fields.
This example uses the SQL procedure to retrieve and manipulate data from the view descriptor Vlib.AllEmp, which is based on the CA-Datacom/DB table Employees. When this query (as a SELECT statement is often called) is submitted, it calculates and displays the average salary for each department. The AVG function is the SQL procedure's equivalent of the SAS MEAN function.
proc sql;
   title 'Average Salary Per Department';
   select distinct dept,
          avg(salary) label='Average Salary' format=dollar12.2
      from vlib.allemp
      where dept is not missing
      group by dept;
The order of the columns displayed matches the order of the columns specified in the SELECT list of the query. The following output shows the query's result.
Results of Creating New Fields with the SQL Procedure
                         Average Salary Per Department                         1

                                           Average
                              DEPT          Salary
                              --------------------
                              ACC013    $54,591.33
                              ACC024    $55,370.55
                              ACC043    $75,000.34
                              CSR004    $17,000.00
                              CSR010    $44,324.19
                              CSR011    $41,966.16
                              SHP002    $40,111.31
                              SHP013    $41,068.44
                              SHP024    $50,000.00
For more information about the SQL procedure, see the SAS SQL Procedure User's Guide.