Using CA-Datacom/DB Data in SAS Programs |
Using the WHERE Statement or the SQL Procedure |
Many SAS programs select and combine data from various sources. The method you use depends on the configuration of the data. The next examples show you how to select and combine data using two different methods. When choosing between these methods, consider the issues described in Performance Considerations.
Selecting Data with the WHERE Statement |
Suppose you have two view descriptors, VLIB.USINV and VLIB.FORINV, that list the invoices for the USA and foreign countries, respectively. You could use the SET statement to concatenate these files into a single SAS data file. The WHERE statement specifies that you want a data file containing information about customers who have not paid their bills and whose bills amount to at least $300,000.
data notpaid(keep=invoicen billedto amtbille billedon); set vlib.usainv vlib.forinv; where paidon is missing and amtbille>=300000.00; run; proc print; title 'High Bills--Not Paid'; run;
In the SAS WHERE statement, be sure to use the SAS column names, not the CA-Datacom/DB field names. Both VLIB.USAINV and VLIB.FORINV are based on the CA-Datacom/DB table INVOICE. The following output shows the result of the new temporary data file, WORK.NOTPAID.
Results of Selecting Data with one WHERE Statement
High Bills--Not Paid 1 OBS INVOICEN BILLEDTO AMTBILLE BILLEDON 1 12102 18543489 11063836.00 17NOV88 2 11286 43459747 12679156.00 10OCT88 3 12051 39045213 1340738760.90 02NOV88 4 12471 39045213 1340738760.90 27DEC88 5 12476 38763919 34891210.20 24DEC88
The first line of the DATA step uses the KEEP= data set option. This data set option works with SAS/ACCESS views just as it works with other SAS data sets. That is, the KEEP= option specifies that you want only the listed columns included in the new data file, NOTPAID, although you can use the other columns within the DATA step.
Notice that the WHERE statement includes two conditions to be met. First, it selects only rows that have a missing value for the field PAIDON. As you can see, it is important to know how the CA-Datacom/DB data is configured before you use this data in a SAS program. The field PAIDON contains values that translate to missing values in SAS. (Also, each of the two view descriptors has its own WHERE clause.)
Second, the WHERE statement requires that the amount in each bill be higher than a certain figure. Again, you should be familiar with the CA-Datacom/DB data so that you can determine a reasonable figure for this expression.
When referencing a view descriptor in a SAS procedure or DATA step, it is more efficient to use a WHERE statement than a subsetting IF statement. A DATA step or SAS procedure passes the SAS WHERE statement as a WHERE clause to the interface view engine, which adds it (using a Boolean AND) to any WHERE clause defined in the view descriptor's selection criteria. The selection criteria are then passed to CA-Datacom/DB for processing. Processing CA-Datacom/DB data using a WHERE clause might reduce the number of records read from the database and therefore often improves performance.
For more information about the SAS WHERE statement, refer to the SAS Language Reference: Dictionary.
Combining Data with the SQL Procedure |
This section provides two examples of using the SAS SQL procedure with CA-Datacom/DB data. PROC SQL implements the Structured Query Language (SQL) and is included in Base SAS software. The first example illustrates using PROC SQL to combine data from three sources. The second example shows how to use the PROC SQL GROUP BY clause to create a new column from data that is described by a view descriptor.
Combining Data from Various Sources |
The SQL procedure provides a way to select and combine data from one or more database products. For example, suppose you have view descriptors VLIB.CUSPHON and VLIB.CUSORDR based on the CA-Datacom/DB tables CUSTOMERS and ORDER, respectively, and a SAS data file, MYDATA.OUTOFSTK, which contains product names and numbers that are out of stock. You can use the SQL procedure to join all these sources of data to form a single output file. A WHERE statement or a subsetting IF statement would not be appropriate in this case because you want to compare column values from several sources rather than simply merge or concatenate the data.
proc print data=vlib.cusphon; title 'Data Described by VLIB.CUSPHON'; run; proc print data=vlib.cusordr; title 'Data Described by VLIB.CUSORDR'; run; proc print data=mydata.outofstk; title 'SAS Data File MYDATA.OUTOFSTK'; run;
The following output shows the results of the PRINT procedure performed on the data that is described by the VLIB.CUSPHON and VLIB.CUSORDR view descriptors and on the MYDATA.OUTOFSTK SAS data file.
Data that is Described by the View Descriptor VLIB.CUSPHON
Data Described by VLIB.CUSPHON 1 OBS CUSTNUM PHONE 1 12345678 919/489-5682 2 14324742 408/629-0589 3 14569877 919/489-6792 4 14898029 301/760-2541 5 15432147 616/582-3906 6 18543489 512/478-0788 7 19783482 703/714-2900 8 19876078 209/686-3953 9 24589689 (012)736-202 10 26422096 4268-54-72 11 26984578 43-57-04 12 27654351 02/215-37-32 13 28710427 (021)570517 14 29834248 (0552)715311 15 31548901 406/422-3413 16 38763919 244-6324 17 39045213 012/302-1021 18 43290587 (02)933-3212 19 43459747 03/734-5111 20 46543295 (03)022-2332 21 46783280 3762855 22 48345514 213445 OBS NAME 1 DURHAM SCIENTIFIC SUPPLY COMPANY 2 SANTA CLARA VALLEY TECHNOLOGY SPECIALISTS 3 PRECISION PRODUCTS 4 UNIVERSITY BIOMEDICAL MATERIALS 5 GREAT LAKES LABORATORY EQUIPMENT MANUFACTURERS 6 LONE STAR STATE RESEARCH SUPPLIERS 7 TWENTY-FIRST CENTURY MATERIALS 8 SAN JOAQUIN SCIENTIFIC AND INDUSTRIAL SUPPLY, INC. 9 CENTAR ZA TECHNICKU I NAUCNU RESTAURIRANJE UMJETNINA 10 SOCIETE DE RECHERCHES POUR DE CHIRURGIE ORTHOPEDIQUE 11 INSTITUT FUR TEXTIL-FORSCHUNGS 12 INSTITUT DE RECHERCHE SCIENTIFIQUE MEDICALE 13 ANTONIE VAN LEEUWENHOEK VERENIGING VOOR MICROBIOLOGIE 14 BRITISH MEDICAL RESEARCH AND SURGICAL SUPPLY 15 NATIONAL COUNCIL FOR MATERIALS RESEARCH 16 INSTITUTO DE BIOLOGIA Y MEDICINA NUCLEAR 17 LABORATORIO DE PESQUISAS VETERNINARIAS DESIDERIO FINAMOR 18 HASSEI SAIBO GAKKAI 19 RESEARCH OUTFITTERS 20 WESTERN TECHNOLOGICAL SUPPLY 21 NGEE TECHNOLOGICAL INSTITUTE 22 GULF SCIENTIFIC SUPPLIES
Data that is Described by the View Descriptor VLIB.CUSORDR
Data Described by VLIB.CUSORDR 1 OBS STOCKNUM SHIPTO 1 9870 19876078 2 1279 39045213 3 8934 18543489 4 3478 29834248 5 2567 19783482 6 4789 15432147 7 3478 29834248 8 1279 14324742 9 8934 31548901 10 2567 14898029 11 9870 48345514 12 1279 39045213 13 8934 18543489 14 2567 19783482 15 9870 18543489 16 3478 24589689 17 1279 38763919 18 8934 43459747 19 2567 15432147 20 9870 14324742 21 9870 19876078 22 1279 39045213 23 8934 18543489 24 3478 29834248 25 2567 19783482 26 4789 15432147 27 3478 29834248 28 1279 14324742 29 8934 31548901 30 2567 14898029 31 9870 48345514 32 1279 39045213 33 8934 18543489 34 2567 19783482 35 9870 18543489 36 3478 24589689 37 1279 38763919 38 8934 43459747 39 2567 15432147 40 9870 14324742
Data in the SAS Data File MYDATA.OUTOFSTK
SAS Data File MYDATA.OUTOFSTK 1 OBS FIBERNAM FIBERNUM 1 olefin 3478 2 gold 8934 3 dacron 4789
The following SAS code selects and combines data from these three sources to create a view, SQL.BADORDRS(footnote 1). This view retrieves customer and product information so that the sales department can notify customers of products that are no longer available.
proc sql; create view sql.badordrs as select cusphon.custnum, cusphon.name, cusphon.phone, cusordr.stocknum, outofstk.fibernam as product from vlib.cusphon, vlib.cusordr, mydata.outofstk where cusordr.stocknum=outofstk.fibernum and cusphon.custnum=cusordr.shipto order by cusphon.custnum, product; title 'Data Described by SQL.BADORDRS'; select * from sql.badordrs;
The CREATE VIEW statement incorporates a WHERE clause as part of the SELECT statement, but it is not the same as the SAS WHERE statement illustrated earlier in this section. The last SELECT statement retrieves and displays the PROC SQL view, SQL.BADORDRS. To select all fields from the view, an asterisk (*) is used in place of field names. The fields are displayed in the same order as they were specified in the first SELECT clause.
The following output shows the data that is described by the SQL.BADORDRS view. Note that the SQL procedure uses the DBMS labels in the output by default.
Results of Combining DA-Datacom/DB Data
Data Described by SQL.BADORDRS 1 CUSTOMER NAME TELEPHONE STOCKNUM PRODUCT ---------------------------------------------------------------------- 15432147 GREAT LAKES LABORATORY EQUIPMENT MANUFACTURERS 616/582-3906 4789 dacron 15432147 GREAT LAKES LABORATORY EQUIPMENT MANUFACTURERS 616/582-3906 4789 dacron 18543489 LONE STAR STATE RESEARCH SUPPLIERS 512/478-0788 8934 gold 18543489 LONE STAR STATE RESEARCH SUPPLIERS 512/478-0788 8934 gold 18543489 LONE STAR STATE RESEARCH SUPPLIERS 512/478-0788 8934 gold 18543489 LONE STAR STATE RESEARCH SUPPLIERS 512/478-0788 8934 gold 24589689 CENTAR ZA TECHNICKU I NAUCNU RESTAURIRANJE UMJETNINA (012)736-202 3478 olefin 24589689 CENTAR ZA TECHNICKU I NAUCNU RESTAURIRANJE UMJETNINA (012)736-202 3478 olefin 29834248 BRITISH MEDICAL RESEARCH AND SURGICAL SUPPLY (0552)715311 3478 olefin 29834248 BRITISH MEDICAL RESEARCH AND SURGICAL SUPPLY (0552)715311 3478 olefin 29834248 BRITISH MEDICAL RESEARCH AND SURGICAL SUPPLY (0552)715311 3478 olefin 29834248 BRITISH MEDICAL RESEARCH AND SURGICAL SUPPLY (0552)715311 3478 olefin 31548901 NATIONAL COUNCIL FOR MATERIALS RESEARCH 406/422-3413 8934 gold 31548901 NATIONAL COUNCIL FOR MATERIALS RESEARCH 406/422-3413 8934 gold 43459747 RESEARCH OUTFITTERS 03/734-5111 8934 gold 43459747 RESEARCH OUTFITTERS 03/734-5111 8934 gold
The view SQL.BADORDRS lists entries for all customers who have ordered out-of-stock products. However, it contains duplicate rows because some companies have ordered the same product more than once. To make the data more readable for the sales department, you can create a final SAS data file, MYDATA.BADNEWS, using the SET statement and the special variable FIRST.PRODUCT. This variable identifies the first row in a particular BY group. You need a customer's name associated only once to notify that customer that a product is out of stock, regardless of the number of times the customer has placed an order for it.
data mydata.badnews; set sql.badordrs; by custnum product; if first.product; run; proc print; title 'MYDATA.BADNEWS Data File'; run;
The data file MYDATA.BADNEWS contains a row for each unique combination of customer and out-of-stock product. The following output displays this data file.
Results of Subsetting Data with the FIRST Variable
MYDATA.BADNEWS Data File 1 OBS CUSTNUM NAME 1 15432147 GREAT LAKES LABORATORY EQUIPMENT MANUFACTURERS 2 18543489 LONE STAR STATE RESEARCH SUPPLIERS 3 24589689 CENTAR ZA TECHNICKU I NAUCNU RESTAURIRANJE UMJETNINA 4 29834248 BRITISH MEDICAL RESEARCH AND SURGICAL SUPPLY 5 31548901 NATIONAL COUNCIL FOR MATERIALS RESEARCH 6 43459747 RESEARCH OUTFITTERS OBS PHONE STOCKNUM PRODUCT 1 616/582-3906 4789 dacron 2 512/478-0788 8934 gold 3 (012)736-202 3478 olefin 4 (0552)715311 3478 olefin 5 406/422-3413 8934 gold 6 03/734-5111 8934 gold
For more information about the special variable FIRST, see "BY Statement" in the SAS Language Reference: Dictionary.
Creating New Fields with the PROC SQL GROUP BY Clause |
It is often useful to create new fields with summary or aggregate functions, such as AVG or SUM. Although you cannot use the ACCESS procedure to create new fields, you can easily use the SQL procedure with data that is described by a view descriptor to display output containing new fields.
This example uses the SQL procedure to retrieve and manipulate data from the view descriptor VLIB.ALLEMP, which is based on the CA-Datacom/DB table EMPLOYEES. When this query (as a SELECT statement is often called) is submitted, it calculates and displays the average salary for each department. The AVG function is the SQL procedure's equivalent of the SAS MEAN function.
proc sql; title 'Average Salary Per Department'; select distinct dept, avg(salary) label='Average Salary' format=dollar12.2 from vlib.allemp where dept is not missing group by dept;
The order of the columns displayed matches the order of the columns specified in the SELECT list of the query. The following output shows the query's result.
Results of Creating New Fields with the SQL Procedure
Average Salary Per Department 1 Average DEPT Salary -------------------- ACC013 $54,591.33 ACC024 $55,370.55 ACC043 $75,000.34 CSR004 $17,000.00 CSR010 $44,324.19 CSR011 $41,966.16 SHP002 $40,111.31 SHP013 $41,068.44 SHP024 $50,000.00
For more information about the SQL procedure, refer to the Base SAS Procedures Guide.
FOOTNOTE 1: You might want to store your PROC SQL views in a SAS library other than the one storing your view descriptors, because they both have member type view.
Copyright © 2007 by SAS Institute Inc., Cary, NC, USA. All rights reserved.