IMS Data in SAS Programs |
Methods to Selecting and Combining IMS Data |
A great majority of SAS programs select and combine data from various sources. The method you use depends on the configuration of the data. The next three examples show you how to select and combine data using two different methods: the WHERE statement used in a DATA step and the SQL procedure. When choosing between these methods, you should first read the performance considerations discussed in Advanced User Topics for the SAS/ACCESS Interface View Engine for IMS.
Selecting and Combining Data Using the WHERE Statement |
Suppose you have two view descriptors, VLIB.CHKCRD and VLIB.CHKDEB, that contain information about the checking accounts of customers. The view descriptor VLIB.CHKCRD describes the checking credit data in the CUSTOMER, CHCKACCT, and CHCKCRDT segments, and the view descriptor VLIB.CHKDEB describes the checking debit data in the CUSTOMER, CHCKACCT, and CHCKDEBT segments. You could use the SET statement to concatenate the data in these files and create a SAS data file that contains information about checking account transactions by customer. Since you are accessing the same database more than once, you need to reference the same PSB in both view descriptors, but use different PCB index values, where each value references an ACCTDBD PCB that is sensitive to the segments defined in the view. In this example, VLIB.CHKCRD uses a PCB index value of 2, and VLIB.CHKDEB uses a PCB index value of 3 in the ACCUPSB PSB.
The PROC SORT statement orders the accounts by Social Security number and checking account number.
data chktrans (keep=soc_sec_number check_account_number trantype date amount); length trantype $ 6; format date date9. amount dollar12.2; set vlib.chkcrd(in=crd) vlib.chkdeb(in=dbt); where check_balance>0; if crd then do; trantype='Credit'; date=check_credit_date; amount=check_credit_amount; end; else if dbt then do; trantype='Debit'; date=check_debit_date; amount=check_debit_amount; end run; proc sort; by soc_sec_number check_account_number; run; options nodate linesize=80; proc print data=chktrans; by soc_sec_number; var check_account_number trantype date amount; title2 'Checking Account Transactions by SSN'; run;
In the SAS WHERE statement, be sure to use the IMS item name as the search criteria when VALIDVARNAME=V7 and the SAS variable name when VALIDVARNAME=V6. The following output shows the result of the new temporary SAS data file WORK.CHKTRANS.
Results of Selecting and Combining Data Using the WHERE Statement
Checking Account Transactions by SSN ---------------------- SOC_SEC_NUMBER=156-45-5672 -------------------- CHECK_ACCOUNT_ OBS NUMBER TRANTYPE DATE AMOUNT 1 345620123456 Credit 01APR1991 $100.00 2 345620123456 Debit 28MAR1991 $13.29 3 345620123456 Debit 31MAR1991 $32.87 4 345620123456 Debit 02APR1991 $50.00 5 345620123456 Debit 31MAR1991 $13.42 ---------------------- SOC_SEC_NUMBER=178-42-6534 -------------------- CHECK_ACCOUNT_ OBS NUMBER TRANTYPE DATE AMOUNT 6 745920057114 Credit 12JUN1991 $1,300.00 7 745920057114 Debit 10JUN1991 $25.89 ---------------------- SOC_SEC_NUMBER=234-74-4612 -------------------- CHECK_ACCOUNT_ OBS NUMBER TRANTYPE DATE AMOUNT 8 345620113263 Credit 31MAR1991 $672.32 9 345620113263 Debit . . ---------------------- SOC_SEC_NUMBER=434-62-1224 -------------------- CHECK_ACCOUNT_ OBS NUMBER TRANTYPE DATE AMOUNT 10 345620134564 Credit 22MAR1991 $1,342.42 11 345620134564 Debit 18MAR1991 $432.87 12 345620134564 Debit 18MAR1991 $19.23 13 345620134564 Debit 22MAR1991 $723.23 14 345620134564 Debit 22MAR1991 $82.32 15 345620134564 Debit 26MAR1991 $73.62 16 345620134564 Debit 26MAR1991 $31.23 17 345620134564 Debit 29MAR1990 $162.87 18 345620134564 Debit 29MAR1991 $7.12 19 345620134564 Debit 31MAR1991 $62.34 20 345620134663 Credit 28MAR1991 $120.00 21 345620134663 Debit 28MAR1991 $25.00 ---------------------- SOC_SEC_NUMBER=434-62-1234 -------------------- CHECK_ACCOUNT_ OBS NUMBER TRANTYPE DATE AMOUNT 22 345620104732 Credit 02APR1991 $400.00 23 345620104732 Debit . .
Checking Account Transactions by SSN ---------------------- SOC_SEC_NUMBER=436-42-6394 -------------------- CHECK_ACCOUNT_ OBS NUMBER TRANTYPE DATE AMOUNT 24 345620135872 Credit 02APR1991 $50.00 25 345620135872 Debit 30MAR1990 . ---------------------- SOC_SEC_NUMBER=456-45-3462 -------------------- CHECK_ACCOUNT_ OBS NUMBER TRANTYPE DATE AMOUNT 26 345620134522 Credit 05APR1991 $50.00 27 345620134522 Debit 29MAR1991 $42.73 28 345620134522 Debit 29MAR1991 $172.45 29 345620134522 Debit 30MAR1991 $38.23 30 345620134522 Debit 02APR1991 $10.00 ---------------------- SOC_SEC_NUMBER=657-34-3245 -------------------- CHECK_ACCOUNT_ OBS NUMBER TRANTYPE DATE AMOUNT 31 345620131455 Credit 04APR1991 $230.00 32 345620131455 Debit . . ---------------------- SOC_SEC_NUMBER=667-73-8275 -------------------- CHECK_ACCOUNT_ OBS NUMBER TRANTYPE DATE AMOUNT 33 345620145345 Credit 31MAR1991 $1,563.23 34 345620145345 Debit 19MAR1990 . 35 345620145345 Debit 23MAR1991 $820.00 36 345620145345 Debit 23MAR1991 $52.00 37 345620145345 Debit 28MAR1991 $193.00 38 345620154633 Credit 31MAR1991 $1,563.23 39 345620154633 Debit . . ---------------------- SOC_SEC_NUMBER=667-82-8275 -------------------- CHECK_ACCOUNT_ OBS NUMBER TRANTYPE DATE AMOUNT 40 382957492811 Credit 16APR1991 $100.00 41 382957492811 Debit . .The first line of the DATA step uses the KEEP= data set option. This option works with view descriptors just as it works with other SAS data sets. The KEEP= option specifies that you want only the listed variables included in the new SAS data file WORK.CHKTRANS, although you can use the other variables in the view descriptor within the DATA step. Note that the KEEP= option does not reduce the number of variables mapped by the view descriptor and, therefore, does not reduce the amount of data read by the engine.
When you reference a view descriptor in a SAS procedure or DATA step, it is more efficient to use a SAS WHERE statement than a subsetting IF statement because an IF statement does not reduce the amount of data read. A DATA step or SAS procedure passes the SAS WHERE statement to the interface view engine, which attempts to create SSAs from the WHERE statement. If the engine can create the SSAs, it processes the SAS WHERE statement and returns to SAS only the data that satisfies the WHERE statement. Otherwise, all the data referenced by the view descriptor is returned to SAS for processing. Processing IMS data using a WHERE statement that the IMS engine can turn into SSAs reduces the amount of data read and retrieved by the engine. This improves engine performance significantly. For more information about how IMS handles WHERE statements, see Performance and Efficient View Descriptors.
For more information about the SAS WHERE statement, refer to SAS Language Reference: Dictionary.
Selecting and Combining Data Using the SAS SQL Procedure |
This section provides two examples of using the SAS SQL procedure on IMS data. The SQL procedure implements the Structured Query Language (SQL) in Version 7 and later of SAS. The SQL procedure is a good way to perform SQL operations with IMS, which by itself has no SQL capabilities. The first example illustrates how to use PROC SQL to combine data from three sources. The second example shows how to use the GROUP BY clause to create new items from data that is described by a view descriptor.
Combining Data from Various Sources |
Suppose you have the following items:
a view descriptor, VLIB.CUSTACCT, that is based on the CUSTOMER and CHCKACCT segments of the IMS database ACCTDBD.
a SAS data file, MYDATA.CHGDATA, which contains checking account numbers and checking fees.
a view descriptor, MYDATA.BANKCHRG, that is based on data in a DB2 table that contains additional banking fees. (The MYDATA.BANKCHRG view descriptor has been created using the SAS/ACCESS interface to DB2.)
You can use PROC SQL to create a view that joins all these sources of data. When you use the PROC SQL view in your SAS program, the joined data is presented in a single output table. In this example, using the SAS WHERE or subsetting IF statements would not be an appropriate way of presenting data from various sources because you want to compare variables from several sources rather than simply merge or concatenate the data. For more information about the DB2 table that is used in this example, see Example Data.
In addition, a PCB must be included in that PSB for each database you want to access. If you are accessing the same database multiple times, each view descriptor must specify a different PCB using the PCB index field.
The following code prints the view descriptors and the SAS data file:
options nodate linesize=120; proc print data=vlib.custacct; title2 'Data Described by VLIB.CUSTACCT'; run; options nodate linesize=80; proc print data=mydata.bankchrg; title2 'Data Described by MYDATA.BANKCHRG'; run;
proc print data=mydata.chgdata; title2 'SAS Data File MYDATA.CHGDATA'; run;
The following three outputs show the results of the PRINT procedure performed on the VLIB.CUSTACCT view descriptor (based on IMS data), the MYDATA.BANKCHRG view descriptor (based on DB2 data), and the MYDATA.CHGDATA data file.
Data That is Described by VLIB.CUSTACCT
The SAS System Data Described by VLIB.CUSTACCT SOC_SEC_ CHECK_ACCOUNT_ OBS NUMBER CUSTOMER_NAME NUMBER 1 667-73-8275 WALLS, HOOPER J. 345620145345 2 667-73-8275 WALLS, HOOPER J. 345620154633 3 434-62-1234 SUMMERS, MARY T. 345620104732 4 436-42-6394 BOOKER, APRIL M. 345620135872 5 434-62-1224 SMITH, JAMES MARTIN 345620134564 6 434-62-1224 SMITH, JAMES MARTIN 345620134663 7 178-42-6534 PATTILLO, RODRIGUES 745920057114 8 156-45-5672 O'CONNOR, JOSEPH 345620123456 9 657-34-3245 BARNHARDT, PAMELA S. 345620131455 10 667-82-8275 COHEN, ABRAHAM 382957492811 11 456-45-3462 LITTLE, NANCY M. 345620134522 12 234-74-4612 WIKOWSKI, JONATHAN S. 345620113263
Data That is Described by MYDATA.BANKCHRG
The SAS System Data Described by MYDATA.BANKCHRG OBS ssn accountn chckchrg atmfee loanchrg 1 667-73-8275 345620145345 3.75 5.00 2.00 2 434-62-1234 345620104732 15.00 25.00 552.23 3 436-42-6394 345620135872 1.50 7.50 332.15 4 434-62-1224 345620134564 9.50 0.00 0.00 5 178-42-6534 . 0.50 15.00 223.77 6 156-45-5672 345620123456 0.00 0.00 0.00 7 657-34-3245 345620132455 10.25 10.00 100.00 8 667-82-8275 . 7.50 7.50 175.75 9 456-45-3462 345620134522 23.00 30.00 673.23 10 234-74-4612 345620113262 4.50 7.00 0.00
Data in the SAS Data File MYDATA.CHGDATA
The SAS System SAS Data File MYDATA.CHGDATA OBS account charge 1 345620135872 $10 2 345620134522 $7 3 345620123456 $12 4 382957492811 $3 5 345620134663 $8 6 345620131455 $6 7 345620104732 $9The following SAS statements select and combine data from these three sources to create a PROC SQL view, SQL.CHARGES. The SQL.CHARGES view retrieves checking fee information so that the bank can charge customers for checking services.
options nodate linesize=132; libname sql 'SAS-data-library'; proc sql; create view sql.charges as select distinct custacct.soc_sec_number, custacct.customer_name, custacct.check_account_number, chgdata.charge, bankchrg.chckchrg, bankchrg.atmfee, bankchrg.loanchrg from vlib.custacct, mydata.bankchrg, mydata.chgdata where custacct.soc_sec_number=bankchrg.ssn and custacct.check_account_number=chgdata.account; title2 'Banking Charges for the Month'; select * from sql.charges;
The CREATE statement incorporates a WHERE clause along with the SELECT clause. The last SELECT statement retrieves and displays the PROC SQL view SQL.CHARGES. To select all the items from the view, use an asterisk (*) in place of item names. When an asterisk is used, the order of the items displayed matches the order of the items as specified in the SQL.CHARGES view definition. Notice that PROC SQL prints the output automatically to the display using the IMS item names instead of the SAS variable names. It also executes without a RUN statement when the procedure is submitted. The following output shows the data that is described by the PROC SQL view SQL.CHARGES.
Results of Combining Data From Various Sources
The SAS System Banking Charges for the Month SOC_SEC_ CHECK_ACCOUNT_ NUMBER CUSTOMER_NAME NUMBER charge chckchrg atmfee loanchrg ---------------------------------------------------------------------------------------- 156-45-5672 O'CONNOR, JOSEPH 345620123456 $12 0.00 0.00 0.00 434-62-1224 SMITH, JAMES MARTIN 345620134663 $8 9.50 0.00 0.00 434-62-1234 SUMMERS, MARY T. 345620104732 $9 15.00 25.00 552.23 436-42-6394 BOOKER, APRIL M. 345620135872 $10 1.50 7.50 332.15 456-45-3462 LITTLE, NANCY M. 345620134522 $7 23.00 30.00 673.23 657-34-3245 BARNHARDT, PAMELA S. 345620131455 $6 10.25 10.00 100.00 667-82-8275 COHEN, ABRAHAM 382957492811 $3 7.50 7.50 175.75
Creating New Items with the GROUP BY Clause |
It is often useful to create new items with summary or aggregate functions such as the SUM function. Although you cannot use the ACCESS procedure to create new items, you can easily use the SQL procedure with data that is described by a view descriptor to display output that contains new items.
This example uses PROC SQL to retrieve and manipulate data from the view descriptor VLIB.SAVEBAL, which is based on the CUSTOMER and SAVEACCT segments in the ACCTDBD database. When this query (as a SELECT statement is often called) is submitted, it calculates and displays the average savings account balance for each city.
options nodate linesize=80; proc sql; title2 'Average Savings Balance Per City'; select distinct city, avg(savings_balance) label='Average Balance' format=dollar12.2 from vlib.savebal where city is not missing group by city;
The following output shows the query's result.
Results of Creating New Items with the GROUP BY Clause
The SAS System Average Savings Balance Per City Average CITY Balance ------------------------------------------ CHARLOTTESVILLE $1,673.35 GORDONSVILLE $4,758.26 ORANGE $615.60 RAPIDAN $672.63 RICHMOND $924.62
For more information about the SQL procedure, refer to the SAS SQL Procedure User's Guide.
Copyright © 2007 by SAS Institute Inc., Cary, NC, USA. All rights reserved.