Importing XML Documents Using an XMLMap |
This example imports the XML document Pharmacy.XML, which contains hierarchical data and is used in the example Importing Hierarchical Data as Related Data Sets. This example continues with the XMLMap by adding a key field with generated numeric key values in order to provide a relationship between the two data sets. (A key field holds unique data in order to identify that record from the other records. For example, account number, product code, and customer name are typical key fields.)
To generate key field values, use the class="ORDINAL" attribute in the COLUMN element in order to create a counter variable. A counter variable keeps track of the number of times the location path, which is specified by the INCREMENT-PATH element, is encountered. The counter variable increments its count by 1 each time the location path is matched. (The counter variable is similar to the _N_ automatic variable in DATA step processing in that it counts the number of observations being read into a SAS data set.)
Note: When using a counter variable to create a key field for related data sets, you must specify the same location paths for both TABLE elements. Otherwise, the results will not match. Each table must have the same generated key for like-named data elements.
The following XMLMap imports Pharmacy.XML document as two SAS data sets that have related information and also creates a key field that holds generated numeric key values:
<?xml version="1.0" encoding="UTF-8" ?> <!-- ############################################################ --> <!-- 2003-04-15T10:55:43 --> <!-- SAS XML Libname Engine Map --> <!-- Generated by XML Mapper, 9.1.10.20030413.1400 --> <!-- ############################################################ --> <SXLEMAP version="1.2" name="SXLEMAP"> <!-- ############################################################ --> <TABLE name="PERSON"> <TABLE-PATH syntax="XPATH">/PHARMACY/PERSON</TABLE-PATH> 1 <COLUMN name="KEY" retain="YES" class="ORDINAL"> 2 <INCREMENT-PATH syntax="XPATH">/PHARMACY/PERSON</INCREMENT-PATH> <TYPE>numeric</TYPE> <DATATYPE>integer</DATATYPE> <FORMAT width="3">Z</FORMAT> </COLUMN> <COLUMN name="NAME"> <PATH syntax="XPATH">/PHARMACY/PERSON/NAME</PATH> <TYPE>character</TYPE> <DATATYPE>string</DATATYPE> <LENGTH>11</LENGTH> </COLUMN> <COLUMN name="STREET"> <PATH syntax="XPATH">/PHARMACY/PERSON/STREET</PATH> <TYPE>character</TYPE> <DATATYPE>string</DATATYPE> <LENGTH>18</LENGTH> </COLUMN> <COLUMN name="CITY"> <PATH syntax="XPATH">/PHARMACY/PERSON/CITY</PATH> <TYPE>character</TYPE> <DATATYPE>string</DATATYPE> <LENGTH>6</LENGTH> </COLUMN> </TABLE> <!-- ############################################################ --> <TABLE name="PRESCRIPTION"> <TABLE-PATH syntax="XPATH">/PHARMACY/PERSON/PRESCRIPTION</TABLE-PATH> 3 <COLUMN name="KEY" retain="YES" class="ORDINAL"> 4 <INCREMENT-PATH syntax="XPATH">/PHARMACY/PERSON</INCREMENT-PATH> <TYPE>numeric</TYPE> <DATATYPE>integer</DATATYPE> <FORMAT width="3">Z</FORMAT> </COLUMN> <COLUMN name="NUMBER"> <PATH syntax="XPATH">/PHARMACY/PERSON/PRESCRIPTION/NUMBER</PATH> <TYPE>numeric</TYPE> <DATATYPE>integer</DATATYPE> </COLUMN> <COLUMN name="DRUG"> <PATH syntax="XPATH">/PHARMACY/PERSON/PRESCRIPTION/DRUG</PATH> <TYPE>character</TYPE> <DATATYPE>string</DATATYPE> <LENGTH>12</LENGTH> </COLUMN> </TABLE> </SXLEMAP>
The following explains the XMLMap syntax that generates the key fields:
In the TABLE element that defines the Person data set, the TABLE-PATH element identifies the observation boundary for the data set. The location path generates a new observation each time a PERSON element is read.
For the Person data set, the COLUMN element for the Key variable contains the class="ORDINAL" attribute as well as the INCREMENT-PATH element. The XML engine follows this process in order to generate the key field values for the Person data set:
When the XML engine encounters the <PERSON> start tag, it reads the value into the input buffer, and then increments the value for the Key variable by 1.
The XML engine continues reading values into the input buffer until it encounters the </PERSON> end tag, at which time it writes the completed input buffer to the SAS data set as one observation.
The process is repeated for each <PERSON> start tag (from INCREMENT-PATH) and </PERSON> end tag (from TABLE-PATH) sequence.
In the TABLE element that defines the Prescription data set, the TABLE-PATH element identifies the observation boundary for the data set. The location path generates a new observation each time a PRESCRIPTION element is read.
For the Prescription data set, the COLUMN element for the Key variable contains the class="ORDINAL" attribute as well as the INCREMENT-PATH element.
The XML engine follows this process in order to generate the key field values for the Prescription data set:
When the XML engine encounters the <PERSON> start tag, it reads the value into the input buffer, and then increments the value for the Key variable by 1.
The XML engine continues reading values into the input buffer until it encounters the </PRESCRIPTION> end tag, at which time it writes the completed input buffer to the SAS data set as one observation.
Note: Because the location paths for the counter variables must be the same for both TABLE elements, the behavior of the XML engine for the Prescription table Key variable is the same as the Person table Key variable. While the XML engine tracks the occurrence of a PERSON tag as a key for both counter variables, the observations are derived from different TABLE-PATH locations.
The process is repeated for each <PERSON> start tag (from INCREMENT-PATH) and </PRESCRIPTION> end tag (from TABLE-PATH) sequence.
The following SAS statements import the XML document:
filename pharm 'c:\My Documents\XML\Pharmacy.xml'; filename map 'c:\My Documents\XML\PharmacyOrdinal.map'; libname pharm xml92 xmlmap=map;
Here is PROC PRINT output for both of the imported SAS data sets with a numeric key:
PROC PRINT Output for PHARM.PERSON
The SAS System 1 Obs KEY NAME STREET CITY 1 001 Brad Martin 11900 Glenda Court Austin 2 002 Jim Spano 1611 Glengreen Austin
PROC PRINT Output for PHARM.PRESCRIPTION
The SAS System 2 Obs KEY NUMBER DRUG 1 001 1234 Tetracycline 2 001 1245 Lomotil 3 002 1268 Nexium
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.