Importing XML Documents Using an XMLMap |
XML documents often contain hierarchical data in that the data is structured into different levels like a company organization chart. Hierarchical structures are one-to-many relationships, with top items having one or more items below it (for example, customer to orders).
This example explains how to define an XMLMap in order to import an XML document as two data sets that have related information.
First, here is the XML document Pharmacy.XML. The file contains hierarchical data with related entities in the form of individual customers and their prescriptions. Each customer can have one or multiple prescriptions. Notice that PRESCRIPTION elements are nested within each <PERSON> start tag and </PERSON> end tag:
<?xml version="1.0" ?> <PHARMACY> <PERSON> <NAME>Brad Martin</NAME> <STREET>11900 Glenda Court</STREET> <CITY>Austin</CITY> <PRESCRIPTION> <NUMBER>1234</NUMBER> <DRUG>Tetracycline</DRUG> </PRESCRIPTION> <PRESCRIPTION> <NUMBER>1245</NUMBER> <DRUG>Lomotil</DRUG> </PRESCRIPTION> </PERSON> <PERSON> <NAME>Jim Spano</NAME> <STREET>1611 Glengreen</STREET> <CITY>Austin</CITY> <PRESCRIPTION> <NUMBER>1268</NUMBER> <DRUG>Nexium</DRUG> </PRESCRIPTION> </PERSON> </PHARMACY>
To import separate data sets, one describing the customers and the other containing prescription information, a relation between each customer and associated prescriptions must be designated in order to know which prescriptions belong to each customer.
An XMLMap defines how to translate the XML markup into two SAS data sets. The customer table imports the name and address of each customer, and the prescription table imports the customer's name, prescription number, and drug. Notations in the XMLMap syntax are explained below.
Note: The XMLMap was generated by using SAS XML Mapper.
<?xml version="1.0" encoding="UTF-8"?> <!-- ############################################################ --> <!-- 2003-04-08T15:03:16 --> <!-- SAS XML Libname Engine Map --> <!-- Generated by XML Mapper, 9.1.10.20030407.1378 --> <!-- ############################################################ --> <SXLEMAP version="1.2" name="SXLEMAP"> 1 <!-- ############################################################ --> <TABLE name="PERSON"> 2 <TABLE-PATH syntax="XPATH">/PHARMACY/PERSON</TABLE-PATH> <COLUMN name="NAME"> 3 <PATH syntax="XPATH">/PHARMACY/PERSON/NAME</PATH> <TYPE>character</TYPE> <DATATYPE>string</DATATYPE> <LENGTH>11</LENGTH> </COLUMN> <COLUMN name="STREET"> 3 <PATH syntax="XPATH">/PHARMACY/PERSON/STREET</PATH> <TYPE>character</TYPE> <DATATYPE>string</DATATYPE> <LENGTH>18</LENGTH> </COLUMN> <COLUMN name="CITY"> 3 <PATH syntax="XPATH">/PHARMACY/PERSON/CITY</PATH> <TYPE>character</TYPE> <DATATYPE>string</DATATYPE> <LENGTH>6</LENGTH> </COLUMN> </TABLE> <!-- ############################################################ --> <TABLE name="PRESCRIPTION"> 4 <TABLE-PATH syntax="XPATH">/PHARMACY/PERSON/PRESCRIPTION</TABLE-PATH> <COLUMN name="NAME" retain="YES"> 5 <PATH syntax="XPATH">/PHARMACY/PERSON/NAME</PATH> <TYPE>character</TYPE> <DATATYPE>string</DATATYPE> <LENGTH>11</LENGTH> </COLUMN> <COLUMN name="NUMBER"> 6 <PATH syntax="XPATH">/PHARMACY/PERSON/PRESCRIPTION/NUMBER</PATH> <TYPE>numeric</TYPE> <DATATYPE>integer</DATATYPE> </COLUMN> <COLUMN name="DRUG"> 6 <PATH syntax="XPATH">/PHARMACY/PERSON/PRESCRIPTION/DRUG</PATH> <TYPE>character</TYPE> <DATATYPE>string</DATATYPE> <LENGTH>12</LENGTH> </COLUMN> </TABLE> </SXLEMAP>
SXLEMAP is the root-enclosing element for the two SAS data set definitions.
COLUMN elements contain the attributes for the Name, Street, and City variables in the Person data set.
COLUMN element contains the attributes for the Name variable in the Prescription data set. Specifying the retain="YES" attribute causes the name to be held for each observation until it is replaced by a different value. (The retain= attribute is like the SAS DATA step RETAIN statement, which causes a variable to retain its value from one iteration of the DATA step to the next.)
COLUMN elements contain the attributes for the Number and Drug variables in the Prescription data set.
The following SAS statements import the XML document and specify the XMLMap:
filename pharm 'c:\My Documents\XML\Pharmacy.xml'; filename map 'c:\My Documents\XML\Pharmacy.map'; libname pharm xml xmlmap=map;
The DATASETS procedure verifies that SAS interprets the XML document Pharmacy.XML as two SAS data sets: PHARM.PERSON and PHARM.PRESCRIPTION.
proc datasets library=pharm;
PROC DATASETS Output for the PHARM Data Library
5 proc datasets library=pharm; Directory Libref PHARM Engine XML Physical Name PHARM XMLType GENERIC XMLMap MAP Member # Name Type 1 PERSON DATA 2 PRESCRIPTION DATA
Here is PROC PRINT output for both of the imported SAS data sets.
PROC PRINT Output for PHARM.PERSON
The SAS System 1 Obs NAME STREET CITY 1 Brad Martin 11900 Glenda Court Austin 2 Jim Spano 1611 Glengreen Austin
PROC PRINT Output for PHARM.PRESCRIPTION
The SAS System 2 Obs NAME NUMBER DRUG 1 Brad Martin 1234 Tetracycline 2 Brad Martin 1245 Lomotil 3 Jim Spano 1268 Nexium
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.