XML documents often
contain hierarchical data in that the data is structured into different
levels like a company organization chart. Hierarchical structures
are one-to-many relationships. Top items have one or more items below
it (for example, customer to orders).
This example explains
how to define an XMLMap in order to import an XML document as two
data sets that have related information.
Here is the XML document
Pharmacy.XML. The file contains hierarchical data with related entities
in the form of individual customers and their prescriptions. Each
customer can have one or multiple prescriptions. Notice that PRESCRIPTION
elements are nested within each <PERSON> start tag and </PERSON>
end tag:
<?xml version="1.0" ?>
<PHARMACY>
<PERSON>
<NAME>Brad Martin</NAME>
<STREET>11900 Glenda Court</STREET>
<CITY>Austin</CITY>
<PRESCRIPTION>
<NUMBER>1234</NUMBER>
<DRUG>Tetracycline</DRUG>
</PRESCRIPTION>
<PRESCRIPTION>
<NUMBER>1245</NUMBER>
<DRUG>Lomotil</DRUG>
</PRESCRIPTION>
</PERSON>
<PERSON>
<NAME>Jim Spano</NAME>
<STREET>1611 Glengreen</STREET>
<CITY>Austin</CITY>
<PRESCRIPTION>
<NUMBER>1268</NUMBER>
<DRUG>Nexium</DRUG>
</PRESCRIPTION>
</PERSON>
</PHARMACY>
To import separate data
sets, one describing the customers and the other containing prescription
information, a relation between each customer and associated prescriptions
must be designated in order to know which prescriptions belong to
each customer.
An XMLMap defines how
to translate the XML markup into two SAS data sets. The Person data
set imports the name and address of each customer, and the Prescription
data set imports the customer's name, prescription number, and drug.
Notations in the XMLMap syntax are explained below.
Note: The XMLMap was generated
by using SAS XML Mapper.
<?xml version="1.0" encoding="UTF-8"?>
<!-- ############################################################ -->
<!-- 2011-01-10T14:39:38 -->
<!-- SAS XML Libname Engine Map -->
<!-- Generated by XML Mapper, 903000.1.0.20101208190000_v930 -->
<!-- ############################################################ -->
<!-- ### Validation report ### -->
<!-- ############################################################ -->
<!-- XMLMap validation completed successfully. -->
<!-- ############################################################ -->
<SXLEMAP name="AUTO_GEN" version="2.1"> 1
<NAMESPACES count="0"/>
<!-- ############################################################ -->
<TABLE description="PERSON" name="PERSON"> 2
<TABLE-PATH syntax="XPath">/PHARMACY/PERSON</TABLE-PATH>
<COLUMN name="NAME"> 3
<PATH syntax="XPath">/PHARMACY/PERSON/NAME</PATH>
<TYPE>character</TYPE>
<DATATYPE>string</DATATYPE>
<LENGTH>11</LENGTH>
</COLUMN>
<COLUMN name="STREET"> 3
<PATH syntax="XPath">/PHARMACY/PERSON/STREET</PATH>
<TYPE>character</TYPE>
<DATATYPE>string</DATATYPE>
<LENGTH>18</LENGTH>
</COLUMN>
<COLUMN name="CITY"> 3
<PATH syntax="XPath">/PHARMACY/PERSON/CITY</PATH>
<TYPE>character</TYPE>
<DATATYPE>string</DATATYPE>
<LENGTH>6</LENGTH>
</COLUMN>
</TABLE>
<!-- ############################################################ -->
<TABLE description="PRESCRIPTION" name="PRESCRIPTION"> 4
<TABLE-PATH syntax="XPath">/PHARMACY/PERSON/PRESCRIPTION</TABLE-PATH>
<COLUMN name="NUMBER" retain="YES"> 6
<PATH syntax="XPath">/PHARMACY/PERSON/PRESCRIPTION/NUMBER</PATH>
<TYPE>numeric</TYPE>
<DATATYPE>integer</DATATYPE>
</COLUMN>
<COLUMN name="DRUG"> 6
<PATH syntax="XPath">/PHARMACY/PERSON/PRESCRIPTION/DRUG</PATH>
<TYPE>character</TYPE>
<DATATYPE>string</DATATYPE>
<LENGTH>12</LENGTH>
</COLUMN>
</TABLE>
</SXLEMAP>
1 |
SXLEMAP
is the root-enclosing element for the two SAS data set definitions.
|
2 |
First
TABLE element defines the Person data set.
|
3 |
COLUMN
elements contain the attributes for the Name, Street, and City variables
in the Person data set.
|
4 |
Second
TABLE element defines the Prescription data set.
|
5 |
COLUMN
element contains the attributes for the Name variable in the Prescription
data set. Specifying the retain="YES" attribute
causes the name to be held for each observation until it is replaced
by a different value. (The retain= attribute is like the SAS DATA
step RETAIN statement, which causes a variable to retain its value
from one iteration of the DATA step to the next.)
|
6 |
COLUMN
elements contain the attributes for the Number and Drug variables
in the Prescription data set.
|
The following SAS statements
import the XML document and specify the XMLMap:
filename pharm 'c:\My Documents\Pharmacy.xml';
filename map 'c:\My Documents\Pharmacy.map';
libname pharm xmlv2 xmlmap=map;
quit;
The DATASETS procedure
verifies that SAS interprets the XML document Pharmacy.XML as two
SAS data sets: PHARM.PERSON and PHARM.PRESCRIPTION.
proc datasets library=pharm;
quit;
DATASETS Procedure Output for PHARM Library
Here is the PRINT procedure
output for both of the imported SAS data sets.
PRINT Procedure Output for PHARM.PERSON
PRINT Procedure Output for PHARM.PRESCRIPTION