Previous Page | Next Page

Importing XML Documents Using an XMLMap

Importing Hierarchical Data as Related Data Sets

XML documents often contain hierarchical data in that the data is structured into different levels like a company organization chart. Hierarchical structures are one-to-many relationships, with top items having one or more items below it (for example, customer to orders).

This example explains how to define an XMLMap in order to import an XML document as two data sets that have related information.

First, here is the XML document Pharmacy.XML. The file contains hierarchical data with related entities in the form of individual customers and their prescriptions. Each customer can have one or multiple prescriptions. Notice that PRESCRIPTION elements are nested within each <PERSON> start tag and </PERSON> end tag:

<?xml version="1.0" ?>
<PHARMACY>
  <PERSON>
    <NAME>Brad Martin</NAME>
    <STREET>11900 Glenda Court</STREET>
    <CITY>Austin</CITY>
   <PRESCRIPTION>
    <NUMBER>1234</NUMBER>
    <DRUG>Tetracycline</DRUG>
  </PRESCRIPTION>
  <PRESCRIPTION>
    <NUMBER>1245</NUMBER>
    <DRUG>Lomotil</DRUG>
  </PRESCRIPTION>
  </PERSON>
 <PERSON>
    <NAME>Jim Spano</NAME>
    <STREET>1611 Glengreen</STREET>
    <CITY>Austin</CITY>
  <PRESCRIPTION>
    <NUMBER>1268</NUMBER>
    <DRUG>Nexium</DRUG>
  </PRESCRIPTION>
 </PERSON>
</PHARMACY>
         

To import separate data sets, one describing the customers and the other containing prescription information, a relation between each customer and associated prescriptions must be designated in order to know which prescriptions belong to each customer.

An XMLMap defines how to translate the XML markup into two SAS data sets. The customer table imports the name and address of each customer, and the prescription table imports the customer's name, prescription number, and drug. Notations in the XMLMap syntax are explained below.

Note:   The XMLMap was generated by using SAS XML Mapper.  [cautionend]

<?xml version="1.0" encoding="UTF-8"?>

<!-- ############################################################ -->
<!-- 2003-04-08T15:03:16 -->
<!-- SAS XML Libname Engine Map -->
<!-- Generated by XML Mapper, 9.1.10.20030407.1378 -->
<!-- ############################################################ -->

<SXLEMAP version="1.2" name="SXLEMAP"> 1 

  <!-- ############################################################ -->
  <TABLE name="PERSON"> 2 
    <TABLE-PATH syntax="XPATH">/PHARMACY/PERSON</TABLE-PATH>

    <COLUMN name="NAME"> 3 
      <PATH  syntax="XPATH">/PHARMACY/PERSON/NAME</PATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>11</LENGTH>
    </COLUMN>

    <COLUMN name="STREET"> 3 
      <PATH  syntax="XPATH">/PHARMACY/PERSON/STREET</PATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>18</LENGTH>
    </COLUMN>

    <COLUMN name="CITY"> 3 
      <PATH  syntax="XPATH">/PHARMACY/PERSON/CITY</PATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>6</LENGTH>
    </COLUMN>

  </TABLE>

  <!-- ############################################################ -->
  <TABLE name="PRESCRIPTION"> 4 
    <TABLE-PATH syntax="XPATH">/PHARMACY/PERSON/PRESCRIPTION</TABLE-PATH>

    <COLUMN name="NAME" retain="YES"> 5 
      <PATH  syntax="XPATH">/PHARMACY/PERSON/NAME</PATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>11</LENGTH>
    </COLUMN>

    <COLUMN name="NUMBER"> 6 
      <PATH  syntax="XPATH">/PHARMACY/PERSON/PRESCRIPTION/NUMBER</PATH>
      <TYPE>numeric</TYPE>
      <DATATYPE>integer</DATATYPE>
    </COLUMN>

    <COLUMN name="DRUG"> 6 
      <PATH  syntax="XPATH">/PHARMACY/PERSON/PRESCRIPTION/DRUG</PATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>12</LENGTH>
    </COLUMN>

  </TABLE>

</SXLEMAP>

  1. SXLEMAP is the root-enclosing element for the two SAS data set definitions.

  2. First TABLE element defines the Person data set.

  3. COLUMN elements contain the attributes for the Name, Street, and City variables in the Person data set.

  4. Second TABLE element defines the Prescription data set.

  5. COLUMN element contains the attributes for the Name variable in the Prescription data set. Specifying the retain="YES" attribute causes the name to be held for each observation until it is replaced by a different value. (The retain= attribute is like the SAS DATA step RETAIN statement, which causes a variable to retain its value from one iteration of the DATA step to the next.)

  6. COLUMN elements contain the attributes for the Number and Drug variables in the Prescription data set.

The following SAS statements import the XML document and specify the XMLMap:

filename pharm 'c:\My Documents\XML\Pharmacy.xml';
filename map 'c:\My Documents\XML\Pharmacy.map';
libname pharm xml xmlmap=map;

The DATASETS procedure verifies that SAS interprets the XML document Pharmacy.XML as two SAS data sets: PHARM.PERSON and PHARM.PRESCRIPTION.

proc datasets library=pharm;

PROC DATASETS Output for the PHARM Data Library

5    proc datasets library=pharm;
                                           Directory

                                     Libref         PHARM
                                     Engine         XML
                                     Physical Name  PHARM
                                     XMLType        GENERIC
                                     XMLMap         MAP


                                                      Member
                                     #  Name          Type

                                     1  PERSON        DATA
                                     2  PRESCRIPTION  DATA

Here is PROC PRINT output for both of the imported SAS data sets.

PROC PRINT Output for PHARM.PERSON

                          The SAS System                          1

    Obs    NAME               STREET                CITY      

      1    Brad Martin        11900 Glenda Court    Austin    
      2    Jim Spano          1611 Glengreen        Austin    

PROC PRINT Output for PHARM.PRESCRIPTION

                          The SAS System                          2

            Obs    NAME               NUMBER    DRUG

              1    Brad Martin         1234     Tetracycline
              2    Brad Martin         1245     Lomotil
              3    Jim Spano           1268     Nexium
              

Previous Page | Next Page | Top of Page