Using an XMLMap to Import an XML Document as One SAS Data Set

This example explains how to create and use an XMLMap in order to tell the XML engine how to map XML markup to a SAS data set, variables, and observations.
Here is the XML document NHL.XML to be imported. Although simply constructed and relatively easy for you to read, it does not import successfully because its XML markup is not in the required physical structure:
<?xml version="1.0" encoding="iso-8859-1" ?>
<NHL>
  <CONFERENCE> Eastern
    <DIVISION> Southeast
      <TEAM name="Thrashers"  abbrev="ATL" />
      <TEAM name="Hurricanes" abbrev="CAR" />
      <TEAM name="Panthers"   abbrev="FLA" />
      <TEAM name="Lightning"  abbrev="TB" />
      <TEAM name="Capitals"   abbrev="WSH" />
   </DIVISION>
 </CONFERENCE>

 <CONFERENCE> Western
   <DIVISION> Pacific
     <TEAM name="Stars"   abbrev="DAL" />
     <TEAM name="Kings"   abbrev="LA" />
     <TEAM name="Ducks"   abbrev="ANA" />
     <TEAM name="Coyotes" abbrev="PHX" />
     <TEAM name="Sharks"  abbrev="SJ" />
   </DIVISION>
  </CONFERENCE>
</NHL>
To successfully import the XML document, an XMLMap is needed. After familiarizing yourself with the data to be imported, you can code the XMLMap syntax so that the data is successfully imported. Here is the XMLMap used to import the XML document, with notations for the data investigation:
<?xml version="1.0" ?>
<SXLEMAP version="2.1">
  <TABLE name="TEAMS"> 1
        <TABLE-PATH syntax="XPath"> 2
           /NHL/CONFERENCE/DIVISION/TEAM
         </TABLE-PATH>

        <COLUMN name="NAME"> 3
          <PATH> 5
             /NHL/CONFERENCE/DIVISION/TEAM@name
            </PATH>
            <TYPE>character</TYPE>
            <DATATYPE>STRING</DATATYPE>
            <LENGTH>30</LENGTH>
         </COLUMN>

        <COLUMN name="ABBREV"> 3
          <PATH> 5
           /NHL/CONFERENCE/DIVISION/TEAM/@abbrev
            </PATH>
            <TYPE>character</TYPE>
            <DATATYPE>STRING</DATATYPE>
            <LENGTH>3</LENGTH>
         </COLUMN>

       <COLUMN name="CONFERENCE" retain="YES"> 4
          <PATH>/NHL/CONFERENCE</PATH> 5
            <TYPE>character</TYPE>
            <DATATYPE>STRING</DATATYPE>
            <LENGTH>10</LENGTH>
        </COLUMN>

       <COLUMN name="DIVISION" retain="YES"> 4
           <PATH> 5
              /NHL/CONFERENCE/DIVISION
            </PATH>
            <TYPE>character</TYPE>
            <DATATYPE>STRING</DATATYPE>
            <LENGTH>10</LENGTH>
        </COLUMN>
  </TABLE>
</SXLEMAP>
The previous XMLMap syntax defines how to translate the XML markup as explained below using the following data investigation steps:
Note: The retain= attribute in the column definition forces retention of processed data values after an observation is written to the output data set. Because the foreign key fields occur outside the observation boundary (that is, they are more sparsely populated in the hierarchical XML data than in the SAS observation), their values for additional rows need to be retained as they are encountered.
1 Locate and identify distinct tables of information.
You want a SAS data set (table) that contains some of the teams of the National Hockey League. Because that is the only information contained in the XML document, you can define a single data set named TEAMS in the XMLMap. (Note that other XML documents might contain more than one table of related information. Importing multiple tables is supported by the XMLMap syntax as shown in Using an XMLMap to Import an XML Document as Multiple SAS Data Sets.)
2 Identify the SAS data set observation boundary, which translates into a collection of rows with a constant set of columns.
In the XML document, information about individual teams occurs in a <TEAM> tag located with <CONFERENCE> and <DIVISION> enclosures. You want a new observation generated each time a TEAM element is read.
3 Collect column definitions for each table.
For this XML document, the data content form is mixed. Some data occurs as XML PCDATA (for example, CONFERENCE), and other data is contained in attribute-value pairs (for example, NAME). Data types are all string values. The constructed observation will also include the team NAME and ABBREV. A length of 30 characters is sufficient for the NAME, and three characters is enough for the ABBREV field contents.
4 Add foreign keys or required external context.
You want to include information about the league orientation for the teams. Also, you want to extract CONFERENCE and DIVISION data.
Note: The retain= attribute in the column definition forces retention of processed data values after an observation is written to the output data set. Because the foreign key fields occur outside the observation boundary (that is, they are more sparsely populated in the hierarchical XML data than in the SAS observation), their values for additional rows need to be retained as they are encountered.
5 Define a location path for each variable definition.
The PATH element identifies a position in the XML document from which to extract data for each column. Element-parsed character data is treated differently than attribute values. There is no conditional selection criteria involved.
The following SAS statements import the XML document NHL.XML:
filename NHL 'C:\My Documents\XML\NHL.xml'; 1
filename MAP 'C:\My Documents\XML\NHL.map'; 2

libname NHL xmlv2 xmlmap=MAP; 3

proc print data=NHL.TEAMS; 4
run;
1 The first FILENAME statement assigns the file reference NHL to the physical location (complete pathname, filename, and file extension) of the XML document named NHL.XML.
2 The second FILENAME statement assigns the file reference MAP to the physical location of the XMLMap named NHL.MAP.
3 The LIBNAME statement uses the file reference NHL to reference the XML document. It specifies the XMLV2 engine and uses the file reference MAP to reference the XMLMap.
4 PROC PRINT produces output, verifying that the import was successful.
PRINT Procedure Output for NHL.TEAMS
PRINT Procedure Output for NHL.TEAMS