Using an XMLMap to Import an XML Document as Multiple SAS Data Sets

This example explains how to create and use an XMLMap that maps XML markup into two SAS data sets. The example uses the XML document RSS.XML, which cannot be imported without an XMLMap because its markup is not structured in a way that the XML engine can translate directly.
Note: The XML document RSS.XML uses the XML format RSS (Rich Site Summary), which was originally designed by Netscape for exchanging content within the My Netscape Network (MNN) community. The RSS format has been widely adopted for sharing headlines and other Web content and is a good example of XML as a transmission format.
Here is the XML document RSS.XML to be imported:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<rss version="0.91">
  <channel>
    <title>WriteTheWeb</title>
    <link>http://writetheweb.com</link>
    <description>News for web users that write back</description>
    <language>en-us</language>
    <copyright>Copyright 2000, WriteTheWeb team.</copyright>
    <managingEditor>editor@writetheweb.com</managingEditor>
    <webMaster>webmaster@writetheweb.com</webMaster>
    <image>
      <title>WriteTheWeb</title>
      <url>http://writetheweb.com/images/mynetscape88.gif</url>
      <link>http://writetheweb.com</link>
      <width>88</width>
      <height>31</height>
      <description>News for web users that write back</description>
    </image>
    <item>
      <title>Giving the world a pluggable Gnutella</title>
      <link>http://writetheweb.com/read.php?item=24</link>
      <description>WorldOS is a framework on which to build programs that work like
        Freenet or Gnutella -allowing distributed applications using peer-to-peer
        routing.
      </description>
    </item>
    <item>
      <title>Syndication discussions hot up</title>
      <link>http://writetheweb.com/read.php?item=23</link>
      <description>After a period of dormancy, the Syndication mailing list has become
        active again, with contributions from leaders in traditional media and Web
        syndication.
      </description>
    </item>
    <item>
      <title>Personal web server integrates file sharing and messaging</title>
      <link>http://writetheweb.com/read.php?item=22</link>
      <description>The Magi Project is an innovative project to create a combined personal
        web server and messaging system that enables the sharing and synchronization of
        information across desktop, laptop and palmtop devices.</description>
    </item>
    <item>
      <title>Syndication and Metadata</title>
      <link>http://writetheweb.com/read.php?item=21</link>
      <description>RSS is probably the best known metadata format around. RDF is probably
        one of the least understood. In this essay, published on my O'Reilly Network
        weblog, I argue that the next generation of RSS should be based on RDF.
      </description>
    </item>
    <item>
      <title>UK bloggers get organised</title>
      <link>http://writetheweb.com/read.php?item=20</link>
      <description>Looks like the weblogs scene is gathering pace beyond the shores of the
        US. There's now a UK-specific page on weblogs.com, and a mailing list at egroups.
      </description>
    </item>
    <item>
      <title>Yournamehere.com more important than anything</title>
      <link>http://writetheweb.com/read.php?item=19</link>
      <description>Whatever you're publishing on the web, your site name is the most
        valuable asset you have, according to Carl Steadman.</description>
    </item>
  </channel>
</rss>
The XML document can be imported successfully by creating an XMLMap that defines how to map the XML markup. The following XMLMap, named RSS.MAP, contains the syntax that is needed to import RSS.XML. The syntax tells the XML engine how to interpret the XML markup, as explained in the numbered descriptions that follow the map. The contents of RSS.XML result in two SAS data sets: CHANNEL, which contains content information, and ITEMS, which contains the individual news stories.
<?xml version="1.0" encoding="UTF-8"?>

<SXLEMAP name="SXLEMap" version="2.1"> 1

    <TABLE name="CHANNEL"> 2
        <TABLE-PATH syntax="XPath">/rss/channel</TABLE-PATH> 3
        <TABLE-END-PATH beginend="BEGIN" syntax="XPath">
              /rss/channel/item</TABLE-END-PATH> 4

        <COLUMN name="title"> 5
            <PATH syntax="XPath">/rss/channel/title</PATH>
            <TYPE>character</TYPE>
            <DATATYPE>string</DATATYPE>
            <LENGTH>200</LENGTH>
        </COLUMN>

        <COLUMN name="link"> 6
            <PATH syntax="XPath">/rss/channel/link</PATH>
            <DESCRIPTION>Story link</DESCRIPTION>
            <TYPE>character</TYPE>
            <DATATYPE>string</DATATYPE>
            <LENGTH>200</LENGTH>
        </COLUMN>

        <COLUMN name="description">
            <PATH syntax="XPath">/rss/channel/description</PATH>
            <TYPE>character</TYPE>
            <DATATYPE>string</DATATYPE>
            <LENGTH>1024</LENGTH>
        </COLUMN>

        <COLUMN name="language">
            <PATH syntax="XPath">/rss/channel/language</PATH>
            <TYPE>character</TYPE>
            <DATATYPE>string</DATATYPE>
            <LENGTH>8</LENGTH>
        </COLUMN>

        <COLUMN name="version"> 7
            <PATH syntax="XPath">/rss/@version</PATH>
            <TYPE>character</TYPE>
            <DATATYPE>string</DATATYPE>
            <LENGTH>8</LENGTH>
        </COLUMN>

    </TABLE>

    <TABLE description="Individual news stories" name="ITEMS"> 8
        <TABLE-PATH syntax="XPath">/rss/channel/item</TABLE-PATH>

        <COLUMN name="title"> 9
            <PATH syntax="XPath">/rss/channel/item/title</PATH>
            <TYPE>character</TYPE>
            <DATATYPE>string</DATATYPE>
            <LENGTH>200</LENGTH>
        </COLUMN>

        <COLUMN name="URL"> 10
            <PATH syntax="XPath">/rss/channel/item/link</PATH>
            <DESCRIPTION>Story link</DESCRIPTION>
            <TYPE>character</TYPE>
            <DATATYPE>string</DATATYPE>
            <LENGTH>200</LENGTH>
        </COLUMN>

        <COLUMN name="description"> 10
            <PATH syntax="XPath">/rss/channel/item/description</PATH>
            <TYPE>character</TYPE>
            <DATATYPE>string</DATATYPE>
            <LENGTH>1024</LENGTH>
        </COLUMN>

    </TABLE>

</SXLEMAP>
The previous XMLMap defines how to translate the XML markup as explained below:
1 Root-enclosing element for SAS data set definitions.
2 Element for the CHANNEL data set definition.
3 Element specifying the location path that defines where in the XML document to collect variables for the CHANNEL data set.
4 Element specifying the location path that specifies when to stop processing data for the CHANNEL data set.
5 Element containing the attributes for the TITLE variable in the CHANNEL data set. The XPath construction specifies where to find the current tag and to access data from the named element.
6 Subsequent COLUMN elements define the variables LINK, DESCRIPTION, and LANGUAGE for the CHANNEL data set.
7 Element containing the attributes for the last variable in the CHANNEL data set, which is VERSION. This XPath construction specifies where to find the current tag and uses the attribute form to access data from the named attribute.
8 Element for the ITEMS data set definition.
9 Element containing the attributes for the TITLE variable in the ITEMS data set.
10 Subsequent COLUMN elements define other variables for the ITEMS data set, which are URL and DESCRIPTION.
The following SAS statements import the XML document RSS.XML and specify the XMLMap named RSS.MAP. The DATASETS procedure then verifies the import results.
filename rss 'C:\My Documents\rss.xml';
filename map 'C:\My Documents\rss.map';

libname rss xmlv2 xmlmap=map access=readonly;

proc datasets library=rss;
run;
quit;
DATASETS Procedure Output for RSS Library Showing Two Data Sets
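Beyond listing the library members, it can help to look inside the two data sets to confirm that the XMLMap produced the intended variables. The following is a minimal sketch, not part of the original example. It assumes that the RSS libref from the LIBNAME statement above is still assigned, and the WORK.ITEMS copy is a hypothetical name chosen for illustration. Because RSS.XML holds one channel element and six item elements, one would expect CHANNEL to contain a single observation and ITEMS to contain six.

```sas
/* Assumes the RSS libref from the LIBNAME statement above is still assigned. */

/* List the variables that RSS.MAP defined for each data set. */
proc contents data=rss.channel varnum;
run;

proc contents data=rss.items varnum;
run;

/* Print the story titles and links. TITLE and URL are the
   column names that RSS.MAP assigns in the ITEMS table. */
proc print data=rss.items;
   var title url;
run;

/* Optional: copy the stories into WORK (hypothetical data set name)
   so that later steps do not re-read the XML document. */
data work.items;
   set rss.items;
run;
```

Copying into WORK is a design choice rather than a requirement: the XML engine reads the document each time a data set is referenced, so a native copy may be more convenient for repeated processing.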