If your XML document does not import successfully, you can use the enhanced SAS XML LIBNAME engine, which is available as an add-on to Release 8.2, rather than transform the XML document. The enhanced engine provides additional syntax for importing an XML document: with the new syntax, you tell SAS how to map the XML markup into SAS format. This document describes the new XMLMap option and explains how to code the XMLMap file.
XMLMAP=fileref | 'external-file'
The first example specifies XMLMap as an option on the LIBNAME statement for the XML engine:
libname test xml 'C:\XMLdata\my.xml' xmlmap='C:\XMLdata\my.map';
proc print data=test.my;
run;
This example uses XMLMap as a data set option and also uses a fileref that is assigned to the XMLMap file:
filename map 'C:\XMLdata\my.map';
libname test xml 'C:\XMLdata\my.xml';
proc print data=test.my (xmlmap=map);
run;
TIP: On the LIBNAME statement for the XML engine, you can also use a fileref that is associated with the physical location of the XML document. In that case the libref has the same name as the fileref. For example:
filename myxml 'C:\XMLdata\my.xml';
filename map 'C:\XMLdata\my.map';
libname myxml xml xmlmap=map;
proc print data=myxml.my;
run;
TIP: To display an XMLMap file (with the file name extension .map) with Microsoft Internet Explorer 5, follow these steps so that the file can be viewed with Internet Explorer as XML, which provides a minimal validation for the syntax:
* The XML document RSS.XML uses the XML format RSS (Rich Site Summary), which was designed by Netscape originally for exchange of content within the My Netscape Network (MNN) community. The RSS format has been widely adopted for sharing headlines and other Web content and is a good example of XML as a transmission format. For more on RSS, see Introduction to RSS.
The SXLEMAP element can contain one or more TABLE elements. For example:
<SXLEMAP version="1.0">
<TABLE name="test1">
.
.
.
</TABLE>
<TABLE name="test2">
.
.
.
</TABLE>
</SXLEMAP>
<TABLE name="channel">
TABLE has this attribute: name=, which specifies the name of the SAS data set.
The TABLE element can contain one or more of the following elements that describe the data set attributes: TABLE_XPATH, TABLE_END_XPATH, TABLE_LABEL, and COLUMN.
<TABLE_XPATH> /rss/channel </TABLE_XPATH>
The above example causes SAS to do the following:
NOTE: Whether SAS resets to MISSING is determined by the DEFAULT element as well as the RETAIN= attribute for the COLUMN element.
* A location path tells how to locate and access specific elements in the XML document. Specify a valid Xpath construction in conformance with the World Wide Web Consortium (W3C). Note that Xpath syntax is case sensitive. All paths must begin with the root enclosing element (denoted by a slash '/') or with the "any parent" variant (denoted by double slashes '//'). Other W3C documented forms are not currently supported.
For example, in the XML document RSS.XML, there is only one <CHANNEL> begin tag and one </CHANNEL> end tag. With the TABLE_XPATH location path
<TABLE_XPATH> /rss/channel </TABLE_XPATH>
SAS would process the entire XML document, even though it does not store new data into the input buffer after it encounters the first <ITEM> begin tag, because the remaining elements no longer qualify. The following tells SAS to stop processing when the <ITEM> begin tag is encountered:
<TABLE_END_XPATH BeginEnd="Begin"> /rss/channel/item </TABLE_END_XPATH>
Therefore, with the two location path specifications, SAS would process only the highlighted data in the RSS.XML document for the CHANNEL data set, rather than the entire XML document:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<rss version="0.91">
<channel>
<title>WriteTheWeb</title>
<link>http://writetheweb.com</link>
<description>News for web users that write back
</description>
<language>en-us</language>
<copyright>Copyright 2000, WriteTheWeb team.
</copyright>
<managingEditor>editor@writetheweb.com
</managingEditor>
<webMaster>webmaster@writetheweb.com</webMaster>
<image>
<title>WriteTheWeb</title>
<url>http://writetheweb.com/images/mynetscape88.gif
</url>
<link>http://writetheweb.com</link>
<width>88</width>
<height>31</height>
<description>News for web users that write back
</description>
</image>
<item>
.
.
.
</channel>
</rss>
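Put together, the two location paths sit inside the TABLE element for the CHANNEL data set. The following is a minimal sketch assembled only from the elements shown above. The COLUMN definitions are omitted, and the XML comments are added for illustration, on the assumption that the map file accepts standard XML comments.

```xml
<TABLE name="CHANNEL">
  <!-- Collect variables for the data set from the channel element -->
  <TABLE_XPATH> /rss/channel </TABLE_XPATH>
  <!-- Stop processing when the first item begin tag is encountered -->
  <TABLE_END_XPATH BeginEnd="Begin"> /rss/channel/item </TABLE_END_XPATH>
  <!-- COLUMN elements for the CHANNEL variables go here -->
</TABLE>
```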
<TABLE_LABEL>Data Set contains TV channel information </TABLE_LABEL>
<COLUMN name="title">
COLUMN has these attributes:
COLUMN can contain one or more of the following elements that describe the variable attributes: DATATYPE, DEFAULT, ENUM, FORMAT, INFORMAT, LABEL, LENGTH, TYPE, and XPATH.
<DATATYPE> string </DATATYPE>
The type of data specification can be:
a date and time value of the form yyyy-mm-ddThh:mm:ss.nnnnnn.
a date value of the form yyyy-mm-dd.
a time value of the form hh:mm:ss.nnnnnn.
For example, the following assigns the value single when a missing value occurs:
<DEFAULT> single </DEFAULT>
By using ENUM, values in the XML document are verified against the list of values. If a value is not valid, then it is either set to MISSING (by default) or set to the value specified by the DEFAULT element. Note that a value specified for DEFAULT must be one of the ENUM values in order to be valid. For example:
<COLUMN name="filing_status">
.
.
.
<DEFAULT> single </DEFAULT>
.
.
.
<ENUM>
<VALUE> single </VALUE>
<VALUE> married filing joint return </VALUE>
<VALUE> married filing separate return </VALUE>
<VALUE> head of household </VALUE>
<VALUE> qualifying widow(er) </VALUE>
</ENUM>
</COLUMN>
For example,
<FORMAT> IS8601DA </FORMAT>
<FORMAT WIDTH="8"> best </FORMAT>
<FORMAT WIDTH="8" NDEC="2"> dollar </FORMAT>
For example,
<INFORMAT> IS8601DA </INFORMAT>
<INFORMAT WIDTH="8"> best </INFORMAT>
<INFORMAT WIDTH="8" NDEC="2"> dollar </INFORMAT>
<LABEL>Story link</LABEL>
<LENGTH> 200 </LENGTH>
TIP: You can use LENGTH to truncate a long field.
TIP: To assign a floating point type, use <DATATYPE> float </DATATYPE>.
TIP: To apply formatting, use the FORMAT element.
<TYPE>numeric</TYPE>
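As an illustration of how TYPE combines with the other variable attributes, here is a sketch of a numeric column that is stored as a floating point value and displayed with a dollar format. The column name and the location path are hypothetical and do not come from RSS.XML; the element names and the FORMAT attributes are the ones described in this document.

```xml
<!-- Hypothetical column: the name and location path are illustrative only -->
<COLUMN name="price">
  <XPATH> /catalog/item/price </XPATH>
  <TYPE> numeric </TYPE>
  <DATATYPE> float </DATATYPE>
  <FORMAT WIDTH="8" NDEC="2"> dollar </FORMAT>
</COLUMN>
```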
<XPATH> /rss/channel/title </XPATH>
The above example tells SAS to scan the XML markup until it finds the specific TITLE element. SAS retrieves the value between the <TITLE> begin tag and the </TITLE> end tag. That is, for the TITLE variable in the CHANNEL data set, SAS would retrieve the highlighted value in the RSS.XML document:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<rss version="0.91">
<channel>
<title>WriteTheWeb</title>
<link>http://writetheweb.com</link>
<description>News for web users that write back
</description>
<language>en-us</language>
<copyright>Copyright 2000, WriteTheWeb team.
</copyright>
<managingEditor>editor@writetheweb.com
</managingEditor>
<webMaster>webmaster@writetheweb.com</webMaster>
<image>
<title>WriteTheWeb</title>
<url>http://writetheweb.com/images/mynetscape88.gif
</url>
<link>http://writetheweb.com</link>
<width>88</width>
<height>31</height>
<description>News for web users that write back
</description>
</image>
<item>
.
.
.
</channel>
</rss>
<XPATH> /rss@version </XPATH>
The above example tells SAS to scan the XML markup until it finds the specific RSS element. SAS retrieves the value from the VERSION attribute in the RSS element. That is, for the VERSION variable in the CHANNEL data set, SAS would retrieve the highlighted value in the RSS.XML document:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<rss version="0.91">
<channel>
<title>WriteTheWeb</title>
<link>http://writetheweb.com</link>
<description>News for web users that write back
</description>
<language>en-us</language>
<copyright>Copyright 2000, WriteTheWeb team.
</copyright>
<managingEditor>editor@writetheweb.com
</managingEditor>
<webMaster>webmaster@writetheweb.com</webMaster>
<image>
<title>WriteTheWeb</title>
<url>http://writetheweb.com/images/mynetscape88.gif
</url>
<link>http://writetheweb.com</link>
<width>88</width>
<height>31</height>
<description>News for web users that write back
</description>
</image>
<item>
.
.
.
</channel>
</rss>
<XPATH> /constant[@name="PI"] </XPATH>
If the XML contains the following,
<constant name="PI">3.14159</constant>
the above example tells SAS to scan the XML markup until it finds the specific CONSTANT element where the value of the NAME= attribute is PI. SAS would retrieve the value 3.14159.
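A COLUMN definition that uses this location path might look like the following sketch. The column name and the choice of a numeric, floating point variable are assumptions made for illustration; only the XPATH value comes from the example above.

```xml
<!-- Hypothetical column that reads the value of the PI constant -->
<COLUMN name="pi">
  <XPATH> /constant[@name="PI"] </XPATH>
  <TYPE> numeric </TYPE>
  <DATATYPE> float </DATATYPE>
</COLUMN>
```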
Rather than transform RSS.XML using XSL, the XML document can be successfully imported by creating an XMLMap file that tells SAS how to translate the XML markup.
| Map Syntax | Description |
|---|---|
| <SXLEMAP version="1.0"> | Root enclosing element for SAS data set definitions. |
| <TABLE name="CHANNEL"> | Element for the CHANNEL data set definition. |
| <TABLE_XPATH>/rss/channel</TABLE_XPATH> | Element specifying the location path that defines where in the XML document to collect variables for the CHANNEL data set. |
| <TABLE_END_XPATH BeginEnd="Begin">/rss/channel/item </TABLE_END_XPATH> | Element specifying the location path that tells SAS when to stop processing data for the CHANNEL data set. |
| <COLUMN name="title"> <XPATH>/rss/channel/title</XPATH> <TYPE>character</TYPE> <DATATYPE>string</DATATYPE> <LENGTH>200</LENGTH> </COLUMN> | Element containing the attributes for the TITLE variable in the CHANNEL data set. The Xpath construction tells SAS where to find the current tag and to access data from the named element. |
| <COLUMN> . . . </COLUMN> | Subsequent COLUMN elements define the variables LINK, DESCRIPTION, and LANGUAGE for the CHANNEL data set. |
| <COLUMN name="version"> <XPATH>/rss@version</XPATH> <TYPE>character</TYPE> <DATATYPE>string</DATATYPE> <LENGTH>8</LENGTH> </COLUMN> | Element containing the attributes for the last variable in the CHANNEL data set, which is VERSION. This Xpath construction tells SAS where to find the current tag and uses the attribute form to access data from the named attribute. |
| <TABLE name="ITEMS"> <TABLE_XPATH>/rss/channel/item</TABLE_XPATH> | When the second-level element changes, SAS interprets a different SAS data set, which is ITEMS. |
| <COLUMN name="title"> <XPATH>/rss/channel/item/title</XPATH> <TYPE>character</TYPE> <DATATYPE>string</DATATYPE> <LENGTH>200</LENGTH> </COLUMN> | Element containing the attributes for the TITLE variable in the ITEMS data set. |
| <COLUMN> . . . </COLUMN> | Subsequent COLUMN elements define other variables for the ITEMS data set, which are URL and DESCRIPTION. |
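Read top to bottom, the fragments in the table form one XMLMap file. The sketch below assembles only the pieces that the table spells out. The closing </TABLE> and </SXLEMAP> tags follow the structure shown earlier for the SXLEMAP element, and the columns that the table elides are marked with comments rather than filled in. It is assumed here that the map file accepts standard XML comments.

```xml
<SXLEMAP version="1.0">
  <TABLE name="CHANNEL">
    <TABLE_XPATH>/rss/channel</TABLE_XPATH>
    <TABLE_END_XPATH BeginEnd="Begin">/rss/channel/item</TABLE_END_XPATH>
    <COLUMN name="title">
      <XPATH>/rss/channel/title</XPATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>200</LENGTH>
    </COLUMN>
    <!-- COLUMN elements for LINK, DESCRIPTION, and LANGUAGE go here
         (not spelled out in the table) -->
    <COLUMN name="version">
      <XPATH>/rss@version</XPATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>8</LENGTH>
    </COLUMN>
  </TABLE>
  <TABLE name="ITEMS">
    <TABLE_XPATH>/rss/channel/item</TABLE_XPATH>
    <COLUMN name="title">
      <XPATH>/rss/channel/item/title</XPATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>200</LENGTH>
    </COLUMN>
    <!-- COLUMN elements for URL and DESCRIPTION go here
         (not spelled out in the table) -->
  </TABLE>
</SXLEMAP>
```

Because both data sets define a TITLE variable, each COLUMN uses a full location path, so that /rss/channel/title and /rss/channel/item/title resolve to different elements.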
filename map 'C:\My Documents\xml\rss.map';
libname rss xml 'C:\My Documents\xml\rss.xml' xmlmap=map;
proc datasets library=rss;
-----Directory-----
Libref: RSS
Engine: XML
Physical Name: C:\My Documents\xml\rss.xml
XMLType: GENERIC
XMLMap: MAP
# Name Memtype
-------------------
1 CHANNEL DATA
2 ITEMS DATA
proc contents data=rss.channel;
run;
proc contents data=rss.items;
run;
Here is the PROC CONTENTS output for RSS.CHANNEL and RSS.ITEMS.