Using the SAS 9.0 XML LIBNAME Engine
This article summarizes the SAS 9.0 enhancements for the XML LIBNAME engine and introduces
the new XML Atlas application.
Since the introduction of the SAS XML LIBNAME engine (SXLE) in
SAS Release 8.1,
each subsequent SAS release has improved
the importing and exporting capabilities,
providing enhancements and new functionality.
SAS Version 9 continues the trend!
Using SXLE to import and export an XML document
offers these enhancements in SAS Version 9:
-
SXLE imports a broader variety of XML documents.
Introduced as an add-on to SAS Release 8.2, the XMLMAP= option,
which enables you to specify a separate XML document called an XMLMap file,
is production in SAS Version 9.
The XMLMap syntax is upgraded from Version 1.0 to Version 1.1.
-
XML Atlas, which is preproduction in Version 9, is a new Java
application to help you create an XMLMap file.
To successfully import an XML document, SXLE requires a specific
XML physical structure so that the engine can identify columns
of data from collections of rows. If your XML document does not import
successfully, you can tell SXLE how to interpret the XML markup
in order to successfully import the XML document. You create a
separate XML document, called an
XMLMap file, that contains specific
XMLMap syntax, which is XML markup. The XMLMap syntax tells SXLE
how to interpret the XML markup into SAS data set(s), variables (columns),
and observations (rows).
For SAS Version 9, the XMLMap syntax is Version 1.1, with several
enhancements that are summarized below.
The XMLMap elements now comply with the
World Wide Web Consortium (W3C) recommended usage guidelines:
-
XMLMap markup, like XML itself, is case sensitive. The tag names must be
uppercase, and attributes must be lowercase. For example:
<TABLE name="channel">
-
Underscores (_) used in tag names for XMLMap elements
are changed to hyphens (-). For example, the XMLMap element
TABLE_DESCRIPTION is now TABLE-DESCRIPTION.
-
Several elements have new attributes.
The SXLEMAP element, which is the primary (root) enclosing element
to contain the definition of the data set(s),
accepts attributes for the syntax version number,
the name of the XMLMap file, and a description.
For example:
<SXLEMAP version="1.1" name="Myxmlmap" description="sample XMLMap">
The version attribute specifies the version of the XMLMap syntax.
SAS Version 9 upgrades XMLMap syntax to Version 1.1.
However, to use the
Version 1.1 syntax, you must specify the attribute
version="1.1". The default is
1.0 and is retained for compatibility with the prior release of XMLMap.
It is recommended that you update existing XMLMap files to Version 1.1.
Tip: To automatically update an XMLMap file
to Version 1.1, load the Version 1.0 XMLMap file into XML Atlas,
then save the file.
The TABLE-PATH element is renamed from TABLE_XPATH.
The element specifies a location path that tells SXLE where in the XML
document to locate and access specific elements in order
to collect variables for the SAS data set. The location path
defines the repeating element instances in the XML document,
which is the SAS data set observation boundary. The observation
boundary is translated into a collection of rows with a constant
set of columns.
TABLE-PATH accepts a
syntax type attribute.
For Version 1.1, the supported syntax is a valid XPath
construction in compliance with the W3C.
For example:
<TABLE-PATH syntax="xpath"> /rss/channel </TABLE-PATH>
CAUTION:
Specifying the table location path, which is the
observation boundary, can be tricky due to
start-tag and end-tag pairing.
The table location path determines which end tag causes SXLE
to write the completed input buffer to the SAS data set.
If you do not identify the appropriate end tag,
the result could be concatenated data instead
of separate observations or an unexpected set of columns.
See "Why is specifying the observation boundary tricky?"
The TABLE-END-PATH element is renamed from TABLE_END_XPATH.
It is an optional optimization element that saves
resources by stopping the processing of the XML document
before the end of file.
By default, processing continues until the last end tag in the
XML document.
If you specify TABLE-END-PATH, the location path tells SXLE
where in the XML document to locate and access a specific element
in order to stop processing the XML document.
Specifying a location to stop processing is useful for
XML documents that are hierarchical, but generally not
appropriate for repeating instance data.
Note that the TABLE-END-PATH element does not affect
the observation boundary; that is
determined with the TABLE-PATH element.
TABLE-END-PATH accepts a
syntax type attribute and an attribute to specify to stop processing
when either the element start tag or element end tag is encountered.
For Version 1.1, the supported syntax is a valid XPath
construction in compliance with the W3C.
For example:
<TABLE-END-PATH syntax="xpath" beginend="Begin">
/rss/channel/item </TABLE-END-PATH>
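The early-stop behavior can be pictured outside SAS. The following Python sketch is only a model of the idea, not SXLE's implementation: it assumes a SAX-style parser and abandons the parse as soon as the start tag at the stop path is seen, which corresponds to beginend="Begin". The names StopParsing, EarlyStopHandler, and read_until, and the sample RSS content, are hypothetical.

```python
import xml.sax


class StopParsing(Exception):
    """Raised by the handler to abandon the parse early (hypothetical helper)."""


class EarlyStopHandler(xml.sax.ContentHandler):
    """Collects text found at value_path until the start tag at end_path is seen."""

    def __init__(self, value_path, end_path):
        super().__init__()
        self.value_path = value_path
        self.end_path = end_path
        self.stack = []    # names of the currently open elements
        self.values = []   # text collected at value_path
        self.seen = 0      # number of start tags processed

    def _path(self):
        return "/" + "/".join(self.stack)

    def startElement(self, name, attrs):
        self.stack.append(name)
        self.seen += 1
        if self._path() == self.end_path:
            raise StopParsing()   # models beginend="Begin"
        if self._path() == self.value_path:
            self.values.append("")

    def characters(self, content):
        if self.stack and self._path() == self.value_path:
            self.values[-1] += content

    def endElement(self, name):
        self.stack.pop()


def read_until(document, value_path, end_path):
    """Return the collected values and the number of start tags that were read."""
    handler = EarlyStopHandler(value_path, end_path)
    try:
        xml.sax.parseString(document.encode("utf-8"), handler)
    except StopParsing:
        pass   # the stop path was reached; the rest of the document is skipped
    return [value.strip() for value in handler.values], handler.seen


# Made-up sample data for illustration only.
RSS = (
    "<rss><channel>"
    "<title>Example feed</title>"
    "<item><title>First</title></item>"
    "<item><title>Second</title></item>"
    "</channel></rss>"
)
values, seen = read_until(RSS, "/rss/channel/title", "/rss/channel/item")
print(values, seen)
```

In this model, only four start tags are read before the parse stops, so the remaining item elements are never visited. That is the resource saving the element is meant to provide for hierarchical documents.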
The COLUMN element has a new attribute
ordinal="NO|YES".
The attribute determines whether the variable is a counter
variable (similar to the _N_ automatic variable in
SAS DATA step processing) that keeps track of the number of times
the location path specified by the INCREMENT-PATH element is encountered.
The counter variable increments its count by 1 each time
the path is matched. Counters can be useful for identifying individual
occurrences of like-named data elements or for counting observations.
The value for the
ordinal= attribute also determines which column location path
to use for collecting the column's values. The default is NO.
- NO
-
determines that the variable is not a counter variable, requires the
PATH element, and does not allow the INCREMENT-PATH and RESET-PATH elements.
- YES
-
determines that the variable is a counter variable, requires the
INCREMENT-PATH with the RESET-PATH element optional, and does not
allow the PATH element.
See Creating a Counter Variable
for an example.
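The ordinal= rules above can be restated as a small check. This Python sketch is an illustration only and is not part of SAS or XML Atlas: the function name check_column and the sample TABLE fragment, including its paths and column names, are assumptions, and the check compares the attribute value against the uppercase spelling used in this article.

```python
import xml.etree.ElementTree as ET


def check_column(column):
    """Return a list of rule violations for one COLUMN element."""
    ordinal = column.get("ordinal", "NO")   # the default is NO
    has = {tag: column.find(tag) is not None
           for tag in ("PATH", "INCREMENT-PATH", "RESET-PATH")}
    problems = []
    if ordinal == "YES":
        # Counter variable: INCREMENT-PATH required, RESET-PATH optional, no PATH.
        if not has["INCREMENT-PATH"]:
            problems.append("ordinal=YES requires INCREMENT-PATH")
        if has["PATH"]:
            problems.append("ordinal=YES does not allow PATH")
    else:
        # Ordinary variable: PATH required, no counter elements.
        if not has["PATH"]:
            problems.append("ordinal=NO requires PATH")
        for tag in ("INCREMENT-PATH", "RESET-PATH"):
            if has[tag]:
                problems.append("ordinal=NO does not allow " + tag)
    return problems


# Hypothetical XMLMap fragment; the third column deliberately breaks the rules.
SAMPLE = """
<TABLE name="items">
  <COLUMN name="title"><PATH syntax="xpath">/rss/channel/item/title</PATH></COLUMN>
  <COLUMN name="n" ordinal="YES">
    <INCREMENT-PATH syntax="xpath" beginend="Begin">/rss/channel/item</INCREMENT-PATH>
  </COLUMN>
  <COLUMN name="bad" ordinal="YES"><PATH syntax="xpath">/rss/channel/title</PATH></COLUMN>
</TABLE>
"""
report = {column.get("name"): check_column(column)
          for column in ET.fromstring(SAMPLE).iter("COLUMN")}
for name, problems in report.items():
    print(name, problems)
```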
The PATH element is renamed from XPATH.
The element specifies a location path that tells SXLE where in the XML
document to locate and access a specific tag for the
current variable, then perform a function as determined
by the location path form (three forms are supported) in order to
retrieve the value for the variable.
PATH accepts a
syntax type attribute.
For Version 1.1, the supported syntax is a valid XPath
construction in compliance with the W3C. For example:
<PATH syntax="xpath"> /rss/channel/title </PATH>
For Version 9, whether PATH is required or not allowed is determined by the
ordinal=
attribute for the COLUMN element: if ordinal="NO",
which is the default,
PATH is required; if ordinal="YES", PATH is not allowed and
the INCREMENT-PATH element is required.
For more information on the PATH element, see
"For the XMLMap PATH element, what XPath forms
are supported?"
The INCREMENT-PATH element is a new element that
specifies a location path for a counter variable.
The location path tells SXLE where in
the XML document to increment the accumulated value for the counter
variable by 1.
The element accepts
a syntax type attribute and an attribute that specifies whether the counter
is incremented when the element start tag or the element end tag is encountered.
For Version 1.1, the supported syntax for the location path
is a valid XPath construction in compliance with the W3C.
For example:
<INCREMENT-PATH syntax="xpath" beginend="Begin">
You establish the counter variable by specifying the COLUMN element attribute
ordinal="YES".
See Column Element.
The RESET-PATH element is a new element that
specifies a location path for a counter variable.
The location path tells SXLE where in
the XML document to reset the accumulated value for the counter
variable to 0. The RESET-PATH element is optional.
The element accepts
a syntax type attribute and an attribute that specifies whether the counter
is reset when the element start tag or the element end tag is encountered.
For Version 1.1, the supported syntax for the location path
is a valid XPath construction
in compliance with the W3C.
For example:
<RESET-PATH syntax="xpath" beginend="Begin">
You establish a counter variable by specifying the COLUMN
element attribute ordinal="YES".
See Column Element.
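To make the interplay of INCREMENT-PATH and RESET-PATH concrete, here is a Python sketch of a counter that behaves as described: it adds 1 each time the increment path is matched and returns to 0 when the reset path is matched. It is a model under stated assumptions, not SXLE code. It acts on start tags only (as with beginend="Begin"), and the names CounterHandler and count_rows, and the sample document, are hypothetical.

```python
import xml.sax


class CounterHandler(xml.sax.ContentHandler):
    """Pairs each row's text with the current value of a counter variable."""

    def __init__(self, row_path, increment_path, reset_path=None):
        super().__init__()
        self.row_path = row_path
        self.increment_path = increment_path
        self.reset_path = reset_path   # optional, like RESET-PATH
        self.stack = []
        self.counter = 0
        self.text = ""
        self.rows = []                 # (counter value, text) per completed row

    def _path(self):
        return "/" + "/".join(self.stack)

    def startElement(self, name, attrs):
        self.stack.append(name)
        path = self._path()
        if path == self.reset_path:
            self.counter = 0           # reset the accumulated value to 0
        if path == self.increment_path:
            self.counter += 1          # increment the accumulated value by 1
        if path == self.row_path:
            self.text = ""

    def characters(self, content):
        self.text += content

    def endElement(self, name):
        if self._path() == self.row_path:
            self.rows.append((self.counter, self.text.strip()))
        self.stack.pop()


def count_rows(document, row_path, increment_path, reset_path=None):
    handler = CounterHandler(row_path, increment_path, reset_path)
    xml.sax.parseString(document.encode("utf-8"), handler)
    return handler.rows


# Made-up document with two channel elements.
FEED = (
    "<rss>"
    "<channel><item>A</item><item>B</item></channel>"
    "<channel><item>C</item></channel>"
    "</rss>"
)
ITEM = "/rss/channel/item"
print(count_rows(FEED, ITEM, ITEM))                   # counter runs across channels
print(count_rows(FEED, ITEM, ITEM, "/rss/channel"))   # counter restarts per channel
```

Without a reset path the counter numbers the items 1, 2, 3 across the whole document. With the reset path set to the channel element, numbering starts over inside each channel.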
What if you don't want to type the XML markup in order to
create an XMLMap file?
With Version 9, you can use the new
XML Atlas Java application
in order to generate the XMLMap file.
Note: XML Atlas is preproduction for SAS Version 9.
SAS developers really like acronyms, so the name Atlas actually stands for
Assistive Technology for Leveraging Acquisition via SXLE!
XML Atlas assists you in creating and
modifying XMLMap files for use by SXLE.
XML Atlas provides a graphical interface that you use to generate
the appropriate XML markup.
XML Atlas analyzes the structure of an XML document
and generates basic XML markup for the XMLMap file.
The interface consists of windows, a menu bar,
and a tool bar. Using XML Atlas, you can display an XML document, create
and modify an XMLMap file, and generate example SAS programs.
The XML window and the Map window are the two primary windows.
The XML window, which is on the left, displays an XML document
in a tree structure. The Map window, which is on the right, displays
an XMLMap file in a tree structure. The map tree displays three layers:
the top level is the map itself, the second tier contains the tables, and the leaf nodes
are columns. The detail area at the top displays information
about the currently selected item, such as attributes for the table or column.
The information is subdivided into tabs.
The small windows on the bottom display generated SAS code, the
XMLMap file, and an XML document.
The menu bar provides pull-down menus in order to request
functionality. For example, select the
File menu, then
Open XML
in order to display a browser so that you can select an XML document to open.
- File menu
-
provides options for opening, saving, and closing
an XML document, an XMLMap file,
and a SAS code file, and an option to close XML Atlas.
- Edit menu
-
provides options for deleting, copying, and pasting items in an XMLMap file.
- View menu
-
provides options for opening and closing the three source windows (at
the bottom of the interface) and an option to generate code.
- Window menu
-
provides options for controlling the arrangement of the
three source windows (at the bottom of the interface), that is,
arranging horizontally, vertically, or cascading.
- Help menu
-
provides options for displaying the online help and
XML Atlas version information.
The tool bar contains shortcuts for several items on the menu bar.
For example, the first icon from the left is the
Open XML icon. Select it
in order to display a browser so that you can select an XML document
to open.
Here's a simple example that walks you through using XML Atlas
in order to create an XMLMap file.
For the following XML document, an XMLMap file is necessary because
the XML does not adhere to the physical structure that SXLE requires.
Without an XMLMap file, SXLE would import a
data set named FORD with columns ROW0, MODEL0, YEAR0, ROW1, MODEL1, YEAR1, and
so on.
(For an explanation as to why an XMLMap file is
needed and more information on creating the
XMLMap file, see
Determining the Observation Boundary
in Order to Avoid Concatenated Data.)
<?xml version="1.0" encoding="windows-1252" ?>
<VEHICLES>
  <FORD>
    <ROW>
      <Model>Mustang</Model>
      <Year>1965</Year>
    </ROW>
    <ROW>
      <Model>Explorer</Model>
      <Year>1982</Year>
    </ROW>
    <ROW>
      <Model>Taurus</Model>
      <Year>1998</Year>
    </ROW>
    <ROW>
      <Model>F150</Model>
      <Year>2000</Year>
    </ROW>
  </FORD>
</VEHICLES>
Display the XML Document in XML Atlas
-
From the menu bar, select File, then Open XML. A browser displays.
-
From the displayed browser, select the XML document.
The XML document displays in the primary XML window as well
as in the XML source window at the bottom.
In addition, XML Atlas begins generating the SAS code
and the XMLMap file.
Tip: The primary XML window displays the document in a tree structure.
Click the + signs in order to open the elements.
Create the XMLMap File
These steps will generate an XMLMap file for the displayed XML document:
-
In the primary Map window, XML Atlas automatically
provides the SXLEMAP item, which
corresponds to the XMLMap syntax SXLEMAP element.
To specify attributes for SXLEMAP, select the item, then use the tabs
at the top of the Map window.
For example, from the Properties tab,
enter a description for the XMLMap file, and
set the XMLMap syntax version to 1.1 with the Validation tab.
-
Create a TABLE element by dragging an item from the XML window and dropping
it on the SXLEMAP element in the Map window.
For example, drag and drop ROW on SXLEMAP.
-
To specify attributes for the TABLE element, select the item, then
use the tabs at the top of the Map window.
For example, from the Properties tab, enter a description.
XML Atlas fills in the XPath location, which corresponds
to the TABLE-PATH element.
-
Create a COLUMN element by dragging an item from the XML window and
dropping it on the desired TABLE element in the Map window.
For example, drag and drop Model on ROW.
Tip: To create the columns, use the condensed display
in the XML window by selecting the Condensed tab.
The condensed display contains metadata that is not available in the full
display. For example, in the full display, the length property, which
is an estimate based on the length of a single instance of data, is
useful for structured fields such as ID numbers, codes, and so on, but
it is not helpful for free-form text. Using the full display tends to
result in clipped text, whereas the condensed display will calculate
the maximum length.
-
To specify attributes for the column, select Model,
then use the tabs.
For example, from the Properties tab,
enter a description. The other attributes are fine.
-
Create another COLUMN element by dragging and dropping Year on ROW. Then
specify attributes for the column.
-
Save the XMLMap file by selecting File from the menu bar, then
Save Map As, and then specifying a name for the XMLMap file.
Generate XMLMap File and SAS Code
In addition to generating the XMLMap file,
XML Atlas generates basic FILENAME and LIBNAME statements in order to
use an XMLMap file. XML Atlas also generates some sample usage
statements for the DATASETS procedure and the CONTENTS procedure. The
generated SAS code displays in the SAS code window at the bottom.
Tip: So that the generated SAS code includes the location of the
XMLMap file, be sure to save the file first.
-
Generate the XMLMap file and the associated SAS code by
selecting View from
the menu bar, then Update text files.
The XMLMap file displays in the Map text window,
and XML Atlas generates SAS code in the SAS code window.
-
Save the generated SAS code by selecting File from the menu bar, then
Save SAS As, and then specifying a name for the SAS code.
Here is the generated SAS code:
/************************************************************
* Generated by XMLAtlas, v. 9.0.1
************************************************************/
/*
* ENVIRONMENT
*/
filename path 'C:\Documents and Settings\sasdxw\My Documents\XML\path.xml';
filename SXLEMAP 'c:\documents and settings\sasdxw\my documents\xml\test.map';
libname path xml xmlmap=SXLEMAP access=READONLY;
/*
* CATALOG
*/
proc datasets lib=path; run;
/*
* SAMPLE USAGE
*/
proc contents data=path.ROW varnum; run;
proc print data=path.ROW; run;
Submit SAS Code
Read the SAS code into SAS and submit it.
Here are the results of the PRINT procedure:
The SAS System                    2

Obs    Model      Year
  1    Mustang    1965
  2    Explore    1982
  3    Taurus     1998
  4    F150       2000
XML Atlas is available for installation from your
SAS Installation Kit. XML Atlas is on the
SAS Client-Side Components CD.
After XML Atlas is installed, you should have an icon
available on your desktop for the application. Double-click the
XML Atlas icon to invoke it.
XML Atlas has online help attached. From the menu bar, select Help, then
Help Topics.
When SXLE was introduced in SAS Release 8.1, the documentation for using
the engine was available only through the SAS System Help. Subsequently,
two topics about using SXLE were provided from the Base Communities web
site.
For Version 9, all information about invoking and using SXLE
is available in one document.
There are several ways that you can locate the latest version
of the documentation for SXLE:
-
From SAS, on the menu bar, select Help, then
SAS Help and Documentation. From the Contents
tab, select the following:
-
SAS Products
-
Base SAS
-
SAS Language Elements
-
Statements
-
LIBNAME Statement, XML
-
Rather than use the Contents tab, select the Index tab and
enter
LIBNAME statement, XML.
-
On the stand-alone CD or hardcopy,
see the LIBNAME Statement, XML in the
SAS Language Reference: Dictionary.
-
You can also freely access SAS 9 online documentation on
the Web.
Here's a list of some frequently asked questions...
Is SXLE available on all hosts supported by SAS?
Yes, SXLE is available on
Windows, UNIX, OpenVMS Alpha, and recently OS/390.
Note, however, that for Release 9.0, the preproduction XML Atlas is
available only for Windows.
Is SXLE production?
Yes. For Release 9.0, the XMLMap facility, which was previously preproduction, is production as well.
XML Atlas, though, is preproduction for 9.0.
Is SXLE a DOM or SAX application?
Currently, SXLE uses a SAX (Simple API for XML) model.
SAX does not provide a random access lookup to the document's contents; it scans
the document sequentially and presents each item to the application
only one time.
In contrast, the Document Object Model (DOM) converts the document's
contents into a node tree that can be traversed back and forth via the
programming interface (API).
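The difference between the two models is easy to see with any pair of SAX and DOM parsers. The following Python sketch uses the standard library's xml.sax and xml.dom.minidom purely as stand-ins; it says nothing about how SXLE itself is built. The class name EventLogger is hypothetical, and the document is a one-row excerpt in the style of the vehicles example.

```python
import xml.sax
from xml.dom import minidom

DOC = ("<VEHICLES><FORD><ROW><Model>Mustang</Model>"
       "<Year>1965</Year></ROW></FORD></VEHICLES>")


class EventLogger(xml.sax.ContentHandler):
    """Records each SAX event once, in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append("start " + name)

    def endElement(self, name):
        self.events.append("end " + name)


# SAX: a single sequential pass; each item is presented to the handler once.
logger = EventLogger()
xml.sax.parseString(DOC.encode("utf-8"), logger)
print(logger.events)

# DOM: the whole document becomes a node tree that can be visited in any order.
tree = minidom.parseString(DOC)
year_node = tree.getElementsByTagName("Year")[0]
model_node = tree.getElementsByTagName("Model")[0]
year = year_node.firstChild.data     # read Year first ...
model = model_node.firstChild.data   # ... then go back to Model
parent = year_node.parentNode.tagName
print(year, model, parent)
```

The sequential model is why the choice of observation boundary matters so much: a SAX-style reader cannot go back to regroup values after the fact.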
Why is specifying the observation boundary tricky?
The observation boundary determines the repeating element
instances in the XML document, which translates into a collection of
rows with a constant set of columns.
You determine the observation boundary
by specifying a table location path that tells SXLE where in the
XML document to locate and access specific elements in order to
collect variables for the SAS data set.
Specifying the table location path can be tricky due
to start-tag and end-tag pairing.
The table location path determines which end tag
causes SXLE to write the completed input buffer to the SAS data set.
If you do not identify the appropriate observation
boundary, the result could be concatenated data
instead of separate observations, or an unexpected set of columns.
For examples, see
Determining the Observation Boundary
in Order to Avoid Concatenated Data and
Determining the Observation Boundary
in Order to Select the Best Columns.
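The effect of the boundary can be sketched in a few lines of Python. This is a simplified model, not SXLE's actual algorithm: it assumes that column values accumulate in an input buffer and that the buffer is written as one row whenever the end tag at the table location path is reached. The names BoundaryHandler and import_rows are hypothetical, and the document is a two-row excerpt of the vehicles example.

```python
import xml.sax

VEHICLES = """<VEHICLES><FORD>
<ROW><Model>Mustang</Model><Year>1965</Year></ROW>
<ROW><Model>Explorer</Model><Year>1982</Year></ROW>
</FORD></VEHICLES>"""

COLUMNS = {"Model": "/VEHICLES/FORD/ROW/Model",
           "Year": "/VEHICLES/FORD/ROW/Year"}


class BoundaryHandler(xml.sax.ContentHandler):
    """Writes the input buffer as a row at each end tag of table_path."""

    def __init__(self, table_path, column_paths):
        super().__init__()
        self.table_path = table_path
        self.column_paths = column_paths   # {column name: location path}
        self.stack = []
        self.buffer = {}
        self.rows = []

    def _path(self):
        return "/" + "/".join(self.stack)

    def startElement(self, name, attrs):
        self.stack.append(name)

    def characters(self, content):
        path = self._path()
        for column, column_path in self.column_paths.items():
            if path == column_path:
                # Values that arrive before the next flush pile up in the buffer.
                self.buffer[column] = self.buffer.get(column, "") + content

    def endElement(self, name):
        if self._path() == self.table_path:
            # The boundary's end tag writes the completed buffer as one row.
            self.rows.append({k: v.strip() for k, v in self.buffer.items()})
            self.buffer = {}
        self.stack.pop()


def import_rows(document, table_path, column_paths):
    handler = BoundaryHandler(table_path, column_paths)
    xml.sax.parseString(document.encode("utf-8"), handler)
    return handler.rows


good = import_rows(VEHICLES, "/VEHICLES/FORD/ROW", COLUMNS)
bad = import_rows(VEHICLES, "/VEHICLES/FORD", COLUMNS)
print(good)   # one row per ROW element
print(bad)    # a single row with the values run together
```

With the boundary on ROW, each end tag flushes one vehicle. With the boundary one level too high, on FORD, nothing is flushed until both vehicles have been read, so the model produces a single row of concatenated values.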
For the XMLMap PATH element, what XPath forms are supported?
The PATH element specifies a location path that tells SXLE
where in the XML document
to locate and access a specific tag for the current variable,
then perform a function as determined by the location path form
(three forms are supported) in order to retrieve the value for
the variable. The XPath forms that are supported allow
elements and attributes to be individually selected for
inclusion in the generated (rectangular) SAS data set.
To specify the PATH location path, use one of the following forms. These
forms are the only XPath forms that SXLE supports. If you use any other
valid W3C form, the results will be unpredictable.
- element-form
-
accesses PCDATA (parsable character data) from the named element.
<PATH syntax="xpath"> /rss/channel/title </PATH>
The above example tells SXLE to scan the XML markup
until it finds the specific title element.
SXLE retrieves the value between the <title> start tag and the
</title> end tag.
- attribute-form
-
accesses data from the named attribute (of the form
NAME="value").
<PATH syntax="xpath"> /rss@version </PATH>
The above example tells SXLE to scan the XML markup
until it finds the specific rss element.
SXLE retrieves the value
from the version= attribute in the rss element.
- value-form
-
accesses PCDATA from the named element with a specific
attribute value.
<PATH syntax="xpath"> /constant[@name="PI"] </PATH>
If the XML contains the following,
the above example tells SXLE to scan the XML markup
until it finds the specific constant element
where the value of the name= attribute is PI.
SXLE would retrieve the value 3.14159.
<constant name="PI"> 3.14159 </constant>
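As a rough illustration of how the three forms differ, the Python sketch below classifies a location path as element-form, attribute-form, or value-form and then pulls the matching value from a small document. It is an approximation written for this article, not SXLE's parser. The function names classify and extract, the regular expressions, and the sample RSS values are assumptions. The attribute pattern also tolerates a slash before the @ sign, which is an assumption beyond the form shown above.

```python
import re
import xml.etree.ElementTree as ET

# One pattern per supported form (hypothetical approximations).
VALUE_FORM = re.compile(
    r'^(?P<path>/[^@\[\]]+)\[@(?P<attr>[\w.-]+)="(?P<value>[^"]*)"\]$')
ATTRIBUTE_FORM = re.compile(r'^(?P<path>/[^@\[\]]+?)/?@(?P<attr>[\w.-]+)$')
ELEMENT_FORM = re.compile(r'^(?P<path>/[^@\[\]]+)$')


def classify(location_path):
    """Return the form name and the parsed pieces of a location path."""
    text = location_path.strip()
    for form, pattern in (("value", VALUE_FORM),
                          ("attribute", ATTRIBUTE_FORM),
                          ("element", ELEMENT_FORM)):
        match = pattern.match(text)
        if match:
            return form, match.groupdict()
    raise ValueError("unsupported location path: " + text)


def extract(document, location_path):
    """Return the first value selected by location_path, or None."""
    form, parts = classify(location_path)
    root = ET.fromstring(document)
    steps = parts["path"].strip("/").split("/")
    nodes = [root] if steps[0] == root.tag else []
    for step in steps[1:]:
        nodes = [child for node in nodes for child in node.findall(step)]
    if form == "value":
        # Keep only elements whose attribute has the requested value.
        nodes = [n for n in nodes if n.get(parts["attr"]) == parts["value"]]
    if not nodes:
        return None
    if form == "attribute":
        return nodes[0].get(parts["attr"])
    return (nodes[0].text or "").strip()   # PCDATA of the element


# Made-up RSS content for illustration.
RSS = '<rss version="0.91"><channel><title>WriteTheWeb</title></channel></rss>'
PI = '<constant name="PI"> 3.14159 </constant>'

print(extract(RSS, " /rss/channel/title "))      # element-form
print(extract(RSS, " /rss@version "))            # attribute-form
print(extract(PI, ' /constant[@name="PI"] '))    # value-form
```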
Your Turn
The developers, testers, and documenters that bring you SXLE
are very interested in your feedback. You can send electronic mail to
XMLEngine@sas.com with your comments.
Last Updated: 14 May 2004