This example imports
an XML document that illustrates how to determine the observation
boundary so that the result is separate observations and not concatenated
data.
The observation boundary
translates into a collection of rows with a constant set of columns.
Using an XMLMap, you determine the observation boundary with the TABLE-PATH
element by specifying a location path. The end tag for the location
path determines when data is written to the SAS data set as an observation.
Identifying the observation
boundary can be tricky due to sequences of start-tag and end-tag pairing.
If you do not identify the appropriate observation boundary, the result
could be a concatenated data string instead of separate observations.
This example illustrates pairing situations that can cause unwanted
results.
For the following XML
document, an XMLMap is necessary to import the file successfully.
Without an XMLMap, the XML engine would import a data set named FORD
with columns ROW0, MODEL0, YEAR0, ROW1, MODEL1, YEAR1, and so on.
<?xml version="1.0" ?>
<VEHICLES>
<FORD>
<ROW>
<Model>Mustang</Model>
<Year>1965</Year>
</ROW>
<ROW>
<Model>Explorer</Model>
<Year>1982</Year>
</ROW>
<ROW>
<Model>Taurus</Model>
<Year>1998</Year>
</ROW>
<ROW>
<Model>F150</Model>
<Year>2000</Year>
</ROW>
</FORD>
</VEHICLES>
Looking at the above
XML document, there are three sequences of element start tags and
end tags: VEHICLES, FORD, and ROW. If you specify the following table
location path and column locations paths, the XML engine processes
the XML document as follows:
<TABLE-PATH syntax="XPath"> /VEHICLES/FORD </TABLE-PATH>
<PATH syntax="XPath"> /VEHICLES/FORD/ROW/Model </PATH>
<PATH syntax="XPath"> /VEHICLES/FORD/ROW/Year </PATH>
-
The XML engine reads
the XML markup until it encounters the <FORD> start tag, because
FORD is the last element specified in the table location path.
-
The XML engine clears
the input buffer and scans subsequent elements for variables based
on the column location paths. As a value for each variable is encountered,
it is read into the input buffer. For example, after reading the first
ROW element, the input buffer contains the values
Mustang
and
1965
.
-
The XML engine continues
reading values into the input buffer until it encounters the </FORD>
end tag, at which time it writes the completed input buffer to the
SAS data set as an observation.
-
The end result is one
observation, which is not what you want.
Here is the PRINT procedure
listing output showing the concatenated observation. (The data in
the observation is truncated due to the LENGTH element.)
PRINT Procedure Output Showing Unacceptable FORD Data Set
The SAS System 1
Model Year
Mustang Explorer Tau 1965
To get separate observations,
you must change the table location path so that the XML engine writes
separate observations to the SAS data set. Here are the correct location
paths and the process that the engine would follow:
<TABLE-PATH syntax="XPath"> /VEHICLES/FORD/ROW </TABLE-PATH>
<PATH syntax="XPath"> /VEHICLES/FORD/ROW/Model </PATH>
<PATH syntax="XPath"> /VEHICLES/FORD/ROW/Year </PATH>
-
The XML engine reads
the XML markup until it encounters the <ROW> start tag, because
ROW is the last element specified in the table location path.
-
The XML engine clears
the input buffer and scans subsequent elements for variables based
on the column location paths. As a value for each variable is encountered,
it is read into the input buffer.
-
The XML engine continues
reading values into the input buffer until it encounters the </ROW>
end tag, at which time it writes the completed input buffer to the
SAS data set as an observation. That is, one observation is written
to the SAS data set that contains the values
Mustang
and
1965
.
-
The process is repeated
for each <ROW> start-tag and </ROW> end-tag sequence.
-
The result is four observations.
Here is the complete
XMLMap syntax:
<?xml version="1.0" ?>
<SXLEMAP version="2.1" name="path" description="XMLMap for path">
<TABLE name="FORD">
<TABLE-PATH syntax="XPath"> /VEHICLES/FORD/ROW </TABLE-PATH>
<COLUMN name="Model">
<DATATYPE> string </DATATYPE>
<LENGTH> 20 </LENGTH>
<TYPE> character </TYPE>
<PATH syntax="XPath"> /VEHICLES/FORD/ROW/Model </PATH>
</COLUMN>
<COLUMN name="Year">
<DATATYPE> string </DATATYPE>
<LENGTH> 4 </LENGTH>
<TYPE> character </TYPE>
<PATH syntax="XPath"> /VEHICLES/FORD/ROW/Year </PATH>
</COLUMN>
</TABLE>
</SXLEMAP>
The following SAS statements
import the XML document and specify the XMLMap. The PRINT procedure
verifies the results.
filename PATH 'c:\My Documents\XML\path.xml';
filename MAP 'c:\My Documents\XML\path.map';
libname PATH xmlv2 xmlmap=MAP;
proc print data=PATH.FORD noobs;
run;
PRINT Procedure Output Showing FORD Data Set