|Importing XML Documents Using an XMLMap|
First, here is the XML document NHL.XML to be imported. Although simply constructed and relatively easy for you to read, it does not import successfully because its XML markup is not in the required physical structure:
<?xml version="1.0" encoding="iso-8859-1" ?> <NHL> <CONFERENCE> Eastern <DIVISION> Southeast <TEAM name="Thrashers" abbrev="ATL" /> <TEAM name="Hurricanes" abbrev="CAR" /> <TEAM name="Panthers" abbrev="FLA" /> <TEAM name="Lightning" abbrev="TB" /> <TEAM name="Capitals" abbrev="WSH" /> </DIVISION> </CONFERENCE> <CONFERENCE> Western <DIVISION> Pacific <TEAM name="Stars" abbrev="DAL" /> <TEAM name="Kings" abbrev="LA" /> <TEAM name="Ducks" abbrev="ANA" /> <TEAM name="Coyotes" abbrev="PHX" /> <TEAM name="Sharks" abbrev="SJ" /> </DIVISION> </CONFERENCE> </NHL>
To successfully import the XML document, an XMLMap is needed. After familiarizing yourself with the data to be imported, you can code the XMLMap syntax so that the data is successfully imported. Here is the XMLMap used to import the XML document, with notations for the data investigation:
<?xml version="1.0" ?> <SXLEMAP version="1.2"> <TABLE name="TEAMS"> 1 <TABLE-PATH syntax="XPATH"> 2 /NHL/CONFERENCE/DIVISION/TEAM </TABLE-PATH> <COLUMN name="NAME"> 3 <PATH> 5 /NHL/CONFERENCE/DIVISION/TEAM@name </PATH> <TYPE>character</TYPE> <DATATYPE>STRING</DATATYPE> <LENGTH>30</LENGTH> </COLUMN> <COLUMN name="ABBREV"> 3 <PATH> 5 /NHL/CONFERENCE/DIVISION/TEAM/@abbrev </PATH> <TYPE>character</TYPE> <DATATYPE>STRING</DATATYPE> <LENGTH>3</LENGTH> </COLUMN> <COLUMN name="CONFERENCE" retain="YES"> 4 <PATH>/NHL/CONFERENCE</PATH> 5 <TYPE>character</TYPE> <DATATYPE>STRING</DATATYPE> <LENGTH>10</LENGTH> </COLUMN> <COLUMN name="DIVISION" retain="YES"> 4 <PATH> 5 /NHL/CONFERENCE/DIVISION </PATH> <TYPE>character</TYPE> <DATATYPE>STRING</DATATYPE> <LENGTH>10</LENGTH> </COLUMN> </TABLE> </SXLEMAP>
You want a SAS data set (table) that contains some of the teams of the National Hockey League. Because that is the only information contained in the XML document, you can define a single data set named TEAMS in the XMLMap. (Note that other XML documents might contain more than one table of related information. Importing multiple tables is supported by the XMLMap syntax as shown in Using an XMLMap to Import an XML Document as Multiple SAS Data Sets.)
In the XML document, information about individual teams occurs in a <TEAM> tag located with <CONFERENCE> and <DIVISION> enclosures. You want a new observation generated each time a TEAM element is read.
For this XML document, the data content form is mixed. Some data occurs as XML PCDATA (for example, CONFERENCE), and other data is contained in attribute-value pairs (for example, NAME). Data types are all string values. The constructed observation will also include the team NAME and ABBREV. A length of 30 characters is sufficient for the NAME, and three characters is enough for the ABBREV field contents.
Note: The retain= attribute in the column definition forces retention of processed data values after an observation is written to the output data set. Because the foreign key fields occur outside the observation boundary (that is, they are more sparsely populated in the hierarchical XML data than in the SAS observation), their values for additional rows need to be retained as they are encountered.
The PATH element identifies a position in the XML document from which to extract data for each column. Element-parsed character data is treated differently than attribute values. There is no conditional selection criteria involved.
filename NHL 'C:\My Documents\XML\NHL.xml'; filename MAP 'C:\My Documents\XML\NHL.map'; libname NHL xml xmlmap=MAP; proc print data=NHL.TEAMS; run;
The SAS System 1 Obs NAME ABBREV CONFERENCE DIVISION 1 Thrashers ATL Eastern Southeast 2 Hurricanes CAR Eastern Southeast 3 Panthers FLA Eastern Southeast 4 Lightning TB Eastern Southeast 5 Capitals WSH Eastern Southeast 6 Stars DAL Western Pacific 7 Kings LA Western Pacific 8 Ducks ANA Western Pacific 9 Coyotes PHX Western Pacific 10 Sharks SJ Western Pacific