
Importing XML Documents Using an XMLMap

Understanding the Required Physical Structure for an XML Document to Be Imported Using the GENERIC Markup Type


What Is the Required Physical Structure?

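For an XML document to be successfully imported, the requirements for well-formed XML must translate to SAS as follows: the root-enclosing element is the container for the document, each second-level element names a data set, and the elements nested within a second-level element become that data set's variables, so that the repeating instances form rows with a constant set of columns.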

Here is an example of an XML document that illustrates the physical structure that is required:

<?xml version="1.0" encoding="windows-1252" ?> 
<LIBRARY> <!-- 1 -->
   <STUDENTS> <!-- 2 -->
      <ID> 0755 </ID>
      <NAME> Brad Martin </NAME>
      <ADDRESS> 1611 Glengreen </ADDRESS>
      <CITY> Huntsville </CITY>
      <STATE> Texas </STATE>
   </STUDENTS>

   <STUDENTS> <!-- 3 -->
      <ID> 1522 </ID>
      <NAME> Zac Harvell </NAME>
      <ADDRESS> 11900 Glenda </ADDRESS>
      <CITY> Houston </CITY>
      <STATE> Texas </STATE>
   </STUDENTS>
.
.  more instances of <STUDENTS> 
.
</LIBRARY>

When the previous XML document is imported, the following happens:

  1. The XML engine recognizes <LIBRARY> as the root-enclosing element.

  2. The engine goes to the second-level instance tag, which is <STUDENTS>, translates it as the data set name, and begins scanning the elements that are nested (contained) between the <STUDENTS> start tag and the </STUDENTS> end tag, looking for variables.

  3. Because the instance tags <ID>, <NAME>, <ADDRESS>, <CITY>, and <STATE> are contained within the <STUDENTS> start tag and </STUDENTS> end tag, the XML engine interprets them as variables. The individual instance tag names become the data set variable names. The repeating element instances are translated into a collection of rows with a constant set of columns.
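The three steps above can be modeled outside of SAS. The following is a minimal Python sketch of the mapping only, not the XML engine's implementation: the function name `import_generic` and the constant `STUDENTS_XML` are hypothetical, the XML declaration is omitted from the sample string for simplicity, and every value is kept as stripped text rather than being assigned a SAS variable type.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the STUDENTS document (declaration omitted).
STUDENTS_XML = """
<LIBRARY>
   <STUDENTS>
      <ID> 0755 </ID>
      <NAME> Brad Martin </NAME>
      <ADDRESS> 1611 Glengreen </ADDRESS>
      <CITY> Huntsville </CITY>
      <STATE> Texas </STATE>
   </STUDENTS>
   <STUDENTS>
      <ID> 1522 </ID>
      <NAME> Zac Harvell </NAME>
      <ADDRESS> 11900 Glenda </ADDRESS>
      <CITY> Houston </CITY>
      <STATE> Texas </STATE>
   </STUDENTS>
</LIBRARY>
"""


def import_generic(xml_text):
    """Model the mapping: root -> container, second level -> data set, children -> variables."""
    root = ET.fromstring(xml_text)      # step 1: the root-enclosing element
    instances = list(root)              # second-level instance elements
    dataset_name = instances[0].tag     # step 2: the instance tag names the data set
    rows = []
    for instance in instances:
        # Step 3: each nested element becomes a variable; each instance becomes a row.
        rows.append({child.tag: (child.text or "").strip() for child in instance})
    return root.tag, dataset_name, rows


container, dataset_name, rows = import_generic(STUDENTS_XML)
print(container, dataset_name, len(rows))
```

In this sketch each row is a dictionary keyed by the nested tag names, which corresponds to one observation with one value per variable.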

The following statements import the document and print the resulting data set:

libname test xml 'C:\My Documents\test\students.xml';  

proc print data=test.students;
run;

PROC PRINT of TEST.STUDENTS

                            The SAS System                                         1

     Obs    STATE    CITY          ADDRESS           NAME                 ID

       1    Texas    Huntsville    1611 Glengreen    Brad Martin         755
       2    Texas    Houston       11900 Glenda      Zac Harvell        1522
            .
            .
            .

Why Is a Specific Physical Structure Required?

Well-formed XML is determined by structure, not content. Although the XML engine can assume that the XML document is valid, well-formed XML, it cannot assume that the root element encloses only instances of a single node element, that is, only a single data set. The engine therefore has to account for the possibility of multiple nodes, that is, multiple SAS data sets.
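To make the point concrete, here is a short Python sketch in which two condensed, hypothetical documents both parse as well-formed XML, yet only an inspection of the second-level tag names reveals how many node types each one holds. The helper name `second_level_names` is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Two condensed stand-in documents; both are well-formed.
ONE_NODE = (
    "<LIBRARY>"
    "<STUDENTS><ID>0755</ID></STUDENTS>"
    "<STUDENTS><ID>1522</ID></STUDENTS>"
    "</LIBRARY>"
)
TWO_NODES = (
    "<CLIMATE>"
    "<HIGHTEMP><PLACE>Libya</PLACE></HIGHTEMP>"
    "<LOWTEMP><PLACE>Antarctica</PLACE></LOWTEMP>"
    "</CLIMATE>"
)


def second_level_names(xml_text):
    """Return the distinct second-level tag names in order of first appearance."""
    root = ET.fromstring(xml_text)  # raises ParseError if the text is not well-formed
    names = []
    for element in root:
        if element.tag not in names:
            names.append(element.tag)
    return names


print(second_level_names(ONE_NODE))
print(second_level_names(TWO_NODES))
```

Parsing succeeds in both cases, so well-formedness alone does not tell a reader of the document whether it holds one data set or several.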

For example, when the following correctly structured XML document is imported, it is recognized as containing two SAS data sets: HIGHTEMP and LOWTEMP.

<?xml version="1.0" encoding="windows-1252" ?> 
<CLIMATE> <!-- 1 -->
   <HIGHTEMP> <!-- 2 -->
      <PLACE> Libya </PLACE>
      <DATE> 1922-09-13 </DATE>
      <DEGREE-F> 136 </DEGREE-F>
      <DEGREE-C> 58 </DEGREE-C>
   </HIGHTEMP>
.
.  more instances of <HIGHTEMP> 
.
   <LOWTEMP> <!-- 3 -->
      <PLACE> Antarctica </PLACE>
      <DATE> 1983-07-21 </DATE>
      <DEGREE-F> -129 </DEGREE-F>
      <DEGREE-C> -89 </DEGREE-C>
   </LOWTEMP>
.
.  more instances of <LOWTEMP> 
.
</CLIMATE>

When the previous XML document is imported, the following happens:

  1. The XML engine recognizes the first instance tag <CLIMATE> as the root-enclosing element, which is the container for the document.

  2. Starting with the second-level instance tag, which is <HIGHTEMP>, the XML engine uses the repeating element instances as a collection of rows with a constant set of columns.

  3. When the second-level instance tag changes, the XML engine interprets that change as a different SAS data set.

The result is two SAS data sets: HIGHTEMP and LOWTEMP. Both happen to have the same variables but contain different data.
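The split into two data sets can be sketched in Python as follows. This is an illustrative model, not the engine's code: `split_into_datasets` and `CLIMATE_XML` are hypothetical names, the sample is an abbreviated copy of the document without its XML declaration, and rows are grouped by tag name, which is a simplification of reacting to a change in the second-level tag.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the CLIMATE document (declaration omitted).
CLIMATE_XML = """
<CLIMATE>
   <HIGHTEMP>
      <PLACE> Libya </PLACE>
      <DATE> 1922-09-13 </DATE>
      <DEGREE-F> 136 </DEGREE-F>
      <DEGREE-C> 58 </DEGREE-C>
   </HIGHTEMP>
   <LOWTEMP>
      <PLACE> Antarctica </PLACE>
      <DATE> 1983-07-21 </DATE>
      <DEGREE-F> -129 </DEGREE-F>
      <DEGREE-C> -89 </DEGREE-C>
   </LOWTEMP>
</CLIMATE>
"""


def split_into_datasets(xml_text):
    """Group second-level instances by tag name; each group models one data set."""
    root = ET.fromstring(xml_text)
    datasets = {}
    for instance in root:
        row = {child.tag: (child.text or "").strip() for child in instance}
        # A tag name not seen before starts a new data set.
        datasets.setdefault(instance.tag, []).append(row)
    return datasets


datasets = split_into_datasets(CLIMATE_XML)
for name, rows in datasets.items():
    print(name, list(rows[0]))
```

Both groups end up with the same column names, PLACE, DATE, DEGREE-F, and DEGREE-C, while their rows differ.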

To ensure that an import result is what you expect, use the DATASETS procedure. For example, the following SAS statements list the data sets in the CLIMATE library:

libname climate xml 'C:\My Documents\xml\climate.xml';  

proc datasets library=climate;
quit;

PROC DATASETS Output for CLIMATE Library

                                           Directory

          Libref         CLIMATE
          Engine         XML
          Physical Name  C:\My Documents\xml\climate.xml
          XMLType        GENERIC
          XMLMap         NO XMLMAP IN EFFECT


                                                    Member
                                       #  Name      Type

                                       1  HIGHTEMP  DATA
                                       2  LOWTEMP   DATA

Handling XML Documents That Are Not in the Required Physical Structure

If your XML document is not in the required physical structure, you can tell the XML engine how to interpret the XML markup in order to successfully import the document. See Importing XML Documents Using an XMLMap.
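Before importing, it can help to check a document against the structure described in this section. The Python sketch below is a rough pre-check under stated assumptions. The function name `structure_problems` is hypothetical, and the rules it applies (nested elements hold only text, and every instance of a tag has the same columns in the same order) are a reading of this section that may be stricter or looser than what the XML engine actually accepts.

```python
import xml.etree.ElementTree as ET


def structure_problems(xml_text):
    """Return descriptions of places where a document departs from the
    root / instance / variable layout (assumed rules, see the lead-in)."""
    root = ET.fromstring(xml_text)
    problems = []
    columns_by_dataset = {}
    for position, instance in enumerate(root, start=1):
        columns = [child.tag for child in instance]
        if not columns:
            problems.append(f"<{instance.tag}> #{position} has no nested elements")
        for child in instance:
            if len(child):  # the would-be variable has child elements of its own
                problems.append(
                    f"<{child.tag}> in <{instance.tag}> #{position} contains nested elements"
                )
        # The first instance of each tag fixes the expected set of columns.
        expected = columns_by_dataset.setdefault(instance.tag, columns)
        if columns != expected:
            problems.append(
                f"<{instance.tag}> #{position} has columns {columns}, expected {expected}"
            )
    return problems


# Hypothetical samples: one flat document and one with an extra nesting level.
FLAT = "<LIBRARY><STUDENTS><ID>0755</ID><NAME>Brad Martin</NAME></STUDENTS></LIBRARY>"
NESTED = (
    "<LIBRARY><STUDENTS><ID>0755</ID>"
    "<ADDRESS><STREET>1611 Glengreen</STREET></ADDRESS>"
    "</STUDENTS></LIBRARY>"
)

print(structure_problems(FLAT))
print(structure_problems(NESTED))
```

An empty result means only that the sketch found nothing to report. A non-empty result suggests the document may be a candidate for an XMLMap.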
