SAS Institute. The Power to Know

Base SAS

XML Engine

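The XML engine processes an XML document, which is a file that is both application and machine independent. The engine can generate (export) an XML document from a SAS data set, and it can import an XML document by translating the XML markup to SAS proprietary format.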

The XML engine supports Version 7 and later features, such as long data set and variable names. For moving SAS data sets across operating environments, the XML engine does not replace the XPORT transport engine; however, the XPORT engine does not support Version 7 and later features.

You can specify the XML engine in the LIBNAME statement, using the following syntax:

LIBNAME libref XML 'external-file' <XML-engine-options>;


Arguments

libref
a name to be associated with the physical location of the XML document. The libref (library reference) must be a valid SAS name.

external-file
the physical location of the XML document to be generated or imported. Include the complete pathname and the file name. It is suggested that you include the .xml extension on the file name, for example, myfile.xml.
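
As a minimal sketch of the two arguments, the following statements assign a libref to an existing XML document and print its contents. The libref MYXML, the path, and the member name GRADES are hypothetical; the sketch assumes the document is in the default GENERIC format and contains GRADES instance elements.

```sas
/* Hypothetical libref and path; adjust for your operating environment. */
libname myxml xml '/tmp/grades.xml';

/* Reading through the libref imports the XML document. */
/* GRADES is assumed to be the name of the repeating instance element. */
proc print data=myxml.grades;
run;
```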


XML Engine Options

CHARSET=character-set
specifies the character set used for display and printing.

Restriction: Use this option when generating an output XML document only.
Restriction: This option is for National Language Support (NLS), which is the ability of a software program to handle more than one language, country, and cultural setting. This option should be used with caution. If you are unfamiliar with character sets, do not use this option without proper technical advice.

ENCODING=encoding
specifies the encoding for converting text data into a numbering system that computers recognize. The encoding can be a single-byte character set or a multibyte character set.

Restriction: Use this option when generating an output XML document only.
Restriction: This option is for NLS and should be used with caution. If you are unfamiliar with encoding methods, do not use this option without proper technical advice.

INDENT=integer
specifies the number of columns to indent each nested element in the generated output. The value can be from 0 (which specifies no indentation) through 32. This is a cosmetic specification, which is ignored by an XML-enabled browser.

Default: 3
Restriction: Use this option when generating an output XML document only.

OIMSTART=nnn
specifies a beginning reference number, which in the generated output will increment sequentially for catalog, schema, table, and column identification. For example:
.
.
.
   <dbm:Catalog oim:id="_1">
      <dbm:CatalogSchemas>
         <dbm:Schema oim:id="_2">
            <dbm:SchemaTables>

               <Table oim:href="#_3">
                  <ColumnSetColumns>
                     <Column oim:href="#_4"> Fred </Column>
.
.
.

Default: 1
Restriction: Use this option when generating an output XML document for the OIMDBM format only.
Tip: Specifying a value other than 1 can be used, for example, to generate XML output for multiple catalogs and to have each output continue sequential numbering rather than restarting with 1 for each catalog.
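
The tip above might be applied as in the following sketch. It assumes, purely for illustration, that an earlier OIMDBM document used identifiers _1 through _100, so the next document starts at 101. The libref CAT2, the path, and the data set WORK.GRADES are hypothetical.

```sas
/* Hypothetical: an earlier document ended at identifier _100, */
/* so numbering in this document continues at 101.             */
libname cat2 xml '/tmp/catalog2.xml' xmltype=oimdbm oimstart=101;

data cat2.grades;
   set work.grades;
run;
```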

TAGSET=tagset-name
specifies the name of a tag set in order to override the default tag set that is used by the specified XMLTYPE=. For example, for the GENERIC format type, the default tag set uses the variable name to enclose the contents of a SAS variable (for example, <STUDENT> and </STUDENT>) and the name of the data set to enclose the contents of a SAS observation (for example, <GRADES> and </GRADES>). To change the tags that are produced, create a new tag set definition (with the TEMPLATE procedure), store it in an item store (a SAS file that stores style definitions, and so on), and specify the name of the store/file with the TAGSET= option.

Restriction: Use this option when generating an output XML document only.
Restriction: This option should be used with caution. If you are unfamiliar with XML output formats, do not use this option.

CAUTION:
If you alter the tag set when generating an output XML document and then attempt to import an XML file generated by that altered tag set, the XML engine may not be able to translate the XML markup back to SAS proprietary format.   

TRANTAB=table-name
specifies the name of a translation table for character conversions. The table-name must be the name of a catalog entry in either the SASUSER.PROFILE catalog or the SASHELP.HOST catalog.

Restriction: Use this option when generating an output XML document only.
Restriction: This option is for NLS and should be used with caution. If you are unfamiliar with translation tables, do not use this option without proper technical advice.

XMLDATAFORM=ELEMENT | ATTRIBUTE
specifies whether SAS variable information (name and data) is written in open element format or in enclosed attribute format. For example, if the variable name is PRICE and one observation's value is 1.98, the generated output for ELEMENT is <PRICE> 1.98 </PRICE> and for ATTRIBUTE is <COLUMN name="PRICE" value="1.98" />.

Default: ELEMENT
Restriction: Use this option when generating an output XML document only.
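
A short sketch of requesting attribute format follows. The libref ATTR, the path, and the data set WORK.ITEMS (assumed to contain the variable PRICE) are hypothetical names used only for illustration.

```sas
/* Hypothetical names and path. XMLDATAFORM=ATTRIBUTE asks the engine */
/* to write each variable as a COLUMN element with name= and value=.  */
libname attr xml '/tmp/items.xml' xmldataform=attribute;

data attr.items;
   set work.items;
run;
```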

XMLSCHEMA=NONE | NO | IGNORE | FULL | YES
specifies whether to include schema-related information in the generated output markup, or for the OIMDBM format, specifies whether to import schema-related information included in the input XML document.

Schema-related information is metadata that describes the characteristics (rules and presentation) for the XML format syntax. For example, the information specifies which tags can be used, what order they should appear in, which tags can appear inside other tags, which tags have attributes, and so on. Including the schema-related information can be useful when generating an output XML document from a SAS data set to process on an external product.

For examples of schema-related information for the OIMDBM format, see Generating an XML Document Containing a SAS User-Defined Format and Generating an XML Document Containing SAS Dates, Times, and Datetimes. For an example of schema-related information for the HTML format, see Generating an HTML Document.

Default: NONE (NO, IGNORE)
Restriction: Use this option for the OIMDBM and HTML formats only.

XMLTYPE=GENERIC | ORACLE | OIMDBM | EXPORT | HTML
specifies the format type.

Default: GENERIC

GENERIC
a simple, well-formed XML format. The XML document consists of a root (enclosing) element and repeating instance elements as shown in the following XML document:

XML Document for GENERIC Format

<?xml version="1.0" ?>
<TABLE>
   <GRADES>
      <STUDENT> Fred </STUDENT>
      <TEST1> 66 </TEST1>
      <TEST2> 80 </TEST2>
      <FINAL> 70 </FINAL>
   </GRADES>
   <GRADES>
      <STUDENT> Wilma </STUDENT>
      <TEST1> 97 </TEST1>
      <TEST2> 91 </TEST2>
      <FINAL> 98 </FINAL>
   </GRADES>
</TABLE>

Tip: You can control the markup by specifying options such as INDENT=, XMLDATAFORM=, and TAGSET=.
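
The following sketch shows one way the GRADES document above could be generated. The data values are taken from that document; the libref OUT and the output path are hypothetical, and XMLTYPE=GENERIC is stated explicitly even though it is the default.

```sas
/* Build the sample data set that matches the document shown above. */
data work.grades;
   input student $ test1 test2 final;
   datalines;
Fred 66 80 70
Wilma 97 91 98
;
run;

/* Hypothetical output location. */
libname out xml '/tmp/grades.xml' xmltype=generic;

/* Writing to the libref generates the XML document. */
data out.grades;
   set work.grades;
run;
```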

ORACLE
the XML format for the markup standards equivalent to the Oracle8iXML implementation, as shown in the following XML document. The number of columns to indent each nested element is 1, and the enclosing element tag for the contents of the SAS data set is ROWSET.

XML Document for ORACLE Format

<?xml version="1.0" ?>
<ROWSET>
 <ROW>
  <STUDENT> Fred </STUDENT>
  <TEST1> 66 </TEST1>
  <TEST2> 80 </TEST2>
  <FINAL> 70 </FINAL>
 </ROW>
 <ROW>
  <STUDENT> Wilma </STUDENT>
  <TEST1> 97 </TEST1>
  <TEST2> 91 </TEST2>
  <FINAL> 98 </FINAL>
 </ROW>
</ROWSET>

Tip: You can control the markup by specifying options such as INDENT= and XMLDATAFORM=.

OIMDBM
the XML format for the markup standards supported by the Open Information Model (Database Schema Model) proposed by the Metadata Coalition (MDC) as vendor- and technology-independent, conforming to the 1.0 specification. The XML markup contains metadata that is used in operational and data warehousing environments.

XML Document for OIMDBM Format

<?xml version="1.0" ?>
<oim:Transfer xmlns:oim="http://www.mdcinfo.com/oim/oim.dtd"
              xmlns:dbm="http://www.mdcinfo.com/oim/dbm.dtd"
              xmlns:tfm="http://www.mdcinfo.com/oim/tfm.dtd">
<!-- VersionHeader OimVersion="1.0" OimStatus="Draft" -->
<oim:TransferHeader Exporter="SAS Proprietary Software Release 8.2(8.02.02M0D08272000)"
                    ExporterVersion="8.2"
                    TransferDateTime="2000-08-28T11:06:10" />

   <dbm:Catalog oim:id="_1">
      <dbm:CatalogSchemas>
         <dbm:Schema oim:id="_2">
            <dbm:SchemaTables>

               <Table oim:href="#_3">
                  <ColumnSetColumns>
                     <Column oim:href="#_4"> Fred </Column>
                     <Column oim:href="#_5"> 66 </Column>
                     <Column oim:href="#_6"> 80 </Column>
                     <Column oim:href="#_7"> 70 </Column>
                  </ColumnSetColumns>
                  <ColumnSetColumns>
                     <Column oim:href="#_4"> Wilma </Column>
                     <Column oim:href="#_5"> 97 </Column>
                     <Column oim:href="#_6"> 91 </Column>
                     <Column oim:href="#_7"> 98 </Column>
                  </ColumnSetColumns>
               </Table>

            </dbm:SchemaTables>
         </dbm:Schema>
      </dbm:CatalogSchemas>
   </dbm:Catalog>


</oim:Transfer>

Tip: You can control the markup by specifying options such as INDENT=, OIMSTART=, XMLDATAFORM=, and XMLSCHEMA=.

EXPORT
an alias to specify the XML format that is most commonly used in the industry. For Releases 8.1 and 8.2, specifying XMLTYPE=EXPORT is the same as specifying XMLTYPE=OIMDBM. New releases of the XML engine will upgrade this format specification as needed.

HTML
the Hypertext Markup Language format. The XML engine generates HTML table markup, intended to facilitate viewing data in a tabular format. There are no formatting controls beyond the basic table construction directives shown in the following output document:

XML Document for HTML Format

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
   <BODY>
      <TABLE border="1" width="100%">
         <TBODY>
            <TR>
               <TD> Fred </TD>
               <TD> 66 </TD>
               <TD> 80 </TD>
               <TD> 70 </TD>
            </TR>
            <TR>
               <TD> Wilma </TD>
               <TD> 97 </TD>
               <TD> 91 </TD>
               <TD> 98 </TD>
            </TR>
         </TBODY>
      </TABLE>
   </BODY>
</HTML>

Restriction: Do not specify XMLTYPE=HTML to import an external HTML file. Use XMLTYPE=HTML to generate an output document only.
Tip: You can control the markup by specifying options such as INDENT=, XMLDATAFORM=, XMLSCHEMA=, and TAGSET=.
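
As a sketch, an HTML table like the one above could be requested as follows. The libref WEB, the path (given an .html extension here by choice), and the data set WORK.GRADES are hypothetical.

```sas
/* Hypothetical names and path. XMLTYPE=HTML is for output only. */
libname web xml '/tmp/grades.html' xmltype=html;

data web.grades;
   set work.grades;
run;
```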


Details


Understanding XML

XML (Extensible Markup Language) provides syntax that structures data by tagging it for content, meaning, or use. Structured information contains both content (for example, words) and an indication of what role the content plays (for example, content in a section heading has a different meaning from content in a database table). XML tells you what the data means, rather than how to display the data. With XML, you define the data structure using generalized markup tags, and you can also define your own customized tags.

XML provides the ability to exchange data between applications or from machine to machine. Data that is tagged with XML markup is independent of hardware and software. That is, different hosts can access the same XML document.

Note that the XML engine does not use a DTD (Document Type Definition).

Moving SAS Data Sets across Operating Environments

Moving a SAS data set means converting the file to a format that can be transferred between incompatible hosts, for example, from CMS to Windows. The process consists of the following steps:

  1. Generate an output XML document on the source host. The XML document contains the data and file attributes of one or more SAS data sets in XML markup. To generate an output XML document, use the LIBNAME statement and specify the XML engine, then use either the DATA step or COPY procedure.

  2. Transfer the XML document to the target host. Transferring is the process of moving a file between hosts across a network. Various third-party products are available for performing this operation. For example, you can use FTP (File Transfer Protocol) to transfer a file in the following ways:
    To push a file from the source host, use the FTP PUT command to copy the file from the source host to the target host. Your ability to push a file may depend on your permission to write to the target host. For details, see your network documentation.
    To pull a file from the target host, use the FTP GET command to copy the file from the source host to the target host.

  3. Translate the XML document to SAS proprietary format on the target host. To translate XML markup to SAS proprietary format, use the LIBNAME statement, specify the XML engine, then use either the DATA step or COPY procedure.

Note:   If the source and target hosts run different versions of SAS, the process automatically converts the file from an earlier SAS version to a later SAS version, for example, from Version 6 to Version 8.  
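
The three steps above can be sketched as follows, using the COPY procedure on both hosts. All librefs, the member name GRADES, and the quoted locations are placeholders, not real paths; substitute values that are valid in each operating environment.

```sas
/* Step 1, on the source host: generate the XML document.   */
/* 'SAS-data-library' and 'external-file.xml' are placeholders. */
libname source 'SAS-data-library';
libname xmlout xml 'external-file.xml';

proc copy in=source out=xmlout;
   select grades;
run;

/* Step 2: transfer external-file.xml to the target host,   */
/* for example with FTP PUT or GET, outside of SAS.         */

/* Step 3, on the target host: translate the XML markup     */
/* back to SAS proprietary format.                          */
libname xmlin xml 'external-file.xml';
libname target 'SAS-data-library';

proc copy in=xmlin out=target;
   select grades;
run;
```

A DATA step with a SET statement could be used in place of PROC COPY in either step; PROC COPY is shown here only because it handles one or more data sets with the same statements.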

For more information on moving SAS files, see Moving and Accessing SAS Data Files across Operating Environments. Note that the book discusses the XPORT engine, not the XML engine; however, much of the information is applicable to the XML engine when moving data.


Examples