Importing XML Documents
W3C specifications (section 4.6 Predefined Entities) state that for character data, certain characters such as the left angle bracket (<), the ampersand (&), and the apostrophe (') must be escaped using character references or strings like &lt;, &amp;, and &apos;. For example, to allow attribute values to contain both single and double quotation marks, the apostrophe or single-quotation character (') can be represented as &apos; and the double-quotation character (") as &quot;.
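As a rough illustration of this escaping from within SAS, the following sketch encodes a character value before it is written into XML markup. It assumes that the HTMLENCODE function is available in your SAS release and that it accepts the option keywords shown in its second argument; verify both against your documentation. The variable names RAW and ESCAPED are hypothetical.

```sas
/* Illustrative sketch only: RAW and ESCAPED are hypothetical names. */
data _null_;
   length raw escaped $ 200;

   /* A value that contains an ampersand, an apostrophe, */
   /* and double quotation marks.                        */
   raw = 'Dunn & Bradstreet said, "Isn''t this silly?"';

   /* Assumption: each keyword requests escaping of one character */
   /* (ampersand, <, >, apostrophe, double quotation mark).       */
   escaped = htmlencode(raw, 'amp lt gt apos quot');

   /* Write both values to the SAS log for comparison. */
   put raw=;
   put escaped=;
run;
```

If the function behaves as assumed, ESCAPED should contain &amp;, &apos;, and &quot; in place of the special characters, which is the form that a conforming XML parser expects.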
To import an XML document that contains non-escaped characters, you can specify the LIBNAME statement option XMLPROCESS=RELAX in order for the XML engine to accept character data that does not conform to W3C specifications. That is, non-escaped characters like the apostrophe, double quotation marks, and the ampersand are accepted in character data.
Note: Use XMLPROCESS=RELAX cautiously. If an XML document contains non-escaped characters, the content is not standard XML construction. The option is provided for convenience, not to encourage invalid XML format.
This example imports the following XML document named Relax.XML, which contains non-escaped character data:
<?xml version="1.0" ?>
<RELAX>
<CHARS>
<accept>OK</accept>
<status>proper escape sequence</status>
<ampersand>&amp;</ampersand>
<squote>&apos;</squote>
<dquote>&quot;</dquote>
<less>&lt;</less>
<greater>&gt;</greater>
</CHARS>
<CHARS>
<accept>OK</accept>
<status>unescaped character in CDATA</status>
<ampersand><![CDATA[Abbott & Costello]]></ampersand>
<squote><![CDATA[Logan's Run]]></squote>
<dquote><![CDATA[This is "realworld" stuff]]></dquote>
<less><![CDATA[ e < pi ]]></less>
<greater><![CDATA[ pen > sword ]]></greater>
</CHARS>
<CHARS>
<accept>NO</accept>
<status>single unescaped character</status>
<ampersand>&</ampersand>
<squote>'</squote>
<dquote>"</dquote>
<!-- purposely left out the less tag here -->
<greater/>
</CHARS>
<CHARS>
<accept>NO</accept>
<status>unescaped character in string</status>
<ampersand>Dunn & Bradstreet</ampersand>
<squote>Isn't this silly?</squote>
<dquote>Quoth the raven, "Nevermore!"</dquote>
<less></less>
<!-- purposely left out the greater tag here -->
</CHARS>
</RELAX>
By default, the XML engine expects XML markup to conform to W3C specifications. The following SAS program therefore imports only the first two observations, which contain valid XML markup, and produces an error for the last two records, which contain non-escaped characters:
libname relax xml 'c:\My Documents\XML\relax.xml';
proc print data=relax.chars;
run;
ERROR: There is an illegal character in the entity name.
encountered during XMLInput parsing
occurred at or near line 24, column 22
NOTE: There were 2 observations read from the data set RELAX.CHARS.
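The strict behavior can also be requested explicitly, which makes the intent visible in the program. This is a minimal sketch that assumes XMLPROCESS=CONFORM is the keyword for the default processing; confirm the keyword for your SAS release.

```sas
/* Assumption: CONFORM names the default (strict) processing. */
libname relax xml 'c:\My Documents\XML\relax.xml' xmlprocess=conform;

/* Expected to stop at the first non-escaped character, */
/* as the default LIBNAME statement does.               */
proc print data=relax.chars;
run;
```

Under that assumption, the log should report the same parsing error and the same two observations as the program with no XMLPROCESS= option.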
Specifying the LIBNAME statement option XMLPROCESS=RELAX enables the XML engine to import the XML document:
libname relax xml 'c:\My Documents\XML\relax.xml' xmlprocess=relax;
proc print data=relax.chars;
run;
                              The SAS System                              1

Obs    GREATER        LESS      DQUOTE                           SQUOTE
  1    >              <         "                                '
  2    pen > sword    e < pi    This is "realworld" stuff        Logan's Run
  3                             "                                '
  4                             Quoth the raven, "Nevermore!"    Isn't this silly?

Obs    AMPERSAND            STATUS                           ACCEPT
  1    &                    proper escape sequence           OK
  2    Abbott & Costello    unescaped character in CDATA     OK
  3    &                    single unescaped character       NO
  4    Dunn & Bradstreet    unescaped character in string    NO
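Because XMLPROCESS=RELAX accepts records that a conforming parser would reject, it can help to copy the imported data into a regular SAS data set and review it before further use. The following is a minimal sketch, not a prescribed procedure. The data set name WORK.CHARS_CHECK is hypothetical, and the ACCEPT variable comes from the example document.

```sas
libname relax xml 'c:\My Documents\XML\relax.xml' xmlprocess=relax;

/* Copy the imported observations into a temporary SAS data set. */
/* WORK.CHARS_CHECK is a hypothetical name.                      */
data work.chars_check;
   set relax.chars;
run;

/* Release the XML document once the copy is made. */
libname relax clear;

/* Count observations by the ACCEPT flag from the example document. */
proc freq data=work.chars_check;
   tables accept;
run;
```

For Relax.XML, the frequency table should list two observations with ACCEPT=OK and two with ACCEPT=NO, matching the four observations printed above.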
Copyright © 2007 by SAS Institute Inc., Cary, NC, USA. All rights reserved.