W3C specifications (section 4.6 Predefined
Entities) state that for character data, certain characters such as
the left angle bracket (<), the ampersand (&), and the apostrophe
(') must be escaped using character references or strings like
&lt;, &amp;, and &apos;. For example, to allow
attribute values to contain both single and double quotation marks,
the apostrophe or single-quotation character (') can be represented
as &apos; and the double-quotation character (") as &quot;.
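The escaping rule can be illustrated outside of SAS. The following is a minimal sketch in Python, not part of the XML engine, that uses the standard library function xml.sax.saxutils.escape to replace the five predefined-entity characters before text is written into an XML document. The helper name escape_character_data and the QUOTE_ENTITIES mapping are hypothetical names chosen for this illustration.

```python
from xml.sax.saxutils import escape

# escape() already converts &, <, and > on its own. The extra mapping
# (a hypothetical name for this sketch) adds the two quotation characters.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_character_data(text: str) -> str:
    """Return text with the five predefined-entity characters escaped."""
    return escape(text, QUOTE_ENTITIES)


if __name__ == "__main__":
    # Sample strings taken from the unescaped records in Permit.XML.
    for raw in (
        "Dunn & Bradstreet",
        "Isn't this silly?",
        'Quoth the raven, "Nevermore!"',
    ):
        print(escape_character_data(raw))
```

Text that is escaped this way before the XML document is created can be imported with the default engine behavior, so XMLPROCESS=PERMIT is not needed for it.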
To import an XML document that
contains non-escaped characters, you can specify the LIBNAME statement
option XMLPROCESS=PERMIT in order for the XML engine to accept character
data that does not conform to W3C specifications. That is, non-escaped
characters like the apostrophe, double quotation marks, and the ampersand
are accepted in character data.
Note: Use XMLPROCESS=PERMIT cautiously.
If an XML document contains non-escaped characters, the content
is not standard XML construction. The option is provided for convenience,
not to encourage invalid XML markup.
This example imports
the following XML document named Permit.XML, which contains non-escaped
character data:
<?xml version="1.0" ?>
<PERMIT>
<CHARS>
<accept>OK</accept>
<status>proper escape sequence</status>
<ampersand>&amp;</ampersand>
<squote>&apos;</squote>
<dquote>&quot;</dquote>
<less>&lt;</less>
<greater>&gt;</greater>
</CHARS>
<CHARS>
<accept>OK</accept>
<status>unescaped character in CDATA</status>
<ampersand><![CDATA[Abbott & Costello]]></ampersand>
<squote><![CDATA[Logan's Run]]></squote>
<dquote><![CDATA[This is "realworld" stuff]]></dquote>
<less><![CDATA[ e < pi ]]></less>
<greater><![CDATA[ pen > sword ]]></greater>
</CHARS>
<CHARS>
<accept>NO</accept>
<status>single unescaped character</status>
<ampersand>&</ampersand>
<squote>'</squote>
<dquote>"</dquote>
<less></less>
<greater></greater>
</CHARS>
<CHARS>
<accept>NO</accept>
<status>unescaped character in string</status>
<ampersand>Dunn & Bradstreet</ampersand>
<squote>Isn't this silly?</squote>
<dquote>Quoth the raven, "Nevermore!"</dquote>
<less></less>
<greater></greater>
</CHARS>
</PERMIT>
First, using the default
XML engine behavior, which expects XML markup to conform to W3C specifications,
the following SAS program imports only the first two observations,
which contain valid XML markup, and produces errors for the last two
records, which contain non-escaped characters:
libname permit xmlv2 'c:\My Documents\XML\permit.xml';
proc print data=permit.chars;
run;
SAS Log Output
ERROR: There is an illegal character in the entity name.
encountered during XMLInput parsing
occurred at or near line 24, column 22
NOTE: There were 2 observations read from the data set PERMIT.CHARS.
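The same kind of rejection can be reproduced with any conforming XML parser. The following minimal sketch uses Python's standard xml.etree.ElementTree module as a stand-in. It is not the SAS XML engine, and its error messages differ from the SAS log. The helper name is_well_formed is hypothetical. The sketch checks three ampersand elements modeled on Permit.XML: the escaped form, the CDATA form, and the unescaped form.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if a conforming parser accepts the XML fragment."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


# Elements modeled on the ampersand values in Permit.XML.
SAMPLES = {
    "proper escape sequence": "<ampersand>&amp;</ampersand>",
    "unescaped character in CDATA":
        "<ampersand><![CDATA[Abbott & Costello]]></ampersand>",
    "unescaped character in string":
        "<ampersand>Dunn & Bradstreet</ampersand>",
}

if __name__ == "__main__":
    for status, fragment in SAMPLES.items():
        verdict = "accepted" if is_well_formed(fragment) else "rejected"
        print(f"{status}: {verdict}")
```

Only the unescaped ampersand in a string is rejected here, which matches the record at which the default engine behavior reports its error.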
Specifying the LIBNAME
statement option XMLPROCESS=PERMIT enables the XML engine to import
the XML document:
libname permit xmlv2 'c:\My Documents\XML\permit.xml' xmlprocess=permit;
proc print data=permit.chars;
run;
PRINT Procedure Output for PERMIT.CHARS