Publish/Retrieve Encoding Behavior
Default Publish/Retrieve Behavior
All HTML files are published with a file encoding that indicates the
character set of the HTML file. This encoding is either automatically
generated or user-specified. All published files are read as binary data.
When retrieved, all HTML files are written as binary data.
By default, no translation occurs. However, translation does occur when a
file encoding is specified on the retrieve CALL routine (RETRIEVE_PACKAGE,
for example).
User-Specified Encoding in INSERT_HTML
You can specify an encoding on the INSERT_HTML CALL
routine to indicate the file's character set. The encoding
values ASCII, EBCDIC_RS15, and EBCDIC_RS25 are treated
as special cases in the encoding rules below.
Rules for Determining File Encoding
The file encoding that is published with each HTML file is
determined by the following rules.
- The HTML file is searched for charset= within the META tags. The following
rules govern the search:
  - The search covers only the META tags found within the HEAD
    portion of the document.
  - META tags within comments are ignored.
  - By default, the search uses the encoding of the native session. If a
    special encoding is specified (ASCII, EBCDIC_RS25 or EBCDIC_RS15), the
    search uses that encoding rather than the native session encoding.
  - The encoding specified within the META tag always takes
    precedence over user-specified encodings on the INSERT_HTML CALL routine.
- If the encoding value is found within the HTML file, that value is
published as the encoding value.
- If the encoding value is not found within the HTML file, and if a
user-specified encoding value was not provided on the INSERT_HTML CALL
routine, the native session encoding is published as the encoding value.
- If the encoding value is not found within the HTML file, and if the
user-specified encoding is not a special case (not ASCII, EBCDIC_RS25, or
EBCDIC_RS15), then the user-specified encoding value is published as the
encoding value.
- If the encoding value is not found within the HTML file, and if a special
encoding value of ASCII is specified, the following rules apply:
  - If running on an ASCII host at publish time, an attempt is made to use
    the current locale information to determine the flavor of ASCII encoding.
    If the locale information is unavailable, the native session encoding is
    used.
  - If running on an EBCDIC host at publish time, an attempt is made to use
    the current locale information to determine the transport format. If the
    transport format is set, it is used as the encoding. If it is not set,
    the encoding defaults to ISO-8859-1.
- If the encoding value is not found within the HTML file, and if a special
encoding value of EBCDIC_RS15 is specified, an encoding value of OPEN_ED-1047
is used, regardless of the host operating environment.
- If the encoding value is not found within the HTML file, and if a special
encoding value of EBCDIC_RS25 is specified, an encoding value of EBCDIC1047
is used, regardless of the host operating environment.
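
The rules above can be read as a small decision procedure. The following Python sketch is only an illustration of that reading, not the product's implementation: the function names, parameters, and the regular-expression search are all hypothetical, and `search_codec` is assumed to be a Python codec name standing in for the session or special encoding used for the search.

```python
import re
from typing import Optional

# Special-case encoding names taken from the rules above.
SPECIAL = {"ASCII", "EBCDIC_RS15", "EBCDIC_RS25"}


def find_meta_charset(html: bytes, search_codec: str) -> Optional[str]:
    """Hypothetical helper: return the charset= value of a META tag in HEAD."""
    text = html.decode(search_codec, errors="replace")
    # Only the HEAD portion of the document is searched.
    head = re.search(r"<head\b.*?</head>", text, re.I | re.S)
    if head is None:
        return None
    # Strip comments so that META tags inside them are ignored.
    head_text = re.sub(r"<!--.*?-->", "", head.group(0), flags=re.S)
    for tag in re.findall(r"<meta\b[^>]*>", head_text, re.I):
        match = re.search(r"charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", tag, re.I)
        if match:
            return match.group(1)
    return None


def resolve_published_encoding(
    meta_charset: Optional[str],
    user_encoding: Optional[str],
    native_encoding: str,
    host_is_ebcdic: bool = False,
    locale_encoding: Optional[str] = None,
) -> str:
    """Hypothetical helper: pick the encoding published with an HTML file.

    locale_encoding is whatever the locale information yields (the ASCII
    flavor on an ASCII host, the transport format on an EBCDIC host), or
    None when that information is unavailable.
    """
    if meta_charset:
        # A value found in the META tag takes precedence over everything.
        return meta_charset
    if user_encoding is None:
        return native_encoding
    name = user_encoding.upper()
    if name not in SPECIAL:
        return user_encoding
    if name == "ASCII":
        if host_is_ebcdic:
            return locale_encoding or "ISO-8859-1"
        return locale_encoding or native_encoding
    if name == "EBCDIC_RS15":
        return "OPEN_ED-1047"
    return "EBCDIC1047"  # EBCDIC_RS25
```

For example, under these assumptions a file with no charset= value that is published with a special encoding of ASCII from an EBCDIC host with no transport format set would resolve to ISO-8859-1.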
Specifying an Encoding on the Retrieve
By default, no translation occurs when HTML files are retrieved; the files
are written as binary data. To override the default at retrieve time, supply an
encoding property. This property indicates that the HTML files should
be translated into the specified character set encoding. The encoding that is
published with the file is used as the source encoding, and the user-specified
encoding is used as the destination encoding.
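
As a rough illustration of the retrieve behavior described above, the following Python sketch models the two cases: bytes pass through unchanged by default, and are transcoded from the published (source) encoding to the requested (destination) encoding when one is supplied. The function name and parameters are hypothetical, and the encoding names are Python codec names rather than the values accepted by the retrieve CALL routines.

```python
from typing import Optional


def retrieve_html(
    data: bytes,
    published_encoding: str,
    requested_encoding: Optional[str] = None,
) -> bytes:
    """Hypothetical model of retrieving one HTML file."""
    if requested_encoding is None:
        # Default: no translation, the file is written as binary data.
        return data
    # Source is the encoding published with the file;
    # destination is the encoding requested at retrieve time.
    return data.decode(published_encoding).encode(requested_encoding)


latin1_bytes = "café".encode("iso-8859-1")
unchanged = retrieve_html(latin1_bytes, "iso-8859-1")
translated = retrieve_html(latin1_bytes, "iso-8859-1", "utf-8")
```

In this model, `unchanged` is byte-for-byte identical to the input, while `translated` holds the same text re-encoded as UTF-8.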