Publish and Retrieve Encoding Behavior

Default Publish and Retrieve Behavior

All HTML files are published with a file encoding that indicates the character set of the HTML file. This encoding is either determined automatically or specified by the user. All published files are read as binary data.
When retrieved, all HTML files are written as binary data, and by default no translation occurs. Translation occurs only when a file encoding is specified on the retrieve CALL routine (for example, RETRIEVE_PACKAGE).
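The default round trip can be pictured with a minimal Python sketch. The names `PublishedFile`, `publish_html`, and `retrieve_html` are hypothetical and are not part of any SAS interface. Because the file is read and written in binary mode, the encoding label travels with the bytes but is never applied to them.

```python
from typing import NamedTuple


class PublishedFile(NamedTuple):
    """A published HTML file (hypothetical model): raw bytes plus encoding label."""
    data: bytes
    encoding: str


def publish_html(path: str, encoding: str) -> PublishedFile:
    # The file is read as binary data; its bytes are never decoded.
    with open(path, "rb") as f:
        return PublishedFile(f.read(), encoding)


def retrieve_html(item: PublishedFile, path: str) -> None:
    # Default retrieve: the bytes are written back unchanged (no translation),
    # whatever the published encoding label says.
    with open(path, "wb") as f:
        f.write(item.data)
```

Under this model, a default retrieve yields a byte-for-byte copy of what was published; the encoding label only matters once a retrieve-time encoding is requested.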

Rules for Determining File Encoding

You can specify an encoding on the INSERT_HTML CALL routine to indicate the file's character set. The encoding values ASCII, EBCDIC_RS15, and EBCDIC_RS25 are treated as special cases in the following rules. The file encoding that is published with each HTML file is determined as follows:
  1. The HTML file is searched for charset= within the META tags. The following rules govern the search:
    • The search covers only the META tags found within the HEAD portion of the document.
    • META tags within comments are ignored.
    • By default, the search uses the encoding of the native session. If a special encoding is specified (ASCII, EBCDIC_RS25, or EBCDIC_RS15), the search uses that encoding rather than the native session encoding.
    • The encoding specified within the META tag always takes precedence over user-specified encodings on the INSERT_HTML CALL routine.
  2. If the encoding value is found within the HTML file, then that value is published as the encoding value.
  3. If the encoding value is not found within the HTML file, and if a user-specified encoding value was not provided on the INSERT_HTML CALL routine, then the native session encoding is published as the encoding value.
  4. If the encoding value is not found within the HTML file, and if the user-specified encoding is not a special case (not ASCII, EBCDIC_RS25, or EBCDIC_RS15), then the user-specified encoding value is published as the encoding value.
  5. If the encoding value is not found within the HTML file, and if a special encoding value of ASCII was specified, then the following rules apply:
    • If running on an ASCII host at publish time, then an attempt is made to use the current locale information to determine which ASCII-based encoding to use. If the locale information is unavailable, then the native session encoding is used.
    • If running on an EBCDIC host at publish time, then an attempt is made to use the current locale information to determine the transport format. If a transport format is set, it is used as the encoding; otherwise, the encoding defaults to ISO-8859-1.
  6. If the encoding value is not found within the HTML file, and if a special encoding value of EBCDIC_RS15 was specified, then an encoding value of OPEN_ED-1047 is used, regardless of the host operating environment.
  7. If the encoding value is not found within the HTML file, and if a special encoding value of EBCDIC_RS25 was specified, then an encoding value of EBCDIC1047 is used, regardless of the host operating environment.
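The seven rules above can be condensed into one decision function. The following Python sketch is illustrative only: the function names (`find_meta_charset`, `published_encoding`), their parameters, and the choice of `latin-1` and `cp037` as search codecs are assumptions, not part of the product. The META search is a regular-expression approximation rather than a full HTML parser (for example, it expects an explicit `</head>` tag). Python has no built-in EBCDIC 1047 codec, so `cp037`, which agrees with 1047 on the letters, digits, and punctuation the search relies on, is used as an assumed stand-in. The sketch also assumes the native session encoding is given as a Python codec name, so it can serve both as the search codec and as the published label.

```python
import re
from typing import Optional

# The special INSERT_HTML encoding values named in the rules.
SPECIAL_VALUES = {"ASCII", "EBCDIC_RS15", "EBCDIC_RS25"}

# Assumed Python codecs used to scan the file when a special value is given.
# Python has no built-in EBCDIC 1047 codec; cp037 is a stand-in.
_SEARCH_CODECS = {"ASCII": "latin-1", "EBCDIC_RS15": "cp037", "EBCDIC_RS25": "cp037"}

_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
_HEAD = re.compile(r"<head\b[^>]*>(.*?)</head\s*>", re.IGNORECASE | re.DOTALL)
_META = re.compile(r"<meta\b[^>]*>", re.IGNORECASE)
_CHARSET = re.compile(r"charset\s*=\s*[\"']?\s*([\w.:-]+)", re.IGNORECASE)


def find_meta_charset(raw: bytes, search_codec: str) -> Optional[str]:
    """Rule 1: return the charset= value from a META tag in HEAD, if any."""
    text = raw.decode(search_codec, errors="replace")
    text = _COMMENT.sub("", text)  # META tags inside comments are ignored
    head = _HEAD.search(text)
    if head is None:  # only the HEAD portion is searched
        return None
    for meta in _META.findall(head.group(1)):
        match = _CHARSET.search(meta)
        if match:
            return match.group(1)
    return None


def published_encoding(
    raw: bytes,
    user_encoding: Optional[str] = None,
    *,
    session_encoding: str = "latin-1",
    host_is_ebcdic: bool = False,
    locale_encoding: Optional[str] = None,
    transport_format: Optional[str] = None,
) -> str:
    """Rules 1-7: pick the encoding label published with an HTML file."""
    # The search uses the special encoding if one was given, else the session's.
    search_codec = _SEARCH_CODECS.get(user_encoding, session_encoding)
    found = find_meta_charset(raw, search_codec)
    if found:  # Rules 1-2: a META charset always wins
        return found
    if user_encoding is None:  # Rule 3
        return session_encoding
    if user_encoding not in SPECIAL_VALUES:  # Rule 4
        return user_encoding
    if user_encoding == "ASCII":  # Rule 5
        if host_is_ebcdic:
            return transport_format or "ISO-8859-1"
        return locale_encoding or session_encoding
    if user_encoding == "EBCDIC_RS15":  # Rule 6
        return "OPEN_ED-1047"
    return "EBCDIC1047"  # Rule 7: EBCDIC_RS25
```

Checking the META tag before looking at `user_encoding` mirrors the rule that an encoding declared in the file takes precedence over one supplied on INSERT_HTML.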

Specifying an Encoding on the Retrieve

By default, no translation occurs when HTML files are retrieved; the files are written as binary data. To override the default, supply an encoding property on the retrieve CALL routine. The HTML files are then translated into the specified character set encoding: the encoding that was published with each file is used as the source encoding, and the user-specified encoding is used as the destination encoding.
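Assuming the encoding labels are valid Python codec names, translating from the published (source) encoding to the retrieve-time (destination) encoding amounts to a decode followed by an encode. `retrieve_bytes` and its `errors` parameter are hypothetical; this sketch does not describe how the product handles characters that have no equivalent in the destination encoding. With `errors="strict"`, Python raises `UnicodeEncodeError` for such characters.

```python
from typing import Optional


def retrieve_bytes(
    data: bytes,
    published_encoding: str,
    retrieve_encoding: Optional[str] = None,
    errors: str = "strict",
) -> bytes:
    """Return the bytes written for one retrieved HTML file (hypothetical helper)."""
    if retrieve_encoding is None:
        # No encoding property: the bytes are written as-is.
        return data
    # Source = encoding published with the file; destination = retrieve-time encoding.
    text = data.decode(published_encoding, errors)
    return text.encode(retrieve_encoding, errors)
```

For example, `retrieve_bytes("café".encode("ISO-8859-1"), "ISO-8859-1", "UTF-8")` returns `b"caf\xc3\xa9"`: the single Latin-1 byte for "é" becomes its two-byte UTF-8 form.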