SAS 9.1.3 Integration Technologies » Developer's Guide


Publish/Retrieve Encoding Behavior

Default Publish/Retrieve Behavior

All HTML files are published with a file encoding that indicates the character set of the HTML file. This encoding is either automatically generated or user-specified. All published files are read as binary data.

When retrieved, all HTML files are written as binary data. By default, no translation occurs. However, translation does occur when a file encoding is specified in the retrieve CALL routine (RETRIEVE_HTML, for example).

User-Specified Encoding on INSERT_HTML

You can specify an encoding on the INSERT_HTML CALL routine to indicate the file's character set. The encoding values ASCII, EBCDIC_RS15, and EBCDIC_RS25 are treated as special cases in the encoding rules below.

Rules for Determining File Encoding

The file encoding that is published with each HTML file is determined by the following rules.

  1. The HTML file is searched for charset= within the META tags. The following rules govern the search:
    • The search covers only the META tags found within the HEAD portion of the document.
    • META tags within comments are ignored.
    • By default, the search uses the encoding of the native session. If a special encoding is specified (ASCII, EBCDIC_RS25 or EBCDIC_RS15), the search uses that encoding rather than the native session encoding.
    • The encoding specified within the META tag always takes precedence over user-specified encodings on the INSERT_HTML CALL routine.
  2. If the encoding value is found within the HTML file, that value is published as the encoding value.
  3. If the encoding value is not found within the HTML, and if a user-specified encoding value was not provided on the INSERT_HTML CALL routine, the native session encoding is published as the encoding value.
  4. If the encoding value is not found within the HTML, and if the user-specified encoding is not a special case (not ASCII, EBCDIC_RS25, or EBCDIC_RS15), then the user-specified encoding value is published as the encoding value.
  5. If the encoding value is not found within the HTML file, and if a special encoding value of ASCII was specified, the following rules apply:
    • If running on an ASCII host at publish time, an attempt is made to use the current locale information to determine the flavor of ASCII encoding. If the locale information is unavailable, the native session encoding is used.
    • If running on an EBCDIC host at publish time, an attempt is made to use the current locale information to determine the transport format. If a transport format is set, it is used as the encoding. If it is not set, the encoding defaults to ISO-8859-1.
  6. If the encoding value is not found within the HTML file, and if a special encoding value of EBCDIC_RS15 is specified, an encoding value of OPEN_ED-1047 is used, regardless of the host operating environment.
  7. If the encoding value is not found within the HTML file, and if a special encoding value of EBCDIC_RS25 is specified, an encoding value of EBCDIC1047 is used, regardless of the host operating environment.
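
The rules above can be read as a decision procedure. The following Python sketch models that procedure for illustration only; it is not SAS code and calls no SAS API. The function names, the parameters (session_encoding, host_is_ebcdic, locale_encoding, transport_format), and the regular expressions are assumptions made for this example. The sketch also searches text that has already been decoded, whereas rule 1 describes searching the file in the native session encoding or in a special encoding.

```python
import re
from typing import Optional

# Encoding values that the rules treat as special cases.
SPECIAL = {"ASCII", "EBCDIC_RS15", "EBCDIC_RS25"}


def find_meta_charset(html: str) -> Optional[str]:
    """Return the charset= value of a META tag in HEAD, or None (rule 1)."""
    # META tags within comments are ignored, so drop comments first.
    text = re.sub(r"<!--.*?-->", "", html, flags=re.S)
    # Only the HEAD portion of the document is searched.
    head = re.search(r"<head\b[^>]*>(.*?)</head>", text, re.I | re.S)
    if head is None:
        return None
    for tag in re.findall(r"<meta\b[^>]*>", head.group(1), re.I):
        m = re.search(r"charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", tag, re.I)
        if m:
            return m.group(1)
    return None


def resolve_published_encoding(
    html: str,
    session_encoding: str,
    user_encoding: Optional[str] = None,
    host_is_ebcdic: bool = False,
    locale_encoding: Optional[str] = None,   # hypothetical: from locale info
    transport_format: Optional[str] = None,  # hypothetical: from locale info
) -> str:
    """Model of the encoding value published with an HTML file."""
    found = find_meta_charset(html)
    if found:                       # rule 2: value in the file wins
        return found
    if user_encoding is None:       # rule 3: fall back to the session
        return session_encoding
    name = user_encoding.upper()
    if name not in SPECIAL:         # rule 4: ordinary user-specified value
        return user_encoding
    if name == "ASCII":             # rule 5: depends on the publishing host
        if host_is_ebcdic:
            return transport_format or "ISO-8859-1"
        return locale_encoding or session_encoding
    if name == "EBCDIC_RS15":       # rule 6
        return "OPEN_ED-1047"
    return "EBCDIC1047"             # rule 7: EBCDIC_RS25
```

For example, under these assumptions a file with no charset= in its HEAD, published with a user-specified encoding of ASCII from an EBCDIC host that has no transport format set, resolves to ISO-8859-1.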

Specifying an Encoding on the Retrieve

By default, no translation occurs when HTML files are retrieved; the files are written as binary data. To override the default at retrieve time, supply an encoding property. This property indicates that the HTML files should be translated into the specified character set encoding. The encoding that is published with the file is used as the source encoding, and the user-specified encoding is used as the destination encoding.
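
The retrieve-time behavior can be pictured with a small Python model. This is a sketch of the idea only, not SAS code: the function name and parameters are hypothetical, and the encoding names are Python codec names rather than SAS encoding values.

```python
from typing import Optional


def retrieve_html(data: bytes, published_encoding: str,
                  target_encoding: Optional[str] = None) -> bytes:
    """Model of retrieving an HTML file, with optional translation."""
    # Default: no translation; the bytes are written exactly as published.
    if target_encoding is None:
        return data
    # Otherwise the published encoding is the source and the
    # user-specified encoding is the destination.
    return data.decode(published_encoding).encode(target_encoding)
```

In this model, a file published as ISO-8859-1 comes back byte for byte unless a destination such as UTF-8 is supplied, in which case each character is re-encoded.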