Character Transcoding and National Language Support

Character transcoding is the process of translating characters from one encoding to another. Transcoding helps preserve the integrity of your data when it is displayed in a browser. It provides national language support by enabling users to display HTML files that contain characters not included in the ISO Latin 1 encoding.

You may want to implement transcoding if

Your data may not require transcoding, but it may require a specific encoding (or character set) to display correctly. You can specify the character set in your HTML file to ensure that all browsers attempt to use the necessary encoding.

Transcoding support for the HTML Formatters is available only in Version 1.1 and later.

How Does Transcoding Work?

Character transcoding takes data that exists in one encoding and makes that data available in another encoding. You can think of this process as moving data from its native state to its displayable state.

All Web browsers have a default setting for document encoding. Users can change this setting to any encoding supported by their browser. You cannot be sure that all users viewing your HTML pages have the appropriate setting; therefore, you may also need to specify the character set (or encoding) in your HTML file.

To make transcoding work for you, determine the native state of your data and the desired displayable state for your output. Then find or create a transcoding list for the two encodings. The transcoding list acts as a translate table between the two encodings. It supplies the Numeric Character Reference (NCR) value to use for each character.
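As a rough illustration of what such a translate table holds, the following Python sketch builds one for a single-byte source encoding by decoding each byte value and recording the decimal value of the resulting character. It does not produce the transcoding list format used by the formatters. The function name build_translate_table and the choice of ISO 8859-2 as the source encoding are assumptions made only for this example.

```python
def build_translate_table(source_encoding):
    """Map each byte value (0-255) of a single-byte encoding to the
    decimal value of the character that the byte represents.

    Byte values that the encoding leaves undefined are omitted.
    """
    table = {}
    for byte_value in range(256):
        try:
            char = bytes([byte_value]).decode(source_encoding)
        except UnicodeDecodeError:
            continue  # undefined position in this encoding
        table[byte_value] = ord(char)
    return table


# Hypothetical example: the data is stored in ISO 8859-2 (Latin 2).
latin2_table = build_translate_table("iso8859_2")
```

In this sketch, the entry for byte 0xA1 is 260, the decimal value of the Latin 2 character "A with ogonek", while the entry for byte 0x41 is 65, the letter A.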

When a character is transcoded, it is replaced by its Numeric Character Reference (NCR). An NCR has the general format &#nnnnn;, where nnnnn is the decimal NCR value of the character. The formatters apply the following rules: if nnnnn is greater than 127, they write the NCR in the HTML file; if nnnnn is less than or equal to 127, they write the actual character. There is one exception: if nnnnn is 0, they write &nbsp;, which corresponds to a non-breaking space.
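The rule in the preceding paragraph can be sketched in a few lines of Python. This is an illustration of the decision logic only, not the formatters' own code. The function name emit_character is hypothetical, and the sketch assumes that the non-breaking space is written as &nbsp;.

```python
def emit_character(ncr_value):
    """Return the text written to the HTML file for one transcoded
    character, given its decimal NCR value."""
    if ncr_value == 0:
        # Exception: 0 is written as a non-breaking space
        # (assumed here to take the form &nbsp;).
        return "&nbsp;"
    if ncr_value <= 127:
        # Values up to 127 are written as the actual character.
        return chr(ncr_value)
    # Values above 127 are written as a numeric character reference.
    return "&#%d;" % ncr_value


# Hypothetical NCR values for three characters.
output = "".join(emit_character(value) for value in (65, 0, 260))
```

With the three sample values, output is the letter A, followed by &nbsp;, followed by &#260;.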

To help you understand when and how to use character transcoding, see Examples of Transcoding and National Language Support.

Implementing Character Transcoding

To implement character transcoding, complete the following steps:

  1. Determine which encodings you are transcoding from and to.

    When you choose the encoding for the resulting HTML file, be sure to verify that the users of the page have browsers that support the selected encoding. For example, Unicode is supported only by the latest browsers, which include version 4.x of Microsoft Internet Explorer and Netscape Communicator.

  2. Look for a transcoding list that addresses your needs.

    We provide a variety of transcoding lists that you can use. If we have provided the transcoding list that you need, skip to step 5. If not, continue with the next step.

  3. Create a data set or TRANTAB that contains the necessary transcoding information.

    Example 5 shows how to create the necessary data set when there is not an appropriate transcoding list.

  4. Use the MAKETL macro to create your transcoding list.

  5. Specify the transcoding list.

  6. You may also need to indicate a character set to be used when displaying the HTML page.

    The character set is specified using the <META> tag in your HTML file. See Specifying the Character Set for more information.

Specifying a Transcoding List

You can perform transcoding when using the formatters in either batch or interactive mode. If you are working in interactive mode, be sure to provide a value for the Transcoding List Name field. You may also want to complete the Character Set Name entry field. If you are using the formatters in batch mode, you use the TRANLIST and CHARSET arguments in your macro call.

tranlist=transcoding-list-name

specifies the name and location of an existing transcoding list. This argument is required only if you are implementing character transcoding. The transcoding list name must be a four-level name, and the fourth level must be SLIST.

The transcoding list can be one of the lists we provided for you, or it can be a transcoding list that you create for your specific needs. You may also need to specify the character set that you want the browser to use when displaying your HTML page.
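The naming rule for the TRANLIST value can be checked before a batch job is submitted. The Python sketch below is one way to do that. It assumes that the four levels are separated by periods and that the comparison with SLIST ignores case. The function name and the sample names are hypothetical.

```python
def is_valid_tranlist_name(name):
    """Return True if name has four non-empty levels separated by
    periods and the fourth level is SLIST (case ignored, by assumption)."""
    levels = name.split(".")
    return (
        len(levels) == 4
        and all(levels)
        and levels[3].upper() == "SLIST"
    )


# Hypothetical names, used only to exercise the check.
good = is_valid_tranlist_name("MYLIB.MYCAT.MYLIST.SLIST")
bad = is_valid_tranlist_name("MYLIB.MYCAT.MYLIST")
```

Here good is True and bad is False, because the second name has only three levels.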

charset=character-set-name

specifies the character set name that should appear in the <META> tag in your HTML file. HTML Formatting Tools do not perform any error checking on this value. Error handling of bad or unsupported character set names is provided by the user's browser.

Specifying the Character Set

All Web browsers have a default setting for encoding (or character set). The browser uses the specified encoding to render pages that the user requests. If you, as the page creator, want to override the default setting, you can include the <META> tag with a character set designation at the top of your file.

In order to fully support national characters, the formatters give you an easy way to include this tag in your HTML files -- the CHARSET argument. If you provide a character set name in the Character Set Name entry field or by using the CHARSET argument, the meta information is added to the top of your file.

Character set support and names will vary across browsers and even releases of browsers. For this reason, the formatters do not perform any error checking on the value you provide for the character set name. Please check your HTML pages using your target browsers whenever possible.
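For reference, a character set declaration in a <META> tag commonly takes the form built by the Python sketch below. The exact markup that the formatters write is not shown in this document, so treat the attribute layout as an assumption based on common HTML practice. The function name meta_charset_tag is hypothetical. Like the formatters, the sketch does not validate the character set name.

```python
def meta_charset_tag(charset_name):
    """Build a META tag that declares the character set of an HTML page.

    The name is inserted as given, with no error checking.
    """
    return (
        '<META HTTP-EQUIV="Content-Type" '
        'CONTENT="text/html; charset=%s">' % charset_name
    )


# Hypothetical example: declare ISO 8859-2 for a page.
tag = meta_charset_tag("iso-8859-2")
```

The resulting tag belongs in the head section of the HTML file, ahead of any text that contains national characters.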

You might find this list of character set names helpful:

http://www.iana.org/assignments/character-sets.
