Editing the Sample Pronunciations Scheme in the Scheme Builder
Note: The sample pronunciations scheme should be edited only by advanced users.
If the optional sample pronunciations scheme is used in a Suggestions node, it must be in the following format:
- The first column of the scheme contains a word in lowercase; any uppercase characters are forcibly lowercased when the speller is built.
- The second column contains the word's pseudo-phonetic equivalent (according to some system of pronunciation respelling, subject to the restrictions in the following text), which represents how the word is pronounced.
The scheme does not need to contain every known word, but it should include enough words to cover any pronunciation nuances of the words likely to be associated with the token. As a general guideline, at least 50 distinct words should be included.
Note that entries in the first or second column need not be unique within the scheme. The same word can be present multiple times with different pronunciations. The same pronunciation can also be present multiple times for different words.
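The format rules above can be sketched as a small check. This is a minimal illustration, assuming the scheme has been exported to a list of (word, pseudo-phonetic) pairs. The names `SAMPLE_ROWS`, `normalize`, `distinct_words`, and `meets_guideline` are hypothetical, and the pronunciations are invented for illustration rather than taken from any QKB.

```python
# Hypothetical in-memory view of a two-column scheme: (word, pseudo-phonetic).
# These rows are made up for illustration; they are not from a real scheme.
SAMPLE_ROWS = [
    ("Either", "i-D-xr"),  # uppercase "E" would be lowercased at build time
    ("either", "I-D-xr"),  # same word, second pronunciation: allowed
    ("sea", "si-"),
    ("see", "si-"),        # same pronunciation, different word: allowed
]

MIN_DISTINCT_WORDS = 50  # the general guideline from the text, not a hard limit


def normalize(rows):
    """Lowercase the word column, mirroring what the text says the build does."""
    return [(word.lower(), pron) for word, pron in rows]


def distinct_words(rows):
    """Return the set of distinct words in the first column."""
    return {word for word, _ in rows}


def meets_guideline(rows, minimum=MIN_DISTINCT_WORDS):
    """True if the scheme has at least `minimum` distinct words."""
    return len(distinct_words(normalize(rows))) >= minimum


if __name__ == "__main__":
    rows = normalize(SAMPLE_ROWS)
    print(len(rows), "rows,", len(distinct_words(rows)), "distinct words")
    print("meets 50-word guideline:", meets_guideline(SAMPLE_ROWS))
```

Duplicate words and duplicate pronunciations are deliberately not flagged, because the scheme permits both.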
The main challenge with editing the sample pronunciations scheme is in the pseudo-phonetic representation:
- The pseudo-phonetic equivalent of a word can be expressed in any ASCII characters except whitespace (and except the & character, which should never be used, as noted later in this list). The character used to represent a sound need not bear any relation to the printed character to which it corresponds.
- It is not necessary to conform to any published pronunciation respelling system or transcription standard, as long as the usage of a character is internally consistent within the scheme.
- The pseudo-phonetic equivalent must be the same length as the word itself.
- The - (hyphen) character indicates "no sound" and is used for padding the length.
- A sound that is normally considered to be made by a combination of adjacent characters is assigned to only one of those characters. The other characters in the combination are assigned "no sound".
- Each pseudo-phonetic character can (in principle) stand for any sound, but there are conventions that the speller-builder code relies on to create automatic equivalence classes, so these conventions should either be honored or, if deemed undesirable, worked around.
  - The character & should never be used.
  - The following characters form an equivalence class that is intended to represent vowel sounds: {a, e, i, o, u, A, E, I, O, U, x, W, @, c, ^}.
  - The following characters form an equivalence class that is intended to represent alveolar fricatives: {s, z, !}. Note that the ! character actually represents an alveolar plosive plus fricative, as in the "z" of "pizza".
  - The following characters form an equivalence class that is intended to represent dental fricatives: {T, D}.
  - Glottal fricatives and "no sound" form an equivalence class: {h, -}.
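The restrictions and equivalence classes listed above can be sketched as follows. This is an illustrative check only: the class names, `check_entry`, `class_of`, and `same_class` are hypothetical, and how the speller-builder code actually uses the classes internally is not described here. The sample entry for "phone" is invented for illustration.

```python
NO_SOUND = "-"      # hyphen: "no sound", used for padding
FORBIDDEN = {"&"}   # should never be used

# Equivalence classes as listed in the text; the dictionary keys are
# hypothetical labels chosen for this sketch.
EQUIVALENCE_CLASSES = {
    "vowel": set("aeiouAEIOUxW@c^"),
    "alveolar_fricative": set("sz!"),
    "dental_fricative": set("TD"),
    "glottal_or_no_sound": set("h-"),
}


def check_entry(word, pron):
    """Return a list of rule violations for one (word, pseudo-phonetic) pair."""
    problems = []
    if len(pron) != len(word):
        problems.append("length mismatch")
    for ch in pron:
        if ch in FORBIDDEN:
            problems.append("forbidden character: " + ch)
        elif ch.isspace():
            problems.append("whitespace not allowed")
        elif not ch.isascii():
            problems.append("non-ASCII character: " + ch)
    return problems


def class_of(ch):
    """Return the label of the equivalence class containing ch, or None."""
    for name, members in EQUIVALENCE_CLASSES.items():
        if ch in members:
            return name
    return None


def same_class(a, b):
    """True if a and b are identical or share a listed equivalence class."""
    if a == b:
        return True
    cls = class_of(a)
    return cls is not None and cls == class_of(b)


if __name__ == "__main__":
    # Invented example: "ph" collapses to one sound plus one "no sound".
    print(check_entry("phone", "f-on-"))   # no violations expected
    print(check_entry("phone", "fon"))     # too short
    print(same_class("s", "!"), same_class("h", NO_SOUND))
```

A scheme that uses one of these characters for an unrelated sound would pass `check_entry` but would still be grouped by the automatic classes, which is why the text advises either honoring the conventions or working around them.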
Illustrative Examples
The following examples are taken from a sample English pseudo-phonetics file. For additional examples, see the EN Name Pronunciation scheme in QKB CI 2011A.