DataFlux Data Management Studio 2.6: User Guide

Vocabulary Editor

The Vocabulary Editor allows you to build a Vocabulary. When the parsing system needs to categorize a word, it can then search a single Vocabulary rather than multiple text files. We recommend that you build one Vocabulary per parse definition. Before using the Vocabulary Editor, you should define all of your basic categories and implement them in a Grammar, create your parse definitions, and create a text file for each basic category.

Each word in the Vocabulary is defined as belonging to one or more categories, which are defined in an associated Grammar. Each word is also assigned a likelihood for each category, which is a score indicating the level of confidence (VERY HIGH, HIGH, MEDIUM, LOW, or VERY LOW) that the word belongs to that category. For example, consider the likelihood associated with each category for the word "Kim," a word in the EN Gender Analysis Vocabulary.

In this example, the word "Kim" has two categories: FGW (Female Given Name Word) and MGW (Male Given Name Word). Both of these categories have a likelihood of MEDIUM. In some contexts, both the category and likelihood values are used to make a determination about a word, such as the gender associated with the word. In other contexts, only one of these values might be used to make a determination.
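To make the relationship between words, categories, and likelihoods concrete, here is a minimal Python sketch. It does not reflect the proprietary Vocabulary format or any DataFlux API. The names `Likelihood`, `VOCABULARY`, and `guess_gender` are hypothetical, and the rule that equal likelihoods yield an undetermined result is an assumption made only for illustration.

```python
from enum import IntEnum


class Likelihood(IntEnum):
    """The five confidence levels, ordered so that they can be compared."""
    VERY_LOW = 1
    LOW = 2
    MEDIUM = 3
    HIGH = 4
    VERY_HIGH = 5


# Hypothetical in-memory stand-in for a Vocabulary:
# word -> {category abbreviation: likelihood}
VOCABULARY = {
    "KIM": {"FGW": Likelihood.MEDIUM, "MGW": Likelihood.MEDIUM},
}


def guess_gender(word):
    """Return "F", "M", or "U" (undetermined) using both category and likelihood.

    Assumed rule: the category with the higher likelihood wins; a tie or an
    unknown word is undetermined.
    """
    categories = VOCABULARY.get(word.upper(), {})
    female = categories.get("FGW", 0)  # 0 means the category is not assigned
    male = categories.get("MGW", 0)
    if female > male:
        return "F"
    if male > female:
        return "M"
    return "U"


print(guess_gender("Kim"))
```

Because "Kim" carries MEDIUM for both FGW and MGW, this sketch cannot pick a gender from the word alone, which is the kind of case where both values matter.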

Within the Vocabulary Editor, you must specify which input text sources you want to combine to develop a Vocabulary, and indicate which category's data each source represents. Note that a Vocabulary is stored in a proprietary format. To help ensure Vocabulary integrity, do not attempt to create or edit a Vocabulary file directly; always use the Vocabulary Editor.

Building Vocabularies

You can use the Vocabulary Editor to build Vocabularies. We recommend you build one Vocabulary for each parse definition.

  1. Open the Vocabulary Editor. On the Customize dialog, choose Tools > Vocabulary Editor.
  2. Create a New Vocabulary. On the Vocabulary Editor dialog, choose File > New, and then specify a locale in the Select Locale(s) dialog.
  3. Import Categories From a Grammar. You import categories and likelihoods so that you can associate those values with the words that you add to your vocabulary. Each word must be associated with one or more categories from the Grammar.

    To import categories, choose Options > Categories. The Categories dialog appears. Click Import to display the Select Grammar dialog. Select the Grammar that you want to associate with your new Vocabulary, and then click OK. On the Categories screen, the Grammar's standard category abbreviations and descriptions appear. (Derived categories are not imported.) Use the Delete button to delete any unwanted categories, and then click Close.
  4. Import Words Into the Vocabulary. After you import categories, you import the words that fit those categories. Choose File > Import to display the Import Words dialog. Use the Type field to import words from a text file, a vocabulary, or a scheme. Click Select QKB to Import From to import a vocabulary or scheme from a Quality Knowledge Base other than your current QKB.

    To add categories to imported words in the Import Words dialog, click Add under the heading Add these categories. For each category that you select, you can select Overwrite likelihood. Choosing this option replaces any likelihood value that already exists for that word and category in your new vocabulary with the likelihood that you specify.

    Note: The remainder of this step applies only to the import of words from Vocabularies, when you select Vocabulary for the Type field.

    When you import words from a Vocabulary, you can also filter the words that are imported. Click Filter Word List to display the Import Filter dialog. Use the Import Filter dialog to import only those words that have been assigned all of your selected categories.

    In the Import Words dialog, to import categories as well as words from a vocabulary file, select the check box Use categories from imported vocabulary words. If an imported word already exists in your new Vocabulary, then any new categories are added to the existing word.

    If an imported word already exists, both words share a category, and the likelihood values for that category differ, you need to decide how to resolve the likelihood conflict. In the Import Words dialog, select a button under the heading In case of likelihood conflict during merge. You can choose to import all likelihood values (overwrite local likelihood), keep your existing likelihood values (refuse imported likelihood), or receive a prompt to decide each conflict individually.
  5. Build the Vocabulary. After you specify how you want to import words and categories from a particular file, click Import in the Import Words dialog. The new words appear in the Vocabulary Editor dialog. To add more words from another file, select another file in the Import Words dialog. You can also add words individually by choosing Edit > Add Word.

    When the import is complete, click Close to return to the Vocabulary Editor dialog.
  6. Modify Categories in the Vocabulary. Now that you are looking at the imported words and categories in your new Vocabulary, you can add or delete individual categories, or change likelihood values.

    Here is an example of when you might want to update a likelihood value. If your vocabulary contains the name Scott, then that word might have the categories Family Name Word (FNW) and Given Name Word (GNW). You might determine that Scott is more likely to be a Given Name Word than a Family Name Word. You could then increase the likelihood value of the Given Name category for the word Scott.

    To change a likelihood value, select the word and click Edit.

    Note: When you add a category to one or more selected words, the Overwrite Category dialog appears if that category is already assigned to one or more of those words in your Vocabulary. To resolve likelihood conflicts, click Yes to accept the change in likelihood for the current category. Click Yes to All to accept all remaining likelihood changes. Click No to refuse all overwrites and keep your existing likelihood values. Click Cancel to not add the category.

    Although there may be some adjustments that you want to make to the likelihoods at this point, later testing with the Parse Test Tool will probably reveal other necessary adjustments to give the desired result.
  7. Save the Vocabulary. Now that your Vocabulary is built, you need to save it. Select File > Save. If this is a newly built Vocabulary, the Vocabulary Editor will prompt you for a name.
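The import filter and the three likelihood conflict options described in step 4 can be sketched in Python as follows. This is an illustration only, not the editor's implementation: the function names, the dictionary layout (word mapped to a dictionary of category and likelihood), and the `ask` callback that stands in for the per-conflict prompt are all hypothetical. The sketch assumes that Use categories from imported vocabulary words is selected.

```python
POLICIES = ("overwrite", "refuse", "prompt")


def passes_filter(categories, required):
    """Import Filter rule: keep a word only if it has ALL selected categories."""
    return set(required).issubset(categories)


def merge_words(local, imported, policy="refuse", required=(), ask=None):
    """Merge imported words into local and return local.

    policy "overwrite": imported likelihood replaces the local one.
    policy "refuse":    the local likelihood is kept.
    policy "prompt":    ask(word, category, local_value, imported_value)
                        returns True to overwrite (hypothetical callback).
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    if policy == "prompt" and ask is None:
        raise ValueError("the 'prompt' policy needs an ask callback")
    for word, categories in imported.items():
        if not passes_filter(categories, required):
            continue  # filtered out, so the word is not imported
        existing = local.setdefault(word, {})
        for category, likelihood in categories.items():
            if category not in existing:
                # A new category is always added to the existing word.
                existing[category] = likelihood
            elif existing[category] != likelihood:
                # Same word and category, different likelihood: a conflict.
                if policy == "overwrite":
                    existing[category] = likelihood
                elif policy == "prompt" and ask(
                    word, category, existing[category], likelihood
                ):
                    existing[category] = likelihood
    return local


local = {"SCOTT": {"FNW": "MEDIUM", "GNW": "MEDIUM"}}
imported = {"SCOTT": {"GNW": "HIGH"}, "KIM": {"FGW": "MEDIUM", "MGW": "MEDIUM"}}
print(merge_words(local, imported, policy="overwrite"))
```

In this usage example, the GNW likelihood for SCOTT conflicts (MEDIUM locally, HIGH in the import), so the chosen policy decides which value survives, while KIM is simply added as a new word.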

Modifying Vocabularies

Other than altering the likelihood for specific words in a Vocabulary, we recommend you not make many other modifications. However, certain situations may warrant it, so the Vocabulary Editor does allow these operations. This may provide a good way to temporarily make changes for testing purposes.

Add a word to a Vocabulary

  1. On the Vocabulary Editor dialog, select File > Open. The Open dialog appears.
  2. Select the Vocabulary to which you want to add a word, and then click Open. The Vocabulary's details appear on the Vocabulary Editor dialog.
  3. Select Edit > Add Word. The Add Word dialog appears.
  4. Enter your new word, and then click OK. The word now appears selected under Word on the Vocabulary Editor dialog.
  5. On the right side of the screen, add at least one category with a likelihood value.

Note: The Vocabulary Editor will alert you if you try to add a word that already exists in the Vocabulary.
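The add-word procedure, including the duplicate alert, can be pictured with a short Python sketch. The `add_word` function and the dictionary layout are hypothetical stand-ins and not part of the product. The guide does not say how the editor compares words (for example, whether case matters), so this sketch assumes an exact match.

```python
def add_word(vocabulary, word, category, likelihood):
    """Add a new word with its first category and likelihood.

    Raises ValueError if the word is already present, mirroring the
    editor's alert. Exact-match comparison is an assumption.
    """
    if word in vocabulary:
        raise ValueError(f"'{word}' already exists in the Vocabulary")
    # A new word needs at least one category with a likelihood value.
    vocabulary[word] = {category: likelihood}
    return vocabulary


vocab = {"KIM": {"FGW": "MEDIUM", "MGW": "MEDIUM"}}
add_word(vocab, "SCOTT", "GNW", "HIGH")
```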

Modify a word in a Vocabulary

  1. On the Vocabulary Editor dialog, select File > Open. The Open dialog appears.
  2. Select the Vocabulary that contains the word you want to modify, and then click Open. The Vocabulary's details appear on the Vocabulary Editor dialog.
  3. Under Word, select the word or words that you want to modify. The word's categories appear on the right side of the dialog.
  4. Click Add to display the Add Word Category dialog. If you change the likelihood of a category that already belongs to the selected word or words, you will receive the Overwrite Category dialog. The Overwrite Category dialog enables you to accept or refuse one or more changed likelihood values.
  5. Select a category and click Edit to change category settings and likelihood values.
  6. Select a category and click Delete to remove that category from the selected word or words.
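The edit and delete operations in the steps above amount to changing or removing one category entry for a word. The following Python sketch shows that idea under the same hypothetical dictionary layout (word mapped to a dictionary of category and likelihood); `set_likelihood` and `delete_category` are illustrative names, not editor functions.

```python
LEVELS = ("VERY LOW", "LOW", "MEDIUM", "HIGH", "VERY HIGH")


def set_likelihood(vocabulary, word, category, likelihood):
    """Change the likelihood of a category already assigned to a word."""
    if likelihood not in LEVELS:
        raise ValueError(f"unknown likelihood: {likelihood}")
    if category not in vocabulary[word]:
        raise KeyError(f"{word} does not have category {category}")
    vocabulary[word][category] = likelihood


def delete_category(vocabulary, word, category):
    """Remove one category from a word; the word itself stays."""
    del vocabulary[word][category]


vocab = {"SCOTT": {"FNW": "MEDIUM", "GNW": "MEDIUM"}}
# Treat Scott as more likely a given name than a family name.
set_likelihood(vocab, "SCOTT", "GNW", "HIGH")
```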

Delete a word from a Vocabulary

  1. On the Vocabulary Editor dialog, select File > Open. The Open dialog appears.
  2. Select the Vocabulary that contains the word you want to delete, and then click Open. The Vocabulary's details appear on the Vocabulary Editor dialog.
  3. Under Word, select the word you want to delete.
  4. Select Edit > Delete Word.
