Working with Word Clouds

About Word Clouds

A word cloud displays a set of words from a character data item. Depending on the type of word cloud and your data roles, the size of each word in the cloud can indicate the relevance of the word, the frequency of the word, or the value of a measure.
You can create two types of word cloud:
Word clouds that use text analytics
Word clouds that use text analytics analyze each value in a document collection data item as a text document that can contain multiple words. Words that often appear together in the document collection are identified as topics. The word cloud displays the most relevant terms for each topic, and the size of each term indicates its relevance.
To enable text analytics, you must set a unique row identifier and define one or more categories as document collections. See Managing Data.
Note: Depending on the number of rows in your data source and the length of the values in your document collection, a word cloud with text analytics might require a significant amount of time to display.
Note: Text analytics in SAS Visual Analytics uses a different algorithm from SAS Text Miner. Your results might differ from the results that SAS Text Miner produces.
Word clouds that use category values
Word clouds that use category values analyze each value in a category data item as a single text string. The word cloud can display either the string values that have the highest frequency or the string values that have the greatest value for a measure. The color of each word can indicate the value of a measure.
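The following Python sketch illustrates the category-value behavior described above. It is not how SAS Visual Analytics is implemented; the sample values and variable names are hypothetical. It only shows that each category value is counted as one whole string and is never split into separate words.

```python
from collections import Counter

# Hypothetical category values. Each value is one string, not a bag of words.
values = ["red apple", "green pear", "red apple", "plum", "green pear", "red apple"]

# Count whole strings. "red apple" is a single entry and is never split
# into "red" and "apple".
frequency = Counter(values)

# The most frequent strings are the ones that would be drawn largest.
top_values = frequency.most_common(2)
print(top_values)  # [('red apple', 3), ('green pear', 2)]
```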

Data Roles for a Word Cloud

About Data Roles for a Word Cloud

The data roles for a word cloud depend on the type of word cloud that you select.
The Show Word Cloud option determines whether the word cloud is generated by using text analytics or by using category values.

Data Roles for a Word Cloud That Uses Text Analytics

For a word cloud that uses text analytics, the basic role is Document collection. A document collection is a category data item that contains the text that you want to analyze.
Note: To enable text analytics, you must set a unique row identifier and define one or more categories as document collections. See Define Data Items for Text Analytics.
In addition to the basic role, you can specify Document details. The Document details role adds data items to the Documents tab of the details table.

Data Roles for a Word Cloud That Uses Category Values

For a word cloud that uses category values, the basic role is Words. Specify a category whose values are used in the word cloud.
In addition to the basic role, you can specify these roles:
Size
specifies a measure that determines the size of each word. If you do not specify a measure, then the word size indicates the frequency of each word.
Color
specifies a measure that determines the color of each word.
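The relationship between the Words, Size, and Color roles can be sketched in Python as follows. The function name, the sample rows, and the column names are hypothetical. Summing the measure for each word is an assumption made for illustration; the aggregation that the product applies depends on the measure.

```python
from collections import defaultdict

def word_weights(rows, word_key, measure_key=None):
    """Return {word: weight}.

    With a measure, the weight is the sum of the measure for each word
    (sum is an assumed aggregation). Without a measure, each row counts
    as 1, so the weight is the frequency of the word.
    """
    weights = defaultdict(float)
    for row in rows:
        weights[row[word_key]] += row[measure_key] if measure_key else 1
    return dict(weights)

# Hypothetical data: "product" plays the Words role.
rows = [
    {"product": "Widget", "sales": 120.0, "returns": 3},
    {"product": "Gadget", "sales": 300.0, "returns": 1},
    {"product": "Widget", "sales": 80.0, "returns": 2},
]

size_by_frequency = word_weights(rows, "product")          # no Size measure
size_by_sales = word_weights(rows, "product", "sales")     # Size = sales
color_by_returns = word_weights(rows, "product", "returns")  # Color = returns
```

In this sketch, "Widget" is the larger word when no Size measure is assigned (two rows against one), but "Gadget" is larger when sales is assigned to Size (300 against 200).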

Specify Properties for a Word Cloud

On the Properties tab, you can specify the following options:
Name
specifies the name of the visualization.
Title
specifies the title that appears above the graph.
Note: The Title option is disabled if you select Generate graph title.
Generate graph title
specifies that the graph title is generated automatically based on the data items in the visualization.
Frequency (for category values only)
specifies whether the frequency is displayed as a count (Count) or as a percentage (Percent).
Note: The frequency values are based on the data that is shown in the visualization (after filters and other data selections have been applied).
Note: This option has no effect if a measure is assigned to the Size role.
Word display limit
specifies the maximum number of words that are displayed in the word cloud.
Font scale
specifies how much the font size varies between the largest and smallest words in the cloud. The value is the ratio of the largest font size to the smallest font size, where both sizes are measured in points.
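As an illustration of how the Frequency, Word display limit, and Font scale options interact, consider the following Python sketch. The function name, the 10-point minimum font size, and the linear interpolation between the smallest and largest sizes are all assumptions; the documentation states only that the font scale is the ratio of the largest font size to the smallest.

```python
def layout_words(counts, display_limit, font_scale, as_percent=False, min_pt=10.0):
    """Return [(word, value, font_size_pt)] for the top words, largest first."""
    total = sum(counts.values())
    # Keep only the highest-frequency words, up to the display limit.
    top = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:display_limit]
    if not top:
        return []
    lo, hi = top[-1][1], top[0][1]
    # The largest font is font_scale times the smallest font.
    max_pt = min_pt * font_scale
    result = []
    for word, count in top:
        # Assumed linear interpolation between the smallest and largest size.
        t = (count - lo) / (hi - lo) if hi > lo else 1.0
        # Percentages are taken over all counted values, not only the displayed ones.
        value = 100.0 * count / total if as_percent else count
        result.append((word, value, min_pt + t * (max_pt - min_pt)))
    return result

# Hypothetical frequencies after filters have been applied.
counts = {"apple": 50, "pear": 30, "plum": 15, "fig": 5}
words = layout_words(counts, display_limit=3, font_scale=4, as_percent=True)
```

With a display limit of 3 and a font scale of 4, "fig" is dropped, "plum" is drawn at 10 points, and "apple" is drawn at 40 points.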
For word clouds that use text analytics, you can specify the following additional options:
Maximum topics
specifies the maximum number of topics to create. Specify a number from 4 to 20.
Resolution
specifies the resolution that is used to identify topics. A Low resolution identifies fewer topics. A High resolution identifies more topics.
Cell weight
specifies how the frequency of each term is weighted within each document that the term appears in. Selecting Logarithmic de-emphasizes terms that appear many times in relatively few documents.
Term weight
specifies a weighting algorithm for the terms in the document collection. The Entropy weighting algorithm emphasizes terms that have a low frequency across the document collection.
Entity extraction
specifies the method that is used to identify text entities such as names, addresses, and telephone numbers. The Standard method identifies each text entity as a term. If you select None, then text entities are not treated differently from other text.
Minimum term frequency
specifies the minimum number of documents that a term must appear in. Specify a number from 1 to 20. If a term does not appear in the minimum number of documents, then it is not included in the word cloud.
Topic label term count
specifies the number of terms that are included in a topic name. Specify a number from 2 to 8. This property does not affect the number of terms that are used to select topics; only the topic names are changed.
Extract noun groups
specifies whether to identify groups of nouns as terms.
Stem words
specifies whether all forms of a given word are identified as a single term. For example, if you select Stem words, then the words “sell,” “sells,” “selling,” and “sold” are identified as the single term “sell.”
Use stop list (if available)
specifies whether to use a stop list to exclude common words such as “the,” “with,” and “is” when identifying terms. If no stop list is available, then a message appears at the bottom of the word cloud.
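Several of the text analytics options above can be illustrated with a small Python sketch. It is not the algorithm that SAS Visual Analytics uses, which is not described here. The stop list, the stem lookup table, and all function names are hypothetical. The logarithmic cell weight log2(count + 1) and the entropy term weight 1 + Σ p·log2(p) / log2(n) are commonly used textbook forms and are assumed here for illustration only.

```python
import math
from collections import Counter

STOP_LIST = {"the", "with", "is", "a"}  # hypothetical stop list
# Toy stem lookup, not a real stemmer.
STEMS = {"sells": "sell", "selling": "sell", "sold": "sell"}

def terms(document, stem=True, use_stop_list=True):
    """Split one document into terms, optionally stemmed and stop-filtered."""
    out = []
    for token in document.lower().split():
        token = token.strip(".,;:!?\"'")
        if not token or (use_stop_list and token in STOP_LIST):
            continue
        out.append(STEMS.get(token, token) if stem else token)
    return out

def term_counts(documents, min_term_frequency=1):
    """Count terms per document, keeping terms found in enough documents."""
    counts = [Counter(terms(doc)) for doc in documents]
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for c in counts for term in c)
    kept = {t for t, n in doc_freq.items() if n >= min_term_frequency}
    return [Counter({t: n for t, n in c.items() if t in kept}) for c in counts]

def log_cell_weights(counts):
    """Assumed logarithmic cell weight: log2(count + 1) damps repeated terms."""
    return [{t: math.log2(n + 1) for t, n in c.items()} for c in counts]

def entropy_weight(term, counts):
    """Assumed entropy term weight: 1 + sum(p * log2(p)) / log2(n_documents)."""
    n = len(counts)
    total = sum(c[term] for c in counts)
    if n < 2 or total == 0:
        return 1.0
    entropy = sum(
        (c[term] / total) * math.log2(c[term] / total) for c in counts if c[term]
    )
    return 1.0 + entropy / math.log2(n)

# Hypothetical document collection: one document per row.
documents = [
    "The shop sells apples.",
    "She sold apples; he is selling pears.",
    "Selling, selling, selling with a smile.",
]

all_counts = term_counts(documents)
kept_counts = term_counts(documents, min_term_frequency=2)
cells = log_cell_weights(kept_counts)
```

In this sketch, a minimum term frequency of 2 keeps only "sell" and "apples", because every other term appears in a single document. The term "sell" occurs three times in the last document, but its logarithmic cell weight there is only 2.0. Under the assumed entropy weight, "shop" (found in one document) receives the highest weight, and "sell" (spread across all three documents) receives the lowest.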