Working with Word Clouds

About Word Clouds

A word cloud displays a set of words from a character data item. Depending on the type of word cloud and your data roles, the size of each word in the cloud can indicate the importance (topic term weight) of the word, the frequency of the word, or the value of a measure.
You can create two types of word cloud:
Word clouds that use text analytics
Word clouds that use text analytics analyze each value in a document collection data item as a text document that can contain multiple words. Words that often appear together in the document collection are identified as topics. For the selected topic, the word cloud displays the terms with the greatest topic term weight values. The topic term weight indicates the importance of the term within the topic.
A word cloud that uses text analytics can also display whether the documents in a topic express positive, negative, or neutral sentiment.
The details table for a text analytics word cloud contains additional information about the terms, topics, and documents in the word cloud. For more information, see Explore Text Analytics Results.
To enable text analytics, you must set a unique row identifier and define one or more categories as document collections. See Define Data Items for Text Analytics.
Note: Text analytics can be applied only to English or German text.
Note: Depending on the number of rows in your data source and the length of the values in your document collection, a word cloud with text analytics might require a significant amount of time to display.
Note: Text analytics in SAS Visual Analytics uses a different algorithm from SAS Text Miner. Your results might be different from the results that SAS Text Miner produces.
Word clouds that use category values
Word clouds that use category values analyze each value in a category data item as a single text string. The word cloud can display either the string values that have the highest frequency or the string values that have the greatest value for a measure. The color of each word can indicate the value of a measure.
Note: If you view the word cloud as an automatic chart, then any changes to the Roles tab might cause the visualization to reset. To avoid unexpected resets, view the visualization as a word cloud.
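The following minimal Python sketch illustrates the idea behind a word cloud that uses category values: each value is counted as one whole string, and the top entries are chosen either by frequency or by a measure. It is not how SAS Visual Analytics is implemented. The function names, the sample data, and the choice to sum the measure per value are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical category values. Each value is treated as a single string,
# so "New York" is one word-cloud entry, not two separate words.
values = ["New York", "Boston", "New York", "Chicago", "Boston", "New York"]

# Hypothetical (category value, measure) rows.
rows = [("New York", 10.0), ("Boston", 25.0), ("New York", 5.0), ("Chicago", 40.0)]


def top_values_by_frequency(values, limit):
    """Return the `limit` most frequent string values with their counts."""
    return Counter(values).most_common(limit)


def top_values_by_measure(rows, limit):
    """Return the `limit` values with the greatest measure total.

    Summing the measure per value is an assumption of this sketch.
    """
    totals = {}
    for value, measure in rows:
        totals[value] = totals.get(value, 0.0) + measure
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


print(top_values_by_frequency(values, 2))  # [('New York', 3), ('Boston', 2)]
print(top_values_by_measure(rows, 2))      # [('Chicago', 40.0), ('Boston', 25.0)]
```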

Data Roles for a Word Cloud

About Data Roles for a Word Cloud

The data roles for a word cloud depend on the type of word cloud that you select.
The Show Word Cloud option determines whether the word cloud is generated by using text analytics or by using category values.

Data Roles for a Word Cloud That Uses Text Analytics

For a word cloud that uses text analytics, the basic role is Document collection. A document collection is a category data item that contains the words that you will analyze.
Note: To enable text analytics, you must set a unique row identifier and define one or more categories as document collections. See Define Data Items for Text Analytics.
In addition to the basic role, you can specify the following role:
Document details
specifies data items that are displayed as columns in the Documents tab of the details table.

Data Roles for a Word Cloud That Uses Category Values

For a word cloud that uses category values, the basic role is Words. Specify a category whose values are used in the word cloud.
In addition to the basic role, you can specify these roles:
Size
specifies a measure that determines the size of each word. If you do not specify a measure, then the word size indicates the frequency of each word.
Color
specifies a measure that determines the color of each word.

Specify Properties for a Word Cloud

On the Properties tab, you can specify the following options:
Name
specifies the name of the visualization.
Title
specifies the title that appears above the graph.
Note: The Title option is disabled if you select Generate graph title.
Generate graph title
specifies that the graph title is generated automatically based on the data items in the visualization.
Frequency (for category values only)
specifies whether the frequency is displayed as a count (Count) or as a percentage (Percent).
Note: The frequency values are based on the data that is shown in the visualization (after filters and other data selections have been applied).
Note: This option has no effect if a measure is assigned to the Size role.
Word display limit
specifies the maximum number of words that are displayed in the word cloud.
Font scale
specifies the amount of difference in font sizes between the largest and smallest words in the cloud. The numeric value specifies the ratio of the largest font size to the smallest font size, where both sizes are measured in points.
For word clouds that use category values, you can specify the following additional option:
Color gradient
selects the gradient colors for the visualization.
You can click the Edit color gradient button to select the values that are used to assign the colors. See Specify a Custom Data Range.
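As a rough illustration of the Frequency and Font scale options, the sketch below converts counts to percentages and maps word weights to font sizes so that the largest size divided by the smallest size equals the font scale. The linear interpolation, the 10-point minimum size, and the function names are assumptions of this sketch. The documentation states only that the font scale is the ratio of the largest font size to the smallest.

```python
def frequency_percent(counts):
    """Express each word's count as a percentage of the total count."""
    total = sum(counts.values())
    return {word: 100.0 * n / total for word, n in counts.items()}


def font_sizes(weights, font_scale, min_points=10.0):
    """Map each weight to a font size in points.

    The smallest weight gets `min_points`, and the largest weight gets
    `min_points * font_scale`. Sizes in between are interpolated linearly,
    which is an assumption made for this example.
    """
    lo, hi = min(weights.values()), max(weights.values())
    max_points = min_points * font_scale
    if hi == lo:
        # All words carry the same weight, so they share one size.
        return {word: min_points for word in weights}
    span = max_points - min_points
    return {
        word: min_points + (value - lo) / (hi - lo) * span
        for word, value in weights.items()
    }


# Hypothetical word counts.
counts = {"alpha": 6, "beta": 3, "gamma": 1}

percents = frequency_percent(counts)
sizes = font_sizes(counts, font_scale=4)

print(percents)
print(sizes["alpha"] / sizes["gamma"])  # 4.0, the requested font scale
```

With a font scale of 4 and a 10-point minimum, the most frequent word is drawn at 40 points and the least frequent word at 10 points.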
For word clouds that use text analytics, you can specify the following additional basic options:
Analyze document sentiment
enables sentiment analysis for the word cloud.
Sentiment analysis determines whether a document has a positive sentiment, negative sentiment, or neutral sentiment based on the content of the document.
When sentiment analysis is enabled, the number of positive, neutral, and negative documents in the topic is displayed at the top of the word cloud. In addition, sentiment values are displayed on the Topics and Documents tabs of the details table.
Identify term roles
identifies terms by their parts of speech. In addition, this option identifies groups of nouns as single terms and identifies text entities such as names, addresses, telephone numbers, and so on.
Note: This option is equivalent to the advanced options Include parts of speech, Extract noun groups, and Use entity extraction.
Maximum topics
specifies the maximum number of topics to create. Specify a number from 4 to 20.
For word clouds that use text analytics, you can specify the following additional advanced options:
Analyze document sentiment
enables sentiment analysis for the word cloud.
Sentiment analysis determines whether a document has a positive sentiment, negative sentiment, or neutral sentiment based on the content of the document.
When sentiment analysis is enabled, the number of positive, neutral, and negative documents in the topic is displayed at the top of the word cloud. In addition, sentiment values are displayed on the Topics and Documents tabs of the details table.
Maximum topics
specifies the maximum number of topics to create. Specify a number from 4 to 20.
Resolution
specifies the resolution that is used to identify topics. A Low resolution identifies fewer topics. A High resolution identifies more topics.
Cell weight
specifies whether to weight the frequency of each term for every document that it appears in. Selecting Logarithmic de-emphasizes terms that appear many times in relatively few documents.
Term weight
specifies a weighting algorithm for the terms in the document collection. The Entropy weighting algorithm emphasizes terms that have a low frequency across the document collection.
Document threshold
specifies the minimum number of documents that a term must appear in. Specify a number from 1 to 20. If a term does not appear in the minimum number of documents, then it is not included in the word cloud.
Topic label length
specifies the number of terms that are included in a topic name. Specify a number from 2 to 8. This property does not affect the number of terms that are used to select topics; only the topic names are changed.
Include parts of speech
specifies that terms are classified by parts of speech (for example, a noun, a verb, or an adjective). The part of speech for each term is displayed in the data tip for the term.
Extract noun groups
specifies whether to identify groups of nouns as terms.
Use entity extraction
specifies whether to identify text entities such as names, addresses, telephone numbers, and so on. If this option is disabled, then text entities are not treated differently from other text.
Stem words
specifies whether all forms of a given word are identified as a single term. For example, if you select Stem words, then the words “sell,” “sells,” “selling,” and “sold” are identified as a single term “sell.”
Use stop list (if available)
specifies whether to use a stop list to exclude common words such as “the,” “with,” and “is” when identifying terms. If no stop list is available, then a message appears at the bottom of the word cloud.
Stop list
specifies the stop list that is used, if the Use stop list option is enabled.
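To make several of the advanced options more concrete, the sketch below shows one common way that a stop list, a document threshold, a logarithmic cell weight, and an entropy term weight can be computed. The formulas are the standard log-entropy formulation from information retrieval and are an assumption here. The documentation does not state the exact formulas that SAS Visual Analytics uses, and all function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def term_document_counts(documents, stop_list=frozenset()):
    """Count terms per document after lowercasing and removing stop words.

    Splitting on whitespace is a simplification for this example.
    """
    return [
        Counter(w for w in doc.lower().split() if w not in stop_list)
        for doc in documents
    ]


def apply_document_threshold(doc_counts, threshold):
    """Keep only terms that appear in at least `threshold` documents."""
    doc_freq = Counter()
    for counts in doc_counts:
        doc_freq.update(counts.keys())
    return {term for term, n in doc_freq.items() if n >= threshold}


def log_cell_weight(freq):
    """Dampen a raw term frequency within one document: log2(freq + 1)."""
    return math.log2(freq + 1)


def entropy_term_weight(term, doc_counts):
    """Return 1 + sum(p * log2(p)) / log2(number of documents).

    The result is 1.0 for a term found in a single document and 0.0 for a
    term spread evenly across every document.
    """
    n_docs = len(doc_counts)
    freqs = [c[term] for c in doc_counts if c[term] > 0]
    total = sum(freqs)
    if total == 0:
        return 0.0  # the term does not occur in the collection
    if n_docs < 2:
        return 1.0  # entropy is undefined for a single document
    entropy_sum = sum((f / total) * math.log2(f / total) for f in freqs)
    return 1.0 + entropy_sum / math.log2(n_docs)


# Hypothetical document collection and stop list.
documents = [
    "the sales team sells software",
    "the support team fixes software",
    "the sales forecast is strong",
]
doc_counts = term_document_counts(documents, stop_list={"the", "is"})

print(sorted(apply_document_threshold(doc_counts, 2)))
print(entropy_term_weight("forecast", doc_counts))  # 1.0 (single document)
print(entropy_term_weight("sales", doc_counts))     # between 0 and 1
```

In this sketch, a term that is concentrated in few documents receives a higher entropy weight than a term that is spread across the collection, which matches the description of the Entropy option above.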

Explore Text Analytics Results

For a word cloud visualization that uses text analytics, a large amount of additional information is available in the details table. To display the details table, click the Options drop-down list on the visualization toolbar, and then select Show Details.
The details table for a text analytics word cloud contains the following tabs:
Results
displays all of the terms in the current topic. For each term, the Topic Term Weight value indicates the importance of the term in the current topic.
If the Identify term roles property or the Include parts of speech property is enabled, then the Role value identifies the grammatical role of each term.
Note: You can sort any column by clicking the column heading.
Topics
displays all of the topics in the document collection. If sentiment analysis is enabled, then the number of positive, neutral, and negative documents for each topic is displayed.
Note: You can sort any column by clicking the column heading.
Documents
displays each of the documents that contains the selected term. For each document, the Relevance value indicates how relevant the document is to the current topic.
To view the full text for a document, right-click the document, and then select View Full Document.
If sentiment analysis is enabled, then the Sentiment value identifies how positive or negative the document is. You can filter the documents to exclude documents with positive, negative, or neutral sentiment.
Note: You can sort any numeric column by clicking the column heading.
Analysis
provides definitions of the key concepts for text analytics.

Explore Selected Documents as a New Visualization

You can explore a set of selected documents as a new table visualization. To create a new visualization from your selected documents, follow these steps:
  1. Select the topic and the term that you want to explore.
  2. On the Documents tab in the details table, select the documents that you want to explore in a new visualization. To select all of the documents, right-click any document, and then select Select All.
  3. Right-click any document, and then select Create Visualization from Selected Documents.
A new table visualization appears with your selected document values.