Note: This example assumes that you completed the example in Using The Text Filter Node. It builds on the process flow diagram created there.
- Select the Text Mining tab in the Enterprise Miner toolbar, and drag a Text Topic node into the diagram workspace.
- Connect the Text Filter node to the Text Topic node.
- Before you run the Text Topic node, create a topic that identifies abstracts that discuss dynamic Web sites. To do this, click the ellipsis button next to the User Topics property of the Text Topic node. This opens the User Topics window, where you can create your own topics.
The User Topics window has four columns, labeled _Role_, _Term_, _Weight_, and _Topic_. For this example, you can leave the _Role_ column empty. To add a row, click the Add button at the top of the window. To delete a term, select its row and click the Delete button at the top of the window.
To create the topic about dynamic Web sites, enter the three terms (dynamic, web page, and web site), with their corresponding weights and the topic Internet, as shown in the image below. The weight of a term is relative and can be any value between 0 and 1. More important terms should have a higher weight than less important terms.
After you have entered the three terms shown above, click OK to save your changes. You are now ready to run the Text Topic node.
- Right-click the Text Topic node and select Run to run the node with all other settings at their default values. Click Yes in the Confirmation dialog box. When the node finishes running, select Results in the Run Status dialog box.
- In the Results window, expand the Number of Terms by Topic chart. The default settings created 25 multi-term topics in addition to the one that you defined above. Close the Results window.
- Select the Text Topic node, and then click the ellipsis button for the Topic Viewer property to open the Interactive Topic Viewer window. The topic that you created, Internet, should appear at the top of the Topics list. Notice that the Category of this topic is User, while the Category of the other topics is Multiple.
You can view all of the documents in your topic by right-clicking anywhere in the first row and selecting Select Current Topic, if it is not already selected. The middle pane shows that only the terms dynamic, web page, and web site are in this topic. It also shows how many documents each term appears in and how frequently each term appears. The bottom pane contains the observations from the Abstract data set that belong to your topic. You can read each of these by right-clicking anywhere in the bottom pane and selecting Toggle Show Full Text from the menu. Close the Interactive Topic Viewer.
- One problem that you might have noticed is that many of the topics appear to be closely related because they have many terms in common. This suggests that some topics can be merged. In the Topics list of the Interactive Topic Viewer, four topics (topics 2, 3, 5, and 10) contain both user and application as descriptive terms. These topics can be merged to form one user-created topic.
To merge the topics, double-click the first topic that you want to rename to make it the active topic. Rename the topic by deleting all of the text in the Topic field and replacing it with User Applications. Repeat this process for the other three topics that contain both user and application. When you rename a topic, its Category changes to User. Close the Interactive Topic Viewer and save the changes that you made.
- Right-click the Text Topic node and select Run to rerun the node. Click Yes in the Confirmation dialog box, and click OK in the Run Status dialog box when the node finishes running.
- Now, click the ellipsis button next to the Topic Viewer property to see the new topics that have been created. There are now two user-defined topics, Internet and User Applications. However, the Text Topic node still created 25 topics. If you are not satisfied with these topics, you can continue to merge topics until they are distinct enough for your needs.
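The two manual tasks in this example, filling in the User Topics table and renaming topics that share descriptive terms, can be sketched in a few lines of Python. This is only an illustration of the logic, not how the Text Topic node is implemented and not a SAS API. The class and function names (UserTopicRow, Topic, validate_rows, merge_by_terms) are hypothetical, the weights are placeholder values rather than the ones in the image, and every descriptive term other than user and application is invented for the sketch. It also assumes that the 0 to 1 weight range is inclusive.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class UserTopicRow:
    """One row of the User Topics table (Role, Term, Weight, Topic)."""
    term: str
    weight: float
    topic: str
    role: str = ""  # left empty in this example


def validate_rows(rows):
    """Check that every weight lies between 0 and 1 (assumed inclusive)."""
    for row in rows:
        if not 0 <= row.weight <= 1:
            raise ValueError(f"weight for {row.term!r} must be between 0 and 1")
    return rows


@dataclass(frozen=True)
class Topic:
    """A topic as listed in the viewer: ID, name, descriptive terms, category."""
    topic_id: int
    name: str
    terms: tuple
    category: str = "Multiple"


def merge_by_terms(topics, required_terms, new_name):
    """Rename every topic whose descriptive terms include all required terms.

    Renamed topics get the category "User", mirroring what the viewer does
    when a topic is renamed by hand.
    """
    required = set(required_terms)
    result = []
    for topic in topics:
        if required <= set(topic.terms):
            result.append(replace(topic, name=new_name, category="User"))
        else:
            result.append(topic)
    return result


# The three terms and the topic name come from the example; the weights are
# hypothetical placeholders.
internet_rows = validate_rows([
    UserTopicRow(term="dynamic", weight=0.5, topic="Internet"),
    UserTopicRow(term="web page", weight=1.0, topic="Internet"),
    UserTopicRow(term="web site", weight=1.0, topic="Internet"),
])

# Topics 2, 3, 5, and 10 contain both "user" and "application" in the example.
# Topic 4 and all other terms are hypothetical.
example_topics = [
    Topic(2, "topic 2", ("user", "application", "interface")),
    Topic(3, "topic 3", ("application", "user", "tool")),
    Topic(4, "topic 4", ("data", "model", "user")),
    Topic(5, "topic 5", ("user", "application", "system")),
    Topic(10, "topic 10", ("software", "user", "application")),
]

renamed = merge_by_terms(example_topics, ["user", "application"], "User Applications")
```

The sketch stops at renaming. In the example itself, the renamed topics become a single user-defined topic only after the Text Topic node is rerun.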