Using the Text Import Node :: Getting Started with SAS(R) Text Miner 12.1

The following examples show how to use the Text Import node to import documents from a directory or from the Web. These examples assume that SAS Enterprise Miner is running, the SAS Document Conversion server is running, and a diagram workspace is open in a project. For information about creating a project and a diagram, see Setting Up Your Project.

Import Documents from a Directory

To import documents from a directory:

Select the Text Mining tab, and drag a Text Import node into the diagram workspace.
Click the ellipsis button for the Import File Directory property of the Text Import node.

A Select Server Directory dialog box appears.
Navigate to the folder that contains the documents from which you want to create a data set, select it, and then click OK.

Note: To see the file types that you want to select, you might need to select All Files in the type drop-down menu.
Click the ellipsis button for the Language property.

The Languages dialog box appears.
Select one or more licensed languages, and then click OK. The language identifier assigns each document to one of the languages that you select.
(Optional) Specify the file types to process for the Extensions property. For example, to process only documents that have a .txt or a .pdf extension, specify .txt .pdf for the Extensions property, and press Enter.

Note: If you do not specify file types to process, the Text Import node processes all file types in the specified import file directory.
Right-click the Text Import node, and select Run.
Click Yes in the Confirmation dialog box.
Click Results in the Run Status dialog box when the node has finished running.

The Results window appears.
Examine results from the documents that you imported.

You can now use the Text Import node as an input data source for your text mining analysis.
Select the Text Mining tab, and drag a Text Parsing node into the diagram workspace.
Connect the Text Import node to the Text Parsing node.
Right-click the Text Parsing node, and select Run.
Click Yes in the Confirmation dialog box.
Click OK in the Run Status dialog box when the node has finished running.
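
The effect of the Extensions property described above can be pictured as a simple file filter. The following Python sketch is only an illustration of that idea, not the Text Import node's implementation. The function names parse_extensions and select_files are hypothetical, and two behaviors are assumptions that the steps above do not state: extensions are matched without regard to case, and only the top level of the directory is scanned.

```python
from pathlib import Path


def parse_extensions(spec: str) -> set[str]:
    """Turn a space-separated spec such as '.txt .pdf' into {'.txt', '.pdf'}."""
    # Assumption: matching ignores case, so '.TXT' and '.txt' are equivalent.
    return {ext.lower() for ext in spec.split()}


def select_files(directory: Path, spec: str = "") -> list[Path]:
    """Return the files in `directory` whose extension is listed in `spec`.

    An empty spec selects every file, which mirrors the note that all file
    types are processed when no extensions are specified.
    """
    wanted = parse_extensions(spec)
    # Assumption: only the top level of the directory is scanned.
    files = sorted(p for p in directory.iterdir() if p.is_file())
    if not wanted:
        return files
    return [p for p in files if p.suffix.lower() in wanted]
```

For example, select_files(Path("docs"), ".txt .pdf") would keep only the .txt and .pdf files in a hypothetical docs folder, and select_files(Path("docs")) would keep every file.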

Import Documents from the Web

To import documents from the Web:

Note: Web crawling is supported only on Windows operating systems.

Select the Text Mining tab, and drag a Text Import node into the diagram workspace.
Click the ellipsis button for the Import File Directory property of the Text Import node.

A Select Server Directory dialog box appears.
Navigate to a folder, select it, and then click OK.

The crawled documents are first written to the Import File Directory location. The files are then processed from that location and written to the Destination Directory location.
Enter the uniform resource locator (URL) of a Web page that you want to crawl in the URL property of the Text Import node. For example, try www.sas.com.
Type 1 as the number of levels to crawl in the Depth property.
Set the Domain property to Unrestricted.

Note: If you want to crawl a password-protected Web site, set the Domain property to Restricted, and provide a user name for the User Name property and a password for the Password property.
Right-click the Text Import node and select Run.
Click Yes in the Confirmation dialog box.
Click Results in the Run Status dialog box when the node has finished running.
Examine results from the Web site.

You can now use the Text Import node as an input data source for your text mining analysis.
Select the Text Mining tab, and drag a Text Parsing node into the diagram workspace.
Connect the Text Import node to the Text Parsing node.
Right-click the Text Parsing node, and select Run.
Click Yes in the Confirmation dialog box.
Click OK in the Run Status dialog box when the node has finished running.
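
The Depth and Domain properties used above can be pictured as limits on a breadth-first walk over links. The following Python sketch is only an illustration under stated assumptions, not the crawler that the Text Import node uses. It assumes that the starting page is level 0, so that a depth of 1 also reaches the pages linked directly from it, and that a restricted crawl stays on the host of the starting page. The crawl function and the LINKS graph are hypothetical, and the graph stands in for fetching and parsing real pages.

```python
from collections import deque
from urllib.parse import urlparse


def crawl(start, links, depth, restricted=False):
    """Breadth-first walk from `start`, following links up to `depth` levels.

    `links` maps each URL to the URLs found on that page. With
    restricted=True, only URLs on the same host as `start` are followed
    (an assumed reading of a restricted domain).
    """
    start_host = urlparse(start).netloc
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        url, level = queue.popleft()
        if level == depth:
            continue  # do not follow links beyond the requested depth
        for target in links.get(url, []):
            if target in seen:
                continue  # each page is visited once
            if restricted and urlparse(target).netloc != start_host:
                continue  # skip pages on other hosts
            seen.add(target)
            order.append(target)
            queue.append((target, level + 1))
    return order


# Hypothetical link graph used only for illustration.
LINKS = {
    "http://www.example.com/": [
        "http://www.example.com/a",
        "http://other.example.org/",
    ],
    "http://www.example.com/a": ["http://www.example.com/b"],
}
```

With this graph, a depth of 1 and an unrestricted domain reach the starting page and the two pages that it links to. Setting restricted=True drops the page on the other host.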
