To import documents from the Web:
Note: Web crawling is supported only on Windows operating systems.
- Select the Text Mining tab, and drag a Text Import node into the diagram workspace.
- Click the ellipsis button (...) for the Import File Directory property of the Text Import node. A Select Server Directory dialog box appears.
- Navigate to a folder, select it, and then click OK.
  The documents are first written to the Import File Directory location. The files are then processed from the Import File Directory location and written to the Destination Directory location.
- Enter the uniform resource locator (URL) of a Web page that you want to crawl in the URL property of the Text Import node. For example, try www.sas.com.
- Type 1 as the number of levels to crawl in the Depth property.
- Set the Domain property to Unrestricted.
  Note: If you want to crawl a password-protected Web site, set the Domain property to Restricted, and provide a user name for the User Name property and a password for the Password property.
- Right-click the Text Import node and select Run.
- Click Yes in the Confirmation dialog box.
- Click Results in the Run Status dialog box when the node has finished running.
- Examine the results from the Web site.
  You can now use the Text Import node as an input data source for your text mining analysis.
- Select the Text Mining tab, and drag a Text Parsing node into the diagram workspace.
- Connect the Text Import node to the Text Parsing node.
- Right-click the Text Parsing node, and select Run.
- Click Yes in the Confirmation dialog box.
- Click OK in the Run Status dialog box when the node has finished running.
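The Depth and Domain properties in the steps above control how far the Web crawl travels from the starting URL. The following Python sketch illustrates that idea with a breadth-first search over a hypothetical in-memory link graph, so no network access is needed. It is not the Text Import node's implementation. It assumes that Depth counts link hops from the starting page (so 1 means the starting page plus the pages it links to directly) and that Restricted limits the crawl to the starting page's host. The crawl function, the LINKS graph, and the example.com/example.org URLs are all hypothetical.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical link graph standing in for real HTTP fetches:
# each URL maps to the list of URLs that page links to.
LINKS = {
    "http://www.example.com/": ["http://www.example.com/a", "http://other.example.org/x"],
    "http://www.example.com/a": ["http://www.example.com/b"],
    "http://other.example.org/x": ["http://other.example.org/y"],
}


def crawl(start_url, depth, restricted, links=LINKS):
    """Return pages in breadth-first order, following links at most `depth` hops.

    Assumption (not confirmed by the source): depth counts hops from start_url,
    and restricted=True keeps the crawl on start_url's host.
    """
    start_host = urlparse(start_url).netloc
    visited = {start_url}
    order = []
    queue = deque([(start_url, 0)])  # (url, level) pairs
    while queue:
        url, level = queue.popleft()
        order.append(url)
        if level == depth:
            continue  # depth limit reached: do not follow this page's links
        for link in links.get(url, []):
            if restricted and urlparse(link).netloc != start_host:
                continue  # skip pages outside the starting domain
            if link not in visited:
                visited.add(link)
                queue.append((link, level + 1))
    return order
```

With the sample graph, crawl("http://www.example.com/", 1, restricted=False) visits the starting page, http://www.example.com/a, and the off-site page http://other.example.org/x; with restricted=True the off-site page is skipped. Breadth-first order is used so that every page at one level is fetched before any page at the next level, which makes the depth cutoff straightforward.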