The Text Import node serves as a replacement for an Input Data node. It enables you to create data sets dynamically from files contained in a directory or from the Web. The Text Import node takes as input an import directory that contains text files in potentially proprietary formats, such as MS Word and PDF files. The tool traverses this directory, filters or extracts the text from the files, places a copy of the text in a plain text file, and places a snippet (or possibly even all) of the text in a SAS data set.
If a URL is specified, the node first crawls Web sites, retrieves files from the Web, and puts them in an import directory, and then runs the same filtering process on that directory. The output of a Text Import node is a data set that can be imported into the Text Parsing node.
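
The directory pass described above can be pictured with a short Python sketch. This is only an illustration of the idea, not how the Text Import node is implemented: the function names, the row layout, and the 100-character snippet length are all hypothetical, the Web-crawling step is left out, and the `extract_text` stand-in simply reads every file as plain text instead of filtering MS Word or PDF content.

```python
from pathlib import Path

SNIPPET_CHARS = 100  # hypothetical snippet length, not a SAS Text Miner default


def extract_text(path: Path) -> str:
    """Stand-in for a format-aware filter: every file is read as plain text."""
    return path.read_text(encoding="utf-8", errors="replace")


def import_directory(import_dir: Path, export_dir: Path) -> list[dict]:
    """Walk import_dir, write plain-text copies under export_dir,
    and return one row per file holding a snippet of its text.

    Assumes export_dir is not located inside import_dir.
    """
    export_dir.mkdir(parents=True, exist_ok=True)
    rows = []
    # Materialize and sort the file list so the row order is deterministic.
    for source in sorted(p for p in import_dir.rglob("*") if p.is_file()):
        text = extract_text(source)

        # Keep the relative layout and append ".txt" to the full file name,
        # so report.doc and report.pdf do not overwrite each other.
        relative = source.relative_to(import_dir)
        target = export_dir / relative.parent / (relative.name + ".txt")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")

        rows.append(
            {
                "source": str(source),
                "filtered": str(target),
                "text": text[:SNIPPET_CHARS],            # snippet, or all if short
                "truncated": len(text) > SNIPPET_CHARS,  # True if text was cut
            }
        )
    return rows
```

Each returned row plays the role of one observation in the output data set: it points to the original file and to its plain-text copy, and it carries either a snippet or the full text, depending on the document length.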
In addition to filtering the text, the Text Import node can also identify the language of each document and transcode documents to the session encoding.
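
As a rough illustration of what transcoding to a session encoding involves, the following Python sketch decodes a document's bytes from a known source encoding and re-encodes them in a target encoding. The function name and its parameters are hypothetical, the source encoding is assumed to be known in advance, and language identification is not covered here. It does not reflect how the Text Import node performs this step internally.

```python
def transcode(raw: bytes, source_encoding: str, session_encoding: str = "utf-8") -> bytes:
    """Re-encode raw bytes from source_encoding into session_encoding.

    Bytes that cannot be decoded, and characters that the session encoding
    cannot represent, are substituted rather than raising an error.
    """
    text = raw.decode(source_encoding, errors="replace")
    return text.encode(session_encoding, errors="replace")
```

For example, `transcode("café".encode("latin-1"), "latin-1")` yields the UTF-8 bytes for the same word, while targeting `"ascii"` would replace the accented character with `?`.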
For more information about the Text Import node, see the SAS Text Miner Help.
The rest of this chapter presents two examples of how you can use the Text Import node.