About the Text Import Node :: Getting Started with SAS(R) Text Miner 12.1

The Text Import node serves as a replacement for an Input Data node. It enables you to create data sets dynamically from files contained in a directory or from the Web. As input, the node takes an import directory that contains text files, possibly in proprietary formats such as Microsoft Word or PDF. The node traverses this directory, filters or extracts the text from each file, places a copy of the text in a plain text file, and places a snippet (or possibly all) of the text in a SAS data set.
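
The traverse, extract, and snippet steps described above can be pictured with a small Python sketch. This is only a conceptual illustration, not the node's actual implementation: the function name `import_text_files`, the `snippet_length` parameter, and the row layout are hypothetical, and the sketch reads only plain `.txt` files rather than filtering Word or PDF documents.

```python
from pathlib import Path


def import_text_files(import_dir, extract_dir, snippet_length=100):
    """Walk import_dir, copy each file's text to extract_dir, and return snippet rows.

    Hypothetical helper for illustration only. It handles plain .txt files;
    a real document filter would also extract text from Word, PDF, and so on.
    extract_dir is assumed to lie outside import_dir.
    """
    import_dir = Path(import_dir)
    extract_dir = Path(extract_dir)
    extract_dir.mkdir(parents=True, exist_ok=True)

    rows = []
    # Traverse the import directory, including subdirectories.
    for source in sorted(import_dir.rglob("*.txt")):
        text = source.read_text(encoding="utf-8", errors="replace")

        # Place a full copy of the text in a plain text file.
        target = extract_dir / source.relative_to(import_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")

        # Keep only a snippet (or all of the text, if it is short) in the row.
        rows.append({
            "source": str(source),
            "filtered": str(target),
            "text": text[:snippet_length],
            "truncated": len(text) > snippet_length,
        })
    return rows
```

Each returned dictionary plays the role of one observation in the output data set: a pointer to the original file, a pointer to the plain text copy, and the snippet itself.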

If a URL is specified, the node first crawls the Web site, retrieves files from the Web, and places them in the import directory before filtering them. The output of a Text Import node is a data set that can be used as input to the Text Parsing node. In addition to filtering the text, the Text Import node can identify the language of each document and transcode documents to the session encoding.
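
As a rough illustration of what transcoding to a session encoding means, the following Python sketch decodes a document's bytes from their original encoding and re-encodes them in a target encoding. The function name `transcode` and the default of UTF-8 as the session encoding are assumptions made for this example; they do not describe how the node performs the conversion internally.

```python
def transcode(raw, source_encoding, session_encoding="utf-8"):
    """Convert raw document bytes from source_encoding to session_encoding.

    Hypothetical helper for illustration only. Characters that the session
    encoding cannot represent are replaced with '?' instead of raising an error.
    """
    text = raw.decode(source_encoding)
    return text.encode(session_encoding, errors="replace")
```

For example, a document stored as Latin-1 can be converted with `transcode(raw_bytes, "latin-1")`, so that every document in the resulting data set shares a single encoding.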

For more information about the Text Import node, see the SAS Text Miner Help.

The rest of this chapter presents two examples of how you can use the Text Import node.