The %TMFILTER Macro

The %TMFILTER macro, provided with SAS Text Miner, is a SAS macro that converts files into SAS data sets. Filtering is supported in all operating systems, and Web crawling is supported on Windows. The macro relies on the SAS Document Conversion Server, which is installed and running on a Windows machine. See SAS Document Conversion Server for more information. You can use the macro to perform the following tasks:
  • filter a collection of documents that are saved in any supported file format, and output a SAS data set that can be used to create a SAS Text Miner data source.
  • Web crawl and output a SAS data set that can be used to create a SAS Text Miner data source. Web crawling retrieves the text of a starting Web page, extracts the URL links within that page, and then repeats the process recursively for each linked page. You can restrict a crawl to the domain of the starting URL, or you can let it also process linked pages that are outside that domain. The crawl continues until a specified number of drill-down levels is reached or until all the Web pages that satisfy the domain constraint have been found. Web crawling is supported only on Windows operating systems.
  • identify the languages of all documents in a collection.
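The crawling behavior described in the list above (fetch a page, extract its links, follow them level by level, optionally stay within the starting domain, stop at a depth limit or when no pages remain) can be illustrated with a short sketch. This is not the %TMFILTER implementation and does not call SAS. It is a hypothetical Python breadth-first crawl over an in-memory link graph: the `SITE` dictionary, the `fetch` callback, and the `crawl` function with its `max_depth` and `restrict_to_domain` parameters are all made up for illustration, so no network access is needed. Treating depth 0 as the starting page and comparing URL network locations (`netloc`) as the "domain" check are simplifying assumptions.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def crawl(start_url, fetch, max_depth, restrict_to_domain=True):
    """Hypothetical breadth-first crawl sketch (not the %TMFILTER code).

    fetch(url) returns (text, list_of_hrefs), or None if the page is unavailable.
    max_depth is the number of link levels followed; depth 0 is the start page.
    Returns a dict mapping each visited URL to its text.
    """
    start_domain = urlparse(start_url).netloc  # simplified "domain" check
    pages = {}
    seen = {start_url}
    queue = deque([(start_url, 0)])

    while queue:  # ends when every reachable, allowed page has been visited
        url, depth = queue.popleft()
        page = fetch(url)
        if page is None:
            continue
        text, hrefs = page
        pages[url] = text  # keep the page text, as a data set row would

        if depth >= max_depth:  # drill-down limit reached: do not follow links
            continue
        for href in hrefs:
            link = urljoin(url, href)  # resolve relative links against the page URL
            if restrict_to_domain and urlparse(link).netloc != start_domain:
                continue  # skip pages outside the starting domain
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages


# Hypothetical in-memory "Web site": URL -> (page text, links on that page)
SITE = {
    "http://example.com/": ("home", ["/a", "http://other.org/x"]),
    "http://example.com/a": ("page a", ["/b"]),
    "http://example.com/b": ("page b", ["/"]),
    "http://other.org/x": ("external", []),
}

if __name__ == "__main__":
    print(sorted(crawl("http://example.com/", SITE.get, max_depth=1)))
    print(sorted(crawl("http://example.com/", SITE.get, max_depth=5, restrict_to_domain=False)))
```

With `max_depth=1`, only the starting page and the in-domain pages it links to directly are collected. With a larger limit, the queue empties on its own once every reachable page that satisfies the domain constraint has been visited, which mirrors the two stopping conditions described above. A queue is used instead of literal recursion so that pages are processed one drill-down level at a time.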
See the SAS Text Miner Help for more information about the %TMFILTER macro.