The %TMFILTER macro, included with SAS® Text Miner, supports the URL= parameter for extracting text content from web pages. The filtering engine should begin at the page referenced by the URL= parameter, then "crawl" to each page linked.
However, if a link's HTTP reference (the href attribute of the anchor tag) quotes the URL with single-quote characters, the link is ignored and the pages it points to are omitted from the %TMFILTER output.
There are no errors or warnings indicating that links were overlooked. To determine whether the problem occurred, examine the %TMFILTER output to see if it contains fewer observations than you expected.
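Because %TMFILTER gives no indication that a link was skipped, it can help to check the starting page's HTML source for single-quoted links before comparing observation counts. The following is a minimal Python sketch of such a check, not part of SAS Text Miner. The function name `single_quoted_links` and the sample markup are hypothetical, and the regular expression is a simple heuristic rather than a full HTML parser.

```python
import re

# Heuristic pattern: an <a ...> tag whose href value is delimited by single quotes.
# Assumption: the page uses conventional markup; this is not a full HTML parser.
SINGLE_QUOTED_HREF = re.compile(r"<a\b[^>]*?\bhref\s*=\s*'([^']*)'", re.IGNORECASE)

def single_quoted_links(html):
    """Return the href values that are wrapped in single quotes."""
    return SINGLE_QUOTED_HREF.findall(html)

# Hypothetical sample page containing one link of each quoting style.
sample = """
<a href='/resources'>Company Resources</a>
<a href="/products">Products</a>
"""

for link in single_quoted_links(sample):
    print("single-quoted link, may be skipped:", link)
```

Any link the sketch reports is a candidate for the behavior described here; links it does not report use double quotes or some other form and are outside the scope of this check.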
The problem does not occur for HTTP reference tags that use double quotes.
For example, %TMFILTER will not follow the link to the resources page if the HTTP reference tag is written like this:
<a href='/resources'>Company Resources</a>
The same link written with double quotes is followed:
<a href="/resources">Company Resources</a>
| Product Family | Product | System | Product Release (Reported) | Product Release (Fixed*) | SAS Release (Reported) | SAS Release (Fixed*) |
| --- | --- | --- | --- | --- | --- | --- |
| SAS System | SAS Text Miner | 64-bit Enabled Solaris | 3.2 | 4.1 | 9.1 TS1M3 SP4 | 9.2 TS2M0 |
| SAS System | SAS Text Miner | 64-bit Enabled AIX | 3.2 | 4.1 | 9.1 TS1M3 SP4 | 9.2 TS2M0 |
| SAS System | SAS Text Miner | Microsoft Windows XP Professional | 3.2 | 4.1 | 9.1 TS1M3 SP4 | 9.2 TS2M0 |
| SAS System | SAS Text Miner | Microsoft Windows Server 2003 Standard Edition | 3.2 | 4.1 | 9.1 TS1M3 SP4 | 9.2 TS2M0 |
| SAS System | SAS Text Miner | Microsoft Windows Server 2003 Enterprise Edition | 3.2 | 4.1 | 9.1 TS1M3 SP4 | 9.2 TS2M0 |
| SAS System | SAS Text Miner | Microsoft Windows Server 2003 Datacenter Edition | 3.2 | 4.1 | 9.1 TS1M3 SP4 | 9.2 TS2M0 |
| SAS System | SAS Text Miner | Microsoft Windows NT Workstation | 3.2 | | 9.1 TS1M3 SP4 | |
| SAS System | SAS Text Miner | Microsoft Windows 2000 Professional | 3.2 | | 9.1 TS1M3 SP4 | |
| SAS System | SAS Text Miner | Microsoft Windows 2000 Server | 3.2 | | 9.1 TS1M3 SP4 | |
| SAS System | SAS Text Miner | Microsoft Windows 2000 Datacenter Server | 3.2 | | 9.1 TS1M3 SP4 | |
| SAS System | SAS Text Miner | Microsoft Windows 2000 Advanced Server | 3.2 | | 9.1 TS1M3 SP4 | |