Problem Note 33261: %TMFILTER macro does not find links when URL= uses single quotes
The %TMFILTER macro, included with SAS® Text Miner, supports the URL= parameter for extracting text content from web pages. The filtering engine should begin at the page referenced by the URL= parameter, then "crawl" to each page linked.
However, if an HTML anchor tag encloses its href address in single quotation marks, %TMFILTER ignores the link and omits the linked pages from its output.
There are no errors or warnings indicating that links were overlooked. To determine whether the problem occurred, examine the %TMFILTER output to see if it contains fewer observations than you expected.
The problem does not occur for anchor tags that enclose the href address in double quotation marks.
For example, %TMFILTER does not follow the link to the resources page if the anchor tag is written like this:
<a href='/resources'>Company Resources</a>
The same link written with double quotation marks is followed correctly:
<a href="/resources">Company Resources</a>
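Because the macro reports no error or warning, it can help to check a page's HTML source for single-quoted links before relying on the crawl. The following is a minimal sketch in Python, not part of SAS Text Miner. It assumes the page source is already available as a string, and the function name find_single_quoted_links is hypothetical. The regular expression is a rough heuristic, not a full HTML parser.

```python
import re

# Matches an anchor tag whose href value is wrapped in single quotes,
# e.g. <a href='/resources'>. Double-quoted href values are not matched.
SINGLE_QUOTED_HREF = re.compile(
    r"""<a\b[^>]*?\bhref\s*=\s*'([^']*)'""", re.IGNORECASE
)

def find_single_quoted_links(html):
    """Return the href targets of anchor tags that use single quotes."""
    return SINGLE_QUOTED_HREF.findall(html)

# Hypothetical sample page containing one link of each style.
sample = """
<a href='/resources'>Company Resources</a>
<a href="/products">Products</a>
"""
print(find_single_quoted_links(sample))  # prints ['/resources']
```

Any target that this check returns is a link that the affected releases would be expected to skip, which may explain a lower-than-expected observation count in the output data set.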
Operating System and Release Information
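Product Family | Product | System | Product Release (Reported) | Product Release (Fixed*) | SAS Release (Reported) | SAS Release (Fixed*) |
SAS System | SAS Text Miner | 64-bit Enabled Solaris | 3.2 | 4.1 | 9.1 TS1M3 SP4 | 9.2 TS2M0 |
SAS System | SAS Text Miner | 64-bit Enabled AIX | 3.2 | 4.1 | 9.1 TS1M3 SP4 | 9.2 TS2M0 |
SAS System | SAS Text Miner | Microsoft Windows XP Professional | 3.2 | 4.1 | 9.1 TS1M3 SP4 | 9.2 TS2M0 |
SAS System | SAS Text Miner | Microsoft Windows Server 2003 Standard Edition | 3.2 | 4.1 | 9.1 TS1M3 SP4 | 9.2 TS2M0 |
SAS System | SAS Text Miner | Microsoft Windows Server 2003 Enterprise Edition | 3.2 | 4.1 | 9.1 TS1M3 SP4 | 9.2 TS2M0 |
SAS System | SAS Text Miner | Microsoft Windows Server 2003 Datacenter Edition | 3.2 | 4.1 | 9.1 TS1M3 SP4 | 9.2 TS2M0 |
SAS System | SAS Text Miner | Microsoft Windows NT Workstation | 3.2 | | 9.1 TS1M3 SP4 | |
SAS System | SAS Text Miner | Microsoft Windows 2000 Professional | 3.2 | | 9.1 TS1M3 SP4 | |
SAS System | SAS Text Miner | Microsoft Windows 2000 Server | 3.2 | | 9.1 TS1M3 SP4 | |
SAS System | SAS Text Miner | Microsoft Windows 2000 Datacenter Server | 3.2 | | 9.1 TS1M3 SP4 | |
SAS System | SAS Text Miner | Microsoft Windows 2000 Advanced Server | 3.2 | | 9.1 TS1M3 SP4 | |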
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.
Type: | Problem Note |
Priority: | high |
Date Modified: | 2008-09-16 15:30:48 |
Date Created: | 2008-09-11 13:49:25 |