Problem Note 37215: Output from Text Miner for Training and Scoring data sets might not match
In SAS® Text Miner, using multiword terms in the terms data table can cause results to differ between training and scoring of a document collection. The affected results include terms, documents, probabilities, and scores.
Training is performed using PROC DOCPARSE (the underlying procedure), whereas scoring is performed using the DOCSCORE function. Because PROC DOCPARSE and DOCSCORE handle multiword terms differently, the results for the Train data set might not match the results for the Score data set.
There are no errors to indicate these possible discrepancies.
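Because no error is raised, the mismatch has to be found by comparing outputs. The following Python sketch is only a toy illustration of the general effect: it does not reproduce the actual logic of PROC DOCPARSE or DOCSCORE. The two parser functions, the `MULTIWORD_TERMS` set, and the sample document are all hypothetical. It shows how one parser that merges a listed multiword term and another that splits it produce different term counts for the same text, and how a simple comparison surfaces the difference.

```python
from collections import Counter

# Hypothetical multiword entry, standing in for a row of a terms data table.
MULTIWORD_TERMS = {("data", "mining")}


def parse_merging_multiword(text, multiword=MULTIWORD_TERMS):
    """Toy parser that emits a listed two-word term as a single term."""
    tokens = text.lower().split()
    terms, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in multiword:
            terms.append(" ".join(pair))
            i += 2
        else:
            terms.append(tokens[i])
            i += 1
    return Counter(terms)


def parse_splitting_multiword(text):
    """Toy parser that ignores the multiword list and counts single words."""
    return Counter(text.lower().split())


def term_count_differences(train_counts, score_counts):
    """Return {term: (train_count, score_count)} for terms whose counts differ."""
    all_terms = set(train_counts) | set(score_counts)
    return {
        term: (train_counts[term], score_counts[term])
        for term in sorted(all_terms)
        if train_counts[term] != score_counts[term]
    }


doc = "Data mining finds patterns in data"
diffs = term_count_differences(
    parse_merging_multiword(doc),
    parse_splitting_multiword(doc),
)
print(diffs)
```

In this toy case, the differing entries are exactly the multiword term and its component words. A similar comparison of exported term tables from training and scoring is one possible way to check whether a given project is affected.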
Click the Hot Fix tab in this note to access the hot fix for this issue.
Operating System and Release Information
| Product Family | Product | System | Product Release Reported | Product Release Fixed* | SAS Release Reported | SAS Release Fixed* |
| SAS System | SAS Text Miner | Linux for x64 | 4.1 | 4.2 | 9.2 TS2M0 | 9.2 TS2M2 |
| | | HP-UX IPF | 4.1 | 4.2 | 9.2 TS2M0 | 9.2 TS2M2 |
| | | 64-bit Enabled Solaris | 4.1 | 4.2 | 9.2 TS2M0 | 9.2 TS2M2 |
| | | 64-bit Enabled AIX | 4.1 | 4.2 | 9.2 TS2M0 | 9.2 TS2M2 |
| | | Windows Vista | 4.1 | 4.2 | 9.2 TS2M0 | 9.2 TS2M2 |
| | | Microsoft Windows XP Professional | 4.1 | 4.2 | 9.2 TS2M0 | 9.2 TS2M2 |
| | | Microsoft Windows Server 2003 Standard Edition | 4.1 | 4.2 | 9.2 TS2M0 | 9.2 TS2M2 |
| | | Microsoft Windows Server 2003 Enterprise Edition | 4.1 | 4.2 | 9.2 TS2M0 | 9.2 TS2M2 |
| | | Microsoft Windows Server 2003 Datacenter Edition | 4.1 | 4.2 | 9.2 TS2M0 | 9.2 TS2M2 |
| | | Microsoft® Windows® for x64 | 4.1 | 4.2 | 9.2 TS2M0 | 9.2 TS2M2 |
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.
| Type: | Problem Note |
| Priority: | high |
| Topic: | Analytics ==> Data Mining Analytics ==> Text Mining |
| Date Modified: | 2011-12-19 09:28:29 |
| Date Created: | 2009-09-18 14:47:10 |