Problem Note 49022: The Text Import node does not process more than 99,999 documents
In the Text Import node of SAS® Text Miner, at most 99,999 documents can be converted into text documents in the destination directory folder. If the "Import File Directory" contains more than 99,999 documents, some of the documents are not processed correctly, and the destination directory folder might be missing some of the first 99,999 converted documents as well.
No warning or error messages are issued.
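Because the node gives no indication that documents were dropped, it can help to count the documents in the "Import File Directory" before running the node. The following is a minimal sketch, not a SAS-supplied tool: the macro name %count_import_docs and its parameters are hypothetical, and it relies on the SAS DOPEN, DNUM, and DCLOSE functions. DNUM counts every member of the directory, including any subdirectories, so treat the result as an upper bound on the number of documents. The macro also suggests the smallest number of folders k for the work-around that follows: each folder must contain fewer than 99,999 documents (at most 99,998), so for N documents k = ceil(N / 99,998).

```sas
/* Hypothetical helper (not part of SAS Text Miner): count the       */
/* members of a directory and suggest how many folders (k) to use.   */
/* DNUM counts every member, including subdirectories, so the count  */
/* is an upper bound on the number of documents.                     */
%macro count_import_docs(dir=, limit=99999);
   %global ndocs kfolders;
   data _null_;
      length fref $8;
      fref = ' ';                  /* blank value: FILENAME generates a fileref */
      rc = filename(fref, "&dir");
      did = dopen(fref);
      if did = 0 then do;
         put "ERROR: Cannot open directory &dir";
         ndocs = .;
         k = .;
      end;
      else do;
         ndocs = dnum(did);
         rc = dclose(did);
         /* Each folder must hold fewer than limit documents, that is */
         /* at most limit - 1, so k = ceil(ndocs / (limit - 1)).      */
         k = max(1, ceil(ndocs / (&limit - 1)));
      end;
      rc = filename(fref);         /* deassign the generated fileref */
      call symputx('ndocs', ndocs);
      call symputx('kfolders', k);
      if ndocs >= &limit then
         put "WARNING: " ndocs= "documents; split them into at least " k= "folders.";
      else put "NOTE: " ndocs= k=;
   run;
%mend count_import_docs;
```

For example, %count_import_docs(dir=E:\docs\import) (an illustrative path) writes the count and the suggested k to the log and stores them in the macro variables NDOCS and KFOLDERS.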
The following steps provide a work-around:
1. Move the documents in the "Import File Directory" into multiple folders (say, k folders)
   so that each folder contains approximately 1/k of the total number of documents.
   Make sure that each folder contains fewer than 99,999 documents.
2. Create k Text Import nodes, each of which imports the documents from its
   corresponding folder.
3. Run each Text Import node.
4. After all k Text Import nodes have completed, create a SAS Code node and connect it
   to each of those k Text Import nodes. Insert the following SAS code in the Code
   Editor. The code concatenates all of the exported data sets that are created by
   those k Text Import nodes into a new SAS data set, which is saved in a system
   folder that you define:
libname mylib "E:\tracks\TM";  /* a folder where you have permission to write */

/* EMWS<n> is the ID value for the diagram. List the exported */
/* TRAIN data set of each of the k Text Import nodes.         */
data mylib.combined;
   set EMWS<n>.TextImport_TRAIN
       EMWS<n>.TextImport2_TRAIN
       /* ... (more lines skipped) */
       EMWS<n>.TextImportk_TRAIN
       ;
run;
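When k is large, listing every data set by hand is error-prone. One option is to generate the SET statement with a macro %DO loop. This is a sketch under stated assumptions: the macro name %combine_text_import is hypothetical, it assumes the node IDs follow the TextImport, TextImport2, ..., TextImportk pattern used in the code above, and the default output table MYLIB.COMBINED requires the MYLIB libref to be assigned first.

```sas
/* Hypothetical macro (not part of SAS Text Miner): concatenate the */
/* exported TRAIN tables of k Text Import nodes. Assumes the node   */
/* IDs TextImport, TextImport2, ..., TextImport<k>; adjust the      */
/* names if your nodes were numbered differently.                   */
%macro combine_text_import(lib=, k=, out=mylib.combined);
   %local i;
   data &out;
      set &lib..TextImport_TRAIN
      %do i = 2 %to &k;
          &lib..TextImport&i._TRAIN
      %end;
      ;
   run;
%mend combine_text_import;
```

For example, with three Text Import nodes in a diagram whose ID is EMWS1 (an illustrative value), the call would be %combine_text_import(lib=EMWS1, k=3). Generating the list keeps the data set names in step with the node count instead of relying on hand-typed lines.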
Submit the code and check the Log to make sure that there are no errors. Create a new data source for the "mylib.combined" data set.
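Because the node does not issue warnings when documents are lost, you might also confirm that each Text Import node exported as many rows as its folder contains documents. The following sketch is hypothetical: the macro %check_text_import and the global macro variable TM_MISMATCHES are illustrative names, and it assumes one row per imported document in each exported TRAIN table and the same node-naming pattern as in the code above. The folder paths are passed in the same order as the nodes, separated by vertical bars.

```sas
/* Hypothetical check (not part of SAS Text Miner). Assumptions:    */
/*  - each exported TRAIN table has one row per imported document;  */
/*  - folder i was imported by node TextImport (i = 1) or           */
/*    TextImport<i> (i > 1);                                         */
/*  - folder paths are passed as one list separated by vertical bars. */
%macro check_text_import(lib=, dirs=);
   %local i k dir tbl;
   %global tm_mismatches;
   %let tm_mismatches = 0;
   %let k = %sysfunc(countw(&dirs, |));
   %do i = 1 %to &k;
      %let dir = %scan(&dirs, &i, |);
      %if &i = 1 %then %let tbl = &lib..TextImport_TRAIN;
      %else %let tbl = &lib..TextImport&i._TRAIN;
      data _null_;
         length fref $8;
         fref = ' ';
         nfiles = .;
         rc = filename(fref, "&dir");
         did = dopen(fref);
         if did > 0 then do;
            nfiles = dnum(did);            /* members in the folder */
            rc = dclose(did);
         end;
         rc = filename(fref);
         if 0 then set &tbl nobs=nrows;    /* rows in the exported table */
         if nfiles ne nrows then do;
            put "WARNING: &tbl has " nrows "rows but &dir has " nfiles "members.";
            call symputx('tm_mismatches',
                         input(symget('tm_mismatches'), best12.) + 1);
         end;
         else put "NOTE: &tbl matches &dir (" nrows "documents).";
         stop;
      run;
   %end;
%mend check_text_import;
```

For example, %check_text_import(lib=EMWS1, dirs=E:\docs\part1|E:\docs\part2) uses illustrative values. A nonzero TM_MISMATCHES value after the call points to a folder whose member count differs from its exported table, which is worth investigating before you rely on the combined data set.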
Operating System and Release Information
Product Family: SAS System
Product: SAS Text Miner

System | Product Release Reported | Product Release Fixed* | SAS Release Reported | SAS Release Fixed*
Solaris for x64 | 5.1 | 12.3 | 9.3 TS1M0 | 9.4 TS1M0
Linux for x64 | 5.1 | 12.3 | 9.3 TS1M0 | 9.4 TS1M0
HP-UX IPF | 5.1 | 12.3 | 9.3 TS1M0 | 9.4 TS1M0
64-bit Enabled Solaris | 5.1 | 12.3 | 9.3 TS1M0 | 9.4 TS1M0
Microsoft Windows Server 2008 for x64 | 5.1 | 12.3 | 9.3 TS1M0 | 9.4 TS1M0
Microsoft Windows XP Professional | 5.1 | 12.3 | 9.3 TS1M0 | 9.4 TS1M0
Microsoft Windows Server 2003 for x64 | 5.1 | 12.3 | 9.3 TS1M0 | 9.4 TS1M0
Microsoft Windows Server 2003 Standard Edition | 5.1 | 12.3 | 9.3 TS1M0 | 9.4 TS1M0
64-bit Enabled AIX | 5.1 | 12.3 | 9.3 TS1M0 | 9.4 TS1M0
Windows 7 Ultimate x64 | 5.1 | 12.3 | 9.3 TS1M0 | 9.4 TS1M0
Windows 7 Ultimate 32 bit | 5.1 | 12.3 | 9.3 TS1M0 | 9.4 TS1M0
Windows 7 Professional x64 | 5.1 | 12.3 | 9.3 TS1M0 | 9.4 TS1M0
Windows 7 Professional 32 bit | 5.1 | 12.3 | 9.3 TS1M0 | 9.4 TS1M0
Windows 7 Home Premium x64 | 5.1 | 12.3 | 9.3 TS1M0 | 9.4 TS1M0
Windows 7 Home Premium 32 bit | 5.1 | 12.3 | 9.3 TS1M0 | 9.4 TS1M0
Windows 7 Enterprise x64 | 5.1 | 12.3 | 9.3 TS1M0 | 9.4 TS1M0
Windows 7 Enterprise 32 bit | 5.1 | 12.3 | 9.3 TS1M0 | 9.4 TS1M0
Microsoft Windows Server 2008 | 5.1 | 12.3 | 9.3 TS1M0 | 9.4 TS1M0
Microsoft Windows Server 2003 Enterprise Edition | 5.1 | 12.3 | 9.3 TS1M0 | 9.4 TS1M0
Microsoft Windows Server 2003 Datacenter Edition | 5.1 | 12.3 | 9.3 TS1M0 | 9.4 TS1M0
Microsoft® Windows® for x64 | 5.1 | 12.3 | 9.3 TS1M0 | 9.4 TS1M0
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.
Type: Problem Note
Priority: high
Date Modified: 2013-02-05 08:21:02
Date Created: 2013-01-29 15:45:20