Processing a Large Collection of Documents

Using SAS Text Miner nodes to process a large collection of documents can require a lot of computing time and resources. If you have limited resources, it might be necessary to take one or more of the following actions:

Use a sample of the document collection.
Set some of the parse properties, such as Noun Groups or Find Entities, to No or None.
Reduce the number of SVD dimensions or roll-up terms. If you run into memory problems with the SVD approach, you can instead roll up a specified number of terms; the remaining terms are then automatically dropped.
Limit parsing to high-information words by turning off all parts of speech other than nouns, proper nouns, noun groups, and verbs.
For best results, make sure that the documents use properly structured sentences, with correct grammar, punctuation, and capitalization. Even so, entity extraction does not always generate reasonable results.
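
The first action above, using a sample of the document collection, can also be carried out before the documents are brought into SAS Text Miner. The following Python sketch is an illustration only and is not part of SAS Text Miner. It uses reservoir sampling so that the full collection never has to be held in memory at once. The function name sample_documents and its parameters are hypothetical and chosen for this example.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def sample_documents(documents: Iterable[T], sample_size: int, seed: int = 12345) -> List[T]:
    """Draw a uniform random sample from a document stream (reservoir sampling).

    Hypothetical helper for illustration; not a SAS Text Miner API.
    Only `sample_size` documents are kept in memory at any time.
    """
    if sample_size < 0:
        raise ValueError("sample_size must be non-negative")

    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    reservoir: List[T] = []

    for index, doc in enumerate(documents):
        if index < sample_size:
            # Fill the reservoir with the first `sample_size` documents.
            reservoir.append(doc)
        else:
            # Replace an existing entry with probability sample_size / (index + 1).
            j = rng.randint(0, index)  # both endpoints are inclusive
            if j < sample_size:
                reservoir[j] = doc

    return reservoir


if __name__ == "__main__":
    # Stand-in for a large collection: a generator of document identifiers.
    collection = (f"doc_{i}.txt" for i in range(100_000))
    subset = sample_documents(collection, sample_size=1_000)
    print(len(subset))
```

Because the seed is fixed, repeated runs select the same documents, which makes it easier to compare results while other settings are being adjusted. If the collection contains fewer documents than the requested sample size, all of the documents are returned.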
