Usage Note 57915: How to parse a contraction word as a single term in SAS® Text Miner
In SAS Text Miner, a contraction such as "can't" is not detected as a single term, even when it is added to the Multi-word Terms table in the Text Parsing node. The parser splits it into two terms, and the result depends on the release.
Example:
- 14.1 and later: "can't" is split into "can" and "not"
- prior to 14.1: "can't" is split into "can" and "n't"
To avoid this split, follow these steps:
Scan the input text with a SAS® character function, and convert the contraction to separate words throughout the text. Perform this step before using the data set in SAS Text Miner.
Example: the following code uses the TRANWRD() character function to convert "can't" to "can not" throughout the text:
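/* Read the original text and write a copy with the contraction expanded. */
data contraction_text_replace_ds;
   set contraction_text_ds;
   /* TRANWRD replaces every occurrence of "can't" in txt_col with "can not". */
   txt_col=tranwrd(txt_col, "can't", "can not");
run;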
In the Text Parsing node, add the new word-pair to the Multi-word Terms table. Specify a blank role.
Example: add "can not".

Run the Text Parsing node again.
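The replacement step can also be sketched with two safeguards that the TRANWRD example does not cover. This is a minimal sketch under stated assumptions, not the method prescribed by this note. Because "can not" is longer than "can't", the expanded text can be truncated if txt_col is already full, so the sketch declares a longer length before the SET statement. It also swaps in PRXCHANGE with the i modifier so that "Can't" at the start of a sentence is converted as well. The sample rows and the lengths 200 and 400 are hypothetical. The data set and column names follow the example above.

```sas
/* Hypothetical sample data for illustration only. */
data contraction_text_ds;
   length txt_col $200;
   infile datalines truncover;
   input txt_col $200.;
   datalines;
I can't log in.
Can't you reset it?
;
run;

data contraction_text_replace_ds;
   /* Assumed length: declared before SET so the longer text is not truncated. */
   length txt_col $400;
   set contraction_text_ds;
   /* The i modifier ignores case; -1 replaces every occurrence in the value. */
   txt_col = prxchange("s/\bcan't\b/can not/i", -1, txt_col);
run;
```

The replacement text is always lowercase, so "Can't" becomes "can not". If the input uses typographic apostrophes, the pattern would need to be extended to match them.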
Operating System and Release Information
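Product Family: SAS System. Product: SAS Text Miner.
| System | Product Release (Reported) | Product Release (Fixed*) | SAS Release (Reported) | SAS Release (Fixed*) |
|---|---|---|---|---|
| Solaris for x64 | 12.1 | | 9.3 TS1M2 | |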
| Linux for x64 | 12.1 | | 9.3 TS1M2 | |
| HP-UX IPF | 12.1 | | 9.3 TS1M2 | |
| 64-bit Enabled Solaris | 12.1 | | 9.3 TS1M2 | |
| 64-bit Enabled AIX | 12.1 | | 9.3 TS1M2 | |
| Windows Vista for x64 | 12.1 | | 9.3 TS1M2 | |
| Windows Vista | 12.1 | | 9.3 TS1M2 | |
| Windows 7 Ultimate x64 | 12.1 | | 9.3 TS1M2 | |
| Windows 7 Ultimate 32 bit | 12.1 | | 9.3 TS1M2 | |
| Windows 7 Professional x64 | 12.1 | | 9.3 TS1M2 | |
| Windows 7 Professional 32 bit | 12.1 | | 9.3 TS1M2 | |
| Windows 7 Home Premium x64 | 12.1 | | 9.3 TS1M2 | |
| Windows 7 Home Premium 32 bit | 12.1 | | 9.3 TS1M2 | |
| Windows 7 Enterprise x64 | 12.1 | | 9.3 TS1M2 | |
| Windows 7 Enterprise 32 bit | 12.1 | | 9.3 TS1M2 | |
| Microsoft Windows XP Professional | 12.1 | | 9.3 TS1M2 | |
| Microsoft Windows Server 2012 Std | 12.1 | | 9.3 TS1M2 | |
| Microsoft Windows Server 2012 R2 Std | 12.1 | | 9.3 TS1M2 | |
| Microsoft Windows Server 2012 R2 Datacenter | 12.1 | | 9.3 TS1M2 | |
| Microsoft Windows Server 2012 Datacenter | 12.1 | | 9.3 TS1M2 | |
| Microsoft Windows Server 2008 for x64 | 12.1 | | 9.3 TS1M2 | |
| Microsoft Windows Server 2008 R2 | 12.1 | | 9.3 TS1M2 | |
| Microsoft Windows Server 2008 | 12.1 | | 9.3 TS1M2 | |
| Microsoft Windows Server 2003 for x64 | 12.1 | | 9.3 TS1M2 | |
| Microsoft Windows Server 2003 Standard Edition | 12.1 | | 9.3 TS1M2 | |
| Microsoft Windows Server 2003 Enterprise Edition | 12.1 | | 9.3 TS1M2 | |
| Microsoft Windows Server 2003 Datacenter Edition | 12.1 | | 9.3 TS1M2 | |
| Microsoft Windows 8.1 Pro x64 | 12.1 | | 9.3 TS1M2 | |
| Microsoft Windows 8.1 Pro 32-bit | 12.1 | | 9.3 TS1M2 | |
| Microsoft Windows 8.1 Enterprise x64 | 12.1 | | 9.3 TS1M2 | |
| Microsoft Windows 8.1 Enterprise 32-bit | 12.1 | | 9.3 TS1M2 | |
| Microsoft Windows 8 Pro x64 | 12.1 | | 9.3 TS1M2 | |
| Microsoft Windows 8 Pro 32-bit | 12.1 | | 9.3 TS1M2 | |
| Microsoft Windows 8 Enterprise x64 | 12.1 | | 9.3 TS1M2 | |
| Microsoft Windows 8 Enterprise 32-bit | 12.1 | | 9.3 TS1M2 | |
| Microsoft® Windows® for x64 | 12.1 | | 9.3 TS1M2 | |
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.
This note suggests workaround steps to follow to capture a contraction as a single term in SAS Text Miner.
| Type: | Usage Note |
| Priority: | low |
| Date Modified: | 2016-03-25 07:56:17 |
| Date Created: | 2016-03-23 06:44:35 |