The TEXTPARSE statement
generates a summary table and up to seven temporary tables. The following
program provides sample data and the statements for getting started. The
basic layout of each temporary table is shown afterward.
data getstart;
   infile cards delimiter='|' missover;
   length text $150;
   input text$ docid$;
   cards;
High-performance analytics hold the key to |d01
unlocking the unprecedented business value of big data.|d02
Organizations looking for optimal ways to gain insights|d03
from big data in shorter reporting windows are turning to SAS.|d04
As the gold-standard leader in business analytics |d05
for more than 36 years,|d06
SAS frees enterprises from the limitations of |d07
traditional computing and enables them |d08
to draw instant benefits from big data.|d09
Faster Time to Insight.|d10
From banking to retail to health care to insurance, |d11
SAS is helping industries glean insights from data |d12
that once took days or weeks in just hours, minutes, or seconds.|d13
It's all about getting to and analyzing relevant data faster.|d14
Revealing previously unseen patterns, sentiments, and relationships.|d15
Identifying unknown risks.|d16
And speeding the time to insights.|d17
High-Performance Analytics from SAS Combining industry-leading |d18
analytics software with high-performance computing technologies|d19
produces fast and precise answers to unsolvable problems|d20
and enables our customers to gain greater competitive advantage.|d21
SAS In-Memory Analytics eliminate the need for disk-based processing|d22
allowing for much faster analysis.|d23
SAS In-Database executes analytic logic into the database itself |d24
for improved agility and governance.|d25
SAS Grid Computing creates a centrally managed,|d26
shared environment for processing large jobs|d27
and supporting a growing number of users efficiently.|d28
Together, the components of this integrated, |d29
supercharged platform are changing the decision-making landscape|d30
and redefining how the world solves big data business problems.|d31
Big data is a popular term used to describe the exponential growth,|d32
availability and use of information,|d33
both structured and unstructured.|d34
Much has been written on the big data trend and how it can |d35
serve as the basis for innovation, differentiation and growth.|d36
run;

options set=TKTXTANIO_BINDAT_DIR="/opt/TKTGDat";
libname example sasiola host="grid001.example.com" port=10010 tag=hps;

/* Load the sample data into the server */
data example.getstart;
   set getstart;
run;

/* Parse the TEXT variable and save the output in the TXTSUMMARY buffer */
proc imstat data=example.getstart;
   textparse var=text docid=docid entities=std reducef=2
             select=(_all_) save=txtsummary;
run;
Computing the singular value
decomposition requires the input data to contain at least 25 documents
and at least as many documents as there are machines in the cluster.
By default, REDUCEF=4. This example sets REDUCEF=2 so that a term needs
to appear only twice to be kept for generating the term-by-document
matrix. The default dimension for the singular value decomposition
is k=10, so the server generates ten topics.
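The document-count requirement can be checked before the data is loaded to the server. The following is a minimal sketch, assuming the local GETSTART data set from the program above. The macro variable name NDOCS is chosen only for illustration, and the TRIMMED option assumes a release of SAS that supports it. The check covers only the 25-document minimum and does not compare the count with the number of machines in the cluster.

```sas
/* Count the documents (one row per document) in the local table */
proc sql noprint;
   select count(*) into :ndocs trimmed
      from getstart;
quit;

/* NDOCS is a hypothetical name used only in this sketch */
%put NOTE: GETSTART contains &ndocs. documents (at least 25 are required).;
```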
The TEXTPARSE statement
produces the following output. The names of the temporary tables are
reported and begin with _T_. These names (and the rest of the output
in the table) are stored in a temporary buffer that is named TXTSUMMARY.
Because the IMSTAT procedure
was used in interactive mode with a RUN statement instead of QUIT,
the STORE statement can be used to create macro variables for the
temporary table names as follows:
store txtsummary( 6,2) as Terms;
store txtsummary( 7,2) as Parent;
store txtsummary( 8,2) as V;
store txtsummary( 9,2) as U;
store txtsummary(10,2) as Topics;
store txtsummary(11,2) as TermsByTopics;
store txtsummary(12,2) as DocPro;
run;
Each of the tables can
then be accessed with a libref and the macro variable, such as example.&Terms.
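As a quick check that the macro variables were created, the following sketch writes one of the names to the log and fetches a few rows from that table. It assumes that the STORE statements above have already run in the same interactive IMSTAT session and that the EXAMPLE libref is still assigned. The row range is arbitrary.

```sas
/* Write the generated temporary table name to the SAS log */
%put NOTE: The terms table is &Terms.;

/* Make the terms table the active table and display its first rows */
table example.&Terms.;
   fetch / from=1 to=5;
run;
```

The same pattern should apply to the other macro variables, such as &Parent. or &Topics.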
The following sections show how to access these tables.