Feeding SAS® Contents to the Index of “SAS Search and Indexing” Server using
Search Interface to SAS® Content
Search Interface to SAS Content supports feeding SAS contents to the index of
“SAS Search and Indexing” Server. To feed the contents, follow the steps below.
Step 1 - Configure “SAS Search and Indexing” Server to Support Feeding SAS
Contents into the Index
This section assumes that the “SAS Search and Indexing” Server is
installed/configured and running.
Configuring “SAS Search and Indexing” Server to support feeding SAS Contents
into the index involves two steps:
1. Configure the parser in the pipeline server.
One of the first stages in the document processing pipeline is XML parsing. You
need to specify the correct XML format to the “SAS Search and Indexing” pipeline
server so that the parser can parse the input documents.
Search Interface to SAS Content feeds the SAS Contents in the format defined
below. You will have to configure the pipeline server with this schema using the
“SAS Search and Indexing” admin module.
Note: The XML format specified above contains all elements. If you want to
exclude any of the elements from indexing, you can remove them from this XML
schema while configuring the pipeline parser.
2. Configure the indexing server.
The index configuration or schema is used by the search engine to determine
how to treat the different fields in the input XML documents (that is, which
among those are searchable, which are the metadata fields, and which fields are
used to set constraints on the results). Any change to these fields requires a
restart of the module.
a. Standard Fields: These fields are searchable. In other words, the content of
these fields is indexed and when you query the index for any keyword, the
contents of these fields are searched for matches.
b. Boolean Fields: The contents of these fields can be used to set constraints
on the result set. For example, if you are searching for the keyword "Sales" and
you want to restrict the results to belong to only the category "Report" then
you can do so by setting the “sastype” field as a Boolean field.
c. Info Fields: These are generally metadata fields. These are the fields that
you want returned with the results, such as “title” and “description.” For
our application, we send all the elements. We do not select the field “link”
as an Info field because it is already selected as the URL field, and the URL
element is always sent by default.
d. URL Field: The URL field is the field in the input documents that has a
unique value for each document. This is to be treated as the ID or identifier.
There will only be one URL field per index schema. In our case, it is “link.”
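The four field roles can be pictured as a mapping from each element name to the
roles it plays. The following Python sketch is an illustration only, not the
configuration format of the “SAS Search and Indexing” admin module: the
dictionary layout and the helper names are hypothetical, and it uses only the
assignments stated in the text above (“sastype” as a Boolean field, “title” and
“description” as Info fields, “link” as the URL field). It checks the rule that
a schema has exactly one URL field.

```python
# Illustrative only: this is NOT the "SAS Search and Indexing" schema format.
# Roles come from the examples in the text; the layout is hypothetical.
FIELD_ROLES = {
    "title": {"info"},
    "description": {"info"},
    "sastype": {"boolean"},
    "link": {"url"},
}


def url_field(field_roles):
    """Return the single URL (identifier) field; raise if there is not exactly one."""
    urls = [name for name, roles in field_roles.items() if "url" in roles]
    if len(urls) != 1:
        raise ValueError(f"expected exactly one URL field, found {len(urls)}")
    return urls[0]


def fields_with_role(field_roles, role):
    """List the element names configured with the given role, sorted by name."""
    return sorted(name for name, roles in field_roles.items() if role in roles)


print(url_field(FIELD_ROLES))
print(fields_with_role(FIELD_ROLES, "info"))
```

A check like this can be run against a planned field assignment before it is
entered in the admin module, since a change to the fields requires a restart.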
The following are the potential options that can be specified in the indexing
server for the various XML elements. However, these can be modified as needed.
Element Name Standard Field Boolean Field Info Fields URL Field
title Yes Yes
description Yes Yes
link Yes Yes
sastype Yes Yes Yes
metadatacreated Yes
metadataupdated Yes
sasowner Yes Yes Yes
keywords Yes Yes Yes
saspath Yes Yes
informationmaps Yes Yes Yes
dataitems Yes Yes Yes
charttypes Yes Yes Yes
datalabels Yes Yes Yes
loaddate Yes
Step 2 - Revise the Configuration Information in url_list.txt
To load SAS Contents to the index of “SAS Search and Indexing” Server, the
index loading application requires the Search Interface to SAS Content URL to
be configured in the url_list.txt file, which is located in the Search
Interface to SAS Content installation home directory.
While configuring the URL, you can either specify the user credentials in the
URL, or configure it to load only public reports to the index of “SAS Search
and Indexing” Server. If you specify the user credentials in the URL, the index
loading application loads all contents accessible to that user to the index. If
you configure it to load the public reports, the index loading application
loads all contents accessible to the SAS Web Anonymous user to the index.
Note: If you provide the user name and password in the URL, all users will be
able to see all the results that are accessible to this user.
To configure the URL for public reports or for a specific user, follow the
respective sections below.
For the public user:
Uncomment the URL in url_list.txt that has the searchClient=feeder and
authType=none parameters appended to it. Ensure that all other URLs are
commented out if you want to feed the “SAS Search and Indexing” Server only as
the public user.
For a specific user:
Uncomment the URL in url_list.txt that has the searchClient=feeder, userName=,
and password= parameters appended to it. Ensure that all other URLs are
commented out if you want to feed the “SAS Search and Indexing” Server only as
a specific user.
Set the value of userName to the metadata user name.
Set the value of password to the password for the specified user.
Note: It is recommended that you supply the password in encoded format, using
the PWENCODE procedure. For example, if the password is Welcome123, submit the
following statements in a SAS session to encode the password:
proc pwencode in='Welcome123';
run;
Use the encoded value that this procedure writes to the SAS log as the
password.
After you have uncommented the URL for the public user or for a specific user,
change the hostname and port in the URL to those on which Search Interface to
SAS Content is deployed.
For example, suppose that you configure the URL for a specific user with the
user name “myuser” and the password “Welcome123”, you encode the password,
Search Interface to SAS Content is deployed on the host
searchsas.mycompany.com, and the port is 8080. The configured URL is then as
follows (entered on a single line):
http://searchsas.mycompany.com:8080/SASSearchService/Controller?forward=Search&userName=myuser&password={sas002}835DA53542E057BA07B02C302BDC76360963D41A&searchClient=feeder
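The two URL variants described in this step can be assembled programmatically.
The following Python sketch is a minimal illustration under stated assumptions:
the function name build_feeder_url is hypothetical, the path and parameter names
are copied from the example URL above, and the exact form of the public-user URL
(in particular that it keeps forward=Search and the order of its parameters) is
an assumption, because only its searchClient=feeder and authType=none parameters
are named in this document. The URL that ships in url_list.txt remains the
authoritative template.

```python
from urllib.parse import urlencode

# Path and parameter names are copied from the example URL in this document.
CONTROLLER_PATH = "/SASSearchService/Controller"


def build_feeder_url(host, port, user=None, password=None):
    """Build a feeder URL for a specific user, or for the public user if user is None."""
    params = {"forward": "Search"}
    if user is None:
        # Public reports: no credentials. Layout of this variant is an assumption.
        params["authType"] = "none"
    else:
        if password is None:
            raise ValueError("a password is required when a user name is given")
        params["userName"] = user
        params["password"] = password
    params["searchClient"] = "feeder"
    # Keep the braces of an encoded password such as {sas002}... unescaped, as in
    # the example URL; other special characters are percent-encoded.
    query = urlencode(params, safe="{}")
    return f"http://{host}:{port}{CONTROLLER_PATH}?{query}"


print(build_feeder_url(
    "searchsas.mycompany.com", 8080,
    user="myuser",
    password="{sas002}835DA53542E057BA07B02C302BDC76360963D41A",
))
```

Called with the values from the example above, the function reproduces the
configured URL on a single line, which avoids line-wrap errors when the URL is
pasted into url_list.txt.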
Step 3 - Run the loadindex Script with Required Parameters
After the URL has been modified in url_list.txt, you can run the loadindex
script, which is located in the installation home directory of Search Interface
to SAS Content. The name of this script file varies with the platform on which
Search Interface to SAS Content is installed: loadindex.exe for Windows,
loadindex for UNIX-based systems, and loadindex.rexx for z/OS.
The loadindex script accepts the following command-line arguments:
-filename: the name of the configuration file that contains the URL of Search
Interface to SAS Content. Specify this parameter only if you have renamed the
configuration file to something other than url_list.txt.
-sassearchserverhost: the name of the host on which the “SAS Search and
Indexing” proxy server is deployed. Specify this parameter only if the proxy
server is deployed on a machine other than the one on which Search Interface to
SAS Content is deployed (the default is localhost).
-sassearchserverport: the port of that “SAS Search and Indexing” proxy server.
Specify this parameter only if the proxy server is listening on a port other
than the default port (4001).
For example, if the “SAS Search and Indexing” proxy server is deployed on the
host sassearchserver.mycompany.com and is listening on the default port, and
the configuration file is url_list.txt (the default name), then run the
following command on the command line:
loadindex -sassearchserverhost sassearchserver.mycompany.com
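The rule that each argument is needed only when it differs from its default can
be sketched in Python as follows. This is a hedged illustration, not part of the
product: the function names are hypothetical, the three flags and their defaults
(url_list.txt, localhost, 4001) are taken from the descriptions above, and it is
an assumption that loadindex reports failure through a non-zero exit status.

```python
import subprocess

# Defaults as documented for the loadindex script.
DEFAULT_FILE = "url_list.txt"
DEFAULT_HOST = "localhost"
DEFAULT_PORT = 4001


def build_loadindex_args(executable="loadindex", filename=DEFAULT_FILE,
                         host=DEFAULT_HOST, port=DEFAULT_PORT):
    """Return the argument list, adding a flag only where a value is not the default."""
    args = [executable]
    if filename != DEFAULT_FILE:
        args += ["-filename", filename]
    if host != DEFAULT_HOST:
        args += ["-sassearchserverhost", host]
    if port != DEFAULT_PORT:
        args += ["-sassearchserverport", str(port)]
    return args


def run_loadindex(**options):
    """Run loadindex from the current directory (assumed to be the installation home)."""
    # check=True raises CalledProcessError on a non-zero exit status; whether
    # loadindex signals failure that way is an assumption.
    return subprocess.run(build_loadindex_args(**options), check=True)


print(build_loadindex_args(host="sassearchserver.mycompany.com"))
```

Because the arguments are passed as a list rather than as one shell string, a
configuration file path that contains blank spaces needs no extra quoting here.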
Note: You can schedule a job that loads the contents to the “SAS Search and
Indexing” server periodically, using Windows Scheduler or the crontab of your
UNIX-based system.
To get help about the command-line options for loadindex, run the following
command:
Windows:
loadindex.exe -help
UNIX-based systems:
loadindex -help
z/OS:
loadindex.rexx -help
When the index has been loaded with SAS Content, you can perform a search in
Teragram Search Engine’s Search UI. Make sure that you select a search string
which matches some SAS content.
Note: If the path to the url_list.txt file contains a blank space, enclose the
path in double quotes. For example, if the path to the file is
C:\my files\url_list.txt, then use the following command:
loadindex -filename "C:\my files\url_list.txt"
Advanced Configuration Option
* If a large amount of content is loaded while the loadindex application is
running, you may need to increase the Java heap size to avoid an
OutOfMemoryError. To increase the Java heap size, open the loadindex.ini file
located in the Search Interface to SAS Content installation home directory and
add the following lines:
JavaArgs_7=-Xmx512m
JavaArgs_8=-Xms512m
Note: To provide more Java heap space, change the value 512 (in megabytes) to
a more appropriate value.
* If Search Interface to SAS Content is deployed in a web application server
that is configured with HTTPS, then the SSL certificate must be imported into
the JRE on which the loadindex application is running.
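The heap-size change described in the first option above can be sketched in
Python. This is only an illustration under assumptions that this document does
not confirm: that loadindex.ini holds entries of the form JavaArgs_N=value, that
new entries take the next unused numbers (which matches JavaArgs_7 and
JavaArgs_8 when six entries already exist), and that appending at the end of the
file is acceptable. The function name add_heap_args is hypothetical, and the
sketch does not look for existing -Xmx or -Xms entries.

```python
import re


def add_heap_args(ini_text, heap_mb=512):
    """Return ini_text with -Xmx/-Xms entries appended under the next free JavaArgs_N numbers."""
    # Collect the numbers already used by JavaArgs_N entries (assumed format).
    used = [int(n) for n in re.findall(r"^JavaArgs_(\d+)=", ini_text, flags=re.MULTILINE)]
    next_n = max(used, default=0) + 1
    lines = ini_text.splitlines()
    lines.append(f"JavaArgs_{next_n}=-Xmx{heap_mb}m")      # maximum heap size
    lines.append(f"JavaArgs_{next_n + 1}=-Xms{heap_mb}m")  # initial heap size
    return "\n".join(lines) + "\n"


# Hypothetical existing content with six JavaArgs entries.
existing = "".join(f"JavaArgs_{i}=-Dexample{i}=value\n" for i in range(1, 7))
print(add_heap_args(existing))
```

Keeping the transformation as a pure function on the file text makes it easy to
review the result before writing it back to loadindex.ini.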
SAS and all other SAS Institute product or service names are registered
trademarks or trademarks of SAS Institute Inc. in the USA and other countries.
Other brand and product names are registered trademarks or trademarks of their
respective companies.
® indicates USA registration.
Copyright (c) 2010 SAS Institute Inc., Cary, NC, USA. All rights reserved.
30 June 2010