Feeding SAS® Contents to the Index of “SAS Search and Indexing” Server using Search Interface to SAS® Content

Search Interface to SAS Content supports feeding SAS contents to the index of the “SAS Search and Indexing” Server. To feed the contents, follow the steps below.

Step 1 - Configure “SAS Search and Indexing” Server to Support Feeding SAS Contents into the Index

This section assumes that the “SAS Search and Indexing” Server is installed, configured, and running. Configuring the “SAS Search and Indexing” Server to support feeding SAS Contents into the index involves two steps:

1. Configure the parser in the pipeline server. One of the first stages in the document processing pipeline is XML parsing. You need to specify the correct XML format to the “SAS Search and Indexing” pipeline server so that the parser can parse the input documents. Search Interface to SAS Content feeds the SAS Contents in the format defined below. You must configure the pipeline server with this schema using the “SAS Search and Indexing” admin module.

   <item>
      <title/>
      <description/>
      <link/>
      <sastype/>
      <metadatacreated/>
      <metadataupdated/>
      <sasowner/>
      <keywords/>
      <saspath/>
      <informationmaps/>
      <dataitems/>
      <charttypes/>
      <datalabels/>
      <loaddate/>
   </item>

   Note: The XML format specified above contains all elements. If you want to exclude any of the elements from indexing, you can remove them from this XML schema while configuring the pipeline parser.

2. Configure the indexing server. The index configuration, or schema, is used by the search engine to determine how to treat the different fields in the input XML documents (that is, which fields are searchable, which are metadata fields, and which are used to set constraints on the results). Any change to these fields requires a restart of the module.

   a. Standard Fields: These fields are searchable.
      In other words, the content of these fields is indexed, and when you query the index for any keyword, the contents of these fields are searched for matches.

   b. Boolean Fields: The contents of these fields can be used to set constraints on the result set. For example, if you are searching for the keyword "Sales" and you want to restrict the results to the category "Report", you can do so by setting the “sastype” field as a Boolean field.

   c. Info Fields: These are generally metadata fields. These are the fields that you want returned with the results, such as “title” and “description.” For our application, we send all the elements. We do not select the field “link” because it has been selected as the URL field. The URL element is always sent by default.

   d. URL Field: The URL field is the field in the input documents that has a unique value for each document. It is treated as the ID, or identifier. There is only one URL field per index schema. In our case, it is “link.”

The following are the potential options that can be specified in the indexing server for the various XML elements. However, these can be modified as needed.

   Element Name Standard Field Boolean Field Info Fields URL Field
   title Yes Yes
   description Yes Yes
   link Yes Yes
   sastype Yes Yes Yes
   metadatacreated Yes
   metadataupdated Yes
   sasowner Yes Yes Yes
   keywords Yes Yes Yes
   saspath Yes Yes
   informationmaps Yes Yes Yes
   dataitems Yes Yes Yes
   charttypes Yes Yes Yes
   datalabels Yes Yes Yes
   loaddate Yes

Step 2 - Revise the Configuration Information in url_list.txt

To load SAS Contents to the index of the “SAS Search and Indexing” Server, the index loading application requires the Search Interface to SAS Content URL to be configured in the url_list.txt file, which is available in the Search Interface to SAS Content installation home directory. When configuring the URL, you can either specify the user credential in the URL or configure it to load only public reports to the index of the “SAS Search and Indexing” Server.
If you specify the user credential in the URL, the index loading application loads all contents accessible to that user into the index. If you configure it to load the public reports, the index loading application loads all contents accessible to the SAS Web Anonymous user into the index.

Note: If you provide the user name and password in the URL, all users who search the index can see every result that is accessible to this user.

To configure the URL for public reports or for a specific user, follow the respective section below.

For a public user: Uncomment the URL in url_list.txt that has the searchClient=feeder and authType=none parameters appended to it. Ensure that all other URLs are commented out if you need to provide a feed only to the “SAS Search and Indexing” Server with the public user.

For a specific user: Uncomment the URL in url_list.txt that has the searchClient=feeder and userName=<userName>&password=<password> parameters appended to it. Ensure that all other URLs are commented out if you need to provide a feed only to the “SAS Search and Indexing” Server for a specific user. Replace <userName> with the metadata user name. Replace <password> with the password for the specified user.

Note: It is recommended that the password be given in encoded form using the PWENCODE procedure. For example, if the password is Welcome123, run the following statements in a SAS session to encode the password:

   proc pwencode in='Welcome123'; run;

Use the output produced by this procedure as the password.

After you have uncommented the appropriate URL for a public user or a specific user, change the hostname and port in the URL to those on which Search Interface to SAS Content is deployed.
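
The two URL variants described above can be sketched in Python as a way to double-check an edited entry in url_list.txt. This is only an illustration under stated assumptions: the function name build_feeder_url is hypothetical, the path and parameter names are copied from the example URL given in this section, the http scheme is assumed, and the parameter order for the public-user form is a guess because the shipped url_list.txt is not reproduced here.

```python
def build_feeder_url(host, port, user=None, encoded_password=None):
    """Assemble a feeder URL in the shape used by this document's example.

    Hypothetical helper: the shipped url_list.txt remains the authoritative
    template, so compare its commented entries against this output.
    """
    base = f"http://{host}:{port}/SASSearchService/Controller?forward=Search"
    if user is None:
        # Public reports only: no credentials appear in the URL.
        # Parameter order here is an assumption.
        return f"{base}&searchClient=feeder&authType=none"
    if not encoded_password:
        raise ValueError("a password (preferably PWENCODE output) is required")
    # Specific user: the encoded password is inserted verbatim, braces included.
    return (f"{base}&userName={user}&password={encoded_password}"
            "&searchClient=feeder")


if __name__ == "__main__":
    print(build_feeder_url("searchsas.mycompany.com", 8080))
    print(build_feeder_url(
        "searchsas.mycompany.com", 8080, "myuser",
        "{sas002}835DA53542E057BA07B02C302BDC76360963D41A"))
```

The values are joined as plain strings rather than percent-encoded, so that the {sas002} prefix appears exactly as it does in the documented URL.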
For example, if you configure the URL for a specific user with the user name “myuser” and the password “Welcome123”, you decide to encode the password, the hostname for Search Interface to SAS Content is searchsas.mycompany.com, and the port is 8080, then the configured URL is as follows:

   http://searchsas.mycompany.com:8080/SASSearchService/Controller?forward=Search&userName=myuser&password={sas002}835DA53542E057BA07B02C302BDC76360963D41A&searchClient=feeder

Step 3 - Run the loadindex Script with Required Parameters

After the URL has been modified in url_list.txt, you can run the loadindex script available in the installation home directory of Search Interface to SAS Content. The extension of this script file varies with the platform on which Search Interface to SAS Content is installed: loadindex.exe for Windows, loadindex for UNIX-based systems, and loadindex.rexx for z/OS.

The loadindex script accepts the following command line arguments:

-filename <filename>: the name of the configuration file containing the URL of Search Interface to SAS Content. This parameter needs to be specified only if you have changed the configuration file name to something other than url_list.txt.

-sassearchserverhost <sassearchserverhost>: the name of the host on which the “SAS Search and Indexing” proxy server is deployed. This parameter needs to be specified only if the proxy server is deployed on a machine other than the one on which Search Interface to SAS Content is deployed (the default is localhost).

-sassearchserverport <sassearchserverport>: the port of the above “SAS Search and Indexing” proxy server. This parameter needs to be specified only if the proxy server is listening on a port other than the default port (4001).
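
The rule that each argument is needed only when its value differs from the default can be sketched in Python as follows. This is a minimal illustration, not part of the product: the function name loadindex_args is hypothetical, and the defaults (url_list.txt, localhost, 4001) are the ones stated in the argument descriptions above.

```python
DEFAULT_FILE = "url_list.txt"
DEFAULT_HOST = "localhost"
DEFAULT_PORT = 4001


def loadindex_args(script="loadindex", filename=DEFAULT_FILE,
                   host=DEFAULT_HOST, port=DEFAULT_PORT):
    """Return the loadindex command as an argument list.

    Hypothetical helper: an option is emitted only when its value
    differs from the documented default.
    """
    args = [script]
    if filename != DEFAULT_FILE:
        args += ["-filename", filename]
    if host != DEFAULT_HOST:
        args += ["-sassearchserverhost", host]
    if port != DEFAULT_PORT:
        args += ["-sassearchserverport", str(port)]
    return args


if __name__ == "__main__":
    # Proxy server on another host, default port and file name.
    print(loadindex_args(host="sassearchserver.mycompany.com"))
    # → ['loadindex', '-sassearchserverhost', 'sassearchserver.mycompany.com']
```

Returning a list rather than a single string means the result can be handed to Python's subprocess.run without extra quoting, even when the configuration file path contains spaces.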
For example, if the “SAS Search and Indexing” proxy server is deployed on the host sassearchserver.mycompany.com and is listening on the default port, and the configuration file is url_list.txt (the default name), then run the following command on the command line:

   loadindex -sassearchserverhost sassearchserver.mycompany.com

Note: You can schedule a job to load the contents to the “SAS Search and Indexing” server periodically using Windows Scheduler or crontab on UNIX-based systems.

To get help about the command line options for loadindex, run the following command:

   Windows: loadindex.exe -help
   UNIX-based systems: loadindex -help
   z/OS: loadindex.rexx -help

When the index has been loaded with SAS Content, you can perform a search in the Teragram Search Engine’s Search UI. Make sure that you select a search string that matches some SAS content.

Note: If the path to the url_list.txt file contains a blank space, enclose the path in double quotes. For example, if the path to the file is C:\my files\url_list.txt, then use the following command:

   loadindex -filename "C:\my files\url_list.txt"

Advanced Configuration Option

* If there is a large amount of content, you may need to increase the Java heap size when running the loadindex application to avoid an OutOfMemoryError. To increase the Java heap size, open the loadindex.ini file located in the Search Interface to SAS Content installation home directory and add the following lines:

   JavaArgs_7=-Xmx512m
   JavaArgs_8=-Xms512m

  Note: To provide a larger Java heap, change the heap size value from 512 to a more appropriate value.

* If Search Interface to SAS Content is deployed on a web application server configured with HTTPS, then the SSL certificate must be imported into the JRE on which the loadindex application is running.

SAS and all other SAS Institute product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries.
Other brand and product names are registered trademarks or trademarks of their respective companies. ® indicates USA registration. Copyright (c) 2010 SAS Institute Inc., Cary, NC, USA. All rights reserved. 30 June 2010