DataFlux Data Management Studio 2.7: User Guide

Prerequisites and Tips for the Address Verification (World 2) Node

Prerequisites

You must install Address Doctor 5 software before you can use an Address Verification (World 2) node in a data job. After you install Address Doctor 5 software, you must prepare two files that are required by the Address Verification (World 2) node:

A sample parameters file, parameters.xml, is installed in the etc folder in the installation root for DataFlux Data Management Studio. Update that file with appropriate parameters for your site.

A sample configuration file, SetConfig.xml, is installed in the etc folder in the installation root for DataFlux Data Management Studio. Update that file with the unlock codes for Address Doctor 5 software and other information, as appropriate for your site.

There are some important items in the SetConfig.xml file. First, in the General section:

The unlock code section holds the AddressDoctor license codes. Enter one line for each license code provided.

The Database lines indicate: 

A quick note about preloading data. Partial preloading will load the metadata and indexing structures into memory. The reference data itself will remain on the hard drive. Partial preloading offers some performance enhancements and is an alternative when not enough memory is available to fully load the desired databases. Partial preloading may not be supported for all databases.

Full preloading will move the entire reference database into memory. This may need a significant amount of memory for countries with large databases such as the USA or the United Kingdom, but it will increase the processing speed significantly.

Because preloading can allocate large amounts of memory and move a significant amount of data, loading the databases might take some time. Databases are preloaded in the order in which they are defined in the SetConfig.xml file.

Performance Tips

Here is a summary of tips to optimize the performance of validation.

Install as much memory as possible to allow country databases to be fully preloaded into memory. Available memory should be at least the combined size of the most frequently used country databases plus 256 MB. If all countries available from AddressDoctor are to be used simultaneously, add more memory to cover the entire size of all databases.
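The sizing rule in this tip is simple arithmetic, and a short Python sketch can make it concrete. The function names and the database sizes below are illustrative assumptions, not values or tools supplied by AddressDoctor or DataFlux; substitute the actual sizes of the database files at your site.

```python
# Sketch of the sizing rule: combined size of the most frequently used
# country databases plus 256 MB. All sizes below are made-up placeholders.

OVERHEAD_MB = 256  # fixed headroom named in the tip above


def required_memory_mb(db_sizes_mb, frequently_used):
    """Return the memory (in MB) needed to fully preload the given countries."""
    missing = [c for c in frequently_used if c not in db_sizes_mb]
    if missing:
        raise KeyError(f"no size known for: {', '.join(missing)}")
    return sum(db_sizes_mb[c] for c in frequently_used) + OVERHEAD_MB


def fits_in_memory(db_sizes_mb, frequently_used, available_mb):
    """True when full preloading of the chosen countries fits in available_mb."""
    return required_memory_mb(db_sizes_mb, frequently_used) <= available_mb


# Hypothetical database sizes in MB; replace with the real file sizes on disk.
sizes = {"USA": 4000, "GBR": 1500, "DEU": 800}
print(required_memory_mb(sizes, ["USA", "GBR"]))  # 4000 + 1500 + 256
print(fits_in_memory(sizes, ["USA", "GBR"], 8192))
```

If the check fails for the countries you use most, partial preloading or faster storage, as described in the other tips, may be the more realistic option.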

Preload at least the databases of frequently used countries with the proper parameters set in the SetConfig.xml.

When full preloading is not an option, store the database files on a fast hard disk or, even better, on a SATA solid-state disk (ideally exceeding a 200 MB/sec read transfer rate; for development purposes, high-speed USB or FireWire flash modules exceeding a 30 MB/sec read transfer rate might suffice). Access latency (average seek time) in particular should be minimized: internal AddressDoctor benchmarks for “PreloadingType=NONE” with an Intel X25M G2 SATA SSD have shown a typical performance increase of a factor of 20.

Keep the AddressDoctor reference databases on a separate hard drive. Read address data from, and write it to, other drives. Make absolutely sure to keep the database files defragmented. Internal tests have shown that performance can decrease by as much as 35% when the files are heavily fragmented.

The AddressDoctor engine is very data-intensive, with a significant amount of non-localized memory access during processing. As such, it benefits greatly from direct multi-channel memory access (for example, via QuickPath Interconnect or HyperTransport) with high bandwidth and low latency, combined with large processor caches, such as those found in top-of-the-line server processors.

Use high-performance multi-core processors, such as Intel Xeon X55xx/65xx/75xx and higher, AMD Opteron 24xx/84xx and higher, or IBM POWER7 and higher. Provided there is enough memory available for full preloading, the processor clock frequency will directly determine the speed of address processing. See http://www.spec.org/cpu2006/results/rint2006.html for a comparison of integer processing throughput between different processor architectures.

When running batch processes without having a sufficient amount of memory installed, try to process records ordered by country with intermittent re-initialization of AddressDoctor using the appropriate pre-loading settings. The engine will also benefit from internal and OS caches for addresses sorted by country as compared to addresses in random order, as they would for instance occur in a Web Service environment.
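As a rough illustration of ordering a batch by country, the following Python sketch groups input records so that each country is processed in one contiguous run. The record layout, the field name `country`, and the function name are assumptions made for this example only; the re-initialization step is shown as a comment because it depends on how the job is set up at your site.

```python
from itertools import groupby
from operator import itemgetter


def batches_by_country(records, country_key="country"):
    """Yield (country, [records]) with all records of a country kept together."""
    ordered = sorted(records, key=itemgetter(country_key))  # stable sort
    for country, group in groupby(ordered, key=itemgetter(country_key)):
        yield country, list(group)


# Hypothetical input rows; field names are illustrative only.
rows = [
    {"country": "USA", "address": "1 Main St"},
    {"country": "GBR", "address": "10 High St"},
    {"country": "USA", "address": "2 Oak Ave"},
]
for country, batch in batches_by_country(rows):
    # A real job would re-initialize the engine with the preloading
    # settings for `country` here, then verify every address in `batch`.
    print(country, len(batch))
```

Because the sort is stable, records within each country keep their original relative order, which can make results easier to reconcile with the input.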

See also any known issues for the Address Verification nodes in the Jobs section of the Usage Notes (FAQ) topic.

Logging XML Inputs and Outputs

You can capture XML inputs and outputs from the Address Verification (World 2) node in the Platform log. Add the following line of code to the platform.log.xml file:

<logger name="DF.StepEngine.Plugin.AD"> <level value="debug"/> </logger>

The same code can be added to the dmserver.log.xml file, the batch.log.xml file, or the service.log.xml file for server-side logging.


Doc ID: dfDMStd_Task_World2_Prereq.html