
DataFlux Data Management Studio 2.6: User Guide

Usage Notes (FAQ)

The following questions and usage notes are designed to assist you when working with DataFlux Data Management Studio.

General

What causes this error? "You do not have permission to perform this action." 

This message could mean that you do not have write access to DataFlux Data Management Studio resource files. For more information, see Read and Write Access to the etc Folder.

I cannot save global options for DataFlux Data Management Studio under Windows Vista or Windows Server 2008. Why?

DataFlux Data Management Studio saves global options to a user.config file that is hidden by default under Windows Vista and Windows Server 2008. You must un-hide this file in order to save global options. The physical path to the file is as follows:

C:\Documents and Settings\<UserID>\Local Settings\Application Data\DataFlux\ProDesigner.vshost.exe_Url_<some_hash_code>

Are there any options that I need to set for the DataFlux Data Management Server for SAS? 

If you are using the DataFlux Data Management Server for SAS, then select the following global option: 

  1. From the main DataFlux Data Management Studio window, select Tools > Data Management Studio Options. The global options dialog displays.
  2. Select the Data Management Server section.
  3. Select the checkbox: Connect to Data Management Server for SAS.
  4. Click OK to save your changes.

Why does my screen not repaint when I am prompted to log in to a table with ODBC?

If you access a table with ODBC, you might be prompted to log in to the database. If your screen does not repaint properly, try setting the Show window contents while dragging option for Microsoft Windows. Consult the documentation for your version of Windows for details about setting this option.

Can Data Management Studio distinguish between file names that differ only in case?

Not in the current release. Data Management Studio will not permit you to add or update file names that differ only in case. This means, for example, that you cannot give file-based objects such as jobs and profiles names that differ only in case, such as Job1 and JOB1.

How can I obtain a copy of the SAS XML Mapper?

You can download a free copy of the SAS XML Mapper, along with its documentation, from http://support.sas.com/demosdownloads/setupcat.jsp?cat=Base+SAS+Software. The XML Mapper works with the XML Input, XML Output, XML Column Input, and XML Column Output nodes. The SAS 9.2 XML Mapper creates version 1.9 XML Maps, and the SAS 9.3 XML Mapper creates version 2.1 XML Maps.

What version of Data Management Server must I use with Data Management Studio version 2.3 and later?

If you plan to access a DataFlux Data Management Server with DataFlux Data Management Studio 2.3, then the server must be version 2.3 or later. If you must access a DataFlux Data Management Server that is version 2.2 or earlier, then you must use a 2.2 or earlier version of DataFlux Data Management Studio.

Jobs, Profiles, Data Explorations

Performance Tips for Profiles

Estimating the space required for a single profile report. For most tables, we can assume the following breakdown of “uniqueness” of values:

N = total number of columns

U = number of columns that contain 100% unique data

50% of N columns contain 40%-70% (55% avg.) unique data

25% of N columns contain 10% or less unique data

25% of N columns contain 90% or greater unique data

With those assumptions, the estimated space is X = 52.5 + U/N, where X is the size of the profile report as a percentage of the size of the source table.
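The constant 52.5 is simply the weighted average of the assumed uniqueness percentages across the three groups of columns:

0.50 × 55 + 0.25 × 10 + 0.25 × 90 = 27.5 + 2.5 + 22.5 = 52.5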

Profiling Frequency Distributions with a Large Number of Unique Values

If you use a profile to calculate frequency distributions with a large number of unique values, use the BASE/REPOS_FREQ_CACHE_MAX configuration option to limit the number of values read into memory. Otherwise, you might run out of memory.
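For example, you might add a line like the following to a DataFlux configuration file. The limit shown is only illustrative; choose a value that fits the memory available on your system:

BASE/REPOS_FREQ_CACHE_MAX = 100000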

Handling Large XML Files with XML Input Nodes or XML Output Nodes

DataFlux Data Management Studio jobs with an XML Input node or an XML Output node can run out of memory if you are accessing a large XML file. If this happens, add the following options to the JAVA/COMMAND configuration setting. This setting is specified in the DataFlux app.cfg file on the computer that executes the job. The following example shows how to set the minimum heap size to 256 MB and the maximum heap size to 1 GB:

JAVA/COMMAND="C:\Program Files (x86)\Java\jre7\bin\java.exe" -Xms256M -Xmx1G

Out-of-Memory Errors for Web Service Nodes and HTTP Request Nodes

DataFlux Data Management Studio jobs with a Web Service node or an HTTP Request node can run out of memory if you are using a Sun Java Virtual Machine (JVM). If this happens, add the following Java options to the JAVA/COMMAND configuration setting. This setting is specified in the DataFlux app.cfg file on the computer that executes the job.

The Java options to add are: -XX:MaxPermSize=256m -XX:+CMSClassUnloadingEnabled
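For example, the complete setting in app.cfg might look like the following. The path to java.exe is taken from the earlier example and will vary by installation:

JAVA/COMMAND="C:\Program Files (x86)\Java\jre7\bin\java.exe" -XX:MaxPermSize=256m -XX:+CMSClassUnloadingEnabled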

Web Service Node: Handling Input That Includes Characters Reserved for XML 

Some characters are reserved characters for XML, such as '<', '>', and '&'. However, these characters can also be used in other ways. For example, the phrase "Thomas & Thomas" might be a company name, not XML. Two options enable the Web Service node to distinguish between XML fragments and other strings that happen to use some of the reserved XML characters. For example, these options would enable the string "Thomas & Thomas" to be passed to the Web Service node without being mistaken for an invalid XML fragment.

WEBSERVICE/VALUE_IS_XML is a global option. If the value is TRUE, then values passed to a Web Service node are treated as XML fragments, and no additional processing is performed on the values. If the value is FALSE, then values passed to a Web Service node are treated as element content. In that case, '<', '>', and '&' characters are replaced with their corresponding references "&lt;", "&gt;", and "&amp;". This option can be set in the app.cfg files for DataFlux Data Management Studio or DataFlux Data Management Server. If the option is not set explicitly, the default is TRUE. Typically you will not change the default setting for this option.
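For example, to treat all values passed to Web Service nodes as element content rather than as XML fragments, you could add the following line to app.cfg:

WEBSERVICE/VALUE_IS_XML = FALSE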

In addition to the global option, the Web Service node has an advanced property, VALUE_IS_XML. You can use this property to override the global WEBSERVICE/VALUE_IS_XML setting and change the way a specific Web Service node processes XML reserved characters.

To set the local VALUE_IS_XML option, display the Advanced Properties for a Web Service node. Right-click the INPUT Advanced Property, then select Edit Default Value. The Default Values Input dialog displays. Click New Row to specify a default value. The VALUE_IS_XML option is specified in the column named VALUE_IS_XML. The possible values are the same as those described for WEBSERVICE/VALUE_IS_XML.

Queue Jobs for Execution

By default, if you attempt to run more jobs in parallel than a DataFlux Data Management Server will allow, the excess jobs are terminated with an error. However, you can use the DMSERVER/JOBS_QUEUE option to create a queue for jobs on the server. Excess jobs are placed in the queue. The jobs will execute as soon as a slot becomes available on the server. This option applies to data jobs and process jobs but not profiles. For more information, see the documentation for the DMSERVER/JOBS_QUEUE option in the DataFlux Data Management Server: Administrator's Guide.
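For example, you might enable the queue with a line like the following in the server's configuration file (typically dmserver.cfg). Check the Administrator's Guide for the exact syntax and supported values:

DMSERVER/JOBS_QUEUE = YES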

What Can I Do About Time-Out Errors in Data Jobs with the Web Service Node or the HTTP Request Node?

If you run a data job that includes either the Web Service node or the HTTP Request node, and you get time-out errors for either of these nodes, then you can do the following to increase the time-out value:

To change the default time-out value for a specific job that contains the Web Service node or the HTTP Request node, use the WSCP_CONFIG advanced property for these nodes to specify the path to a custom time-out configuration file.

To change the default time-out value for all jobs that use a Web Service node or an HTTP Request node, use the WEBSERVICE/CONFIG_FILE option in a system configuration file to specify the path to a custom time-out configuration file. See also Format for a Custom Time-out Configuration File.
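For example, the option might be set as follows. The file path shown is hypothetical; point the option at your own time-out configuration file:

WEBSERVICE/CONFIG_FILE = C:\dataflux\etc\wscp.cfg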

Where do DataFlux Data Management Studio jobs write temporary files?

Any temporary files that are generated by the job, such as sort files, are saved to the location that is specified in the BASE/TEMP option in a DataFlux configuration file, if this option has been set. If the BASE/TEMP option has not been set, temporary files are written to the temporary file location that is specified for the operating system. To see whether the BASE/TEMP option has been set, select Tools > Data Management Studio Options > Job > Advanced.
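For example, you might add a line like the following to a DataFlux configuration file. The path shown is only illustrative:

BASE/TEMP = D:\dataflux\temp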

How can I run DataFlux Data Management Studio with remote access software?

When you use remote access software such as Citrix or Microsoft RemoteApp to run Data Management Studio as a published application (not streamed), you must set the BASE/MAINTAIN_GROUP=YES option in the app.cfg file on the computer where the Data Management Studio jobs are executed. If you do not set this option, the jobs fail to execute with a "CreateProcess() Access Denied" error.

What is the maximum length for character variables (such as column names) in DataFlux Data Management Studio?

In data jobs, Data Input nodes and Data Output nodes support very long character fields for SAS data. They successfully work with 32K (32767-byte) fields, which is the maximum length for character fields in SAS data sets. QKB-related nodes process only the first 256 characters and ignore the rest. Expression node string functions, including the mid() and len() functions, should work with long fields. The 256-character limitation also applies to regular expressions and QKB-related functions.

In profiles, report metrics such as Data Type, Data Length, Unique Count, and Frequency Distribution are correct for strings up to 32K in length. Pattern Frequency Distribution uses only the first 254 characters instead of 256.

How can I specify Quality Knowledge Base options for profiles and data explorations?

Configuration options for the QKB are set in the Quality Knowledge Base Engine section of the app.cfg file. For example, the QKB/PATH option enables you to specify the path to the QKB. The QKB/ON_DEMAND option determines whether the QKB is loaded on demand or all at once; by default, this option is set to YES. The QKB/ALLOW_INCOMPAT option specifies how newer QKB definitions are handled; by default, this option is set to NO. You might want to change this option to YES if a profile or data exploration fails due to an incompatible (newer) QKB definition. The QKB/COMPATVER option enables you to specify the QKB version. Finally, the QKB/SURFACEALL option determines whether all parse definitions are surfaced.
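For example, the Quality Knowledge Base Engine section of app.cfg might contain entries like the following. The path and values shown are only illustrative and will vary by installation:

QKB/PATH = C:\ProgramData\DataFlux\QKB\CI\2014A
QKB/ON_DEMAND = YES
QKB/ALLOW_INCOMPAT = NO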

You can use DataFlux Data Management Studio to change the QKB/ALLOW_INCOMPAT option. Click Tools in the main menu and select Options to display the Data Management Studio Options dialog. Click the General section of the dialog and update the checkbox for Allow use of incompatible Quality Knowledge Base definitions.

To change other QKB options, you must edit the app.cfg file. See the "Configuration" section of the Data Management Studio Installation and Configuration Guide.

How are SAS Data Types Converted When DataFlux Software Reads or Writes SAS Data?

DataFlux software and SAS data sets support different data types. Accordingly, automatic data-type conversions take place when Data Management Studio software reads or writes SAS data sets. Also, nulls and missing values are converted to other values. These changes can impact features that depend on particular data types. For example, when a profile reads a SAS data set, SAS fields with a format that applies to datetime values are reported as datetimes, SAS fields with a format that applies to time values are reported as times, and SAS fields with a format that applies to date values are reported as dates. As a result, the profile does not calculate some metrics, such as Blank Count or Maximum Length, for those fields.

The following data-type conversions are made automatically when DataFlux software, such as a data job or a profile, reads SAS data.

Nulls and missing values will be converted to other values, as follows.

The following data-type conversions are made automatically when a Data Management Studio data job writes SAS data.

DataFlux Input Data Type    SAS Output Data Type
                            Type    Length    Format
Boolean                     num     8
date                        num     8         datetime19.2
integer                     num     8
real                        num     8
string                      char    255

Why are my fields not propagating automatically even though I set Tools > Options > Job > Output Fields to "All"?

Some data job nodes require you to manually add all of the output fields before they can be propagated. These nodes include Data Sorting, Data Union, and Cluster Analysis, among others. Manually add all available fields by selecting the double-right arrow in the Output fields section of the node's Properties window. Once you manually add the fields, they will propagate correctly.

I get an error that says that the "ASE has run out of LOCKS." What does this mean and how can I fix it?

In some cases, when you run a profile that is stored in a Sybase repository that uses a 32-bit Sybase Wire Protocol DSN, you might get an error that says "DataFlux [name of the ODBC Sybase Wire Protocol driver] [name of the SQL Server] ASE has run out of LOCKS." If you get this error, one possible remedy is to raise the maximum lock count in Sybase. Another possible remedy is to raise the commit interval in Data Management Studio. To do that, open the properties dialog for the affected profile, and then select Tools > Profile Options. On the General tab, select the "Commit every N rows" checkbox and specify a number such as 10.

I have a process job that calls a fork, which calls 5 data jobs, and the process job then calls a parallel iterator that tries to use 9 threads. Is there a maximum number of available threads?

You can set the WFE/ENGINE_THREAD_LIMIT configuration option to limit the number of engine threads within the application. Otherwise, the default is 0, which means unbounded, and the application defers to the system for the thread pool limits.
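For example, to cap the application at eight engine threads, you might add a line like the following to app.cfg. The limit shown is only illustrative:

WFE/ENGINE_THREAD_LIMIT = 8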

When I run a profile job, it fails when writing long floating point numbers to Teradata float columns.

This behavior occurs because the float field is being bound as a string, which is typical when the parameter type cannot be determined. The workaround is to turn on "Enable Extended Statement Information" on the Options tab of the Teradata driver. This enables the driver to determine the parameter type and thus bind the field as a float. Restart Data Management Studio for the change to take effect.

How can I make the SAS XML Mapper work with the XML data job nodes?

You need to disable the auto-generation feature, which is enabled by default. You also need to make sure that you generate version 1.9 or version 2.1 maps. The default map version is 1.2.

Do Data Management Studio jobs support Extract, Load, and Transform (ELT)?

Yes. Extract, Load, and Transform (ELT) is supported through nodes in which you write database-specific SQL. This SQL is then pushed down to the database.

In a data job, you can use the SQL Execute and Parameterized SQL Query nodes. In a process job, you can use the Create Table, Insert Rows, SQL Execute, and Update Rows nodes. You can consider the SQL Execute node in process jobs to be a general-purpose node that enables you to build any SQL construct. The other SQL-based process nodes provide templates for more specific ELT tasks.

Do the SQL nodes in Data Management Studio support multiple database formats in one query? For example, could one query access tables in Oracle, DB2, and SAS?

No. If you want to access tables in multiple database formats, use a DataFlux Federation Server data connection.

Can I use multiple Address Verification (World) nodes in a data job?

Yes, but if you use multiple Address Verification (World) nodes in a job, all of the nodes use the same country code as the first Address Verification (World) node in the flow. The best remedy is to use a single Address Verification (World 2) node instead.

What is the maximum number of lines that can be entered into the Query Text Window in the SQL Execute node?

When you click the Add button in the properties window for the SQL Execute node, the Query Text window displays. This window has a limit of approximately 635 lines of SQL code.

When running an address verification job on Data Management Server on a UNIX platform, I received a memory error. How can I fix this?

If you run an address verification job on a Data Management Server on a UNIX platform, you might run out of memory and receive the following error: 'Job terminated due to error in one or more nodes: Verification Plugin Out of memory.' When the VERIFY/PRELOAD option is active in the application configuration (app.cfg) file, it requires a large amount of memory. The workaround: if VERIFY/PRELOAD=ALL is active and a substantial amount of memory has not been allocated for this configuration, allocate more memory to prevent memory errors on UNIX Data Management Servers. Note that this applies only to US or Canadian Address Verification nodes.

How can I specify the size of the frequency distribution memory cache for a profile?

The properties window for a profile enables you to specify the size of the frequency distribution memory cache for the profile. To do so, open a profile, and then select Tools > Profile Options. Specify the desired cache size.

What level of precision is required for frequency drill-through in a profile?

Frequency drill-through in profiles depends on the internal date object, which supports only millisecond precision. Drill-through does not work with higher levels of precision. To specify this level of precision, set the BASE/DATE_FORMAT option to ISO8601. Make sure that this option is set to the same value on all of the systems that run and view the profiles. As a best practice, set your datetime fields to millisecond or lower precision and use the ISO8601 format.
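For example, add the following line to the DataFlux configuration file on each system that runs or views the profiles:

BASE/DATE_FORMAT = ISO8601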

How can I create a data job that has an inner data job that reads a data source?

You can include an External Data Provider node in the outer data job. This node is required whenever you want an inner data job to read a data source. If you do not include an External Data Provider node in this scenario, the following error is displayed: Error: Data flow - Job has no external provider steps.

Can I join an Oracle NUMBER with an INTEGER (Job Specific Data) if Memory Load Option is set to Right or Left (Data Joining node)?

You cannot use the Data Joining node to join two columns from an Oracle table if one column is of type NUMBER and the other column is of type INTEGER. To make the join work, add a Field Layout node directly before the Data Joining node on the side that comes from the Oracle table. Add all the fields. Then access the Advanced Properties for the Field Layout node and double-click the FIELD_MAP default value. For the field (column) that you are joining, in the TYPE_OVERRIDE column, enter INTEGER.

Documentation Feedback: yourturn@sas.com
Note: Always include the Doc ID when providing documentation feedback.

Doc ID: dfU_FAQ.html