DataFlux Data Management Studio
The following questions and answers are designed to assist you when working with Data Management Studio. If you do not find your answer, please contact DataFlux Technical Support.
DataFlux Data Management Studio saves global options to a user.config file that is hidden by default under Microsoft Windows Vista and Windows Server 2008. You must unhide this file in order to save global options. The physical path to the file is as follows:
C:\Documents and Settings\<UserID>\Local Settings\Application Data\DataFlux\ProDesigner.vshost.exe_Url_<some_hash_code>
If you are using the DataFlux Data Management Server for SAS, then select the following global option:
If you access a table with ODBC, you might be prompted to log in to the database.

If your screen does not repaint properly, try setting the Show window contents while dragging option in Microsoft Windows. Consult the documentation for your version of Windows for details about setting this option.
Not in the current release. Data Management Studio does not permit you to add or update file names that differ only in case. For example, you cannot give file-based objects, such as jobs and profiles, names that differ only in case, such as Job1 and JOB1.
You can download a free copy of the SAS XML Mapper, along with its documentation, from http://support.sas.com/demosdownloads/setupcat.jsp?cat=Base+SAS+Software. The XML Mapper works with XML input and XML output, and with XML column input and XML column output. The SAS 9.2 XML Mapper creates version 1.9 XML Maps, and the SAS 9.3 XML Mapper creates version 2.1 XML Maps.
If you plan to access a Data Management Server with Data Management Studio 2.3, then the server must be version 2.3 or later. If you must access a Data Management Server that is version 2.2 or earlier, then you must use a 2.2 or earlier version of Data Management Studio.
See Data Connection Usage Notes.
If you run a data job that includes either the Web Service node or the HTTP Request node, and you get time-out errors for either of these nodes, then you can do the following to increase the time-out value:
To change the default time-out value for a specific job that contains the Web Service node or the HTTP Request node, use the WSCP_CONFIG advanced property for these nodes to specify the path to a custom time-out configuration file.
To change the default time-out value for all jobs that use a Web Service node or an HTTP Request node, use the WEBSERVICE/CONFIG_FILE option in a system configuration file to specify the path to a custom time-out configuration file, as shown in the example below. See also Format for a Custom Time-out Configuration File.
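For example, the entry in a system configuration file might look like the following; the path is only an illustration and should point to your own custom time-out configuration file:

    WEBSERVICE/CONFIG_FILE = C:\dataflux\etc\webservice_timeout.cfg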
Any temporary files that are generated by the job, such as sort files, are saved to the location that is specified in the BASE/TEMP option in a DataFlux configuration file, if this option has been set. If the BASE/TEMP option has not been set, temporary files are written to the temporary file location that is specified for the operating system. To see whether the BASE/TEMP option has been set, select Tools > Data Management Studio Options > Job > Advanced.
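For example, to direct temporary files to a specific directory, a DataFlux configuration file might contain an entry like the following (the path shown is only an example):

    BASE/TEMP = D:\dataflux\temp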
In data jobs, Data Input nodes and Data Output nodes support very long character fields for SAS data. They work with fields up to 32K (32,767 bytes), which is the maximum length for character fields in SAS data sets. QKB-related nodes process only the first 256 characters and ignore the rest. Expression node string functions should work, including the mid() and len() functions. The 256-character limitation applies to regular expressions and QKB-related functions.
In profiles, report metrics such as Data Type, Data Length, Unique Count, and Frequency Distribution are correct for strings up to 32K in length. Pattern Frequency Distribution, however, uses only the first 254 characters instead of 256.
Configuration options for the QKB are set in the Quality Knowledge Base Engine section of the app.cfg file. For example, the QKB/PATH option enables you to specify the path to the QKB. The QKB/ON_DEMAND option determines whether the QKB is loaded on demand or all at once; by default, this option is set to YES. The QKB/ALLOW_INCOMPAT option specifies how newer QKB definitions are handled; by default, this option is set to NO. You might want to change this option to YES if a profile or data exploration fails due to an incompatible (newer) QKB definition. The QKB/COMPATVER option enables you to specify the QKB version. Finally, the QKB/SURFACEALL option determines whether all parse definitions are surfaced.
You can use Data Management Studio to change the QKB/ALLOW_INCOMPAT option. Click Tools in the main menu and select Options to display the Data Management Studio Options dialog. Click the General section of the dialog and update the checkbox for Allow use of incompatible Quality Knowledge Base definitions.
To change other QKB options, you must edit the app.cfg file. See the "Configuration" section of the Data Management Studio Installation and Configuration Guide.
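For example, the Quality Knowledge Base Engine section of an app.cfg file might contain entries like the following. The QKB path is only an example; the other values shown are the defaults described above.

    QKB/PATH = C:\ProgramData\DataFlux\QKB\CI\2011A
    QKB/ON_DEMAND = YES
    QKB/ALLOW_INCOMPAT = NO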
DataFlux software and SAS data sets support different data types. Accordingly, automatic data-type conversions take place when Data Management Studio software reads or writes SAS data sets. Also, nulls and missing values are converted to other values. These changes can impact features that depend on particular data types. For example, when a profile reads a SAS data set, SAS fields with a datetime format are reported as datetime values, fields with a time format are reported as time values, and fields with a date format are reported as date values. As a result, the profile will not calculate some metrics, such as Blank Count or Maximum Length, for those fields.
The following data-type conversions are made automatically when DataFlux software, such as a data job or a profile, reads SAS data.
Nulls and missing values will be converted to other values, as follows.
The following data-type conversions are made automatically when a Data Management Studio data job writes SAS data.
DataFlux Input Data Type | SAS Output Type | SAS Output Length | SAS Output Format
---|---|---|---
boolean | num | 8 |
date | num | 8 | datetime19.2
integer | num | 8 |
real | num | 8 |
string | char | 255 |
Some data job nodes require you to manually add all of the output fields before they can be propagated. These nodes include Data Sorting, Data Union, and Cluster Analysis, among others. Manually add all available fields by selecting the double-right arrow in the Output fields section of the node's properties window. After you manually add the fields, they will propagate correctly.
In some cases, when you run a profile that is stored in a Sybase repository that uses a 32-bit Sybase Wire Protocol DSN, you might get an error that says "DataFlux [name of the ODBC Sybase Wire Protocol driver] [name of the SQL Server] ASE has run out of LOCKS." If you get this error, one possible remedy is to raise the maximum lock count in Sybase. Another possible remedy is to raise the commit interval in Data Management Studio. To do that, open the affected profile, and then select Tools > Profile Options. On the General tab, select the Commit every N rows checkbox and specify a number such as 10.
You can set the WFE/ENGINE_THREAD_LIMIT configuration option to limit the number of engine threads within the application. The default is 0, which means unbounded; in that case, the application defers to the system for thread pool limits.
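For example, to cap the application at four engine threads, you might add an entry like the following to a DataFlux configuration file:

    WFE/ENGINE_THREAD_LIMIT = 4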
This behavior occurs because the float field is bound as a string, which is typical when the type of a parameter cannot be determined. The workaround is to turn on Enable Extended Statement Information on the Options tab of the Teradata driver. This enables the driver to determine the parameter type and bind it as a float. Restart Data Management Studio for the change to take effect.
You need to disable the auto-generation feature, which is enabled by default. You also need to make sure that you generate version 1.9 or version 2.1 maps. The default map version is 1.2.
Yes, Extract, Load, and Transform (ELT) is supported through nodes in which you write database-specific SQL. This SQL is then pushed down to the database.
In a data job, you can use the SQL Execute and Parameterized SQL Query nodes. In a process job, you can use the Create Table, Insert Rows, SQL Execute, and Update Rows nodes. You can consider the SQL Execute node in process jobs to be a general-purpose node that enables you to build any SQL construct. The other SQL-based process nodes provide templates for more specific ELT tasks.
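For example, an SQL Execute node might submit database-specific SQL such as the following sketch, which performs a transform-and-load step entirely inside the database. The schema, table, and column names are hypothetical.

    /* Hypothetical ELT step: transform and load rows inside the database */
    INSERT INTO dw.customer_clean (id, name)
    SELECT id, UPPER(TRIM(name))
    FROM staging.customer;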
No. If you want to access tables in multiple database formats, use a DataFlux Federation Server data connection.
Yes, but it is better to use a single Address Verification (World 2) node instead. If you use multiple Address Verification (World) nodes in a job, all of the nodes use the same country code as the first Address Verification (World) node in the flow.
When you click the Add button in the properties window for the SQL Execute node, the Query Text window is displayed. This window has a limit of approximately 635 lines of SQL code.
If you are running an address verification job on a Data Management Server on a UNIX platform, you might run out of memory and receive the following error: 'Job terminated due to error in one or more nodes: Verification Plugin Out of memory.' When the VERIFY/PRELOAD option is active in the application configuration (app.cfg) file, it requires a large amount of memory. The workaround: if the VERIFY/PRELOAD=ALL option is active and a substantial amount of memory has not been allocated for this configuration, allocate more memory to prevent memory errors on UNIX Data Management Servers. Note that this applies only to US or Canadian Address Verification nodes.
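For example, when preloading is active for all countries, the app.cfg file on the server contains an entry such as the following:

    VERIFY/PRELOAD = ALL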
The properties window for a profile enables you to specify the size of the frequency distribution memory cache for the profile. To do so, open a profile, and then select Tools > Profile Options. Specify the desired cache size. However, in the current release, the value in the Frequency Distribution Memory Cache field is not applied to the profile until you click OK in the profile options window. Even if the field already contains the correct custom value, you must click OK to apply that value to the profile.
See Quality Knowledge Base Usage Notes.