DataFlux Data Management Studio 2.7: User Guide
The following questions and usage notes are designed to assist you when working with DataFlux Data Management Studio.
When you move or copy DataFlux Data Management Studio entities such as profiles, jobs, and connections, you need to move all of the software, licenses, DSN connections, repositories, and other resources that are required for them to perform properly. For example, if you move or copy a Federation Server connection with saved user credentials, you also must move or copy the credentials file.
DataFlux Data Management Studio fields cannot hold very large strings without truncation. These very large strings are truncated to 65,536 characters.
This message could mean that you do not have write access to DataFlux Data Management Studio resource files. For more information, see Read and Write Access to the etc Folder.
DataFlux Data Management Studio saves global options to a user.config file that is hidden by default under Microsoft Vista and Windows Server 2008. You must un-hide this file in order to save global options. The physical path to the file is as follows:
C:\Documents and Settings\<UserID>\Local Settings\Application Data\DataFlux\ProDesigner.vshost.exe_Url_<some_hash_code>
If you are using the DataFlux Data Management Server for SAS, then select the following global option:
If you access a table with ODBC, you might be prompted to log in to the database. If your screen does not repaint properly, try setting the Show window contents while dragging option for Microsoft Windows. Consult the documentation for your version of Windows for details about setting this option.
Not in the current release. Data Management Studio will not permit you to add or update file names that differ in case only. This means, for example, that you cannot give names to file-based objects such as jobs and profiles that differ in case only, such as Job1 and JOB1.
You can download a free copy of the SAS XML Mapper, along with the documentation, from http://support.sas.com/demosdownloads/setupcat.jsp?cat=Base+SAS+Software. The XML Mapper works with XML input and XML output and XML column input and XML column output. The SAS 9.2 SAS XML Mapper creates version 1.9 XML Maps, and the SAS 9.3 XML Mapper creates version 2.1 XML Maps.
If you plan to access a DataFlux Data Management Server with DataFlux Data Management Studio 2.3, then the server must be version 2.3 or later. If you must access a DataFlux Data Management Server that is version 2.2 or earlier, then you must use a 2.2 or earlier version of DataFlux Data Management Studio.
If you use a Profile to calculate frequency distributions with a large number of unique values, use the BASE/REPOS_FREQ_CACHE_MAX configuration option to limit the number of values read into memory. Otherwise you might run out of memory.
DataFlux Data Management Studio jobs with an XML Input node or an XML Output node can run out of memory if you are accessing a large XML file. If this happens, add the following options to the JAVA/COMMAND configuration setting. This setting is specified in the DataFlux app.cfg file on the computer that executes the job. The following example shows how to set the minimum heap size to 256Mb and the maximum heap size to 1Gb:
JAVA/COMMAND="C:\Program Files (x86)\Java\jre7\bin\java.exe" -Xms256M -Xmx1G
DataFlux Data Management Studio jobs with a Web Service node or an HTTP Request node can run out of memory if you are using a Sun Java Virtual Machine (JVM). If this happens, add the following Java options to the JAVA/COMMAND configuration setting. This setting is specified in the DataFlux app.cfg file on the computer that executes the job.
The Java options to add are: -XX:MaxPermSize=256m -XX:+CMSClassUnloadingEnabled
Some characters are reserved characters for XML, such as '<', '>', and '&.' However, these characters can be used in other ways. For example, the phrase "Thomas & Thomas" might be a company name, not XML. There are two options that enable the Web Service node to distinguish between XML fragments and other strings that happen to use some of the reserved characters for XML. For example, these options would enable the string "Thomas & Thomas" to be passed to the Web Service node and not be mistaken for an invalid XML fragment.
WEBSERVICE/VALUE_IS_XML is a global option. The default is TRUE. If the value is TRUE, then values passed to a Web Service node are treated as XML fragments, and no additional processing is performed on the values. If the value is FALSE, then values passed to a Web Service node are treated as element content. In that case, '<', '>', and '&' characters are replaced with their corresponding references "<", ">", and "&". Typically you will not change the default setting for this option. This option can be set in the app.cfg files for DataFlux Data Management Studio or DataFlux Data Management Server. If the option is not set explicitly, then the default for this option is TRUE.
In addition to the global option, the Web Service node has an advanced property, VALUE_IS_XML. You can use this option to override the global setting in the WEBSERVICE/VALUE_IS_XML option. This will change the way a specific Web Service node will process XML reserved characters.
To set the local VALUE_IS_XML option, display the Advanced Properties for a Web Service node. Right-click the INPUT Advanced Property, then select Edit Default Value. The Default Values Input dialog displays. Click New Row to specify a default value. The VALUE_IS_XML option is specified in the column named VALUE_IS_XML. The possible are the same as those described for WEBSERVICE/VALUE_IS_XML.
By default, if you attempt to run more jobs in parallel than a DataFlux Data Management Server will allow, the excess jobs are terminated with an error. However, you can use the DMSERVER/JOBS_QUEUE option to create a queue for jobs on the server. Excess jobs are placed in the queue. The jobs will execute as soon as a slot becomes available on the server. This option applies to data jobs and process jobs but not profiles. For more information, see the documentation for the DMSERVER/JOBS_QUEUE option in the DataFlux Data Management Server: Administrator's Guide.
If you run a data job that includes either the Web Service node or the HTTP Request node, and you get time-out errors for either of these nodes, then you can use the REQUEST_TIMEOUT advanced property.
Any temporary files that are generated by the job, such a sort files, are saved to the location that is specified in the BASE/TEMP option in a DataFlux configuration file, if this option has been set. If the BASE/TEMP option has not been set, temporary files will be written to the temporary file location that is specified for the operating system. To see if the BASE/TEMP option has been set, select Tools > Data Management Studio Options > Job > Advanced.
When you use remote access software such as Citrix or Microsoft RemoteApps to run Data Management Studio as a published application (not streamed), you must set the BASE/MAINTAIN_GROUP=YES option in the app.cfg file on the computer where the Data Management Studio jobs are executed. If you do not set this option, the jobs will not execute due to a "CreateProcess() Access Denied" error.
In data jobs, Data Input nodes and Data Output nodes support very long character fields for SAS data. They successfully work with 32K (32767 bytes) fields, which is the maximum length for character fields in SAS data sets. QKB-related nodes only process the first 256 characters and ignore the rest. Expression node string functions should work, including the mid() and len() functions. The 256 character limitation applies to regular expressions or QKB-related-functions.
In profiles, report metrics such as Data Type, Data Length, Unique Count, and Frequency Distribution are correct for strings up to 32K in length. Pattern Frequency Distribution only uses the first 254 characters instead of 256.
Configuration options for the QKB are set in the Quality Knowledge Base Engine section of the app.cfg file. For example, the QKB\PATH option enables you to specify the path to the QKB. The QKB/ON_DEMAND option determines whether the QKB is loaded on demand or all at once. By default, the option is set to YES. The QKB/ALLOW_INCOMPAT specifies how newer QKB definitions are handled. By default, this option is set to NO. You might want to change this option to YES if a profile or data exploration fails due to an incompatible (newer) QKB definition. The QKB\COMPATVER option enables you to specify the QKB version. Finally, the QKB\SURFACEALL determines whether all parse definitions are surfaced.
You can use DataFlux Data Management Studio to change the QKB/ALLOW_INCOMPAT option. Click Tools in the main menu and select Options to display the Data Management Studio Options dialog. Click the General section of the dialog and update the checkbox for Allow use of incompatible Quality Knowledge Base definitions.
To change other QKB options, you must edit the app.cfg file. See the "Configuration" section of the Data Management Studio Installation and Configuration Guide.
DataFlux software and SAS data sets support different data types. Accordingly, automatic data-type conversions will take place when Data Management Studio software reads or writes SAS data sets. Also, nulls and missing values will be converted to other values. These changes can impact features that depend on particular data types. For example, when a profile reads a SAS data set, SAS fields with a format that applies to datetime values will be reported as datetime. SAS fields with a format that applies to time values will be reported as a time and SAS fields with a format that applies to date values will be reported as a date. As a result, the profile will not calculate some metrics such as Blank Count or Maximum Length for those fields.
The following data-type conversions are made automatically when DataFlux software, such as a data job or a profile, reads SAS data.
Nulls and missing values will be converted to other values, as follows.
The following data-type conversions are made automatically when a Data Management Studio data job writes SAS data.
DataFlux Input Data Type | SAS Output Data Type | ||
---|---|---|---|
Type | Length | Format | |
Boolean | num | 8 | |
date | num | 8 | datetime19.2 |
integer | num | 8 | |
real | num | 8 | |
string | char | 255 |
Some data job nodes require you to manually add all of the output fields before they can be propagated. These nodes include, Data Sorting, Data Union, and Cluster analysis, among others. Manually add all available fields by selecting the double-right arrow in the Output fields section of the node's Properties window. Once you manually add the fields, the fields will propagate correctly.
In some cases, when you run a profile that is stored in a Sybase repository that uses a 32-BIT Sybase Wire Protocol DSN, you might get an error that says "DataFlux [name of the ODBC Sybase Wire Protocol driver] [name of the SQL Server] ASE has run out of LOCKS." If you get this error, one possible remedy is to raise the maximum lock count in Sybase. Another possible remedy is to raise the commit interval in Data Management Studio. To do that, open the properties dialog for the affected profile. Then select Tools > Profile Options. On the General tab, select the "Commit every N rows" checkbox. Specify a number such as 10.
You can set the WFE/ENGINE_THREAD_LIMIT configuration option to limit the number of engine threads within the application. Otherwise, the default is 0, which means unbounded, and the application defers to the system for the thread pool limits.
The reason for this behavior is that the float field is being bound as a string, which is typical behavior when the type of parameter can't be determined. The workaround is to turn on "Enable Extended Statement Information" in the Options tab of the Teradata driver. This will enable the driver to determine the parameter type and thus bind as float. Restart Data Management Studio for the changes to take effect.
You need to disable the auto-generation feature. which is enabled by default. You also need to make sure that you generate version 1.9 or version 2.1 maps. The default map version is 1.2.
Yes, Extract, Transform , and Load (ELT) is supported by using nodes in which you write database-specific SQL. This SQL is then pushed to the database.
In a data job, you can use the SQL Execute and Parameterized SQL Query nodes. In a process job, you can use the Create Table, Insert Rows, SQL Execute and Update Rows nodes. You could also consider the SQL Execute node in process jobs to be a general purpose node that enables you to build any SQL construct. The other SQL-based process nodes provide templates for more specific ELT tasks.
No. If you want to access tables in multiple database formats, use a DataFlux Federation Server data connection.
Yes, but it is better to use a single Address Verification (World 2) node. If you use multiple Address Verification (World) nodes in a job, all of the nodes will use the same country code as the first Address Verification (World) node in the flow. The best remedy is to use the Address Verification (World 2) node instead.
When you click the Add button in the properties window for the SQL Execute node, the Query Text window displays. This window has a limit of approximately 635 lines of SQL code.
If you are running an address verification job on Data Management Server on a UNIX platform, you might have run out of memory and received the following error: 'Job terminated due to error in one or more nodes: Verification Plugin Out of memory.' Also, when the VERIFY/PRELOAD option is active in the application configuration (app.cfg) file, it requires a large amount of memory. The workaround: If the VERIFY/PRELOAD=ALL option is active, and a substantial amount of memory is not allocated for this configuration, allocate more memory to prevent memory errors occurring on UNIX Data Management Servers. Note that this only applied to US or Canadian Address Verification nodes.
The properties window for a profile enables you to specify the size of the frequency distribution memory cache for the profile. To do so, open a profile, and then select Tools > Profile Options. Specify the desired cache size.
Frequency drill-through in profiles depends on the internal date object, which only supports millisecond precision. Drill-through does not work with higher levels of precision. To specify this level of precision, set the BASE/DATE_FORMAT option to ISO8601. Make sure that this setting is set to the same value on all of the systems that run and view the profiles. As a best practice, you should set your datetime fields to millisecond or lower precision and use the ISO8601 format.
You can include an External Data Provider node in the outer data job. This External Data Provider node must be used when you want to create a data job that has an inner data job that reads a data source. If you do not include an External Data Provider node in this scenario, the following error is displayed: Error: Data flow - Job has no external provider steps.
You cannot use the Data Joining node to join two columns from an Oracle table if one column is of type NUMBER and the other column is of type INTEGER. To make the join work, add a Field Layout node directly before the Data Joining node on the side that comes from the Oracle table. Add all the fields. Then access the Advanced Properties for the Field Layout node and double-click the FIELD_MAP default value. For the field (column) that you are joining, in the TYPE_OVERRIDE column, enter INTEGER.
Documentation Feedback: yourturn@sas.com
|
Doc ID: dfU_FAQ.html |