DataFlux Data Management Studio 2.8: User Guide



Document Conversion Node

You can add a Document Conversion node to a data job to handle unstructured data by extracting UTF-8 encoded text from different document types. In addition to the UTF-8 encoded text, the node can output metadata found in the converted document, such as the author, title, and number of pages. See Document Conversion Reference for the input file types, file text extraction fields, and file metadata extraction fields that the node supports. For an example of the node used in a data job, see Converting and Extracting a Document.

Once you have added the Document Conversion node, you can double-click it to open its properties dialog. The properties dialog includes the following elements:

Name - Specifies a name for the node.

Notes - Enables you to open the Notes dialog. You use the dialog to enter optional details or any other relevant information for the node.

Input file - Specifies the full path or file name of the file to be converted.

Output metadata only - When selected, specifies that the output from the Document Conversion node contains only the metadata that is found in the input file. When deselected, specifies that both converted text and metadata are contained in the output generated from the input file.
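The effect of the Output metadata only option can be pictured with a short Python sketch. This is only a model of the behavior described above, not DataFlux code: the function name build_output, the field names (text, author, title, page_count), and the sample values are all assumptions made for the illustration.

```python
def build_output(text, metadata, metadata_only):
    """Hypothetical model of what the node emits for one input file."""
    # Metadata found in the input file is part of the output in both modes.
    row = dict(metadata)
    if not metadata_only:
        # With the option deselected, the converted text is included too,
        # encoded as UTF-8.
        row["text"] = text.encode("utf-8")
    return row


# Sample values, invented for the example.
metadata = {"author": "A. Writer", "title": "Quarterly Report", "page_count": "12"}

full = build_output("Résumé of results", metadata, metadata_only=False)
meta = build_output("Résumé of results", metadata, metadata_only=True)
```

In this sketch, meta holds only the metadata fields, while full also carries the converted text as UTF-8 bytes.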

Note that the Document Conversion node can process encrypted and encoded files when the appropriate .jar files are added to the Java directory. For example, the bcprov-jdk15on-162 jar processes Microsoft Access files. The site that provides this file also offers similar files that you can use to process other document types.
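Before running a job that depends on extra .jar files, it can help to confirm that they are actually in place. The following Python sketch shows one way to do that. It is not part of the product: the helper name missing_jars is hypothetical, the location of the Java directory depends on your installation and is not given here, and the sketch assumes the file on disk is named after the jar with a .jar extension.

```python
import tempfile
from pathlib import Path


def missing_jars(java_dir, required_names):
    """Return the required jar base names that are not present in java_dir."""
    # Collect the base names (without ".jar") of every jar in the directory.
    present = {p.stem for p in Path(java_dir).glob("*.jar")}
    return [name for name in required_names if name not in present]


# Demonstration against a temporary directory standing in for the
# Java directory (the real path is installation-specific).
with tempfile.TemporaryDirectory() as java_dir:
    (Path(java_dir) / "bcprov-jdk15on-162.jar").touch()
    result = missing_jars(java_dir, ["bcprov-jdk15on-162", "some-other-library"])
```

Here result lists only the placeholder name some-other-library, because the bcprov-jdk15on-162 jar was found in the directory.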

You can access the following advanced properties by right-clicking the Document Conversion node:
