Prepare Data

To prepare data:
  1. Select a source table from the navigation pane, right-click, and select Prepare Data.
    To preview the data, click Preview Data on the Source Table page. The preview enables you to confirm that you have Read access to the data before you begin the data preparation.
  2. On the General page, specify values for the Job name, Location, and Description if you intend to save this action as a job.
  3. On the Source Columns page, deselect the check boxes for any columns that you do not want to include in the prepared data.
  4. On the Joined Tables page, click Add to select a table to join. For more information about how to use this page, see Joining Tables.
  5. On the Calculated Columns page, click Add to add new columns to the prepared data. For more information about how to use this page, see Adding Calculated Columns.
  6. On the Output Columns page, remove, reorder, or edit the output column information. You can specify the output column name, description, format, and length.
    Tip
    The default length for numeric variables is 8. The length can be set to a lower number to reduce storage size. However, length for numeric variables is related to the precision, and reducing their lengths arbitrarily can cause precision loss. For more information, see "Numeric Variable Length and Precision in UNIX Environments" in SAS Companion for UNIX Environments.
  7. On the Row Filters page, add filters to subset the input data. Click Add to add a new filter. Select the column name to filter on, select the filter criteria, and enter the filter value. If more than one filter is added, the filters are applied using AND logic. You cannot create a row filter on a calculated column.
  8. On the Sort Order page, select the column that you want to sort by. The default sort order is ascending. Use the menu beside the selected column name to set the sort order to descending.
  9. On the Data Output page, set the following parameters:
    Parameter | Sample Value | Description
    Output table | tablename | The output table name is populated automatically with a default value. Click Browse to select a different table, or enter the table name that you want to use.
    Library | WORK | Click Browse to specify a library for the output table.
    Location | /Shared Data | Click Browse to specify a folder for the output table. The button becomes active when the Library value is changed from WORK to another library.
    Type of output | Table | For deployments that use SAS Visual Analytics Hadoop, when you select Table, the size of the data is calculated and distributed evenly in HDFS with an optimal block size. If you select View, then the data is distributed in 32-megabyte blocks. However, for large data sets, any reduction in block utilization might be considered worthwhile in order to avoid moving a large data set. For deployments that use a third-party vendor database, do not select View when the output library is for a DBMS. Views are not supported by the SAS/ACCESS Interfaces.
    When the co-located data provider is SAS Visual Analytics Hadoop, the following options are available:
    SAS Visual Analytics Hadoop Options
    Parameter | Sample Value | Description
    Add to HDFS | Selected | Select this check box to distribute the data to HDFS.
    HDFS output path | /user/ | Enter the fully qualified path in HDFS to use for storing the prepared data. The path is case-sensitive.
    HDFS filename | tablename | The filename for the table is automatically populated. It must match the name of the output table.
    Table description | Prepared data for tablename | Specify a description to associate with the prepared data. The description is displayed beside the table name in the explorer interface.
    When the co-located data provider is a third-party vendor database, the following options are available:
    Data Server Options
    Parameter | Sample Value | Description
    Add to data server | Selected | Select this check box to distribute the data to the data server.
    Output table | tablename | The output table name is populated automatically with a default value. Click Browse to select a different table, or enter the table name that you want to use.
    Library | SAS Visual Analytics Distributed Data | Make sure that you select a library that is associated with the co-located data provider and that the library is configured to distribute data.
    Location | /Shared Data | Click Browse to specify a folder for the output table.
  10. Click Submit.
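
Two points in the procedure above can be easier to see in code: how multiple row filters combine with AND logic before the sort is applied, and why reducing the length of a numeric variable can lose precision. The following Python sketch is an illustration only and is not how the application is implemented. The names prepare, truncate_double, and OPERATORS, the operator symbols, and the sample rows are all hypothetical. The length example assumes that values are IEEE 754 doubles and that a shorter length keeps only the most significant bytes.

```python
import struct
from operator import eq, ge, gt, le, lt, ne

# Hypothetical operator names; the filter criteria in the application may differ.
OPERATORS = {"=": eq, "!=": ne, "<": lt, "<=": le, ">": gt, ">=": ge}


def prepare(rows, keep_columns, filters, sort_by, descending=False):
    """Filter rows with AND logic, sort them, and keep only selected columns.

    rows is a list of dicts; filters is a list of (column, operator, value).
    """
    # A row is kept only if every filter matches (AND logic).
    selected = [
        row for row in rows
        if all(OPERATORS[op](row[col], value) for col, op, value in filters)
    ]
    # Ascending is the default; descending must be requested explicitly.
    selected.sort(key=lambda row: row[sort_by], reverse=descending)
    # Drop the columns that were deselected.
    return [{col: row[col] for col in keep_columns} for row in selected]


def truncate_double(value, length):
    """Approximate storing a number in `length` bytes (3 to 8).

    Assumption: an IEEE 754 double whose low-order bytes are dropped.
    """
    if not 3 <= length <= 8:
        raise ValueError("length must be between 3 and 8")
    raw = struct.pack(">d", value)                # 8 bytes, most significant first
    kept = raw[:length] + b"\x00" * (8 - length)  # zero the dropped bytes
    return struct.unpack(">d", kept)[0]


# Hypothetical sample data.
rows = [
    {"region": "East", "year": 2013, "sales": 120.5},
    {"region": "West", "year": 2013, "sales": 98.0},
    {"region": "East", "year": 2012, "sales": 87.25},
    {"region": "East", "year": 2013, "sales": 143.0},
]

prepared = prepare(
    rows,
    keep_columns=["region", "sales"],
    filters=[("region", "=", "East"), ("year", ">=", 2013)],
    sort_by="sales",
    descending=True,
)
print(prepared)

print(truncate_double(8192.0, 3))
print(truncate_double(8193.0, 3))
```

With the sample rows, only the two East rows from 2013 pass both filters, and they are returned with the higher sales value first. Under the stated storage assumption, 8192 survives a length of 3 unchanged, but 8193 reads back as 8192, which is the kind of precision loss that the tip in step 6 warns about.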