Delete Rows

Introduction

Delete Rows icon in the SAS Data Loader window
Use the Delete Rows directive to delete data from a selected source table. Data is deleted in the source table itself rather than in a separate target table.

Prerequisites

The prerequisites for the Delete Rows directive are defined as follows:
  • The Hadoop cluster needs to be configured with release 0.14 or later of the Apache Hive data warehouse software. This release supports transactional tables.
  • Source tables must use a Hive file format, preferably ORC (Optimized Row Columnar.)
  • Source tables must be bucketed and partitioned. Bucketing clusters data based on the values in a specified (key) column. Partitioning creates individually accessible subsets of data based on the values in one or more source columns. To determine whether a source table has been bucketed and partitioned, contact your Hadoop administrator.

Example

Follow these steps to use the Delete Rows directive:
  1. On the SAS Data Loader directives page, click Delete Rows.
  2. In the Source Table task, select a data source and click Next, or click Select Recent Table. Refer to the prerequisites as needed.
  3. In the Delete Rows task, choose one of the following:
    1. To delete all of the rows in the source table, click All rows and then click Next.
    2. To delete rows using one or more rules, click Specify rules and proceed to the next step. The Delete Rows job deletes rows when the specified rules are true. Multiple rules can be applied with logical AND and OR operators.
    3. To delete rows using a Hive expression, click Specify expression and go to Step 5. Rows are deleted when the Hive expression returns true.
  4. To delete rows by specifying one or more rules, follow these steps.
    1. Click Select a column and choose the source column that forms the basis of your rule.
    2. Click and select a logical Operator. The operators that are available depend on the type of the data in the source column. For example, the following image shows the operators that are available for the date/time data type:
      operators available for date/time source columns
    3. In the Value field, add the source column value that completes the expression. In the preceding example, the rule can be read as “Delete from the source table all rows with a last contact date prior to January 1, 2010.”
    4. Click Add Rule to add another rule. Select a different column, operator, and value.
    5. To delete rows when either the new rule or the preceding rule are true, change the AND condition to OR.
      Multiple rules
    6. When your rules are complete, click Next and go to Step 6.
  5. To delete rows using a Hive expression, follow these steps:
    1. In the Hive expression text box, either type or paste a Hive expression.
    2. To add Hive functions to your expression, click Functions, expand a category, select a function, and click Arrow Right Plus Icon.
      Hive function inserted in Hive expression text box
      To add column names to your expression, position the cursor in the Hive expression box, click Columns in the Resources box, click the source column, and then click Arrow Right Plus Icon.
  6. When you have specified a rule or a Hive expression, click Next.
  7. In the Code task, review the Hive code that will run in Hadoop. Click Edit HiveQL Code as needed.
    Note: When you edit the Hive expression in the Code task, you will lose those edits if you then change the content of the Delete Rows task.
  8. Click Next to open the Result task, and then click Start deleting data.
  9. When the job is complete, click Log to confirm the deletion of rows.