Use the Delete Rows
directive to delete data from a selected source table. Data is deleted
in the source table itself, rather than in a separate target table.
Prerequisites
The prerequisites for
the Delete Rows directive are defined as follows:
The Hadoop cluster needs to be
configured with release 0.14.0 or later of the Apache Hive data warehouse
software. This release supports transactional tables.
Source tables must use a Hive file
format, preferably ORC (Optimized Row Columnar.)
Source tables must be bucketed
and partitioned. Bucketing clusters data based on the values in a
specified (key) column. Partitioning creates individually accessible
subsets of data based on the values in one or more source columns.
To determine whether a source table has been bucketed and partitioned,
contact your Hadoop administrator.
Example
Follow these steps to
use the Delete Rows directive:
On the SAS
Data Loader directives page, click Delete
Rows. The Source Table task
is displayed. For more information about data sources and tables, see Viewing Data Sources and Tables.
In the Source
Table task, select a data source and click Next,
or click . Refer to the prerequisites as needed.
In the Delete
Rows task, choose one of the following:
To delete all of the
rows in the source table, click All rows and
then click Next.
To delete rows using
one or more rules, click Specify rules and
proceed to the next step. The Delete Rows job deletes rows when the
specified rules are true. Multiple rules can be applied with logical
AND and OR operators.
To delete rows using
a Hive expression, click Specify expression and
go to Step 5. Rows are deleted when the Hive expression returns true.
To delete rows by specifying
one or more rules, follow these steps.
Click Select
a column and choose the source column that forms the
basis of your rule.
Click and select a logical Operator.
The operators that are available depend on the type of the data in
the source column. For example, the following image shows the operators
that are available for the date/time data type:
In the Value field,
add the source column value that completes the expression. In the
preceding example, the rule can be read as “Delete from the
source table all rows with a last contact date prior to January 1,
2010.”
Click Add
Rule to add another rule. Select a different column,
operator, and value.
To delete rows when
either the new rule or the preceding rule are true, change the AND condition
to OR.
When your rules are
complete, click Next and go to Step 6.
To delete rows using
a Hive expression, follow these steps:
In the Hive
expression text box, either type or paste a Hive SQL
expression.
To add Hive SQL functions
to your expression, click Functions, expand
a category, select a function, and click .
To add column names
to your expression, position the cursor in the Hive expression box,
click Columns in the Resources box,
click the source column, and then click .
When you have specified
a rule or a Hive expression, click Next.
In the Code task,
review the Hive code that will run in Hadoop. Click Edit
HiveQL Code as needed.
Note: When you edit the Hive expression
in the Code task, you will lose those edits
if you then change the content of the Delete Rows task.
Click Next to
open the Result task, and then click Start
deleting data.
When the job is complete,
click Log to confirm the deletion of rows.