High-Performance Features of the OPTGRAPH Procedure


Distribute Data to Greenplum

The following example shows how to use the HPDS2 procedure to copy a data set to a Greenplum database; the table is distributed by a column called from_node:

   libname linkdata 'C:\mydata';

   libname gplib greenplm
      server     = "grid001.example.com"
      schema     = public
      user       = dbuser
      password   = dbpass
      database   = hps;

   proc datasets nolist lib=gplib;
      delete links_data_123;
   quit;

   proc hpds2
      data = linkdata.links_data_123
      out  = gplib.links_data_123 (distributed_by='distributed by (from_node)');
      performance
         host    = "grid001.example.com"
         install = "/opt/TKGrid"
         commit  = 10000000;
      data DS2GTF.out;
         method run();
            set DS2GTF.in;
         end;
      enddata;
   run;

If the output table links_data_123 already exists in the Greenplum database, the call to the DATASETS procedure removes the existing table from the database, because a DBMS usually does not support replacement operations on tables.

The OUT= option in the PROC HPDS2 statement specifies a table that uses the library gplib, which is the Greenplum library that is assigned in the second LIBNAME statement. This option also requests that PROC HPDS2 distribute the records by from_node among the data segments of the computing appliance. The statements that follow the PERFORMANCE statement are the DS2 program that copies the input data to the output data without further transformations.

For more information about the HPDS2 procedure, see Base SAS Procedures Guide: High-Performance Procedures.