Replacing SCAN (and TRANWRD) with DS2 Code

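Consider the following DS2 code, which splits full_table into rows delimited by '|' and then splits each row into elements delimited by ';':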
i = 1;
onerow = TRANWRD(SCAN(full_table, i, '|'), ';;', ';-;');
do while (onerow ~= '');
    j = 1;
    elt = SCAN(onerow, j, ';');
    do while (elt ~= '');
        * processing of each element in the row;
        j = j+1;
        elt = SCAN(onerow, j, ';');
    end;
    i = i+1;
    onerow = TRANWRD(SCAN(full_table, i, '|'), ';;', ';-;');
end;
You can make the following observations:
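  • SCAN treats a run of adjacent delimiters as a single delimiter, so empty elements are silently skipped. Therefore, TRANWRD is required to insert a placeholder ('-') between adjacent delimiters so that each row can be traversed element by element.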
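  • SCAN starts searching at the front of the string on every call, so fetching the i-th element repeats the work already done for elements 1 through i-1. Therefore, the aggregate cost of the loop is O(N^2) in the length of the string.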
  • SCAN and TRANWRD require NCHAR or NVARCHAR input. If full_table is declared as CHAR or VARCHAR, it must be converted to NVARCHAR, processed, and then converted back to VARCHAR to be stored in onerow.
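To make the first observation concrete, the short DS2 program below extracts the second ';'-delimited element of 'a;;b' with and without the TRANWRD rewrite. It is a minimal sketch; the variable names rowtxt, rowfix, f2, and g2 are illustrative and do not come from the original code.

```sas
proc ds2;
data _null_;
  method init();
    dcl varchar(100) rowtxt rowfix f2 g2;
    rowtxt = 'a;;b';
    /* SCAN treats the adjacent ';;' as one delimiter,
       so the empty middle element is skipped and f2 is 'b'. */
    f2 = scan(rowtxt, 2, ';');
    /* TRANWRD first rewrites ';;' as ';-;', so the empty element
       survives as the placeholder '-' and g2 is '-'. */
    rowfix = tranwrd(rowtxt, ';;', ';-;');
    g2 = scan(rowfix, 2, ';');
    put f2= g2=;
  end;
enddata;
run;
quit;
```

A side effect of this workaround is that downstream code must treat '-' as "empty", and a genuine '-' value in the data becomes indistinguishable from a missing one.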
The following code replaces this type of loop with a native DS2 solution. It avoids these problems by keeping the tokenizing state in a package:
dcl package sasuser.strtok row_iter();
dcl package sasuser.strtok col_iter();
dcl varchar(32767) onerow elt;
row_iter.load(full_table, '|');
do while (row_iter.hasmore());
    row_iter.getnext(onerow);
    col_iter.load(onerow, ';');
    do while (col_iter.hasmore());
        col_iter.getnext(elt);
        * processing of each element;
    end;
end;
The supporting STRTOK package is shown below. It can replace SCAN and TRANWRD pairs in any DS2 program.
/** STRTOK package - extract subsequent tokens from a string.
 * So named because it mirrors what the C library function strtok() does, but
 * safely: it does not modify its input, and it keeps its position in the
 * object rather than in hidden static state.
 */
package sasuser.strtok/overwrite=yes;
  dcl varchar(32767) _buffer;
  dcl int strt blen;
  dcl char(1) _delim;
 
  /* Loads the current object with the supplied buffer and delimiter
   * information. Reusing one object this way avoids the cost of repeatedly
   * constructing and destroying it, and lets a STRTOK be declared outside the
   * loop in which it is used.
   */
  method load(in_out varchar bufinit, char(1) delim);
    _buffer = bufinit .. delim;
    _delim = delim;
    strt = 1;
    blen = length(_buffer);
  end;
 
  /* Are there more fields? 1 means there are more fields. 0 means there are
   * no more fields.
   */
  method hasmore() returns integer;
    if (strt >= blen) then return 0;
    return 1;
  end;
 
  /* The void-returning GETNEXT method places the next token in the supplied 
   * variable, tok.
   */
  method getnext(in_out varchar tok);
    dcl char(1) c;
    dcl int e;
    tok = '';
    if (hasmore()) then do;
      e = strt;
      c = substr(_buffer,e,1);
      do while (c ~= _delim);
        tok = tok .. c;
        e = e + 1;
        c = substr(_buffer,e,1);
      end;
      strt = e + 1;
    end;
  end;
 
  /* The value-returning GETNEXT method returns the next token. It is more
   * computationally expensive than the void-returning version above because
   * it requires an extra copy of the token.
   */
  method getnext() returns varchar(32767);
    dcl varchar(32767) tok;
    getnext(tok);
    return tok;
  end;
 
  /* Construct a STRTOK object using the parameters as initial values.
   */
  method strtok(varchar(32766) bufinit, char(1) delim);
    load(bufinit, delim);
  end;
 
  /* Construct a STRTOK object without an initial buffer to be consumed.
   */
  method strtok();
    strt = 0; blen = 0;
  end;
endpackage; run;
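As a minimal end-to-end sketch, the program below runs the row and element loops on a small sample string. It assumes the STRTOK package has already been created in SASUSER by the code above; the sample value of full_table and the counter n are illustrative only. onerow and elt are declared as VARCHAR variables because load() and getnext() take IN_OUT parameters, which, to our understanding, must be passed variables rather than literals or expressions.

```sas
proc ds2;
data _null_;
  /* Both iterators are declared once, outside the loops, and reused via load(). */
  dcl package sasuser.strtok row_iter();
  dcl package sasuser.strtok col_iter();
  dcl varchar(32767) full_table onerow elt;
  dcl int n;

  method init();
    /* Two rows separated by '|'; the first row has an empty middle element. */
    full_table = 'a;;b|c;d';
    n = 0;
    row_iter.load(full_table, '|');
    do while (row_iter.hasmore());
      row_iter.getnext(onerow);
      col_iter.load(onerow, ';');
      do while (col_iter.hasmore());
        col_iter.getnext(elt);
        n = n + 1;
        put n= elt=;
      end;
    end;
  end;
enddata;
run;
quit;
```

Tracing the package logic by hand, this should report five elements (a, an empty value, b, c, d): unlike the SCAN loop, the empty element is preserved without a placeholder.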
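Using STRTOK instead of SCAN and TRANWRD avoids the CHAR-to-NCHAR round trip and reduces CPU usage. STRTOK retains its position in the buffer (strt) between calls to the getnext() methods, so each call resumes where the previous one stopped instead of rescanning from the front. The traversal therefore examines each character of the buffer once, which makes it O(N) instead of O(N^2).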
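A rough count makes the complexity comparison concrete. Assume, for illustration only, that a string holds k fields of equal width l, so it has about N = k(l+1) characters including delimiters, and that each SCAN call rescans from the front, as noted earlier. The i-th SCAN call then reads through about i(l+1) characters, while STRTOK reads each character once:

```latex
\begin{aligned}
\text{SCAN loop:} \quad & \sum_{i=1}^{k} i(\ell+1) = (\ell+1)\,\frac{k(k+1)}{2} \approx \frac{N^2}{2(\ell+1)} = O(N^2) \\
\text{STRTOK:} \quad & k(\ell+1) = N = O(N)
\end{aligned}
```

For example, with k = 1000 fields of width l = 9 (N = 10,000 characters), the SCAN loop touches about 10 x 1000 x 1001 / 2 = 5,005,000 characters, compared with about 10,000 for STRTOK. Under this model the quadratic term comes from the number of fields; for a fixed field width it grows with the square of the string length.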
Last updated: March 2, 2017