Identical Parallel WHERE Clause Subsetting Results

Overview of Parallel WHERE Clause Subsetting

Under certain circumstances, it is possible to perform parallel WHERE clause subsetting on a table more than once and to receive slightly different results. This event can occur when submitting parallel WHERE clause code to SPD Server that uses the SAS OBS=nnnn data set option.
The SAS OBS=nnnn data set option causes processing to end with the specified (nth) observation in a table. Because parallel WHERE clause processing is threaded, subsetting a table and using OBS=nnnn might not produce identical results from run to run, or different batch jobs using the same WHERE clause code might produce slightly different results.
When a parallel WHERE-cause evaluation is split into multiple threads, SPD Server uses a multi-threading model that is designed to return rows as fast as possible. Some threads might be able to complete row scans incrementally faster than other threads, due to uneven loads across multiple processors or system contention issues. This inequity can create minute variances that can generate nonidentical results to the same subsetting request.
If you have code that performs parallel WHERE clause subsetting in conjunction with the OBS=nnnn data processing option, and if it is critical that successive WHERE clause subsets on the same data must be identical, you can eliminate thread contention error by setting the thread count value for that operation to 1.
To set the SPD Server thread count value, you can use the SPDSTCNT= macro:
%let SPDSTCNT=1;
The same potential for subsetting variation applies when a DATA step uses the OBS=nnnn data processing option with a parallel by-clause, such as:
    data test1;
      set spds45.testdata (obs=1000);
      where j in (1,5,25);
      by i;
    run;
Use the SPDSTCNT= macro solution to ensure identical results across multiple identical table subsetting requests.

WHERE Clause Subsetting Variation Example

Job 1 and Job 2 use the same tables and data requests but produce non-identical results as seen in the respective Job 1 and Job 2 outputs.
To eliminate variation in the output, simply add the thread count statement
%let SPDSTCNT=1;
to the beginning of each job.

Job 1

    data test1;
       set spds45.testdata
         (obs=1000);
       where j in (1,5,25);
    run;

    PROC SORT data=test1;
       by i;
    run;

    PROC PRINT data=test1
       (obs=10);
    run;

Job 1 Output

The SAS System     11:44 Monday, May 9, 2005   1

     Obs    a      i       j    k

       1         24601     1    1
       2         24605     5    5
       3         24625    25    0
       4         24701     1    1
       5         24705     5    5
       6         24725    25    0
       7         24801     1    1
       8         24805     5    5
       9         24825    25    0
      10         24901     1    1

Job 2

    data test2;
       set spds45.testdata
         (obs=1000);
       where j in (1,5,25);
    run;

    PROC SORT data=test2;
       by i;
    run;

    PROC PRINT data=test2
       (obs=10);
    run;

Job 2 Output

The SAS System
11:44 Monday, May 9, 2005   1

     Obs   a      i      j    k

       1           1     1    1
       2           5     5    5
       3          25    25    0
       4         101     1    1
       5         105     5    5
       6         125    25    0
       7         201     1    1
       8         205     5    5
       9         225    25    0
      10         301     1    1