Identical Parallel WHERE Clause Subsetting Results

Overview of Parallel WHERE Clause Subsetting

Under certain circumstances, it is possible to perform parallel WHERE clause subsetting on a table more than once and to receive slightly different results. This event can occur when submitting parallel WHERE clause code that uses the SAS OBS= data set option to SPD Server .
The SAS OBS= data set option causes processing to end with the specified (nth) row in a table. Because parallel WHERE clause processing is threaded, subsetting a table and using OBS= might not produce identical results from run to run. Different batch jobs using the same WHERE clause code might produce slightly different results.
When a parallel WHERE-clause evaluation is split into multiple threads, the server uses a multi-threading model that is designed to return rows as fast as possible. Some threads might be able to complete row scans incrementally faster than other threads, due to uneven loads across multiple processors or system contention issues. This inequity can create minute variances that can generate nonidentical results to the same subsetting request.
If you have code that performs parallel WHERE clause subsetting in conjunction with the OBS= data processing option, and if it is critical that successive WHERE clause subsets on the same data must be identical, you can eliminate thread contention error by setting the thread count value for that operation to 1.
To set the server thread count value, you can use the SPDSTCNT macro variable:
%let SPDSTCNT=1;
The same potential for subsetting variation applies when a DATA step uses the OBS=nnnn data processing option with a parallel by-clause, such as the in this example:
    data test1;
      set spds45.testdata (obs=1000);
      where j in (1,5,25);
      by i;
    run;
Use the SPDSTCNT macro solution to ensure identical results across multiple identical table subsetting requests.

WHERE Clause Subsetting Variation Example

Job 1 and Job 2 use the same tables and data requests but produce non-identical results as seen in the respective Job 1 and Job 2 outputs.
To eliminate variation in the output, simply add the following thread count statement to the beginning of each job.
%let SPDSTCNT=1;

Job 1

    data test1;
       set spds45.testdata
         (obs=1000);
       where j in (1,5,25);
    run;

    PROC SORT data=test1;
       by i;
    run;

    PROC PRINT data=test1
       (obs=10);
    run;

Job 1 Output

The SAS System     11:44 Monday, May 9, 2005   1

     Obs    a      i       j    k

       1         24601     1    1
       2         24605     5    5
       3         24625    25    0
       4         24701     1    1
       5         24705     5    5
       6         24725    25    0
       7         24801     1    1
       8         24805     5    5
       9         24825    25    0
      10         24901     1    1

Job 2

    data test2;
       set spds45.testdata
         (obs=1000);
       where j in (1,5,25);
    run;

    PROC SORT data=test2;
       by i;
    run;

    PROC PRINT data=test2
       (obs=10);
    run;

Job 2 Output

The SAS System
11:44 Monday, May 9, 2005   1

     Obs   a      i      j    k

       1           1     1    1
       2           5     5    5
       3          25    25    0
       4         101     1    1
       5         105     5    5
       6         125    25    0
       7         201     1    1
       8         205     5    5
       9         225    25    0
      10         301     1    1
Last updated: February 8, 2017