UNIQUEBY Function

UNIQUEBY( matrix <, by> <, index> ) ;

The UNIQUEBY function returns the locations of the unique BY-group combinations for a sorted or indexed matrix. The arguments to the UNIQUEBY function are as follows:

matrix

is the input matrix, which must be sorted or indexed according to the by columns.

by

is either a numeric matrix of column numbers, or a character matrix that contains the names of columns that correspond to column labels assigned to matrix by a MATTRIB statement or READ statement. If by is not specified, then the first column is used.

index

is a vector such that index[i] is the row index of the ith row of matrix when sorted according to by. Consequently, matrix[index, ] is the sorted matrix. You can compute index for a matrix and a given set of by columns by using the SORTNDX call. If the matrix is already sorted according to the by columns, then index should be 1:nrow(matrix); in this case, you can also omit the index argument.

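The UNIQUEBY function returns a column vector whose ith row gives the row in index whose value is the row in matrix of the ith unique combination of values in the by columns. In other words, if u is the returned vector, then matrix[index[u[i]], ] is the first row of the ith BY group in sorted order.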
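The character form of the by argument can be sketched as follows. This is a minimal illustration, not a definitive usage: the column labels a and b and the variable names are hypothetical, and the matrix is assumed to be already sorted so that the index argument can be omitted.

```sas
proc iml;
/* matrix that is already sorted by both columns */
m = { 1 0,
      1 0,
      1 1,
      2 0,
      2 0,
      2 0,
      2 2 };
names = {"a" "b"};           /* hypothetical column labels */
mattrib m colname=names;     /* assign the labels to m */

/* refer to the by columns by label instead of by column number */
u = uniqueby(m, names);
print u;
quit;
```

If the labels are resolved as described for the by argument, the call should be equivalent to uniqueby(m, 1:2).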

For example, the following statements use the SORTNDX subroutine to create a sort index for a matrix. The UNIQUEBY function is then used to determine the unique combinations of the columns of the matrix:

m = { 1 0,
      2 0,
      2 2,
      2 0,
      1 0,
      2 0,
      1 1 };
cols = 1:2;
call sortndx(ndx, m, cols);
  
sorted = m[ndx,];
unique_rows = uniqueby(m, cols, ndx);
unique_vals = m[ndx[unique_rows], cols ];
print sorted, unique_rows unique_vals;

Figure 23.291 Unique Values of the Sort Variables
sorted
1 0
1 0
1 1
2 0
2 0
2 0
2 2

unique_rows   unique_vals
          1   1 0
          3   1 1
          4   2 0
          7   2 2

In addition, the following statements give the number of unique combinations (BY groups) and the number of rows in each BY group:

n = nrow(unique_rows);
size = j(n,1);
do i = 1 to n-1;
   size[i] = unique_rows[i+1] - unique_rows[i];
end;
size[n] = nrow(m) - unique_rows[n] + 1;
print n, size;

Figure 23.292 Number of BY Groups and Number of Elements in Each Group
n
4

size
2
1
3
1
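The group sizes can also be computed without a DO loop. The following sketch is one possible alternative under the same data as the example above; the name bounds is introduced here only for illustration. It appends a sentinel position one past the last row, so that each size is the difference between consecutive starting positions.

```sas
proc iml;
m = { 1 0,
      2 0,
      2 2,
      2 0,
      1 0,
      2 0,
      1 1 };
cols = 1:2;
call sortndx(ndx, m, cols);
unique_rows = uniqueby(m, cols, ndx);

n = nrow(unique_rows);
/* start of each group, followed by one position past the last row */
bounds = unique_rows // (nrow(m) + 1);
/* size of group i = start of group i+1 minus start of group i */
size = bounds[2:(n+1)] - bounds[1:n];
print n, size;
quit;
```

Because the sentinel is appended before the subtraction, the same statements also handle a matrix that contains a single BY group.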

If matrix is already sorted according to the by columns (see the SORT call), then UNIQUEBY can be called with 1:nrow(matrix) for the index argument, or the last argument can be omitted as shown in the following statement:

unique_loc = uniqueby(sorted, cols);
print unique_loc;

Figure 23.293 Position of Unique Rows for a Sorted Matrix
unique_loc
1
3
4
7
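A common use of the returned positions is to process each BY group in turn. The following statements are a sketch of that pattern, assuming a hypothetical two-column matrix x in which column 1 is the BY variable and column 2 holds a value to be summed; none of these names or data come from the examples above.

```sas
proc iml;
/* hypothetical data: column 1 = BY variable, column 2 = value */
x = { 2 10,
      1  5,
      2 20,
      1  7,
      3  1 };
call sortndx(ndx, x, 1);         /* sort index for column 1 */
first = uniqueby(x, 1, ndx);     /* position in ndx where each group starts */
n = nrow(first);

/* position in ndx where each group ends */
last = j(n, 1, nrow(x));
if n > 1 then last[1:(n-1)] = first[2:n] - 1;

total = j(n, 1, 0);
do i = 1 to n;
   rows = ndx[first[i]:last[i]];   /* rows of x that belong to group i */
   total[i] = sum(x[rows, 2]);
end;

level = x[ndx[first], 1];          /* BY value of each group */
print level total;
quit;
```

For this data the groups are the levels 1, 2, and 3, with totals 12, 30, and 1. The matrix x itself is never reordered; all access goes through ndx.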