Language Reference


UNIQUEBY Function

UNIQUEBY (matrix <, by> <, index> );

The UNIQUEBY function returns the locations of the unique BY-group combinations for a sorted or indexed matrix. The arguments to the UNIQUEBY function are as follows:

matrix

is the input matrix, which must be sorted or indexed according to the by columns.

by

is either a numeric matrix of column numbers or a character matrix of column names. The names must match column labels that are assigned to matrix by a MATTRIB statement or a READ statement. If by is not specified, then the first column is used.

index

is a vector such that index[i] is the row index of the ith element of matrix when sorted according to by. Consequently, matrix[index, ] is the sorted matrix. You can compute index for a matrix and a given set of by columns by using the SORTNDX call. If the matrix is already sorted according to the by columns, then index should be 1:nrow(matrix). In this case, you can also omit the index argument.

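The UNIQUEBY function returns a column vector of positions in index. If loc is the returned vector, then index[loc[i]] is the row of matrix that contains the first occurrence of the ith unique combination of values in the by columns. Equivalently, loc[i] is the row of the sorted matrix, matrix[index, ], at which the ith BY group begins.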
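The following statements are a minimal sketch of how the three calling forms described above relate to one another. The matrix x and the names idx, s, loc1, loc2, and loc3 are made up for illustration. Under the argument descriptions above, the three results are expected to agree, because each call refers to the same sorted order:

```sas
proc iml;
/* hypothetical unsorted data; column 1 is the BY column */
x = {3 1,
     1 2,
     3 1,
     2 5};
call sortndx(idx, x, 1);           /* idx is the sort index for column 1 */
s = x[idx, ];                      /* s is x sorted by column 1          */

loc1 = uniqueby(x, 1, idx);        /* unsorted matrix plus a sort index  */
loc2 = uniqueby(s, 1, 1:nrow(s));  /* sorted matrix, explicit index      */
loc3 = uniqueby(s, 1);             /* sorted matrix, index omitted       */
print loc1 loc2 loc3;
quit;
```

The first form avoids creating a sorted copy of the matrix, which can matter when the matrix is large.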

For example, the following statements use the SORTNDX subroutine to create a sort index for a matrix. The UNIQUEBY function is then used to determine the unique combinations of the columns of the matrix:

m = { 1 0,
      2 0,
      2 2,
      2 0,
      1 0,
      2 0,
      1 1 };
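cols = 1:2;                            /* BY columns                    */
call sortndx(ndx, m, cols);            /* ndx is the sort index for m   */

sorted = m[ndx,];                      /* rows of m in sorted order     */
unique_rows = uniqueby(m, cols, ndx);  /* positions in ndx where each BY group begins */
unique_vals = m[ndx[unique_rows], cols ];  /* BY values of each group   */
print sorted, unique_rows unique_vals;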

Figure 25.421: Unique Values of the Sort Variables

sorted
1 0
1 0
1 1
2 0
2 0
2 0
2 2

unique_rows   unique_vals
          1   1 0
          3   1 1
          4   2 0
          7   2 2



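In addition, the following statements compute the number of BY groups and the number of rows in each BY group. Because unique_rows contains the starting position of each group in the sorted order, the size of a group is the difference between consecutive starting positions: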

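n = nrow(unique_rows);                 /* number of BY groups */
size = j(n,1);                         /* allocate an n x 1 vector */
do i = 1 to n-1;
   /* a group ends where the next group begins */
   size[i] = unique_rows[i+1] - unique_rows[i];
end;
size[n] = nrow(m) - unique_rows[n] + 1;   /* last group runs to the last row */
print n, size;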

Figure 25.422: Number of BY Groups and Number of Elements in Each Group

n
4

size
2
1
3
1
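The group sizes can also be computed without a DO loop. The following statements are a sketch of one way to do so: they append a one-past-the-end position to the vector of starting positions and then subtract adjacent elements. The names loc, bounds, and size are illustrative. For the example matrix, the result should match the sizes in Figure 25.422:

```sas
proc iml;
m = { 1 0,
      2 0,
      2 2,
      2 0,
      1 0,
      2 0,
      1 1 };
cols = 1:2;
call sortndx(ndx, m, cols);
loc = uniqueby(m, cols, ndx);          /* starting position of each BY group */

n = nrow(loc);
bounds = loc // (nrow(m) + 1);         /* append the position after the last row */
size = bounds[2:(n+1)] - bounds[1:n];  /* differences of adjacent positions  */
print size;
quit;
```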



If matrix is already sorted according to the by columns (see the SORT call), then you can call UNIQUEBY with 1:nrow(matrix) for the index argument, or you can omit the last argument, as shown in the following statements:

unique_loc = uniqueby(sorted, cols);
print unique_loc;

Figure 25.423: Position of Unique Rows for a Sorted Matrix

unique_loc
1
3
4
7
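A common use of these positions is to process a sorted matrix one BY group at a time. The following statements are a minimal sketch under that assumption. The names loc, bounds, rows, group, and count are illustrative, and the loop body merely counts rows where a real program would perform its per-group computation:

```sas
proc iml;
/* the sorted matrix from the earlier example */
sorted = {1 0,
          1 0,
          1 1,
          2 0,
          2 0,
          2 0,
          2 2};
loc = uniqueby(sorted, 1:2);              /* first row of each BY group */
bounds = loc // (nrow(sorted) + 1);       /* add an end marker          */

do i = 1 to nrow(loc);
   rows = bounds[i] : (bounds[i+1] - 1);  /* row range of the ith group */
   group = sorted[rows, ];                /* extract the group's rows   */
   count = nrow(group);                   /* placeholder computation    */
   print i count;
end;
quit;
```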