UNIQUEBY Function
The UNIQUEBY function returns the locations of the unique BY-group combinations for a sorted or indexed matrix. The arguments to the UNIQUEBY function are as follows:
matrix is the input matrix, which must be sorted or indexed according to the by columns.
by is either a numeric matrix of column numbers or a character matrix that contains the names of columns that correspond to column labels assigned to matrix by a MATTRIB statement or READ statement. If by is not specified, then the first column is used.
index is a vector such that index[i] is the row index of the ith row of matrix when sorted according to by. Consequently, matrix[index, ] is the sorted matrix. You can compute index for a matrix and a given set of by columns with the SORTNDX call. If matrix is already sorted according to the by columns, then index should be 1:nrow(matrix); in this case, you can also omit the index argument.
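The UNIQUEBY function returns a column vector whose ith row gives the row in index whose value is the row in matrix of the ith unique combination of values in the by columns. That is, if u is the returned vector, then matrix[index[u[i]], ] is the first row of the ith BY group in sorted order.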
For example, the following statements use the SORTNDX subroutine to create a sort index for a matrix. The UNIQUEBY function is then used to determine the unique combinations of the columns of the matrix:
    m = { 1 0, 2 0, 2 2, 2 0, 1 0, 2 0, 1 1 };
    cols = 1:2;
    call sortndx(ndx, m, cols);
    sorted = m[ndx,];
    unique_rows = uniqueby(m, cols, ndx);
    unique_vals = m[ndx[unique_rows], cols ];
    print sorted, unique_rows unique_vals;
sorted | |
---|---|
1 | 0 |
1 | 0 |
1 | 1 |
2 | 0 |
2 | 0 |
2 | 0 |
2 | 2 |
unique_rows | unique_vals | |
---|---|---|
1 | 1 | 0 |
3 | 1 | 1 |
4 | 2 | 0 |
7 | 2 | 2 |
In addition, the following statements give the number of unique BY-group combinations and the number of rows in each BY group:
    n = nrow(unique_rows);
    size = j(n,1);
    do i = 1 to n-1;
       size[i] = unique_rows[i+1] - unique_rows[i];
    end;
    size[n] = nrow(m) - unique_rows[n] + 1;
    print n, size;
n |
---|
4 |
size |
---|
2 |
1 |
3 |
1 |
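The BY-group sizes can also be computed without a DO loop. The following statements are a minimal sketch of that idea, not part of the original example: they append a position one past the last row to the vector of group starts and then take differences of adjacent entries. The name b is introduced here only for illustration. For the matrix used above, the result should match the size vector shown in the preceding output.

```sas
proc iml;
m = { 1 0, 2 0, 2 2, 2 0, 1 0, 2 0, 1 1 };
cols = 1:2;
call sortndx(ndx, m, cols);
unique_rows = uniqueby(m, cols, ndx);

/* b holds the start of each BY group plus a one-past-the-end */
/* position, so every group has a "next start" to subtract from */
b = unique_rows // (nrow(m) + 1);
n = nrow(unique_rows);

/* size of group i = start of group i+1 minus start of group i */
size = b[2:(n+1)] - b[1:n];
print n, size;
quit;
```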
If matrix is already sorted according to the by columns (see the SORT call), then UNIQUEBY can be called with 1:nrow(matrix) for the index argument, or the last argument can be omitted as shown in the following statement:
    unique_loc = uniqueby(sorted, cols);
    print unique_loc;
unique_loc |
---|
1 |
3 |
4 |
7 |
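As noted in the description of the by argument, the BY columns can also be identified by name when column labels have been assigned with a MATTRIB statement. The following statements are a sketch of that usage. The labels "group" and "sub" and the variable names are hypothetical and chosen only for illustration. Assuming the names resolve to columns 1 and 2 as described, the unique locations should be the same as in the numeric-column example.

```sas
proc iml;
m = { 1 0, 2 0, 2 2, 2 0, 1 0, 2 0, 1 1 };

/* hypothetical column labels, assigned with MATTRIB */
names = {"group" "sub"};
mattrib m colname=names;

/* use the same names as the BY columns for both calls */
call sortndx(ndx, m, names);
unique_rows = uniqueby(m, names, ndx);

/* first row of each BY group, in sorted order */
unique_vals = m[ndx[unique_rows], ];
print unique_rows unique_vals;
quit;
```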