GetSortedMatrixByGroups

Prototype

Matrix GetSortedMatrixByGroups( DataObject dobj, Matrix mGroupVarNames )

Return Value

The return value is an n x (k+1) numeric matrix, where n is the number of observations in the DataObject and k is the number of group variables specified in mGroupVarNames. The first k columns contain the values of the group variables, and the rows are sorted by those columns. The last column contains the original (unsorted) observation number of each row in the DataObject.

Parameters

DataObject dobj
The DataObject containing the data.

Matrix mGroupVarNames
A character matrix containing the names of the variables that define the groups.

Remarks

The group variables in a DataObject define a set of unique groups, one for each distinct combination of values of the group variables. This module returns the observations sorted by group, along with the original observation number of each row. To get the unique groups, call the UNIQUEBY function (see the Example).

Note that the return matrix is numeric even if some or all of the group variables are character. Values of a character variable are represented in the return matrix by integers that correspond to the values' alphabetical ordering within the variable’s data. For example, a variable with values {B, C, A, AA, C, A} would be represented in the return matrix by the data {3, 4, 1, 2, 4, 1}. Thus, if you want to get the groups in terms of the original character data, you need to index into the DataObject as shown in the Example.
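
The integer coding described above can be reproduced with a few lines of SAS/IML. This is only a sketch of the coding scheme for the sample values given in the text; the variable names x, u, and codes are illustrative, and the module's actual implementation may differ.

```sas
/* illustrative data: the sample values from the text */
x = {"B", "C", "A", "AA", "C", "A"};
/* UNIQUE returns the distinct values in sorted order: A AA B C */
u = unique(x);
codes = j(nrow(x), 1, .);
do i = 1 to ncol(u);
    /* every observation equal to the i-th sorted value gets code i */
    codes[loc(x = u[i])] = i;
end;
print (codes`);   /* 3 4 1 2 4 1 */
```

Because the codes depend only on the sorted order of the distinct values, repeated values such as C and A receive the same integer.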

This module is called by the following modules:

ColorCodeLinesByGroups

ColorCodeObsByGroups

DrawPolygonsByGroups

Example

/* numeric groups */
declare DataObject dobjDrug;
dobjDrug = DataObject.CreateFromFile( "drug" );
mGroupVarNames = {"Drug" "Disease"};
numGroupVars = ncol(mGroupVarNames);
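m = GetSortedMatrixByGroups( dobjDrug, mGroupVarNames );
/* find the first row of each unique group in the sorted matrix */
groupIndex = uniqueby( m, 1:numGroupVars, 1:nrow(m) );
groups = m[groupIndex,];
/* the last column holds the original observation numbers */
print groups[colname=(mGroupVarNames || "Obs")];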

/* character groups */
declare DataObject dobj;
dobj = DataObject.CreateFromFile( "baseball" );
mGroupVarNames = {"League" "Division"};
numGroupVars = ncol(mGroupVarNames);
m = GetSortedMatrixByGroups( dobj, mGroupVarNames );
/* find the first row of each unique group in the sorted matrix */
groupIndex = uniqueby( m, 1:numGroupVars, 1:nrow(m) );
/* get original observation numbers from last column */
obsNumbers = m[groupIndex,numGroupVars+1];
/* get group values from DataObject */
free groups;
do i = 1 to numGroupVars;
    dobj.GetVarData( mGroupVarNames[i], x );
    groups = groups || x[obsNumbers];
end;
print groups[colname=mGroupVarNames];

See Also

BlendColors