Language Reference


BIN Function

BIN (x, cutpoints <, closed> );

The BIN function divides numeric values into a set of disjoint intervals called bins. The BIN function returns a matrix that is the same shape as x; each element of the result is the number of the bin that contains the corresponding element of x. The arguments are as follows:

x

specifies a numerical vector or matrix.

cutpoints

specifies the intervals into which to bin the data. This argument can have a vector or a scalar value. A vector defines the endpoints of the intervals; a scalar value specifies the number of evenly spaced intervals into which the range of the data is divided.

closed

is an optional argument that specifies whether each bin is closed on its left endpoint or on its right endpoint. The following values are valid:

"Left"

specifies that the bins are closed on the left and open on the right. The last interval is closed on both sides. This is the default value.

"Right"

specifies that the intervals are open on the left and closed on the right. The first interval is closed on both sides.
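
The difference between the two values of closed only shows up for data that fall exactly on a cutpoint. The following statements are a small sketch of that case; the data values and the variable names bLeft and bRight are chosen for illustration only:

```sas
proc iml;
/* every value sits exactly on a cutpoint */
x = {0, 1, 1.8, 2, 4};
cutpoints = {0 1 1.8 2 4};
bLeft  = bin(x, cutpoints);            /* default: closed on the left  */
bRight = bin(x, cutpoints, "Right");   /* closed on the right          */
print x bLeft bRight;
quit;
```

According to the definitions above, bLeft should be {1, 2, 3, 4, 4} (the value 4 belongs to the last interval, which is closed on both sides), and bRight should be {1, 1, 2, 3, 4} (the value 0 belongs to the first interval, which is closed on both sides).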

If cutpoints is a vector, then it must be ordered so that the first element is the smallest and the last element is the largest. The ordered values define the intervals that are used to bin the values. For example, the following statements bin x into the intervals $I_1 = [0,1)$, $I_2 = [1, 1.8)$, $I_3 = [1.8, 2)$, and $I_4 = [2, 4]$, and return the bin numbers for each element of x:

x = {0, 0.5, 1, 1.5, 2, 2.5, 3, 0.5, 1.5, 3, 3, 1};
cutpoints = {0 1 1.8 2 4};
b = bin(x, cutpoints);
print x b;

Figure 25.52: Bins for Each Observation

x b
0 1
0.5 1
1 2
1.5 2
2 4
2.5 4
3 4
0.5 1
1.5 2
3 4
3 4
1 2



You can use the special missing values .M and .I to specify unbounded intervals. A missing value of .M in the first element is interpreted as $-\infty $, and a missing value of .I in the last element is interpreted as $+\infty $. For example, the following statements are all valid specifications of the cutpoints argument:

c = {.M -2 -1 0 1 2};
c = {.M -2 -1 0 1 2 .I};
c = {-2 -1 0 1 2 .I};
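
As a brief illustration of unbounded intervals, the following sketch bins a few hypothetical values by using cutpoints that start with .M and end with .I, so that no finite value falls outside the bins:

```sas
proc iml;
x = {-5, -2, 0, 3};        /* hypothetical data */
c = {.M -2 0 .I};          /* bins: below -2, [-2, 0), and 0 or above */
b = bin(x, c);
print x b;
quit;
```

With the default left-closed bins, the values -5, -2, 0, and 3 are expected to fall into bins 1, 2, 3, and 3, respectively.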

If cutpoints is a positive integer, n, then the interval $[\min (x), \max (x)]$ is divided into n intervals of width $\Delta = (\max (x)-\min (x))/n$ and the data are binned into these intervals. For example, the following statements bin the elements of x into one of three intervals $[0,1)$, $[1, 2)$, or $[2, 3]$:

bin = bin(x, 3);
print x bin;

Figure 25.53: Bins That Are Associated with Each Value

x bin
0 1
0.5 1
1 2
1.5 2
2 3
2.5 3
3 3
0.5 1
1.5 2
3 3
3 3
1 2



Notice in Figure 25.53 that the value 3 is placed into the third interval because the last interval is closed on the right.
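
For these data, a scalar cutpoints value of 3 describes the same bins as the explicit vector {0 1 2 3}. The following sketch builds that vector from the formula for the interval width and compares the two calls. The names n, delta, b1, and b2 are illustrative. The sketch assumes that the width is exactly representable in floating point, as it is here; for other data the DO function might not reproduce the maximum value exactly as the last endpoint.

```sas
proc iml;
x = {0, 0.5, 1, 1.5, 2, 2.5, 3, 0.5, 1.5, 3, 3, 1};
n = 3;
delta = (max(x) - min(x)) / n;           /* interval width: 1 for these data */
cutpoints = do(min(x), max(x), delta);   /* {0 1 2 3} for these data         */
b1 = bin(x, n);                          /* scalar specification             */
b2 = bin(x, cutpoints);                  /* explicit endpoints               */
print x b1 b2;
quit;
```

The two columns b1 and b2 should agree for every observation.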

The BIN function returns missing values for data values that are not contained in any bin. Missing values are also returned for missing values in the data.
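
The following sketch shows one way to locate the observations that did not receive a bin number. The data are hypothetical: two values lie outside the interval [0, 2] and one value is missing. Note that the result alone does not distinguish between these two cases.

```sas
proc iml;
x = {-1, 0.5, ., 1.5, 5};      /* hypothetical data */
cutpoints = {0 1 2};           /* bins [0, 1) and [1, 2] */
b = bin(x, cutpoints);
idx = loc(b = .);              /* positions for which no bin was assigned */
if ncol(idx) > 0 then do;
   unbinned = x[idx];          /* the corresponding data values */
   print idx, unbinned;
end;
quit;
```

Here b is expected to be {., 1, ., 2, .}, so idx contains the positions 1, 3, and 5.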

You can use the BIN function in conjunction with the TABULATE function to count the number of observations in each interval. The following statements sample from the standard normal distribution and count the number of observations in a set of evenly spaced intervals:

z = rannor(j(1000, 1, 1));
set = do(-3.5, 3.5, 1);
b = bin(z, set);
call tabulate(levels, count, b);

/* label counts by the center of each interval */
intervals = char(do(-3, 3, 1), 2);
print count[colname=intervals];

Figure 25.54: Bin Counts for Evenly Spaced Intervals

                count
 -3   -2   -1    0    1    2    3
  6   65  241  385  235   59    9
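
In the previous example every interval contains at least one observation, so the counts line up with the seven labels. The TABULATE subroutine reports only the levels that occur in its input, so a bin that contains no observations has no entry in the output. The following sketch, which uses a small hypothetical sample and the illustrative names nBins and fullCount, shows one way to expand the counts so that empty bins appear as zeros:

```sas
proc iml;
z = {-2.2, -0.4, 0.1, 0.3, 2.6};     /* hypothetical small sample          */
set = do(-3.5, 3.5, 1);              /* same seven intervals as above      */
b = bin(z, set);
call tabulate(levels, count, b);     /* only nonempty bins are reported    */

nBins = ncol(set) - 1;               /* number of intervals                */
fullCount = j(1, nBins, 0);          /* start with zero in every bin       */
fullCount[levels] = count;           /* fill in the bins that occurred     */

intervals = char(do(-3, 3, 1), 2);   /* label counts by interval centers   */
print fullCount[colname=intervals];
quit;
```

For this sample the nonempty bins are 2, 4, and 7, so fullCount is expected to be {0 1 0 3 0 0 1}.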