The QUANTREG Procedure

Growth Charts for Body Mass Index

Body mass index (BMI) is defined as the ratio of weight (kg) to squared height (m$^2$) and is a widely used measure for categorizing individuals as overweight or underweight. The percentiles of BMI for specified ages are of particular interest. As age increases, these percentiles provide growth patterns of BMI not only for the majority of the population, but also for underweight or overweight extremes of the population. In addition, the percentiles of BMI for a specified age provide a reference for individuals at that age with respect to the population.

Smooth quantile curves have been widely used for reference charts in medical diagnosis to identify unusual subjects, whose measurements lie in the tails of the reference distribution. This example explains how to use the QUANTREG procedure to create growth charts for BMI.

A SAS data set named bmimen was created by merging and cleaning the 1999–2000 and 2001–2002 survey results for men that are published by the National Center for Health Statistics. This data set contains the variables Weight (kg), Height (m), BMI (kg/m$^2$), Age (year), and SeQN (respondent sequence number) for 8,250 men (Chen, 2005).

The data set that is used in this example is a subset of the original data set of Chen (2005). It contains the two variables BMI and Age with 3,264 observations.

data bmimen;
   input BMI Age @@;
   SqrtAge = sqrt(Age);
   InveAge = 1/Age;
   LogBMI  = log(BMI);
   datalines;
18.6  2.0 17.1  2.0 19.0  2.0 16.8  2.0 19.0  2.1  15.5   2.1
16.7  2.1 16.1  2.1 18.0  2.1 17.8  2.1 18.3  2.1  16.9   2.1
15.9  2.1 20.6  2.1 16.7  2.1 15.4  2.1 15.9  2.1  17.7   2.1

   ... more lines ...   

29.0 80.0 24.1 80.0 26.6 80.0 24.2 80.0 22.7 80.0  28.4  80.0
26.3 80.0 25.6 80.0 24.8 80.0 28.6 80.0 25.7 80.0  25.8  80.0
22.5 80.0 25.1 80.0 27.0 80.0 27.9 80.0 28.5 80.0  21.7  80.0
33.5 80.0 26.1 80.0 28.4 80.0 22.7 80.0 28.0 80.0  42.7  80.0
;

The logarithm of BMI is used as the response. (Although this does not improve the quantile regression fit, it helps with statistical inference.) A preliminary median regression is fitted with a parametric model, which involves six powers of Age.
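Written out, the parametric model can be sketched as follows. The coefficient symbols $\beta_0, \ldots, \beta_6$ and the notation $Q_{0.5}$ for the conditional median are introduced here only for illustration; the six regressors correspond to the variables InveAge and SqrtAge that are created in the DATA step and to the Age terms in the MODEL statement.

```latex
\[
Q_{0.5}\bigl(\log \mathrm{BMI} \mid \mathrm{Age}\bigr)
  = \beta_0
  + \beta_1\,\mathrm{Age}^{-1}
  + \beta_2\,\mathrm{Age}^{1/2}
  + \beta_3\,\mathrm{Age}
  + \beta_4\,\mathrm{Age}^{3/2}
  + \beta_5\,\mathrm{Age}^{2}
  + \beta_6\,\mathrm{Age}^{3}
\]
```

The term $\mathrm{Age}^{3/2}$ is the effect SqrtAge*Age, so the six powers of Age are $-1$, $1/2$, $1$, $3/2$, $2$, and $3$.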

The following statements invoke the QUANTREG procedure:

proc quantreg data=bmimen algorithm=interior(tolerance=1e-5) ci=resampling;
   model logbmi = inveage sqrtage age sqrtage*age
                  age*age age*age*age
                  / diagnostics cutoff=4.5 quantile=.5 seed=1268;
   id age bmi;
   test_age_cubic: test age*age*age / wald lr rankscore(tau);
run;

The MODEL statement provides the model, and the option QUANTILE=0.5 requests median regression. The ALGORITHM= option requests that the interior point algorithm be used to compute $\hat{\boldsymbol{\beta}}(\tfrac{1}{2})$. For more information about this algorithm, see the section Interior Point Algorithm.
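As background, the quantity that the algorithm computes can be described as the minimizer of a check loss. The following sketch uses standard quantile regression notation that does not appear in this example: $y_i$ denotes the $i$th value of LogBMI, $\mathbf{x}_i$ the corresponding vector of regressors (including the intercept), $n$ the number of observations, and $I(\cdot)$ the indicator function.

```latex
\[
\hat{\boldsymbol{\beta}}(\tau)
  = \arg\min_{\boldsymbol{\beta}}
    \sum_{i=1}^{n} \rho_\tau\bigl(y_i - \mathbf{x}_i'\boldsymbol{\beta}\bigr),
\qquad
\rho_\tau(r) = r\,\bigl(\tau - I(r < 0)\bigr)
\]
\[
\rho_{1/2}(r) = \tfrac{1}{2}\,|r|
\]
```

For $\tau = \frac12$ the check loss is proportional to the absolute residual, so median regression minimizes the sum of absolute residuals.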

Figure 83.11 displays the estimated parameters, standard errors, 95% confidence intervals, t values, and p-values that are computed by the resampling method, which is requested by the CI= option. All of the parameters are considered significant because all of the p-values are smaller than 0.0001.

Figure 83.11: Parameter Estimates with Median Regression: Men

The QUANTREG Procedure

Parameter Estimates

Parameter     DF  Estimate  Standard Error  95% Confidence Limits  t Value  Pr > |t|
Intercept      1    7.8909          0.8168    6.2895      9.4924      9.66    <.0001
InveAge        1   -1.8354          0.4350   -2.6884     -0.9824     -4.22    <.0001
SqrtAge        1   -5.1247          0.7135   -6.5237     -3.7257     -7.18    <.0001
Age            1    1.9759          0.2537    1.4785      2.4733      7.79    <.0001
SqrtAge*Age    1   -0.3347          0.0424   -0.4179     -0.2515     -7.89    <.0001
Age*Age        1    0.0227          0.0029    0.0170      0.0284      7.77    <.0001
Age*Age*Age    1   -0.0000          0.0000   -0.0001     -0.0000     -7.40    <.0001



The TEST statement requests Wald, likelihood ratio, and rank tests for the significance of the cubic term in Age. The test results, shown in Figure 83.12, indicate that this term is significant. Higher-order terms are not significant.

Figure 83.12: Test of Significance for Cubic Term

Test test_age_cubic Results

Test              Test Statistic  DF  Chi-Square  Pr > ChiSq
Wald                     54.7417   1       54.74      <.0001
Likelihood Ratio         56.9473   1       56.95      <.0001
Rank_Tau                 42.5731   1       42.57      <.0001



Median regression and, more generally, quantile regression are robust to extremes of the response variable. The DIAGNOSTICS option in the MODEL statement requests a diagnostic table of outliers, shown in Figure 83.13, which uses a cutoff value that is specified in the CUTOFF= option. The variables that are specified in the ID statement are included in the table.

With CUTOFF=4.5, 14 men are identified as outliers. Thirteen of these men have large positive standardized residuals, which indicates that they are overweight for their age. One man (observation 2732) has a large negative standardized residual, which indicates that he is underweight for his age. The cutoff value 4.5 is ad hoc. It corresponds to an upper-tail probability less than 0.5E–5 if normality is assumed, but the standardized residuals for median regression usually do not meet this assumption.
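The normal tail probability that is associated with the cutoff can be checked directly. The following DATA step is a minimal sketch that is not part of the original example; the variable names cutoff, p_one, and p_two are illustrative, and the calculation assumes a standard normal distribution, which the standardized residuals usually do not follow.

```sas
/* Sketch: normal tail probabilities for the ad hoc cutoff 4.5.   */
/* The variable names cutoff, p_one, and p_two are illustrative.  */
data _null_;
   cutoff = 4.5;
   p_one  = probnorm(-cutoff);   /* P(Z > 4.5) = P(Z < -4.5)      */
   p_two  = 2 * p_one;           /* P(|Z| > 4.5)                  */
   put p_one= e12. p_two= e12.;  /* write both values to the log  */
run;
```

The one-sided probability is approximately 3.4E–6, which is below 0.5E–5. The two-sided probability is twice as large and exceeds 0.5E–5.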

In order to construct the chart shown in Figure 83.2, the same model that is used for median regression is used for other quantiles. The QUANTREG procedure can compute fitted values for multiple quantiles.

Figure 83.13: Diagnostics with Median Regression

Diagnostics

 Obs        Age        BMI  Standardized Residual  Outlier
1337   8.900000  36.500000                 5.3575  *
1376   9.200000  39.600000                 5.8723  *
1428   9.400000  36.900000                 5.3036  *
1505   9.900000  35.500000                 4.8862  *
1764  14.900000  46.800000                 5.6403  *
1838  16.200000  50.400000                 5.9138  *
1845  16.300000  42.600000                 4.6683  *
1870  16.700000  42.600000                 4.5930  *
1957  18.100000  49.900000                 5.5053  *
2002  18.700000  52.700000                 5.8106  *
2016  18.900000  48.400000                 5.1603  *
2264  32.000000  55.600000                 5.3085  *
2291  35.000000  60.900000                 5.9406  *
2732  66.000000  14.900000                -4.7849  *



The following statements request fitted values for 10 quantile levels that range from 0.03 to 0.97:

proc quantreg data=bmimen algorithm=interior(tolerance=1e-5) ci=none;
   model logbmi = inveage sqrtage age sqrtage*age
                    age*age age*age*age
                    / quantile=0.03,0.05,0.1,0.25,0.5,0.75,
                               0.85,0.90,0.95,0.97;
   output out=outp pred=p/columnwise;
run;

data outbmi;
   set outp;
   pbmi = exp(p);
run;

proc sgplot data=outbmi;
   title 'BMI Percentiles for Men: 2-80 Years Old';
   yaxis label='BMI (kg/m**2)' min=10 max=45 values=(10 15 20 25 30 35 40 45);
   xaxis label='Age (Years)' min=2 max=80 values=(2 10 20 30 40 50 60 70 80);

   scatter x=age y=bmi /markerattrs=(size=1);
   series  x=age y=pbmi/group=QUANTILE;
run;

The fitted values are stored in the OUTPUT data set outp. The COLUMNWISE option stacks the fitted values for all quantiles in the single variable p, grouped by quantile level, and the variable QUANTILE identifies the group to which each fitted value belongs. After the exponential transformation, both the fitted BMI values and the original BMI values are plotted against age to create the display shown in Figure 83.2.
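The exponential back-transformation in the DATA step relies on the fact that quantiles are equivariant under increasing transformations. In the notation below, which is introduced here for illustration, $Q_\tau$ denotes the conditional quantile at level $\tau$:

```latex
\[
Q_\tau\bigl(\mathrm{BMI} \mid \mathrm{Age}\bigr)
  = \exp\Bigl\{ Q_\tau\bigl(\log \mathrm{BMI} \mid \mathrm{Age}\bigr) \Bigr\}
\]
```

This property holds for every quantile level because the exponential function is strictly increasing. It does not hold for conditional means, which is one reason that a log-scale fit is convenient for quantile curves.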

The fitted quantile curves reveal important information. During the quick growth period (ages 2 to 20), the dispersion of BMI increases dramatically. It becomes stable during middle age, and then it contracts after age 60. This pattern suggests that effective population weight control should start in childhood.

Compared to the 97th percentile in reference growth charts that were published by the Centers for Disease Control and Prevention (CDC) in 2000 (Kuczmarski, Ogden, and Guo, 2002), the 97th percentile for 10-year-old boys in Figure 83.2 is 6.4 BMI units higher (an increase of 27%). This can be interpreted as a warning of overweight or obesity. See Chen (2005) for a detailed analysis.
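The two figures quoted in the preceding paragraph imply approximate values for the percentiles that are being compared. The following arithmetic is only a back-calculation from the rounded numbers 6.4 and 27%; the symbols $q_{\mathrm{CDC}}$ and $q_{\mathrm{fit}}$ are introduced here for illustration, and the results should not be read as published values.

```latex
\[
q_{\mathrm{CDC}} \approx \frac{6.4}{0.27} \approx 23.7,
\qquad
q_{\mathrm{fit}} \approx q_{\mathrm{CDC}} + 6.4 \approx 30.1
\]
```

Under these assumptions, the 97th percentile of BMI for 10-year-old boys is roughly 23.7 in the CDC reference chart and roughly 30.1 in the fitted chart.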