The QUANTREG Procedure
Growth Charts for Body Mass Index
Body mass index (BMI) is defined as the ratio of weight (kg) to squared height (m) and is a widely used measure for categorizing individuals as overweight or underweight. The percentiles of BMI for specified ages are of particular interest. As age increases, these percentiles provide growth patterns of BMI not only for the majority of the population, but also for underweight or overweight extremes of the population. In addition, the percentiles of BMI for a specified age provide a reference for individuals at that age with respect to the population.
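As a quick reference, the definition above can be written as a formula. The symbols w (weight) and h (height) are shorthand introduced here, and the numbers in the second line are an illustrative example, not values taken from the survey data:

```latex
\mathrm{BMI} = \frac{w}{h^{2}}, \qquad w \text{ in kg}, \; h \text{ in m}

w = 70, \; h = 1.75: \quad \mathrm{BMI} = \frac{70}{1.75^{2}} = \frac{70}{3.0625} \approx 22.9 \ \mathrm{kg/m^{2}}
```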
Smooth quantile curves have been widely used for reference charts in medical diagnosis to identify unusual subjects, whose measurements lie in the tails of the reference distribution. This example explains how to use the QUANTREG procedure to create growth charts for BMI.
A SAS data set named bmimen was created by merging and cleaning the 1999–2000 and 2001–2002 survey results for men published by the National Center for Health Statistics. This data set contains the variables WEIGHT (kg), HEIGHT (m), BMI (kg/m²), AGE (years), and SEQN (respondent sequence number) for 8,250 men. More details can be found in Chen (2005).
The data set used in this example is a subset of the original data set of Chen (2005). It contains two variables, BMI and AGE, and 3,264 observations.
    data bmimen0;
       input bmi age @@;
       datalines;
    18.6 2.0 17.1 2.0 19.0 2.0 16.8 2.0 19.0 2.1 15.5 2.1
    16.7 2.1 16.1 2.1 18.0 2.1 17.8 2.1 18.3 2.1 16.9 2.1
    15.9 2.1 20.6 2.1 16.7 2.1 15.4 2.1 15.9 2.1 17.7 2.1
    15.7 2.1 16.8 2.1 15.6 2.1 18.1 2.1 15.7 2.1 17.2 2.1
    14.5 2.2 17.2 2.2 16.3 2.2 15.4 2.2 16.0 2.2 15.8 2.2

       ... more lines ...

    29.0 80.0 24.1 80.0 26.6 80.0 24.2 80.0 22.7 80.0 28.4 80.0
    26.3 80.0 25.6 80.0 24.8 80.0 28.6 80.0 25.7 80.0 25.8 80.0
    22.5 80.0 25.1 80.0 27.0 80.0 27.9 80.0 28.5 80.0 21.7 80.0
    33.5 80.0 26.1 80.0 28.4 80.0 22.7 80.0 28.0 80.0 42.7 80.0
    ;
The logarithm of BMI is used as the response. (Although this does not improve the quantile regression fit, it helps with statistical inference.) A preliminary median regression is fitted with a parametric model that involves six powers of AGE. The following DATA step creates the transformed variables:
    data bmimen;
       set bmimen0;
       sqrtage = sqrt(age);
       inveage = 1/age;
       logbmi  = log(bmi);
    run;
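Written out, the "six powers of AGE" give the median model the following form. The coefficient names β₀ through β₆ are notation introduced here; each term corresponds to one effect in the MODEL statement of the PROC QUANTREG step (inveage, sqrtage, age, sqrtage*age, age*age, and age*age*age, in that order):

```latex
Q_{0.5}\bigl(\log \mathrm{BMI} \mid \mathrm{AGE}\bigr)
  = \beta_0
  + \beta_1\,\mathrm{AGE}^{-1}
  + \beta_2\,\mathrm{AGE}^{1/2}
  + \beta_3\,\mathrm{AGE}
  + \beta_4\,\mathrm{AGE}^{3/2}
  + \beta_5\,\mathrm{AGE}^{2}
  + \beta_6\,\mathrm{AGE}^{3}
```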
The following statements invoke the QUANTREG procedure:
    proc quantreg data=bmimen algorithm=interior ci=resampling;
       model logbmi = inveage sqrtage age sqrtage*age age*age age*age*age /
                      diagnostics cutoff=4.5 quantile=.5;
       id age bmi;
       test_age_cubic: test age*age*age / wald lr;
    run;
The MODEL statement specifies the model, and the QUANTILE=0.5 option requests median regression. The estimates are computed by the interior point algorithm, as requested with the ALGORITHM= option. See the section Interior Point Algorithm for details about this algorithm.
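For readers who want the underlying criterion, the usual check-loss formulation of quantile regression is sketched below. The notation (y_i for the response, x_i for the vector of regressors, τ for the quantile level) is introduced here and is not taken from this example; it shows why QUANTILE=0.5 amounts to minimizing the sum of absolute residuals:

```latex
\hat{\beta}(\tau) = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau\!\left(y_i - x_i^{\top}\beta\right),
\qquad
\rho_\tau(r) = r\,\bigl(\tau - \mathbf{1}\{r < 0\}\bigr)

\tau = 0.5: \quad \rho_{0.5}(r) = \tfrac{1}{2}\,|r|
\;\;\Longrightarrow\;\;
\hat{\beta}(0.5) = \arg\min_{\beta} \sum_{i=1}^{n} \bigl|\,y_i - x_i^{\top}\beta\,\bigr|
```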
Figure 72.10 displays the estimated parameters, standard errors, 95% confidence intervals, t values, and p-values that are computed by the resampling method, as requested by the CI= option. All of the parameters are considered significant, since the p-values are smaller than 0.001.
Parameter Estimates

Parameter | DF | Estimate | Standard Error | 95% Lower Confidence Limit | 95% Upper Confidence Limit | t Value | Pr > \|t\| |
---|---|---|---|---|---|---|---|
Intercept | 1 | 7.8909 | 0.7755 | 6.3704 | 9.4115 | 10.17 | <.0001 |
inveage | 1 | -1.8354 | 0.4187 | -2.6563 | -1.0144 | -4.38 | <.0001 |
sqrtage | 1 | -5.1247 | 0.6765 | -6.4511 | -3.7983 | -7.58 | <.0001 |
age | 1 | 1.9759 | 0.2410 | 1.5033 | 2.4484 | 8.20 | <.0001 |
sqrtage*age | 1 | -0.3347 | 0.0405 | -0.4141 | -0.2554 | -8.27 | <.0001 |
age*age | 1 | 0.0227 | 0.0028 | 0.0172 | 0.0282 | 8.12 | <.0001 |
age*age*age | 1 | -0.0000 | 0.0000 | -0.0001 | -0.0000 | -7.69 | <.0001 |
The TEST statement requests Wald and likelihood ratio tests for the significance of the cubic term in AGE. The test results, shown in Figure 72.11, indicate that this term is significant. Higher-order terms are not significant.
Median regression and, more generally, quantile regression are robust to extremes of the response variable. The DIAGNOSTICS option in the MODEL statement requests a diagnostic table of outliers, shown in Figure 72.12, which uses a cutoff value specified with the CUTOFF= option. The variables specified in the ID statement are included in the table.
With CUTOFF=4.5, 14 men are identified as outliers. Thirteen of these men have large positive standardized residuals, which indicates that they are overweight for their age; the remaining one (observation 2732) has a large negative standardized residual, which indicates that he is underweight for his age. The cutoff value 4.5 is ad hoc; it corresponds to a probability less than 0.5E-5 if normality is assumed, but the standardized residuals for median regression usually do not meet this assumption.
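The probability attached to the cutoff can be checked with a short DATA step. This is only a sketch under the normality assumption that the text itself calls doubtful; the variable names cutoff, upper_tail, and two_sided are illustrative and do not appear in the example:

```sas
/* Tail probability of a standard normal variate beyond the cutoff.   */
/* Assumes normality, which standardized residuals from median        */
/* regression usually do not satisfy.                                 */
data _null_;
   cutoff     = 4.5;
   upper_tail = 1 - cdf('NORMAL', cutoff);  /* P(Z > 4.5)             */
   two_sided  = 2 * upper_tail;             /* P(|Z| > 4.5)           */
   put upper_tail= e10. two_sided= e10.;    /* written to the SAS log */
run;
```

The upper-tail value is roughly 3.4E-6, which is below 0.5E-5 (that is, 5 × 10⁻⁶); the two-sided value is twice that.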
In order to construct the chart shown in Figure 72.2, the same model used for median regression is used for other quantiles. Note that the QUANTREG procedure can compute fitted values for multiple quantiles.
Diagnostics

Obs | age | bmi | Standardized Residual | Outlier |
---|---|---|---|---|
1337 | 8.900000 | 36.500000 | 5.3575 | * |
1376 | 9.200000 | 39.600000 | 5.8723 | * |
1428 | 9.400000 | 36.900000 | 5.3036 | * |
1505 | 9.900000 | 35.500000 | 4.8862 | * |
1764 | 14.900000 | 46.800000 | 5.6403 | * |
1838 | 16.200000 | 50.400000 | 5.9138 | * |
1845 | 16.300000 | 42.600000 | 4.6683 | * |
1870 | 16.700000 | 42.600000 | 4.5930 | * |
1957 | 18.100000 | 49.900000 | 5.5053 | * |
2002 | 18.700000 | 52.700000 | 5.8106 | * |
2016 | 18.900000 | 48.400000 | 5.1603 | * |
2264 | 32.000000 | 55.600000 | 5.3085 | * |
2291 | 35.000000 | 60.900000 | 5.9406 | * |
2732 | 66.000000 | 14.900000 | -4.7849 | * |
The following statements request fitted values for 10 quantiles ranging from 0.03 to 0.97:
    proc quantreg data=bmimen ci=none algorithm=interior(tolerance=1e-6);
       model logbmi = inveage sqrtage age sqrtage*age age*age age*age*age /
                      quantile=0.03,0.05,0.1,0.25,0.5,0.75,
                               0.85,0.90,0.95,0.97;
       output out=outp pred=p/columnwise;
    run;

    data outbmi;
       set outp;
       pbmi = exp(p);
    run;

    proc sgplot data=outbmi;
       title 'BMI Percentiles for Men: 2-80 Years Old';
       yaxis label='BMI (kg/m**2)' min=10 max=45 values=(10 15 20 25 30 35 40 45);
       xaxis label='Age (Years)' min=2 max=80 values=(2 10 20 30 40 50 60 70 80);
       scatter x=age y=bmi /markerattrs=(size=1);
       series x=age y=pbmi/group=QUANTILE;
    run;
The fitted values are stored in the OUTPUT data set outp. The COLUMNWISE option arranges these fitted values for all quantiles in the single variable p by groups of the quantiles. After the exponential transformation, the fitted BMI values together with the original BMI values are plotted against AGE to create the display shown in Figure 72.2.
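The exponential back-transformation is legitimate because quantiles, unlike means, are preserved under monotone increasing transformations. In the notation used here (Q_τ for the conditional τ-th quantile, which is not a symbol from the example itself), the property and its application to this model can be stated as:

```latex
Q_\tau\bigl(h(Y) \mid x\bigr) = h\bigl(Q_\tau(Y \mid x)\bigr)
\quad \text{for any nondecreasing } h

\Longrightarrow\quad
Q_\tau\bigl(\mathrm{BMI} \mid \mathrm{AGE}\bigr)
  = \exp\!\Bigl(Q_\tau\bigl(\log \mathrm{BMI} \mid \mathrm{AGE}\bigr)\Bigr)
```

So exponentiating the fitted value p for a given quantile yields the fitted BMI for that same quantile, which is what the variable pbmi holds.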
The fitted quantile curves reveal important information. During the quick growth period (ages 2 to 20), the dispersion of BMI increases dramatically; it becomes stable during middle age, and then it contracts after age 60. This pattern suggests that effective population weight control should start in childhood.
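One way to put numbers on this dispersion pattern is to measure, at each age, the distance between the lowest and highest fitted curves. The following statements are a sketch rather than part of the original example: the data set name spread, the variable names pbmi_low, pbmi_high, and band_width, and the ages listed in the WHERE statement are all illustrative, and the listed ages are assumed to occur in the data.

```sas
/* For each age, pbmi takes one fitted value per requested quantile.   */
/* Its minimum and maximum are the 0.03 and 0.97 curves, provided the  */
/* fitted quantile curves do not cross at that age (an assumption).    */
proc means data=outbmi noprint nway;
   class age;
   var pbmi;
   output out=spread(drop=_type_ _freq_)
          min=pbmi_low max=pbmi_high range=band_width;
run;

/* Inspect the band width at a few illustrative ages. */
proc print data=spread noobs;
   where age in (5, 10, 20, 40, 60, 80);
   format pbmi_low pbmi_high band_width 6.2;
run;
```

If the pattern described above holds, band_width should grow through age 20, level off in middle age, and shrink after age 60.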
Compared to the 97th percentile in reference growth charts published by CDC in 2000 (Kuczmarski et al. 2002), the 97th percentile for 10-year-old boys in Figure 72.2 is 6.4 BMI units higher (an increase of 27%). This can be interpreted as a warning of overweight or obesity. See Chen (2005) for a detailed analysis.
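To read the fitted percentiles for a single age directly from the output, rather than from the chart, the outbmi data set can be filtered and reduced to one row per quantile. This is a sketch: the data set name age10 is illustrative, and it assumes that at least one respondent has an AGE value of exactly 10.

```sas
/* Every respondent of the same age receives the same fitted value for */
/* a given quantile, so one row per quantile is enough.                */
proc sort data=outbmi(where=(age = 10))
          out=age10(keep=quantile age pbmi) nodupkey;
   by quantile;
run;

/* List the fitted BMI percentiles at age 10. */
proc print data=age10 noobs;
   var quantile age pbmi;
   format pbmi 6.2;
run;
```

The row for the 0.97 quantile is the fitted value that the comparison with the CDC reference chart refers to.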
Copyright © 2009 by SAS Institute Inc., Cary, NC, USA. All rights reserved.