This example is patterned after a quantile regression analysis of covariates associated with birth weight that was carried out by Koenker and Hallock (2001). Their study uses a subset of the June 1997 Detailed Natality Data, which was published by the National Center for Health Statistics. The study demonstrates that conditional quantile functions provide more complete information about the covariate effects than ordinary least squares regression provides.
Like Koenker and Hallock (2001) and Abrevaya (2001), this example uses data for live, singleton births to mothers in the United States who were recorded as black or white and who were between the ages of 18 and 45. For convenience, this example uses 50,000 observations, which are randomly selected from the qualified observations. Observations that have missing data for any of the variables are deleted. The data are available in the data set Sashelp.BWeight. The following step displays in Output 95.3.1 the variables in the data set:
    proc contents varnum data=sashelp.bweight;
       ods select position;
    run;
Output 95.3.1: Sashelp.BWeight Data Set

Variables in Creation Order

| # | Variable | Type | Len | Label |
|---|---|---|---|---|
| 1 | Weight | Num | 8 | Infant Birth Weight |
| 2 | Black | Num | 8 | Black Mother |
| 3 | Married | Num | 8 | Married Mother |
| 4 | Boy | Num | 8 | Baby Boy |
| 5 | MomAge | Num | 8 | Mother's Age |
| 6 | MomSmoke | Num | 8 | Smoking Mother |
| 7 | CigsPerDay | Num | 8 | Cigarettes Per Day |
| 8 | MomWtGain | Num | 8 | Mother's Pregnancy Weight Gain |
| 9 | Visit | Num | 8 | Prenatal Visit |
| 10 | MomEdLevel | Num | 8 | Mother's Education Level |
The following step creates descriptive labels for the values of the classification variables Visit and MomEdLevel:
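    proc format;
       /* Labels for Visit: timing of the first prenatal visit */
       value vfmt 0 = 'No Visit'
                  1 = 'Second Trimester'
                  2 = 'Last Trimester'
                  3 = 'First Trimester';
       /* Labels for MomEdLevel: mother's education level */
       value efmt 0 = 'High School'
                  1 = 'Some College'
                  2 = 'College'
                  3 = 'Less Than High School';
    run;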
There are four levels of maternal education. When you specify the ORDER=INTERNAL option, PROC QUANTREG treats the highest unformatted value (3, which indicates that the mother has less than a high school education) as the reference level. The regression coefficients of the other levels measure effects relative to this level. Likewise, there are four levels of prenatal medical care, and a first visit in the first trimester (unformatted value 3) serves as the reference level.
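To check which level falls last in internal order, and therefore acts as the reference level, you can tabulate the two classification variables. The following step is a minimal sketch; it assumes that the vfmt. and efmt. formats from the preceding PROC FORMAT step are available in the current session.

```sas
/* List the levels of Visit and MomEdLevel in unformatted (internal) order. */
/* The last row of each table corresponds to unformatted value 3.           */
proc freq data=sashelp.bweight order=internal;
   tables Visit MomEdLevel;
   format Visit vfmt. MomEdLevel efmt.;
run;
```

With ORDER=INTERNAL, the rows are sorted by unformatted value and not by formatted label, so 'First Trimester' and 'Less Than High School' should appear last in their respective tables.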
The following statements fit a regression model for 19 quantiles of birth weight, which are evenly spaced in the interval (0, 1) at 0.05, 0.10, ..., 0.95. The model includes linear and quadratic effects for the age of the mother and for weight gain during pregnancy.
    ods graphics on;

    proc quantreg ci=sparsity/iid algorithm=interior(tolerance=5.e-4)
                  data=sashelp.bweight order=internal;
       class Visit MomEdLevel;
       /* Linear and quadratic terms for mother's age and weight gain */
       model Weight = Black Married Boy Visit MomEdLevel MomSmoke
                      CigsPerDay MomAge MomAge*MomAge
                      MomWtGain MomWtGain*MomWtGain
                      / quantile= 0.05 to 0.95 by 0.05
                        plot=quantplot;
       format Visit vfmt. MomEdLevel efmt.;
    run;
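For reference, the following is the standard formulation of linear quantile regression, stated here as background and not taken from the output of this example. The symbols $y_i$ (birth weight), $\mathbf{x}_i$ (the row of the design matrix for observation $i$), and $\boldsymbol\beta(\tau)$ are notation introduced for illustration. For each quantile level $\tau$ in the list 0.05, 0.10, ..., 0.95, the coefficient estimate minimizes the sum of check losses:

$$
\hat{\boldsymbol\beta}(\tau) \;=\; \arg\min_{\boldsymbol\beta} \sum_{i=1}^{n} \rho_\tau\!\left(y_i - \mathbf{x}_i'\boldsymbol\beta\right),
\qquad
\rho_\tau(u) \;=\; u\,\bigl(\tau - \mathbf{1}\{u < 0\}\bigr)
$$

The fitted conditional quantile function is then $\hat{Q}_Y(\tau \mid \mathbf{x}) = \mathbf{x}'\hat{\boldsymbol\beta}(\tau)$, so each of the 19 quantile levels has its own coefficient vector. This is why the effect of a covariate can differ between the lower and upper tails of the birth-weight distribution.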
Output 95.3.2 displays the model information and summary statistics for the variables in the model.
Output 95.3.2: Model Information and Summary Statistics
| Model Information | | |
|---|---|---|
| Data Set | SASHELP.BWEIGHT | Infant Birth Weight |
| Dependent Variable | Weight | Infant Birth Weight |
| Number of Independent Variables | 9 | |
| Number of Continuous Independent Variables | 7 | |
| Number of Class Independent Variables | 2 | |
| Number of Observations | 50000 | |
| Optimization Algorithm | Interior | |
| Method for Confidence Limits | Sparsity | |

Summary Statistics

| Variable | Q1 | Median | Q3 | Mean | Standard Deviation | MAD |
|---|---|---|---|---|---|---|
| Black | 0 | 0 | 0 | 0.1628 | 0.3692 | 0 |
| Married | 0 | 1.0000 | 1.0000 | 0.7126 | 0.4525 | 0 |
| Boy | 0 | 1.0000 | 1.0000 | 0.5158 | 0.4998 | 0 |
| MomSmoke | 0 | 0 | 0 | 0.1307 | 0.3370 | 0 |
| CigsPerDay | 0 | 0 | 0 | 1.4766 | 4.6541 | 0 |
| MomAge | -4.0000 | 0 | 5.0000 | 0.4161 | 5.7285 | 5.9304 |
| MomAge*MomAge | 4.0000 | 16.0000 | 49.0000 | 32.9877 | 39.2861 | 22.2390 |
| MomWtGain | -8.0000 | 0 | 9.0000 | 0.7092 | 12.8761 | 11.8608 |
| MomWtGain*MomWtGain | 16.0000 | 64.0000 | 196.0 | 166.3 | 298.8 | 88.9561 |
| Weight | 3062.0 | 3402.0 | 3720.0 | 3370.8 | 566.4 | 504.1 |

The MODEL statement specifies 11 effects, which involve 9 distinct independent variables. Among these variables, Black, Married, Boy, and MomSmoke are binary. For these variables, the mean represents the proportion in the category. The continuous variables MomAge and MomWtGain are centered at their medians, which are 27 and 30, respectively.
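If you want to inspect these covariates on their original scales, you can add the medians back. The following step is a sketch that relies on the median values 27 and 30 quoted in the text; the output data set name and the new variable names are hypothetical.

```sas
/* Recover the uncentered covariates. The offsets 27 and 30 are the    */
/* medians stated in the text; the names below are illustrative only.  */
data work.bweight_raw;
   set sashelp.bweight;
   MomAgeYears   = MomAge + 27;     /* mother's age in years            */
   MomWtGainOrig = MomWtGain + 30;  /* weight gain on its original scale */
run;

/* For the binary variables, the mean is the proportion in the category. */
proc means data=work.bweight_raw n mean median min max;
   var Black Married Boy MomSmoke MomAgeYears MomWtGainOrig;
run;
```

The means that PROC MEANS reports for Black, Married, Boy, and MomSmoke should agree with the Mean column of the summary statistics table.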
The quantile plots for the intercept and the other 15 factors with nonzero degrees of freedom are shown in four panels, Output 95.3.3 through Output 95.3.6. In each plot, the regression coefficient at a given quantile indicates the effect on birth weight of a unit change in that factor, assuming that the other factors are fixed. The bands represent 95% confidence intervals.
Although the data set used here is a subset of the Natality data set, the results are quite similar to those of Koenker and Hallock (2001) for the full data set.
In Output 95.3.3, the first plot is for the intercept. As explained by Koenker and Hallock (2001), the intercept "may be interpreted as the estimated conditional quantile function of the birth-weight distribution of a girl born to an unmarried, white mother with less than a high school education, who is 27 years old and had a weight gain of 30 pounds, didn’t smoke, and had her first prenatal visit in the first trimester of the pregnancy."
The second plot shows that infants born to black mothers weigh less than infants born to white mothers, especially in the lower tail of the birth-weight distribution. The third plot shows that marital status has a large positive effect on birth weight, especially in the lower tail. The fourth plot shows that boys weigh more than girls for any chosen quantile; this difference is smaller in the lower quantiles of the distribution.
In Output 95.3.4, the first three plots deal with prenatal care. Compared with babies born to mothers who had a prenatal visit in the first trimester, babies born to mothers who received no prenatal care weigh less, especially in the lower quantiles of the birth-weight distribution. As noted by Koenker and Hallock (2001), "babies born to mothers who delayed prenatal visits until the second or third trimester have substantially higher birthweights in the lower tail than mothers who had a prenatal visit in the first trimester. This might be interpreted as the self-selection effect of mothers confident about favorable outcomes."
The fourth plot in Output 95.3.4 and the first two plots in Output 95.3.5 are for variables that are related to education. Education beyond high school is associated with a positive effect on birth weight. The effect of high school education is uniformly around 15 grams across the entire birth-weight distribution (this is a pure location shift effect), whereas the effect of some college and college education is more positive in the lower quantiles than the upper quantiles.
The remaining two plots in Output 95.3.5 show that smoking is associated with a large negative effect on birth weight.
The linear and quadratic effects for the two continuous variables are shown in Output 95.3.6. Both of these variables are centered at their medians. At the lower quantiles, the quadratic effect of the mother’s age is more concave. The optimal age at the first quantile is about 33, and the optimal age at the third quantile is about 38. The effect of the mother’s weight gain is clearly positive, as indicated by the narrow confidence bands for both linear and quadratic coefficients.
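The notion of an optimal age follows from the quadratic form of the age effect. As a sketch, let $a$ denote the centered age (age minus the median of 27 stated earlier), and let $\beta_1(\tau)$ and $\beta_2(\tau)$ denote the coefficients of MomAge and MomAge*MomAge at quantile level $\tau$; these symbols are introduced here for illustration and no coefficient values are taken from the output. The contribution of age to the conditional quantile is

$$
g_\tau(a) \;=\; \beta_1(\tau)\,a + \beta_2(\tau)\,a^2 .
$$

When $\beta_2(\tau) < 0$ (a concave effect), setting $g_\tau'(a) = \beta_1(\tau) + 2\beta_2(\tau)\,a = 0$ gives the maximizer

$$
a^{*}(\tau) \;=\; -\frac{\beta_1(\tau)}{2\,\beta_2(\tau)},
\qquad
\text{optimal age} \;=\; 27 + a^{*}(\tau) \;=\; 27 - \frac{\beta_1(\tau)}{2\,\beta_2(\tau)} .
$$

For instance, an optimal age of about 33 corresponds to $a^{*} = 6$, that is, $\beta_1(\tau) = -12\,\beta_2(\tau)$.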
For more information about the covariate effects that are discovered by using quantile regression, see Koenker and Hallock (2001).
Output 95.3.3: Quantile Processes with 95% Confidence Bands
Output 95.3.4: Quantile Processes with 95% Confidence Bands
Output 95.3.5: Quantile Processes with 95% Confidence Bands
Output 95.3.6: Quantile Processes with 95% Confidence Bands