This example uses the MCMC procedure to fit a Bayesian linear regression model with standardized covariates. It shows how the random walk Metropolis sampling algorithm struggles when the scales of the regression parameters are vastly different. It also illustrates that the sampling algorithm performs quite well when the covariates are standardized.
The following data set contains salary and performance information for Major League Baseball players (excluding pitchers) who played at least one game in both the 1986 and 1987 seasons. The salaries are for the 1987 season (Time Inc. 1987), and the performance measures are from the 1986 season (Reichler 1987).
data baseball;
   input logSalary no_hits no_runs no_rbi no_bb yr_major cr_hits @@;
   yr_major2 = yr_major*yr_major;
   cr_hits2  = cr_hits*cr_hits;
   label no_hits="Hits in 1986"      no_runs="Runs in 1986"
         no_rbi="RBIs in 1986"       no_bb="Walks in 1986"
         yr_major="Years in MLB"     cr_hits="Career Hits"
         yr_major2="Years in MLB^2"  cr_hits2="Career Hits^2"
         logSalary="log10(Salary)";
   datalines;
.            66  30 29 14  1   66
2.6766936096 81  24 38 39 14  835

   ... more lines ...

2.84509804   127 65 48 37  5  806
2.942008053  136 76 50 94 12 1511
2.5854607295 126 61 43 52  6  433
2.982271233  144 85 60 78  8  857
3            170 77 44 31 11 1457
;
The MEANS procedure produces summary statistics for these data. Summary measures are saved to the SUM_BASEBALL data set for future analysis.
proc means data=baseball mean stddev;
   output out=sum_baseball(drop=_type_ _freq_);
run;
Figure 1 displays the results.
Figure 1 Summary Statistics (columns: Variable, Label, Mean, Std Dev)
Suppose you want to fit a Bayesian linear regression model for the logarithm of a player's salary with density as follows:

   logSalary_i ~ normal(mu_i, sigma^2), where mu_i = x_i' * beta   (1)

where x_i = (1, no_hits_i, no_runs_i, no_rbi_i, no_bb_i, yr_major_i, cr_hits_i, yr_major2_i, cr_hits2_i) is the vector of covariates for the ith baseball player and beta = (beta0, beta1, ..., beta8)' is the vector of regression coefficients. Pete Rose was an extreme outlier in 1986, and his information greatly skews the results, so he is omitted from this data set and analysis.
The likelihood function for the logarithm of salary and the corresponding covariates is

   p(logSalary_i | mu_i, sigma^2) = phi(logSalary_i; mu_i, sigma^2)   (2)

where p(· | ·) denotes a conditional probability density and phi is the normal density, evaluated at the specified value of logSalary_i and the corresponding mean parameter mu_i defined in Equation 1. The regression parameters in the likelihood are beta0 through beta8.
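As a concrete illustration of the per-observation likelihood in Equation 2 (a NumPy sketch, not PROC MCMC internals; the function name and toy inputs are hypothetical):

```python
import numpy as np

def log_likelihood(log_salary, x, beta, sig2):
    # logSalary_i ~ N(mu_i, sig2) with mean mu_i = x_i' * beta
    mu = np.dot(x, beta)
    return -0.5 * np.log(2 * np.pi * sig2) - (log_salary - mu) ** 2 / (2 * sig2)
```

PROC MCMC evaluates this density for you; in the program you only declare it with the MODEL statement.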
Suppose the following prior distributions are placed on the parameters:

   pi(beta_j) = normal(beta_j; 0, var = 1000), j = 0, 1, ..., 8
   pi(sigma^2) = igamma(sigma^2; shape = 3/10, scale = 10/3)   (3)

where pi(·) indicates a prior distribution and igamma(·) is the density function for the inverse-gamma distribution. Priors of this type with large variances are often called diffuse priors.
Using Bayes' theorem, the likelihood function and prior distributions determine the posterior distribution of the parameters as follows:

   pi(beta0, ..., beta8, sigma^2 | logSalary) ∝ p(logSalary | beta0, ..., beta8, sigma^2) · pi(beta0) · · · pi(beta8) · pi(sigma^2)
PROC MCMC obtains samples from the desired posterior distribution. You do not need to specify the exact form of the posterior distribution.
The following SAS statements use the likelihood function and prior distributions to fit the Bayesian linear regression model. The PROC MCMC statement invokes the procedure and specifies the input data set. The NBI= option specifies the number of burn-in iterations. The NMC= option specifies the number of posterior simulation iterations. The SEED= option specifies a seed for the random number generator (the seed guarantees the reproducibility of the random stream). The PROPCOV=QUANEW option uses the estimated inverse Hessian matrix as the initial proposal covariance matrix.
ods graphics on;
proc mcmc data=baseball nbi=50000 nmc=10000 seed=1181 propcov=quanew;
   array beta[9] beta0-beta8;
   array data[9] 1 no_hits no_runs no_rbi no_bb yr_major cr_hits
                 yr_major2 cr_hits2;
   parms beta: 0;
   parms sig2 1;
   prior beta: ~ normal(0, var=1000);
   prior sig2 ~ igamma(shape=3/10, scale=10/3);
   call mult(beta, data, mu);
   model logsalary ~ n(mu, var=sig2);
run;
ods graphics off;
Each of the two ARRAY statements associates a name with a list of variables and constants. The first ARRAY statement specifies names for the regression coefficients. The second ARRAY statement contains all of the covariates.
The first PARMS statement places all regression parameters in a single block and assigns them an initial value of 0. The second PARMS statement places the variance parameter in a separate block and assigns it an initial value of 1.
The first PRIOR statement assigns the normal prior to each of the regression parameters. The second PRIOR statement assigns the inverse-gamma prior distribution to the variance parameter sig2.
The CALL statement uses the MULT matrix multiplication function to calculate the mean mu = x' * beta. The MODEL statement specifies the likelihood function as given in Equation 2.
The first step in evaluating the results is to review the convergence diagnostics. With ODS Graphics turned on, PROC MCMC produces diagnostic graphs. Figure 2 displays convergence diagnostic graphs for one of the regression parameters. The trace plot indicates that the chain does not appear to have reached a stationary distribution, and it mixes poorly. The diagnostic plots for the rest of the parameters (not shown here) tell a similar story.
Figure 2 Bayesian Diagnostic Plots
The nonconvergence exhibited here results because the parameters are scaled very differently from each other for these data. The random walk Metropolis algorithm is not an optimal sampling algorithm in the case where the parameters have vastly different scales. Standardized covariates (Mayer and Younger 1976) eliminate this problem, and the random walk Metropolis algorithm proceeds smoothly.
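To see why a single random walk proposal struggles with mismatched scales, consider this minimal Metropolis sketch (illustrative only; PROC MCMC's actual sampler uses a tuned proposal covariance, and the two-component target here is a hypothetical stand-in for parameters on very different scales):

```python
import numpy as np

def rw_metropolis(logpdf, x0, step, n, rng):
    """Random walk Metropolis with one isotropic proposal scale."""
    x = np.asarray(x0, dtype=float).copy()
    lp = logpdf(x)
    accepted = 0
    for _ in range(n):
        prop = x + step * rng.standard_normal(x.size)
        lp_prop = logpdf(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
            accepted += 1
    return accepted / n

# Two independent normal parameters on very different scales
# (sd 1 vs sd 0.001), mimicking unstandardized covariate effects.
target = lambda x: -0.5 * (x[0] ** 2 + (x[1] / 0.001) ** 2)

rng = np.random.default_rng(1181)
# A step tuned to the large-scale parameter overshoots the small-scale
# one almost every time, so nearly all proposals are rejected.
acc_big = rw_metropolis(target, [0.0, 0.0], step=1.0, n=2000, rng=rng)
# Shrinking the step to fit the small scale restores acceptance, but the
# large-scale parameter then moves in tiny increments (slow mixing).
acc_small = rw_metropolis(target, [0.0, 0.0], step=0.001, n=2000, rng=rng)
```

Neither step size works well for both components at once, which is exactly the pathology that standardizing the covariates removes.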
Suppose you want to fit the same Bayesian linear regression model, but you want to use standardized covariates. You rewrite the mean function in Equation 1 as

   mu = Z * betastar

where Z is the design matrix constructed from a column of 1s and the standardized covariates, and betastar = (betastar0, ..., betastar8)' represents the regression parameters on the standardized scale. The standardized covariates are computed as follows:

   z_ij = (x_ij - xbar_j) / s_j

for i = 1, ..., n players and j = 1, ..., 8 covariates, where xbar_j and s_j are the mean and standard deviation of the jth covariate, respectively.
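The standardization formula can be sketched in NumPy as follows (the covariate values here are hypothetical; in the example itself, PROC STANDARD performs this computation):

```python
import numpy as np

# Hypothetical covariate matrix: rows are players, columns are covariates.
X = np.array([[ 66.0,   66.0],
              [ 81.0,  835.0],
              [127.0,  806.0],
              [136.0, 1511.0]])

xbar = X.mean(axis=0)        # column means (the M1-M8 macro variables)
s = X.std(axis=0, ddof=1)    # sample std devs (the S1-S8 macro variables)
Z = (X - xbar) / s           # z_ij = (x_ij - xbar_j) / s_j
```

Each standardized column has mean 0 and standard deviation 1, which is what PROC STANDARD with MEAN=0 STD=1 produces for every VAR variable.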
The following statements manipulate the SUM_BASEBALL output data set from the earlier use of PROC MEANS. The statements create macro variables for the means and standard deviations to use later in the analysis. Macro variables are independent of SAS data set variables and can be referenced in SAS procedures to facilitate computations. The TRANSPOSE procedure transposes the SUM_BASEBALL data set, and a DATA step creates the macro variables by using the CALL SYMPUTX routine. The %PUT statements enable you to verify that the macro variables have been created successfully.
proc transpose data=sum_baseball out=tab;
   id _stat_;
run;

data _null_;
   set tab;
   sub = put((_n_-1), 1.);
   call symputx(compress('m'||sub), mean);
   call symputx(compress('s'||sub), std);
run;

%put &m1 &m2 &m3 &m4 &m5 &m6 &m7 &m8;
%put &s1 &s2 &s3 &s4 &s5 &s6 &s7 &s8;
In this example, the covariate means xbar_j and standard deviations s_j were calculated by the MEANS procedure and recorded in the macro variables M1–M8 and S1–S8, respectively. The STANDARD procedure computes standardized values of the variables in the original data set.
proc standard data=baseball out=baseball_std mean=0 std=1;
   var no_hits--cr_hits2;
run;
The new likelihood function for the logarithm of the salary and the corresponding standardized covariates is as follows:

   p(logSalary_i | mu_i, sigma^2) = phi(logSalary_i; mu_i, sigma^2), where mu_i = z_i' * betastar
For ease of interpretation and inference, you can transform the standardized regression parameters back to the original scale with the following formulas:

   beta_j = betastar_j / s_j, j = 1, ..., 8
   beta0 = betastar0 - (betastar1 * xbar_1 / s_1 + ... + betastar8 * xbar_8 / s_8)
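The back-transformation can be verified with a short NumPy sketch: the standardized-scale and original-scale parameterizations must produce identical fitted means. The function name and random inputs are hypothetical:

```python
import numpy as np

def to_original_scale(betastar, xbar, s):
    """beta_j = betastar_j / s_j (j >= 1) and
    beta0 = betastar_0 - sum_j betastar_j * xbar_j / s_j."""
    beta = betastar[1:] / s
    beta0 = betastar[0] - np.sum(beta * xbar)
    return beta0, beta

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3)) * [10.0, 500.0, 1e5]   # wildly different scales
xbar, s = X.mean(axis=0), X.std(axis=0, ddof=1)
Z = (X - xbar) / s
betastar = rng.normal(size=4)                       # intercept + 3 slopes
beta0, beta = to_original_scale(betastar, xbar, s)
# betastar[0] + Z @ betastar[1:] equals beta0 + X @ beta for every row
```

This identity is what the BEGINPRIOR/ENDPRIOR block in the PROC MCMC program below exploits to recover the original-scale coefficients at each iteration.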
Suppose the following diffuse prior distribution is placed on each standardized-scale regression parameter:

   pi(betastar_j) = normal(betastar_j; 0, var = 1000), j = 0, 1, ..., 8
The prior distribution for sigma^2 is given in Equation 3.
Using Bayes' theorem, the likelihood function and prior distributions determine the posterior distribution of the parameters as follows:

   pi(betastar0, ..., betastar8, sigma^2 | logSalary) ∝ p(logSalary | betastar0, ..., betastar8, sigma^2) · pi(betastar0) · · · pi(betastar8) · pi(sigma^2)
The following SAS statements fit the Bayesian linear regression model. The MONITOR= option specifies the symbols of interest for which posterior analyses are produced.
ods graphics on;
proc mcmc data=baseball_std nbi=10000 nmc=20000 seed=1181
          propcov=quanew monitor=(beta0-beta8 sig2);
   array beta[9] beta0-beta8 (0);
   array betastar[9] betastar0-betastar8;
   array data[9] 1 no_hits no_runs no_rbi no_bb yr_major cr_hits
                 yr_major2 cr_hits2;
   array mn[9] (0 &m1 &m2 &m3 &m4 &m5 &m6 &m7 &m8);
   array std[9] (0 &s1 &s2 &s3 &s4 &s5 &s6 &s7 &s8);
   parms betastar: 0;
   parms sig2 1;
   prior betastar: ~ normal(0, var=1000);
   prior sig2 ~ igamma(shape=3/10, scale=10/3);
   call mult(betastar, data, mu);
   model logsalary ~ n(mu, var=sig2);
   beginprior;
      summ = 0;
      do i = 2 to 9;
         beta[i] = betastar[i]/std[i];
         summ = summ + beta[i]*mn[i];
      end;
      beta0 = betastar0 - summ;
   endprior;
run;
ods graphics off;
The first two ARRAY statements specify names for the regression coefficients on the original and standardized scales, respectively. The last three ARRAY statements, for the DATA, MN, and STD vectors, take advantage of PROC MCMC's ability to use both matrix functions and the SAS programming language. The PARMS, PRIOR, and MODEL statements use the same syntax as in the first call to the MCMC procedure.
The BEGINPRIOR and ENDPRIOR statements reduce unnecessary observation-level computations. The statements they enclose form a block that runs only once per iteration, rather than once for each observation at each iteration, which enables a quick update of the symbols in the block. The statements within the BEGINPRIOR and ENDPRIOR block transform the sampled standardized-scale parameters betastar back to the original-scale parameters beta.
The trace plot in Figure 3 indicates that the chain appears to have reached a stationary distribution and exhibits good, dense mixing. The autocorrelation plot indicates low autocorrelation and efficient sampling. Finally, the kernel density plot shows the smooth, unimodal shape of the posterior marginal distribution for the parameter. The remaining diagnostic plots (not shown here) similarly indicate good convergence for the other parameters. Using standardized covariates resolves the convergence problems for this model and these data.
Figure 3 Bayesian Diagnostic Plots Using Standardization
Figure 4 reports summary and interval statistics for all of the parameters. For example, the mean salary increases by an estimated factor of 10^0.1042 ≈ 1.27 (approximately 27%) for each year the player was in Major League Baseball. Similarly, using the same formula, 10^beta_j, you can see how the mean salary changes with a one-unit change in each of the covariates. Both the equal-tail and highest posterior density (HPD) intervals include 0 for beta1, beta2, and beta3, indicating that the change in salary with respect to these covariates is not significant. The number of years played seems to be the most influential covariate, followed by the number of career hits.
Figure 4 Posterior Model Summary of Bayesian Linear Regression with Standardized Covariates
Posterior Summaries

                                 Standard               Percentiles
Parameter      N        Mean     Deviation       25%         50%         75%
beta0        20000    1.6465     0.0666       1.6006      1.6470      1.6935
beta1        20000   -0.00007    0.000938    -0.00071    -0.00004     0.000594
beta2        20000    0.000882   0.00167     -0.00023     0.000860    0.00200
beta3        20000    0.00186    0.000993     0.00119     0.00186     0.00253
beta4        20000    0.00218    0.000980     0.00152     0.00217     0.00281
beta5        20000    0.1042     0.0205       0.0902      0.1038      0.1176
beta6        20000    0.000748   0.000163     0.000642    0.000750    0.000857
beta7        20000   -0.00629    0.000978    -0.00692    -0.00629    -0.00562
beta8        20000   -1.46E-7    5.867E-8    -1.86E-7    -1.47E-7    -1.08E-7
sig2         20000    0.0595     0.00533      0.0558      0.0592      0.0629
Posterior Intervals

Parameter    Alpha      Equal-Tail Interval          HPD Interval
beta0        0.050     1.5123      1.7714        1.5120      1.7705
beta1        0.050    -0.00195     0.00165      -0.00192     0.00168
beta2        0.050    -0.00235     0.00417      -0.00233     0.00418
beta3        0.050    -0.00006     0.00382      -0.00001     0.00383
beta4        0.050     0.000236    0.00412       0.000303    0.00416
beta5        0.050     0.0651      0.1450        0.0625      0.1415
beta6        0.050     0.000428    0.00107       0.000428    0.00107
beta7        0.050    -0.00827    -0.00443      -0.00822    -0.00442
beta8        0.050    -2.62E-7    -3.17E-8      -2.63E-7    -3.32E-8
sig2         0.050     0.0498      0.0705        0.0494      0.0699
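The roughly 27% interpretation quoted in the text follows directly from the log10 response scale, as this line of arithmetic shows (the posterior mean 0.1042 for the yr_major coefficient is taken from Figure 4):

```python
# The response is log10(salary), so a coefficient beta translates into a
# multiplicative effect of 10**beta on the salary scale.
beta5 = 0.1042           # posterior mean for the yr_major coefficient
factor = 10 ** beta5     # about 1.27, i.e., roughly a 27% increase per year
```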
Mayer, L. S. and Younger, M. S. (1976), “Estimation of Standardized Regression Coefficients,” Journal of the American Statistical Association , 71(353), 154–157.
Reichler, J. L., ed. (1987), The 1987 Baseball Encyclopedia Update , New York: Macmillan.
Time Inc. (1987), “What They Make,” Sports Illustrated , 54–81.