When modeling the probability of an event, p, the INVERSECL option in PROC PROBIT provides estimates of a continuous predictor (often a dose variable) that yield a range of response probabilities. See the examples in the PROBIT documentation. Suppose you have several groups of subjects and you fit a model to each group. You may want to compare groups with respect to the dose that yields, say, a 50% response probability (commonly called the ED50 or LD50). The following discusses and illustrates how tests and confidence intervals of differences of such dose values can be obtained.
PROC PROBIT fits the model F-1(p)=x'β, where F is the cumulative distribution function used to model the probability (normal for probit models or logistic for logistic models), x is the vector of predictor values, and β is the corresponding vector of parameters.
The first step is to write a single model that encompasses all groups by simultaneously fitting separate curves to all of the groups. This is discussed in generality in this usage note, but is illustrated below for the case of modeling a response probability. Group dose values can then be compared by testing a function of the parameters in this model. The model that simultaneously estimates curves for all groups can be fitted by including the group variable in the CLASS and MODEL statements. For example, these statements
proc probit; class group; model response = group x; run;
fit the following parallel lines probit model with separate lines (lines on the probit scale, but curves on the probability scale) for each of the groups:
F-1(p) = probit(p) = α + γi + βX ,
where α is the overall intercept, γi is the parameter for the ith group representing the deviation of the group i intercept from the overall intercept, and β is the common slope on X.
A model that allows for nonparallel lines on the probit scale can be obtained by adding the GROUP×X interaction. These statements
proc probit; class group; model response = group x group*x; run;
fit the nonparallel lines probit model
F-1(p) = probit(p) = α + γi + βX + δiX = α + γi + (β+δi)X ,
where δi is the deviation of the group i slope from the common slope, β.
By simple algebraic rearrangement of the model, F-1(p)=x'β, the following estimator of the dose, ^ x , yielding a given response probability, p, is obtained as shown in "Inverse Confidence Limits" in the Details section of the PROBIT documentation:
^ x = β-1[F-1(p) - x*'b*]
where β is the dose variable's parameter, x* is the vector of predictors other than the dose variable, and b* is the corresponding vector of parameter estimates. Suppose you have two groups, 0 and 1, and want to compare their ED50 values. Since the ED50 is the dose giving a 50% response, the response probability is p=0.5. Then the difference in group dose estimates in the parallel lines model is
^ x1 - ^ x0 = β-1[F-1(0.5) - α] - β-1[F-1(0.5) - (α+γ0)] = γ0/β
With the group variable in the CLASS statement, the last level (1 in this case) is treated as the reference category, so its parameter, γ1, is set to zero. Note that the difference is invariant with respect to the chosen response probability, p, since the lines are parallel. For the nonparallel lines model, the difference in group dose estimates is
^ x1 - ^ x0 = β-1[F-1(0.5) - α] - (β+δ0)-1[F-1(0.5) - (α+γ0)]
The deviation of the group 1 slope, δ1, is zero since group 1 is the reference level.
While these differences cannot be estimated in PROC PROBIT, they can be estimated using the NLEstimate macro or equivalently by fitting the model in PROC NLMIXED and using the ESTIMATE statement.
The data in the example titled "An Epidemiology Study" in the PROBIT documentation records the number of individuals surviving after an epidemic, r, out of the number treated, n, for combinations of dosage of a medicine (dose), treatment (treat = A, B), and sex (sex = 0(Female), 1(Male)). For purposes of this discussion, the treatment variable is not used.
data epidemic; input treat$ dose n r sex @@; label dose = Dose; datalines; A 2.17 142 142 0 A .57 132 47 1 A 1.68 128 105 1 A 1.08 126 100 0 A 1.79 125 118 0 B 1.66 117 115 1 B 1.49 127 114 0 B 1.17 51 44 1 B 2.00 127 126 0 B .80 129 100 1 ;
These statements fit the nonparallel lines model that simultaneously fits unrestricted lines to both sexes. However, with the SEX×DOSE interaction in the model, PROC PROBIT can provide neither the separate ED50 estimates for the sexes nor the difference in the estimates using the INVERSECL option. The STORE statement is used to save the model information for use later by the NLEstimate macro.
proc probit data=epidemic; class sex; model r/n = sex dose sex*dose; store out=prbmod; run;
The estimated parameters of the model appear below.
|
To obtain the ED50 estimates for the two sexes, the following statements are used instead. They model the effect of dosage on the response probability, separately for each sex, using a probit model. Note that the models for the two sexes are unrestricted, allowing the sexes to have separate intercepts and slopes. The INVERSECL(PROB=0.5) option requests an estimate of the dosage that produces a 50% response probability (ED50) for each sex.
proc sort data=epidemic; by sex; run; proc probit data=epidemic; by sex; model r/n = dose / inversecl(prob=0.5); run;
Note that the ED50 estimates are 0.581 for females (sex=0) and .571 for males (sex=1). The difference (males - females) is -0.01.
sex=0
sex=1
|
Difference estimation using the NLEstimate macro
Since these are nonlinear functions of the model parameters, they can be estimated using the NLEstimate macro. The NLEstimate macro uses the fitted model information saved with the STORE statement in PROC PROBIT. It then uses PROC NLMIXED to estimate the functions and the delta method to obtain confidence limits. When estimating several functions, it is useful to specify them in a data set with appropriate labels. This is done in the following statements and the resulting data set is specified in the fdata= NLEstimate macro parameter. You write the functions to be estimated using the parameter names. See the description of the NLEstimate macro for details about displaying parameter names and using the macro.
data fd; length label f $32767; infile datalines delimiter=','; input label f; datalines; Male ED50,(1/b_p4)*(probit(0.5)-b_p1) Female ED50,(1/(b_p4+b_p5))*(probit(0.5)-(b_p1+b_p2)) Male - Female ED50,(1/b_p4)*(probit(0.5)-b_p1) - (1/(b_p4+b_p5))*(probit(0.5)-(b_p1+b_p2)) ; %NLEstimate(instore=prbmod, fdata=fd, df=10)
The estimated difference in ED50 values (males - females) is -0.01 as found earlier when fitting separate modelsNote. In addition to the point estimate, the standard error of the difference (0.1560), a 95% confidence interval (-0.3575, 0.3375), and a test that the difference equals zero (p=0.9502) are given. The test and confidence interval indicate that the small difference is not significant.
|
Since the test of the SEX×DOSE interaction in the above model is not significant (p ≈ 0.4), you might conclude that the effects of dose are the same in the two sexes and prefer a parallel lines model. The following statements fit the parallel lines model and obtain the ED50 estimates for each sex. To do this, PROC PROBIT is run two times, fitting the same model using all of the data from both sexes, but using different XDATA= data sets to estimate the two ED50 values. The XDATA= data set allows you to specify the group for which the INVERSECL option obtains estimates. The data set should contain one observation which defines the setting of the covariates for which you want dose estimates. The data set must also contain some nonmissing value for the response, though the actual value does not affect the computation. In data set FEMALE, SEX=0 and in data set MALE, SEX=1. In both data sets, DOSE is arbitrarily set to 1. Specifying XDATA=FEMALE in PROC PROBIT causes the INVERSECL(PROB=0.5) option to report the ED50 estimate for females. Similarly, XDATA=MALE provides the ED50 estimate for males. The STORE statement is used in the first PROBIT step to save the model for use in the NLEstimate macro.
/* Request ED50 estimate for females */ data female; input sex dose; datalines; 0 1 ; proc probit data=epidemic xdata=female; class sex; model r/n = sex dose / inversecl(prob=0.5); store out=prllmod; title 'Female ED50 estimate'; run; /* Request ED50 estimate for males */ data male; input sex dose; datalines; 1 1 ; proc probit data=epidemic xdata=male; class sex; model r/n = sex dose / inversecl(prob=0.5); title 'Male ED50 estimate'; run;
|
The ED50 for each sex and their difference are again specified in a data set and estimated by the NLEstimate macro.
data fd; length label f $32767; infile datalines delimiter=','; input label f; datalines; Male ED50,(1/b_p4)*(probit(0.5)-b_p1) Female ED50,(1/b_p4)*(probit(0.5)-(b_p1+b_p2)) Male - Female ED50,b_p2/b_p4 ; %NLEstimate(instore=prllmod, fdata=fd, df=10)
While the ED50 values are slightly more different than under the nonparallel lines model (0.12 vs -0.01), the difference is still not significant (p=0.2172).
|
Difference estimation using PROC NLMIXED
The ED50 estimates and difference can also be obtained by using PROC NLMIXED directly. The following fits the nonparallel lines model using PROC NLMIXED and estimates the difference in ED50 values. The model is specified by the assignment statement defining the response probability, p, as F(x'β), and by the MODEL statement indicating the number of responses, r, is distributed as binomial with number of trials, n, and response probability p. The first ESTIMATE statement estimates the ED50 for males using the first half of the difference for nonparallel models shown above. Similarly, the second ESTIMATE statement estimates the ED50 for females using the second half. The third ESTIMATE statement estimates the male - female difference in ED50 values. Note that since this is a probit model, F is the cumulative standard normal distribution. The PROBNORM function returns values from this distribution, and the PROBIT function returns values of its inverse.
proc nlmixed data=epidemic; p=probnorm(b0 + bsex*(sex=0) + bdose*dose + bsexdose*dose*(sex=0)); model r ~ binomial(n,p); estimate "Male ED50" (1/bdose)*(probit(0.5)-b0); estimate "Female ED50" (1/(bdose+bsexdose))*(probit(0.5)-(b0+bsex)); estimate "Male - Female ED50" (1/bdose)*(probit(0.5)-b0) - (1/(bdose+bsexdose))*(probit(0.5)-(b0+bsex)); run;
The results are the same as from the NLEstimate macro above.
PROC NLMIXED can again be used to fit the parallel lines model and provide an estimate, test, and confidence interval comparing the ED50 values. When fitting a parallel lines model, you assume that the difference in values that provides a 50% response is the same as the difference at any response probability. Note that as shown above, the estimator of the difference does not depend on the chosen response probability. In this example, the parameter for the nonreference level of the group variable, γ0 is BSEX, and the common slope, β, for the dose variable is BDOSE.
proc nlmixed data=epidemic; p=probnorm(b0 + bsex*(sex=0) + bdose*dose); model r ~ binomial(n,p); estimate "Male ED50" (1/bdose)*(probit(0.5)-b0); estimate "Female ED50" (1/bdose)*(probit(0.5)-(b0+bsex)); estimate "Male - Female ED50" bsex/bdose; run;
The results again match those from the NLEstimate macro above.
_____
Note: The PROC PROBIT step without the BY statement uses all the data from both sexes to fit the nonparallel lines model. The PROC PROBIT step with the BY statement fits separate models for the two sexes using only the data from one sex in each model. Because of this, the results may not be identical. Fitting a single model to all groups, using all the data, is usually preferable.
Product Family | Product | System | SAS Release | |
Reported | Fixed* | |||
SAS System | SAS/STAT | z/OS | ||
OpenVMS VAX | ||||
Microsoft® Windows® for 64-Bit Itanium-based Systems | ||||
Microsoft Windows Server 2003 Datacenter 64-bit Edition | ||||
Microsoft Windows Server 2003 Enterprise 64-bit Edition | ||||
Microsoft Windows XP 64-bit Edition | ||||
Microsoft® Windows® for x64 | ||||
OS/2 | ||||
Microsoft Windows 95/98 | ||||
Microsoft Windows 2000 Advanced Server | ||||
Microsoft Windows 2000 Datacenter Server | ||||
Microsoft Windows 2000 Server | ||||
Microsoft Windows 2000 Professional | ||||
Microsoft Windows NT Workstation | ||||
Microsoft Windows Server 2003 Datacenter Edition | ||||
Microsoft Windows Server 2003 Enterprise Edition | ||||
Microsoft Windows Server 2003 Standard Edition | ||||
Microsoft Windows Server 2003 for x64 | ||||
Microsoft Windows Server 2008 | ||||
Microsoft Windows Server 2008 for x64 | ||||
Microsoft Windows XP Professional | ||||
Windows 7 Enterprise 32 bit | ||||
Windows 7 Enterprise x64 | ||||
Windows 7 Home Premium 32 bit | ||||
Windows 7 Home Premium x64 | ||||
Windows 7 Professional 32 bit | ||||
Windows 7 Professional x64 | ||||
Windows 7 Ultimate 32 bit | ||||
Windows 7 Ultimate x64 | ||||
Windows Millennium Edition (Me) | ||||
Windows Vista | ||||
Windows Vista for x64 | ||||
64-bit Enabled AIX | ||||
64-bit Enabled HP-UX | ||||
64-bit Enabled Solaris | ||||
ABI+ for Intel Architecture | ||||
AIX | ||||
HP-UX | ||||
HP-UX IPF | ||||
IRIX | ||||
Linux | ||||
Linux for x64 | ||||
Linux on Itanium | ||||
OpenVMS Alpha | ||||
OpenVMS on HP Integrity | ||||
Solaris | ||||
Solaris for x64 | ||||
Tru64 UNIX |
Type: | Usage Note |
Priority: | |
Topic: | SAS Reference ==> Procedures ==> PROBIT Analytics ==> Categorical Data Analysis Analytics ==> Regression SAS Reference ==> Macro |
Date Modified: | 2019-05-03 13:58:17 |
Date Created: | 2011-11-22 14:44:09 |