44931 - Comparing groups with respect to the dose (or predictor value) that yields a specified response probability

Usage Note 44931: Comparing groups with respect to the dose (or predictor value) that yields a specified response probability

When modeling the probability of an event, p, the INVERSECL option in PROC PROBIT provides estimates of a continuous predictor (often a dose variable) that yield a range of response probabilities. See the examples in the PROBIT documentation. Suppose you have several groups of subjects and you fit a model to each group. You may want to compare groups with respect to the dose that yields, say, a 50% response probability (commonly called the ED50 or LD50). The following discusses and illustrates how tests and confidence intervals of differences of such dose values can be obtained.

PROC PROBIT fits the model F^-1(p)=x'β, where F is the cumulative distribution function used to model the probability (normal for probit models or logistic for logistic models), x is the vector of predictor values, and β is the corresponding vector of parameters.

The first step is to write a single model that encompasses all groups by simultaneously fitting separate curves to all of the groups. This is discussed in generality in this usage note, but is illustrated below for the case of modeling a response probability. Group dose values can then be compared by testing a function of the parameters in this model. The model that simultaneously estimates curves for all groups can be fitted by including the group variable in the CLASS and MODEL statements. For example, these statements

      proc probit;
        class group;
        model response = group x;
        run;

fit the following parallel lines probit model with separate lines (lines on the probit scale, but curves on the probability scale) for each of the groups:

F^-1(p) = probit(p) = α + γ_i + βX ,

where α is the overall intercept, γ_i is the parameter for the i^th group representing the deviation of the group i intercept from the overall intercept, and β is the common slope on X.

A model that allows for nonparallel lines on the probit scale can be obtained by adding the GROUP×X interaction. These statements

      proc probit;
        class group;
        model response = group x group*x;
        run;

fit the nonparallel lines probit model

F^-1(p) = probit(p) = α + γ_i + βX + δ_iX = α + γ_i + (β+δ_i)X ,

where δ_i is the deviation of the group i slope from the common slope, β.

By simple algebraic rearrangement of the model, F^-1(p)=x'β, the following estimator of the dose, ^ x , yielding a given response probability, p, is obtained as shown in "Inverse Confidence Limits" in the Details section of the PROBIT documentation:

^ x = β^-1[F^-1(p) - x^*'b^*]

where β is the dose variable's parameter, x^* is the vector of predictors other than the dose variable, and b^* is the corresponding vector of parameter estimates. Suppose you have two groups, 0 and 1, and want to compare their ED50 values. Since the ED50 is the dose giving a 50% response, the response probability is p=0.5. Then the difference in group dose estimates in the parallel lines model is

^ x₁ - ^ x₀ = β^-1[F^-1(0.5) - α] - β^-1[F^-1(0.5) - (α+γ₀)] = γ₀/β

With the group variable in the CLASS statement, the last level (1 in this case) is treated as the reference category, so its parameter, γ₁, is set to zero. Note that the difference is invariant with respect to the chosen response probability, p, since the lines are parallel. For the nonparallel lines model, the difference in group dose estimates is

^ x₁ - ^ x₀ = β^-1[F^-1(0.5) - α] - (β+δ₀)^-1[F^-1(0.5) - (α+γ₀)]

The deviation of the group 1 slope, δ₁, is zero since group 1 is the reference level.

While these differences cannot be estimated in PROC PROBIT, they can be estimated using the NLEstimate macro or equivalently by fitting the model in PROC NLMIXED and using the ESTIMATE statement.

Example

The data in the example titled "An Epidemiology Study" in the PROBIT documentation records the number of individuals surviving after an epidemic, r, out of the number treated, n, for combinations of dosage of a medicine (dose), treatment (treat = A, B), and sex (sex = 0(Female), 1(Male)). For purposes of this discussion, the treatment variable is not used.

      data epidemic;
         input treat$ dose n r sex @@;
         label dose = Dose;
         datalines;
      A  2.17 142 142  0   A   .57 132  47  1
      A  1.68 128 105  1   A  1.08 126 100  0
      A  1.79 125 118  0   B  1.66 117 115  1
      B  1.49 127 114  0   B  1.17  51  44  1
      B  2.00 127 126  0   B   .80 129 100  1
      ;

These statements fit the nonparallel lines model that simultaneously fits unrestricted lines to both sexes. However, with the SEX×DOSE interaction in the model, PROC PROBIT can provide neither the separate ED50 estimates for the sexes nor the difference in the estimates using the INVERSECL option. The STORE statement is used to save the model information for use later by the NLEstimate macro.

      proc probit data=epidemic;
         class sex;
         model r/n = sex dose sex*dose;
         store out=prbmod;
         run;

The estimated parameters of the model appear below.

Analysis of Maximum Likelihood Parameter Estimates
Parameter		DF	Estimate	Standard Error	95% Confidence Limits		Chi-Square	Pr > ChiSq
Intercept		1	-0.7273	0.1514	-1.0242	-0.4305	23.06	<.0001
sex	0	1	-0.1465	0.3691	-0.8699	0.5768	0.16	0.6913
sex	1	0	0.0000	.	.	.	.	.
dose		1	1.2738	0.1360	1.0073	1.5404	87.74	<.0001
dose*sex	0	1	0.2303	0.2645	-0.2880	0.7487	0.76	0.3838
dose*sex	1	0	0.0000	.	.	.	.	.

To obtain the ED50 estimates for the two sexes, the following statements are used instead. They model the effect of dosage on the response probability, separately for each sex, using a probit model. Note that the models for the two sexes are unrestricted, allowing the sexes to have separate intercepts and slopes. The INVERSECL(PROB=0.5) option requests an estimate of the dosage that produces a 50% response probability (ED50) for each sex.

      proc sort data=epidemic; 
         by sex; 
         run;
      proc probit data=epidemic;
         by sex;
         model r/n = dose / inversecl(prob=0.5);
         run;

Note that the ED50 estimates are 0.581 for females (sex=0) and .571 for males (sex=1). The difference (males - females) is -0.01.

sex=0

Probit Analysis on dose
Probability	dose	95% Fiducial Limits
0.50	0.58096	0.19929	0.79834

sex=1

Probit Analysis on dose
Probability	dose	95% Fiducial Limits
0.50	0.57098	0.41290	0.68820

Difference estimation using the NLEstimate macro

Since these are nonlinear functions of the model parameters, they can be estimated using the NLEstimate macro. The NLEstimate macro uses the fitted model information saved with the STORE statement in PROC PROBIT. It then uses PROC NLMIXED to estimate the functions and the delta method to obtain confidence limits. When estimating several functions, it is useful to specify them in a data set with appropriate labels. This is done in the following statements and the resulting data set is specified in the fdata= NLEstimate macro parameter. You write the functions to be estimated using the parameter names. See the description of the NLEstimate macro for details about displaying parameter names and using the macro.

      data fd; 
        length label f $32767; 
        infile datalines delimiter=',';
        input label f; 
        datalines;
      Male ED50,(1/b_p4)*(probit(0.5)-b_p1)
      Female ED50,(1/(b_p4+b_p5))*(probit(0.5)-(b_p1+b_p2))
      Male - Female ED50,(1/b_p4)*(probit(0.5)-b_p1) - (1/(b_p4+b_p5))*(probit(0.5)-(b_p1+b_p2))
      ;
      %NLEstimate(instore=prbmod, fdata=fd, df=10)

The estimated difference in ED50 values (males - females) is -0.01 as found earlier when fitting separate models^Note. In addition to the point estimate, the standard error of the difference (0.1560), a 95% confidence interval (-0.3575, 0.3375), and a test that the difference equals zero (p=0.9502) are given. The test and confidence interval indicate that the small difference is not significant.

Additional Estimates
Label	Estimate	Standard Error	DF	t Value	Pr > \|t\|	Alpha	Lower	Upper
Male ED50	0.5710	0.06792	10	8.41	<.0001	0.05	0.4197	0.7223
Female ED50	0.5810	0.1404	10	4.14	0.0020	0.05	0.2681	0.8938
Male - Female ED50	-0.00999	0.1560	10	-0.06	0.9502	0.05	-0.3575	0.3375

Since the test of the SEX×DOSE interaction in the above model is not significant (p ≈ 0.4), you might conclude that the effects of dose are the same in the two sexes and prefer a parallel lines model. The following statements fit the parallel lines model and obtain the ED50 estimates for each sex. To do this, PROC PROBIT is run two times, fitting the same model using all of the data from both sexes, but using different XDATA= data sets to estimate the two ED50 values. The XDATA= data set allows you to specify the group for which the INVERSECL option obtains estimates. The data set should contain one observation which defines the setting of the covariates for which you want dose estimates. The data set must also contain some nonmissing value for the response, though the actual value does not affect the computation. In data set FEMALE, SEX=0 and in data set MALE, SEX=1. In both data sets, DOSE is arbitrarily set to 1. Specifying XDATA=FEMALE in PROC PROBIT causes the INVERSECL(PROB=0.5) option to report the ED50 estimate for females. Similarly, XDATA=MALE provides the ED50 estimate for males. The STORE statement is used in the first PROBIT step to save the model for use in the NLEstimate macro.

      /* Request ED50 estimate for females */
      data female;
         input sex dose;
         datalines;
      0 1
      ;
      proc probit data=epidemic xdata=female;
         class sex;
         model r/n = sex dose / inversecl(prob=0.5);
         store out=prllmod;
         title 'Female ED50 estimate';
         run;

      /* Request ED50 estimate for males */
      data male;
         input sex dose;
         datalines;
      1 1
      ;
      proc probit data=epidemic xdata=male;
         class sex;
         model r/n = sex dose / inversecl(prob=0.5);
         title 'Male ED50 estimate';
         run;

Female ED50 estimate

Probit Analysis on dose
Probability	dose	95% Fiducial Limits
0.50	0.47217	0.23121	0.65389

Male ED50 estimate

Probit Analysis on dose
Probability	dose	95% Fiducial Limits
0.50	0.59193	0.46146	0.69734

The ED50 for each sex and their difference are again specified in a data set and estimated by the NLEstimate macro.

      data fd; 
        length label f $32767; 
        infile datalines delimiter=',';
        input label f; 
        datalines;
      Male ED50,(1/b_p4)*(probit(0.5)-b_p1)
      Female ED50,(1/b_p4)*(probit(0.5)-(b_p1+b_p2))
      Male - Female ED50,b_p2/b_p4
      ;
      %NLEstimate(instore=prllmod, fdata=fd, df=10)

While the ED50 values are slightly more different than under the nonparallel lines model (0.12 vs -0.01), the difference is still not significant (p=0.2172).

Additional Estimates
Label	Estimate	Standard Error	DF	t Value	Pr > \|t\|	Alpha	Lower	Upper
Male ED50	0.5919	0.05896	10	10.04	<.0001	0.05	0.4606	0.7233
Female ED50	0.4722	0.1052	10	4.49	0.0012	0.05	0.2378	0.7066
Male - Female ED50	0.1198	0.09092	10	1.32	0.2172	0.05	-0.08282	0.3223

Difference estimation using PROC NLMIXED

The ED50 estimates and difference can also be obtained by using PROC NLMIXED directly. The following fits the nonparallel lines model using PROC NLMIXED and estimates the difference in ED50 values. The model is specified by the assignment statement defining the response probability, p, as F(x'β), and by the MODEL statement indicating the number of responses, r, is distributed as binomial with number of trials, n, and response probability p. The first ESTIMATE statement estimates the ED50 for males using the first half of the difference for nonparallel models shown above. Similarly, the second ESTIMATE statement estimates the ED50 for females using the second half. The third ESTIMATE statement estimates the male - female difference in ED50 values. Note that since this is a probit model, F is the cumulative standard normal distribution. The PROBNORM function returns values from this distribution, and the PROBIT function returns values of its inverse.

      proc nlmixed data=epidemic;
         p=probnorm(b0 + bsex*(sex=0) + bdose*dose + bsexdose*dose*(sex=0));
         model r ~ binomial(n,p);
         estimate "Male ED50" (1/bdose)*(probit(0.5)-b0);
         estimate "Female ED50" (1/(bdose+bsexdose))*(probit(0.5)-(b0+bsex));
         estimate "Male - Female ED50" (1/bdose)*(probit(0.5)-b0) - 
                 (1/(bdose+bsexdose))*(probit(0.5)-(b0+bsex));
         run;

The results are the same as from the NLEstimate macro above.

PROC NLMIXED can again be used to fit the parallel lines model and provide an estimate, test, and confidence interval comparing the ED50 values. When fitting a parallel lines model, you assume that the difference in values that provides a 50% response is the same as the difference at any response probability. Note that as shown above, the estimator of the difference does not depend on the chosen response probability. In this example, the parameter for the nonreference level of the group variable, γ₀ is BSEX, and the common slope, β, for the dose variable is BDOSE.

      proc nlmixed data=epidemic;
         p=probnorm(b0 + bsex*(sex=0) + bdose*dose);
         model r ~ binomial(n,p);
         estimate "Male ED50" (1/bdose)*(probit(0.5)-b0);
         estimate "Female ED50" (1/bdose)*(probit(0.5)-(b0+bsex));
         estimate "Male - Female ED50" bsex/bdose;
         run;

The results again match those from the NLEstimate macro above.

_____

Note: The PROC PROBIT step without the BY statement uses all the data from both sexes to fit the nonparallel lines model. The PROC PROBIT step with the BY statement fits separate models for the two sexes using only the data from one sex in each model. Because of this, the results may not be identical. Fitting a single model to all groups, using all the data, is usually preferable.

Operating System and Release Information

Product Family	Product	System	SAS Release
Product Family	Product	System	Reported	Fixed*
SAS System	SAS/STAT	z/OS
		OpenVMS VAX
		Microsoft® Windows® for 64-Bit Itanium-based Systems
		Microsoft Windows Server 2003 Datacenter 64-bit Edition
		Microsoft Windows Server 2003 Enterprise 64-bit Edition
		Microsoft Windows XP 64-bit Edition
		Microsoft® Windows® for x64
		OS/2
		Microsoft Windows 95/98
		Microsoft Windows 2000 Advanced Server
		Microsoft Windows 2000 Datacenter Server
		Microsoft Windows 2000 Server
		Microsoft Windows 2000 Professional
		Microsoft Windows NT Workstation
		Microsoft Windows Server 2003 Datacenter Edition
		Microsoft Windows Server 2003 Enterprise Edition
		Microsoft Windows Server 2003 Standard Edition
		Microsoft Windows Server 2003 for x64
		Microsoft Windows Server 2008
		Microsoft Windows Server 2008 for x64
		Microsoft Windows XP Professional
		Windows 7 Enterprise 32 bit
		Windows 7 Enterprise x64
		Windows 7 Home Premium 32 bit
		Windows 7 Home Premium x64
		Windows 7 Professional 32 bit
		Windows 7 Professional x64
		Windows 7 Ultimate 32 bit
		Windows 7 Ultimate x64
		Windows Millennium Edition (Me)
		Windows Vista
		Windows Vista for x64
		64-bit Enabled AIX
		64-bit Enabled HP-UX
		64-bit Enabled Solaris
		ABI+ for Intel Architecture
		AIX
		HP-UX
		HP-UX IPF
		IRIX
		Linux
		Linux for x64
		Linux on Itanium
		OpenVMS Alpha
		OpenVMS on HP Integrity
		Solaris
		Solaris for x64
		Tru64 UNIX

* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.

Type:	Usage Note
Priority:
Topic:	SAS Reference ==> Procedures ==> PROBIT Analytics ==> Categorical Data Analysis Analytics ==> Regression SAS Reference ==> Macro

Date Modified:	2019-05-03 13:58:17
Date Created:	2011-11-22 14:44:09

Support

Usage Note 44931: Comparing groups with respect to the dose (or predictor value) that yields a specified response probability

Example

Operating System and Release Information