69873 - Estimate the predictor value yielding a specified response value

Usage Note 69873: Estimate the predictor value yielding a specified response value

When using a regression method to fit a model to data, one of the most common goals is to use that model to estimate the response mean, μ, for a given setting of the predictor variables in the model. However, the reverse of this goal is sometimes of interest – that is, to estimate the value of a predictor variable that yields a particular response value while holding the other predictors in the model, if any, at fixed values. This is sometimes referred to as calibration or inverse regression.

Consider a generalized linear model

g(μ) = Xβ ,

where g(·) is a link function, X is the model matrix of predictor settings, and β are the model parameters to be estimated. Suppose you want to estimate the value of the X₁ predictor yielding a response mean value of μ₁. Rewrite the model as

g(μ) = β₁x₁ + X^*β^* ,

where X^*β^* is the portion of the model matrix and parameter vector omitting x₁ and its parameter, β₁. Note that this includes the intercept column and parameter if present in the model.

Solving for x₁

x₁ = (g(μ) - X^*β^*) / β₁ .

So, an estimate of the x₁ value yielding response mean μ₁ is

^ x₁ = (g(μ₁) - X^* ^ β^* ) / ^ β₁ .

Note that this estimator is a ratio of random values with model parameters in both the numerator and the denominator. While a point estimate for x₁ is easily produced using the estimated model parameters, obtaining a standard error and confidence interval is not so straightforward. Since this is a nonlinear function of the model parameters, the ESTIMATE statement that is available in many modeling procedures cannot be used since it estimates only linear functions. However, you can use the NLEST macro (SAS Note 58775) to estimate nonlinear functions. If you fit the model using the NLMIXED procedure, it can also be done using the ESTIMATE statement in that procedure, which is not restricted to linear functions. With either approach, the standard error is estimated using the delta method.

For binary response models such as logistic and probit models, inverse regression estimates can be obtained more simply by fitting the model in the PROBIT procedure and using the INVERSECL option. Examples can be found in SAS Note 44931 and at the end of SAS Note 56476. Also see the examples in the PROBIT procedure documentation titled "Dosage Levels" and "An Epidemiology Study."

Normal Response Model with Interaction

Consider the following model for a normally distributed response variable, Y, as a function of X₁, X₂, and their interaction. For this ordinary regression model, the link function is the identity link, g(μ)=μ.

μ = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂

Rewrite the model as

μ = β₀ + (β₁ + β₃x₂) x₁ + β₂x₂ .

Then the estimate of x₁ yielding a response mean of μ₁ is

^ x₁ = (μ₁ - ^ β₀ - ^ β₂ x₂) / ( ^ β₁ + ^ β₃ x₂ ) .

The following statements generate data from the model using known parameter values as shown. TrueX1at50 computes the actual value of X1, under this model, that produces a mean value of Y equal to 50 in each observation. This value changes only as x₂ changes.

     data a; 
        call streaminit(23422);
        scale=10;
        do x1=1 to 10;
         do x2=1 to 5;
          do n=1 to 1000; 
            mean=10 + 5*x1 - 2*x2 + 0.5*x1*x2;  
            std=scale;
            TrueX1at50 = (50-10+2*x2)/(5+0.5*x2); 
            y=rand('normal',mean,scale);
            output; 
          end;
         end;
        end; 
        run;

These statements fit the normal response model with interaction using the GENMOD procedure, though other regression procedures could be used. The EFFECTPLOT statement shows the predicted values from the model as a set of lines for the values of x₂. The OUTPUT statement saves the predicted values in data set OUT. The STORE statement saves the fitted model in an item store named GEN. This will be used the NLEST macro.

     proc genmod data=a;
        model y=x1|x2;
        effectplot slicefit(x=x1 sliceby=x2);
        output out=out p=p;
        store gen;
        run;

The following shows the fitted model parameters. Note that they closely match the true model specified in the DATA step code above. The fitted model is displayed in the plot that follows.

The NLEST macro is called to estimate the nonlinear function of model parameters as shown above. Note that the macro uses names for the model parameters of the form B_P1, B_P2, and so on in the order provided by the modeling procedure. Since the function involves the x2 values, score=a is specified so that the macro estimates the function for each observation in data set A. When score= is specified, the default output data set produced by the macro is named NLEST.

     %nlest(instore=gen, f=(50-b_p1-b_p3*x2)/(b_p2+b_p4*x2), score=a)

These statements display the estimated values of x₁ that yield a response mean of 50 for each value of x₂ along with their standard errors and 95% large-sample confidence intervals. The SORT step simply keeps one observation per x₂ value since all observations with the same x₂ value have the sample estimate.

     proc sort data=nlest out=nlest2 nodupkey;
        by x2;
        run;
     proc print data=nlest2 label;
        id x2;
        var mean std TrueX1at50 pred stderrpred lower upper;
        label TrueX1at50="True X1" pred="Estimated X1";
        title "Estimates of X1 yielding Y=50 for X2=1 to 5";
        title2 "with 95% confidence intervals";
        run;

Note that the x₁ values yielding Y=50 estimated using the generated data closely match the actual values (True X1) under the specified model.

The following statements produce a plot showing the fitted model, the estimated x₁ values and their confidence intervals.

     data nlest2; set nlest2; 
        pos=30+2*_n_; pos2=pos+1;
        predval=" "||put(pred,4.2); 
        run;
     data out2; set out nlest2(keep=pos pred predval x2 pos2 lower upper); run;
     proc sgplot;
        series y=p x=x1 / group=x2;
        dropline y=50 x=pred / dropto=both;
        text x=pred y=pos text=predval / position=right textattrs=(size=10);
        highlow y=pos2 low=lower high=upper / lowcap=serif highcap=serif;
        xaxis min=5 max=8 label="X1";
        yaxis min=30 max=60 label="Y";
        run;

The same estimates can be produced by fitting the model in the NLMIXED procedure. These statements fit the model and use the PREDICT statement to estimate the x1 values yielding Y=50. The ESTIMATE statement cannot be used since the function involves values in the data. The same estimated values, standard errors, and confidence intervals as shown above are computed and saved in data set EST for each observation.

     proc nlmixed data=a df=1e8;
        mean = b0 + b1*x1 + b2*x2 + b3*x1*x2;
        model y ~ normal(mean,s**2);
        predict (50-b0-b2*x2)/(b1+b3*x2) out=est;
        run;

Gamma Response Model

The same process can be used for generalized linear models involving other response distributions. The following DATA step generates response values from the gamma distribution as a function of a single predictor, x. The generating model uses the log link function. The model is

log(μ) = β₀ + β₁x

The goal is to estimate the value of x that produces a mean response of μ=6. Using the above formula, the actual value of x yielding a mean of 6 is (log(6)-1)/0.15 = 5.28.

     data a; 
        call streaminit(23422);
        scale=.5;
        do x=1 to 10;
         do n=1 to 100;
           mean=exp(1 + 0.15*x);
           lambda=mean/scale;
           std=sqrt(mean**2/scale);
           y=rand('gamma',scale,lambda);
           output; 
         end;
        end;
        run;

The GENMOD procedure is again used to fit the model although another procedure could be used. The fitted model is plotted and saved and the predicted values are saved in a data set.

     proc genmod data=a; 
        model y = x / dist=gamma link=log; 
        output out=out p=p; 
        effectplot; 
        store gen; 
        run;

The parameter estimates of the fitted model closely agree with the true parameters specified in the generating model above.

Since there is only one predictor in the model, there is only a single estimate of the value producing the desired response. In this case, you can simply specify the desired function in f= in the NLEST macro. Optionally, you can specify a label for the estimate in label=. If the predictor value for more than one response mean is to be estimated, you can create a data set containing all of the function expressions and then specify that data set in fdata=. You can see examples using fdata= in the NLEST macro documentation (SAS Note 58775). By default, the result from the macro are saved in a data set named EST.

     %nlest(instore=gen, label=X: Y=6, f=(log(6)-b_p1)/b_p2)

The result displayed by the macro shows that the estimate of x that produces a response mean of 6 is 5.59 with 95% large-sample confidence interval (4.97,6.20). The point estimate is slightly off from the true value of 5.28 due to sampling error, but it is well within the confidence interval.

These statements replot the model and add the estimate of x and its confidence interval.

     data out2; set out est; 
        pos=4; pos2=pos+1; 
        predval=" "||put(estimate,4.2); 
        run;
     proc sgplot noautolegend;
        series y=p x=x;
        dropline y=6 x=estimate / dropto=both;
        text x=estimate y=pos text=predval / position=right textattrs=(size=10);
        highlow y=pos2 low=lower high=upper / lowcap=serif highcap=serif;
        xaxis label="X";
        yaxis label="Y";
        title "Estimates of X yielding Y=6. True X=5.28";
        title2 "with 95% confidence interval";
        run;

Again, the same estimates can be produced by fitting the model in the NLMIXED procedure. Since the function estimate does not involve any data values from other predictors, the ESTIMATE statement can be used. The following statements produce the same estimated values, standard errors and confidence intervals as shown above.

     proc nlmixed data=a df=1e8;
        mean=exp(b0+b1*x);
        lambda=mean/s;   
        model y ~ gamma(s,lambda);
        estimate 'x:y=6' (log(6)-b0)/b1;
        run;

Operating System and Release Information

Product Family	Product	System	SAS Release
Product Family	Product	System	Reported	Fixed*
SAS System	N/A	Netezza TwinFin 64-bit SMP Hosts
		Teradata on Linux
		Solaris
		Solaris for x64
		OpenVMS on HP Integrity
		OpenVMS Alpha
		Cloud Foundry
		64-bit Enabled AIX
		64-bit Enabled HP-UX
		64-bit Enabled Solaris
		ABI+ for Intel Architecture
		AIX
		HP-UX
		HP-UX IPF
		IRIX
		Linux
		Linux for AArch64
		Linux for x64
		Linux on Itanium
		Netezza TwinFin 64-bit S-Blades
		Netezza TwinFin 32bit blade
		Netezza TwinFin 32-bit SMP Hosts
		DB2 Universal Database on Linux x64
		DB2 Universal Database on AIX
		Aster Data nCluster on Linux x64
		Tru64 UNIX
		z/OS
		z/OS 64-bit
		IBM AS/400
		OpenVMS VAX
		N/A
		Android Operating System
		Apple Mobile Operating System
		Chrome Web Browser
		Macintosh
		Macintosh on x64
		Microsoft Windows 10
		Microsoft Windows 7
		Microsoft Windows 8 Enterprise 32-bit
		Microsoft Windows 8 Enterprise x64
		Microsoft Windows 8 Pro 32-bit
		Microsoft Windows 8 Pro x64
		Microsoft Windows 8 x64
		Microsoft Windows Server 2008 R2
		Microsoft Windows Server 2012 R2 Datacenter
		Microsoft Windows Server 2012 R2 Std
		Microsoft® Windows® for 64-Bit Itanium-based Systems
		Microsoft Windows Server 2003 Datacenter 64-bit Edition
		Microsoft Windows Server 2003 Enterprise 64-bit Edition
		Microsoft Windows XP 64-bit Edition
		Microsoft® Windows® for x64
		OS/2
		SAS Cloud
		Microsoft Windows 8.1 Enterprise 32-bit
		Microsoft Windows 8.1 Enterprise x64
		Microsoft Windows 8.1 Pro 32-bit
		Microsoft Windows 8.1 Pro x64
		Microsoft Windows 11
		Microsoft Windows 95/98
		Microsoft Windows 2000 Advanced Server
		Microsoft Windows 2000 Datacenter Server
		Microsoft Windows 2000 Server
		Microsoft Windows 2000 Professional
		Microsoft Windows NT Workstation
Microsoft Windows Server 2003 Datacenter Edition
Microsoft Windows Server 2003 Enterprise Edition
Microsoft Windows Server 2003 Standard Edition
Microsoft Windows Server 2003 for x64
Microsoft Windows Server 2008
Microsoft Windows Server 2008 for x64
Microsoft Windows Server 2012 Datacenter
Microsoft Windows Server 2012 Std
Microsoft Windows Server 2016
Microsoft Windows Server 2019
Microsoft Windows Server 2022
Microsoft Windows XP Professional
Windows 7 Enterprise 32 bit
Windows 7 Enterprise x64
Windows 7 Home Premium 32 bit
Windows 7 Home Premium x64
Windows 7 Professional 32 bit
Windows 7 Professional x64
Windows 7 Ultimate 32 bit
Windows 7 Ultimate x64
Windows Millennium Edition (Me)
Windows Vista
Windows Vista for x64

* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.

Type:	Usage Note
Priority:
Topic:	Analytics ==> Regression SAS Reference ==> Macro SAS Reference ==> Procedures ==> NLMIXED SAS Reference ==> Procedures ==> GENMOD

Date Modified:	2023-02-17 15:55:56
Date Created:	2023-02-16 11:28:24

Support

Usage Note 69873: Estimate the predictor value yielding a specified response value

Normal Response Model with Interaction

Gamma Response Model

Operating System and Release Information