In this example, a series of people are asked whether or not they would subscribe to a new newspaper. For each person, the variables sex (Female, Male), age, and subs (1=yes, 0=no) are recorded. The PROBIT procedure is used to fit a logistic regression model to the probability of a positive response (subscribing) as a function of the variables sex and age. Specifically, the probability of subscribing is modeled as

$$\Pr(\text{subs}=1) = F(\beta_0 + \beta_1\,\text{sex} + \beta_2\,\text{age})$$

where F is the cumulative logistic distribution function.
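For reference, the same model can be written out explicitly. The sketch below uses the standard logistic CDF and codes sex as an indicator $I_{\text{Female}}$ (1 for Female, 0 for Male). The indicator notation is introduced here only for illustration; it is chosen to match the sexFemale parameter in Output 75.3.1, where Male is the reference level.

$$
F(x) = \frac{1}{1 + e^{-x}}, \qquad
\Pr(\text{subs}=1) = F\bigl(\beta_0 + \beta_1 I_{\text{Female}} + \beta_2\,\text{age}\bigr)
$$

$$
\text{equivalently}\quad
\log\frac{\Pr(\text{subs}=1)}{1-\Pr(\text{subs}=1)} = \beta_0 + \beta_1 I_{\text{Female}} + \beta_2\,\text{age}
$$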
By default, the PROBIT procedure models the probability of the lower response level for binary data. One way to model $\Pr(\text{subs}=1)$ is to format the response variable so that the formatted value corresponding to subs=1 is the lower level. The following statements format the values of subs as 1 = 'accept' and 0 = 'reject'. Because 'accept' sorts before 'reject', PROBIT models $\Pr(\text{accept}) = \Pr(\text{subs}=1)$. These statements produce Output 75.3.1.
```sas
/* One record per person: sex, age, and subs (1 = yes, 0 = no) */
data news;
   input sex $ age subs @@;   /* @@ holds the line so several people are read per data line */
   datalines;
Female 35 0   Male   44 0   Male   45 1   Female 47 1   Female 51 0
Female 47 0   Male   54 1   Male   47 1   Female 35 0   Female 34 0
Female 48 0   Female 56 1   Male   46 1   Female 59 1   Female 46 1
Male   59 1   Male   38 1   Female 39 0   Male   49 1   Male   42 1
Male   50 1   Female 45 0   Female 47 0   Female 30 1   Female 39 0
Female 51 0   Female 45 0   Female 43 1   Male   39 1   Male   31 0
Female 39 0   Male   34 0   Female 52 1   Female 46 0   Male   58 1
Female 50 1   Female 32 0   Female 52 1   Female 35 0   Female 51 0
;

/* 'accept' sorts before 'reject', so subs=1 becomes the modeled (lower) level */
proc format;
   value subscrib 1 = 'accept' 0 = 'reject';
run;

proc probit data=news;
   class subs sex;                            /* binary response and categorical covariate */
   model subs=sex age / d=logistic itprint;   /* logistic CDF; print the iteration history */
   format subs subscrib.;
run;
```
Output 75.3.1: Logistic Regression of Subscription Status
Iteration History for Parameter Estimates

Iter | Ridge | Loglikelihood | Intercept | sexFemale | age |
---|---|---|---|---|---|
0 | 0 | -27.725887 | 0 | 0 | 0 |
1 | 0 | -20.142659 | -3.634567629 | -1.648455751 | 0.1051634384 |
2 | 0 | -19.52245 | -5.254865196 | -2.234724956 | 0.1506493473 |
3 | 0 | -19.490439 | -5.728485385 | -2.409827238 | 0.1639621828 |
4 | 0 | -19.490303 | -5.76187293 | -2.422349862 | 0.1649007124 |
5 | 0 | -19.490303 | -5.7620267 | -2.422407743 | 0.1649050312 |
6 | 0 | -19.490303 | -5.7620267 | -2.422407743 | 0.1649050312 |
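The starting row of the iteration history can be checked by hand. At iteration 0 every parameter is 0, so the linear predictor is 0 for all 40 observations and each fitted probability is $p_i = F(0) = 1/2$. Assuming the usual Bernoulli log likelihood for binary data, with $y_i = 1$ for the modeled level, this gives:

$$
\ell = \sum_{i=1}^{40}\bigl[y_i \log p_i + (1-y_i)\log(1-p_i)\bigr]
= 40\log\tfrac{1}{2} = -40\log 2 \approx -27.725887
$$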
Model Information | |
---|---|
Data Set | WORK.NEWS |
Dependent Variable | subs |
Number of Observations | 40 |
Name of Distribution | Logistic |
Log Likelihood | -19.49030281 |
Class Level Information

Name | Levels | Values |
---|---|---|
subs | 2 | accept reject |
sex | 2 | Female Male |
Last Evaluation of the Negative of the Gradient

Intercept | sexFemale | age |
---|---|---|
-5.95557E-12 | 8.768324E-10 | -1.6367E-8 |
Last Evaluation of the Negative of the Hessian

| | Intercept | sexFemale | age |
|---|---|---|---|
Intercept | 6.4597397447 | 4.6042218284 | 292.04051848 |
sexFemale | 4.6042218284 | 4.6042218284 | 216.20829515 |
age | 292.04051848 | 216.20829515 | 13487.329973 |
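The gradient values above are essentially zero, which is consistent with convergence at the final estimates. Under standard maximum likelihood theory, the estimated covariance matrix of the estimates is the inverse of the negative Hessian $H$ shown above (rows and columns ordered Intercept, sexFemale, age), and the standard errors are the square roots of its diagonal. As a rough hand check for age, using rounded values:

$$
\det H \approx 2.01\times 10^{3},\qquad
(H^{-1})_{33} = \frac{H_{11}H_{22} - H_{12}^2}{\det H}
\approx \frac{6.4597 \cdot 4.6042 - 4.6042^2}{2.01\times 10^{3}} \approx 0.00425
$$

$$
\operatorname{SE}(\hat\beta_{\text{age}}) \approx \sqrt{0.00425} \approx 0.0652
$$

which agrees with the Standard Error reported for age in the parameter estimates table.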
Analysis of Maximum Likelihood Parameter Estimates

| Parameter | | DF | Estimate | Standard Error | Lower 95% Confidence Limit | Upper 95% Confidence Limit | Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|---|---|---|
| Intercept | | 1 | -5.7620 | 2.7635 | -11.1783 | -0.3458 | 4.35 | 0.0371 |
| sex | Female | 1 | -2.4224 | 0.9559 | -4.2959 | -0.5489 | 6.42 | 0.0113 |
| sex | Male | 0 | 0.0000 | . | . | . | . | . |
| age | | 1 | 0.1649 | 0.0652 | 0.0371 | 0.2927 | 6.40 | 0.0114 |
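The Chi-Square and confidence-limit columns can be reproduced from the Estimate and Standard Error columns. The values are consistent with the usual Wald construction, using the 0.975 standard normal quantile 1.96; for age:

$$
\chi^2 = \left(\frac{\hat\beta_{\text{age}}}{\operatorname{SE}}\right)^2
= \left(\frac{0.1649}{0.0652}\right)^2 \approx 6.40,
\qquad
\hat\beta_{\text{age}} \pm 1.96\,\operatorname{SE} = 0.1649 \pm 1.96(0.0652) = (0.0371,\ 0.2927)
$$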
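Output 75.3.1 shows that there appears to be an effect due to both sex and age: both chi-square tests are significant at the 0.05 level (Pr > ChiSq = 0.0113 for sex and 0.0114 for age). The positive coefficient for age (0.1649) indicates that older people are more likely to subscribe than younger people. The negative coefficient for the Female level of sex (-2.4224, with Male as the reference level whose parameter is fixed at 0) indicates that females are less likely to subscribe than males of the same age.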
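To put these coefficients on a more interpretable scale, they can be exponentiated into odds ratios or plugged into the fitted model to obtain predicted probabilities. The hand calculations below use the rounded estimates from Output 75.3.1, with $F(x) = 1/(1+e^{-x})$; the age of 45 is chosen purely for illustration.

$$
e^{\hat\beta_{\text{Female}}} = e^{-2.4224} \approx 0.089,
\qquad
e^{\hat\beta_{\text{age}}} = e^{0.1649} \approx 1.18
$$

$$
\hat p(\text{Female}, 45) = F(-5.7620 - 2.4224 + 0.1649 \cdot 45) = F(-0.7639) \approx 0.318
$$

$$
\hat p(\text{Male}, 45) = F(-5.7620 + 0.1649 \cdot 45) = F(1.6585) \approx 0.840
$$

Holding age fixed, the estimated odds of subscribing for a female are about 0.089 times those for a male, and each additional year of age multiplies the estimated odds by about 1.18.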