The SURVEYLOGISTIC Procedure

Getting Started: SURVEYLOGISTIC Procedure

The SURVEYLOGISTIC procedure is similar to the LOGISTIC procedure and other regression procedures in the SAS System. See Chapter 60: The LOGISTIC Procedure, for general information about how to perform logistic regression by using SAS. PROC SURVEYLOGISTIC is designed to handle sample survey data, and thus it incorporates the sample design information into the analysis.

The following example illustrates how to use PROC SURVEYLOGISTIC to perform logistic regression for sample survey data.

In the customer satisfaction survey example in the section Getting Started: SURVEYSELECT Procedure in Chapter 102: The SURVEYSELECT Procedure, an Internet service provider conducts a customer satisfaction survey. The survey population consists of the company’s current subscribers from four states: Alabama (AL), Florida (FL), Georgia (GA), and South Carolina (SC). The company plans to select a sample of customers from this population, interview the selected customers and ask their opinions on customer service, and then make inferences about the entire population of subscribers from the sample data. A stratified sample is selected by using the probability proportional to size (PPS) method. The sample design divides the customers into strata depending on their types ('Old' or 'New') and their states (AL, FL, GA, SC). There are eight strata in all. Within each stratum, customers are selected and interviewed by using the PPS with replacement method, where the size variable is Usage. The stratified PPS sample contains 192 customers. The data are stored in the SAS data set SampleStrata. Figure 98.1 displays the first 10 observations of this data set.

Figure 98.1: Stratified PPS Sample (First 10 Observations)

Customer Satisfaction Survey
Stratified PPS Sampling
(First 10 Observations)

Obs  State  Type  CustomerID  Rating                 Usage  SamplingWeight
  1  AL     New     24394278  Neutral                13.17          26.358
  2  AL     New     64798692  Extremely Unsatisfied  15.53          22.352
  3  AL     New     75375074  Unsatisfied            99.11           3.501
  4  AL     New    262831809  Neutral                 5.40          64.228
  5  AL     New    294428658  Extremely Satisfied     1.17         297.488
  6  AL     New    336222949  Unsatisfied            38.69           8.970
  7  AL     New    351929023  Extremely Satisfied     2.72         127.475
  8  AL     New    366142640  Satisfied               2.61         132.958
  9  AL     New    371478614  Neutral                14.36          24.173
 10  AL     New    477172230  Neutral                 4.06          85.489
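
A listing of this kind can be produced with PROC PRINT. The following is a minimal sketch, assuming that the data set SampleStrata already exists in the WORK library; the TITLE lines simply mirror the headings of Figure 98.1, and the exact statements used to create the figure are not given in this section.

```sas
/* Sketch only: print the first 10 observations of the sample.     */
/* Assumes WORK.SampleStrata was created by the SURVEYSELECT       */
/* example that this section refers to.                            */
title1 'Customer Satisfaction Survey';
title2 'Stratified PPS Sampling';
title3 '(First 10 Observations)';
proc print data=SampleStrata(obs=10);
run;
```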



In the SAS data set SampleStrata, the variable CustomerID uniquely identifies each customer. The variable State contains the state of the customer’s address. The variable Type equals 'Old' if the customer has subscribed to the service for more than one year; otherwise, the variable Type equals 'New'. The variable Usage contains the customer’s average monthly service usage, in hours. The variable Rating contains the customer’s responses to the survey. The sample design uses an unequal probability sampling method, with the sampling weights stored in the variable SamplingWeight.

The following SAS statements use the stratified PPS sample to fit a cumulative logit model that relates the satisfaction levels to Internet usage:

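title 'Customer Satisfaction Survey';
proc surveylogistic data=SampleStrata;
   strata State Type / list;               /* design strata; LIST prints the stratum summary */
   model Rating (order=internal) = Usage;  /* order response levels by their internal values */
   weight SamplingWeight;                  /* sampling weights from the PPS design */
run;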

The PROC SURVEYLOGISTIC statement invokes the SURVEYLOGISTIC procedure. The STRATA statement specifies the stratification variables State and Type that are used in the sample design. The LIST option requests a summary of the stratification. In the MODEL statement, Rating is the response variable and Usage is the explanatory variable. The ORDER=INTERNAL option for the response variable Rating asks the procedure to order the response levels by their internal numerical values (1–5) instead of by their formatted character values. The WEIGHT statement specifies the variable SamplingWeight, which contains the sampling weights.
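
The ORDER=INTERNAL option matters only when Rating is stored as a number and displayed through a format. The following is a minimal sketch of such a setup. The format name RatingFmt and the DATA step are assumptions made for illustration and are not taken from the original example; the value-to-label mapping follows the ordered values that the "Response Profile" table reports.

```sas
/* Hypothetical format: the name RatingFmt is illustrative only. */
proc format;
   value RatingFmt
      1 = 'Extremely Unsatisfied'
      2 = 'Unsatisfied'
      3 = 'Neutral'
      4 = 'Satisfied'
      5 = 'Extremely Satisfied';
run;

/* Assumes Rating is a numeric variable with values 1-5.        */
/* Attaching the format makes Rating print as text while its    */
/* internal values keep the natural 1-5 ordering.               */
data SampleStrata;
   set SampleStrata;
   format Rating RatingFmt.;
run;
```

Without ORDER=INTERNAL, the response levels would be ordered by the formatted text, which would place "Extremely Satisfied" and "Extremely Unsatisfied" next to each other.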

The results of this analysis are shown in the following figures.

Figure 98.2: Stratified PPS Sample, Model Information

Customer Satisfaction Survey

The SURVEYLOGISTIC Procedure

Model Information
Data Set                   WORK.SAMPLESTRATA
Response Variable          Rating
Number of Response Levels  5
Stratum Variables          State
                           Type
Number of Strata           8
Weight Variable            SamplingWeight     Sampling Weight
Model                      Cumulative Logit
Optimization Technique     Fisher's Scoring
Variance Adjustment        Degrees of Freedom (DF)



PROC SURVEYLOGISTIC first lists the following model fitting information and sample design information in Figure 98.2:

  • The link function is the logit of the cumulative probabilities of the lower response categories.

  • The Fisher scoring optimization technique is used to obtain the maximum likelihood estimates for the regression coefficients.

  • The response variable is Rating, which has five response levels.

  • The stratification variables are State and Type.

  • There are eight strata in the sample.

  • The weight variable is SamplingWeight.

  • The variance adjustment method used for the regression coefficients is the default degrees of freedom adjustment.
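
Written out, the cumulative logit model for this example takes the following form. The symbols are introduced here for illustration and are assumptions, not notation from this section: $Y$ is the ordered value of Rating, $x$ is Usage, $\alpha_j$ are the four intercepts, and $\beta$ is the common slope.

```latex
\operatorname{logit}\bigl(\Pr(Y \le j \mid x)\bigr)
  = \log\frac{\Pr(Y \le j \mid x)}{1 - \Pr(Y \le j \mid x)}
  = \alpha_j + \beta x,
  \qquad j = 1, \dots, 4
```

Five response levels give four cumulative probabilities, so the model has four intercepts and one slope for Usage.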

Figure 98.3 lists the number of observations in the data set and the number of observations used in the analysis. Because there are no missing values in this example, all observations in the data set are used in the analysis. The sums of weights are also reported in this table.

Figure 98.3: Stratified PPS Sample, Number of Observations

Number of Observations Read 192
Number of Observations Used 192
Sum of Weights Read 11326.25
Sum of Weights Used 11326.25



The "Response Profile" table in Figure 98.4 lists the five response levels, their ordered values, and the total frequency and total weight for each category. Because of the ORDER=INTERNAL option for the response variable Rating, the category "Extremely Unsatisfied" has the Ordered Value 1, the category "Unsatisfied" has the Ordered Value 2, and so on.

Figure 98.4: Stratified PPS Sample, Response Profile

Response Profile
Ordered Value  Rating                 Total Frequency  Total Weight
            1  Extremely Unsatisfied               58     2368.8598
            2  Unsatisfied                         47     1606.9657
            3  Neutral                             44     2594.3564
            4  Satisfied                           35     1898.5839
            5  Extremely Satisfied                  8     2857.4848

Probabilities modeled are cumulated over the lower Ordered Values.




Figure 98.5 displays the output of the stratification summary. There are a total of eight strata, and each stratum is defined by the customer types within each state. The table also shows the number of customers within each stratum.

Figure 98.5: Stratified PPS Sample, Stratification Summary

Stratum Information
Stratum Index  State  Type  N Obs
            1  AL     New      24
            2         Old      23
            3  FL     New      25
            4         Old      22
            5  GA     New      25
            6         Old      24
            7  SC     New      24
            8         Old      25



Figure 98.6 shows the chi-square test for testing the proportional odds assumption. The test is highly significant, which indicates that the cumulative logit model might not adequately fit the data.

Figure 98.6: Stratified PPS Sample, Testing the Proportional Odds Assumption

Score Test for the Proportional Odds Assumption
Chi-Square  DF  Pr > ChiSq
  617.8597   3      <.0001
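
When the proportional odds assumption is rejected, one commonly used alternative is a generalized logit model, which fits a separate slope for each response category. The following is a sketch of how that might be requested with the LINK=GLOGIT option in the MODEL statement. It is not part of the original example, and no output for it is shown here.

```sas
/* Sketch only: generalized logit alternative to the cumulative */
/* logit model. Same design statements as the original call;    */
/* only the link function changes.                               */
proc surveylogistic data=SampleStrata;
   strata State Type;
   model Rating (order=internal) = Usage / link=glogit;
   weight SamplingWeight;
run;
```

Under the generalized logit link the response is treated as nominal, so the ordering of the satisfaction levels no longer constrains the fit.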



Figure 98.7 shows that the iterative algorithm converged to obtain the maximum likelihood estimates (MLEs) for this example. The "Model Fit Statistics" table contains the Akaike information criterion (AIC), the Schwarz criterion (SC), and the negative of twice the log likelihood ($-2\log L$) for the intercept-only model and the fitted model. AIC and SC can be used to compare different models that are fit to the same data; models with smaller values are preferred.

Figure 98.7: Stratified PPS Sample, Model Fitting Information

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion  Intercept Only  Intercept and Covariates
AIC             35996.656                 35312.584
SC              36009.686                 35328.872
-2 Log L        35988.656                 35302.584
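
The reported criteria can be checked by hand. The arithmetic below is a sketch, assuming $p$ counts the estimated parameters (4 intercepts for the intercept-only model, 5 with Usage) and that the SC penalty uses $n = 192$ observations. The second assumption is inferred from the fact that it reproduces the reported values; it is not stated in this section.

```latex
\begin{aligned}
\mathrm{AIC} &= -2\log L + 2p: &
  35988.656 + 2(4) &= 35996.656, &
  35302.584 + 2(5) &= 35312.584 \\
\mathrm{SC} &= -2\log L + p\log n: &
  35988.656 + 4\log 192 &\approx 36009.686, &
  35302.584 + 5\log 192 &\approx 35328.872
\end{aligned}
```

Here $\log 192 \approx 5.2575$ is the natural logarithm.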



The table "Testing Global Null Hypothesis: BETA=0" in Figure 98.8 shows the likelihood ratio test, the efficient score test, and the Wald test for testing the significance of the explanatory variable (Usage). The likelihood ratio and score tests are highly significant (p < 0.0001), whereas the Wald test is borderline at the 5% level (p = 0.0500).

Figure 98.8: Stratified PPS Sample

Testing Global Null Hypothesis: BETA=0
Test F Value Num DF Den DF Pr > F
Likelihood Ratio 686.07 1 Infty <.0001
Score 123.54 1 184 <.0001
Wald 3.89 1 184 0.0500



Figure 98.9 shows the parameter estimates of the logistic regression and their standard errors.

Figure 98.9: Stratified PPS Sample, Parameter Estimates

Analysis of Maximum Likelihood Estimates
Parameter                         Estimate  Standard Error  t Value  Pr > |t|
Intercept  Extremely Unsatisfied   -1.6784          0.3874    -4.33    <.0001
Intercept  Unsatisfied             -0.9356          0.3645    -2.57    0.0111
Intercept  Neutral                  0.0438          0.4177     0.10    0.9166
Intercept  Satisfied                0.8440          0.5699     1.48    0.1403
Usage                               0.0350          0.0175     1.99    0.0475
NOTE: The degrees of freedom for the t tests is 184.
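
The degrees of freedom and the t values in this table can be verified from quantities reported earlier. As a sketch, assuming the design has no clusters, so that the degrees of freedom equal the number of observations $n$ minus the number of strata $H$ (symbols introduced here for illustration):

```latex
\mathrm{df} = n - H = 192 - 8 = 184,
\qquad
t_{\text{Usage}} = \frac{\hat{\beta}}{\mathrm{SE}(\hat{\beta})}
  = \frac{0.0350}{0.0175} \approx 2.00
```

The table reports 1.99 rather than 2.00 because the displayed estimate and standard error are rounded to four decimal places.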



Figure 98.10 displays the odds ratio estimate for Usage and its 95% confidence limits.

Figure 98.10: Stratified PPS Sample, Odds Ratios

Odds Ratio Estimates
Effect  Point Estimate  95% Confidence Limits
Usage            1.036       1.000      1.072
NOTE: The degrees of freedom in computing the confidence limits is 184.
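
The odds ratio and its confidence limits follow from the Usage estimate and standard error in Figure 98.9. The DATA step below is a minimal sketch of that arithmetic, assuming t-based limits with 184 degrees of freedom, as the note states. The variable names are illustrative. Because the inputs are rounded to four decimal places, the results should agree with Figure 98.10 only approximately.

```sas
/* Sketch only: recompute the Usage odds ratio by hand.          */
/* Inputs are the rounded values displayed in Figure 98.9.       */
data _null_;
   beta = 0.0350;                 /* Usage estimate               */
   se   = 0.0175;                 /* Usage standard error         */
   df   = 184;                    /* observations minus strata    */
   tcrit = tinv(0.975, df);       /* two-sided 95% t quantile     */
   oddsratio = exp(beta);
   lower     = exp(beta - tcrit*se);
   upper     = exp(beta + tcrit*se);
   put oddsratio= 6.3 lower= 6.3 upper= 6.3;
run;
```

A lower limit that is essentially 1 is consistent with the borderline p-value for Usage in Figure 98.9.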