Introduction to Structural Equation Modeling with Latent Variables

Some Measurement Models

Psychometric test theory involves many kinds of models relating scores on psychological and educational tests to latent variables representing intelligence or various underlying abilities. The following example uses data on four vocabulary tests from Lord (1957). Tests $W$ and $X$ have 15 items each and are administered with very liberal time limits. Tests $Y$ and $Z$ have 75 items and are administered under time pressure. The covariance matrix is read by the following DATA step:

   data lord(type=cov);
      input _type_ $ _name_ $ w x y z;
      datalines;
   n   . 649       .       .       .
   cov w 86.3979   .       .       .
   cov x 57.7751 86.2632   .       .
   cov y 56.8651 59.3177 97.2850   .
   cov z 58.8986 59.6683 73.8201 97.8192
   ;

The psychometric model of interest states that $W$ and $X$ are determined by a single common factor $F_{WX}$, and $Y$ and $Z$ are determined by a single common factor $F_{YZ}$. The two common factors are expected to have a positive correlation, and it is desired to estimate this correlation. It is convenient to assume that the common factors have unit variance, so their correlation will be equal to their covariance. The error terms for all the manifest variables are assumed to be uncorrelated with each other and with the common factors. The model (labeled here as model form D) is as follows.

Model Form D

$W = \beta_W F_{WX} + E_W$
$X = \beta_X F_{WX} + E_X$
$Y = \beta_Y F_{YZ} + E_Y$
$Z = \beta_Z F_{YZ} + E_Z$

with the following assumptions:

$\mathrm{Var}(F_{WX}) = \mathrm{Var}(F_{YZ}) = 1$
$\mathrm{Cov}(F_{WX}, F_{YZ}) = \rho$
$\mathrm{Cov}(E_W, E_X) = \mathrm{Cov}(E_W, E_Y) = \mathrm{Cov}(E_W, E_Z) = 0$
$\mathrm{Cov}(E_X, E_Y) = \mathrm{Cov}(E_X, E_Z) = \mathrm{Cov}(E_Y, E_Z) = 0$
$\mathrm{Cov}(E_i, F_{WX}) = \mathrm{Cov}(E_i, F_{YZ}) = 0$ for $i = W, X, Y, Z$
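
As a sketch of what these assumptions imply for the observed covariance matrix, the following derivation writes the loadings as $\beta_W, \beta_X, \beta_Y, \beta_Z$, the two unit-variance factors as $F_{WX}$ and $F_{YZ}$ with correlation $\rho$, and the error terms as $E_W, E_X, E_Y, E_Z$. This notation is assumed here to mirror the parameter names used in the code (betaw, fwx, rho, and so on). Because the errors are uncorrelated with each other and with the factors, each variance splits into a common part and an error part, and each covariance depends only on the loadings and, across the two pairs of tests, on $\rho$:

$$
\begin{aligned}
\mathrm{Var}(W) &= \beta_W^2 + \mathrm{Var}(E_W), &
\mathrm{Var}(Y) &= \beta_Y^2 + \mathrm{Var}(E_Y), \\
\mathrm{Cov}(W, X) &= \beta_W \beta_X, &
\mathrm{Cov}(Y, Z) &= \beta_Y \beta_Z, \\
\mathrm{Cov}(W, Y) &= \beta_W \beta_Y \rho, &
\mathrm{Cov}(X, Z) &= \beta_X \beta_Z \rho .
\end{aligned}
$$

The remaining entries ($\mathrm{Var}(X)$, $\mathrm{Var}(Z)$, $\mathrm{Cov}(W, Z)$, $\mathrm{Cov}(X, Y)$) follow the same pattern.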

The corresponding path diagram is shown in Figure 17.10.

Figure 17.10 Path Diagram: Lord

With the following rules, the conversion from the path diagram to the PATH model specification is very straightforward:

Each single-headed arrow in the path diagram is specified in the PATH statement.
Each double-headed arrow that points to a single variable is specified in the PVAR statement.
Each double-headed arrow that points to two distinct variables is specified in the PCOV statement.

Hence, this path diagram can be converted easily to a PATH model as follows:

   title 'H4: Unconstrained';
   proc tcalis data=lord outmodel=model4;
      path   
         w <- fwx    betaw,
         x <- fwx    betax,
         y <- fyz    betay,
         z <- fyz    betaz;
      pvar 
         fwx fyz = 2 * 1.0,
         w x y z = vew vex vey vez;
      pcov
         fwx fyz = rho;
   run;

The major results are displayed in Figure 17.11 and Figure 17.12.

Figure 17.11 Lord Data: Fit Summary, Hypothesis H4

Fit Summary
Modeling Info	N Observations	649
	N Variables	4
	N Moments	10
	N Parameters	9
	N Active Constraints	0
	Independence Model Chi-Square	1466.5524
	Independence Model Chi-Square DF	6
Absolute Index	Fit Function	0.0011
	Chi-Square	0.7030
	Chi-Square DF	1
	Pr > Chi-Square	0.4018
	Z-Test of Wilson & Hilferty	0.2363
	Hoelter Critical N	3543
	Root Mean Square Residual (RMSR)	0.2720
	Standardized RMSR (SRMSR)	0.0030
	Goodness of Fit Index (GFI)	0.9995
Parsimony Index	Adjusted GFI (AGFI)	0.9946
	Parsimonious GFI	0.1666
	RMSEA Estimate	0.0000
	RMSEA Lower 90% Confidence Limit	.
	RMSEA Upper 90% Confidence Limit	0.0974
	Probability of Close Fit	0.6854
	ECVI Estimate	0.0291
	ECVI Lower 90% Confidence Limit	.
	ECVI Upper 90% Confidence Limit	0.0391
	Akaike Information Criterion	-1.2970
	Bozdogan CAIC	-6.7725
	Schwarz Bayesian Criterion	-5.7725
	McDonald Centrality	1.0002
Incremental Index	Bentler Comparative Fit Index	1.0000
	Bentler-Bonett NFI	0.9995
	Bentler-Bonett Non-normed Index	1.0012
	Bollen Normed Index Rho1	0.9971
	Bollen Non-normed Index Delta2	1.0002
	James et al. Parsimonious NFI	0.1666

Figure 17.12 Lord Data: Estimation Results, Hypothesis H4

PATH List
Path			Parameter	Estimate	Standard Error	t Value
w	<-	fwx	betaw	7.50066	0.32339	23.19390
x	<-	fwx	betax	7.70266	0.32063	24.02354
y	<-	fyz	betay	8.50947	0.32694	26.02730
z	<-	fyz	betaz	8.67505	0.32560	26.64301

Variance Parameters
Variance Type	Variable	Parameter	Estimate	Standard Error	t Value
Exogenous	fwx		1.00000
	fyz		1.00000
Error	w	vew	30.13796	2.47037	12.19979
	x	vex	26.93217	2.43065	11.08021
	y	vey	24.87396	2.35986	10.54044
	z	vez	22.56264	2.35028	9.60000

Covariances Among Exogenous Variables
Var1	Var2	Parameter	Estimate	Standard Error	t Value
fwx	fyz	rho	0.89855	0.01865	48.17998
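
One informal way to read these estimates is to plug them back into the moments that the model implies and compare the results with the input covariance matrix. The following DATA step is a minimal sketch of that check. It assumes the model as stated (unit factor variances, uncorrelated errors), so that each variance is a squared loading plus an error variance, each within-pair covariance is a product of two loadings, and each cross-pair covariance is that product times rho. The data set name implied_h4 and the var_ and cov_ variable names are hypothetical, chosen only for this illustration.

```sas
/* Plug the H4 estimates from Figure 17.12 into the model-implied moments. */
data implied_h4;
   betaw = 7.50066;   betax = 7.70266;
   betay = 8.50947;   betaz = 8.67505;
   vew   = 30.13796;  vex   = 26.93217;
   vey   = 24.87396;  vez   = 22.56264;
   rho   = 0.89855;

   /* Variances: squared loading (factor variance is 1) plus error variance */
   var_w = betaw**2 + vew;
   var_x = betax**2 + vex;
   var_y = betay**2 + vey;
   var_z = betaz**2 + vez;

   /* Covariances within a pair of tests: product of the two loadings */
   cov_wx = betaw*betax;
   cov_yz = betay*betaz;

   /* Covariances across the two pairs: product of loadings times rho */
   cov_wy = betaw*betay*rho;
   cov_wz = betaw*betaz*rho;
   cov_xy = betax*betay*rho;
   cov_xz = betax*betaz*rho;
run;

proc print data=implied_h4 noobs;
   var var_: cov_:;
run;
```

The printed values can be compared entry by entry with the covariance matrix in the lord data set. Small differences are expected, since the model has one degree of freedom and does not reproduce the matrix exactly.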

It is convenient to create the OUTMODEL= data set called model4 for use in fitting other models with additional constraints. The same analysis can be performed with the LINEQS statement, as specified in the following:

   title 'H4: Unconstrained; LINEQS Specification';
   proc tcalis data=lord;
      lineqs w = betaw fwx + ew,
             x = betax fwx + ex,
             y = betay fyz + ey,
             z = betaz fyz + ez;
      std fwx fyz = 2 * 1.,
          ew ex ey ez = vew vex vey vez;
      cov fwx fyz = rho;
   run;

Unlike the PATH model specification, in the LINEQS specification you need to specify the error terms explicitly in the LINEQS statement. In the STD statement, you would need to specify the variance parameters for the exogenous variables, including both of the factors and the error terms. However, using the PATH model specification, no explicit names for error or disturbance terms are needed. As a result, the exogenous variance and error variance parameters are both specified in the PVAR statement. This treatment generalizes to the following useful rule about the PATH model specification:

Each variable in the PATH model specification or path diagram should have a variance or partial variance parameter specified in the PVAR statement—as either an exogenous variance or a partial variance due to error.

The main results from the LINEQS model specification are displayed in Figure 17.13.

Figure 17.13 Lord Data: Using LINEQS Statement for RAM Model, Hypothesis H4

Linear Equations
w	=	7.5007	*	fwx	+	1.0000	ew
Std Err		0.3234		betaw
t Value		23.1939
x	=	7.7027	*	fwx	+	1.0000	ex
Std Err		0.3206		betax
t Value		24.0235
y	=	8.5095	*	fyz	+	1.0000	ey
Std Err		0.3269		betay
t Value		26.0273
z	=	8.6751	*	fyz	+	1.0000	ez
Std Err		0.3256		betaz
t Value		26.6430

Estimates for Variances of Exogenous Variables
Variable Type	Variable	Parameter	Estimate	Standard Error	t Value
Latent	fwx		1.00000
	fyz		1.00000
Error	ew	vew	30.13796	2.47037	12.19979
	ex	vex	26.93217	2.43065	11.08021
	ey	vey	24.87396	2.35986	10.54044
	ez	vez	22.56264	2.35028	9.60000

Covariances Among Exogenous Variables
Var1	Var2	Parameter	Estimate	Standard Error	t Value
fwx	fyz	rho	0.89855	0.01865	48.17998

Aside from the output format, all estimates in the LINEQS model results in Figure 17.13 match those of the PATH model results in Figure 17.12. In some situations, the PATH and LINEQS statements might yield slightly different results due to the inexactness of the numerical optimization; the discrepancies can be reduced by specifying a more stringent convergence criterion such as GCONV=1E-4 or GCONV=1E-6.
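
As a hedged illustration of tightening the convergence criterion, the following sketch refits the same unconstrained PATH model with GCONV=1E-6. It assumes that GCONV= is given as an option in the PROC TCALIS statement, as it is in PROC CALIS; check the procedure syntax for your release before relying on it.

```sas
/* Same H4 model as before; only the convergence criterion is tightened.  */
/* Assumption: GCONV= is accepted as a procedure-level option.            */
title 'H4: Unconstrained, GCONV=1E-6';
proc tcalis data=lord gconv=1e-6;
   path
      w <- fwx    betaw,
      x <- fwx    betax,
      y <- fyz    betay,
      z <- fyz    betaz;
   pvar
      fwx fyz = 2 * 1.0,
      w x y z = vew vex vey vez;
   pcov
      fwx fyz = rho;
run;
```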

Subsequent analyses are illustrated with the PATH statement rather than the LINEQS statement because it is easier to translate the path diagram to the PATH model specification.

In an analysis of these data by Jöreskog and Sörbom (1979, pp. 54–56; Loehlin 1987, pp. 84–87), four hypotheses are considered:

$H_1$: $\rho = 1$, $\beta_W = \beta_X$, $\mathrm{Var}(E_W) = \mathrm{Var}(E_X)$, $\beta_Y = \beta_Z$, $\mathrm{Var}(E_Y) = \mathrm{Var}(E_Z)$
$H_2$: same as $H_1$ except $\rho$ is unconstrained
$H_3$: $\rho = 1$
$H_4$: Model Form D without any additional constraints

The hypothesis $H_3$ says that there is really just one common factor instead of two; in the terminology of test theory, $W$, $X$, $Y$, and $Z$ are said to be congeneric. The hypothesis $H_2$ says that $W$ and $X$ have the same true scores and have equal error variance; such tests are said to be parallel. The hypothesis $H_2$ also requires $Y$ and $Z$ to be parallel. The hypothesis $H_1$ says that $W$ and $X$ are parallel tests, $Y$ and $Z$ are parallel tests, and all four tests are congeneric.

It is most convenient to fit the models in the opposite order from that in which they are numbered. The previous analysis fit the model for $H_4$ and created an OUTMODEL= data set called model4. The hypothesis $H_3$ can be fitted directly or by modifying the model4 data set. Since $H_3$ differs from $H_4$ only in that $\rho$ is constrained to equal 1, the model4 data set can be modified by finding the observation for which _NAME_='rho' and changing the variable _NAME_ to a blank value (meaning that the observation represents a constant rather than a parameter to be fitted) and by setting the variable _ESTIM_ to the value $1$. The following statements create a new model stored in the model3 data set that is modified from the model4 data set:

   data model3(type=calismdl);
      set model4;
      if _name_='rho' then
         do;
            _name_=' ';
            _estim_=1;
         end;
   run;

In other words, the model information stored in data set model3 is specified exactly as hypothesis $H_3$ requires. This data set is then read as an INMODEL= data set for the following PROC TCALIS run:

   title 'H3: W, X, Y, and Z are congeneric';
   proc tcalis data=lord inmodel=model3;
   run;
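
Because the INMODEL= approach depends on the DATA step having changed exactly one observation, it can be worth listing the modified data set before trusting the fit. The following is a minimal sketch that prints model3 in full; the observation that previously carried the name rho should now show a blank _NAME_ and an _ESTIM_ value of 1. Only the _NAME_ and _ESTIM_ variables are described in the text, so no other variable names are assumed here.

```sas
/* List the modified model data set to confirm the rho observation */
/* now holds a fixed constant rather than a named parameter.       */
title 'Contents of model3 (H3 model specification)';
proc print data=model3;
run;
```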

Another way to specify the model under hypothesis $H_3$ is to specify the entire PATH model anew, such as in the following statements:

   title 'H3: W, X, Y, and Z are congeneric';
   proc tcalis data=lord;
      path  w <- f  betaw,
            x <- f  betax,
            y <- f  betay,
            z <- f  betaz;
      pvar 
         f = 1,
         w x y z = vew vex vey vez;
   run;

This would produce essentially the same results as those of the analysis based on the model stored in the data set model3. The main results from the analysis with the INMODEL=MODEL3 data set are displayed in Figure 17.14.

Figure 17.14 Lord Data: Major Results for Hypothesis H3

Fit Summary
Modeling Info	N Observations	649
	N Variables	4
	N Moments	10
	N Parameters	8
	N Active Constraints	0
	Independence Model Chi-Square	1466.5524
	Independence Model Chi-Square DF	6
Absolute Index	Fit Function	0.0559
	Chi-Square	36.2095
	Chi-Square DF	2
	Pr > Chi-Square	0.0000
	Z-Test of Wilson & Hilferty	5.2108
	Hoelter Critical N	109
	Root Mean Square Residual (RMSR)	2.4636
	Standardized RMSR (SRMSR)	0.0277
	Goodness of Fit Index (GFI)	0.9714
Parsimony Index	Adjusted GFI (AGFI)	0.8570
	Parsimonious GFI	0.3238
	RMSEA Estimate	0.1625
	RMSEA Lower 90% Confidence Limit	0.1187
	RMSEA Upper 90% Confidence Limit	0.2108
	Probability of Close Fit	0.0000
	ECVI Estimate	0.0808
	ECVI Lower 90% Confidence Limit	0.0561
	ECVI Upper 90% Confidence Limit	0.1170
	Akaike Information Criterion	32.2095
	Bozdogan CAIC	21.2586
	Schwarz Bayesian Criterion	23.2586
	McDonald Centrality	0.9740
Incremental Index	Bentler Comparative Fit Index	0.9766
	Bentler-Bonett NFI	0.9753
	Bentler-Bonett Non-normed Index	0.9297
	Bollen Normed Index Rho1	0.9259
	Bollen Non-normed Index Delta2	0.9766
	James et al. Parsimonious NFI	0.3251

PATH List
Path			Parameter	Estimate	Standard Error	t Value
w	<-	fwx	betaw	7.10472	0.32177	22.08019
x	<-	fwx	betax	7.26906	0.31826	22.83965
y	<-	fyz	betay	8.37348	0.32542	25.73160
z	<-	fyz	betaz	8.51057	0.32409	26.25985

Variance Parameters
Variance Type	Variable	Parameter	Estimate	Standard Error	t Value
Exogenous	fwx		1.00000
	fyz		1.00000
Error	w	vew	35.92087	2.41466	14.87615
	x	vex	33.42397	2.31038	14.46688
	y	vey	27.16980	2.24619	12.09595
	z	vez	25.38948	2.20839	11.49684

Covariances Among Exogenous Variables
Var1	Var2	Estimate	Standard Error	t Value
fwx	fyz	1.00000

The hypothesis $H_2$ requires that several pairs of parameters be constrained to have equal estimates. With PROC TCALIS, you can impose this constraint by giving the same name to parameters that are constrained to be equal. This can be done directly in the PATH and PVAR statements or by using the DATA step to change the values in the model4 data set.

First, you can specify the model directly under the hypothesis $H_2$; the following PATH model is specified:

   title 'H2: W and X parallel, Y and Z parallel';
   proc tcalis data=lord;
      path
         w <- fwx   betawx,
         x <- fwx   betawx,
         y <- fyz   betayz,
         z <- fyz   betayz;
      pvar 
         fwx fyz = 2 * 1.0,
         w x y z = vewx vewx veyz veyz;
      pcov
         fwx fyz = rho;
   run;

Alternatively, if you use the DATA step to modify the model4 data set, you would create a new data set called model2 for storing the model information under the hypothesis $H_2$, as shown in the following statements:

   data model2(type=calismdl);
      set model4;
      if _name_='betaw' then _name_='betawx';
      if _name_='betax' then _name_='betawx';
      if _name_='betay' then _name_='betayz';
      if _name_='betaz' then _name_='betayz';
      if _name_='vew' then _name_='vewx';
      if _name_='vex' then _name_='vewx';
      if _name_='vey' then _name_='veyz';
      if _name_='vez' then _name_='veyz';
   run;

Then you would use model2 as the INMODEL= data set in the following PROC TCALIS run:

   title 'H2: W and X parallel, Y and Z parallel';
   proc tcalis data=lord inmodel=model2;
   run;

The main results from either of these analyses are displayed in Figure 17.15.

Figure 17.15 Lord Data: Major Results for Hypothesis H2

Fit Summary
Modeling Info	N Observations	649
	N Variables	4
	N Moments	10
	N Parameters	5
	N Active Constraints	0
	Independence Model Chi-Square	1466.5524
	Independence Model Chi-Square DF	6
Absolute Index	Fit Function	0.0030
	Chi-Square	1.9335
	Chi-Square DF	5
	Pr > Chi-Square	0.8583
	Z-Test of Wilson & Hilferty	-1.0768
	Hoelter Critical N	3712
	Root Mean Square Residual (RMSR)	0.6983
	Standardized RMSR (SRMSR)	0.0076
	Goodness of Fit Index (GFI)	0.9985
Parsimony Index	Adjusted GFI (AGFI)	0.9970
	Parsimonious GFI	0.8321
	RMSEA Estimate	0.0000
	RMSEA Lower 90% Confidence Limit	.
	RMSEA Upper 90% Confidence Limit	0.0293
	Probability of Close Fit	0.9936
	ECVI Estimate	0.0185
	ECVI Lower 90% Confidence Limit	.
	ECVI Upper 90% Confidence Limit	0.0276
	Akaike Information Criterion	-8.0665
	Bozdogan CAIC	-35.4436
	Schwarz Bayesian Criterion	-30.4436
	McDonald Centrality	1.0024
Incremental Index	Bentler Comparative Fit Index	1.0000
	Bentler-Bonett NFI	0.9987
	Bentler-Bonett Non-normed Index	1.0025
	Bollen Normed Index Rho1	0.9984
	Bollen Non-normed Index Delta2	1.0021
	James et al. Parsimonious NFI	0.8322

PATH List
Path			Parameter	Estimate	Standard Error	t Value
w	<-	fwx	betawx	7.60099	0.26844	28.31580
x	<-	fwx	betawx	7.60099	0.26844	28.31580
y	<-	fyz	betayz	8.59186	0.27967	30.72146
z	<-	fyz	betayz	8.59186	0.27967	30.72146

Variance Parameters
Variance Type	Variable	Parameter	Estimate	Standard Error	t Value
Exogenous	fwx		1.00000
	fyz		1.00000
Error	w	vewx	28.55545	1.58641	18.00000
	x	vewx	28.55545	1.58641	18.00000
	y	veyz	23.73200	1.31844	18.00000
	z	veyz	23.73200	1.31844	18.00000

Covariances Among Exogenous Variables
Var1	Var2	Parameter	Estimate	Standard Error	t Value
fwx	fyz	rho	0.89864	0.01865	48.18011

The hypothesis $H_1$ requires one more constraint in addition to those in $H_2$. Again, there are two ways to do this. First, a direct model specification is shown in the following statements:

   title 'H1: W and X parallel, Y and Z parallel, all congeneric';
   proc tcalis data=lord;
      path
         w <- f   betawx,
         x <- f   betawx,
         y <- f   betayz,
         z <- f   betayz;
      pvar 
         f       = 1.0,
         w x y z = vewx vewx veyz veyz;
   run;

Alternatively, you can modify the model2 data set to create a new data set model1 that stores the model information required by the hypothesis $H_1$, as shown in the following statements:

   data model1(type=calismdl);
      set model2;
      if _name_='rho' then
         do;
            _name_=' ';
            _estim_=1;
         end;
   run;

You can then pass the model information stored in model1 as an INMODEL= data set in the following PROC TCALIS run:

   title 'H1: W and X parallel, Y and Z parallel, all congeneric';
   proc tcalis data=lord inmodel=model1;
   run;

The main results from either of these analyses are displayed in Figure 17.16.

Figure 17.16 Lord Data: Major Results for Hypothesis H1

Fit Summary
Modeling Info	N Observations	649
	N Variables	4
	N Moments	10
	N Parameters	4
	N Active Constraints	0
	Independence Model Chi-Square	1466.5524
	Independence Model Chi-Square DF	6
Absolute Index	Fit Function	0.0576
	Chi-Square	37.3337
	Chi-Square DF	6
	Pr > Chi-Square	0.0000
	Z-Test of Wilson & Hilferty	4.5535
	Hoelter Critical N	220
	Root Mean Square Residual (RMSR)	2.5430
	Standardized RMSR (SRMSR)	0.0286
	Goodness of Fit Index (GFI)	0.9705
Parsimony Index	Adjusted GFI (AGFI)	0.9509
	Parsimonious GFI	0.9705
	RMSEA Estimate	0.0898
	RMSEA Lower 90% Confidence Limit	0.0635
	RMSEA Upper 90% Confidence Limit	0.1184
	Probability of Close Fit	0.0076
	ECVI Estimate	0.0701
	ECVI Lower 90% Confidence Limit	0.0458
	ECVI Upper 90% Confidence Limit	0.1059
	Akaike Information Criterion	25.3337
	Bozdogan CAIC	-7.5189
	Schwarz Bayesian Criterion	-1.5189
	McDonald Centrality	0.9761
Incremental Index	Bentler Comparative Fit Index	0.9785
	Bentler-Bonett NFI	0.9745
	Bentler-Bonett Non-normed Index	0.9785
	Bollen Normed Index Rho1	0.9745
	Bollen Non-normed Index Delta2	0.9785
	James et al. Parsimonious NFI	0.9745

PATH List
Path			Parameter	Estimate	Standard Error	t Value
w	<-	fwx	betawx	7.18622	0.26598	27.01798
x	<-	fwx	betawx	7.18622	0.26598	27.01798
y	<-	fyz	betayz	8.44198	0.28000	30.14946
z	<-	fyz	betayz	8.44198	0.28000	30.14946

Variance Parameters
Variance Type	Variable	Parameter	Estimate	Standard Error	t Value
Exogenous	fwx		1.00000
	fyz		1.00000
Error	w	vewx	34.68878	1.64635	21.07013
	x	vewx	34.68878	1.64635	21.07013
	y	veyz	26.28501	1.39954	18.78118
	z	veyz	26.28501	1.39954	18.78118

Covariances Among Exogenous Variables
Var1	Var2	Estimate	Standard Error	t Value
fwx	fyz	1.00000

The goodness-of-fit tests for the four hypotheses are summarized in the following table.

Hypothesis	Number of Parameters	$\chi^2$	Degrees of Freedom	p-value	$\hat{\rho}$
$H_1$	4	37.33	6	0.0000	1.0
$H_2$	5	1.93	5	0.8583	0.8986
$H_3$	8	36.21	2	0.0000	1.0
$H_4$	9	0.70	1	0.4018	0.8986
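
The degrees of freedom in this table follow from counting moments and parameters. As a worked check, with $p = 4$ manifest variables there are $p(p+1)/2 = 10$ distinct variances and covariances (the N Moments entry in the fit summaries), and each model's degrees of freedom equal 10 minus its number of free parameters $q$. The symbols $p$ and $q$ are introduced here only for illustration.

$$
\begin{aligned}
\mathrm{df} &= \frac{p(p+1)}{2} - q = 10 - q, \\
H_1:\ & 10 - 4 = 6, \qquad H_2:\ 10 - 5 = 5, \\
H_3:\ & 10 - 8 = 2, \qquad H_4:\ 10 - 9 = 1 .
\end{aligned}
$$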

The hypotheses $H_1$ and $H_3$, which posit $\rho = 1$, can be rejected. Hypotheses $H_2$ and $H_4$ seem to be consistent with the available data. Since $H_2$ is obtained by adding four constraints to $H_4$, you can test $H_2$ versus $H_4$ by computing the differences of the chi-square statistics and their degrees of freedom, yielding a chi-square of $1.23$ with $4$ degrees of freedom, which is obviously not significant. So hypothesis $H_2$ is consistent with the available data.
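
The chi-square difference test described above can be reproduced with a short DATA step. The following is a minimal sketch that takes the chi-square statistics and degrees of freedom for H2 and H4 from the fit summaries and uses the PROBCHI function to obtain the upper-tail probability. The data set and variable names (h2_vs_h4, chisq_diff, and so on) are hypothetical.

```sas
/* Chi-square difference test of H2 (constrained) against H4 (unconstrained). */
data h2_vs_h4;
   chisq_h2 = 1.9335;   df_h2 = 5;   /* from Figure 17.15 */
   chisq_h4 = 0.7030;   df_h4 = 1;   /* from Figure 17.11 */

   chisq_diff = chisq_h2 - chisq_h4;   /* difference in chi-square statistics */
   df_diff    = df_h2 - df_h4;         /* difference in degrees of freedom    */

   /* Upper-tail probability of a chi-square variate with df_diff df */
   p_value = 1 - probchi(chisq_diff, df_diff);
run;

proc print data=h2_vs_h4 noobs;
   var chisq_diff df_diff p_value;
run;
```

The resulting p-value is approximately 0.87, far above conventional significance levels, so the four equality constraints of H2 do not worsen the fit appreciably.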

The estimates of $\rho$ for $H_2$ and $H_4$ are almost identical, about 0.90, indicating that the speeded and unspeeded tests are measuring almost the same latent variable, even though the hypotheses that stated they measured exactly the same latent variable are rejected.
