Two endemic problems face researchers in the social sciences (e.g., Marketing, Economics, Psychology, and Finance): unobserved heterogeneity and measurement error in data. Structural equation modeling is a powerful tool for dealing with these difficulties: it uses a simultaneous equation framework with unobserved constructs and error-prone manifest indicators. When estimating structural equation models, however, researchers frequently treat the data as if they were collected from a single population (Muthén [Muthén, Bengt O. 1989. Latent variable modeling in heterogeneous populations. Psychometrika 54 557–585.]). This assumption of homogeneity is often unrealistic. For example, in multidimensional expectancy value models, consumers from different market segments can have different belief structures (Bagozzi [Bagozzi, Richard P. 1982. A field investigation of causal relations among cognitions, affect, intentions, and behavior. J. Marketing Res. 19 562–584.]). Similarly, research on satisfaction suggests that consumer decision processes vary across segments (Day [Day, Ralph L. 1977. Extending the concept of consumer satisfaction. W. D. Perreault, ed. Advances in Consumer Research, Vol. 4. Association for Consumer Research, Atlanta, 149–154.]).
This paper shows that aggregate analysis that ignores heterogeneity in structural equation models produces misleading results and that traditional fit statistics are not useful for detecting unobserved heterogeneity in the data. Furthermore, sequential analyses that first form groups using cluster analysis and then apply multigroup structural equation modeling are not satisfactory.
We develop a general finite mixture structural equation model that simultaneously treats heterogeneity and forms market segments in the context of a specified model structure where all the observed variables are measured with error. The model is considerably more general than cluster analysis, multigroup confirmatory factor analysis, and multigroup structural equation modeling. In particular, the model subsumes several specialized models including finite mixture simultaneous equation models, finite mixture confirmatory factor analysis, and finite mixture second-order factor analysis.
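As a rough sketch of the kind of density such a model implies, the block below writes out a generic finite mixture with segment-specific moment structures. The within-segment multivariate normality and all of the notation ($K$, $w_k$, $\theta_k$, $\mu(\cdot)$, $\Sigma(\cdot)$) are assumptions made here for illustration; this summary does not state the paper's own formulation.

```latex
% Assumed notation: x_i is the vector of manifest indicators for respondent i,
% K is the number of segments, w_k are mixing proportions, and theta_k collects
% the measurement and structural parameters of segment k.
\begin{align}
  f(x_i) &= \sum_{k=1}^{K} w_k \,
            \phi\bigl(x_i ;\, \mu(\theta_k),\, \Sigma(\theta_k)\bigr),
  \qquad w_k \ge 0, \quad \sum_{k=1}^{K} w_k = 1, \\
  P(k \mid x_i) &= \frac{w_k \,
            \phi\bigl(x_i ;\, \mu(\theta_k),\, \Sigma(\theta_k)\bigr)}
            {\sum_{j=1}^{K} w_j \,
            \phi\bigl(x_i ;\, \mu(\theta_j),\, \Sigma(\theta_j)\bigr)} .
\end{align}
```

Here $\phi$ denotes the multivariate normal density, and the second line is the posterior probability (by Bayes' rule) that respondent $i$ belongs to segment $k$. Under this reading, setting $K = 1$ gives the usual aggregate structural equation model, and leaving $\mu$ and $\Sigma$ unstructured gives an ordinary normal-mixture cluster analysis.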
The finite mixture structural equation model should be of interest to academics in a wide range of disciplines (e.g., Consumer Behavior, Marketing, Economics, Finance, Psychology, and Sociology) where unobserved heterogeneity and measurement error are problematic. In addition, the model should be of interest to market researchers and product managers for two reasons. First, the model allows the manager to perform response-based segmentation using a consumer decision process model, while explicitly allowing for both measurement and structural error. Second, the model allows managers to detect unobserved moderating factors that account for heterogeneity. Once managers have identified the moderating factors, they can link segment membership to observable individual-level characteristics (e.g., socioeconomic and demographic variables) and improve marketing policy.
We applied the finite mixture structural equation model to a direct marketing study of customer satisfaction and estimated a large model with 8 unobserved constructs and 23 manifest indicators. The results show that there are three consumer segments that vary considerably in the importance they attach to the various dimensions of satisfaction. In contrast, aggregate analysis is misleading because it incorrectly suggests that, except for price, all dimensions of satisfaction are significant for all consumers. Methodologically, the finite mixture model is robust; that is, the parameter estimates are stable under double cross-validation and the method can be used to test large models. Furthermore, the double cross-validation results show that the finite mixture model is superior to sequential data analysis strategies in terms of goodness-of-fit and interpretability.
We performed four simulation experiments to test the robustness of the algorithm using both recursive and nonrecursive model specifications. Specifically, we examined the robustness of different model selection criteria (e.g., CAIC, BIC, and GFI) in choosing the correct number of clusters for exactly identified and overidentified models, assuming that the distributional form is correctly specified. We also examined the effect of distributional misspecification (i.e., departures from multivariate normality) on model performance. The results show that when the data are heterogeneous, the standard goodness-of-fit statistics for the aggregate model are not useful for detecting heterogeneity, and parameter recovery for the aggregate model is poor. For the finite mixture model, however, the BIC and CAIC criteria perform well in detecting heterogeneity and in identifying the true number of segments. In addition, parameter recovery for both the measurement and structural models is highly satisfactory. The finite mixture method is robust to distributional misspecification; moreover, the method significantly outperforms aggregate and sequential data analysis methods when the form of heterogeneity is misspecified (i.e., the true model has random coefficients).
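To make the role of BIC and CAIC concrete, the following minimal Python sketch computes both criteria from a fitted model's log-likelihood and picks the number of segments with the smallest value. It uses the textbook definitions, BIC = -2 ln L + p ln N and CAIC = -2 ln L + p (ln N + 1). The function names, the `fits` dictionary, and the numbers in the example are hypothetical and are not taken from the study.

```python
import math


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: -2 ln L + p ln N (smaller is better)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


def caic(log_lik, n_params, n_obs):
    """Consistent AIC: -2 ln L + p (ln N + 1) (smaller is better)."""
    return -2.0 * log_lik + n_params * (math.log(n_obs) + 1.0)


def select_num_segments(fits, n_obs, criterion=bic):
    """Return (best_K, scores) for the K that minimizes the criterion.

    fits maps each candidate number of segments K to a tuple
    (maximized log-likelihood, number of free parameters).
    """
    scores = {k: criterion(ll, p, n_obs) for k, (ll, p) in fits.items()}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Hypothetical log-likelihoods and parameter counts, for illustration only.
example_fits = {1: (-5200.0, 20), 2: (-5000.0, 41), 3: (-4990.0, 62)}
best_bic, bic_scores = select_num_segments(example_fits, n_obs=500)
best_caic, caic_scores = select_num_segments(example_fits, n_obs=500, criterion=caic)
print(best_bic, best_caic)
```

In this made-up example the third segment improves the log-likelihood only slightly, so the parameter penalty outweighs the gain and both criteria favor two segments. CAIC always penalizes each parameter by one unit more than BIC, so it never selects more segments than BIC would on the same set of fits.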
Researchers and practitioners should use the mixture methodology only when substantive theory supports the structural equation model, a priori segmentation is infeasible, and theory suggests that the data are heterogeneous and belong to a finite number of unobserved groups. We expect these conditions to hold in many social science applications and, in particular, in market segmentation studies.
Future research should focus on large-scale simulation studies to test the structural equation mixture model using a wide range of models and statistical distributions. Theoretical research should extend the model by allowing the mixing proportions to depend on prior information and/or subject-specific variables. Finally, in order to provide a fuller treatment of heterogeneity, we need to develop a general random coefficient structural equation model. Such a model is presently unavailable in the statistical and psychometric literatures.