Structural Equation Modeling Approaches to Estimating Score Dependability Within Generalizability Theory-Based Univariate, Multivariate, and Bifactor Designs

Bibliographic Details
Main Authors: Walter P. Vispoel, Hyeryung Lee, Tingting Chen
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Mathematics
Online Access: https://www.mdpi.com/2227-7390/13/6/1001
Description
Summary: Generalizability theory (GT) provides an all-encompassing framework for estimating the accuracy of scores and the effects of multiple sources of measurement error for measures intended for either norm- or criterion-referencing purposes. Structural equation models (SEMs) can replicate results from GT-based ANOVA procedures while extending those analyses to account for scale coarseness, generate Monte Carlo-based confidence intervals for key parameters, partition universe score variance into general and group factor effects, and assess subscale score viability. We apply these techniques in R to univariate, multivariate, and bifactor designs, using a novel indicator-mean approach to estimate absolute error. When responses to items from the shortened form of the Music Self-Perception Inventory (MUSPI-S) were represented using 2-, 4-, and 8-point response metrics over two occasions, SEMs reproduced results from the ANOVA-based mGENOVA package for univariate and multivariate designs, and score accuracy and subscale viability indices within bifactor designs were comparable to those from the corresponding multivariate designs. Adjusting for scale coarseness improved the accuracy of scores across all response metrics, with dichotomous observed scores approximating truly continuous scales least closely. Although general-factor effects were dominant, subscale viability was supported in all cases, with transient measurement error leading to the greatest reductions in score accuracy. Key implications are discussed.
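The summary distinguishes relative from absolute error and singles out transient error as a source of measurement error. A minimal sketch of how those quantities relate, assuming a fully crossed persons x items x occasions (p x i x o) random design whose variance components have already been estimated (for example by ANOVA or an SEM), uses the standard GT formulas for the generalizability (G) and global dependability (D) coefficients, plus one common partition of relative error into specific-factor (SFE), transient (TE), and random-response (RRE) proportions. The class and function names and the variance-component values are hypothetical illustrations; this does not reproduce the article's SEM, coarseness adjustments, or indicator-mean procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PIOComponents:
    """Hypothetical container for p x i x o random-design variance components."""
    p: float      # universe-score (person) variance
    i: float      # item main effect
    o: float      # occasion main effect
    pi: float     # person x item interaction (specific-factor error)
    po: float     # person x occasion interaction (transient error)
    io: float     # item x occasion interaction
    pio_e: float  # residual: person x item x occasion plus random error


def dependability_indices(vc: PIOComponents, n_i: int, n_o: int) -> dict:
    """Return G and global D coefficients plus relative-error proportions
    for scores based on n_i items and n_o occasions (hypothetical helper)."""
    sfe = vc.pi / n_i               # specific-factor error variance
    te = vc.po / n_o                # transient error variance
    rre = vc.pio_e / (n_i * n_o)    # random-response error variance
    rel_error = sfe + te + rre      # relative error variance, sigma^2(delta)
    # Absolute error also counts effects that shift everyone's scores:
    # item, occasion, and item x occasion components.
    abs_error = rel_error + vc.i / n_i + vc.o / n_o + vc.io / (n_i * n_o)
    denom = vc.p + rel_error
    return {
        "G": vc.p / denom,                  # generalizability coefficient
        "D": vc.p / (vc.p + abs_error),     # global dependability coefficient
        "SFE": sfe / denom,                 # proportion of specific-factor error
        "TE": te / denom,                   # proportion of transient error
        "RRE": rre / denom,                 # proportion of random-response error
    }


if __name__ == "__main__":
    # Made-up variance components for illustration, not values from the article.
    vc = PIOComponents(p=0.50, i=0.10, o=0.01, pi=0.20, po=0.05, io=0.02, pio_e=0.30)
    for name, value in dependability_indices(vc, n_i=10, n_o=1).items():
        print(f"{name}: {value:.3f}")
```

With these made-up inputs and a single-occasion, 10-item score, the sketch gives G of about 0.833 and D of about 0.804, and SFE + TE + RRE equals 1 - G. D can never exceed G because absolute error adds nonnegative components to relative error.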
ISSN: 2227-7390