1 Introduction
The envelope model was initially introduced by Cook et al. (Reference Cook, Li and Chiaromonte2010) as a technique for dimension reduction in multivariate analysis. Cook et al. (Reference Cook, Li and Chiaromonte2010) first proposed the response envelope model, which aims to reduce the dimension of the response vector $\mathbf {Y}$. The core concept of the method is to decompose $\mathbf {Y}$ into a material and an immaterial part based on the assumption that certain linear combinations of $\mathbf {Y}$, known as X-invariant, remain unchanged regardless of variations in the predictor vector $\mathbf {X}$. These X-invariant linear combinations are considered immaterial to the regression and thus can reduce the parameter space in estimation. Expanding upon this idea, Cook et al. (Reference Cook, Helland and Su2013) extended the approach to dimension reduction in $\mathbf {X}$ within the context of multivariate regression, assuming the predictor $\mathbf {X}$ is stochastic. This extension, known as the predictor envelope model, is closely related to the partial least squares (PLS) method.
The envelope model has shown promising efficiency gains, which has prompted its extension into various contexts, including the partial envelope model (Su & Cook, Reference Su and Cook2011), the generalized linear model (Cook & Zhang, Reference Cook and Zhang2015a), the matrix-valued response model (Ding & Cook, Reference Ding and Cook2018), the sparse envelope model (Su et al., Reference Su, Zhu, Chen and Yang2016), and the spatial envelope model (Rekabdarkolaee et al., Reference Rekabdarkolaee, Wang, Naji and Fuente2020). However, applying Bayesian methods to the envelope model has received limited attention in the literature compared with the frequentist framework. This is primarily due to the challenge of parameterizing the envelope subspace on the Grassmann manifold, where the basis of the envelope subspace is not unique. Nevertheless, considering the envelope model from a Bayesian perspective is highly meaningful, as it allows for incorporating prior information into posterior inference without relying on asymptotic assumptions.
Regarding Bayesian envelope models, Khare et al. (Reference Khare, Pal and Su2017) first proposed a Bayesian approach for analyzing response envelope models. They reparameterized the envelope model on a Stiefel manifold to ensure the uniqueness of the orthogonal basis and adopted the Bingham distribution as a prior for the orthogonal basis matrix. However, this method was developed for a specific design of the response envelope model, and it may lose parameter conjugacy in other scenarios. Additionally, the implementation of this method relies on sampling from the matrix Bingham distribution and the truncated Inverse-Gamma distribution, which can result in a heavy computational burden. On the other hand, Cook et al. (Reference Cook, Forzani and Su2016) developed a novel parameterization for the orthogonal basis matrix of the envelope subspace, and this idea was further extended to the Bayesian framework by Chakraborty & Su (Reference Chakraborty and Su2024). Their method does not rely on the Grassmann or Stiefel manifold and can be applied to various envelope model contexts, including the response and predictor envelopes. Meanwhile, Lee et al. (Reference Lee, Chakraborty and Su2022) adopted the same technique and formulated the envelope model for Bayesian quantile regression.
Inspired by the successful application of the reparameterized envelope method, this study introduces a factor analytic technique to the predictor envelope model, enabling dimension reduction for both the response and predictor variables. Factor models, such as confirmatory factor analysis (CFA) and exploratory factor analysis (EFA), are widely utilized statistical tools in fields such as psychology, education, and the social sciences. Over the past few decades, existing studies have demonstrated the effectiveness of the factor model in capturing latent structures and reducing dimensionality by summarizing latent factors through multiple observed variables. Moreover, the joint modeling approach, also known as the structural equation model (SEM), has shown high potential for adapting to various modeling techniques. The joint model typically consists of two parts. The first part entails a factor analysis model aggregating multiple observed variables into latent factors; this step captures the underlying structure and interrelationships among the observed variables. The second part utilizes a regression model to elucidate the association between the latent factors and the observed covariates of interest, allowing various modeling techniques to be adapted for a flexible and versatile analysis. For example, Roy & Lin (Reference Roy and Lin2000) used multiple longitudinal measures as outcomes to quantify a latent variable of interest from different perspectives. They adopted a linear mixed model to study the effects of covariates on the time-dependent latent variable. Pan et al. (Reference Pan, Kang, Wang and Song2019) integrated latent variables into a proportional hazards model to examine the observed and latent risk factors associated with the failure time of interest. Wang et al. (Reference Wang, Song and Zhu2021) introduced a latent-on-image model to jointly analyze high-dimensional imaging data and multiple clinical measurements in an Alzheimer's disease (AD) study. They characterized the severity of AD using various cognitive test scores as a latent factor in a CFA model and investigated the relationship between changes in brain structure and cognitive decline using a functional data regression model. In addition, researchers have developed numerous methods to handle flexible data structures within the factor model framework, including hierarchical and heterogeneous data (Lee & Song, Reference Lee and Song2004), missing data (Song & Lee, Reference Song and Lee2006), and longitudinal data (Song et al., Reference Song, Lu, Hser and Lee2011).
Bayesian methods have been extensively applied to SEM topics because they emphasize individual-level random observations and the estimation of first-order moment properties. This approach offers a simpler alternative to the traditional approach of fitting the covariance structure. By focusing on the raw individual-level data, Bayesian methods provide a flexible and intuitive framework for specifying prior distributions, incorporating prior knowledge, and conducting posterior inference. Furthermore, the hierarchical representation of the model, combined with efficient Markov chain Monte Carlo (MCMC) algorithms, allows for straightforward statistical inference and accommodates highly complex models. For instance, Wang et al. (Reference Wang, Feng and Song2016) integrated a mixture representation of the quantile regression model (Kozumi & Kobayashi, Reference Kozumi and Kobayashi2011; Yu & Moyeed, Reference Yu and Moyeed2001) into SEM, moving beyond the usual assumption of normal errors. Feng et al. (Reference Feng, Wang, Lu and Song2017) incorporated the Bayesian version of the Lasso (BLasso) and adaptive Lasso (BaLasso) into quantile SEM, enabling simultaneous estimation and variable selection in this context.
This study is motivated by a real-world data analysis of AD. A comprehensive understanding of the relationship between changes in brain structure, cognitive function, and the progression of brain disease is critical for accurate diagnosis and prevention of brain-related disorders. Brain degeneration is often a global process that affects multiple regions rather than being confined to specific areas. Therefore, understanding the interdependencies among different regions of interest (ROIs) is crucial. The envelope method, which we employ in this study, allows us to investigate and uncover these interdependencies among ROIs. By examining the relationships among various brain regions, we aim to identify the most informative ROI or combination of ROIs that can provide valuable insights into the underlying mechanisms of brain diseases.
The main contribution of this article is to formulate a novel envelope approach within the context of Bayesian SEM. Introducing the envelope model to the structural component of SEM serves a dual purpose. On the one hand, the envelope model eliminates immaterial variation in the data, thereby enhancing estimation efficiency. This enhancement is particularly valuable when confronted with high-dimensional candidate predictors (e.g., neuroimaging phenotypes derived from MRI data), of which only a small subset are genuinely influential, so that the high-dimensional predictors can be projected onto a reduced subspace. On the other hand, grouping latent variables from multiple observed surrogates also represents a form of dimension reduction, especially pertinent in situations where the correlation among multivariate responses arises from the shared underlying mechanism of reflecting the same latent construct from varying perspectives (e.g., cognitive impairment is manifested jointly by multiple cognitive tests in the ADNI study). Therefore, the proposed Bayesian Envelope SEM (BESEM) offers a novel perspective on concurrent dimension reduction for both response variables and predictors. This is achieved by employing factor analysis in the domain of latent variable modeling for the former and integrating an envelope structure into the structural equation for the latter. Even in scenarios where a lower-dimensional envelope subspace does not exist, the proposed method seamlessly degenerates to a standard SEM without compromising parameter estimation accuracy. To our knowledge, this work is the first to introduce envelope methods into the SEM framework. We restrict the estimation of the envelope space to an orthogonal basis, which greatly improves computational efficiency. The derived posterior distributions are proven to be proper even with noninformative priors. A simple block Metropolis-within-Gibbs MCMC algorithm is presented to facilitate posterior sampling. The proposed Markov chain is shown to be $\phi $-irreducible and aperiodic, ensuring the convergence of the MCMC samples.
The remainder of the article is organized as follows. Section 2 outlines the envelope model and defines BESEM. Section 3 discusses the Bayesian inference. Section 4 presents simulated experiments, showing the efficiency gains of BESEM compared to conventional Bayesian SEM approaches. In Section 5, we apply BESEM to the Alzheimer's Disease Neuroimaging Initiative (ADNI) study to explore new insights into the relationship between cognitive decline and different brain regions. Technical details are provided in the Supplementary Material.
2 Model description
2.1 Review of envelope model
In this section, we provide a brief overview of the envelope idea proposed by Cook et al. (Reference Cook, Li and Chiaromonte2010). This idea was initially developed to reduce the regression coefficients in the multivariate linear regression model given by
$$ \mathbf{Y} = \pmb{\mu} + \pmb{\beta} \mathbf{X} + \pmb{\varepsilon}, \tag{1} $$
where $\mathbf {Y} \in \mathbb {R}^r$ and $\mathbf {X} \in \mathbb {R}^p$ represent the response and predictor vectors, respectively, $\pmb {\mu } \in \mathbb {R}^r$ and $\pmb {\beta } \in \mathbb {R}^{r\times p}$ are the intercept and coefficient matrix, and $\pmb \varepsilon \in \mathbb {R}^r$ is the error term with zero mean and positive definite covariance matrix $\pmb {\Sigma }$.
The response envelope model aims to decompose the response variable $\mathbf {Y}$ into a material part and an immaterial part based on the assumption that X-invariant linear combinations of $\mathbf {Y}$ exist. Specifically, let $\mathcal {E}$ be a subspace of $\mathbb {R}^r$, let $\mathbf {P}_{\mathcal {E}}$ be the projection onto $\mathcal {E}$, and let $\mathbf {Q}_{\mathcal {E}} = \mathbf {I}_r-\mathbf {P}_{\mathcal {E}}$, where $\mathbf {I}_r$ is the identity matrix of size $r$. The response envelope seeks the minimal subspace $\mathcal {E}$ that satisfies the following two conditions:
- (i) $\mathbf {Q}_{\mathcal {E}} \mathbf {Y} \mid \mathbf {X} \sim \mathbf {Q}_{\mathcal {E}} \mathbf {Y} $, where $\sim $ denotes equality in distribution, and
- (ii) $\mathbf {P}_{\mathcal {E}} \mathbf {Y} \perp \!\!\!\perp \mathbf {Q}_{\mathcal {E}} \mathbf {Y} \mid \mathbf {X}$.
These two conditions stipulate that the distribution of $\mathbf {Q}_{\mathcal {E}}\mathbf {Y}$ is influenced neither by $\mathbf {X}$ nor by $\mathbf {P}_{\mathcal {E}} \mathbf {Y}$, indicating that the dependence of $\mathbf {Y}$ on $\mathbf {X}$ is concentrated only in $\mathbf {P}_{\mathcal {E}} \mathbf {Y}$, which is referred to as the material part. Then, $\mathbf {Q}_{\mathcal {E}} \mathbf {Y}$ is the immaterial part. Cook et al. (Reference Cook, Li and Chiaromonte2010) showed that (i) and (ii) are equivalent to the following two conditions:
- (i′) $\mathcal {B}\subseteq \mathcal {E}$, where $\mathcal {B}=\text {span}(\pmb {\beta }) $, and
- (ii′) $\pmb {\Sigma } = \mathbf {P}_{\mathcal {E}}\pmb {\Sigma }\mathbf {P}_{\mathcal {E}} + \mathbf {Q}_{\mathcal {E}}\pmb {\Sigma }\mathbf {Q}_{\mathcal {E}}$.
Condition (ii′) states that $\mathcal {E}$ is a reducing subspace of $\pmb {\Sigma }$. Combined with condition (i′), the $\pmb {\Sigma }$-envelope of $\pmb {\beta }$, denoted by $\mathcal {E}_{\pmb {\Sigma }} (\mathcal {B})$, is defined as the smallest reducing subspace of $\pmb {\Sigma }$ that contains $\text {span}(\pmb {\beta })$. The existence of the $\pmb {\Sigma }$-envelope of $\pmb {\beta }$ was discussed by Cook et al. (Reference Cook, Li and Chiaromonte2010).
Let $u=\dim (\mathcal {E}_{\pmb {\Sigma }}(\mathcal {B}))$, and let $\pmb {\Gamma } \in \mathbb {R} ^{r\times u}$ and ${\pmb {\Gamma }}_0 \in \mathbb {R} ^{r\times (r-u)}$ be orthogonal bases of $\mathcal {E}_{\pmb {\Sigma }} (\mathcal {B})$ and $\mathcal {E}_{\pmb {\Sigma }} ^\bot (\mathcal {B})$, respectively, where $\mathcal {E}_{\pmb {\Sigma }} ^\bot (\mathcal {B})$ is the orthogonal complement of $\mathcal {E}_{\pmb {\Sigma }} (\mathcal {B})$. Model (1) can be parameterized in terms of $\mathcal {E}_{\pmb {\Sigma }} (\mathcal {B})$ as follows:
$$ \mathbf{Y} = \pmb{\mu} + \pmb{\Gamma}\boldsymbol{d}\,\mathbf{X} + \pmb{\varepsilon}, \qquad \pmb{\Sigma} = \pmb{\Gamma}\pmb{\Omega}\pmb{\Gamma}^{T} + {\pmb{\Gamma}}_0{\pmb{\Omega}}_0{\pmb{\Gamma}}_0^{T}, \tag{2} $$
where $\boldsymbol {d} \in \mathbb {R}^{u\times p}$ is the coordinate matrix of $\pmb {\beta }$ with respect to $\pmb {\Gamma }$, so that $\pmb {\beta } = \pmb {\Gamma } \boldsymbol {d}$. The matrices $\pmb {\Omega } \in \mathbb {R}^{u \times u}$ and ${\pmb {\Omega }}_0 \in \mathbb {R}^{(r-u) \times (r-u)}$ are positive definite and specify the covariance structure of the material and immaterial parts. For a fixed $u$, the total number of parameters required for model (2) is
$$ r + pu + u(r-u) + \frac{u(u+1)}{2} + \frac{(r-u)(r-u+1)}{2} = r + pu + \frac{r(r+1)}{2}. $$
If $u<r$, efficiency gains of the envelope are possible compared to the standard multivariate linear regression model (1). If $u=r$, the envelope model degenerates into the standard linear regression model.
The predictor envelope model is built on a framework similar to that of the response envelope model, aiming to reduce the dimensionality of $\mathbf {X}$. To accommodate a stochastic predictor $\mathbf {X}$ with mean ${\pmb {\mu }}_{\mathbf {X}}$ and variance ${\pmb {\Sigma }}_{\mathbf {X}}$, model (1) is modified as follows:
$$ \mathbf{Y} = \pmb{\mu} + \pmb{\beta}(\mathbf{X} - {\pmb{\mu}}_{\mathbf{X}}) + \pmb{\varepsilon}, \tag{3} $$
where $\pmb \varepsilon $ is independent of $\mathbf {X}$ and not necessarily normally distributed. To decompose $\mathbf {X}$ into its material and immaterial parts, we assume there is a subspace $\mathcal {S} \subseteq \mathbb {R}^p$ that satisfies the following conditions:
- (a) $\text{cov}\left (\mathbf {Y}, \mathbf {Q}_{\mathcal {S}} \mathbf {X} \mid \mathbf {P}_{\mathcal {S}}\mathbf {X}\right ) = 0$ and
- (b) $\text{cov}\left (\mathbf {P}_{\mathcal {S}}\mathbf {X}, \mathbf {Q}_{\mathcal {S}}\mathbf {X}\right ) = 0$.
These conditions indicate that $\mathbf {Q}_{\mathcal {S}}\mathbf {X}$ is uncorrelated with both $\mathbf {P}_{\mathcal {S}}\mathbf {X}$ and $\mathbf {Y}$, while all the information in $\mathbf {X}$ that is linearly related to the regression is captured by $\mathbf {P}_{\mathcal {S}}\mathbf {X}$. It has been shown that these conditions are equivalent to the following ones (Cook et al., Reference Cook, Helland and Su2013):
- (a′) $\mathcal {B}' \subseteq \mathcal {S}$, and
- (b′) $\mathcal {S}$ is a reducing subspace of ${\pmb {\Sigma }}_{\mathbf {X}}$,
where $\mathcal {B}'=\text {span}({\pmb {\beta }}^{T})$. The intersection of all subspaces satisfying these two properties is referred to as the ${\pmb {\Sigma }}_{\mathbf {X}}$-envelope of $\mathcal {B}'$, denoted as $\mathcal {E}_{{\pmb {\Sigma }}_{\mathbf {X}}} (\mathcal {B}')$. Let $m=\dim \left ( \mathcal {E}_{{\pmb {\Sigma }}_{\mathbf {X}}} (\mathcal {B}')\right )$, and let $\pmb {\Phi } \in \mathbb {R}^{p\times m}$ be an orthogonal basis of $\mathcal {E}_{{\pmb {\Sigma }}_{\mathbf {X}}} (\mathcal {B}')$. Then, the predictor envelope model is formulated as
$$ \mathbf{Y} = \pmb{\mu} + \boldsymbol{c}^{T}\pmb{\Phi}^{T}(\mathbf{X} - {\pmb{\mu}}_{\mathbf{X}}) + \pmb{\varepsilon}, \qquad {\pmb{\Sigma}}_{\mathbf{X}} = \pmb{\Phi}\pmb{\Delta}\pmb{\Phi}^{T} + {\pmb{\Phi}}_0{\pmb{\Delta}}_0{\pmb{\Phi}}_0^{T}, \tag{4} $$
where ${\pmb {\beta }}^{T} = \pmb {\Phi } \boldsymbol {c}$, $\boldsymbol {c} \in \mathbb {R}^{m\times r}$ is the coordinate matrix of $\pmb {\beta }$ with respect to $\pmb {\Phi }$, ${\pmb {\Phi }}_0\in \mathbb {R}^{p\times (p-m)}$ is an orthogonal basis of $\mathcal {E}_{{\pmb {\Sigma }}_{\mathbf {X}}}^\bot (\mathcal {B}')$, and $\pmb {\Delta }$ and ${\pmb {\Delta }}_0$ are positive definite matrices.
Because reducing the dimensionality of complex covariates is often of great interest when efficiently quantifying the relationship between latent outcomes and a set of predictors in SEM, the following sections focus on predictor dimension reduction to describe the proposed methodology.
2.2 BESEM
We introduce the envelope approach in the context of SEM. Let $\mathbf {Y} \in \mathbb {R}^r$ be the observed response vector as in the previous representation. In addition, let $\mathbf {X} \in \mathbb {R}^p$ be the vector of random predictor variables with mean ${\pmb {\mu }}_{\mathbf {X}}$ and variance ${\pmb {\Sigma }}_{\mathbf {X}}$, and let $\pmb {\eta } \in \mathbb {R}^q$ be a vector of latent variables formulated from the observed variables in $\mathbf {Y}$. A standard SEM can be defined as follows:
$$ \mathbf{Y} = {\pmb{\mu}}_{\mathbf{Y}} + \pmb{\Lambda}\pmb{\eta} + \pmb{\varepsilon}, \qquad \pmb{\eta} = {\pmb{\mu}}_{\pmb{\eta}} + {\pmb{\beta}}^{T}\mathbf{X} + \pmb{\delta}, \tag{5} $$
where $\pmb {\Lambda } \in \mathbb {R}^{r\times q}$ is the unknown factor loading matrix, ${\pmb {\mu }}_{\mathbf {Y}}\in \mathbb {R}^{r}$ and ${\pmb {\mu }}_{\pmb {\eta }}\in \mathbb {R}^{q}$ are intercepts, $\pmb {\beta } \in \mathbb {R}^{p\times q}$ is the unknown coefficient matrix of interest, and $\pmb \varepsilon $ and $\pmb \delta $ are independent error terms. We assume $\pmb \varepsilon \sim N(\pmb 0, {\pmb {\Sigma }}_\varepsilon ) $ and $\pmb \delta \sim N(\pmb 0,{\pmb {\Sigma }}_\delta )$, where ${\pmb {\Sigma }}_\varepsilon $ is a diagonal matrix with diagonal elements $\left \{\sigma _{\varepsilon k}^2 \right \}_{k=1}^r$ and ${\pmb {\Sigma }}_\delta $ is a positive definite matrix.
The first equation in (5) represents the link between the observed outcome variables $\mathbf {Y}$ and the latent variables $\pmb {\eta }$, characterized by a CFA model. The second equation in (5) is a structural equation assessing the effects of the covariates of interest $\mathbf {X}$ on $\pmb {\eta }$.
To reduce the dimensionality of $\mathbf {X}$ with an envelope structure and decompose $\mathbf {X}$ into its material and immaterial parts, we assume there is a subspace $\mathcal {S} \subseteq \mathbb {R}^p$ that satisfies the following conditions:
- (a) $\text{cov}\left (\pmb {\eta }, \mathbf {Q}_{\mathcal {S}} \mathbf {X} \mid \mathbf {P}_{\mathcal {S}}\mathbf {X}\right ) = 0$ and
- (b) $\text{cov}\left (\mathbf {P}_{\mathcal {S}}\mathbf {X}, \mathbf {Q}_{\mathcal {S}}\mathbf {X}\right ) = 0$.
These conditions indicate that $\mathbf {Q}_{\mathcal {S}}\mathbf {X}$ is uncorrelated with both $\mathbf {P}_{\mathcal {S}}\mathbf {X}$ and $\pmb {\eta }$, while all the information in $\mathbf {X}$ that is linearly related to $\pmb {\eta }$ in the SEM is captured by $\mathbf {P}_{\mathcal {S}}\mathbf {X}$. The conditions (a) and (b) are equivalent to the following two conditions (Cook et al., Reference Cook, Helland and Su2013):
- (a′) $\mathcal {B}' \subseteq \mathcal {S}$, where $\mathcal {B}'=\text {span}({\pmb {\beta }})$, and
- (b′) $\mathcal {S}$ is a reducing subspace of ${\pmb {\Sigma }}_{\mathbf {X}}$.
Therefore, we aim to find the smallest subspace that satisfies (a′) and (b′) or, equivalently, the intersection of all reducing subspaces of ${\pmb {\Sigma }}_{\mathbf {X}}$ that contain $\mathcal {B}'$, which is the ${\pmb {\Sigma }}_{\mathbf {X}}$-envelope of $\mathcal {B}'$, denoted as $\mathcal {E}_{{\pmb {\Sigma }}_{\mathbf {X}}} (\mathcal {B}')$.
Let $m=\dim \left ( \mathcal {E}_{{\pmb {\Sigma }}_{\mathbf {X}}} (\mathcal {B}')\right )$, and let $\pmb {\Phi } \in \mathbb {R}^{p\times m}$ be an orthogonal basis of $\mathcal {E}_{{\pmb {\Sigma }}_{\mathbf {X}}} (\mathcal {B}')$. Then, the SEM model (5) can be formulated as the following envelope SEM:
$$ \mathbf{Y} = {\pmb{\mu}}_{\mathbf{Y}} + \pmb{\Lambda}\pmb{\eta} + \pmb{\varepsilon}, \qquad \pmb{\eta} = {\pmb{\mu}}_{\pmb{\eta}} + \boldsymbol{c}^{T}\pmb{\Phi}^{T}\mathbf{X} + \pmb{\delta}, \qquad {\pmb{\Sigma}}_{\mathbf{X}} = \pmb{\Phi}\pmb{\Delta}\pmb{\Phi}^{T} + {\pmb{\Phi}}_0{\pmb{\Delta}}_0{\pmb{\Phi}}_0^{T}, \tag{6} $$
where $\pmb {\beta } = \pmb {\Phi } \boldsymbol {c}$, $\boldsymbol {c} \in \mathbb {R}^{m\times q}$ is the full-rank coordinate matrix of $\pmb {\beta }$ with respect to $\pmb {\Phi }$, and ${\pmb {\Phi }}_0\in \mathbb {R}^{p\times (p-m)}$ is an orthogonal basis of $\mathcal {E}_{{\pmb {\Sigma }}_{\mathbf {X}}}^{\bot } (\mathcal {B}')$. The matrices $\pmb {\Delta } \in \mathbb {R}^{m\times m}$ and ${\pmb {\Delta }}_0 \in \mathbb {R}^{(p-m)\times (p-m)}$ are positive definite. When $m=p$, the proposed envelope model is equivalent to a standard SEM.
2.3 Reparametrization of BESEM
For a fixed $m$, the envelope $ \mathcal {E}_{{\pmb {\Sigma }}_{\mathbf {X}}} (\mathcal {B}')$ is defined on a $p\times m$ Grassmann manifold, which implies that the specification of the basis $\pmb {\Phi }$ is not unique. We can reparameterize the envelope model to address this issue and identify a unique orthogonal basis in a Euclidean space (Cook et al., Reference Cook, Forzani and Su2016).
Let $\pmb {\Phi }$ be an arbitrary basis of $ \mathcal {E}_{{\pmb {\Sigma }}_{\mathbf {X}}} (\mathcal {B}')$. We define ${\pmb {\Phi }}_1$ as the first $m$ rows of $\pmb {\Phi }$. Without loss of generality, we assume that ${\pmb {\Phi }}_1$ is nonsingular (otherwise, we can reorder the rows of $\mathbf {X}$). The remaining $p-m$ rows of $\pmb {\Phi }$ are denoted as ${\pmb {\Phi }}_2$. Then $\pmb {\Phi }$ can be expressed as follows:
$$ \pmb{\Phi} = \begin{pmatrix} {\pmb{\Phi}}_1 \\ {\pmb{\Phi}}_2 \end{pmatrix} = \begin{pmatrix} \mathbf{I}_m \\ \boldsymbol{A} \end{pmatrix} {\pmb{\Phi}}_1 = {\pmb{C}}_{\boldsymbol{A}}\,{\pmb{\Phi}}_1, \tag{7} $$
where $\boldsymbol {A} = {\pmb {\Phi }}_2{\pmb {\Phi }}_1^{-1} \in \mathbb {R}^{(p-m)\times m}$ is an unconstrained matrix, and ${\pmb {C}}_{\boldsymbol {A}} = \left (\mathbf {I}_m, \boldsymbol {A}^{T} \right )^{T} $ is also a basis of $\mathcal {E}_{{\pmb {\Sigma }}_{\mathbf {X}}}(\mathcal {B'})$. It has been shown that $\boldsymbol {A}$ depends on $\pmb {\Phi }$ only through $\text {span}(\pmb {\Phi })$: suppose ${\pmb {\Phi }}^*$ is a different basis of $\mathcal {E}_{{\pmb {\Sigma }}_{\mathbf {X}}}(\mathcal {B'})$ and $\pmb {O}\in \mathbb {R}^{m\times m}$ is a full-rank matrix such that ${\pmb {\Phi }}^* = \pmb {\Phi } \pmb {O}$; then ${\pmb {\Phi }}_1^* = {\pmb {\Phi }}_1 \pmb {O}$, ${\pmb {\Phi }}_2^* = {\pmb {\Phi }}_2 \pmb {O}$, and $\boldsymbol {A}^* = {\pmb {\Phi }}_2^*({\pmb {\Phi }}_1^*)^{-1} = {\pmb {\Phi }}_2\pmb {O}{\pmb {O}}^{-1}{\pmb {\Phi }}_1^{-1} = \boldsymbol {A}$ (Su et al., Reference Su, Zhu, Chen and Yang2016). Therefore, there is a one-to-one correspondence between ${\boldsymbol {A} \in \mathbb {R}^{(p-m)\times m}}$ and $\mathcal {E}_{{\pmb {\Sigma }}_{\mathbf {X}}}(\mathcal {B'})$. Furthermore, the orthonormal basis of $\mathcal {E}_{{\pmb {\Sigma }}_{\mathbf {X}}}(\mathcal {B'})$ can be expressed as $\pmb {\Phi }=\pmb {\Phi }(\boldsymbol {A}) = {\pmb {C}}_{\boldsymbol {A}}\left ({\pmb {C}}_{\boldsymbol {A}}^{T}{\pmb {C}}_{\boldsymbol {A}}\right )^{-\frac {1}{2}}$.
Based on (Chen et al., Reference Chen, Su, Yang and Ding2020, Lemma 1), we can construct a basis of $\mathcal {E}^{\bot }_{{\pmb {\Sigma }}_{\mathbf {X}}}(\mathcal {B'})$ by forming the matrix $\boldsymbol {D}_{\boldsymbol {A}} = \left (-\boldsymbol {A}, \mathbf I_{p-m}\right )^{T} $. Likewise, we can obtain an orthonormal basis as ${\pmb {\Phi }}_0(\boldsymbol {A}) = {\pmb {D}}_{\boldsymbol {A}}({\pmb {D}}_{\boldsymbol {A}}^{T} {\pmb {D}}_{\boldsymbol {A}})^{-\frac {1}{2}}$. Accordingly, (5) can be written as
$$ \mathbf{Y} = {\pmb{\mu}}_{\mathbf{Y}} + \pmb{\Lambda}\pmb{\eta} + \pmb{\varepsilon}, \qquad \pmb{\eta} = {\pmb{\mu}}_{\pmb{\eta}} + \boldsymbol{c}^{T}\pmb{\Phi}(\boldsymbol{A})^{T}\mathbf{X} + \pmb{\delta}, \tag{8} $$
where $\mathbf {X} \sim N\left ({\pmb {\mu }}_{\mathbf {X}},\pmb {\Phi }(\boldsymbol {A})\pmb {\Delta }\pmb {\Phi }(\boldsymbol {A})^{T} +{\pmb {\Phi }}_0(\boldsymbol {A}){\pmb {\Delta }}_0{\pmb {\Phi }}_0(\boldsymbol {A})^{T} \right )$.
2.4 Model identification
The measurement equation in (8) is not identified because, for any nonsingular matrix $\mathbf {B}$, we have $\pmb {\Lambda }\pmb {\eta } = \pmb {\Lambda }\mathbf {B}\mathbf {B}^{-1}\pmb {\eta } = {\pmb {\Lambda }}^* {\pmb {\eta }}^*$, where ${\pmb {\Lambda }}^* = \pmb {\Lambda }\mathbf {B}$ and ${\pmb {\eta }}^* = \mathbf {B}^{-1}\pmb {\eta }$ is still a vector of random latent variables. Imposing identification constraints on the measurement equation is therefore necessary. One commonly used approach, as described in Song & Lee (Reference Song and Lee2012), is to define $\pmb {\Lambda }$ with a non-overlapping structure and a fixed nonzero element in each column. Here is an illustrative example: consider a scenario with $r=6$ observed variables and $q=2$ latent variables, in which the first four observed variables are associated with the first latent factor and the remaining two are related to the second latent factor. Let ${\lambda }_{jk}$ denote the $(j,k)$th element of $\pmb {\Lambda }$. A non-overlapping structure of $\pmb {\Lambda }$ can be defined as follows:
$$ {\pmb{\Lambda}}^{T} = \begin{pmatrix} 1 & \lambda_{21} & \lambda_{31} & \lambda_{41} & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & \lambda_{62} \end{pmatrix}. \tag{9} $$
In the above structure, the elements with values 1 and 0 are known parameters with fixed values, while the ${\lambda }_{jk}$'s represent the unknown parameters to be estimated. Therefore, the total number of parameters that need to be estimated is

2.5 Envelope dimension selection
Determining the optimal dimension of the envelope space is an essential step before estimating the envelope model. Various information criteria, such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the deviance information criterion (DIC), have been widely used in previous research. Information criteria offer a trade-off between model fit and complexity, with lower values indicating better-fitting models. The dimension of the envelope space that minimizes the AIC, BIC, or DIC can be chosen as the optimal dimension. However, there is no universally applicable guideline regarding which information criterion performs best in all scenarios; their performance may vary depending on the specific characteristics of the data and the model under consideration. Prior studies, such as Shen et al. (Reference Shen, Park, Chakraborty and Zhang2023), Khare et al. (Reference Khare, Pal and Su2017), and Chakraborty & Su (Reference Chakraborty and Su2024), have demonstrated that different information criteria perform differently in various envelope model contexts. Therefore, we investigate the performance of several information criteria, including AIC, BIC, DIC, and average weighted estimation (AWE), in determining the optimal dimension of the envelope space in the context of BESEM by conducting an empirical experiment in Section 4. Let $\pmb {\theta }$ be all the unknown parameters and $L(\text {data}|\pmb {\theta })$ be the likelihood; then $\hat {L}$ is the maximized likelihood value of the model, and the deviance is $ D(\pmb {\theta }) = -2\log L(\text {data}|\pmb {\theta })$. The values of AIC, BIC, DIC, and AWE are calculated as follows:
$$ \begin{align*} \text{AIC} &= -2\log \hat{L} + 2d, \qquad & \text{BIC} &= -2\log \hat{L} + d\log n, \\ \text{DIC} &= D(\bar{\pmb{\theta}}) + 2p_D, \qquad & \text{AWE} &= -2\log \hat{L} + 2d\left(\log n + \tfrac{3}{2}\right), \end{align*} $$
with $d$ denoting the number of free parameters, $n$ the sample size, and
where $\ p_D = E_{\pmb {\theta } |\text {data}}\left [D\right ] - D(E_{\pmb {\theta }|\text {data}}\left [\pmb {\theta }\right ]) = \bar {D} - D(\bar {\pmb {\theta }})$ denotes the posterior mean deviance minus the deviance evaluated at the posterior mean of the parameters.
This empirical analysis provides valuable insights into the relative performance of information criteria and their suitability in specific scenarios of envelope models.
3 Bayesian inference
3.1 Prior distributions
Let $\pmb {\theta }$ represent all the unknown parameters in model (8): $\pmb {\theta } = \{{\pmb {\mu }}_{\mathbf {Y}}, \pmb \Lambda , {\pmb {\Sigma }}_\varepsilon , {\pmb {\Sigma }}_\delta , {\pmb {\mu }}_{\boldsymbol {\eta }}, {\pmb {\mu }}_{\mathbf {X}}, \pmb {\Delta }, {\pmb {\Delta }}_0, \boldsymbol {c}, \boldsymbol {A} \}$. We define a joint prior density $p(\pmb {\theta })$ for the unknown parameters, which can be decomposed as $p(\pmb {\theta }) = p({\pmb {\mu }}_{\mathbf {Y}},\pmb \Lambda \mid {\pmb {\Sigma }}_\varepsilon ) p({\pmb {\Sigma }}_\varepsilon ) p({\pmb {\Sigma }}_\delta ) p({\pmb {\mu }}_{\boldsymbol {\eta }}) p({\pmb {\mu }}_{\mathbf {X}}) p(\pmb {\Delta }) p({\pmb {\Delta }}_0) p\left (\boldsymbol {c}| \boldsymbol {A}, {\pmb {\Sigma }}_\delta \right ) p(\boldsymbol {A} )$.
We first specify some notation: $\mathbb {S}_+^{m\times m}$ denotes the set of $m\times m$ symmetric positive definite matrices; $\mathbb {R}_+^{m}$ denotes the set of vectors of length $m$ with positive entries; $\text {IW}_m(\pmb {\Psi }, v)$ represents the Inverse-Wishart distribution with scale matrix $\pmb {\Psi } \in \mathbb {S}_+^{m\times m}$ and degrees of freedom $v$; $\text {IG}(a,b)$ represents the Inverse-Gamma distribution; and $\mathcal {MN}_{n_1,n_2}(\mathbf {M}, \mathbf U, \mathbf V)$ is the matrix normal distribution with parameters $\mathbf {M} \in \mathbb {R}^{n_1\times n_2}$, $\mathbf U \in \mathbb {S}_+^{n_1\times n_1}$, and $\mathbf V \in \mathbb {S}_+^{n_2\times n_2}$. The priors for the parameters of interest are defined as follows:
• Let ${\pmb {\Lambda }}_y = \left ({\pmb {\mu }}_{\mathbf {Y}}, \pmb {\Lambda } \right )$, and let ${\pmb {\Lambda }}_{yk}^{T}$ denote the $k$th row of ${\pmb {\Lambda }}_y$. For each $k=1,\dots ,r$, we assign a joint prior to $({\pmb {\Lambda }}_{yk}, \sigma ^2_{\varepsilon k})$, given by $p({\pmb {\Lambda }}_{yk}, \sigma ^2_{\varepsilon k}) = p({\pmb {\Lambda }}_{yk}| \sigma ^2_{\varepsilon k})p(\sigma ^2_{\varepsilon k})$. Specifically,
$$ \begin{align*} \left[{\pmb{\Lambda}}_{yk}\mid \sigma^2_{\varepsilon k}\right] \sim \text{N}_{q+1}({\pmb{\Lambda}}_{0k}, \mathbf{H}_{0k}), \quad \quad \sigma^2_{\varepsilon k} \sim \text{IG}(a_{0k}, b_{0k}), \quad \text{for} \ k=1,\dots,r, \end{align*} $$
where ${\pmb {\Lambda }}_{0k} \in \mathbb {R}^{q+1}$, $\mathbf {H}_{0k} \in \mathbb {S}_+^{(q+1)\times (q+1)}$, $a_{0k}>0$, and $b_{0k}> 0$ are prefixed hyperparameters.
• We adopt flat priors for the intercept terms ${\pmb {\mu }}_{\mathbf {X}}$ and ${\pmb {\mu }}_{\pmb {\eta }}$:
$$ \begin{align*} p({\pmb{\mu}}_{\mathbf{X}}) \propto 1,\quad \quad p({\pmb{\mu}}_{\pmb{\eta}}) \propto 1. \end{align*} $$
• We assign Inverse-Wishart priors to $\pmb {\Delta }$, ${\pmb {\Delta }}_0$, and ${\pmb {\Sigma }}_\delta $:
$$ \begin{align*} \pmb{\Delta} \sim \text{IW}_m \left({\pmb{\Psi}}_1, v_1\right), \quad \quad {\pmb{\Delta}}_0 \sim \text{IW}_{p-m} \left({\pmb{\Psi}}_2, v_2\right),\quad \quad {\pmb{\Sigma}}_\delta \sim \text{IW}_q\left({\pmb{\Psi}}_\delta, v_0\right), \end{align*} $$
where ${\pmb {\Psi }}_1 \in \mathbb {S}_+^{m\times m}$, ${\pmb {\Psi }}_2 \in \mathbb {S}_+^{(p-m)\times (p-m)}$, ${\pmb {\Psi }}_\delta \in \mathbb {S}_+^{q\times q}$, $v_1>m-1$, $v_2> p-m-1$, and $v_0>q-1$ are hyperparameters.
-
• We use matrix normal prior for
$\boldsymbol {c}$ and
$\boldsymbol {A}$ :
$$ \begin{align*} \left[\boldsymbol{c}\mid \boldsymbol{A}, {\pmb{\Sigma}}_\delta\right] &\sim \mathcal{MN}_{m,q}\left(\mathbf{M}^{-1}\pmb{\Phi}(\boldsymbol{A})^{T}\boldsymbol{e}, \mathbf{M}, {\pmb{\Sigma}}_\delta \right), \\ \boldsymbol{A} & \sim \mathcal{MN}_{p-m, m}\left( \boldsymbol{A}_0, \boldsymbol{K}, \boldsymbol{L} \right), \end{align*} $$
$\mathbf {M} \in \mathbb {S}_+^{m\times m }$ ,
$\boldsymbol {e} \in \mathbb {R}^{p\times q}$ ,
$\boldsymbol {A}_0 \in \mathbb {R}^{(p-m)\times m}$ ,
$\boldsymbol {K} \in \mathbb {S}_+^{(p-m)\times (p-m)}$ ,
$\boldsymbol {L} \in \mathbb {S}_+^{m\times m}$ are hyperparameters.
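For illustration, a draw from a matrix normal prior such as the one placed on $\boldsymbol {A}$ can be generated from i.i.d. standard normals via Cholesky factors of the row and column covariance matrices. The sketch below is generic (the function name and the small dimensions are hypothetical, and it is written in Python rather than the authors' R/Rcpp implementation):

```python
import numpy as np

def rmatnorm(mean, row_cov, col_cov, rng):
    """Draw one sample from a matrix normal MN(mean, row_cov, col_cov).

    If Z has i.i.d. N(0, 1) entries, then mean + L_r @ Z @ L_c.T has the
    desired distribution, where L_r and L_c are the Cholesky factors of
    the row and column covariance matrices.
    """
    m, q = mean.shape
    L_r = np.linalg.cholesky(row_cov)
    L_c = np.linalg.cholesky(col_cov)
    Z = rng.standard_normal((m, q))
    return mean + L_r @ Z @ L_c.T

rng = np.random.default_rng(0)
A0 = np.zeros((3, 2))          # prior mean of A (hypothetical sizes)
K = np.eye(3)                  # row covariance
L = np.eye(2)                  # column covariance
A_draw = rmatnorm(A0, K, L, rng)
print(A_draw.shape)            # (3, 2)
```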
Vague (noninformative) priors are used in the numerical studies. Such priors are widely adopted when limited information is available about the relationship between the latent responses and a large number of covariates. However, the proposed method can also straightforwardly accommodate prior knowledge when it is available. For instance, if we have prior information on the factor loading matrix
$\boldsymbol {\Lambda }$
in the form of
$\widehat {\boldsymbol {\Lambda }}_{prior}$
, we can set the prior mean of the loading matrix as
$\boldsymbol {\Lambda }_0 = \widehat {\boldsymbol {\Lambda }}_{prior}$
with a relatively small prior variance matrix
$\mathbf {H}_{0k}$
. Similarly, if we possess prior knowledge on the potential envelope subspace, i.e.,
$\widehat {\mathcal {E}}_{prior}$
, we can determine the prior information for
$\mathbf {A}$
,
$\widehat {\mathbf {A}}_{prior}$
, through the one-to-one correspondence in Equation (7) and set
$\mathbf {A}_0 = \widehat {\mathbf {A}}_{prior}$
. In line with the methodology proposed by Chakraborty & Su (Reference Chakraborty and Su2024), even partial prior information concerning
$\widehat {\mathcal {E}}_{prior}$
can be incorporated into the proposed method. For example, suppose the candidate envelope dimension is
$m=4$
and all we know about
$\widehat {\mathcal {E}}_{prior}$
is that it contains two independent unit vectors
$\mathbf {v}_1$
and
$\mathbf {v}_2$
. In such circumstances, we can generate two random vectors from span
$(\mathbf {v}_1 ,\mathbf {v}_2)^\bot $
as
$(\mathbf {v}_3 ,\mathbf {v}_4) = \mathbf {G}_0\mathbf {C}(\mathbf {C}^{T}\mathbf {C})^{-1/2},$
where
$\mathbf {G}_0 \in \mathbb {R}^{p \times (p-2)}$
is an orthonormal basis of span
$(\mathbf {v}_1 ,\mathbf {v}_2)^\bot $
and
$\mathbf {C}$
is a
$(p-2)\times 2$
matrix with each entry independently generated from the standard normal distribution. Subsequently, the prior orthonormal basis can be formulated as
$\widehat {\boldsymbol {\Phi }}(\mathbf {A})_{prior} = (\mathbf {v}_1 ,\mathbf {v}_2, \mathbf {v}_3 ,\mathbf {v}_4)$
and
$\widehat {\mathbf {A}}_{prior}$
can be derived through Equation (7).
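To make this construction concrete, the following Python sketch (with $p=10$ and $\mathbf {v}_1$, $\mathbf {v}_2$ chosen as coordinate vectors purely for illustration; the article's implementation is in R) completes the two known vectors to a prior orthonormal basis, taking $\mathbf {C}$ as a $(p-2)\times 2$ standard normal matrix so that exactly two additional directions result:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 10

# Two known orthonormal unit vectors assumed to lie in the envelope subspace
# (coordinate vectors here, purely for illustration).
v1 = np.eye(p)[:, 0]
v2 = np.eye(p)[:, 1]
V = np.column_stack([v1, v2])

# G0: orthonormal basis of span(v1, v2)^perp, from the full QR factorization.
Q, _ = np.linalg.qr(V, mode="complete")
G0 = Q[:, 2:]                              # p x (p - 2)

# (v3, v4) = G0 C (C^T C)^{-1/2}, with C a (p - 2) x 2 standard normal
# matrix so that exactly two new orthonormal directions are produced.
C = rng.standard_normal((p - 2, 2))
w, U = np.linalg.eigh(C.T @ C)
inv_sqrt = U @ np.diag(w ** -0.5) @ U.T    # (C^T C)^{-1/2}
V34 = G0 @ C @ inv_sqrt

# Prior orthonormal basis Phi(A)_prior = (v1, v2, v3, v4).
Phi_prior = np.column_stack([v1, v2, V34])
print(np.allclose(Phi_prior.T @ Phi_prior, np.eye(4)))  # True
```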
3.2 Posterior analysis and sampling process
Let
$\mathcal {D} = \left \{\mathbb {X}, \mathbb {Y} \right \}$
represent the collection of
$n$
independent observations of
$(\mathbf {X}, \mathbf {Y})$
, where
$\mathbb {X} = \left (\mathbf {X}_1,\dots ,\mathbf {X}_n\right )$
and
$\mathbb {Y} = \left (\mathbf {Y}_1,\dots ,\mathbf {Y}_n\right )$
. Additionally, let
$\pmb {\eta } = \left ({\pmb {\eta }}_1,\dots , {\pmb {\eta }}_n\right )$
denote the matrix of latent variables. The Bayesian estimate of
$\pmb {\theta }$
is commonly defined as the sample mean or mode of the posterior distribution
$p(\pmb {\theta } | \mathcal {D} )$
. However, drawing samples from
$p(\pmb {\theta }|\mathcal {D})$
can be challenging due to the presence of the latent variable
$\pmb {\eta }$
, as
$p(\pmb {\theta } | \mathcal {D})$
may not have a closed form. We utilize the data augmentation technique proposed by Tanner & Wong (Reference Tanner and Wong1987) to address this issue. In this approach, we treat the latent variables
$\pmb {\eta }$
as missing data and augment the observed data with them. Consequently, the posterior sampling procedure can be constructed based on the complete data set and the joint distribution of
$\left [\pmb {\theta },\pmb {\eta } | \mathcal {D} \right ]$
. Theorem 1 establishes the propriety of the posterior density, and its proof is included in the Supplementary Material.
Theorem 1 (Posterior propriety)
The posterior density
$p(\pmb {\theta },\pmb {\eta } | \mathcal {D} )$
is proper with respect to Lebesgue measure on
$\mathbb {R}^r\times \mathbb {R}^{r-q} \times \mathbb {R}_{+}^{r} \times \mathbb {S}_{+}^{q\times q} \times \mathbb {R}^q \times \mathbb {R}^p \times \mathbb {S}_+^{m\times m} \times \mathbb {S}_+^{(p-m)\times (p-m)} \times \mathbb {R}^{m\times q} \times \mathbb {R}^{(p-m)\times m} \times \mathbb {R}^q$
.
Given the complexity of the posterior distribution
$p(\pmb {\theta },\pmb {\eta } | \mathcal {D} )$
and the challenge of directly sampling from it, we propose a Metropolis-within-Gibbs sampler to draw posterior samples. The Gibbs sampler allows us to generate samples for each element of
$\pmb {\theta }$
and
$\pmb {\eta }$
from their respective full conditional distributions iteratively. The proposed MCMC sampler (Algorithm 1) for the case when envelope dimension
$m \in \left \{1,\dots ,p-1\right \}$
can be found in the Supplementary Material.
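To fix ideas, the structure of a Metropolis-within-Gibbs sampler can be sketched on a toy target, a bivariate normal with correlation $\rho $; this generic Python illustration is not Algorithm 1 itself, which cycles through the full conditionals of $\pmb {\theta }$ and $\pmb {\eta }$:

```python
import numpy as np

def mwg_sampler(n_iter, rho=0.8, seed=0):
    """Toy Metropolis-within-Gibbs sampler for a bivariate normal
    with unit variances and correlation rho.

    x is drawn exactly from its full conditional (a Gibbs step); y is
    updated by a random-walk Metropolis step targeting its full
    conditional, mirroring a Metropolis-within-Gibbs scheme in which
    only some full conditionals admit direct sampling.
    """
    rng = np.random.default_rng(seed)
    x, y = 0.0, 0.0
    cond_var = 1.0 - rho ** 2
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        # Gibbs step: x | y ~ N(rho * y, 1 - rho^2).
        x = rng.normal(rho * y, np.sqrt(cond_var))
        # Metropolis step for y | x ~ N(rho * x, 1 - rho^2).
        y_prop = y + rng.normal(0.0, 1.0)
        log_ratio = ((y - rho * x) ** 2 - (y_prop - rho * x) ** 2) / (2.0 * cond_var)
        if np.log(rng.uniform()) < log_ratio:
            y = y_prop
        draws[t] = (x, y)
    return draws

draws = mwg_sampler(20000)
# After burn-in, the sample correlation should be close to rho = 0.8.
print(round(float(np.corrcoef(draws[5000:].T)[0, 1]), 2))
```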
We also prove that the Markov chain generated by the proposed algorithm is
$\phi $
-irreducible and aperiodic in Theorem 2, which ensures convergence to the stationary distribution from almost all starting states (Cinlar, Reference Cinlar2013).
Theorem 2 (
$\phi $
-irreducibility and aperiodicity)
The Markov chain generated by the Metropolis-within-Gibbs algorithm for the posterior sampling is
$\phi $
-irreducible and aperiodic.
4 Simulation
We design a simulation experiment to assess the performance of the BESEM methodology proposed in this article. The data are generated using Equations (7) and (8). We consider three values for the dimension of the underlying envelope space, denoted by
$m_{\text {true}}$
, namely 2, 4, and 6. Additionally, we consider two settings for the predictor dimension
$p$
, with values of 20 and 40. The other parameters are fixed as follows:
$r=6$
and
$q=2$
. To ensure identification,
${\lambda }_{1,1}$
and
${\lambda }_{4,2}$
are set to 1, while the remaining free elements of
$\pmb \Lambda $
are generated independently from a uniform distribution
$\text {Unif}(-5,5)$
. The elements of
${\pmb \mu }_{\pmb {\eta }}$
,
${\pmb \mu }_{\mathbf {X}}$
, and
${\pmb {\mu }}_{\mathbf {Y}}$
are independently sampled from a uniform distribution
$\text {Unif}(-10,10)$
, and the elements of
$\boldsymbol {c}$
and
$\boldsymbol {A}$
are drawn independently from uniform distributions
$\text {Unif}(5,10)$
and
$\text {Unif}(0,5)$
, respectively. The matrices
$\pmb {\Delta }$
,
${\pmb {\Delta }}_0$
and
${\pmb {\Sigma }}_\delta $
are generated independently from
$\text {IW}_m(500\mathbf {I}_m, m+2)$
,
$\text {IW}_{p-m}(\mathbf {I}_{p-m}, p-m+2)$
, and
$\text {IW}_q(10\mathbf {I}_{q}, q+2)$
, respectively. The diagonal elements of the error covariance matrix
${\pmb {\Sigma }}_\varepsilon $
are simulated from
$\text {Unif}(0.5,5)$
. More specifically, we first generate
$\mathbf {X}$
from a multivariate normal distribution characterized by parameters
${\pmb {\mu }}_{\mathbf {X}}$
,
$\boldsymbol {A}$
,
$\pmb {\Delta }$
, and
${\pmb {\Delta }}_0$
as described in Section 2.3. The error terms
$\pmb \delta $
and
$\pmb \varepsilon $
are independently drawn from normal distributions with zero mean and covariance matrix
${\pmb {\Sigma }}_\delta $
and
${\pmb {\Sigma }}_\varepsilon $
, respectively. The latent variables
$\pmb {\eta }$
and the corresponding observed responses
$\mathbf {Y}$
are then generated according to Equation (8). The simulation experiment considers different sample sizes, namely, 50, 150, 300, and 600. For each sample size, 100 replicated datasets are generated.
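As an illustrative Python sketch of the predictor-generation step (with hypothetical dimensions and parameter values rather than those drawn above, and assuming the standard predictor-envelope decomposition of the covariance):

```python
import numpy as np

rng = np.random.default_rng(2)
p, m, n = 20, 2, 300                 # hypothetical dimensions and sample size

# Hypothetical orthonormal envelope basis: Phi spans the material part of X,
# Phi0 its orthogonal complement (the immaterial part).
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))
Phi, Phi0 = Q[:, :m], Q[:, m:]

Delta = np.eye(m)                    # material covariance (illustrative)
Delta0 = 4.0 * np.eye(p - m)         # immaterial covariance (illustrative)
mu_X = np.zeros(p)

# X ~ N(mu_X, Sigma_X) with the envelope decomposition of the covariance:
# Sigma_X = Phi Delta Phi^T + Phi0 Delta0 Phi0^T.
Sigma_X = Phi @ Delta @ Phi.T + Phi0 @ Delta0 @ Phi0.T
X = rng.multivariate_normal(mu_X, Sigma_X, size=n)
print(X.shape)  # (300, 20)
```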
Table 1 Percentages of estimated envelope dimension
$\hat{m}$
that were selected by the AIC, BIC, AWE, and DIC methods in 100 replications
For model estimation and Bayesian inference, we consider vague priors on the unknown parameters as follows:
(1) For the prior distributions of
${\pmb {\Lambda }}_{yk}$ and
$\sigma _{\varepsilon k}^2$ , we set
${\pmb {\Lambda }}_{0k} = \pmb 0$ ,
$\mathbf {H}_{0k} = 1000\mathbf {I}_{q+1}$ ,
$a_{0k }= 9$ ,
$b_{0k} = 4$ , for
$k=1,\dots ,r$ .
(2) For the prior distributions of
$\pmb {\Delta }, {\pmb {\Delta }}_0$ and
${\pmb {\Sigma }}_\delta $ , the scale matrices
${\pmb {\Psi }}_1$ ,
${\pmb {\Psi }}_2$ ,
${\pmb {\Psi }}_\delta $ of the Inverse-Wishart distributions are set to
$10^{-6}$ times the identity matrix. The degrees of freedom are set to
$v_1=m, v_2 = p-m$ , and
$v_0=q$ .
(3) For the prior distributions of
$\boldsymbol {c}$ and
$\boldsymbol {A}$ , the prior covariance matrices are specified as
$10^6$ times the identity matrix, and the prior means are set to zero.
We use trace plots of three Markov chains started from different initial values to check the convergence of the algorithm. Figure S1 in the Supplementary Material depicts the trace plots of several randomly selected elements of
$\boldsymbol {\beta }$
, showing that the Markov chains mix well within the first few thousand iterations. We therefore run the MCMC algorithm for 16,000 iterations, treating the first 8,000 iterations as the burn-in stage. Four information criteria, AIC, BIC, DIC, and AWE, are employed for model selection. Due to the presence of the latent factors, the complete likelihood involves an integral of
$p(\pmb {\theta },\pmb {\eta } | \mathcal {D})$
with respect to
$\pmb {\eta }$
, which can be complex. We use an importance sampling approach to approximate the maximized complete-data likelihood. In each replication, we calculate the information criteria for values of
$m$
ranging from 0 to
$p$
. The case where
$m=p$
corresponds to the standard SEM method. The estimated envelope dimension, denoted as
$\hat {m}$
, is selected based on the minimum criterion value.
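The selection rule amounts to computing each criterion over the candidate dimensions and taking the minimizer. Below is a minimal Python sketch with hypothetical log-likelihood values; AIC and BIC are shown as representatives, and DIC and AWE follow the same compute-then-minimize pattern:

```python
import numpy as np

def select_dimension(loglik, n_params, n):
    """Select the envelope dimension minimizing each information criterion.

    loglik[m] is the (approximated) maximized log-likelihood for candidate
    dimension m, and n_params[m] is the corresponding number of free
    parameters; n is the sample size.
    """
    dims = sorted(loglik)
    aic = {m: -2.0 * loglik[m] + 2.0 * n_params[m] for m in dims}
    bic = {m: -2.0 * loglik[m] + np.log(n) * n_params[m] for m in dims}
    return min(aic, key=aic.get), min(bic, key=bic.get)

# Hypothetical values: the fit improves up to m = 2 and then plateaus.
loglik = {0: -520.0, 1: -480.0, 2: -450.0, 3: -449.0, 4: -448.5}
n_params = {0: 10, 1: 14, 2: 18, 3: 22, 4: 26}
m_aic, m_bic = select_dimension(loglik, n_params, n=300)
print(m_aic, m_bic)  # 2 2
```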
The accuracy of the model selection is assessed by determining the percentage of correctly identifying the true envelope dimension
$m_{\text {true}}$
. The selection rate results for the different settings of
$m_{\text {true}}$
and
$p$
are presented in Table 1. As the sample size
$n$
increases, the selected
$m$
tends to concentrate more on the true envelope dimension
$m_{\text {true}}$
for all methods. The selection accuracy varies across different settings of
$m_{\text {true}}$
,
$p$
, and the true parameter value, and all four methods can exhibit good or poor performance in certain situations. However, among the four methods compared, the AWE method consistently achieves the highest accuracy across all settings. It is important to note that all four methods tend to overestimate the true envelope dimension. This overestimation has also been observed in previous studies (Chakraborty & Su, Reference Chakraborty and Su2024; Lee et al., Reference Lee, Chakraborty and Su2022). In such cases, the cost of overestimating
$m$
may involve losing some of the efficiency gained from dimension reduction, but it introduces no bias in parameter estimation, as no material information is lost. The encouraging finding is that, despite the fluctuating and unsatisfactory selection rates, the overestimated models selected by the AWE method still perform comparably to the true envelope model, albeit with a slight loss in efficiency. This assertion is supported by the estimation results of
$\pmb {\beta }$
, which is one of the main focuses of this analysis.
Figure 1 The RMSE of the estimated elements of
${\pmb \beta }$
for two cases in simulation.
Note: x-axis: coordinate of
$\pmb \beta $
. y-axis: value of RMSE.
Figure 1 and Tables 2 and 3 display the root mean squared error (RMSE) and bias of the estimated
$\pmb {\beta }$
for two specific settings of
$m_{\text {true}}$
and
$p$
. The plot includes the selected model with
$\hat {m}$
, the true model with
$m_{\text {true}}$
, and the standard SEM for two settings of
$m_{\text {true}}$
and
$p$
. The explicit values of the RMSE and bias of several selected elements in
$\pmb {\beta }$
are presented in Tables 2 and 3, respectively. The figure and the tables demonstrate that the envelope model achieves efficiency gains and significantly reduces the RMSE and bias compared to the standard SEM. This improvement is particularly pronounced when the sample size is small. The performance for the other settings is similar, and the corresponding figures are omitted. Increasing the number of replications, e.g., to 200, reduces the RMSE and bias of the
$\boldsymbol {\beta }$
estimates only slightly further. Figure S2 in the Supplementary Material shows the variation observed in the estimated
$\boldsymbol {\beta }$
elements under the scenario of
$m=2,\ p=40$
to illustrate this marginal improvement from additional replications. Moreover, as the sample size increases, the selected model in the envelope model tends to behave similarly to the true envelope model, and their performance in terms of RMSE becomes almost indistinguishable. Regarding the estimation of the latent factor matrix, Table 4 shows that all three methods perform well with small RMSE values, and there is no noticeable difference among them. This suggests that the SEM can incorporate the envelope structure in the structural equation without compromising the accuracy of the measurement equation. The results for the other settings exhibit similar performance and thus are not reported.
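The RMSE and bias summaries over replications can be computed as follows; the function and the toy numbers are illustrative, not the authors' code:

```python
import numpy as np

def rmse_and_bias(estimates, truth):
    """Per-coordinate RMSE and bias of the beta estimates over replications.

    estimates: (n_rep, d) array with one estimated beta vector per
    replicated dataset; truth: length-d true beta vector.
    """
    err = estimates - truth
    rmse = np.sqrt(np.mean(err ** 2, axis=0))
    bias = np.mean(err, axis=0)
    return rmse, bias

# Toy example with three replications of a two-dimensional beta.
truth = np.array([1.0, -2.0])
estimates = np.array([[1.1, -2.2],
                      [0.9, -1.8],
                      [1.0, -2.0]])
rmse, bias = rmse_and_bias(estimates, truth)
print(np.round(rmse, 4), np.round(bias, 4))
```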
The enhanced estimation accuracy of BESEM gained from the underlying envelope space is further corroborated through a comparison with SEM utilizing BLasso, a widely employed regularized SEM approach (Feng et al., Reference Feng, Wang, Wang and Song2015). Table S1 and Figure S3 in the Supplementary Material present the bias and RMSE of the
$\boldsymbol {\beta }$
coefficients estimated by the proposed BESEM, standard SEM, and SEM with BLasso. Notably, BESEM exhibits minimal bias and RMSE, standard SEM shows the highest bias and RMSE, whereas SEM with BLasso falls in between.
We conduct several additional simulations to further evaluate the robustness of the proposed method in dimension selection and parameter estimation. The first considers the scenario where the lower-dimensional envelope subspace does not exist, i.e.,
$m=p$
. The second examines the robustness of the proposed method to violations of the normality assumption on the predictors by generating
$\mathbf {X}$
from heavy-tailed and skewed distributions. Additionally, we perform prior sensitivity analyses with respect to different choices of hyperparameters. The performance of the proposed method is stable across all these scenarios. Detailed setups and results are provided in the Supplementary Material.
In summary, the envelope model, including the selected model and the true model, outperforms the standard SEM model, especially in estimating
$\pmb {\beta }$
Although the information criteria may exhibit limitations in terms of selection rate, it is noteworthy that as the sample size increases, the model selected within the envelope framework closely approximates the true model. This observation supports the use of the AWE method for selecting appropriate envelope models while preserving efficient estimation of the parameters of interest.
Table 2 The RMSE of estimated elements in
$\pmb {\beta }$
for two cases in simulation
Note: Displayed are several randomly selected elements in
$\pmb {\beta }$
.
Table 3 The bias of estimated elements in
$\pmb {\beta }$
for two cases in simulation
Note: Displayed are several randomly selected elements in
$\pmb {\beta }$
.
Table 4 The RMSE of estimated free elements in
$\pmb {\Lambda }$
for two cases in simulation
To obtain the results reported in Table 1 using the proposed method, the computational time per replication ranges from about 1 min to a maximum of approximately 17 min on a Linux machine running R with a CPU clock speed of 2.60 GHz. The actual timing depends on the specific combination of sample size, predictor dimension, and envelope dimension. Detailed results are summarized in Table S2 in the Supplementary Material. The code for implementing the preceding analysis is scripted in R with Rcpp and is freely available in the Supplementary Material.
5 Real data analysis
We apply the proposed BESEM to analyze the ADNI dataset. The ADNI project, launched in 2004, aims to advance research on the early detection, diagnosis, tracking, and treatment of Alzheimer’s disease (AD) and other forms of cognitive impairment. It has become a widely recognized and extensively studied collection of neuroimaging, clinical, and biomarker data related to AD. One crucial topic in the study of AD and other dementia diseases is understanding the relationship between cognitive decline and changes in brain structure. Previous studies have indicated that various brain regions are affected to different extents in AD (Bartos et al., Reference Bartos, Gregus, Ibrahim and Tintěra2019; Jones et al., Reference Jones, Barnes, Uylings, Fox, Frost, Witter and Scheltens2006). In our study, we integrate neuroimaging phenotypes comprising volumetric and cortical thickness measures derived from MRI data. These measures capture the structural characteristics of the brain across a moderate number of regions of interest (ROIs). We incorporate these neuroimaging phenotypes into a regression model and utilize the envelope technique to gain new insights into the relationship between brain structure and cognitive ability.
Clinical diagnostic tools, including neurological exams and cognitive and functional assessments, are believed to reflect the degree of cognitive impairment and monitor disease progression (Albert et al., Reference Albert, DeKosky, Dickson, Dubois, Feldman, Fox, Gamst, Holtzman, Jagust, Petersen, Snyder, Carrillo, Thies and Phelps2013). Therefore, we regard cognitive impairment as a latent variable denoted by
$\eta $
. We consider five different test scores from the ADNI study to assess cognitive impairment in various domains. The first test score is the Mini-Mental State Examination (MMSE), a widely used screening tool that assesses cognitive impairment across various cognitive domains, including orientation, attention and calculation, recall, language, and visuospatial abilities. The second test score is the AD Assessment Scale-Cognitive subscale 13 (ADAS13), which is specifically designed for use in clinical trials to measure cognitive changes associated with AD. The scoring of ADAS13 is more complex than that of the MMSE, and higher ADAS13 scores indicate more severe cognitive impairment. The third is the Functional Assessment Questionnaire (FAQ), which evaluates an individual’s functional ability to perform activities of daily living such as dressing, bathing, grooming, eating, and managing finances. The final two test scores are derived from the Rey Auditory Verbal Learning Test (RAVLT), a neuropsychological test that assesses verbal learning and memory. Specifically, we select two measures from the RAVLT: RAVLT Immediate (RAVLT.immediate) and RAVLT percent forgetting (RAVLT.perc.forgetting). These two measures are selected because they capture different aspects of episodic memory and have been shown to be closely related to AD detection in previous research (Estévez-González et al., Reference Estévez-González, Kulisevsky, Boltes, Otermín and García-Sánchez2003; Moradi et al., Reference Moradi, Hallikainen, Hänninen and Tohka2017). To ensure consistency in interpreting the test scores, we reverse the MMSE and RAVLT.immediate scores so that higher values indicate more severe impairment in cognitive function. Additionally, all five test scores are standardized before the analysis. These standardized test scores are denoted as
$\mathbf {Y} = (Y_{1}, \dots , Y_{5})^{T}$
, which serve as the response variables in Equation (8).
The MRI brain imaging data were preprocessed by UCSF using FreeSurfer methods, which involved segmenting the T1-weighted images into small regions to define volumetric and cortical thickness values. Detailed parcellation and quality control (QC) guidelines can be found on the LONI-ADNI website. We follow the instructions provided by Szefer et al. (Reference Szefer, Lu, Nathoo, Beg and Graham2017) and Greenlaw et al. (Reference Greenlaw, Szefer, Graham, Lesperance and Nathoo2017) and include 56 derived neuroimaging phenotypes. A description of these phenotypes can be found in Table 5. The log-transformed values of the volumetric or cortical thickness measures serve as the predictors
$\mathbf {X}=(X_1,\dots ,X_{56})^{T}$
in the structural envelope model.
Table 5 Imaging phenotypes defined as volumetric or cortical thickness measures of
${28 \times 2 = 56}$
ROIs
Note: Each phenotype in the table corresponds to the two phenotypes for the left and right hemispheres.
The original dataset from the ADNI-1 phase comprises clinical and imaging measurements from 800 participants, with missing data in both cognitive test scores and brain imaging data. After excluding missing data, a total of
$n=557$
observations are retained in the analysis. Among these observations, 169 individuals are cognitively normal (CN), 271 have mild cognitive impairment (MCI), and 117 are diagnosed with AD. We apply the proposed BESEM method to this dataset, using vague priors similar to those described in Section 4.
The MCMC algorithm converges within the first few thousand iterations, as shown by the trace plots in Figure S4 in the Supplementary Material. Therefore, a total of 20,000 samples are drawn from the conditional posterior density, with the first 8,000 iterations discarded as the burn-in stage. The AIC, BIC, DIC, and AWE methods are used to determine the optimal dimension of the envelope structure. All these methods consistently indicate that the optimal dimension is
$m=1$
. Therefore, the subsequent estimation results using the BESEM method are based on the envelope model with a dimension of 1.
The point estimate of the factor loading
$\pmb {\Lambda }$
is
$( 1.000, 0.618, 0.787, 0.914, 0.730)^{T}$
, with a negligible standard error estimate for each element. The estimates of coefficients
$\pmb {\beta }$
(Est), along with their standard error estimates (SE), and 95% credible interval (CI) of the posterior samples, are displayed in Figure 2a and Table 6. Notably, the inferior lateral ventricle and lateral ventricle for both hemispheres are found to have a significant effect under the envelope structure, while the remaining ROIs exhibit minimal impact, with estimated coefficients close to zero. This finding aligns with previous research by Bartos et al. (Reference Bartos, Gregus, Ibrahim and Tintěra2019), which suggests that the inferior regions of both lateral ventricles have the most significant enlargements in individuals with AD.
Figure 2 Point and 95% interval estimates of each element of
$\pmb {\beta }$
for the ADNI study.
Note: x-axis: ID of each ROI, which aligns with the order in Table 5 (adjacent numbers represent the left hemisphere and right hemisphere, respectively). y-axis: Estimated value. Red short line: the value of the estimated coefficient. Grey rectangle: 95% CI.
Table 6 Point estimates (Est), Standard Error Estimates (SE), and 95% CIs of the elements in
$\pmb {\beta }$
by BESEM method in ADNI study
For comparison, we also present the results obtained using the standard SEM method in Figure 2b and Table 7. As in the simulation study, the standard SEM is estimated using a Bayesian approach on the subset of 557 subjects with complete measurements, ensuring a fair comparison with the proposed BESEM. In line with standard Bayesian SEM practices, we assign conjugate normal-inverse-gamma priors to parameters in the measurement equation and vague normal-inverse-Wishart priors to parameters in the structural equation to derive Bayesian estimates. These results deviate conspicuously from those obtained using the envelope method. In this analysis, we exclude regions with point estimates close to zero and those with 95% CIs that include zero, and we identify several ROIs that exhibit significance in the regression model. The selected regions are marked in bold in Table 7. The selected significant regions are not consistent between the two brain hemispheres. The fusiform and inferior temporal regions in the left hemisphere are shown to be more significant, which aligns with the notion proposed by Galton et al. (Reference Galton, Patterson, Graham, Lambon-Ralph, Williams, Antoun, Sahakian and Hodges2001). Meanwhile, we also find a significant impact of the hippocampus volume in the left hemisphere but not in the right hemisphere. This finding contradicts the conclusion in Barnes et al. (Reference Barnes, Scahill, Schott, Frost, Rossor and Fox2005) and Yang et al. (Reference Yang, Zhong, Zhou, Wei, Wang and Nie2017), which suggests a preferential association of the right hippocampus with AD. Such partial inconsistency with the existing literature is likely a consequence of multicollinearity among the covariates and of overfitting to the noise or the immaterial information in the multivariate regression problem, thereby raising doubts about the validity of the results.
This demonstrates that the standard SEM may yield results that lack interpretability or mask significant associations when the covariates are complex and can be projected onto a reducing subspace. The estimation results of SEM with BLasso are also presented in Table S3 and Figure S5 in the Supplementary Material for a comprehensive comparison; a pattern similar to that of the standard SEM is observed. Moreover, in this dataset, the changes observed in different ROIs are not entirely independent, and a plain linear regression model may not adequately capture the interdependencies among the covariates, further contributing to noisy and potentially misleading results. In contrast, the BESEM method enhances the identification of important predictors and significantly improves estimation efficiency. It better handles the challenges posed by high-dimensional covariates and accounts for the interdependencies among ROIs, leading to more reliable and informative results.
Table 7 Point estimates (Est), Standard Error Estimates (SE), and 95% CI of the elements in
$\pmb {\beta }$
by standard SEM method in ADNI study
6 Conclusion and discussion
This article introduces a BESEM that incorporates a predictor envelope structure in the structural equation, in conjunction with a factor model, to achieve dimension reduction. The response variables are grouped into latent factors with a predetermined structure, allowing for dimension reduction of multiple responses in the factor model. In the structural equation, the predictors of interest are decomposed into two parts: the material part and the immaterial part. These two parts are independent, with the immaterial part containing no information relevant to the latent factor. The presence of the envelope structure, along with a small envelope dimension, can greatly reduce the number of unknown parameters, leading to improved estimation efficiency. The application of the proposed model to the ADNI dataset demonstrates its potential in identifying important covariates.
Several areas of interest still warrant further investigation. First, there is a need to develop an efficient method for selecting the envelope dimension. Currently, we approach this as a model selection task and utilize information criteria to identify the optimal envelope structure, which can lead to an overestimated envelope dimension. A potential solution to mitigate this limitation is to circumvent the selection of the envelope dimension by developing a weighted-average variant of the envelope estimator, using the information criterion values as potential weights (Eck & Cook, Reference Eck and Cook2017). Nonetheless, this approach may introduce theoretical and computational challenges, and we consider it a promising future direction that warrants continued effort. Moreover, employing information criteria for envelope dimension selection can be time-consuming and computationally intensive. To address this, Zhang & Mai (Reference Zhang and Mai2018) proposed two unified approaches, full Grassmannian (FG) optimization and one-dimensional (1D) selection, which provide theoretical justifications for selection consistency and exhibit computational stability. Adapting this idea from a Bayesian perspective holds promise, and we aim to develop a method that can simultaneously select the dimension and perform estimation. Second, the substitution of the CFA model with an EFA model is worth exploring. This would relax the assumption of a predetermined number of latent factors and instead allow the factor loading matrix to be determined by both prior information and data. However, determining the number of latent factors can be challenging, especially when it is linked to predictors with an uncertain envelope structure. Third, intricate missingness mechanisms can arise in multivariate regression, while addressing missing data within envelope models remains an ongoing area of research with few established methodologies.
We see the accommodation of diverse missingness mechanisms in the joint modeling of envelope and SEM as a promising avenue for further exploration. Lastly, it is worth considering the possibility of incorporating the envelope structure into the multivariate variables (responses) in the measurement component of an SEM. Cook & Zhang (Reference Cook and Zhang2015b) introduced envelopes that simultaneously reduce predictors and responses in a multivariate linear regression model, leading to significant improvements over traditional methods. Therefore, we consider it a promising future direction to further extend the BESEM to incorporate an envelope structure for the multivariate responses in the measurement component. For instance, if we can appropriately separate the immaterial part of the response variables in the measurement equation and group the material part into the latent factors, then it becomes possible to handle high-dimensional response vectors in BESEM. Substantial efforts are required to achieve this advancement.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/psy.2025.10027.
Author contributions
C. W. and R. S. are co-first authors. They contributed equally to this work.
Funding statement
Dr. Feng’s research was fully supported by the National Natural Science Foundation of China (No. 72271060). Dr. Song’s research was fully supported by GRF Grant (No. 14303622) from the Research Grants Council of the Hong Kong Special Administrative Region.
Competing interests
The authors declare none.