1 Introduction
Multivariate analysis of psychological variables using graphical models has become a staple of the psychometric literature (Marsman & Rhemtulla, 2022). The two most popular graphical models for cross-sectional data in psychology are the Ising graphical model (IGM; Ising, 1925) for networks of binary variables and the Gaussian graphical model (GGM; Dempster, 1972) for networks of continuous, normally distributed variables. Although popular, these graphical models do not fit the nature of most cross-sectional psychological data, which are typically scored on a Likert scale and are thus ordinal. To fill this gap, we proposed the ordinal Markov random field (OMRF; Marsman et al., 2025) in earlier work. The IGM, GGM, and OMRF are Markov random fields (MRFs; Kindermann & Snell, 1980), and their edge weights reflect the conditional dependence between pairs of variables in the network. Several approaches exist to estimate these network structures from empirical data (e.g., Borsboom et al., 2021), and the more recent Bayesian approaches have the unique advantage of being able to test whether a pair of variables in the network is conditionally independent or not (Sekulovski, Keetelaar, Haslbeck, & Marsman, 2024; Williams & Mulder, 2020a).
In many network studies, the central research question is whether networks differ between two or more observed groups, such as people with a diagnosis vs. people without one. Several methods exist to test for group differences in network structure (see Haslbeck, 2022, for a recent review). The network comparison test (NCT; van Borkulo et al., 2023) is a widely used permutation test for assessing parameter equivalence across groups in IGMs, GGMs, or mixed graphical models (MGMs; Haslbeck & Waldorp, 2020). The fused graphical lasso (FGL; Danaher et al., 2014) is an alternative approach that adds a penalization parameter for group differences in GGMs, and Epskamp et al. (2022) proposed an iterative model search approach within an SEM framework. Recent work has also explored network comparisons at the level of latent dimensionality or community structure (e.g., Jamison et al., 2024). All of these methods are frequentist and as such can only reject the null hypothesis of equivalence, never support it. Williams et al. (2020) proposed a Bayesian approach to assessing group differences in GGMs based on the Bayes factor (Jeffreys, 1961; Kass & Raftery, 1995), which quantifies the strength of evidence for the competing hypotheses of parameter equivalence (invariance) and parameter difference. An important advantage of a Bayes factor test is that it can support the null hypothesis of parameter equivalence (evidence for the absence of a difference), and it can separate this conclusion from a lack of support for either hypothesis (an absence of evidence). These advantages of the Bayes factor contribute to a robust methodology for network analysis (Huth et al., 2023), especially network comparisons. However, this type of analysis is not currently available for OMRFs.
Here, we address this problem by presenting an extension of the OMRF to two independent groups and developing a Bayesian methodology for assessing group differences in the estimated networks. Since the IGM is a special case of the OMRF, this also provides a test for group differences in the IGM. We develop a Bayesian methodology to assess parameter differences in the OMRF in a manner similar to how mean differences are tested in the Bayesian independent samples t test (Rouder et al., 2009). Central to the Bayesian t test is a parameterization of the model in which the group difference is an explicit parameter, separated from the overall mean. The Bayes factor test then pits the hypothesis that the group difference parameter is zero against the alternative hypothesis that the parameter is free to vary. Here, we apply this strategy to each of the OMRF parameters.
In a recent review of Bayesian methods for testing conditional independence, Sekulovski, Keetelaar, Huth, et al. (2024) distinguished between two types of Bayes factors: the “single-model” and the “inclusion” Bayes factors. The single-model Bayes factor, typically derived via the Savage–Dickey density ratio (Dickey, 1971), requires strong assumptions about the presence or absence of other parameter differences when testing for a specific effect, and may be sensitive to violations of these assumptions (Sekulovski, Marsman, & Wagenmakers, 2024). In contrast, the inclusion Bayes factor uses Bayesian model averaging (BMA; Hinne et al., 2020; Hoeting et al., 1999) to account for model uncertainty by averaging over all possible configurations of parameter differences. This allows each effect to be tested while integrating over the uncertainty in the rest of the model. An important advantage of this approach is that it can directly account for multiplicity through the prior distribution on model space, something the single-model Bayes factor cannot do. We therefore base our methodology on the inclusion Bayes factor, extending its application from conditional independence testing (Marsman et al., 2025) to the problem of assessing group differences in OMRF parameters.
The remainder of this article is organized as follows. First, we provide a theoretical overview of the OMRF and its extension to account for group differences in Section 2. We also present new one- and two-group extensions for ordinal variables with a neutral response category in the Supplementary Material. Consistent with the Bayesian approach of Marsman et al. (2025), we use the pseudolikelihood (Besag, 1975) because the normalization constant of the OMRF is computationally intractable. We introduce the pseudoposterior and outline a Bayesian procedure for its estimation in Section 3. Then, in Section 4, we detail the Bayesian approach to hypothesis testing in the general case of the OMRF and, in particular, testing for group differences in model parameters. Throughout these two sections, we use numerical checks to verify that the methods work as intended. In additional simulations in Section 5, we compare the proposed approach with two existing methods for assessing group differences that must treat the ordinal data as Gaussian (i.e., use a GGM): the NCT and the single-model Bayes factor proposed by Williams et al. (2020). Finally, we discuss how our method can contribute to more robust results in the network literature and suggest avenues for future research.
2 Markov random field graphical models for ordinal data
2.1 The ordinal MRF
The OMRF was independently studied by Anderson & Vermunt (2000) and Suggala et al. (2017), and was later formalized as a graphical model for multivariate ordinal data by Marsman et al. (2025). It generalizes the well-known IGM, which arises as a special case when each variable has only two response categories. In this section, we present the OMRF as a multivariate generalization of the adjacent category logit model for ordinal responses (Agresti, 2010, 2018).
Suppose we have a univariate ordinal variable $o \in \{l_0, \dots, l_m\}$, where $l_0 < \dots < l_m$ are ordered response labels, and define the corresponding category index $x \in \{0, \dots, m\}$ such that $o = l_x$ and, equivalently, $x = \sum_{c = 1}^m \mathcal{I}(o \geq l_c)$. The adjacent category model characterizes the log-odds between successive categories as
$$\begin{align*}\log\left( \frac{p(x = c+1)}{p(x = c)} \right) = \eta_c, \quad c = 0, \dots, m-1. \end{align*}$$
To ensure that the model parameters are identifiable, we set $\eta_0 = 0$. The resulting probability mass function is given by
$$\begin{align*}p(x = c) = \frac{\exp\left( \sum_{k=c}^m \eta_k \right)}{1 + \sum_{j=1}^m \exp\left( \sum_{k=j}^m \eta_k \right)}, \end{align*}$$
which defines a member of the exponential family with canonical parameters $\eta_c$ and sufficient statistics $\mathcal{I}(x \geq c) = \mathcal{I}(o \geq l_c)$.
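The mapping from labels to indices is easily made concrete. The following R sketch (with hypothetical labels) computes the category index as the number of thresholds the response meets or exceeds:

```r
# The category index x counts the thresholds the response reaches:
# x = sum_c I(o >= l_c), so any ordered labels map onto 0, ..., m.
labels <- c("disagree", "neutral", "agree")  # l_0 < l_1 < l_2 (m = 2)
o <- factor("agree", levels = labels, ordered = TRUE)
x <- sum(o >= labels[-1])                    # here: x = 2
```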
Suppose now that we have $p$ ordinal variables $o_i \in \{l_0, \dots, l_m\}$, for $i = 1, \dots, p$, and define the corresponding category indices $x_i \in \{0, \dots, m\}$ (see Footnote 1) such that $o_i = l_{x_i}$ and, equivalently, $x_i = \sum_{c = 1}^m \mathcal{I}(o_i \geq l_c)$. We can regress a given variable $x_i$ on the others by translating the canonical parameters into a linear model,
$$\begin{align*}\eta_c = \alpha_c + \sum_{j \neq i} \beta_j x_j, \end{align*}$$
where $\alpha_c$ are category-specific intercepts and $\beta_j$ are common slopes. The corresponding adjacent category model is
$$\begin{align*}p(x_i = c \mid \mathbf{x}^{(i)}) = \frac{\exp\left( \sum_{k=c}^m \alpha_k + c \sum_{j \neq i} \beta_j x_j \right)}{1 + \sum_{r=1}^m \exp\left( \sum_{k=r}^m \alpha_k + r \sum_{j \neq i} \beta_j x_j \right)}, \end{align*}$$
where $\mathbf{x}^{(i)} = (x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_p) \in \{0, \dots, m\}^{p-1}$ denotes the vector of all variables except $x_i$.
The formulation of the adjacent category model above depends on the ordinal variables only through the sufficient statistics $\mathcal{I}(x_i \geq c) = \mathcal{I}(o_i \geq l_c)$ and the interaction terms $x_i x_j = \sum_{c = 1}^m \sum_{c^\prime = 1}^m \mathcal{I}(o_i \geq l_c)\, \mathcal{I}(o_j \geq l_{c^\prime})$. Because these terms are computed from cumulative indicators, the model captures only the ordering of the categories, without relying on any numeric interpretation or spacing of the labels. It is therefore invariant under monotonic recodings of the ordinal variables and does not assume interval-scale measurement. Although the model represents ordinal categories using integer-valued statistics, this encoding reflects only the order, not the scale, of the responses. In contrast, commonly used ordinal regression models, such as the cumulative logit and continuation ratio models (Agresti, 2010), lack sufficient statistics that map ordinal responses to interpretable indices. Moreover, as shown by Suggala et al. (2017), these models do not correspond to full-conditionals of a joint distribution and are therefore not compatible with a multivariate MRF formulation.
The adjacent category model, on the other hand, does yield consistent full-conditionals and corresponds to the following joint probability model over $p$ ordinal variables (Suggala et al., 2017):
$$ \begin{align} p(\mathbf{x}) = \frac{\exp \left(\sum_{i = 1}^p \sum_{c = 1}^{m} \mu_{ic}\, \mathcal{I}(x_i = c) + 2 \sum_{i < j} \theta_{ij} x_i x_j \right)}{\sum_{\mathbf{x}} \exp \left(\sum_{i = 1}^p \sum_{c = 1}^{m} \mu_{ic}\, \mathcal{I}(x_i = c) + 2 \sum_{i < j} \theta_{ij} x_i x_j \right)}, \end{align} $$
where $\sum_{i < j}$ denotes $\sum_{i=1}^{p-1} \sum_{j=i+1}^p$, and the denominator sums over all possible realizations of $\mathbf{x} \in \{0, \dots, m\}^p$. In this formulation, the category threshold parameters $\mu_{ic} = \sum_{k = c}^m \alpha_k$ for variable $x_i$ are derived from the intercepts in the adjacent category model. They represent the relative tendency to respond in category $c$ over the baseline category $0$ (because we fixed $\mu_{i0} = 0$), with larger values indicating a greater tendency to respond in category $c$. Comparisons between $\mu_{ic}$ and $\mu_{ic'}$ reflect relative preference among non-baseline categories. The interaction parameters $\theta_{ij}$ are symmetric and correspond to the regression weights, with $\theta_{ij} = \tfrac{1}{2} \beta_j$ from the model regressing $x_i$ on $x_j$, or, equivalently, $\theta_{ij} = \tfrac{1}{2} \beta_i$ from the model regressing $x_j$ on $x_i$. These parameters capture the strength of the partial association between variables $x_i$ and $x_j$, reflecting the conditional dependence that remains after accounting for all other variables in the network.
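To make the joint model concrete, the following R sketch evaluates Eq. (1) for a small network by brute-force enumeration; all parameter values are arbitrary illustrations. It also shows why this approach cannot scale: the normalization requires summing over all $(m+1)^p$ configurations.

```r
# Brute-force evaluation of the joint pmf in Eq. (1) for a tiny OMRF.
set.seed(1)
p <- 3; m <- 2
mu <- matrix(rnorm(p * m), p, m)          # mu[i, c] for c = 1, ..., m
theta <- matrix(0, p, p)
theta[upper.tri(theta)] <- rnorm(choose(p, 2), 0, 0.5)
theta <- theta + t(theta)                 # symmetric, zero diagonal

configs <- as.matrix(expand.grid(rep(list(0:m), p)))  # (m + 1)^p rows
potential <- function(x) {
  thresholds <- sum(vapply(1:p, function(i)
    if (x[i] > 0) mu[i, x[i]] else 0, numeric(1)))
  # x' theta x = 2 * sum_{i<j} theta_ij x_i x_j, matching Eq. (1)
  thresholds + c(x %*% theta %*% x)
}
weights <- apply(configs, 1, function(x) exp(potential(x)))
pmf <- weights / sum(weights)             # sums to one by construction
```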
2.2 Extending ordinal MRFs to two independent samples
We now consider the scenario where we have ordinal data from two groups. The goal of this article is to assess if and how the networks of ordinal variables differ between the two groups. The OMRF is characterized by two types of parameters, the category thresholds $\mu$ and the pairwise interactions $\theta$, and thus we want to assess differences in terms of these two parameters. To assess whether the two groups differ in these parameters within a Bayesian framework, it is ideal if we can explicitly parameterize the differences. In this section, we formulate these parameterizations.
We model the differences in category thresholds between the two groups by reformulating the category thresholds as follows:
$$ \begin{align} \mu_{ic} = \begin{cases} \lambda_{ic} - \frac{1}{2}\epsilon_{ic} & \text{for Group 1}\\ \lambda_{ic} + \frac{1}{2}\epsilon_{ic} & \text{for Group 2}, \end{cases} \end{align} $$
where $\lambda_{ic}$ is the overall category threshold, i.e., the category threshold we would find if the data from the two groups were combined and analyzed as a single data set. The parameter $\epsilon_{ic}$ is then the threshold difference effect, i.e., the category threshold for Group 2 minus that for Group 1. If it is zero, there is no difference in the category thresholds between the groups, and the category threshold parameter in each group is equal to the overall threshold parameter $\lambda_{ic}$. However, if the threshold difference effect is not zero, there is a difference in the category thresholds between the two groups.
We model the difference in the pairwise interaction between the two groups in a similar way, reformulating the pairwise interactions as follows:
$$ \begin{align} \theta_{ij} = \begin{cases} \phi_{ij} - \frac{1}{2}\delta_{ij} & \text{for Group 1}\\ \phi_{ij} + \frac{1}{2}\delta_{ij} & \text{for Group 2}, \end{cases} \end{align} $$
where $\phi_{ij}$ is the overall pairwise interaction, i.e., the pairwise interaction we would find if the data from the two groups were combined and analyzed as a single data set. The parameter $\delta_{ij}$ is then the pairwise difference effect. If it is zero, there is no difference in the pairwise interactions between the groups, and the pairwise interaction in each group is equal to the overall pairwise interaction $\phi_{ij}$. However, if the pairwise difference effect is not zero, there is a difference in the pairwise interactions between the two groups.
Note that the factor $\tfrac{1}{2}$ in Eqs. (2) and (3) serves to distribute the group difference parameter evenly across both groups. This parameterization defines $\epsilon$ and $\delta$ as contrasts between Group 2 and Group 1 (e.g., $\delta = (\phi + \tfrac{1}{2}\delta) - (\phi - \tfrac{1}{2}\delta)$), such that their sum or average equals zero. As a result, the shared or overall effects $\lambda$ and $\phi$ are defined as the average across the two groups (e.g., $\phi = \tfrac{1}{2}(\phi + \tfrac{1}{2}\delta) + \tfrac{1}{2}(\phi - \tfrac{1}{2}\delta)$). While alternative parameterizations are possible, such as those using projection matrices to handle more than two groups (e.g., Rouder et al., 2012), our focus here is on the two-group case. This provides a natural and interpretable foundation that can be extended in future work to more general group structures. Figure 1 illustrates the approach to modeling the pairwise interactions of a three-variable network in two groups; a small numerical sketch of the same parameterization follows below.
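As a complement to Figure 1, the following R sketch (with arbitrary illustrative values) constructs the group-specific interaction matrices from the overall effects and the difference effects, and verifies the contrast and averaging identities stated above:

```r
# Eq. (3) in code: overall interactions phi and difference effects
# delta determine the group-specific interaction matrices.
phi <- matrix(c(0.0, 0.3, 0.2,
                0.3, 0.0, 0.4,
                0.2, 0.4, 0.0), nrow = 3, byrow = TRUE)
delta <- matrix(c( 0.0, 0.0, -0.4,
                   0.0, 0.0,  0.2,
                  -0.4, 0.2,  0.0), nrow = 3, byrow = TRUE)
theta_g1 <- phi - delta / 2   # Group 1 interactions
theta_g2 <- phi + delta / 2   # Group 2 interactions
all.equal(theta_g2 - theta_g1, delta)        # delta is the group contrast
all.equal((theta_g1 + theta_g2) / 2, phi)    # phi is the group average
```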

Figure 1 This illustrates how the pairwise interactions and their differences are modeled for the two groups.
Note: The top panel shows the average size of the pairwise interactions, $\phi$, that would be obtained if grouping were ignored. The bottom two panels show the pairwise interactions for the two groups. Note that there is no group difference in the interaction between variables one and two ($\delta_{12} = 0$), and the group parameters are equal to the average, $\phi_{12}$. In contrast, the interaction between variables one and three switches sign and is here fully dictated by the difference effect, $\delta_{13}$. Finally, the interaction effect between variables two and three differs in size: Group A has the smaller interaction effect and Group B the larger one.
We now express the OMRF separately for each group, using the parameterizations in Eqs. (2) and (3). Let $\mathbf{x}$ denote the response vector for a participant in Group 1, and $\mathbf{y}$ for a participant in Group 2. The joint distribution of the ordinal variables in each group is then given by
$$ \begin{align} p(\mathbf{x}) &= \frac{\exp \left(\sum_{i = 1}^p \sum_{c = 1}^{m} \left(\lambda_{ic} - \tfrac{1}{2} \epsilon_{ic}\right) \mathcal{I}(x_{i} = c) + 2 \sum_{i<j} x_{i}x_{j} \left(\phi_{ij} - \tfrac{1}{2} \delta_{ij} \right) \right)}{\sum_{\mathbf{x}} \exp \left(\sum_{i = 1}^p \sum_{c = 1}^{m} \left(\lambda_{ic} - \tfrac{1}{2} \epsilon_{ic}\right) \mathcal{I}(x_{i} = c) + 2 \sum_{i<j} x_{i}x_{j} \left(\phi_{ij} - \tfrac{1}{2} \delta_{ij} \right) \right) }, \end{align} $$
$$ \begin{align} p(\mathbf{y}) &= \frac{\exp \left(\sum_{i = 1}^p \sum_{c = 1}^{m} \left(\lambda_{ic} + \tfrac{1}{2} \epsilon_{ic}\right) \mathcal{I}(y_{i} = c) + 2 \sum_{i<j} y_{i}y_{j} \left(\phi_{ij} + \tfrac{1}{2} \delta_{ij} \right) \right)}{\sum_{\mathbf{y}} \exp \left(\sum_{i = 1}^p \sum_{c = 1}^{m} \left(\lambda_{ic} + \tfrac{1}{2} \epsilon_{ic}\right) \mathcal{I}(y_{i} = c) + 2 \sum_{i<j} y_{i}y_{j} \left(\phi_{ij} + \tfrac{1}{2} \delta_{ij} \right) \right) }. \end{align} $$
These group-specific formulations encode between-group differences explicitly through the parameters $\epsilon_{ic}$ (for category thresholds) and $\delta_{ij}$ (for pairwise interactions). When both are zero, the models reduce to a shared structure defined by the common parameters $\lambda_{ic}$ and $\phi_{ij}$. We prove the identifiability of the two-group OMRF in Appendix A.2.
3 Estimation
We adopt a Bayesian approach to estimate the parameters of the two-group OMRF introduced in Section 2. This extends the estimation framework proposed by Marsman et al. (2025) for a single group to the case of two independent samples.
3.1 Data and likelihood
Let $n_1$ and $n_2$ denote the number of individuals in Group 1 and Group 2, respectively. Let $\mathbf{X} \in \{0, \dots, m\}^{n_1 \times p}$ and $\mathbf{Y} \in \{0, \dots, m\}^{n_2 \times p}$ be the observed response matrices, where each row corresponds to a response vector $\mathbf{x}$ or $\mathbf{y}$ from an individual in the respective group. We assume that individuals are independent within and between groups. Under these assumptions, the likelihood of the observed data is given by
$$ \begin{align} p(\mathbf{X}, \mathbf{Y} \mid \cdot) = \prod_{v = 1}^{n_1} p(\mathbf{x}_v \mid \cdot) \prod_{v = 1}^{n_2} p(\mathbf{y}_v \mid \cdot), \end{align} $$
where $p(\mathbf{x}_v \mid \cdot)$ and $p(\mathbf{y}_v \mid \cdot)$ are the group-specific models defined in Eqs. (4) and (5).
3.1.1 Likelihood intractability and the pseudolikelihood approximation
The likelihood in Eq. (6) is computationally intractable because it involves normalization constants that are prohibitively expensive to compute. Each group-specific model (Eqs. (4) and (5)) requires summing over all possible configurations of the $p$ ordinal variables, each taking $m + 1$ possible values. As a result, the normalization constant for each group involves $(m+1)^p$ terms. For example, with just $p = 10$ variables rated on a 5-point Likert scale ($m = 4$), each normalization constant requires summing over $9{,}765{,}625$ combinations. Since the normalization constants must be evaluated repeatedly during MCMC sampling, this renders full-likelihood-based inference computationally infeasible for even modestly sized networks.
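The arithmetic behind this example is easy to verify, and it also previews the savings that the pseudolikelihood delivers (the 50-term count for the pseudolikelihood is explained below):

```r
# Terms in the full normalization constant vs. the pseudolikelihood
# normalization for p ordinal variables with m + 1 response categories.
p <- 10; m <- 4
(m + 1)^p      # 9,765,625 joint configurations for each group
p * (m + 1)    # 50 one-dimensional terms across the p full conditionals
```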
While several methods have been proposed to circumvent the need to compute normalization constants in models with intractable likelihoods (for a review, see Park & Haran, 2018), the pseudolikelihood approach introduced by Besag (1975) remains the most widely used strategy in network psychometrics. It is a special case of the broader class of composite likelihood methods, which combine low-dimensional marginal or conditional components to approximate the full likelihood (Varin et al., 2011). Its computational efficiency and strong empirical performance in structure selection have also contributed to its growing popularity in Bayesian modeling (e.g., Marsman et al., 2025; Mohammadi et al., 2023; Vogels et al., in press). The pseudolikelihood replaces the full multivariate distribution, such as the distribution of the ordinal response vector $\mathbf{x}$ in Eq. (4), with the product of its full conditionals:
$$ \begin{align*} p(\mathbf{x}) &\approx \prod_{i=1}^p p^\ast(x_i \mid \mathbf{x}^{(i)})\\ &=\prod_{i=1}^p \frac{\exp\left(\sum_{c=1}^{m}\left(\lambda_{ic} - \tfrac{1}{2}\epsilon_{ic}\right)\mathcal{I}(x_i = c) + x_i\sum_{j\neq i}\left(\phi_{ij} - \tfrac{1}{2}\delta_{ij}\right) x_j\right)}{1 + \sum_{u=1}^{m}\exp\left(\lambda_{iu} - \tfrac{1}{2}\epsilon_{iu} + u\sum_{j\neq i}\left(\phi_{ij} - \tfrac{1}{2}\delta_{ij}\right) x_j\right)}\\ &= \frac{\exp\left(\sum_{i=1}^p\sum_{c=1}^{m}\left(\lambda_{ic} - \tfrac{1}{2}\epsilon_{ic}\right)\mathcal{I}(x_i = c) + 2\sum_{i<j}x_ix_j\left(\phi_{ij} - \tfrac{1}{2}\delta_{ij}\right) \right)}{\prod_{i=1}^p \left\lbrace 1 + \sum_{u=1}^{m}\exp\left(\lambda_{iu} - \tfrac{1}{2}\epsilon_{iu} + u\sum_{j\neq i}\left(\phi_{ij} - \tfrac{1}{2}\delta_{ij}\right) x_j\right)\right\rbrace}. \end{align*} $$
This approximation preserves the structure of the full model but replaces its intractable normalization constant $\text{Z}_1$ with a product of tractable, one-dimensional normalization factors. For example, in the ten-variable network example with five ordinal categories ($m = 4$), the full normalization constant contains $9{,}765{,}625$ terms, while the pseudolikelihood normalization contains only $50$ terms. This dramatic reduction in computation makes the pseudolikelihood particularly attractive for inference in high-dimensional settings.
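For readers who want to see the pseudolikelihood in code, here is a minimal R sketch of the group-1 log-pseudolikelihood above, written directly from the product of full conditionals (the function name and argument layout are our own; `mu` and `theta` are the group-specific thresholds $\lambda - \tfrac{1}{2}\epsilon$ and interactions $\phi - \tfrac{1}{2}\delta$):

```r
# Log-pseudolikelihood of one response vector x in {0, ..., m}^p.
# mu: p x m matrix of group-specific thresholds; theta: p x p symmetric
# interaction matrix with zero diagonal.
log_pseudolikelihood <- function(x, mu, theta, m) {
  p <- length(x)
  logpl <- 0
  for (i in 1:p) {
    rest <- sum(theta[i, -i] * x[-i])      # sum_{j != i} theta_ij x_j
    log_num <- if (x[i] == 0) 0 else mu[i, x[i]] + x[i] * rest
    log_den <- log1p(sum(exp(mu[i, 1:m] + (1:m) * rest)))
    logpl <- logpl + log_num - log_den
  }
  logpl
}
```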
Although the pseudolikelihood is a relatively crude approximation to the full likelihood, the generalized posterior it induces remains consistent (Miller, 2021). For both the IGM and the OMRF, it has been shown that the pseudolikelihood introduces no additional bias beyond that already present in the full likelihood (Keetelaar et al., 2024; Marsman et al., 2025). Moreover, multiple studies have demonstrated that pseudolikelihood-based graph recovery is consistent in high-dimensional settings, where both the number of variables and observations grow (e.g., Barber & Drton, 2015; Ravikumar et al., 2010). These results also extend to Bayesian settings (Csiszár & Talata, 2006; Pensar et al., 2017). However, despite its favorable theoretical properties, the pseudolikelihood is known to underestimate standard errors (e.g., Keetelaar et al., 2024), and consequently, posterior variances. As a result, edge selection procedures may become overly sensitive to small, spurious effects.
Replacing the full likelihood in Eq. (6) with the pseudolikelihood yields a pseudoposterior distribution, which we use for inference:
$$ \begin{align*} p^\ast(\boldsymbol{\lambda}, \boldsymbol{\epsilon}, \boldsymbol{\phi}, \boldsymbol{\delta} \mid \mathbf{X}, \mathbf{Y}) \propto p^\ast(\mathbf{X}, \mathbf{Y} \mid \boldsymbol{\lambda}, \boldsymbol{\epsilon}, \boldsymbol{\phi}, \boldsymbol{\delta})\, p(\boldsymbol{\lambda})\, p(\boldsymbol{\epsilon})\, p(\boldsymbol{\phi})\, p(\boldsymbol{\delta}). \end{align*} $$
Here, $p^\ast(\mathbf{X}, \mathbf{Y} \mid \cdot)$ denotes the pseudolikelihood defined by the product of full conditionals for each group. Although this is not a proper posterior in the strict Bayesian sense, theoretical results show that the pseudoposterior remains consistent under mild conditions (Miller, 2021; Pauli et al., 2011; Ribatet et al., 2012). Bayesian inference under the pseudoposterior requires specifying prior distributions for the model parameters, which we outline in the next section.
3.2 Prior specification
The two-group OMRF has two types of parameters: the category threshold parameters $\lambda_{ic}$ and the pairwise interaction parameters $\phi_{ij}$ that would be obtained if the data sets from the two groups were combined. If the only goal of the analysis is to test group differences, these are nuisance parameters, and only the threshold difference parameters $\epsilon_{ic}$ and the pairwise difference parameters $\delta_{ij}$ are of interest.
We follow Marsman et al. (2025) for the specification of prior distributions for the nuisance parameters. Independent beta-prime distributions are specified on the exponentially transformed category threshold parameters $\eta_{ic} = \exp(\lambda_{ic})$, or, equivalently,
$$\begin{align*}p(\lambda_{ic}) = \frac{1}{\text{B}(\text{a}\text{, }\text{b})} \frac{\exp(\text{a}\, \lambda_{ic})}{\left(1+\exp(\lambda_{ic})\right)^{\text{a}+\text{b}}}. \end{align*}$$
In line with the prior setup of Marsman et al. (2025), independent Cauchy distributions with a scale of $2.5$ are specified for the pairwise interaction parameters. We also use independent Cauchy distributions for the two sets of difference parameters. This choice lays the foundation for our testing procedure, described later, which includes a variable selection procedure based on independent spike-and-slab prior distributions. The Cauchy distribution is a popular candidate for testing because of its heavy tails. We use a scale of $1.0$.
3.3 A Gibbs sampling procedure for the pseudoposterior distribution
The full Bayesian framework is approximated by combining the pseudolikelihood with the prior distributions specified in the previous section, yielding a pseudoposterior distribution over the model parameters. As this distribution is not available in closed form, we use numerical methods to approximate it.
Specifically, we employ a Gibbs sampling algorithm (Geman & Geman, 1984) to draw dependent samples from the joint pseudoposterior of the two-group OMRF. At each iteration, the algorithm cycles through updates of the four parameter blocks ($\boldsymbol{\lambda}$, $\boldsymbol{\epsilon}$, $\boldsymbol{\phi}$, and $\boldsymbol{\delta}$) by drawing from their full conditional pseudoposterior distributions. Since these conditional distributions are not of a known form, we use Metropolis–Hastings (Tierney, 1994) algorithms to simulate from them. The general procedure is summarized in Algorithm 1, with full technical details provided in Appendix B.

This algorithm is implemented in the bgmCompare() function of version 0.1.4.2 of the bgms R package (Marsman et al., 2024), and serves as the computational backbone of our hypothesis testing framework. In bgms, the Gibbs sampler runs in two stages. The burn-in phase (default: $\text{T}_{\text{burnin}} = 500$ iterations) initializes parameters at zero, tunes proposal variances (initially set to one), and guides the chain toward regions of high posterior density. This is followed by the main sampling phase (default: $\text{T}_{\text{main}} = 10{,}000$ iterations), during which the samples are collected for inference. Both parameter values and proposal variances are carried over from the burn-in stage.
Numerical Check I in Appendix C.1 shows that the Metropolis-within-Gibbs (MwG) procedure is correctly implemented in bgms, and that its posterior estimates agree with those from an alternative rstan implementation.
4 Hypothesis testing
4.1 Bayes factor tests for conditional independence in ordinal MRFs
A key advantage of the OMRF is that hypotheses about the conditional dependence and independence of ordinal variables i and j can be tested via their partial association $\theta_{ij}$. Specifically, $\mathcal{H}_0\colon \theta_{ij} = 0$ indicates conditional independence, and $\mathcal{H}_{1}\colon \theta_{ij} \neq 0$ indicates conditional dependence. We use the Bayes factor (Jeffreys, 1961; Kass & Raftery, 1995), which is defined as the change in beliefs about the relative plausibility of the two hypotheses before and after observing the data:
$$ \begin{align} \underbrace{\frac{p(\mathcal{H}_1 \mid \text{data})}{p(\mathcal{H}_0 \mid \text{data})}}_{\text{Posterior Odds}} = \underbrace{\frac{p(\text{data}\mid \mathcal{H}_1)}{p( \text{data} \mid \mathcal{H}_0)}}_{\text{BF}_{10}} \times \underbrace{\frac{p(\mathcal{H}_1 )}{p(\mathcal{H}_0 )}}_{\text{Prior Odds}}. \end{align} $$
The term on the left, the posterior odds ratio, conveys the relative plausibility of the two competing hypotheses after viewing the data. The formula above shows that the posterior odds can be expressed as the product of two factors: the Bayes factor, $\text{BF}_{10}$, which indicates the relative support for the two hypotheses in the data at hand, and the prior odds, the relative plausibility of the two hypotheses before having seen the data.
The subscripts in the Bayes factor notation indicate the direction in which the support is expressed: $\text{BF}_{01}$ indicates the relative support for $\mathcal{H}_0$ over $\mathcal{H}_1$, and $\text{BF}_{10}$ indicates the relative support for $\mathcal{H}_1$ over $\mathcal{H}_0$. Note that the Bayes factor $\text{BF}_{01}$ is the reciprocal of $\text{BF}_{10}$, i.e., $\text{BF}_{01} = 1 / \text{BF}_{10}$. Values of $\text{BF}_{01}$ greater than $1$ indicate relative support for $\mathcal{H}_0$ in the data, while values less than $1$ indicate relative support for $\mathcal{H}_1$. When the Bayes factor is $1$, both hypotheses predict the data equally well.
4.1.1 Bayes factor test for conditional dependence
Marsman et al. (2025) introduced binary indicator variables $\gamma_{ij}$ (as commonly used in Bayesian variable selection; see, e.g., Dellaportas et al., 2002; George & McCulloch, 1993; Kuo & Mallick, 1998) to explicitly model the presence or absence of an association between variables i and j, and used them to define the following spike-and-slab prior on the partial association $\theta_{ij}$:
$$ \begin{align*} p(\theta_{ij} \mid \gamma_{ij}) = \gamma_{ij}\, f(\theta_{ij})\, \mathbf{1}_{\mathbb{R}}(\theta_{ij}) + (1 - \gamma_{ij})\, \mathbf{1}_{0}(\theta_{ij}), \end{align*} $$
where $f(\cdot)$ denotes a continuous prior density. Here, $\mathbf{1}_{a}(x)$ is an indicator function that restricts $\theta_{ij}$ to the appropriate region of the parameter space, conditional on the value of $\gamma_{ij}$ (this formulation is due to Gottardo & Raftery, 2008). When $\gamma_{ij} = 1$, the association is considered present, and $\theta_{ij}$ is treated as a free parameter with a continuous prior density (e.g., a Cauchy distribution). In contrast, when $\gamma_{ij} = 0$, the association is absent, and the parameter is fixed to zero.
Because the indicator variable $\gamma_{ij}$ directly encodes whether a partial association $\theta_{ij}$ is present or absent, hypotheses about conditional dependence between variables i and j can be naturally expressed in terms of $\gamma_{ij}$:
$$ \begin{align*} \mathcal{H}_0\colon \gamma_{ij} &= 0 \Leftrightarrow \theta_{ij} = 0 \quad (\text{Variables } i \text{ and } j \text{ are conditionally independent}),\\ \mathcal{H}_1\colon \gamma_{ij} &= 1 \Leftrightarrow \theta_{ij} \neq 0 \quad (\text{Variables } i \text{ and } j \text{ are conditionally dependent}). \end{align*} $$
In this formulation, testing for conditional independence reduces to testing whether $\gamma_{ij}$ equals zero.
From Eq. (7), we see that the Bayes factor can be defined as the ratio of the posterior odds to the prior odds of the competing hypotheses:
$$\begin{align*}\text{BF}_{10} = \frac{p(\mathcal{H}_1 \mid \text{data})}{ p(\mathcal{H}_0 \mid \text{data})} \Bigg / \frac{p(\mathcal{H}_1)}{ p(\mathcal{H}_0)}. \end{align*}$$
In the current context, where the hypotheses are defined in terms of the indicator variable $\gamma_{ij}$, the Bayes factor becomes:
$$\begin{align*}\text{BF}_{10} = \frac{p(\gamma_{ij} = 1\mid \text{data})}{ p(\gamma_{ij} = 0\mid \text{data})} \Bigg / \frac{p(\gamma_{ij} = 1)}{ p(\gamma_{ij} = 0)}, \end{align*}$$
i.e., the posterior odds of including the association divided by the prior odds.
Marsman et al. (2025) proposed the use of Bernoulli$(\pi)$ and Beta$(\alpha, \beta)$-Bernoulli priors on the indicator variables, and developed an MCMC algorithm to estimate the joint posterior distribution of OMRF parameters and indicator variables. This framework provides the basis for computing the inclusion Bayes factor and has recently been extended with a stochastic block prior on the indicator variables to model network clustering (Sekulovski et al., 2025). Note that when using the Bernoulli$(0.5)$ or the Beta$(1, 1)$-Bernoulli prior, the prior odds are equal to one, and the Bayes factor simplifies to the posterior odds of edge inclusion.
The posterior inclusion probability for the association between variables i and j corresponds to the (marginal) posterior expectation of the indicator variable $\gamma_{ij}$:
$$ \begin{align*} \mathbb{E}(\Gamma_{ij} \mid \text{data}) = p(\gamma_{ij} = 1 \mid \text{data}) = \sum_{\boldsymbol{\gamma}} \gamma_{ij}\, p(\boldsymbol{\gamma} \mid \text{data}), \end{align*} $$
where the sum is over all $2^{p(p-1)/2}$ possible network structures, i.e., configurations of the vector of indicator variables $\boldsymbol{\gamma}$.
Since the exact sum is computationally infeasible except for small networks, we estimate it using posterior samples. Let $\text{T}$ denote the number of post-burn-in MCMC samples, and let $\gamma^{\text{t}}$, for $\text{t} = 1, \dots, \text{T}$, be the sampled values of $\gamma_{ij}$. The posterior inclusion probability is then approximated by the Monte Carlo estimate:
$$\begin{align*}\mathbb{E}(\Gamma_{ij} \mid \text{data}) \approx \bar{\gamma} = \frac{1}{\text{T}}\sum_{\text{t}=1}^{\text{T}} \gamma^{\text{t}}, \end{align*}$$
that is, the proportion of posterior samples in which $\gamma_{ij} = 1$.
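In code, this estimate, and the resulting inclusion Bayes factor under a prior inclusion probability of $0.5$, amounts to a few lines (the simulated draws below are a stand-in for actual MCMC output):

```r
# Monte Carlo estimate of the posterior inclusion probability and the
# inclusion Bayes factor for one indicator under a Bernoulli(0.5) prior.
gamma_samples <- rbinom(10000, 1, 0.8)   # stand-in for MCMC draws of gamma_ij
post_incl <- mean(gamma_samples)         # posterior inclusion probability
prior_odds <- 0.5 / (1 - 0.5)            # equals 1 under Bernoulli(0.5)
BF10 <- (post_incl / (1 - post_incl)) / prior_odds
```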
4.2 Bayes factor tests for group differences in ordinal MRFs
The Bayes factor approach to testing for conditional independence carries over seamlessly to testing for differences in the parameters between two groups. We first consider Bayes factor tests for differences in the category threshold parameters, and then for differences in the pairwise interactions.
4.2.1 Group differences in category threshold parameters
We assess group differences in category threshold parameters by testing, for each variable in the network, whether the entire vector of group differences is zero. Formally, the null hypothesis states that all category thresholds for a given variable are invariant across groups, $\mathcal{H}_0\colon \boldsymbol{\epsilon}_{i.} = \boldsymbol{0}$, while the alternative allows for differences in one or more thresholds, $\mathcal{H}_1\colon \boldsymbol{\epsilon}_{i.} \neq \boldsymbol{0}$. Testing the full vector jointly reduces the number of comparisons and reflects that changes in one threshold typically affect others. That is, due to their cumulative interpretation, a difference in one threshold typically implies a shift in the overall response distribution, making it likely that one or more of the remaining thresholds for that variable will also differ between groups.
Because threshold parameters correspond to specific response categories, their estimation requires that each category be observed in both groups. If a category is present in one group but entirely absent in the other, the corresponding threshold, and hence its group difference, cannot be identified. To address this, bgms offers two modeling strategies. The first strategy is to estimate the thresholds independently in each group without testing for differences. This approach is selected by setting the main_difference_model option to “Free.” In this case, unobserved categories in one group are collapsed, but the other group’s thresholds remain unaffected. For example, if variable i has categories $0$, $1$, and $2$, but category $0$ is unobserved in Group 1, then Group 1’s responses are recoded as $1 \rightarrow 0$ and $2 \rightarrow 1$, resulting in one threshold estimate for Group 1 and two for Group 2. The second strategy is to ensure comparability by collapsing categories symmetrically across groups when a category is unobserved in either one. This is selected by setting the main_difference_model option to “Collapse.” In this case, the unobserved category is collapsed into the next lower category (or the next higher one if the lowest category is missing). If a category is unobserved in both groups, it is automatically collapsed.
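To illustrate, a hedged sketch of such an analysis with bgms (version 0.1.4.2) follows; the function name and the main_difference_model option are documented above, while the names of the data arguments and iteration settings are assumptions that may differ across package versions:

```r
# Two-group OMRF comparison with bgms; X and Y are the groups' ordinal
# response matrices. Argument names other than main_difference_model
# are assumptions based on the defaults described in the text.
library(bgms)
fit <- bgmCompare(x = X, y = Y,
                  main_difference_model = "Collapse",  # or "Free"
                  burnin = 500, iter = 10000)
```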
Building on the framework used for testing conditional dependence, we introduce binary indicator variables to explicitly model the presence or absence of group differences in the category threshold parameters. Specifically, for each variable i, we define a vector of group difference parameters $\boldsymbol{\epsilon}_{i}$ and use an indicator $\gamma_{ii}$ to define the spike-and-slab prior on the vector of threshold differences
$$ \begin{align*} p(\boldsymbol{\epsilon}_{i} \mid \gamma_{ii}) = \gamma_{ii}\, f(\boldsymbol{\epsilon}_{i})\, \mathbf{1}_{\mathbb{R}^m}(\boldsymbol{\epsilon}_{i}) + (1 - \gamma_{ii})\, \mathbf{1}_{\boldsymbol{0}}(\boldsymbol{\epsilon}_{i}), \end{align*} $$
where m is the number of category thresholds for variable i and $f(\cdot)$ denotes a continuous prior density. The indicator function $\mathbf{1}_{a}(x)$ again ensures that $\boldsymbol{\epsilon}_{i}$ lies in the appropriate region of the parameter space, conditional on the value of $\gamma_{ii}$. When $\gamma_{ii} = 1$, at least one threshold differs between groups, and the differences are treated as free parameters with continuous prior densities (e.g., a Cauchy distribution). In contrast, when $\gamma_{ii} = 0$, the thresholds are invariant, and the difference vector is fixed to zero. Accordingly, our hypotheses about group differences or equivalence can be formulated in terms of $\gamma_{ii}$:
$$ \begin{align*} \mathcal{H}_0\colon \gamma_{ii} &= 0 \Leftrightarrow \boldsymbol{\epsilon}_{i.} = \boldsymbol{0} \quad (\text{The category thresholds are invariant}),\\ \mathcal{H}_1\colon \gamma_{ii} &= 1 \Leftrightarrow \boldsymbol{\epsilon}_{i.} \neq \boldsymbol{0} \quad (\text{The category thresholds differ between the groups}). \end{align*} $$
Testing for $\gamma_{ii} = 0$ is thus equivalent to testing for parameter invariance of the category thresholds for variable i.
The inclusion Bayes factor quantifies how the data update our beliefs about the presence of a group difference in the category thresholds. Expressed in terms of the difference indicator $\gamma_{ii}$, it takes the form:
$$\begin{align*}\text{BF}_{10} = \frac{p(\gamma_{ii} = 1\mid \text{data})}{ p(\gamma_{ii} = 0\mid \text{data})} \Bigg / \frac{p(\gamma_{ii} = 1)}{ p(\gamma_{ii} = 0)}, \end{align*}$$
that is, the ratio of posterior to prior odds for including a difference in the thresholds for variable i. The corresponding posterior inclusion probabilities can be estimated from the Gibbs sampler described in Algorithm 2. A detailed discussion of the prior specification for inclusion indicators is provided in the prior specification paragraph below.
4.2.2 Group differences in pairwise interaction parameters
We test for group differences in the pairwise interaction parameters. For each pair of variables $(i, j)$, the null hypothesis assumes no difference in the partial association between the groups, $\mathcal{H}_0\colon \delta_{ij} = 0$, while the alternative hypothesis suggests that there is a difference, $\mathcal{H}_1\colon \delta_{ij} \neq 0$.
In line with the approach used for testing for group differences in the category thresholds, we introduce ${p \choose 2}$ binary indicator variables $\gamma_{ij}$, each corresponding to a specific pairwise interaction, to model the presence or absence of a group difference in the association between variables i and j, and use them to define the following spike-and-slab prior on the difference parameter $\delta_{ij}$:
$$ \begin{align*} p(\delta_{ij} \mid \gamma_{ij}) = \gamma_{ij}\, f(\delta_{ij})\, \mathbf{1}_{\mathbb{R}}(\delta_{ij}) + (1 - \gamma_{ij})\, \mathbf{1}_{0}(\delta_{ij}), \end{align*} $$
where $f(\cdot)$ denotes a continuous prior density. Here, $\gamma_{ij} = 1$ indicates that there is a non-zero difference between groups and assigns $\delta_{ij}$ a continuous prior (e.g., a Cauchy distribution), while $\gamma_{ij} = 0$ fixes the difference to zero, reflecting invariance. This formulation enables us to cast our hypotheses about group differences or equivalence in terms of the inclusion indicators:
$$ \begin{align*} \mathcal{H}_0\colon \gamma_{ij} &= 0 \Leftrightarrow \delta_{ij} = {0} \quad (\text{The pairwise interaction is invariant}),\\ \mathcal{H}_1\colon \gamma_{ij} &= 1 \Leftrightarrow \delta_{ij} \neq 0 \quad (\text{The pairwise interaction differs between groups}). \end{align*} $$
Expressed in terms of the difference indicator $\gamma_{ij}$, the inclusion Bayes factor for testing whether there is a group difference in the interaction between variables i and j is given by
$$\begin{align*}\text{BF}_{10} = \frac{p(\gamma_{ij} = 1\mid \text{data})}{ p(\gamma_{ij} = 0\mid \text{data})} \Bigg / \frac{p(\gamma_{ij} = 1)}{ p(\gamma_{ij} = 0)}, \end{align*}$$
that is, the ratio of posterior to prior odds of including a difference in the pairwise interaction. The corresponding posterior inclusion probabilities can be estimated from the Gibbs sampler described in Algorithm 2.
Prior specification for difference indicators In bgms, prior distributions for inclusion indicators can be specified independently for category threshold differences ($\gamma_{ii}$) and pairwise interaction differences ($\gamma_{ij}$). This separation allows researchers to express different prior beliefs about group differences in marginal distributions (thresholds) versus conditional associations (interactions). The specification of such priors and their influence on model-based inference has been studied in recent work on sensitivity analysis for Bayesian graphical models (Sekulovski, Keetelaar, Haslbeck, & Marsman, 2024). By default, bgms uses Bernoulli priors with fixed inclusion probabilities: $\pi_\epsilon = 0.5$ for threshold differences and $\pi_\delta = 0.5$ for interaction differences. These default settings imply that, a priori, each possible configuration of difference indicators is equally likely. But they also imply, a priori, group differences in the category thresholds for about half of the variables and in about half of the pairwise associations, regardless of the number of effects under consideration. That is, the Bernoulli prior does not provide a correction for multiplicity (Scott & Berger, 2010). To address this, bgms also provides hierarchical priors for inclusion probabilities. Specifically, assigning a beta hyperprior, such as $\pi_\epsilon \sim \text{beta}(1, 1)$ and $\pi_\delta \sim \text{beta}(1, 1)$, yields a beta-Bernoulli model that induces a uniform distribution on the number of included differences. This hierarchical formulation adjusts the prior inclusion probabilities based on the total number of selected differences (e.g., Consonni et al., 2018; Marsman, Huth, Waldorp, & Ntzoufras, 2022), and has been shown to provide automatic multiplicity correction (Scott & Berger, 2010); the contrast between the two priors is illustrated below.
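The difference between the two priors is easy to visualize by computing the implied prior on the number of included differences, here for a hypothetical set of $K = 20$ candidate effects:

```r
# Implied prior on the number of included differences among K effects.
K <- 20
k <- 0:K
bernoulli <- dbinom(k, K, 0.5)                           # concentrates near K/2
beta_bernoulli <- choose(K, k) * beta(k + 1, K - k + 1)  # uniform: 1/(K + 1)
```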
While bgms also supports the stochastic block prior for modeling clustering in the network structure (Sekulovski et al., 2025), this functionality is not currently extended to testing for group differences. Although stochastic block priors are well suited for capturing latent community structure in psychometric networks, we do not currently consider such structured assumptions appropriate for modeling group differences in parameters. In the context of threshold and interaction differences, we do not believe that these effects follow a clustered pattern.
4.3 A Gibbs sampling procedure to simulate the posterior distribution
The Bayes factor tests described in the previous sections rely on the posterior distributions of the difference inclusion indicators $\boldsymbol{\gamma}$. Because these posterior distributions are not available in closed form, we estimate them using a Gibbs sampler that has the following multivariate posterior distribution as its invariant distribution:
$$ \begin{align*} p^\ast(\boldsymbol{\lambda}, \boldsymbol{\epsilon}, \boldsymbol{\phi}, \boldsymbol{\delta}, \boldsymbol{\gamma} \mid \mathbf{X}, \mathbf{Y}) \propto p^\ast(\mathbf{X}, \mathbf{Y} \mid \boldsymbol{\lambda}, \boldsymbol{\epsilon}, \boldsymbol{\phi}, \boldsymbol{\delta})\, p(\boldsymbol{\lambda})\, p(\boldsymbol{\phi})\, p(\boldsymbol{\epsilon} \mid \boldsymbol{\gamma})\, p(\boldsymbol{\delta} \mid \boldsymbol{\gamma})\, p(\boldsymbol{\gamma}). \end{align*} $$
This posterior distribution extends the model from Section 3 by incorporating a hierarchical prior for the group difference parameters. Specifically, it introduces spike-and-slab priors on the group difference parameters $\boldsymbol{\epsilon}$ and $\boldsymbol{\delta}$, conditional on the inclusion indicators $\boldsymbol{\gamma}$. In addition, a marginal prior is placed on the indicators themselves.
Estimating the joint pseudoposterior distribution of the model parameters and the difference indicators is a non-trivial task. Since the inclusion or exclusion of parameters changes the dimensionality of the model, a standard Gibbs sampler is not applicable. To address this, we employ a pairwise Metropolis step that jointly updates each indicator and its associated parameter (cf. Gottardo & Raftery, 2008; Marsman et al., 2025), avoiding the need for more complex reversible jump methods (e.g., Green, 1995). Specifically, we extend Algorithm 1 by incorporating a pairwise Metropolis update: for each group difference effect, we jointly update the inclusion indicator, either $\gamma_{ii}$ for category threshold differences or $\gamma_{ij}$ for pairwise interaction differences, along with the corresponding parameter vector $\boldsymbol{\epsilon}_{i.}$ or parameter value $\delta_{ij}$. This extension yields a valid sampling procedure for the joint pseudoposterior distribution of model parameters and difference indicators, as schematically summarized in Algorithm 2 and sketched in code below.
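One way to implement such a joint update for a single pairwise difference is the following birth-death Metropolis sketch. It is our own minimal illustration, not the bgms implementation, and it assumes a Cauchy(0, 1) slab, a Bernoulli(0.5) inclusion prior, a Gaussian proposal, and a toy stand-in for the log-pseudolikelihood:

```r
# log_pl() stands in for the log-pseudolikelihood as a function of one
# difference parameter delta (toy placeholder for illustration only).
log_pl <- function(delta) dnorm(delta, mean = 0.3, sd = 0.1, log = TRUE)

update_pair <- function(delta, gamma, sd_prop = 0.5, prior_incl = 0.5) {
  if (gamma == 0) {                        # propose including the difference
    delta_new <- rnorm(1, 0, sd_prop)
    log_acc <- log_pl(delta_new) - log_pl(0) +
      dcauchy(delta_new, 0, 1, log = TRUE) -      # slab prior at proposal
      dnorm(delta_new, 0, sd_prop, log = TRUE) +  # proposal density correction
      log(prior_incl / (1 - prior_incl))          # prior odds of inclusion
    if (log(runif(1)) < log_acc) return(list(delta = delta_new, gamma = 1))
  } else {                                 # propose excluding the difference
    log_acc <- log_pl(0) - log_pl(delta) -
      dcauchy(delta, 0, 1, log = TRUE) +
      dnorm(delta, 0, sd_prop, log = TRUE) -
      log(prior_incl / (1 - prior_incl))
    if (log(runif(1)) < log_acc) return(list(delta = 0, gamma = 0))
  }
  list(delta = delta, gamma = gamma)       # no move accepted
}
```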
In the bgms R package, we run Algorithm 2 in two stages, for a total of $\text{T} = 2\text{T}_{\text{burnin}} + \text{T}_{\text{main}}$ iterations. The first stage is called the burn-in and is split into two parts, each running for $\text{T}_{\text{burnin}}$ iterations. In the first part, the algorithm searches for a region of high posterior mass and calibrates the proposal variances, but does not yet update the indicator variables (as in Algorithm 1). In the second part, updates to the indicator variables are included. By default, the number of iterations in the burn-in stage in bgms is $2\text{T}_{\text{burnin}} = 2 \times 500 = 1{,}000$. The parameter values are initialized at zero, and the proposal variances at one. The second stage is the main simulation, during which the algorithm output is stored for further analysis.
Numerical Check II in Appendix C.2 shows that the proposed algorithm correctly recovers the prior distribution of the pairwise difference indicators and parameters, and Numerical Check III in Appendix C.3 shows that the inclusion Bayes factors based on the output of Algorithm 2 are consistent with theoretical expectations.
5 Numerical experiments
The goal of the numerical checks in Appendix C was to demonstrate that the proposed method yields consistent parameter estimates and selection of group differences. Here, we provide an additional numerical experiment to evaluate the performance of the method in selecting group differences in a setting similar to an empirical psychological study, and how it compares to other popular methods for comparing pairwise association networks. The R script for this numerical experiment can be found at https://osf.io/txhbp/overview?view_only=0a2ed24ecc4448fbba104faa28e6f8c7.
5.1 Simulation setup
The goal of this simulation study is to evaluate how well our method compares to existing methods for testing group differences in scenarios that resemble typical empirical research studies in the field of psychology.

5.1.1 Simulation design
In each simulation, we generate data from two groups whose distributions differ in a controlled number of interaction parameters. We systematically vary four key aspects of the data-generating process: i) the number of variables ($p \in \{5, 10, 20\}$), ii) the population network density $d_{\text{pop}} \in \{0.15, 1\}$, defined as the proportion of the overall interaction parameters $\phi$ (see Eq. (3)) that are present, iii) the proportion of interaction parameters that differ between groups ($d_{\text{dif}} \in \{0.1, 0.2, 0.3\}$), and iv) the sample size per group ($n \in \{150, 300, 600, 1200\}$). This results in a $3 \times 2 \times 3 \times 4 = 72$ condition design, which we replicate 100 times to account for variability across repetitions.
We chose the variation in the number of variables based on the number of variables that are typically included in empirical applications of network models. The manipulation of $d_{\text{pop}}$ is not of primary interest in the present study, but we include it to examine whether it affects the performance of the NCT analyzed below, which uses a permutation test based on regularized regression models that assume sparsity. The range of $d_{\text{dif}}$ values was selected to reflect what we consider reasonable for the size and prevalence of group differences in cross-sectional psychological data.
5.1.2 Data-generating model
The two-group model, defined by Eqs. (4) and (5), is characterized by two sets of parameters. The first set consists of the category thresholds $\boldsymbol{\lambda}$ and pairwise interactions $\boldsymbol{\phi}$, which together characterize the overall model obtained when group membership is ignored. The second set consists of the category threshold differences $\boldsymbol{\epsilon}$ and pairwise interaction differences $\boldsymbol{\delta}$. Because the methods we compare to below assess only interaction differences, we assume throughout that $\boldsymbol{\epsilon} = \mathbf{0}$ and focus on selecting $\boldsymbol{\delta}$. For our simulations, we therefore need to define the overall parameters ($\boldsymbol{\lambda}$ and $\boldsymbol{\phi}$), as well as the difference parameters ($\boldsymbol{\delta}$). These choices are described in the ensuing sections.
Selecting overall parameter values To ensure that the overall parameters reflect realistic values, we calibrate them using empirical data. Specifically, we estimate an ordinal MRF on data from Billieux et al. (2021), which includes responses from $n = 18{,}702$ individuals to the $p = 20$ items of the short version of the UPPS-P Impulsive Behavior Scale (Cyders et al., 2014). The items include statements such as “I usually think carefully before doing anything” and are rated on a 4-point Likert scale ranging from “Agree strongly” to “Disagree strongly.” These data are representative of the ordinal response formats and content domains commonly used in psychological research, and the large sample size provides stable population-level estimates of interaction parameters. The resulting estimates for the pairwise interactions form a right-skewed distribution with a range of $[-0.17, 1.38]$, a mean of $0.13$, and a standard deviation of $0.27$. When generating the overall model in the simulation, we first determine which interaction parameters are nonzero, and then sample values for the nonzero interactions from this empirical distribution.
Population network density is manipulated by varying the number of pairwise interactions retained in the model. For the condition $d_{\text{pop}} = 1$, we use the full set of pairwise interactions and corresponding thresholds estimated from the empirical data. For the condition $d_{\text{pop}} = 0.15$, each potential interaction is retained independently with probability 0.15. This sparsification affects the marginal distributions, because both interaction parameters and thresholds jointly determine the observed response patterns. For example, if most interactions are positive, setting many of them to zero while keeping thresholds fixed will reduce the frequency of responses in the higher categories relative to the empirical marginals. To prevent such confounding, we re-estimate the ordinal MRF on the same empirical data for each sampled sparse graph, using the selected structure as a constraint. This ensures that marginal distributions remain comparable across conditions and prevents confounding between structural sparsity and a shift in the marginal distribution. A side effect of this procedure is that the retained interaction parameters under $d_{\text{pop}} = 0.15$ tend to be stronger than those under $d_{\text{pop}} = 1$, since weak or absent edges are excluded. Although we do not have a formal account of how this shift influences the detection of group differences, lower variance in a variable is likely to reduce sensitivity to changes involving that variable. The re-estimation procedure described above mitigates this issue by aligning marginal distributions across density conditions.
Selecting difference values We now specify the group differences used in the simulations. While the pairwise interactions and category thresholds of the overall model are calibrated using empirical data, there is no corresponding population-level information available for group differences. In particular, the distribution of group differences depends on the grouping variable, and unlike the overall associations, it is not shaped by constraints such as questionnaire design. For instance, in empirical data, extremely skewed marginals or overly strong associations are often avoided by construction, whereas this is not the case for latent group differences.
To generate group-specific models, we follow Eq. (3) and construct the two group models defined in Eqs. (4) and (5) by subtracting or adding half of the group difference parameter from the overall interaction parameter. This ensures that the group models differ in a symmetric and centered way, while preserving the structure of the overall model.
In the absence of empirical constraints, a natural choice might be to draw the group differences from a Gaussian distribution centered at zero. However, such a distribution would result in a large number of very small differences, many of which would be of little substantive interest. Instead, we aim to simulate group differences that are meaningfully different in magnitude. While any such definition is necessarily arbitrary, we set a lower bound of $0.1$ and cap the largest differences at $1$. To operationalize this, we sample values from a mixture of uniform distributions on $[-0.6, -0.1]$ and $[0.1, 0.6]$, with equal probability of positive and negative differences.
For each simulation condition, we fix a proportion $d_{\text{dif}} \in \{0.1, 0.2, 0.3\}$ of the pairwise interactions to differ between groups. We then randomly select that proportion of interactions from the full set, and assign each a difference value (i.e., $\delta$) drawn from the mixture of uniform distributions described above; a sketch of this scheme follows below. This procedure implies that the true data-generating model differs across repetitions of the simulation design, as new difference values are drawn each time. Within a single repetition, however, the same pair of group models is used to generate all data.
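A minimal sketch of this difference-generating scheme in R (the object names are ours):

```r
# Assign a proportion d_dif of the pairwise interactions a group
# difference drawn from the uniform mixture on [-0.6, -0.1] and [0.1, 0.6].
p <- 20; d_dif <- 0.2
n_pairs <- choose(p, 2)
idx <- sample(n_pairs, size = round(d_dif * n_pairs))
delta <- numeric(n_pairs)
delta[idx] <- sample(c(-1, 1), length(idx), replace = TRUE) *
  runif(length(idx), min = 0.1, max = 0.6)
```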
Simulating the data We sample
$n \in \{150, 300, 600, 1,200\}$
independent observations from each of the two groups. To avoid extreme marginal distributions that would make the detection of group differences difficult or lead to errors due to insufficient variance, we impose the following constraint on the generated data: we resample the entire process, both the generation of the true model and the sampling of observations, until the second-most frequent response category contains at least
$n \times 0.10$
observations. Across all conditions, this constraint triggered an average of 0.02 resampling iterations, with a maximum of three.
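The following R helper sketches the constraint check; we read the constraint as applying to every variable in each group, which is our interpretation rather than a quotation of the simulation code, and the helper functions in the commented usage are hypothetical placeholders.

passes_check <- function(data, n) {
  all(apply(data, 2, function(x) {
    counts <- sort(table(x), decreasing = TRUE)   # category frequencies, largest first
    length(counts) >= 2 && counts[2] >= 0.10 * n  # second-most frequent >= 10% of n
  }))
}
# Toy usage: resample the full process until the check passes for both groups.
# repeat {
#   model <- draw_true_model()                    # hypothetical placeholder
#   x <- sample_group(model, n); y <- sample_group(model, n)
#   if (passes_check(x, n) && passes_check(y, n)) break
# }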
5.1.3 Selecting group differences
To evaluate the performance of our method in identifying group differences in interaction parameters, we compare the two prior formulations for the difference indicators introduced in Section 4.2.2. The first uses a Bernoulli prior with fixed inclusion probability
$\pi _\delta = 0.5$
for each interaction difference. The second uses a beta-Bernoulli prior, where
$\pi _\delta \sim \text {Beta}(1, 1)$
. All other prior distributions are defined as in Section 3.2, and correspond to the default values used in the bgms package. Posterior distributions are approximated using 10,000 MCMC iterations, as implemented in the R package bgms (version 0.1.4.2; Marsman et al., Reference Marsman, Arena, Huth, Sekulovski and van den Bergh2024).
We compare these two versions of our method to two widely used approaches for detecting group differences in GGMs (e.g., Haslbeck, Reference Haslbeck2022). Although these methods assume continuous data, they are often applied to ordinal data in empirical studies, where ordinal responses are treated as continuous to estimate partial association networks. The first method is a frequentist permutation test of edge equality across groups, implemented in the R package NetworkComparisonTest (version 2.2.2; van Borkulo et al., Reference van Borkulo, van Bork, Boschloo, Kossakowski, Tio, Schoever and Waldorp2023). We use 10,000 permutations to generate null distributions for each edge-specific test. The second is a Bayesian approach that uses a single-model Bayes factor, implemented in the R package BGGM (version 2.1.3; Williams & Mulder, Reference Williams and Mulder2020b). This method compares two models: one estimates all pairwise differences freely, while the other constrains a single focal difference to equality across groups (Williams et al., Reference Williams, Rast, Pericchi and Mulder2020). Posterior estimation is also based on 10,000 MCMC iterations.
In total, we compare four methods for selecting group differences in interaction parameters: two variants of our method and two GGM-based alternatives.
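For orientation, the calls below sketch how the four approaches might be invoked in R. The bgmCompare interface is an assumption on our part (consult the bgms documentation for the exact function and arguments in version 0.1.4.2), and the NCT and BGGM calls show only a subset of their documented arguments.

library(bgms)                    # proposed method (Bernoulli or beta-Bernoulli prior)
library(NetworkComparisonTest)   # frequentist permutation test
library(BGGM)                    # single-model Bayes factors for GGMs
# x and y are n-by-p matrices of ordinal responses, one per group (not defined here):
# fit_bgms <- bgmCompare(x, y, iter = 1e4)            # assumed interface
# fit_nct  <- NCT(x, y, it = 1e4, test.edges = TRUE)  # edge-wise permutation tests
# fit_bggm <- ggm_compare_explore(x, y, iter = 1e4)   # ordinal data treated as continuous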
5.1.4 Evaluating performance
Performance metrics, such as sensitivity, specificity, and precision, depend on the choice of a specific threshold or tuning parameter (e.g., a cutoff on inclusion probability, Bayes factor, or p-value). While these metrics are intuitive and often reported, they are threshold-dependent and can therefore obscure how a method performs across its full range of decision rules. This is particularly relevant in the present context, where the goal is to compare methods with different inferential philosophies and calibration strategies.
To evaluate performance independently of a specific threshold, we use the receiver operating characteristic (ROC) curve. The ROC curve plots the false positive rate (FPR, or 1 − specificity; x-axis) against the true positive rate (TPR, or sensitivity; y-axis) across a range of threshold values (Fawcett, Reference Fawcett2006). Thus, each point on the ROC curve corresponds to a specific sensitivity–specificity tradeoff. To summarize overall performance, we compute the area under the ROC curve (AUC), which ranges from 0 to 1. An AUC of 0.5 indicates performance no better than random guessing, while higher AUC values indicate better separation between true and false positives. By aggregating over the full range of thresholds, the AUC avoids the arbitrariness associated with selecting a single threshold and allows for more robust performance comparisons across methods.
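Because the AUC equals the Wilcoxon-Mann-Whitney probability that a randomly chosen true difference receives a higher score than a randomly chosen null edge, it can be computed directly from ranks. The R sketch below is our own helper, not part of any of the compared packages; it accepts any edge-level evidence measure, such as a posterior inclusion probability or one minus a p-value.

auc <- function(score, truth) {
  r <- rank(score)                        # average ranks handle ties
  n1 <- sum(truth)
  n0 <- sum(!truth)
  (sum(r[truth]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
# Toy usage with 45 edges, of which 20% truly differ between groups:
set.seed(7)
truth <- rep(c(TRUE, FALSE), times = c(9, 36))
score <- ifelse(truth, rbeta(45, 4, 2), rbeta(45, 2, 4))
auc(score, truth)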
5.2 Simulation results
We first report the results for the
$d_{\text {pop}} = 1.0$
condition, in which the overall network is fully connected, in Figure 2. The figure shows the AUC as a function of sample size (x-axis), with separate columns for the number of variables and rows for the proportion
$d_{\text {int}}$
of edges that differ between groups. The four compared methods are indicated by color. As expected, the AUC increases with sample size for all methods. The few exceptions to this trend occur in conditions with fewer variables (
$p = 5, 10$
) and are likely due to sampling variability.

Figure 2 Area under the ROC curve (AUC; y-axis) as a function of sample size per group (x-axis), for different numbers of variables (columns) and proportions of true group differences (rows).
Note: The overall model has a density of
$1$
, corresponding to a fully connected network.
When comparing the four methods, we find that the two variants of the proposed method perform best overall (mean AUC: bgms(BB) =
$0.914$
, bgms(B) =
$0.911$
), followed by BGGM (
$0.850$
) and the NCT (
$0.743$
). However, as the sample size increases, performance differences between methods diminish. In particular, the NCT closes the gap with other methods in the condition with
$p = 5$
and large n.
Figure 3 presents results for the
$d_{\text {pop}} = 0.15$
condition in which the overall network is sparsely connected. This condition was included to examine how graph sparsity affects the detection of group differences, especially given that the NCT relies on LASSO-based estimation, which assumes a sparse structure. As in the previous condition, performance improves with increasing sample size. Average performance across all methods and conditions is somewhat lower in this setting (
$0.835$
vs.
$0.855$
in the dense condition). The proposed methods again perform best, with bgms(BB) achieving the highest mean AUC (
$0.879$
), followed by bgms(B) (
$0.868$
). In this condition, NCT slightly outperforms BGGM (NCT =
$0.802$
, BGGM =
$0.794$
), and again approaches the performance of the other methods at high sample sizes for
$p = 5$
.

Figure 3 Area under the ROC curve (AUC; y-axis) as a function of sample size per group (x-axis), for different numbers of variables (columns) and proportions of true group differences (rows).
Note: The overall model has a fixed network density of
$0.15$
.
Figure 4 summarizes runtime as a function of the number of variables (columns) and sample size per group (x-axis), averaged over the density and group difference parameters, which had negligible effects on runtime. BGGM exhibits minimal runtime across all conditions. NCT runtime increases with the number of variables but is largely unaffected by sample size. In contrast, the runtime for both variants of our method increases with both the number of variables and the sample size.

Figure 4 Average runtime in minutes for the four compared methods (colors) as a function of sample size per group (x-axis) and number of variables (columns), averaged over all other simulation conditions.
5.3 Discussion of simulation results
Our simulation results demonstrate that all methods recover group differences at rates well above chance across realistic conditions. Across network densities, the proposed methods based on the ordinal MRF consistently achieved the highest performance, with the beta-Bernoulli prior yielding slightly higher AUCs than the Bernoulli prior. For the fully connected networks, BGGM performed next best, while the NCT exhibited the lowest performance. In the sparse network condition, NCT slightly outperformed BGGM, although both remained clearly below the proposed methods in overall accuracy.
For applied researchers aiming to test group differences in empirical ordinal data, these results suggest that, when the ordinal MRF provides a reasonable model for the data, the proposed method offers substantial advantages over the alternatives. This holds even in sparse network settings, despite the fact that sparsity is not an explicit modeling assumption of our approach.
The standard error of the AUC estimates ranged from
$0.002$
to
$0.032$
(in conditions with low n), with an average of
$0.011$
. We consider this level of uncertainty acceptable given the contrasts discussed. For example, the smallest AUC difference we report—between bgms(BB) and bgms(B)—is larger than the average standard error. Moreover, the consistency in relative performance across the majority of conditions suggests that these patterns are not attributable to sampling variability.
As with any simulation study, the generalizability of our findings to unexamined scenarios cannot be guaranteed. However, because our data-generating models are calibrated using empirical data from a large and typical dataset, we expect the conclusions to extend to settings with similar characteristics.
6 Discussion
In this article, we have introduced a new Bayesian method for analyzing two-group differences in networks of ordinal variables, addressing the limitations of existing network approaches that focus on Gaussian variables. We extended the OMRF framework to model group-specific networks and developed a novel Markov chain Monte Carlo (MCMC) procedure to estimate model parameters. This procedure enables Bayes factor tests for both parameter equivalence and differences across groups. Three numerical checks confirmed that the proposed methods are correctly implemented in the bgms R package (Marsman et al., Reference Marsman, Arena, Huth, Sekulovski and van den Bergh2024) and work as intended. Moreover, the method uniquely supports testing differences in category thresholds, a capability absent in other approaches. Together, these features enable a richer approach to analyzing network differences in binary and ordinal data.
To evaluate performance in practical settings, we conducted a large-scale simulation study comparing our method to the NCT (van Borkulo et al., Reference van Borkulo, van Bork, Boschloo, Kossakowski, Tio, Schoever and Waldorp2023) and the Bayes factor approach in BGGM (Williams et al., Reference Williams, Rast, Pericchi and Mulder2020). Overall, the proposed method outperformed existing alternatives in detecting group differences in interaction parameters across most conditions, including both dense and sparse networks. In particular, the proposed Bayesian method showed excellent performance even at small sample sizes, a valuable feature that the alternatives could not match. Performance differences diminished with increasing sample size, and the NCT was occasionally competitive in large-sample, low-dimensional settings (i.e., large n and small p), but our approach consistently performed best in the majority of realistic empirical scenarios. These results suggest that the proposed Bayesian method offers robust and accurate inference across a wide range of conditions encountered in psychological research.
A major advantage of the proposed Bayes factor approach over traditional non-Bayesian methods is its ability to quantify evidence both for and against individual effects. This allows for robust Bayesian tests of conditional dependence or independence in one-group MRF models, as well as tests of parameter equivalence or difference across two independent groups. In contrast, commonly used network comparison methods, such as the NCT, are often interpreted as providing evidence for the null (e.g., no group differences), but in fact do not offer a principled way to support the null hypothesis. The ability to quantify support for parameter equality is a key strength of the Bayes factor approach and sets it apart from existing frequentist alternatives.
The inclusion Bayes factor we propose averages over all possible models, providing a robust approach in the presence of model uncertainty. This property likely contributed to the strong performance of our method, particularly in small sample settings. A significant advantage of the inclusion Bayes factor is its principled handling of multiple testing. When combined with a hierarchical prior on the number of differences, such as the beta-Binomial prior, it automatically accounts for model complexity and the multiplicity of tested hypotheses. This leads to a form of multiplicity control that reduces the likelihood of false discoveries without requiring arbitrary threshold adjustments (Scott & Berger, Reference Scott and Berger2010). In our simulations, this mechanism likely explains the slightly better performance of the beta-Bernoulli model compared to the fixed-inclusion Bernoulli model, and we expect similar advantages over single-model Bayes factor approaches, which do not address multiple testing in the same principled way.
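A small calculation makes this multiplicity control tangible. Under the beta-Bernoulli prior with a Beta(1, 1) hyperprior, the prior probability of a specific configuration with k active differences out of K candidates is the beta function B(k + 1, K − k + 1), so the prior odds for adding one more difference are (k + 1)/(K − k). The R sketch below is our own illustration of how these odds shrink as K grows, whereas the fixed Bernoulli(0.5) prior keeps them at one.

K <- 45                              # e.g., all pairs among p = 10 variables
k <- 0:(K - 1)
prior_odds_add <- (k + 1) / (K - k)  # prior odds of k + 1 vs. k active differences
round(prior_odds_add[1:5], 3)        # the first inclusions carry a strong penalty
# Under a fixed Bernoulli(0.5) prior these odds equal 1 for every k, so the
# implicit penalty for testing many differences at once disappears.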
There are several modeling limitations that we would like to acknowledge and address in future developments. Ideally, we would have a single inferential framework for network analysis that covers the full range of psychometric variables and designs. Currently, however, the proposed framework can only account for binary and/or ordinal variables. Although these are the most common types of variables in psychological network analysis, we would like to extend the framework, and thus the underlying models, to other types of variables, such as continuous variables. The conditional GGM (Lauritzen & Wermuth, Reference Lauritzen and Wermuth1989) provides a concrete approach for doing so. Another limitation of the model is that it only considers designs with two independent groups. Extending the underlying idea of an independent samples t test to an analysis of variance (ANOVA) design, following the ideas underlying Bayesian ANOVA (Rouder et al., Reference Rouder, Morey, Speckman and Province2012), provides a concrete direction for designs with more than two independent samples. The extension to dependent samples (e.g., a pre- and post-intervention design) requires a new modeling strategy involving random intercepts. All of these limitations provide concrete directions for future research.
There are also limitations and opportunities for improvement in the computational methodology. Numerical Check I showed that the Metropolis algorithms used to update the model parameters were much less efficient than rstan. While the Metropolis approach that we used is flexible and broadly applicable, it can result in slow mixing, leading to high autocorrelation or a low effective sample size in the sampled chains. To address this issue, we are exploring advanced sampling strategies, including gradient-based methods such as the Metropolis-adjusted Langevin algorithm (Besag, Reference Besag1994; Rossky et al., Reference Rossky, Doll and Friedman1978). A second limitation is our reliance on the pseudolikelihood to avoid computing the intractable normalization constant. While the pseudolikelihood leads to consistent posteriors, it may balance sensitivity and specificity differently than the full likelihood (e.g., Keetelaar et al., Reference Keetelaar, Sekulovski, Borsboom and Marsman2024; Marsman et al., Reference Marsman, van den Bergh and Haslbeck2025). We are investigating other approximations to the full likelihood to better capture its information. Despite these computational limitations, our numerical experiments showed that the proposed methodology performs well and outperforms existing alternatives under most conditions.
We have implemented the proposed Bayesian methods in the R package bgms (version 0.1.4.2; Marsman et al., Reference Marsman, Arena, Huth, Sekulovski and van den Bergh2024), which can be installed from CRAN (see https://cran.r-project.org/web/packages/bgms/index.html). The computational aspects of the software are mostly written in C++ using the Rcpp package (Eddelbuettel, Reference Eddelbuettel2013). In a series of numerical experiments, we have shown that the proposed methodology works and is correctly implemented in the R package. However, the timing results from our simulations revealed that the method can be computationally intensive, particularly when the number of variables or sample size is large. Although the implementation benefits from C++-level optimization, the runtime can still be substantially longer than that of existing alternatives. We are therefore continuously working to improve computational efficiency, both by developing more efficient sampling algorithms and by refining the underlying software implementation. In addition to these efforts, we are currently integrating the independent samples comparison into the easybgm R package (Huth et al., Reference Huth, Keetelaar, Sekulovski, van den Bergh and Marsman2024) to streamline the analysis, and the bgms package into the JASP software (Love et al., Reference Love, Selker, Marsman, Jamil, Dropmann, Verhagen and Wagenmakers2019; Wagenmakers et al., Reference Wagenmakers, Love, Marsman, Jamil, Ly, Verhagen and Morey2018) so that applied researchers without R experience can use the methodology.
7 Conclusion
We proposed a new two-group extension of the OMRF that allows researchers to model group differences in both pairwise interactions and category thresholds in networks of binary and ordinal variables. Alongside this model, we introduced a Bayesian methodology for estimation and inference, implemented in the bgms package, that enables principled evaluation of these differences using empirical data. The proposed methods provide researchers with a coherent and flexible framework for assessing group differences in network structures. By offering both a suitable modeling framework for ordinal data and a robust inferential approach, we aim to help place the growing literature on psychological networks on a firmer methodological foundation.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/psy.2025.10060.
Funding statement
M.M. and N.S. were supported by the European Union (ERC, BAYESIAN P-NETS, Grant No. 101040876). Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them. L.J.W. and J.M.B.H. were supported by the project “New Science of Mental Disorders” (www.nsmd.eu), supported by the Dutch Research Council and the Dutch Ministry of Education, Culture and Science (NWO gravitation Grant No. 024.004.016).
Competing interests
The authors declare none.
Disclosure of use of AI tools
We used a large language model to edit the text.
Appendix
A Parameter identification of MRF models for ordinal data
Marsman et al. (Reference Marsman, van den Bergh and Haslbeck2025, Section 2.4) showed that the parameters of the OMRF are identified, that is, that the corresponding probability distribution
$p(\mathbf {x})$
in Eq. (1) is an injective function of the parameters. We first give a new proof for the identification of the OMRF, which gives us a strategy to show that the parameters of the two-group OMRF are also identified.
A.1 Parameter identification of the ordinal MRF
The OMRF is in the exponential family and has sufficient statistics for each parameter. The sufficient statistics for the category thresholds
$\mu _{ic}$
are the indicator functions
$\mathcal {I}(x_i = c)$
, for
$c = 1\text {, }\dots \text {, }m$
(i.e., we set
$\mu _{i0} =0$
for identification, see below), and the sufficient statistics for the pairwise interactions are the pairwise products
$x_{i} \, x_{j}$
. We collect all these statistics in the vector
$\textbf {s} :=\textbf {s}(\mathbf {x})$
, where
$\mathbf {x}\in \{0,1,\ldots ,m\}^{p}$
. In the same way, we collect the corresponding parameters
$\boldsymbol {\mu }$
and
$\boldsymbol {\theta }$
in the vector
$\boldsymbol {\eta }$
We can then rewrite the probability distribution characterized by the OMRF as
$$\begin{align*}p(\mathbf{x}) = \frac{\exp\left(\boldsymbol{\eta}^\top\,\textbf{s}(\mathbf{x})\right)}{\sum_{\mathbf{x}^\prime \in \{0,1,\ldots,m\}^{p}}\exp\left(\boldsymbol{\eta}^\top\,\textbf{s}(\mathbf{x}^\prime)\right)}. \end{align*}$$
We show that the elements in
$\textbf {s}$
have no linear constraints, that is, that they cannot be expressed as a linear function of each other, which implies the identifiability of the parameter vector
$\boldsymbol {\eta }$
(Lehmann & Casella, Reference Lehmann and Casella1998).
For each variable, we set the threshold parameter for the first category to a constant for identification (i.e.,
$\mu _{i0}=0$
). If we had not done this, then the sufficient statistics for the category thresholds of a variable i would satisfy the linear constraint
$$\begin{align*}\sum_{c=0}^{m} \mathcal{I}(x_{i} = c) = 1, \end{align*}$$
and the parameter vector
$\boldsymbol {\eta }$
would not be identifiable. By setting the threshold parameter of the first category to the constant zero for each variable in the network, the statistic
$\mathcal {I}(x_{i} = 0)$
is not part of
$\textbf {s}$
. This implies that the value of the sufficient statistic for the threshold parameter of a category c cannot be a linear combination of the sufficient statistics corresponding to the remaining categories (or pairwise interactions) alone. That is, they no longer satisfy linear constraints. Similarly, the statistics corresponding to the pairwise interactions cannot be expressed as a linear combination of the remaining statistics. To see why, consider the statistic
$x_i \, x_j$
that corresponds to the pairwise interaction parameter
$\theta _{ij}$
. If it can be expressed as a linear combination of the remaining statistics, then this relation must hold for all possible realizations of the vector
$\mathbf {x}$
. Now let’s assume that all other variables are zero, but that
$x_i \, x_j> 0$
. In this case, the only non-zero statistics we have are
$x_i\,x_j$
,
$\mathcal {I}(x_i = c)$
, and
$\mathcal {I}(x_j = d)$
for some integers
$0 < c\text {, }d\leq m$
, and there is clearly no linear combination of
$\mathcal {I}(x_i = c)$
and
$\mathcal {I}(x_j = d)$
that can reproduce the statistic
$x_i\, x_j$
. In the same way, one can show that there is no linear combination that can reproduce the statistic
$x_i\, x_j$
if the remaining variables are all one, or some are one and some are zero. This shows that the pairwise interactions also have no linear constraint, and the parameter vector
$\boldsymbol {\eta }$
is identifiable.
A.2 Parameter identification of the two-group ordinal MRF
Since the parameters of the two-group MRF are a one-to-one function of the parameters of two independent MRFs, and since we have established that the parameters of the independent MRFs are identifiable, we know that the parameters of the two-group MRF must also be identifiable. To verify this, we follow the same strategy as before and focus on the comparison of the OMRF in two groups.
The two-group OMRF is in the exponential family and has sufficient statistics for each parameter. The sufficient statistics for the category threshold parameters
$\lambda _{ic}$
are
$\mathcal {I}(x_i=c) + \mathcal {I}(y_i=c)$
, the sufficient statistics for the category threshold differences
$\epsilon _{ic}$
are $\tfrac{1}{2}\left(\mathcal {I}(y_i=c) - \mathcal {I}(x_i=c)\right)$, the sufficient statistics for the pairwise interactions
$\phi _{ij}$
are the sum of the pairwise products
$x_{i} \, x_{j}+y_{i} \, y_{j}$
, and for the pairwise differences
$\delta _{ij}$
half the differences of the pairwise products, $\tfrac{1}{2}\left(y_{i} \, y_{j}-x_{i} \, x_{j}\right)$. We again collect all these statistics in the vector
$\textbf {s} :=\textbf {s}(\mathbf {x}\text {, }\mathbf {y})$
, where
$\mathbf {x},\mathbf {y}\in \{0,1,\ldots ,m\}^{p}$
, and collect the corresponding parameters
$\boldsymbol {\lambda }$
,
$\boldsymbol {\epsilon }$
,
$\boldsymbol {\phi }$
, and
$\boldsymbol {\delta }$
in the vector
$\boldsymbol {\eta }$
. Proof of the identifiability of the parameter vector
$\boldsymbol {\eta }$
follows the same steps and arguments as above.
For reasons stated above, we set the category threshold parameter and category threshold difference parameter for the first category to a constant for identification (i.e.,
$\lambda _{i0}=0$
and
$\epsilon _{i0}=0$
) for each variable. Then, it is clear from the sufficient statistics defined above, that
$\mathcal {I}(x_{i}=c) + \mathcal {I}(y_{i}=c) \neq \tfrac{1}{2}\left(\mathcal {I}(y_{i}=c) - \mathcal {I}(x_{i}=c)\right)$
, for
$c =1\text {, }\dots \text {, }m$
and every
$x_i = y_i = c$
. That is, we cannot express the sufficient statistics for the category thresholds and their differences as a linear combination of each other or of the remaining statistics.
Similarly, the statistics corresponding to the pairwise interactions and their differences cannot be expressed as a linear combination of the remaining statistics. To see why, consider the statistic
$x_i \, x_j + y_i\,y_j$
that corresponds to the pairwise interaction parameter
$\phi _{ij}$
. If it can be expressed as a linear combination of the remaining statistics, then this relation must hold for all possible realizations of the vectors
$\mathbf {x}$
and
$\mathbf {y}$
. Now, let’s assume that all other variables are zero, but that
$x_i \, x_j + y_i \, y_j> 0$
. In this case, the non-zero statistics we have are $x_i\,x_j + y_i\,y_j$, $\tfrac{1}{2}\left(y_i\,y_j - x_i\,x_j\right)$, $\mathcal {I}(x_i = c) + \mathcal {I}(y_i = c)$, $\mathcal {I}(x_j = d) + \mathcal {I}(y_j = d)$, $\tfrac{1}{2}\left(\mathcal {I}(y_i = c) - \mathcal {I}(x_i = c)\right)$, and $\tfrac{1}{2}\left(\mathcal {I}(y_j = d) - \mathcal {I}(x_j = d)\right)$, for some integers $0 < c\text {, }d\leq m$
$x_i\, x_j + y_i\, y_j$
for every possible realization of
$\mathbf {x}$
and
$\mathbf {y}$
. Let’s examine a single example to verify that this statement is true. Let
$x_ix_j + y_i y_j = 1$
, which implies that either
$x_i = x_j = 1$
and at least one of
$y_i$
and
$y_j$
is zero (scenario 1), or at least one of
$x_i$
and
$x_j$
is zero and
$y_i=y_j=1$
(scenario 2). In scenario 1, we have that $\tfrac{1}{2}\left(y_i\,y_j - x_i\,x_j\right) = -\tfrac{1}{2}$, $\mathcal {I}(x_i = 1) + \mathcal {I}(y_i = 1)$ is one or two, $\mathcal {I}(x_j = 1) + \mathcal {I}(y_j = 1)$ is one or two, $\tfrac{1}{2}\left(\mathcal {I}(y_i = 1) - \mathcal {I}(x_i = 1)\right)$ is zero or minus a half, and $\tfrac{1}{2}\left(\mathcal {I}(y_j = 1) - \mathcal {I}(x_j = 1)\right)$ is zero or minus a half. In scenario 2, we have that $\tfrac{1}{2}\left(y_i\,y_j - x_i\,x_j\right) = \tfrac{1}{2}$, $\mathcal {I}(x_i = 1) + \mathcal {I}(y_i = 1)$ is one or two, $\mathcal {I}(x_j = 1) + \mathcal {I}(y_j = 1)$ is one or two, $\tfrac{1}{2}\left(\mathcal {I}(y_i = 1) - \mathcal {I}(x_i = 1)\right)$ is zero or a half, and $\tfrac{1}{2}\left(\mathcal {I}(y_j = 1) - \mathcal {I}(x_j = 1)\right)$ is zero or a half. From this, we see that in neither scenario can we uniquely express the statistic
$x_ix_j + y_i y_j$
as a linear function of the other statistics.
Since none of the statistics discussed so far could be expressed as a linear combination involving the pairwise difference statistic, the converse must also hold. That is, the pairwise difference statistic cannot be expressed as a linear function of any of the statistics discussed so far. The argument for the sum of the pairwise products can also be used for the difference of the pairwise products to argue that none of the pairwise difference statistics can be expressed as a linear combination of the remaining differences in the pairwise products. Thus, the parameter vector
$\boldsymbol {\eta }$
is identifiable.
B Full conditional simulation steps for the Gibbs sampler
B.1 Sampling category thresholds via independence Metropolis
The full conditional pseudoposterior distributions of the category threshold parameters are of the form
$$ \begin{align*} p^\ast\left(\lambda_{ic} \mid \mathbf{X}\text{, }\mathbf{Y}\text{, }\boldsymbol{\lambda}^{(c)}_i\text{, }\boldsymbol{\epsilon}\text{, }\boldsymbol{\phi}\text{, }\boldsymbol{\delta}\right) & \propto \frac{\exp\left(\lambda_{ic} \right)^{n_{ic1}+n_{ic2}}}{\prod_{v=1}^{n_1}\left(\text{g}_{v1} + \text{q}_{v1}\exp\left(\lambda_{ic}\right)\right)\,\prod_{v=1}^{n_2}\left(\text{g}_{v2} + \text{q}_{v2}\exp\left(\lambda_{ic}\right)\right)}\\ &\times \frac{\exp\left(\lambda_{ic}\right)^\alpha}{\left(1+\exp\left(\lambda_{ic}\right)\right)^{\alpha +\beta}}, \end{align*} $$
for
$c = 1\text {, }\dots \text {, }m$
, and
$i = 1\text {, }\dots \text {, }p$
. Here, we have used the following shorthand notations
$n_{ic1} = \sum _{v=1}^{n_1} \mathcal {I}(x_{vi} = c)$
,
$n_{ic2} = \sum _{v=1}^{n_2} \mathcal {I}(y_{vi} = c)$
, and
$$ \begin{align*} \text{g}_{v1} &= 1 + \sum_{u \neq c}\exp\left({\lambda_{iu} + \frac{1}{2}\epsilon_{i,u} + u\,\sum_{j\neq i}(\phi_{ij}+\frac{1}{2}\delta_{ij})\, x_{vj}}\right)\\ \text{g}_{v2} &= 1 + \sum_{u \neq c}\exp\left({\lambda_{iu} - \frac{1}{2}\epsilon_{i,u} + u\,\sum_{j\neq i}(\phi_{ij}-\frac{1}{2}\delta_{ij})\, y_{vj}}\right)\\ \ln(\text{q}_{v1}) &= \frac{1}{2}\epsilon_{i,c} + c\,\sum_{j\neq i}(\phi_{ij} + \frac{1}{2}\delta_{ij})\, x_{vj}\\ \ln(\text{q}_{v2}) &= - \frac{1}{2}\epsilon_{i,c} + c\,\sum_{j\neq i}(\phi_{ij} - \frac{1}{2}\delta_{ij})\, y_{vj}. \end{align*} $$
The full conditional resembles a generalized beta-prime distribution
$$\begin{align*}p(\lambda_{ic}) \propto \frac{\left(\text{d}\,\exp\left(\lambda_{ic}\right)\right)^{\text{a}}}{\left(1+\text{d}\,\exp\left(\lambda_{ic}\right)\right)^{\text{a}+\text{b}}}. \end{align*}$$
Maris et al. (Reference Maris, Bechger and San Martin2015) suggested using the beta-prime as the proposal in a Metropolis algorithm, and using the properties of the target distribution to set the proposal parameters. We have successfully used this idea before to simulate the threshold parameters of the IGM and OMRF (e.g., Marsman et al. Reference Marsman, Huth, Waldorp and Ntzoufras2022, Reference Marsman, van den Bergh and Haslbeck2025).
The log of the target distribution is concave and has linear tails:
$$\begin{align*}\frac{\text{d}}{\text{d} \lambda_{ic}}\, \ln\left\lbrace p^\ast\left(\lambda_{ic} \mid \mathbf{X}\text{, }\mathbf{Y}\text{, }\boldsymbol{\lambda}_i^{(c)}\text{, }\boldsymbol{\epsilon}\text{, }\boldsymbol{\phi}\text{, }\boldsymbol{\delta}\right) \right\rbrace \longrightarrow \begin{cases} n_{ic1} + n_{ic2} - n_1 - n_2 -\beta &\text{as }\lambda_{ic} \rightarrow \infty,\\ n_{ic1} + n_{ic2} + \alpha &\text{as }\lambda_{ic} \rightarrow -\infty, \end{cases} \end{align*}$$
and the same holds for the proposal distribution:
$$\begin{align*}\frac{\text{d}}{\text{d} \lambda_{ic}}\, \ln\left\lbrace p\left(\lambda_{ic} \right) \right\rbrace \longrightarrow \begin{cases} -\text{b} &\text{as }\lambda_{ic} \rightarrow \infty,\\ \text{a} &\text{as }\lambda_{ic} \rightarrow -\infty. \end{cases} \end{align*}$$
We match the tails of the proposal distribution to that of the target distribution by setting
$\text {a} = n_{ic1} + n_{ic2} + \alpha $
and
$\text {b} = n_1 + n_2 - n_{ic1}-n_{ic2} + \beta $
. The last free parameter,
$\text {d}$
, is used to ensure that the proposal closely matches the target distribution at the current state of the Markov chain. Specifically,
$\text {d}$
is used to make the derivatives of the logarithms of the proposal and target distributions equal. If
$\hat {\lambda }$
is the current state of
$\lambda _{ic}$
in the Markov chain, then we equate the two derivatives and solve for d, which yields
$$\begin{align*}\text{d} = \frac{\sum_{v=1}^{n_1}\frac{\text{q}_{v1}}{\text{g}_{v1}+\text{q}_{v1}\,\exp(\hat{\lambda})} + \sum_{v=1}^{n_2}\frac{\text{q}_{v2}}{\text{g}_{v2}+\text{q}_{v2}\,\exp(\hat{\lambda})} + (\alpha+\beta) \, \frac{1}{1+ \,\exp(\hat{\lambda})}}{\alpha+\beta+n_1+n_2 - \sum_{v=1}^{n_1}\frac{\text{q}_{v1}\exp(\hat{\lambda})}{\text{g}_{v1}+\text{q}_{v1}\,\exp(\hat{\lambda})} - \sum_{v=1}^{n_2}\frac{\text{q}_{v2}\exp(\hat{\lambda})}{\text{g}_{v2}+\text{q}_{v2}\,\exp(\hat{\lambda})} - (\alpha+\beta) \, \frac{\exp(\hat{\lambda})}{1+ \,\exp(\hat{\lambda})} }. \end{align*}$$
Now that we have the value for
$\text {d}$
, we can sample a proposal from the generalized beta-prime distribution in the following way: we sample W from a Beta
$(\text {a}\text {, }\text {b})$
distribution, and set the proposed value
$\lambda ^\prime $
equal to
$ \log \left (\text {d}^{-1}\frac {W}{1-W}\right )$
. Here,
$\lambda ^\prime $
is a sample from the generalized beta-prime distribution. We accept the proposed value with probability
$$ \begin{align*} \min&\left\lbrace 1\text{, } \frac{\prod_{v=1}^{n_1}\left(\text{g}_{v1} + \text{q}_{v1}\exp\left(\hat{\lambda}\right)\right)}{\prod_{v=1}^{n_1}\left(\text{g}_{v1} + \text{q}_{v1}\exp\left(\lambda^\prime\right)\right)}\, \frac{\prod_{v=1}^{n_2}\left(\text{g}_{v2} + \text{q}_{v2}\exp\left(\hat{\lambda}\right)\right)}{\prod_{v=1}^{n_2}\left(\text{g}_{v2} + \text{q}_{v2}\exp\left(\lambda^\prime\right)\right)}\, \right.\\ &\times \left.\frac{\left(1+\exp\left(\hat{\lambda}\right)\right)^{\alpha +\beta}}{\left(1+\exp\left(\lambda^\prime\right)\right)^{\alpha +\beta}} \frac{\left(1+\text{d}\,\exp\left(\lambda^\prime\right)\right)^{\alpha+\beta+n}}{\left(1+\text{d}\,\exp\left(\hat{\lambda}\right)\right)^{\alpha+\beta+n}} \right\rbrace, \end{align*} $$
and retain the current value
$\hat {\lambda }$
otherwise.
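The following standalone R sketch implements one such independence-Metropolis step for a generic log target. It mirrors the construction above but is our own illustrative code, not the bgms implementation; the tail-matched values of a and b and the derivative-matched value of d are passed in as arguments.

# With W ~ Beta(a, b), lambda = log(W / (1 - W)) - log(d) has log density
# a * (log(d) + lambda) - (a + b) * log(1 + d * exp(lambda)) + constant.
rprop_betaprime <- function(a, b, d) {
  w <- rbeta(1, a, b)
  log(w / (1 - w)) - log(d)
}
dlogprop_betaprime <- function(lambda, a, b, d) {
  a * (log(d) + lambda) - (a + b) * log1p(d * exp(lambda))
}
im_step <- function(lambda_cur, log_target, a, b, d) {
  lambda_new <- rprop_betaprime(a, b, d)
  log_alpha <- log_target(lambda_new) - log_target(lambda_cur) +
    dlogprop_betaprime(lambda_cur, a, b, d) -
    dlogprop_betaprime(lambda_new, a, b, d)
  if (log(runif(1)) < log_alpha) lambda_new else lambda_cur
}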
B.2 Sampling threshold differences via adaptive Metropolis
The full conditional pseudoposterior distributions of the threshold difference parameters are of the form
$$ \begin{align*} p^\ast\left(\epsilon_{ic} \mid \mathbf{X}\text{, }\mathbf{Y}\text{, }\boldsymbol{\lambda}_i\text{, }\boldsymbol{\epsilon}\text{, }\boldsymbol{\phi}\text{, }\boldsymbol{\delta}\right) & \propto \frac{\exp\left(\frac{1}{2}\epsilon_{i c}\,n_{ic1}\right)}{\prod_{v=1}^{n_1}\left\lbrace1 + \sum_{u=1}^{m}\exp\left(\lambda_{iu} + \frac{1}{2}\epsilon_{i u} + u\,\sum_{j\neq i}(\phi_{ij} + \frac{1}{2}\delta_{ij})\, x_{vj}\right)\right\rbrace}\\ &\times \frac{\exp\left( - \frac{1}{2}\epsilon_{i c}\,n_{ic2}\right)}{\prod_{v=1}^{n_2} \left\lbrace 1 + \sum_{u=1}^{m}\exp\left(\lambda_{iu} - \frac{1}{2}\epsilon_{i u} + u\,\sum_{j\neq i}(\phi_{ij} - \frac{1}{2}\delta_{ij})\, y_{vj}\right)\right\rbrace}\\ &\times \frac{\text{s}_\epsilon}{\text{s}_\epsilon^2 + \epsilon_{ic}^2}, \end{align*} $$
where
$\text {s}_\epsilon $
is the scale of the Cauchy prior. We will sample from these full conditional pseudoposterior distributions using adaptive Metropolis (Atchadé & Rosenthal, Reference Atchadé and Rosenthal2005; Griffin & Steel, Reference Griffin, Steel, Tadesse and Vanucci2022).
The adaptive Metropolis algorithm is implemented as follows. We propose a new value
$\epsilon ^\prime $
from a normal density centered at the current state
$\epsilon ^\ast $
. The variance of this proposal density is
$\nu _{\epsilon _{i c}}$
. We accept the proposed value with probability
$$ \begin{align*} \pi = \text{min}&\left\lbrace1\text{, }\exp\left(\frac{1}{2} \,(n_{ic1}-n_{ic2})\, (\epsilon^\prime - \epsilon^\ast)\right)\right.\\ &\times \prod_{v=1}^{n_1} \frac{\left\lbrace1 + \sum_{u=1\neq c}^{m}e^{\lambda_{i u} + \frac{1}{2}\epsilon_{i u} + u\,\sum_{j\neq i}(\phi_{ij} + \frac{1}{2}\delta_{ij})\, x_{vj}}+e^{\lambda_{i c} + \frac{1}{2}\epsilon^\ast + c\,\sum_{j\neq i}(\phi_{ij} + \frac{1}{2}\delta_{ij})\, x_{vj}}\right\rbrace }{\left\lbrace1 + \sum_{u=1\neq c}^{m}e^{\lambda_{i u} + \frac{1}{2}\epsilon_{i u} + u\,\sum_{j\neq i}(\phi_{ij} + \frac{1}{2}\delta_{ij})\, x_{vj}}+e^{\lambda_{ ic} + \frac{1}{2}\epsilon^\prime + c\,\sum_{j\neq i}(\phi_{ij} + \frac{1}{2}\delta_{ij})\, x_{vj}}\right\rbrace}\\ &\times \prod_{v=1}^{n_2} \frac{\left\lbrace1 + \sum_{u=1\neq c}^{m}e^{\lambda_{ iu} - \frac{1}{2}\epsilon_{ i u} + u\,\sum_{j\neq i}(\phi_{ij} - \frac{1}{2}\delta_{ij})\, y_{vj}}+e^{\lambda_{ ic} - \frac{1}{2}\epsilon^\ast + c\,\sum_{j\neq i}(\phi_{ij} - \frac{1}{2}\delta_{ij})\, y_{vj}}\right\rbrace }{\left\lbrace1 + \sum_{u=1\neq c}^{m}e^{\lambda_{ iu} - \frac{1}{2}\epsilon_{ i u} + u\,\sum_{j\neq i}(\phi_{ij} - \frac{1}{2}\delta_{ij})\, y_{vj}}+e^{\lambda_{ ic} - \frac{1}{2}\epsilon^\prime + c\,\sum_{j\neq i}(\phi_{ij} - \frac{1}{2}\delta_{ij})\, y_{vj}}\right\rbrace}\\ &\left. \times \frac{s^2 + (\epsilon^\ast)^2}{s^2 + (\epsilon^\prime)^2}\right\rbrace, \end{align*} $$
and retain the current value otherwise.
The variance of the proposal distribution is calibrated using a Robbins–Monro algorithm (Robbins & Monro, Reference Robbins and Monro1951). Specifically, at iteration t of the Gibbs sampler, we update the logarithm of the proposal variance $\nu _{\epsilon _{ i c}}^{(t)}$ by adding a term proportional to $\pi ^{(t)} - 0.234$, where $\pi ^{(t)}$ is the acceptance probability in the current Metropolis step and the step size of the update decreases across iterations. This approach adjusts the proposal variance so that the Metropolis algorithm approaches a target acceptance probability of $0.234$, which is generally considered optimal for a Metropolis–Hastings random walk.
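The sketch below shows this adaptation scheme on a toy target; the decay exponent of 0.75 is our illustrative choice and not necessarily the value used inside bgms.

rm_adapt <- function(log_nu, accept_prob, t, target = 0.234, decay = 0.75) {
  log_nu + (accept_prob - target) / t^decay   # Robbins-Monro update of log variance
}
# Toy usage: adapt a normal random walk targeting a standard normal density.
set.seed(11)
x <- 0
log_nu <- 0
for (t in 1:5000) {
  prop <- rnorm(1, x, exp(log_nu / 2))        # exp(log_nu / 2) is the proposal sd
  pi_t <- min(1, exp(dnorm(prop, log = TRUE) - dnorm(x, log = TRUE)))
  if (runif(1) < pi_t) x <- prop
  log_nu <- rm_adapt(log_nu, pi_t, t)
}
exp(log_nu / 2)  # adapted proposal sd; acceptance rates end up near 0.234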
B.3 Sampling pairwise interactions via adaptive Metropolis
The full conditional pseudoposterior distributions of the pairwise interactions are of the form
$$ \begin{align*} &p^\ast\left(\phi_{ij} \mid \mathbf{X}\text{, }\mathbf{Y}\text{, }\boldsymbol{\lambda}_i\text{, }\boldsymbol{\lambda}_j\text{, }\boldsymbol{\epsilon}_i\text{, }\boldsymbol{\epsilon}_j\text{, }\boldsymbol{\phi}^{(ij)}\text{, }\boldsymbol{\delta}\right)\\ &\propto \frac{\exp\left(\sum_{v = 1}^{n_1} x_{vi}x_{vj}\phi_{ij} \right)}{\prod_{v=1}^{n_1} \left\lbrace1 + \sum_{u=1}^{m}e^{\lambda_{ iu} + \frac{1}{2}\epsilon_{ i u} + u\,\sum_{j\neq i}(\phi_{ij} + \frac{1}{2}\delta_{ij})\, x_{vj}}\right\rbrace \left\lbrace1 + \sum_{u=1}^{m}e^{\lambda_{j\,u} + \frac{1}{2}\epsilon_{j\, u} + u\,\sum_{i\neq j}(\phi_{ij} + \frac{1}{2}\delta_{ij})\, x_{vi}}\right\rbrace}\\ &\times \frac{\exp\left(\sum_{v = 1}^{n_2} y_{vi}y_{vj}\phi_{ij} \right)}{\prod_{v=1}^{n_2} \left\lbrace1 + \sum_{u=1}^{m}e^{\lambda_{ iu} - \frac{1}{2}\epsilon_{ i u} + u\,\sum_{j\neq i}(\phi_{ij} - \frac{1}{2}\delta_{ij})\, y_{vj}}\right\rbrace \left\lbrace1 + \sum_{u=1}^{m}e^{\lambda_{j\,u} - \frac{1}{2}\epsilon_{j\, u} + u\,\sum_{i\neq j}(\phi_{ij} - \frac{1}{2}\delta_{ij})\, y_{vi}}\right\rbrace}\\ &\times \frac{\text{s}_\phi}{\text{s}_\phi^2 + \phi_{ij}^2}, \end{align*} $$
where
$\text {s}_\phi $
is the scale of the Cauchy prior.
The adaptive Metropolis algorithm to simulate from these pseudoposterior distributions is implemented as follows. We propose a new value
$\phi ^\prime $
from a normal density centered at the current state
$\phi ^\ast $
. The variance of this proposal density is
$\nu _{\phi _{ij}}$
. We accept the proposed value with probability
$$ \begin{align*} \text{min}&\left\lbrace1\text{, }\exp\left(\left(\sum_{v=1}^{n_1}x_{vi}x_{vj} + \sum_{v=1}^{n_2}y_{vi}y_{vj}\right)(\phi^\prime - \phi^\ast)\right)\right.\\ &\prod_{v=1}^{n_1} \frac{\left\lbrace1 + \sum_{u=1}^{m}e^{\lambda_{ iu} + \frac{1}{2}\epsilon_{ i u} + u x_{vj}(\phi^\ast+\frac{1}{2}\delta_{ij}) +u\,\sum_{k\neq i\neq j}(\phi_{ik} + \frac{1}{2}\delta_{ik})\, x_{vj}}\right\rbrace} {\left\lbrace1 + \sum_{u=1}^{m}e^{\lambda_{ iu} + \frac{1}{2}\epsilon_{ i u} + u x_{vj}(\phi^\prime+\frac{1}{2}\delta_{ij}) + u\,\sum_{k\neq i \neq j}(\phi_{ik} + \frac{1}{2}\delta_{ik})\, x_{vj}}\right\rbrace}\\ &\prod_{v=1}^{n_1} \frac{ \left\lbrace1 + \sum_{u=1}^{m}e^{\lambda_{j\,u} + \frac{1}{2}\epsilon_{j\, u} + u x_{vi}(\phi^\ast+\frac{1}{2}\delta_{ij}) + u\,\sum_{k\neq i \neq j}(\phi_{kj} + \frac{1}{2}\delta_{kj})\, x_{vi}}\right\rbrace}{\left\lbrace1 + \sum_{u=1}^{m}e^{\lambda_{j\,u} + \frac{1}{2}\epsilon_{j\, u} + u x_{vi}(\phi^\prime+\frac{1}{2}\delta_{ij}) + u\,\sum_{k\neq i \neq j}(\phi_{kj} + \frac{1}{2}\delta_{kj})\, x_{vi}}\right\rbrace}\\ &\prod_{v=1}^{n_2} \frac{\left\lbrace1 + \sum_{u=1}^{m}e^{\lambda_{ iu} - \frac{1}{2}\epsilon_{ i u} + u y_{vj}(\phi^\ast-\frac{1}{2}\delta_{ij}) +u\,\sum_{k\neq i\neq j}(\phi_{ik} -\frac{1}{2}\delta_{ik})\, y_{vj}}\right\rbrace} {\left\lbrace1 + \sum_{u=1}^{m}e^{\lambda_{ iu} - \frac{1}{2}\epsilon_{ i u} + u y_{vj}(\phi^\prime-\frac{1}{2}\delta_{ij}) + u\,\sum_{k\neq i \neq j}(\phi_{ik}- \frac{1}{2}\delta_{ik})\, y_{vj}}\right\rbrace}\\&\prod_{v=1}^{n_2} \frac{ \left\lbrace1 + \sum_{u=1}^{m}e^{\lambda_{j\,u} - \frac{1}{2}\epsilon_{j\, u} + u y_{vi}(\phi^\ast-\frac{1}{2}\delta_{ij}) + u\,\sum_{k\neq i \neq j}(\phi_{kj} - \frac{1}{2}\delta_{kj})\, y_{vi}}\right\rbrace}{\left\lbrace1 + \sum_{u=1}^{m}e^{\lambda_{j\,u} - \frac{1}{2}\epsilon_{j\, u} + u y_{vi}(\phi^\prime-\frac{1}{2}\delta_{ij}) + u\,\sum_{k\neq i \neq j}(\phi_{kj} - \frac{1}{2}\delta_{kj})\, y_{vi}}\right\rbrace}\\ &\left. \times \frac{s^2 + (\phi^\ast)^2}{s^2 + (\phi^\prime)^2}\right\rbrace, \end{align*} $$
and update the proposal variance
$\nu _{\phi _{ij}}$
using the Robbins–Monro algorithm (Robbins & Monro, Reference Robbins and Monro1951) as outlined in the previous section of this Appendix (i.e., B.2).
B.4 Sampling pairwise differences via adaptive Metropolis
The full conditional pseudoposterior distributions of the pairwise differences are of the form
$$ \begin{align*} &p^\ast\left(\delta_{ij} \mid \mathbf{X}\text{, }\mathbf{Y}\text{, }\boldsymbol{\lambda}_i\text{, }\boldsymbol{\lambda}_j\text{, }\boldsymbol{\epsilon}_i\text{, }\boldsymbol{\epsilon}_j\text{, }\boldsymbol{\phi}\text{, }\boldsymbol{\delta}^{(ij)}\right)\\ &\propto \frac{\exp\left(\frac{1}{2}\sum_{v = 1}^{n_1} x_{vi}x_{vj}\delta_{ij} \right)}{\prod_{v=1}^{n_1} \left\lbrace1 + \sum_{u=1}^{m}e^{\lambda_{ iu} + \frac{1}{2}\epsilon_{ i u} + u\,\sum_{j\neq i}(\phi_{ij} + \frac{1}{2}\delta_{ij})\, x_{vj}}\right\rbrace \left\lbrace1 + \sum_{u=1}^{m}e^{\lambda_{j\,u} + \frac{1}{2}\epsilon_{j\, u} + u\,\sum_{i\neq j}(\phi_{ij} + \frac{1}{2}\delta_{ij})\, x_{vi}}\right\rbrace}\\ &\times \frac{\exp\left(-\frac{1}{2}\sum_{v = 1}^{n_2} y_{vi}y_{vj}\delta_{ij} \right)}{\prod_{v=1}^{n_2} \left\lbrace1 + \sum_{u=1}^{m}e^{\lambda_{ iu} - \frac{1}{2}\epsilon_{ i u} + u\,\sum_{j\neq i}(\phi_{ij} - \frac{1}{2}\delta_{ij})\, y_{vj}}\right\rbrace \left\lbrace1 + \sum_{u=1}^{m}e^{\lambda_{j\,u} - \frac{1}{2}\epsilon_{j\, u} + u\,\sum_{i\neq j}(\phi_{ij} - \frac{1}{2}\delta_{ij})\, y_{vi}}\right\rbrace}\\ &\times \frac{\text{s}_\delta}{\text{s}_\delta^2 + \delta_{ij}^2}, \end{align*} $$
where
$\text {s}_\delta $
is the scale of the Cauchy prior.
The adaptive Metropolis algorithm to simulate from these pseudoposterior distributions is implemented as follows. We propose a new value
$\delta ^\prime $
from a normal density centered at the current state
$\delta ^\ast $
. The variance of this proposal density is
$\nu _{\delta _{ij}}$
. We accept the proposed value with probability
$$ \begin{align*} \text{min}&\left\lbrace1\text{, }\exp\left(\frac{1}{2}\left(\sum_{v=1}^{n_1}x_{vi}x_{vj} - \sum_{v=1}^{n_2}y_{vi}y_{vj}\right)(\delta^\prime - \delta^\ast)\right)\right.\\ &\prod_{v=1}^{n_1} \frac{\left\lbrace1 + \sum_{u=1}^{m}e^{\lambda_{ iu} + \frac{1}{2}\epsilon_{ i u} + u x_{vj}(\phi_{ij}+\frac{1}{2}\delta^\ast) +u\,\sum_{k\neq i\neq j}(\phi_{ik} + \frac{1}{2}\delta_{ik})\, x_{vj}}\right\rbrace} {\left\lbrace1 + \sum_{u=1}^{m}e^{\lambda_{ iu} + \frac{1}{2}\epsilon_{ i u} + u x_{vj}(\phi_{ij}+\frac{1}{2}\delta^\prime) + u\,\sum_{k\neq i \neq j}(\phi_{ik} + \frac{1}{2}\delta_{ik})\, x_{vj}}\right\rbrace}\\ &\prod_{v=1}^{n_1} \frac{ \left\lbrace1 + \sum_{u=1}^{m}e^{\lambda_{j\,u} + \frac{1}{2}\epsilon_{j\, u} + u x_{vi}(\phi_{ij}+\frac{1}{2}\delta^\ast) + u\,\sum_{k\neq i \neq j}(\phi_{kj} + \frac{1}{2}\delta_{kj})\, x_{vi}}\right\rbrace}{\left\lbrace1 + \sum_{u=1}^{m}e^{\lambda_{j\,u} + \frac{1}{2}\epsilon_{j\, u} + u x_{vi}(\phi_{ij}+\frac{1}{2}\delta^\prime) + u\,\sum_{k\neq i \neq j}(\phi_{kj} + \frac{1}{2}\delta_{kj})\, x_{vi}}\right\rbrace}\\ &\prod_{v=1}^{n_2} \frac{\left\lbrace1 + \sum_{u=1}^{m}e^{\lambda_{ iu} - \frac{1}{2}\epsilon_{ i u} + u y_{vj}(\phi_{ij}-\frac{1}{2}\delta^\ast) +u\,\sum_{k\neq i\neq j}(\phi_{ik} -\frac{1}{2}\delta_{ik})\, y_{vj}}\right\rbrace} {\left\lbrace1 + \sum_{u=1}^{m}e^{\lambda_{ iu} - \frac{1}{2}\epsilon_{ i u} + u y_{vj}(\phi_{ij}-\frac{1}{2}\delta^\prime) + u\,\sum_{k\neq i \neq j}(\phi_{ik}- \frac{1}{2}\delta_{ik})\, y_{vj}}\right\rbrace}\\ &\prod_{v=1}^{n_2} \frac{ \left\lbrace1 + \sum_{u=1}^{m}e^{\lambda_{j\,u} - \frac{1}{2}\epsilon_{j\, u} + u y_{vi}(\phi_{ij}-\frac{1}{2}\delta^\ast) + u\,\sum_{k\neq i \neq j}(\phi_{kj} - \frac{1}{2}\delta_{kj})\, y_{vi}}\right\rbrace}{\left\lbrace1 + \sum_{u=1}^{m}e^{\lambda_{j\,u} - \frac{1}{2}\epsilon_{j\, u} + u y_{vi}(\phi_{ij}-\frac{1}{2}\delta^\prime) + u\,\sum_{k\neq i \neq j}(\phi_{kj} - \frac{1}{2}\delta_{kj})\, y_{vi}}\right\rbrace}\\ &\left. \times \frac{s^2 + (\delta^\ast)^2}{s^2 + (\delta^\prime)^2}\right\rbrace, \end{align*} $$
and update the proposal variance
$\nu _{\delta _{ij}}$
using the Robbins–Monro algorithm (Robbins & Monro, Reference Robbins and Monro1951) as outlined in a previous section of this Appendix (i.e., B.2).
B.5 Joint sampling of threshold differences and indicators
We use a Metropolis step and propose a new value for the difference indicator
$\gamma ^\prime $
and the difference parameters
$\epsilon ^\prime $
based on their current states (i.e.,
$\gamma ^\ast = \gamma _{ii}$
and
$\epsilon ^\ast = \boldsymbol {\epsilon }_{i.}$
). We factor the proposal distribution as
$$\begin{align*}p\left(\gamma^\prime\text{, }\epsilon^\prime \mid \gamma^\ast\text{, }\epsilon^\ast\right) = p\left(\gamma^\prime \mid \gamma^\ast\right)\,p\left(\epsilon^\prime \mid \epsilon^\ast\text{, }\gamma^\prime\right). \end{align*}$$
We will also use the MoMS formulation for the two proposal distributions. First, we consider the proposal distribution for the edge indicator,
$$\begin{align*}p\left(\gamma^\prime = 1 - \gamma^\ast \mid \gamma^\ast\right) = 1, \end{align*}$$
which suggests changing the edge, i.e., it sets $\gamma ^\prime = 1 - \gamma ^\ast $ to make the between-model move. Next, we define the conditional proposal distribution for the threshold difference effect,
$$\begin{align*}p\left(\epsilon^\prime \mid \epsilon^\ast\text{, }\gamma^\prime\right) = \begin{cases} \mathcal{I}\left(\epsilon^\prime = \boldsymbol{0}\right) & \text{if } \gamma^\prime = 0\text{,}\\ p_{\text{proposal}}\left(\epsilon^\prime \mid \epsilon^\ast\right) & \text{if } \gamma^\prime = 1. \end{cases} \end{align*}$$
Here, the proposal value
$\epsilon ^\prime $
is equal to
$\boldsymbol {0}$
if
$\gamma ^\prime = 0$
, otherwise, it is drawn from a proposal density
$p_{\text {proposal}}(\epsilon \mid \epsilon ^\ast )$
. We use the adaptive proposals from the adaptive Metropolis (cf. Appendix B.2).
Let
$\gamma ^\ast $
and
$\epsilon ^\ast $
denote the current states of the threshold difference indicator and the threshold difference parameter vector for the variable i, and let
$\gamma ^\prime $
and
$\epsilon ^\prime $
denote their proposed states. We accept the proposed states with probability
$\min \left \lbrace 1\text {, }\pi \right \rbrace $
, where
$$\begin{align*}\pi = \overbrace{\frac{ p^\ast(\mathbf{x} \mid \boldsymbol{\lambda} \text{, }\boldsymbol{\epsilon}^\prime\text{, }\boldsymbol{\phi}\text{, }\boldsymbol{\delta})\, p^\ast(\mathbf{y} \mid \boldsymbol{\lambda} \text{, }\boldsymbol{\epsilon}^\prime\text{, }\boldsymbol{\phi}\text{, }\boldsymbol{\delta}) }{ p^\ast(\mathbf{x} \mid \boldsymbol{\lambda} \text{, }\boldsymbol{\epsilon}^\ast\text{, }\boldsymbol{\phi}\text{, }\boldsymbol{\delta})\, p^\ast(\mathbf{y} \mid \boldsymbol{\lambda}\text{, }\boldsymbol{\epsilon}^\ast\text{, }\boldsymbol{\phi}\text{, }\boldsymbol{\delta}) }}^{\text{Pseudolikelihood Ratio}}\, \overbrace{\frac{p(\epsilon^\prime \mid \gamma^\prime)\,p(\gamma^\prime)}{p(\epsilon^\ast \mid \gamma^\ast)\,p(\gamma^\ast)}}^{\substack{\text{Prior Ratio}}}\, \overbrace{\frac{p(\epsilon^\ast \mid \epsilon^\prime\text{, }\gamma^\ast)\, p(\gamma^\ast \mid \gamma^\prime)}{p(\epsilon^\prime \mid \epsilon^\ast\text{, }\gamma^\prime)\, p(\gamma^\prime \mid \gamma^\ast)}}^{\text{Proposal Ratio}}, \end{align*}$$
where
$\boldsymbol {\epsilon }^\ast $
is the current state of the matrix of threshold differences and
$\boldsymbol {\epsilon }^\prime $
its proposed state with elements in row i set equal to the proposed value
$\epsilon ^\prime $
. If
$\gamma ^\ast =0$
, we propose
$\gamma ^\prime = 1$
and
$\epsilon ^\prime _c \sim \mathcal {N}(0 \text {, }\nu _{ic})$
, for
$c = 1\text {, }\dots \text {, }m$
, and accept the proposed values with probability
$$\begin{align*}\min\left\lbrace 1\text{, } \frac{ p^\ast(\mathbf{x} \mid \boldsymbol{\lambda} \text{, }\boldsymbol{\epsilon}^\prime\text{, }\boldsymbol{\phi}\text{, }\boldsymbol{\delta})\, p^\ast(\mathbf{y} \mid \boldsymbol{\lambda} \text{, }\boldsymbol{\epsilon}^\prime\text{, }\boldsymbol{\phi}\text{, }\boldsymbol{\delta})\, p_{\text{slab}}(\epsilon^\prime)\, \pi_\epsilon }{ p^\ast(\mathbf{x} \mid \boldsymbol{\lambda}\text{, }\boldsymbol{\epsilon}^\ast \text{, }\boldsymbol{\phi}\text{, }\boldsymbol{\delta})\, p^\ast(\mathbf{y} \mid \boldsymbol{\lambda} \text{, }\boldsymbol{\epsilon}^\ast\text{, }\boldsymbol{\phi}\text{, }\boldsymbol{\delta})\, p_{\text{proposal}}(\epsilon^\prime \mid \epsilon^\ast)\,(1-\pi_\epsilon) }\right\rbrace. \end{align*}$$
Conversely, if
$\gamma ^\ast =1$
, we propose
$\gamma ^\prime = 0$
and
$\epsilon ^\prime =\boldsymbol {0}$
. We accept
$\gamma ^\prime $
and
$\epsilon ^\prime $
with probability
$$\begin{align*}\min\left\lbrace 1\text{, } \frac{ p^\ast(\mathbf{x} \mid \boldsymbol{\lambda} \text{, }\boldsymbol{\epsilon}^\prime\text{, }\boldsymbol{\phi}\text{, }\boldsymbol{\delta})\, p^\ast(\mathbf{y} \mid \boldsymbol{\lambda} \text{, }\boldsymbol{\epsilon}^\prime\text{, }\boldsymbol{\phi}\text{, }\boldsymbol{\delta})\, p_{\text{proposal}}(\epsilon^\ast \mid \epsilon^\prime)\, (1-\pi_\epsilon) }{ p^\ast(\mathbf{x} \mid \boldsymbol{\lambda} \text{, }\boldsymbol{\epsilon}^\ast\text{, }\boldsymbol{\phi}\text{, }\boldsymbol{\delta})\, p^\ast(\mathbf{y} \mid \boldsymbol{\lambda} \text{, }\boldsymbol{\epsilon}^\ast\text{, }\boldsymbol{\phi}\text{, }\boldsymbol{\delta})\, p_{\text{slab}}(\epsilon^\ast)\, \pi_\epsilon }\right\rbrace. \end{align*}$$
B.6 Joint sampling of pairwise differences and indicators
Let
$\gamma ^\ast $
and
$\delta ^\ast $
denote the current states of the pairwise difference indicator and the pairwise difference parameter for the variables i and j, and let
$\gamma ^\prime $
and
$\delta ^\prime $
denote their proposed states. We use the adaptive proposals from the adaptive Metropolis (cf. Appendix B.4). We accept the proposed states with probability
$\min \left \lbrace 1\text {, }\pi \right \rbrace $
, where
$$\begin{align*}\pi = \overbrace{\frac{ p^\ast(\mathbf{x} \mid \boldsymbol{\lambda} \text{, }\boldsymbol{\epsilon}\text{, }\boldsymbol{\phi} \text{, }\boldsymbol{\delta}^\prime)\, p^\ast(\mathbf{y} \mid \boldsymbol{\lambda} \text{, }\boldsymbol{\epsilon}\text{, }\boldsymbol{\phi} \text{, }\boldsymbol{\delta}^\prime) }{ p^\ast(\mathbf{x} \mid \boldsymbol{\lambda} \text{, }\boldsymbol{\epsilon}\text{, }\boldsymbol{\phi} \text{, }\boldsymbol{\delta}^\ast)\, p^\ast(\mathbf{y} \mid \boldsymbol{\lambda} \text{, }\boldsymbol{\epsilon}\text{, }\boldsymbol{\phi} \text{, }\boldsymbol{\delta}^\ast) }}^{\text{Pseudolikelihood Ratio}}\, \overbrace{\frac{p(\delta^\prime \mid \gamma^\prime)\,p(\gamma^\prime)}{p(\delta^\ast \mid \gamma^\ast)\,p(\gamma^\ast)}}^{\substack{\text{Prior Ratio}}}\, \overbrace{\frac{p(\delta^\ast \mid \delta^\prime\text{, }\gamma^\ast)\, p(\gamma^\ast \mid \gamma^\prime)}{p(\delta^\prime \mid \delta^\ast\text{, }\gamma^\prime)\, p(\gamma^\prime \mid \gamma^\ast)}}^{\text{Proposal Ratio}}, \end{align*}$$
where
$\boldsymbol {\delta }^\ast $
is the current state of the matrix of pairwise differences and
$\boldsymbol {\delta }^\prime $
is its proposed state, with elements in row i column j and row j column i set equal to the proposed value
$\delta ^\prime $
. If
$\gamma ^\ast =0$
, we propose
$\gamma ^\prime = 1$
and
$\delta ^\prime \sim \mathcal {N}(0 \text {, }\nu _{ij})$
, and accept the proposed values with probability
$$\begin{align*}\min\left\lbrace 1\text{, } \frac{ p^\ast(\mathbf{x} \mid \boldsymbol{\lambda} \text{, }\boldsymbol{\epsilon}\text{, }\boldsymbol{\phi} \text{, }\boldsymbol{\delta}^\prime)\, p^\ast(\mathbf{y} \mid \boldsymbol{\lambda} \text{, }\boldsymbol{\epsilon}\text{, }\boldsymbol{\phi} \text{, }\boldsymbol{\delta}^\prime)\, p_{\text{slab}}(\delta^\prime) \,\pi_\delta }{ p^\ast(\mathbf{x} \mid \boldsymbol{\lambda} \text{, }\boldsymbol{\epsilon}\text{, }\boldsymbol{\phi} \text{, }\boldsymbol{\delta}^\ast)\, p^\ast(\mathbf{y} \mid \boldsymbol{\lambda} \text{, }\boldsymbol{\epsilon}\text{, }\boldsymbol{\phi} \text{, }\boldsymbol{\delta}^\ast) \,p_{\text{proposal}}(\delta^\prime \mid \delta^\ast)\,(1-\pi_\delta)}\right\rbrace. \end{align*}$$
Conversely, if
$\gamma ^\ast =1$
, we propose
$\gamma ^\prime = 0$
and
$\delta ^\prime = 0$
. We accept
$\gamma ^\prime $
and
$\delta ^\prime $
with probability
$$\begin{align*}\min\left\lbrace 1\text{, } \frac{ p^\ast(\mathbf{x} \mid \boldsymbol{\lambda} \text{, }\boldsymbol{\epsilon}\text{, }\boldsymbol{\phi} \text{, }\boldsymbol{\delta}^\prime)\, p^\ast(\mathbf{y} \mid \boldsymbol{\lambda} \text{, }\boldsymbol{\epsilon}\text{, }\boldsymbol{\phi} \text{, }\boldsymbol{\delta}^\prime)\,p_{\text{proposal}}(\delta^\ast \mid \delta^\prime)\,(1-\pi_\delta) }{ p^\ast(\mathbf{x} \mid \boldsymbol{\lambda} \text{, }\boldsymbol{\epsilon}\text{, }\boldsymbol{\phi} \text{, }\boldsymbol{\delta}^\ast)\, p^\ast(\mathbf{y} \mid \boldsymbol{\lambda} \text{, }\boldsymbol{\epsilon}\text{, }\boldsymbol{\phi} \text{, }\boldsymbol{\delta}^\ast)\,p_{\text{slab}}(\delta^\ast)\,\pi_\delta}\right\rbrace. \end{align*}$$
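To illustrate the structure of this between-model move, the self-contained R sketch below replaces the pseudolikelihood with an arbitrary smooth log likelihood and holds the adaptive proposal variance fixed. All names are ours, and the within-model updates of Appendix B.4 are omitted; the move is nonetheless a valid transdimensional Metropolis step on its own.

log_lik <- function(delta) dnorm(delta, mean = 0.3, sd = 0.2, log = TRUE)  # stand-in
s_delta <- 2.5    # Cauchy slab scale
pi_delta <- 0.5   # prior inclusion probability
nu <- 0.1         # proposal variance (adaptive in the actual sampler)

joint_step <- function(gamma, delta) {
  if (gamma == 0) {                       # propose adding the difference
    delta_new <- rnorm(1, 0, sqrt(nu))
    log_alpha <- log_lik(delta_new) - log_lik(0) +
      dcauchy(delta_new, 0, s_delta, log = TRUE) + log(pi_delta) -
      dnorm(delta_new, 0, sqrt(nu), log = TRUE) - log(1 - pi_delta)
    if (log(runif(1)) < log_alpha) c(1, delta_new) else c(0, 0)
  } else {                                # propose removing the difference
    log_alpha <- log_lik(0) - log_lik(delta) +
      dnorm(delta, 0, sqrt(nu), log = TRUE) + log(1 - pi_delta) -
      dcauchy(delta, 0, s_delta, log = TRUE) - log(pi_delta)
    if (log(runif(1)) < log_alpha) c(0, 0) else c(1, delta)
  }
}
state <- c(0, 0)
incl <- numeric(10000)
for (s in seq_len(10000)) {
  state <- joint_step(state[1], state[2])
  incl[s] <- state[1]
}
mean(incl)  # estimated posterior inclusion probability under the toy likelihood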
C Numerical checks
The R scripts used for the numerical checks and the rstan implementation used in the first numerical check can be found at https://osf.io/txhbp/overview?view_only=0a2ed24ecc4448fbba104faa28e6f8c7.
C.1 Numerical Check I: Estimating the pseudoposterior with bgms and rstan
Our approach to Bayesian hypothesis testing about parameter differences between the two groups relies on the numerical procedures for Bayesian parameter estimation. Our software implementation consists of more than
$3\text {,}000$
lines of C++ code, which calls for a careful evaluation of whether the software works correctly. To verify that these numerical procedures are implemented correctly, we compare their estimates with the estimates from an alternative software implementation. Here, we consider the popular Stan language (Gelman et al., Reference Gelman, Lee and Guo2015) through the R package rstan (Stan Development Team, 2024a). Stan is a general-purpose probabilistic software package for Bayesian estimation: the user codes their preferred model and prior distributions, and the software compiles a Hamiltonian Monte Carlo (HMC) procedure to simulate from the corresponding posterior distribution (Hoffman & Gelman, Reference Hoffman and Gelman2014; Neal, Reference Neal1996).

Figure C1 Scatterplots of the posterior means estimated using the Hamiltonian Monte Carlo approach, as implemented in the R package rstan, against the posterior means estimated using the Metropolis-within-Gibbs approach, as implemented in the R package bgms.
Our rstan implementation for simulating from the pseudoposterior distribution of the parameters of the two-group OMRF, which must also use the pseudolikelihood approximation, is new and can be used as an alternative to bgms for analyzing the two-group OMRF. A caveat of this software, stemming from a limitation of the HMC method implemented in Stan, is that it cannot handle discrete parameters, such as the indicator variables used in the Bayesian variable selection methods that we will use later to construct Bayes factor hypothesis tests about parameter differences between the two groups. However, the Savage–Dickey Bayes factor can be computed from the Stan output, and we will use it as an alternative to the inclusion Bayes factor, which we can compute from the output of a fully Bayesian specification, that is, one that also takes the structure of the network into account.
To compare estimates from both implementations, we use the empirical dataset reported by Adamkovic et al. (Reference Adamkovič, Fedáková, Kentoš, Bozogáňová, Havrillová, Baník, Dedová and Piterová2022), in which the authors assessed life satisfaction, posttraumatic growth, coping strategies, and resilience in 696 cancer survivors. We consider an analysis in which they contrasted the network structure of (ordinal) responses to five items of the Satisfaction with Life Scale and six items of the Brief Resilience Scale between men and women. Their data are openly available at an online repository at https://osf.io/gxrje. We reanalyzed their data using the two-sample OMRF with the HMC procedure as implemented in the R package rstan and the Metropolis-within-Gibbs (MwG) procedure as implemented in the R package bgms. The HMC procedure was run using the rstan defaults of four separate Markov chains, each running for
$2\text {,}000$
iterations with
$1\text {,}000$
burnin iterations, and the MwG procedure was run using a chain running for
$101\text {,}000$
iterations with
$1\text {,}000$
burnin iterations. Scatter plots of the posterior means estimated by the two methods are displayed in Figure C1, along with the correlation of their estimates, showing that the results are nearly identical for the two methods. Additional comparisons between the bgms and rstan results are shown in Appendix C.4.
C.2 Numerical Check II: Prior recovery
In a second numerical check, we verify that the Bayesian variable selection method works and is implemented correctly. In the absence of data, we know exactly what the target distribution of the MCMC procedure is, so a simple numerical check is to see if the procedures can recover the prior. To this end, we consider two checks. First, we check whether the MCMC procedure can recover the prior inclusion probabilities and the prior densities of the threshold differences in the two-group model. Second, we check whether the MCMC procedure can recover the prior inclusion probabilities and the prior densities of the pairwise differences in the two-group model. Since the procedure for estimating and selecting the pairwise interactions in the one-group model works in exactly the same way, we do not perform this additional check here.
For the numerical check of the threshold difference estimation and selection procedure, we consider a network with ten variables and three response categories (i.e., $m = 2$ threshold difference parameters per variable). This results in a total of twenty threshold difference parameters and ten difference indicators to be sampled. We ran the MCMC procedure for one million iterations. The estimated prior inclusion probabilities for the ten difference indicators and the cumulative distribution of the first threshold difference parameter are shown in Figure C2; the method recovered both well. There is a slight discrepancy between the tails of the Cauchy prior and the distribution recovered using a normal proposal. This appears to be unique to the tails of the Cauchy distribution, as additional checks with other types of distributions did not reveal any differences.
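The logic of this check can be illustrated with a self-contained sketch: a random-walk Metropolis sampler with normal proposals and no data should have the Cauchy(0, 1) prior as its stationary distribution, so its empirical cumulative distribution should match the prior's. This is a simplified stand-in for the actual selection procedure, not the bgms implementation.

```r
# Prior recovery for a single parameter: with no data, the target of
# the Metropolis sampler is the Cauchy(0, 1) prior itself.
set.seed(123)
n_iter <- 1e6
draws <- numeric(n_iter)
x <- 0
for (t in seq_len(n_iter)) {
  prop <- rnorm(1, mean = x, sd = 1)                       # normal proposal
  log_ratio <- dcauchy(prop, log = TRUE) - dcauchy(x, log = TRUE)
  if (log(runif(1)) < log_ratio) x <- prop                 # accept/reject
  draws[t] <- x
}

# Compare the recovered cumulative distribution with the Cauchy prior;
# discrepancies, if any, should be confined to the extreme tails.
plot(ecdf(draws), xlim = c(-20, 20))
curve(pcauchy(x), add = TRUE, col = "red", lty = 2)
```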

Figure C2 The estimated prior inclusion probability of the threshold differences based on the outlined procedure is shown in the left panel, and the estimated prior density for one of the threshold difference parameters is shown in the right panel.
For the numerical check of the pairwise difference estimation and selection procedure, we consider a network with ten variables. This results in a total of 45 pairwise difference parameters and 45 difference indicators to be sampled. We ran the MCMC procedure for one million iterations. The estimated prior inclusion probabilities for the 45 difference indicators and the cumulative distribution of the first pairwise difference parameter are shown in Figure C3; again, the method recovered both well. As before, there is a slight discrepancy between the tails of the Cauchy prior and the distribution recovered using a normal proposal.

Figure C3 The estimated prior inclusion probability of the pairwise differences based on the outlined procedure is shown in the left panel, and the estimated prior density for one of the pairwise difference parameters is shown in the right panel.
C.3 Numerical Check III: A good check on the difference Bayes factors
Our first numerical check verified that the MCMC procedure was implemented correctly by showing that the estimates from bgms were consistent with estimates from an rstan implementation, and our second check showed that the variable selection methodology was able to recover the prior distributions for the difference indicators and the difference parameters. Next, we want to verify that the MCMC procedure for Bayesian hypothesis testing is implemented correctly, that is, that the inclusion Bayes factor for testing for the presence of differences in MRF parameters between two independent groups is computed correctly. We cannot implement this check by comparing the bgms estimates with the rstan estimates, because rstan cannot model the inclusion indicators. We therefore take a different approach and perform a numerical check proposed by Sekulovski et al. (Reference Sekulovski, Keetelaar, Haslbeck and Marsman2024), based on two theorems about the properties of Bayes factors attributed to Alan Turing and Jack Good (e.g., Good, Reference Good1984).
The first theorem states that the expected Bayes factor in favor of a false hypothesis is equal to one, that is, $\mathbb{E}(\text{BF}_{01} \mid \mathcal{H}_1) = 1$, while the second theorem generalizes this result to higher-order moments. Sekulovski et al. (Reference Sekulovski, Keetelaar, Haslbeck and Marsman2024) have previously used this method to verify the computational accuracy of the inclusion Bayes factor for testing for the presence of edges in a single group, as proposed in Marsman et al. (Reference Marsman, van den Bergh and Haslbeck2025). We extend this approach to evaluate the computation of Bayes factors for testing differences in category thresholds and pairwise interaction parameters between two independent groups using the difference selection approach.
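The first theorem follows in one line from the definition of the Bayes factor as a ratio of marginal likelihoods. Writing $m_0(\mathbf{y})$ and $m_1(\mathbf{y})$ for the marginal likelihoods under $\mathcal{H}_0$ and $\mathcal{H}_1$ (notation introduced here only for this derivation),

$$\mathbb{E}\left(\text{BF}_{01} \mid \mathcal{H}_1\right) = \int \frac{m_0(\mathbf{y})}{m_1(\mathbf{y})}\, m_1(\mathbf{y})\, \text{d}\mathbf{y} = \int m_0(\mathbf{y})\, \text{d}\mathbf{y} = 1,$$

provided the two marginal likelihoods share a common support.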
Following the procedure outlined by Sekulovski et al. (Reference Sekulovski, Keetelaar, Haslbeck and Marsman2024), we simulate $N = 50{,}000$ data sets with $p = 5$ binary variables for two groups consisting of $n = 250$ cases each, both under $\mathcal{H}_1$ and $\mathcal{H}_0$. Under $\mathcal{H}_1$, we simulate the data after setting all overall category threshold parameters $\lambda_{ic}$ to $-0.5$, all interaction parameters $\phi_{ij}$ to $0.5$, and all threshold differences $\epsilon_{ic}$ and pairwise differences $\delta_{ij}$ to zero, but sampling one threshold difference $\epsilon_{11}$ and one pairwise difference $\delta_{45}$ from a Cauchy(0, 1) distribution. Under $\mathcal{H}_0$, we simulate the data by also setting $\epsilon_{11}$ and $\delta_{45}$ to zero. We run the MCMC procedure on each of the simulated data sets for $10{,}000$ iterations. Of the $50{,}000$ analyses, $43{,}632$ were successful, and for each of them we compute the inclusion Bayes factor for the threshold difference for variable 1 and for the pairwise difference for variables 4 and 5.
For data sets simulated under $\mathcal{H}_1$, we compute the cumulative first and second moments about the origin of the Bayes factors in favor of $\mathcal{H}_0$ (i.e., $\mathbb{E}(\text{BF}_{01} \mid \mathcal{H}_1)$ and $\mathbb{E}(\text{BF}_{01}^2 \mid \mathcal{H}_1)$) for both the threshold difference and the pairwise difference parameters. We also compute the first moment of the Bayes factor in favor of $\mathcal{H}_0$ using the data simulated under $\mathcal{H}_0$. The results are shown in Figure C4.
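The cumulative moments are simple running averages of the simulated Bayes factors. A minimal R sketch, assuming `bf01_h1` is the vector of inclusion Bayes factors in favor of $\mathcal{H}_0$ obtained from the data sets simulated under $\mathcal{H}_1$ (hypothetical name):

```r
# Running first and second moments about the origin of the Bayes
# factors; under the Turing-Good theorems the first moment should
# converge to one as the number of data sets grows.
n <- seq_along(bf01_h1)
first_moment  <- cumsum(bf01_h1)   / n
second_moment <- cumsum(bf01_h1^2) / n

# Trace of the running first moment against the number of data sets.
plot(n, first_moment, type = "l",
     xlab = "Number of simulated data sets",
     ylab = "Cumulative mean of BF01")
abline(h = 1, lty = 2)  # theoretical expectation under H1
```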

Figure C4 Monte Carlo estimates of the expected Bayes factor $\text{BF}_{01}$ as a function of the number of synthetic data sets generated under $\mathcal{H}_1$.
Note: Both the threshold and the pairwise difference Bayes factors tend to one. In addition, the first- and second-order moments are similar.
Figure C4 shows that both the Bayes factor for the category threshold difference and the Bayes factor for the pairwise difference converge to one. In addition, the mean Bayes factor in favor of the true hypothesis for the data simulated under $\mathcal{H}_0$ is approximately equal to the second moment of the Bayes factor in favor of the false hypothesis for the data simulated under $\mathcal{H}_1$. According to Sekulovski et al. (Reference Sekulovski, Keetelaar, Haslbeck and Marsman2024), this result provides strong evidence for the correct computation of the Bayes factors.
C.4 Numerical Check I, continued: Parameter estimates and convergence diagnosis
The first numerical check verified that the MCMC procedure was correctly implemented in bgms, since the posterior means obtained from bgms and rstan were indistinguishable. Here we examine the quality of the MCMC output of the two approaches.
We generated a data set using the same setup we used for Numerical Check III under $\mathcal{H}_1$. We ran the rstan software with its default settings: four independent chains of $2{,}000$ iterations each, discarding the first $1{,}000$ iterations as warm-up. Similarly, we ran four independent chains of $10{,}000$ iterations each for bgms, discarding the first $1{,}000$ iterations as warm-up. This is the default number of iterations for bgms, although, unlike rstan, the bgms software runs a single chain by default.
The posterior means and standard deviations for the estimates from the two procedures are listed in Table C1. Again, these are very similar between the two methods.
To assess the convergence of the Markov chains, we compute the Gelman-Rubin statistic ($\hat{R}$; Gelman & Rubin, Reference Gelman and Rubin1992), which compares the within-chain variance to the between-chain variance. $\hat{R}$ values close to one indicate convergence, but there is some variability in what is considered close to one, with cutoff values ranging from 1.1 and 1.05 to 1.01 (e.g., Gelman & Rubin, Reference Gelman and Rubin1992; Johnson et al., Reference Johnson, Ott and Dogucu2022; Vehtari et al., Reference Vehtari, Gelman, Simpson, Carpenter and Bürkner2021). For our example, the $\hat{R}$ statistic is at most 1.01 for all parameters except one, for which it equals 1.02. This indicates satisfactory convergence under the two liberal cutoffs, but not under the most stringent one, suggesting that a few more samples are needed.
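For a single parameter, the statistic can be computed directly from its definition. A minimal R sketch, assuming `draws` is an iterations-by-chains matrix of post-warm-up draws for one parameter (in practice, rstan reports an Rhat column in `summary(fit)`):

```r
# Gelman-Rubin statistic for one parameter, following the classic
# definition: compare a pooled variance estimate to the average
# within-chain variance.
rhat <- function(draws) {
  n <- nrow(draws)                      # iterations per chain
  chain_means <- colMeans(draws)
  B <- n * var(chain_means)             # between-chain variance
  W <- mean(apply(draws, 2, var))       # mean within-chain variance
  var_hat <- (n - 1) / n * W + B / n    # pooled variance estimate
  sqrt(var_hat / W)
}
```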
We also computed the effective sample size ($n_{eff}$), which estimates how many independent samples from the posterior would carry the same information as the dependent samples produced by the two methods. The methods show a striking difference in effective sample sizes, with the rstan samples consistently producing larger $n_{eff}$ values than the bgms samples, even though the former ran for far fewer iterations than the latter. The Hamiltonian Monte Carlo (HMC) procedure in rstan produces much less autocorrelation between successive states of the Markov chain than the adaptive Metropolis-Hastings within Gibbs procedure in bgms. Note that the $n_{eff}$ values for the Stan model exceed the total number of post-warm-up iterations for some of the parameters. For some estimation problems, HMC can produce anticorrelated draws, which can cause the effective sample size to exceed the number of iterations (Stan Development Team, 2024b). This comparison suggests that while the bgms Markov chain converges quickly, it requires many samples to efficiently explore the posterior distribution, e.g., to estimate the tails of the posterior distribution. For this reason, we usually suggest running the MCMC procedure for many more iterations (e.g., Huth et al., Reference Huth, Keetelaar, Sekulovski, van den Bergh and Marsman2024).
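For intuition, the effective sample size of a single chain can be approximated from its autocorrelation function. The sketch below truncates the autocorrelation sum at the first negative lag, a simple heuristic rather than the paired-sum estimator that Stan uses; it is the paired-sum variant that allows $n_{eff}$ to exceed the number of iterations for anticorrelated draws.

```r
# Heuristic effective sample size for a single chain x:
# n_eff = n / (1 + 2 * sum of positive-lag autocorrelations).
n_eff <- function(x) {
  n <- length(x)
  rho <- acf(x, lag.max = min(n - 1, 1000), plot = FALSE)$acf[-1]
  k <- which(rho < 0)[1]                # first negative autocorrelation
  if (!is.na(k)) rho <- rho[seq_len(k - 1)]
  n / (1 + 2 * sum(rho))
}
```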
Table C1 Posterior means and standard deviations, and convergence diagnostics, from bgms and rstan output

Note: Results are rounded to two decimal places.