1 Introduction
A wide variety of cognitive diagnosis models (CDMs) exist in the literature. Typically, these models were proposed for dichotomous attributes appropriate for determining, say, skill mastery or nonmastery. However, in many applications, classifying students into more than two categories is more instructionally relevant. Such classifications require polytomous attributes, where the attribute levels can be ordinal categories (e.g., no mastery, basic mastery, and advanced mastery). For example, the proportional reasoning (PR) assessment developed to measure the PR skills of middle school (equivalently, secondary) students (Tjoe & de la Torre, 2013, 2014) involves two three-level polytomous attributes, namely, (a) comparing and ordering of fractions, where level 0 represents nonmastery of the attribute, level 1 the ability to compare two fractions, and level 2 the ability to order three or more fractions, and (b) constructing ratios and proportions, where level 0 represents nonmastery, level 1 the ability to construct a single ratio, and level 2 the ability to construct a proportion, which is made up of two ratios. Attribute levels can also be nominal categories representing different content domains. For example, level 1 represents the prerequisite skills (e.g., add and subtract) in the attribute hierarchy, whereas level 2 represents the advanced skills (e.g., multiply and divide). In general, any M-level polytomous attribute can be equivalently represented by $M-1$ dichotomous attributes that follow a linear hierarchy (Leighton et al., 2004). In the example above, the polytomous attribute comparing/ordering fractions can be split into two dichotomous attributes, where the first deals with two fractions and the second with three or more fractions, and the mastery of the former is a prerequisite to the mastery of the latter. The prerequisite relationship constrains the number of possible mastery combinations to three, namely, 00, 10, and 11, which are equivalent to levels 0, 1, and 2 of the original polytomous attribute. Finally, it is important to underscore that this paper focuses on attributes with more than two categories (i.e., polytomous attributes), rather than responses with more than two categories (i.e., polytomous responses). This distinction is necessary because both CDMs for polytomous attributes and those for polytomous responses have been referred to as polytomous CDMs in the literature.
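The equivalence between an M-level attribute and $M-1$ hierarchically ordered dichotomous attributes can be sketched in a few lines of Python. This is only an illustration of the mapping described above; the function names `level_to_binary` and `binary_to_level` are hypothetical and do not come from the paper.

```python
def level_to_binary(level, n_levels):
    """Map a level in 0..n_levels-1 to n_levels-1 dichotomous attributes.

    Under a linear hierarchy, dichotomous attribute `step` is mastered
    exactly when the polytomous level exceeds `step`.
    """
    return tuple(1 if level > step else 0 for step in range(n_levels - 1))


def binary_to_level(pattern):
    """Recover the polytomous level from a hierarchy-consistent pattern."""
    # A linear hierarchy forbids mastering a later attribute without the earlier one.
    if any(earlier < later for earlier, later in zip(pattern, pattern[1:])):
        raise ValueError(f"pattern {pattern} violates the linear hierarchy")
    return sum(pattern)


# Three-level attribute: levels 0, 1, 2 map to patterns 00, 10, 11.
patterns = [level_to_binary(level, 3) for level in range(3)]
print(patterns)  # [(0, 0), (1, 0), (1, 1)]
```

The pattern (0, 1) is rejected by `binary_to_level`, which mirrors the point that the prerequisite relationship leaves only three admissible combinations.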
To accommodate polytomous attributes, several CDMs have been developed in the literature. They are summarized in Table 1 according to several key features, such as whether the model is general or constrained, its link function, and its core assumptions. For example, Templin (2004) extended the reparameterized unified model (RUM; Hartz, 2002) for polytomous attributes (RUM-PA) and proposed a constrained version (cRUM-PA), whereas Karelitz (2004) proposed the ordered category attribute coding (OCAC) framework in conjunction with the deterministic input, noisy “and” gate (DINA) model (Junker & Sijtsma, 2001) to define the mastery levels as multiple ordered categories. By defining accuracy with fast speed as the highest level of an attribute, accuracy with slow speed as the intermediate level, and nonmastery as the lowest level, Wang and Chen (2020) extended the DINA model into the response accuracy model (RAM) to measure students’ fluency in answering the test items. Recently, Yakar et al. (2021) developed a fully additive model for polytomous attributes (fA-M), which accounts for the effects of each attribute level. However, these models are not sufficiently general, mainly because each focuses on a specific and constrained CDM.
Table 1 Summary of existing cognitive diagnosis models for polytomous attributes

Note: RUM-PA: reparameterized unified model for polytomous attributes; cRUM-PA: constrained RUM-PA; OCAC-DINA: deterministic input, noisy “and” gate model with ordered category attribute coding; fA-M: fully additive model for polytomous attributes; PDCM: polytomous diagnostic classification model; cPDCM: constrained PDCM; GDM-PA: general diagnostic model for polytomous attributes; pG-DINA: polytomous generalized DINA model; SALM: specific attribute level mastery; RAM: response accuracy model.
The existing general CDMs for polytomous attributes include those proposed within the log-linear cognitive diagnosis model (LCDM; Henson et al., 2009), the general diagnostic model (GDM; von Davier, 2008), and the generalized deterministic input, noisy “and” gate (G-DINA) model (de la Torre, 2011).
The polytomous diagnostic classification model (PDCM) framework (Bao, 2019) extends the measurement and structural models of the LCDM to the polytomous attribute setting. Note that in the PDCM framework, only the attribute patterns ($\boldsymbol {\alpha }_l$) are polytomous, whereas the Q-matrix entries remain binary. In contrast, for the other CDMs in this paper, such as the OCAC framework, the GDM, and the proposed framework, both the attribute patterns and the Q-matrix entries are polytomous. The probabilities between different levels in the PDCM can be varied for greater flexibility or set equal for a smaller number of parameters. The PDCM uses the dummy coding approach, in which the M levels of an attribute are coded as $(M-1)$ dummy variables, and the combinations of the dummy variables (representing different knowledge states) are treated as the polytomous attribute levels. For example, the three levels of an attribute are coded with two dummy variables as (0,0), (1,0), and (1,1) to represent nonmastery, intermediate mastery, and mastery, respectively. This coding approach might be workable when the number of levels in attributes and the number of attributes in a test are moderate. However, the representation of the knowledge states becomes tedious and hard to interpret when the numbers of levels and attributes are large.
With a proper choice of the central component function, that is, the function $h(\cdot )$ that maps the attribute levels using the Q-matrix entries, the GDM can flexibly accommodate polytomous attributes. For example, a useful and reasonable choice of $h(\cdot )$ is $h=\min (q_k,\alpha _k)$. As a result, an attribute level that is higher than $h(\cdot )$ will not increase the probability of solving an item, whereas one that is lower than $h(\cdot )$ results in a lower success probability. In other words, while there is no distinction between groups who possess the required attribute level and those who have an even higher level, there is a distinction between groups whose attribute levels are lower than the required level. Nonetheless, the GDM for polytomous attributes (GDM-PA) has neither been examined in sufficient detail with simulation studies nor applied to real data.
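As a minimal illustration of this choice of $h(\cdot )$, the following Python sketch tabulates $\min (q_k,\alpha _k)$ for a three-level attribute. The function name `h` follows the text; the example values of `q_k` are arbitrary and chosen only for illustration.

```python
def h(q_k, alpha_k):
    """Central component function h = min(q_k, alpha_k)."""
    return min(q_k, alpha_k)


# Required level 1: levels 1 and 2 are indistinguishable, level 0 is lower.
values_q1 = [h(1, alpha_k) for alpha_k in range(3)]
print(values_q1)  # [0, 1, 1]

# Required level 2: every level below the requirement is distinguished.
values_q2 = [h(2, alpha_k) for alpha_k in range(3)]
print(values_q2)  # [0, 1, 2]
```

Levels above the required level are capped, whereas levels below it remain distinct, which is the asymmetry described in the paragraph above.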
With respect to the G-DINA model for polytomous attributes, namely, the pG-DINA model (Chen & de la Torre, 2013), the model relies on a core assumption, referred to as specific attribute level mastery (SALM), where each item is assumed to separate examinees into two reduced latent groups: those who are at or above a specific attribute level, and those who are below it. With the SALM assumption, some levels in the pG-DINA do not increase the success probability. Such a constraint may be too stringent because attribute vectors in the same reduced latent group are very likely to have varying levels with respect to the required attributes, and thus their probabilities of success may not be identical.
The first and primary aim of this paper is to propose a general CDM framework for polytomous attributes, the saturated polytomous cognitive diagnosis model (sp-CDM), which is analogous to the G-DINA model for dichotomous attributes. Specifically, the proposed model extends the pG-DINA model by relaxing the SALM assumption and allowing the different attribute levels to contribute differentially to the success probability. Second, this work aims to derive the special cases of the sp-CDM under different constraints and to show the mathematical relationships between the sp-CDM, its special cases, and the existing CDMs for polytomous attributes. Third, this work aims to address the estimation of the sp-CDM and to examine parameter recovery using the proposed estimation algorithms, as well as the consequences of fitting constrained and unnecessarily complex models across a range of conditions. Finally, the study aims to demonstrate the application of the sp-CDM with real data from the PR assessment.
This paper contributes to the literature by developing a unified framework for polytomous attributes. The proposed model has three unique features: (1) compared to the existing CDMs for polytomous attributes, where some attribute levels share identical success probabilities, the sp-CDM allows different attribute levels to make unique contributions to the success probability; (2) the sp-CDM is formulated with alternative link functions, making it more general; and (3) owing to the different model formulations, the existing models can be mathematically shown to be special cases of the various forms of the sp-CDM with appropriate constraints. Although the structure of this work is similar to that of the G-DINA (de la Torre, 2011) and pG-DINA (Chen & de la Torre, 2013) models, the formulations, the estimation, and the implications of the three models are substantially different.
2 The generalized cognitive diagnosis model framework for polytomous attributes
The generalized CDM framework for polytomous attributes can be expressed as three saturated models under different link functions. Let J be the number of items, K the number of attributes, and $M_k$ the number of levels of attribute k. For notational convenience, but without loss of generality, it can be assumed that $M_k=M$, indicating the number of levels is identical for all attributes. Thus, there will be a total of $\prod _{k=1}^{K}M_k=M^K$ attribute patterns or latent classes. Let $K_j^{*}=\sum _{k=1}^{K}I(q_{jk}> 0)$ be the number of required attributes for item j, $j=1,\ldots ,J$. Again, for notational convenience, let the first $K_j^*$ attributes be the required attributes for item j. We use $\textbf {q}_j=(q_1,\ldots ,q_{K_j^*},\textbf {0})_{1 \times K}$ to denote the required levels in the $K_j^*$ attributes to answer item j correctly, and $\boldsymbol \alpha ^{*}_{l} = (\alpha _{1},\ldots ,\alpha _{k},\ldots ,\alpha _{K_j^*})_{1 \times K_j^*}$ the lth reduced attribute pattern or latent group. As can be seen from above, the entries in both $\textbf {q}$ and $\boldsymbol \alpha ^{*}$ can have more than two categories.
To illustrate, consider $K=3$, $M=3$, and $\textbf {q}=(1,2,0)$, which indicates that the first level of $\alpha _1$ and the second level of $\alpha _2$ are required for the item; hence, $K_j^*=2$. For this example, there are $M^K=27$ latent classes, which will be partitioned into $M^{K_j^*}=9$ latent groups. Specifically, for this item, the latent classes $\boldsymbol {\alpha }_{l}$ and $\boldsymbol {\alpha }_{l'}$ are classified in the same latent group when $\alpha _{l1}=\alpha _{l'1}$ and $\alpha _{l2}=\alpha _{l'2}$. For example, the latent classes 000, 001, and 002 all belong to the latent group 00.
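The partition in this example can be reproduced with a short Python sketch. The variable names are illustrative only; attributes are indexed from 0 in the code, so the first attribute in the text corresponds to index 0.

```python
from collections import defaultdict
from itertools import product

K, M = 3, 3          # number of attributes and levels per attribute
q = (1, 2, 0)        # q-vector of the example item

# Indices of the required attributes (those with q_k > 0).
required = [k for k, level in enumerate(q) if level > 0]
K_star = len(required)

# All M**K latent classes, each a tuple of attribute levels.
latent_classes = list(product(range(M), repeat=K))

# Latent classes that agree on every required attribute share a latent group.
latent_groups = defaultdict(list)
for alpha in latent_classes:
    reduced = tuple(alpha[k] for k in required)
    latent_groups[reduced].append(alpha)

print(len(latent_classes))    # 27
print(len(latent_groups))     # 9
print(latent_groups[(0, 0)])  # [(0, 0, 0), (0, 0, 1), (0, 0, 2)]
```

Each of the nine latent groups collects three latent classes, one for each level of the attribute that the item does not require.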
The item response function (IRF) of the proposed model using the identity link function is given by
 $$ \begin{align} P_j(\boldsymbol\alpha_{l}^*)&=\delta_{j0}+\sum_{k=1}^{K_j^*} \sum_{m=1}^{M-1} \delta_{jkm} I \big[\alpha_{lk}=m\big] + \sum_{k'>k}^{K_j^*} \sum_{k=1}^{{K_j^*}-1} \sum_{m=1}^{M-1} \sum_{m'=1}^{M-1} \delta_{jkm k'm'} I \big[\alpha_{lk}=m\big] I \big[\alpha_{lk'}=m'\big] \nonumber\\ &\quad+\dotsb+\sum_{m_1=1}^{M-1} \sum_{m_2=1}^{M-1} \dotso \sum_{m_{K_j^*}=1}^{M-1} \delta_{j1{m_1}2{m_2} \dotso {K_j^*} m_{K_j^*}} \prod_{k=1}^{K_j^*} I \big[\alpha_{lk}=m_k\big], \end{align} $$
where $\delta _{j0}$ is the intercept, $\delta _{jkm}$ is the main effect of the mth level of attribute k, $\delta _{jkmk'm'}$ is the two-way interaction effect of the mth level of attribute k and the $m'$th level of attribute $k'$, and $\delta _{j1{m_1}2{m_2} \dots {K_j^*}m_{K_j^*}}$ is the $K_j^*$-way interaction effect of the $m_1$th level of $\alpha _1$, the $m_2$th level of $\alpha _2$, up to the $m_{K_j^*}$th level of $\alpha _{K_j^*}$. Note that the subscript m of $\delta _{jkm}$ indicates that each attribute level in $\alpha _k$ contributes differentially to the success probability; that is, the steps between adjacent levels vary (e.g., the step between “no mastery” and “basic mastery” is different from that between “basic mastery” and “advanced mastery”). To reduce the number of parameters and, hence, model complexity, it can be assumed that the steps between levels within $\alpha _k$ are identical, which reduces $\delta _{jkm}$ to $\delta _{jk}$.
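To make the structure of Equation (1) concrete, the following Python sketch evaluates the identity-link IRF for an item with $K_j^*=2$ and $M=3$. It is a minimal sketch under stated assumptions: the function name `sp_cdm_identity`, the dictionary layout of the effects, and all numerical parameter values are hypothetical and chosen only so that every probability stays in $[0,1]$.

```python
from itertools import product


def sp_cdm_identity(alpha_star, delta0, effects):
    """Identity-link success probability for a reduced attribute pattern.

    `effects` maps a tuple of (attribute index, level) pairs to its delta.
    A term contributes only when every indicator I[alpha_k = m] equals 1.
    """
    p = delta0
    for term, delta in effects.items():
        if all(alpha_star[k] == m for k, m in term):
            p += delta
    return p


M, K_star = 3, 2
delta0 = 0.10  # intercept (hypothetical value)
effects = {
    # main effects delta_{jkm}; attributes are indexed from 0 in the code
    ((0, 1),): 0.20, ((0, 2),): 0.30,
    ((1, 1),): 0.10, ((1, 2),): 0.25,
    # two-way interactions delta_{jkmk'm'}
    ((0, 1), (1, 1)): 0.05, ((0, 1), (1, 2)): 0.10,
    ((0, 2), (1, 1)): 0.05, ((0, 2), (1, 2)): 0.20,
}

n_parameters = 1 + len(effects)
probabilities = {
    alpha: sp_cdm_identity(alpha, delta0, effects)
    for alpha in product(range(M), repeat=K_star)
}
for alpha, p in probabilities.items():
    print(alpha, round(p, 2))
```

With these illustrative values, each of the nine latent groups receives its own success probability, and the parameter count (one intercept, four main effects, and four interactions) matches the number of latent groups.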
In addition to the identity link function, the sp-CDM can also be formulated with the logit and log links. Despite the similar forms, the models using different link functions are essentially different in terms of the values and interpretations of the parameters. For this reason, different notations are used for parameters under formulations with logit and log link functions.
For the logit link,
 $$ \begin{align} \textrm{logit}[P_j(\boldsymbol\alpha_{l}^*)]&=\lambda_{j0}+\sum_{k=1}^{K_j^*} \sum_{m=1}^{M-1} \lambda_{jkm} I \big[\alpha_{lk}=m\big] + \sum_{k'>k}^{K_j^*} \sum_{k=1}^{{K_j^*}-1} \sum_{m=1}^{M-1} \sum_{m'=1}^{M-1} \lambda_{jkmk'm'} I \big[\alpha_{lk}=m\big] I \big[\alpha_{lk'}=m'\big] \nonumber \\ &\quad+\dotsb+\sum_{m_1=1}^{M-1} \sum_{m_2=1}^{M-1} \dotso \sum_{m_{K_j^*}=1}^{M-1} \lambda_{j1{m_1}2{m_2} \dotso {K_j^*} m_{K_j^*}} \prod_{k=1}^{K_j^*} I \big[\alpha_{lk}=m_k\big]. \end{align} $$
For the log link,
 $$ \begin{align} \textrm{log}[P_j(\boldsymbol\alpha_{l}^*)]&=\nu_{j0}+\sum_{k=1}^{K_j^*} \sum_{m=1}^{M-1} \nu_{jkm} I \big[\alpha_{lk}=m\big] + \sum_{k'>k}^{K_j^*} \sum_{k=1}^{{K_j^*}-1} \sum_{m=1}^{M-1} \sum_{m'=1}^{M-1} \nu_{jkmk'm'} I \big[\alpha_{lk}=m\big] I \big[\alpha_{lk'}=m' \big] \nonumber \\ &\quad +\dotsb+\sum_{m_1=1}^{M-1} \sum_{m_2=1}^{M-1} \dotso \sum_{m_{K_j^*}=1}^{M-1} \nu_{j1{m_1}2{m_2} \dotso {K_j^*} m_{K_j^*}} \prod_{k=1}^{K_j^*} I \big[\alpha_{lk}=m_k\big]. \end{align} $$
Equations (1), (2), and (3) are referred to as the saturated polytomous cognitive diagnosis model (sp-CDM) under the identity, logit, and log link functions, respectively. The number of parameters for item j for the three models is equal to the number of latent groups (i.e., $M^{K_j^*}$). Thus, the models offer much greater generality compared to the existing CDMs for polytomous attributes. Although flexible, the large number of parameters in these models can make their estimation challenging. Therefore, simpler and more interpretable models with fewer parameters are sometimes warranted. Note that the number of parameters for the saturated models does not take into account the required attribute levels; it is computed as the product of the numbers of levels of the required attributes. Thus, in the example above, in addition to $(1,2,0)$, the q-vectors $(1,1,0)$, $(2,1,0)$, and $(2,2,0)$ will result in the same saturated models.
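The two points of this paragraph, the role of the link function and the parameter count, can be sketched as follows. The helper names `inverse_link` and `n_saturated_parameters` are hypothetical; the sketch only assumes that each link maps the linear predictor on the right-hand side of Equations (1) to (3) back to a probability.

```python
import math


def inverse_link(eta, link):
    """Map a linear predictor to a success probability under a given link."""
    if link == "identity":
        return eta
    if link == "logit":
        return 1.0 / (1.0 + math.exp(-eta))
    if link == "log":
        return math.exp(eta)
    raise ValueError(f"unknown link: {link}")


def n_saturated_parameters(q, M):
    """Number of item parameters of the saturated model: M ** K_j^*."""
    k_star = sum(1 for level in q if level > 0)
    return M ** k_star


# The count depends on which attributes are required, not on the required levels.
counts = [n_saturated_parameters(q, 3)
          for q in [(1, 2, 0), (1, 1, 0), (2, 1, 0), (2, 2, 0)]]
print(counts)  # [9, 9, 9, 9]
```

Because the same linear predictor yields different probabilities under different links, parameter values estimated under one formulation are not directly comparable with those from another.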
3 Special cases
This section introduces several simplified CDMs for polytomous attributes under different assumptions, namely, the conjunctive, disjunctive, and additive assumptions, and shows how they can be derived from the sp-CDM by imposing appropriate constraints.
The conjunctive version of the sp-CDM
In the conjunctive version of the sp-CDM (conj-sp-CDM), it is assumed that examinees who possess levels equal to or higher than the required levels in all the required attributes are expected to answer the item correctly. Conversely, persons who lack at least one of the required attributes, or possess levels lower than those required in at least one of the required attributes, are expected to answer the item incorrectly. Hence, the IRF of the conj-sp-CDM can be expressed as
 $$ \begin{align} P_j(\boldsymbol\alpha_{l}^*)=\left\{\begin{matrix} g_{j} & \textrm{if} \;\; I[\{\boldsymbol\alpha_{l}^{*}\geq \textbf{q}_j\}] \prec \textbf{1}_{K_j^{*}}, \\ 1-s_{j} & \textrm{otherwise}, \end{matrix}\right. \end{align} $$
where the symbol $\{\}$ inside $ I[\cdot ]$ indicates that the operation is carried out attribute by attribute, with $ I[\cdot ]=1$ if $\boldsymbol \alpha _{l}^{*}\geq \textbf {q}_j$ and 0 otherwise. $\textbf {1}_{K_j^{*}}$ is a vector of ones of length ${K_j^{*}}$. The $\prec $ symbol indicates that, with respect to the ${K_j^{*}}$ required attributes, which is a partially ordered set of K attributes, at least one of the elements in the result of $I[\cdot ]$ is less than 1. As shown in Equation (4), the conj-sp-CDM has two parameters for item j.
On the surface, the formulation of the conj-sp-CDM is similar to that of the DINA model. However, the parameters in Equation (4) require more complicated interpretations. For example, $g_{j}$ in the equation is the probability of correctly answering item j for individuals who lack at least one of the prescribed attributes, or who possess the required attributes, but with levels lower than the required levels in the prescribed attributes; $1-s_{j}$ represents the probability that individuals whose attribute levels are all at least equal to the required levels answer the item correctly. Thus, in the conj-sp-CDM, the $M^{K_j^*}$ attribute vectors are classified into two latent groups: attribute vectors that jointly satisfy the required levels prescribed for the item are classified in one group and the rest of the attribute vectors in the other group.
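The two-group structure of Equation (4) can be sketched in Python as follows. The function name `conj_sp_cdm` and the values of `g` and `s` are hypothetical, and the reduced q-vector (1, 2) is borrowed from the earlier example.

```python
from itertools import product


def conj_sp_cdm(alpha_star, q_star, g, s):
    """Conjunctive IRF: 1 - s if every required level is met, otherwise g."""
    meets_all = all(a >= q for a, q in zip(alpha_star, q_star))
    return 1.0 - s if meets_all else g


M = 3
q_star = (1, 2)   # required levels of the two required attributes
g, s = 0.2, 0.1   # hypothetical item parameters

# Reduced patterns that satisfy every required level.
satisfying = [alpha for alpha in product(range(M), repeat=len(q_star))
              if all(a >= q for a, q in zip(alpha, q_star))]
print(satisfying)  # [(1, 2), (2, 2)]

for alpha in product(range(M), repeat=len(q_star)):
    print(alpha, conj_sp_cdm(alpha, q_star, g, s))
```

Only two of the nine reduced patterns satisfy both requirements here, which agrees with the count $\prod _{k=1}^{K_j^*}(M-q_k)=(3-1)(3-2)=2$.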
The conj-sp-CDM model can be derived from the identity sp-CDM (i.e., Equation (1)) by imposing the following three constraints:
(1) All the $(M-1)K_j^*$ main effects are equal to zero, as in, $\delta _{jkm}=0$ for $k=1,\ldots ,K_j^*$ and $m=1,\ldots ,M-1$;
(2) All the $\prod _{k=1}^{K_j^*}(M-q_k)$ interaction terms involving attribute levels at least equal to the required attribute levels are identical, as in, $\delta _{j1m_1^{(+)}2m_2^{(+)} \dots K_{j}^{*}m_{K_j^*}^{(+)} }=\delta _{j1q_{1}2q_{2} \dots K_j^*q_{K_j^*} }$, where $m_k^{(+)}= q_k,\ldots ,M-1$; and
(3) The remaining $M^{K_j^*}-(M-1)K_j^*-\prod _{k=1}^{K_j^*}(M-q_k)-1$ interaction effects involving at least one attribute level below the required level are all equal to zero.
With the conjunctive assumptions imposed on the identity sp-CDM, the IRF of the conj-sp-CDM can also be expressed as
 $$ \begin{align} P_j(\boldsymbol\alpha_{l}^*)=\delta_{j0}+ \delta_{j1{q_1}2{q_2} \dotso {K_j^*} q_{K_j^*}} \prod_{k=1}^{K_j^*}I\big[\alpha_{lk} \geq q_{jk}\big]. \end{align} $$
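As a small worked check, offered as a sketch rather than a result stated in the source, matching Equation (5) against Equation (4) group by group suggests how the two parameterizations line up. When at least one required level is not met, the product of indicators is 0 and only the intercept remains; when all required levels are met, the product is 1:

$$ \begin{aligned} g_j &= \delta_{j0}, \\ 1-s_j &= \delta_{j0}+\delta_{j1{q_1}2{q_2} \dotso {K_j^*} q_{K_j^*}}. \end{aligned} $$

Under this reading, the single retained interaction term equals $1-s_j-g_j$, the gain in success probability from meeting every required level.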
The disjunctive version of the sp-CDM
In the disjunctive version of the sp-CDM (disj-sp-CDM), the IRF is given by
 $$ \begin{align} P_j(\boldsymbol\alpha_{l}^*)=\left\{\begin{matrix} 1-s^{\prime}_{j} & \textrm{if} \;\; I[\{\boldsymbol\alpha^{*}_{l} \ \geq \textbf{q}_j \}] \succ \textbf{0}_{K_j^{*}}, \\ g^{\prime}_{j} & \textrm{otherwise}, \end{matrix}\right. \end{align} $$
where $1-s^{\prime }_{j}$ is the probability of not slipping for persons who possess levels that are equal to or higher than the required levels in at least one required attribute, and $g^{\prime }_{j}$ is the success probability for persons who possess none of the required attributes, or who possess at least one of the required attributes, but all at levels lower than the required levels. As such, the disj-sp-CDM has two parameters for each item.
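A matching Python sketch of Equation (6) follows; as before, the function name `disj_sp_cdm` and the parameter values are hypothetical, and the reduced q-vector (1, 2) is reused from the earlier example.

```python
from itertools import product


def disj_sp_cdm(alpha_star, q_star, g_prime, s_prime):
    """Disjunctive IRF: 1 - s' if any required level is met, otherwise g'."""
    meets_any = any(a >= q for a, q in zip(alpha_star, q_star))
    return 1.0 - s_prime if meets_any else g_prime


M = 3
q_star = (1, 2)               # required levels of the two required attributes
g_prime, s_prime = 0.2, 0.1   # hypothetical item parameters

# Reduced patterns that fall below the required level on every attribute.
below_all = [alpha for alpha in product(range(M), repeat=len(q_star))
             if not any(a >= q for a, q in zip(alpha, q_star))]
print(below_all)  # [(0, 0), (0, 1)]

for alpha in product(range(M), repeat=len(q_star)):
    print(alpha, disj_sp_cdm(alpha, q_star, g_prime, s_prime))
```

In contrast to the conjunctive case, seven of the nine reduced patterns receive the higher success probability, because meeting the required level on a single attribute is sufficient.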
The disj-sp-CDM can be derived from the identity sp-CDM with the following constraints:
(1) The main effects $\delta _{jkm_k^{(-)}}=0$ and $\delta _{jkm_k^{(+)}} = \delta _{jkq_k}$, where $m_k^{(-)}=1,\ldots ,q_{k}-1$ represents levels of attribute k that are lower than the required level $q_k$, and $m_k^{(+)}$ is as defined above.
(3) The remaining interaction effects (i.e., those involving at least one $m_k^{(-)}$) are equal to zero.
With these assumptions, the identity sp-CDM can be reduced to be the disj-sp-CDM as
 $$ \begin{align} P_j(\boldsymbol\alpha_{l}^*)&=\delta_{j0}+ \sum_{k=1}^{K_j^*}\delta_{jkq_k} I\big[\alpha_{lk} \geq q_k\big]+ \sum_{k'>k}^{K_j^*}\sum_{k=1}^{K_j^*}\delta_{jkq_{k}k'q_{k'}}I\big[\alpha_{lk} \geq q_{k}\big] I\big[\alpha_{lk'} \geq q_{k'}\big] + \nonumber\\ &\quad \dots +\delta_{j1q_{1}2q_{2}\dots K_j^*q_{K_j^*} } I\big[\alpha_{l1} \geq q_{1}\big] I\big[\alpha_{l2} \geq q_{2}\big] \dots I\big[\alpha_{lK_j^* } \geq q_{K_j^*}\big]. \end{align} $$
The fully additive model for polytomous attributes
The fully additive model for polytomous attributes (fA-M; Yakar et al., 2021) assumes that mastering level m of required attribute k increases the success probability on item j by $\delta _{jkm}$, and its contribution is independent of the contributions of the levels of the other attributes. By retaining only the main effects in Equation (1), the fA-M can be obtained, which has the following IRF:
 $$ \begin{align} P_j(\boldsymbol\alpha_{l}^*)=\delta_{j0}+\sum_{k=1}^{K_j^*} \sum_{m=1}^{M-1} \delta_{jkm} I \big[\alpha_{lk} =m\big]. \end{align} $$
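As a rough illustration of the fA-M IRF, the following Python sketch adds to the intercept the main effect of the level attained on each required attribute. The function name `fam_success_prob`, the nested-list layout of `delta`, and all numeric values are hypothetical and are not taken from the source.

```python
def fam_success_prob(delta0, delta, alpha):
    """Success probability under the fA-M for one item (identity link).

    delta0 : intercept (delta_j0)
    delta  : delta[k][m - 1] is the main effect of level m (m = 1..M-1)
             of the k-th required attribute
    alpha  : alpha[k] is the person's level (0..M-1) on the k-th required attribute
    """
    p = delta0
    for k, level in enumerate(alpha):
        if level >= 1:  # level 0 adds nothing beyond the intercept
            p += delta[k][level - 1]
    return p


# Hypothetical item requiring two three-level attributes (M = 3).
delta0 = 0.10
delta = [[0.15, 0.35], [0.20, 0.40]]
print(fam_success_prob(delta0, delta, [0, 0]))  # intercept only
print(fam_success_prob(delta0, delta, [2, 1]))  # 0.10 + 0.35 + 0.20
```

Because only main effects are present, the contribution of one attribute's level does not change with the levels of the other attributes.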
The fA-M has $K^*_j(M-1)+1$ parameters for item j. As with the saturated models, the number of parameters of the fA-M does not depend on the required levels, only on the maximum levels of the required attributes. Using the required level $q_k$ as a cutoff value, two simplified and interpretable models can be obtained from the fA-M. Specifically, in the first simplified fA-M, mastering an attribute level that is higher than $q_k$ contributes to a higher probability of answering an item correctly, whereas all levels that are lower than $q_k$ have equal success probability. Hence, $q_k$ serves as a minimum bar, and the model is denoted as min-fA-M. The IRF for the min-fA-M can be expressed as
 $$ \begin{align} P_j(\boldsymbol\alpha_{l}^*)=\delta_{j0}+\sum_{k=1}^{K_j^*} (\delta_{jk1_k} I \big[ \alpha_{lk} < q_{jk} \big] + \delta_{jkm} I \big[ \alpha_{lk} \geq q_{jk} \big] ), \end{align} $$
where $\delta _{jk1_k}$ represents the main effect of $m=1$ in attribute k. The number of parameters for item j in the min-fA-M reduces to $\sum _k^{K_j^*}(M-q_k+1) + 1$.
In contrast to the min-fA-M, the second simplified fA-M assumes that $q_k$ is a maximum requirement, and the model is denoted as max-fA-M. The model is similar to the GDM in that, in both models, the success probabilities for attribute levels that are higher than $q_k$ are equal to each other and to the probability of level $q_k$. However, the success probabilities of levels that are lower than $q_k$ differ: the lower the level, the lower the success probability. The IRF for the max-fA-M can be expressed as
 $$ \begin{align} P_j(\boldsymbol\alpha_{l}^*)=\delta_{j0}+\sum_{k=1}^{K_j^*} (\delta_{jkm} I \big[ \alpha_{lk} < q_k \big] + \delta_{jkq_k} I \big[ \alpha_{lk} \geq q_k \big] ), \end{align} $$
where $\delta _{jkq_k}$ is the main effect of level $m=q_k$ in attribute k. The max-fA-M has $\sum _k^{K_j^*}q_k+1$ parameters for item j.
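The three parameter counts stated for the fA-M, min-fA-M, and max-fA-M can be compared with a short Python sketch. The function names and the example item (two required attributes, $M=4$, required levels 2 and 3) are hypothetical; the formulas are transcribed as given in the text.

```python
def n_params_fam(n_required, M):
    """fA-M: K_j^* (M - 1) + 1."""
    return n_required * (M - 1) + 1


def n_params_min_fam(q, M):
    """min-fA-M: sum over required attributes of (M - q_k + 1), plus 1."""
    return sum(M - qk + 1 for qk in q) + 1


def n_params_max_fam(q):
    """max-fA-M: sum over required attributes of q_k, plus 1."""
    return sum(q) + 1


# Hypothetical item: two required attributes, M = 4, required levels 2 and 3.
q = [2, 3]
M = 4
print(n_params_fam(len(q), M))  # 2 * 3 + 1 = 7
print(n_params_min_fam(q, M))   # (4 - 2 + 1) + (4 - 3 + 1) + 1 = 6
print(n_params_max_fam(q))      # 2 + 3 + 1 = 6
```

Unlike the fA-M count, the two simplified counts change with the required levels $q_k$.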
4 The connections between the sp-CDM and the existing models
This section shows, both mathematically and graphically, the connections between the sp-CDM and the existing models. Specifically, the existing CDMs for polytomous attributes can be formulated as special cases of the sp-CDM using different functions. A diagram (i.e., Figure 1) and a table (i.e., Table 2) help illustrate the connections between the sp-CDM and the existing models and those among the existing models.

Figure 1 The generalized cognitive diagnosis model framework for polytomous attributes. Note: sp-CDM: saturated polytomous cognitive diagnosis model; fA-M: fully additive model for polytomous attributes; pG-DINA: generalized deterministic input, noisy “and” gate model for polytomous attributes with the specific attribute level mastery (SALM) assumption; PDCM: saturated polytomous-attribute diagnostic classification model; RUM-PA: reparameterized unified model for polytomous attributes; min-fA-M: fA-M using $q_k$ as a minimum requirement; max-fA-M: fA-M using $q_k$ as a maximum requirement; cPDCM: constrained PDCM; GDM-PA: general diagnostic model for polytomous attributes; cRUM-PA: constrained RUM-PA; pA-CDM: additive model for polytomous attributes; pDINA: deterministic input, noisy “and” gate model for polytomous attributes; pDINO: deterministic input, noisy “or” gate model for polytomous attributes; conj-sp-CDM: conjunctive version of sp-CDM; disj-sp-CDM: disjunctive version of sp-CDM; OCAC: ordered category attribute coding framework. The colors orange, blue, and green can be interpreted as the number of steps (i.e., 1, 2, and 3 steps) for the reduced models to be derived from the saturated model of a particular link function. For example, the pA-CDM can be derived from the identity sp-CDM through pG-DINA (two steps) or through fA-M then either min-fA-M or max-fA-M (three steps). The dashed lines indicate that the reduced models can also be shown to be special cases of sp-CDM with logit or log link functions.
Table 2 The relationships between sp-CDMs and the existing CDMs for polytomous attributes

Note: sp-CDM: saturated polytomous cognitive diagnosis models; pG-DINA: generalized deterministic input, noisy “and” gate model for polytomous attributes with the specific attribute level mastery (SALM) assumption; OCAC: ordered category attribute coding framework; fA-M: fully additive model for polytomous attributes; PDCM: saturated polytomous-attribute diagnostic classification models; cPDCM: constrained PDCM; GDM-PA: general diagnostic model for polytomous attributes; RUM-PA: reparameterized unified model for polytomous attributes; cRUM-PA: constrained RUM-PA.
4.1 The identity model: pG-DINA and fA-M
The pG-DINA model can be obtained from the sp-CDM by replacing $I \big [\alpha _{l\cdot \cdot }=m\big ]$ with $I\big [\alpha _{l\cdot } \geq q_{jk}\big ]$. With respect to the fA-M, as shown in Equation (8), it is the identity sp-CDM that retains the main effects only.
4.2 The logit model: PDCM and GDM-PA
In the PDCM framework (Bao, 2019), the logit of the probability of answering item j correctly is given by
 $$ \begin{align} \textrm{log} \frac{P(X_j=1 | \tilde {\boldsymbol\alpha} ) }{P(X_j=0 | \tilde {\boldsymbol\alpha} )} =\lambda_{j0}+ \sum_{k=1}^{K} \sum_{m=1}^{M-1} \lambda_{jk}^m \tilde{\alpha}_{k}^m q_{jk} + \sum_{k=1}^{K-1} \sum_{m=1}^{M-1} \sum_{k'=k+1}^{K} \sum_{m'=1}^{M-1} \lambda_{jkk'}^{mm'} \tilde {\alpha}_{k}^m \tilde{\alpha}_{k'}^{m'} q_{jk} q_{jk'} +\dotsb, \end{align} $$
where $\tilde {\alpha }_{k}^m $ is the dummy variable for level m of attribute k, $q_{jk}$ is equal to 1 if the kth attribute is measured by item j, $\lambda _{jk}^m$ is the main effect of level m for attribute k, and $\lambda _{jkk'}^{mm'}$ is the two-way interaction effect for level m of attribute k and level $m'$ of attribute $k'$. As mentioned earlier, the $q_{jk}$ in the PDCM are still binary. Hence, the PDCM is a special case of the logit sp-CDM.
In the GDM (von Davier, 2008), the IRF can be expressed as
 $$ \begin{align} \textrm{logit} \left[P(X_j=1|\beta,q_{k},\gamma,\alpha_k) \right]=\beta+\gamma^{T}h(q_{k},\alpha_k). \end{align} $$
As mentioned earlier, for polytomous attributes with $q_{k} \in \{0,1,2,\ldots ,m\}$ and $\alpha _{k} \in \{0,1,2,\ldots ,m\}$, a useful and reasonable choice of $h(\cdot )$ in the GDM is defined as
 $$ \begin{align} h(q_{k},\alpha_k)=\left\{\begin{matrix} q_{k} & \textrm{if} \ \alpha_k> q_{k} \\ \alpha_k & \textrm{if} \ \alpha_k \leq q_{k} \end{matrix}\right.. \end{align} $$
To this end, Equation (12) is equivalent to
 $$ \begin{align} \textrm{logit} \left[P(X_j=1|\beta,q_{k},\gamma,\alpha_k) \right]=\beta+ \sum_{k=1}^{K_j^*} \sum_{m=1}^{M} \gamma_{jk}\cdot h(q_{k},\alpha_k),  \end{align} $$
where $\gamma _{jk}$ is the increase in the logit of the probability of success for every level of $\alpha _k$ mastered up to the required level (i.e., $q_k$).
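To make the role of $h(\cdot )$ concrete, the sketch below implements Equation (13) and plugs it into the logit form of Equation (12), using one slope per required attribute. This is only an illustrative reading: the function names, the single sum over attributes, and the parameter values are assumptions introduced here.

```python
import math


def h(q_k, alpha_k):
    """Equation (13): the attained level counts only up to the required level."""
    return q_k if alpha_k > q_k else alpha_k


def gdm_success_prob(beta, gamma, q, alpha):
    """Inverse logit of beta + sum_k gamma_k * h(q_k, alpha_k)."""
    eta = beta + sum(g * h(qk, ak) for g, qk, ak in zip(gamma, q, alpha))
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical item requiring level 1 of attribute 1 and level 2 of attribute 2.
beta, gamma, q = -2.0, [1.0, 0.8], [1, 2]
p_at_requirement = gdm_success_prob(beta, gamma, q, [1, 2])
p_above_requirement = gdm_success_prob(beta, gamma, q, [2, 2])
p_below_requirement = gdm_success_prob(beta, gamma, q, [1, 1])
print(p_at_requirement == p_above_requirement)  # levels above q_k add nothing
print(p_below_requirement < p_at_requirement)   # levels below q_k lower the probability
```

Since `h` equals the smaller of the attained and required levels, exceeding the requirement leaves the success probability unchanged.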
4.3 The log model: RUM-PA
In the polytomous attribute RUM (Templin, 2004) with $q_{k} \in \{0,1\}$ and $\alpha _{k} \in \{0,1,2,\ldots ,m\}$, the success probability is given by
 $$ \begin{align} P(X_j=1|q_{k},\alpha_k, \pi^*, r_{k}^*)=\pi^* \prod_{k=1}^{K}(r_{k}^*)^{f_{k}(q_{k},\alpha_{k})}, \end{align} $$
where
 $$ \begin{align} f(q_{k},\alpha_k)=\left\{\begin{matrix} 1 & \textrm{if} \: q_{k} =1, \alpha_k =0 \\ 0 & \textrm{if} \: q_{k} =1 \leq \alpha_k =m \end{matrix}\right.. \end{align} $$
As in the PDCM, the Q-matrices in the RUM-PA are assumed to be dichotomous, whereas the person attributes are assumed to be polytomous. Equation (15) can be written with respect to the different attribute levels m as
 $$ \begin{align} P(X_j=1|q_{k},\alpha_k, \pi^*, r_{k}^*)=\pi^* \prod_{k=1}^{K} \prod_{m=1}^{M} (r_{k}^*)^ {f_{km}(q_{km},\alpha_{km})}. \end{align} $$
Equation (17) can be rewritten using $\alpha _k^*$ with $q_k=1$ as
 $$ \begin{align} P(X_j=1|q_{k},\alpha_k^*, \pi^*, r_{k}^*)=\pi^* \prod_{k=1}^{K} \prod_{m=1}^{M} r_{km}^* \times \prod_{k=1}^{K} \prod_{m=1}^{M} ( \frac{1}{r_{km}^*}) ^ {\alpha_{km}}. \end{align} $$
Thus,
 $$ \begin{align} \textrm{log} \left[P(X_j=1|q_{k},\alpha_k^*, \pi^*, r_{k}^*) \right] = \textrm{log} (\pi^*) + \sum_{k=1}^{K} \sum_{m=1}^{M} \textrm{log}(r_{km}^*) - \sum_{k=1}^{K} \sum_{m=1}^{M} \textrm{log}(r_{km}^*) \alpha_{km}. \end{align} $$
By setting $\textrm {log} (\pi ^*) + \sum _{k=1}^{K} \sum _{m=1}^{M} \textrm {log}(r_{km}^*)= \nu _0$ and $- \sum _{k=1}^{K} \sum _{m=1}^{M} \textrm {log}(r_{km}^*) = - \sum _{k=1}^{K} \sum _{m=1}^{M} \nu _{km}$, Equation (19) can be seen to be a special case of the log sp-CDM without the interaction terms.
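The step from Equation (18) to Equation (19) can be checked numerically. The sketch below evaluates the product form and the log-linear form for one hypothetical item and compares them; treating each $\alpha _{km}$ as a 0/1 indicator is an assumption made here for illustration, as are the function names and parameter values.

```python
import math


def rum_product_form(pi_star, r, a):
    """Equation (18): pi* x prod r_km x prod (1 / r_km)^a_km."""
    p = pi_star
    for r_k, a_k in zip(r, a):
        for r_km, a_km in zip(r_k, a_k):
            p *= r_km * (1.0 / r_km) ** a_km
    return p


def rum_log_linear_form(pi_star, r, a):
    """Equation (19): intercept nu_0 plus main effects -log(r_km) * a_km."""
    nu0 = math.log(pi_star) + sum(math.log(r_km) for r_k in r for r_km in r_k)
    main_effects = -sum(
        math.log(r_km) * a_km
        for r_k, a_k in zip(r, a)
        for r_km, a_km in zip(r_k, a_k)
    )
    return nu0 + main_effects


# Hypothetical values: two attributes, two level indicators each.
pi_star = 0.9
r = [[0.8, 0.6], [0.7, 0.5]]
a = [[1, 0], [1, 1]]
lhs = math.log(rum_product_form(pi_star, r, a))
rhs = rum_log_linear_form(pi_star, r, a)
print(abs(lhs - rhs) < 1e-12)  # the two forms agree on the log scale
```

The log-linear form contains only an intercept and terms linear in the indicators, which is why no interaction terms appear.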
5 Model estimation
The estimation algorithms for the parameters, the corresponding standard errors (SEs), and the person attribute patterns in the sp-CDM are largely similar to those used to estimate the parameters of the G-DINA model described in de la Torre (2011) and those of the DINA model in de la Torre (2009). Specifically, parameters for the saturated forms of the sp-CDM can be estimated via marginal maximum likelihood estimation with an expectation-maximization algorithm (MMLE/EM), and those of reduced models can be estimated by incorporating an appropriate design matrix in the MMLE/EM procedure. Details can be found in Appendix A.
6 Simulation study
6.1 Design
Two research questions were investigated in the simulation study. First, how well can the parameters of the sp-CDM be recovered with the proposed estimation algorithms? Second, how does the fit of the sp-CDM compare with those of constrained and simplified models across different data generation assumptions? Due to time and space constraints, we focus on the identity sp-CDM and its two special cases, namely, the pG-DINA and fA-M, in this study. The details of the design are summarized in Table 3. The levels for the manipulated factors followed previous studies (e.g., de la Torre et al., 2021). In particular, the levels of item quality, defined as a function of the guessing and slip parameters, were computed as $p_0$ and ($1-p_1$). Specifically, items with ($p_0$, $1-p_1$) $\in U(.05,\ .15)$, $U(.10,\ .20)$, and $U(.15,\ .30)$ were classified as high, moderate, and low-quality items, respectively. To this end, the generating values for the intercept parameters were set to .05, .10, and .15 under the three levels of item quality. For the main effect parameters, the means of the generating values are .16, .15, and .13, respectively, and for the interaction effect parameters, they are .1, .08, and .07, respectively. Attribute patterns were generated from a uniform distribution where all possible attribute patterns were equally likely. The Q-matrices for $K=3$ and $K=5$ with 26 items are shown in Tables 4 and B.1, respectively, and the Q-matrices with 52 items duplicate the respective 26-item Q-matrices. The Q-matrix in the simulation study was specified to satisfy sufficient conditions similar to Theorem 4 in Fang et al. (2019). The GDINA package (Ma & de la Torre, 2019) and a customized program were used to generate the data and estimate the models. The monotonic constraints were imposed when estimating the models. A total of 168 conditions were examined, and each condition was replicated 100 times.
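The uniform generation of attribute patterns described above can be sketched in a few lines of Python. This is not the authors' program: the function names, the seed, and the choice of three levels per attribute ($M=3$) are assumptions for illustration only.

```python
import itertools
import random


def all_patterns(K, M):
    """Enumerate every attribute pattern for K attributes with levels 0..M-1."""
    return list(itertools.product(range(M), repeat=K))


def draw_patterns(N, K, M, seed=0):
    """Draw N patterns with every possible pattern equally likely."""
    rng = random.Random(seed)
    patterns = all_patterns(K, M)
    return [rng.choice(patterns) for _ in range(N)]


patterns = all_patterns(K=3, M=3)
print(len(patterns))  # 3 ** 3 = 27 possible patterns
sample = draw_patterns(N=1000, K=3, M=3)
print(len(sample))    # 1000 simulated persons
```

With equally likely patterns, each of the $M^K$ patterns has prior probability $1/M^K$.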
Table 3 Summary of the simulation design

Note: sp-CDM: saturated polytomous cognitive diagnosis models; pG-DINA: generalized deterministic input, noisy “and” gate model for polytomous attributes with the specific attribute level mastery (SALM) assumption; fA-M: fully additive model for polytomous attributes.
Table 4 Q-matrix for conditions of three attributes in simulation study

To answer the research questions, the simulation study was carried out in three steps. Step 1 was designed to answer the first research question. In this step, data under different conditions were generated following the sp-CDM (i.e., Equation (1)) and fitted with the true model (i.e., the sp-CDM). Steps 2 and 3 were designed to answer the second research question. In step 2, the pG-DINA model and fA-M were also fitted to the data generated in step 1 to investigate the consequences of fitting reduced models when neither the SALM nor the additive assumption holds. In step 3, two sets of data were generated following the pG-DINA model and fA-M, and each was fitted with the sp-CDM as well as its respective true model to investigate the consequences of fitting an unnecessarily complicated model (i.e., the sp-CDM) when the SALM or the additive assumption holds. For the sake of discussion, step 1 is referred to as the parameter recovery study, and steps 2 and 3 as the comparison study.
Evaluation criteria
The dependent variables in the parameter recovery study were the bias and root mean square error (RMSE) of the estimated success probabilities of the reduced attribute patterns in item j (denoted as $\hat {P}_j(\boldsymbol {\alpha }^*_l)$), and were defined as
 $$ \begin{align} \text{Bias}\Big(\hat{P}_j(\boldsymbol{\alpha}^*_l) \Big)= \sum_{j=1}^{J} \sum_{l=1}^{L^*_j} \Big[ \bar {\hat{P}}_j(\boldsymbol{\alpha}^*_l) - P_j(\boldsymbol{\alpha}^*_l) \Big] / \sum_{j=1}^J L^*_j, \end{align} $$
and
 $$ \begin{align} \text{RMSE}\Big(\hat{P}_j(\boldsymbol{\alpha}^*_l) \Big)=\,\sqrt[]{\sum_{j=1}^{J} \sum_{l=1}^{L^*_j}\sum_{r=1}^{R}\Big[ \hat{P}^{(r)}_j(\boldsymbol{\alpha}^*_l) - P_j(\boldsymbol{\alpha}^*_l) \Big]^2/ \Big[R \times \sum_{j=1}^J L^*_j \Big] }, \end{align} $$
respectively, where $P_j(\boldsymbol {\alpha }^*_l)$ is the generating probability of $\boldsymbol {\alpha }^*_l$ in item j, $\hat {P}^{(r)}_j(\boldsymbol {\alpha }^*_l)$ is the estimate of $P_j(\boldsymbol {\alpha }^*_l)$ in the rth replication, $\bar {\hat {P}}_j(\boldsymbol {\alpha }^*_l)$ is the mean of $\hat {P}_j(\boldsymbol {\alpha }^*_l)$ across R replications, and $L^*_j$ is the number of reduced attribute patterns in item j.
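The two recovery criteria can be sketched as follows. The nested-list layout (`p_true[j][l]` for generating values and `p_hat[r][j][l]` for the estimate in replication r), the function name, and the toy numbers are assumptions for illustration.

```python
import math


def bias_and_rmse(p_true, p_hat):
    """Bias and RMSE pooled over items, reduced patterns, and replications."""
    R = len(p_hat)
    n_cells = sum(len(row) for row in p_true)  # sum over items of L_j^*
    bias_sum = 0.0
    sq_sum = 0.0
    for j, row in enumerate(p_true):
        for l, p in enumerate(row):
            estimates = [p_hat[r][j][l] for r in range(R)]
            bias_sum += sum(estimates) / R - p  # mean estimate minus generating value
            sq_sum += sum((e - p) ** 2 for e in estimates)
    return bias_sum / n_cells, math.sqrt(sq_sum / (R * n_cells))


# Toy example: two items with 2 and 3 reduced patterns, two replications
# whose errors are +0.02 and -0.02 in every cell.
p_true = [[0.1, 0.9], [0.2, 0.5, 0.8]]
p_hat = [
    [[p + 0.02 for p in row] for row in p_true],
    [[p - 0.02 for p in row] for row in p_true],
]
bias, rmse = bias_and_rmse(p_true, p_hat)
print(round(bias, 6), round(rmse, 6))
```

In the toy example the errors cancel in the bias but not in the RMSE, which is why both criteria are reported.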
In the comparison study, the dependent variables were the proportion of correctly classified attributes (PCA) and vectors (PCV), which were computed as
 $$ \begin{align} \text{PCA}_r=\frac{\sum_{n=1}^{N} \sum_{k=1}^{K} \sum_{m=1}^{M} I[ \hat{\alpha}_{nkm} = \alpha_{nkm} ] }{ N \times K}, \end{align} $$
and
 $$ \begin{align} \text{PCV}_r=\frac{\sum_{n=1}^{N} I[ \hat{ \boldsymbol\alpha }_{n} = \boldsymbol\alpha_{n} ] }{ N }, \end{align} $$
respectively, where $I[ \hat {\alpha }_{nkm} = \alpha _{nkm} ]$ was used to evaluate the match between the estimated and generated attributes in the rth replication, and $I[ \hat { \boldsymbol \alpha }_{n} = \boldsymbol \alpha _{n} ]$ the match between the estimated and generated attribute vectors. For both studies, the results are summarized using the average values of the variables across replications.
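For a single replication, the two classification criteria can be sketched as below. Each attribute is represented here by one level value per person, so an attribute counts as correctly classified when the estimated level equals the generated level; this representation, the function name, and the toy data are assumptions for illustration.

```python
def pca_pcv(alpha_true, alpha_hat):
    """Proportion of correctly classified attributes (PCA) and vectors (PCV)."""
    N = len(alpha_true)
    K = len(alpha_true[0])
    attribute_hits = sum(
        t == e
        for true_vec, est_vec in zip(alpha_true, alpha_hat)
        for t, e in zip(true_vec, est_vec)
    )
    vector_hits = sum(
        tuple(true_vec) == tuple(est_vec)
        for true_vec, est_vec in zip(alpha_true, alpha_hat)
    )
    return attribute_hits / (N * K), vector_hits / N


# Toy example: four persons, three attributes with levels 0..2.
alpha_true = [(0, 1, 2), (2, 2, 0), (1, 0, 1), (2, 1, 1)]
alpha_hat = [(0, 1, 2), (2, 1, 0), (1, 0, 1), (0, 0, 1)]
pca, pcv = pca_pcv(alpha_true, alpha_hat)
print(pca, pcv)  # 9 of 12 attributes and 2 of 4 vectors match: 0.75 0.5
```

The PCV can never exceed the PCA, because a vector is counted as correct only when all of its attributes match.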
Results
For $K=3$, the bias and RMSE under different conditions are shown in Figures 2 and 3, respectively, and the PCA and PCV under the high, moderate, and low quality item conditions in Tables 5, 6, and 7, respectively. In particular, the upper panels of Figures 2 and 3 give the biases and RMSEs from fitting the sp-CDM to the sp-CDM data under different numbers of attributes, item qualities, test lengths, and sample sizes. The figures show that the parameters of the sp-CDM can be well recovered with the proposed estimation algorithms, particularly when high quality items were involved. For example, the mean biases were between $-0.007$ and $0.000$, and the mean RMSEs were between 0.000 and 0.009 across all conditions. The upper panels of Tables 5, 6, and 7 reveal that the classification of attributes and vectors is satisfactory under the sp-CDM. For example, for the high quality item conditions (i.e., Table 5), the PCAs were between 88.2% and 96.9%, and the PCVs were between 71.9% and 89.3%.

Figure 2 Bias in parameter recovery with three attributes. Note: sp-CDM: saturated polytomous cognitive diagnosis models; fA-M: fully additive model for polytomous attributes; pG-DINA: generalized deterministic input, noisy “and” gate model for polytomous attributes with the specific attribute level mastery (SALM) assumption. J: test length; N: sample size.

Figure 3 Root mean square error (RMSE) in parameter recovery with three attributes. Note: sp-CDM: saturated polytomous cognitive diagnosis models; fA-M: fully additive model for polytomous attributes; pG-DINA: generalized deterministic input, noisy “and” gate model for polytomous attributes with the specific attribute level mastery (SALM) assumption. J: test length; N: sample size.
Table 5 Correctly classified attributes (PCA) and vectors (PCV) (in %) with three attributes and high quality items

Note: Gen: Generating model; EC: Evaluation criteria; 1: sp-CDM; 2: fA-M; 3: pG-DINA; PCAk: PCA of attribute k.
Table 6 Correctly classified attributes (PCA) and vectors (PCV) (in %) with three attributes and moderate quality items

Note: Gen: Generating model; EC: Evaluation criteria; 1: sp-CDM; 2: fA-M; 3: pG-DINA; PCAk: PCA of attribute k.
Table 7 Correctly classified attributes (PCA) and vectors (PCV) (in %) with three attributes and low quality items

Note: Gen: Generating model; EC: Evaluation criteria; 1: sp-CDM; 2: fA-M; 3: pG-DINA; PCAk: PCA of attribute k.
In comparison, a closer inspection of the panels of Figures 2 and 3 and Tables 5, 6, and 7 reveals that fitting the reduced models, either the fA-M or the pG-DINA, to the sp-CDM data resulted in larger biases and RMSEs, or lower PCAs and PCVs, particularly when high quality items were used. Dramatically different results were obtained when the sp-CDM was fitted to the data generated using the reduced models: the biases, RMSEs, PCAs, and PCVs were similar to those obtained when the corresponding true models were fitted. These results can have important practical implications: when it is unclear which is the true model for an item, it is safer to fit the more general sp-CDM rather than a particular reduced model.
Due to space constraints, the results for $K=5$ are given in Appendix B, where Figures B.1 and B.2 contain the bias and RMSE, respectively, and Tables B.2, B.3, and B.4 the PCA and PCV under the three item quality conditions, respectively. In general, the findings for $K=5$ were similar to those for $K=3$. However, it should be noted that for two conditions in Figure B.2 (i.e., $K=5$, $J=26$, $N=1000$, and items of either low or moderate quality), the sp-CDM produced larger RMSEs than the fA-M even though the data were generated from the sp-CDM. This could be attributed to the instability of estimating the sp-CDM when a large number of parameters are involved and the data are not sufficiently informative.
One reviewer noted that, in both the $K=3$ and $K=5$ conditions, the RMSEs for the sp-CDM and fA-M tend to decrease when tests contain more items and use smaller samples. For example, in the top panel of Figure 3, the RMSE results under the condition $J52/N1000$ are smaller than those for the condition $J26/N2000$. To investigate this phenomenon, an additional simulation was conducted where the dataset was generated from the sp-CDM with $K=3$ and high item quality for the two conditions (i.e., $J52/N1000$ and $J26/N2000$). Three analyses were carried out as follows: In the first analysis, for the condition of $J52/N1000$, the true attribute pattern for each person was assigned a posterior probability of $0.950$, and each of the remaining $26$ patterns for the person a probability of $(1-0.950)/26\approx 0.002$. In contrast, for the condition of $J26/N2000$, the true attribute pattern for each person was assigned a posterior probability of $0.700$ and each of the other patterns a probability of
 $(1-0.700)/26\approx 0.011$
. This analysis mimics a scenario where the attribute patterns are better estimated in a smaller sample size condition than in a larger one. In the second analysis, the setting for the patterns’ posterior probabilities was reversed for the two conditions to mimic a scenario where the attribute patterns are better estimated in a larger sample size than in a smaller one. Finally, the patterns’ posterior probabilities were set using the results of the original simulation study. Specifically, the mean of the posterior probabilities for the true patterns across persons was computed for the two conditions—which are
$(1-0.700)/26\approx 0.011$
. This analysis mimics a scenario where the attribute patterns are better estimated in a smaller sample size condition than in a larger one. In the second analysis, the setting for the patterns’ posterior probabilities was reversed for the two conditions to mimic a scenario where the attribute patterns are better estimated in a larger sample size than in a smaller one. Finally, the patterns’ posterior probabilities were set using the results of the original simulation study. Specifically, the mean of the posterior probabilities for the true patterns across persons was computed for the two conditions—which are 
 $0.601$
 and
$0.601$
 and 
 $0.456$
 for the conditions of
$0.456$
 for the conditions of 
 $J52/N1000$
 and
$J52/N1000$
 and 
 $J26/N2000$
, respectively—and are used in the third analysis.
$J26/N2000$
, respectively—and are used in the third analysis.
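The construction of the fixed posterior probabilities can be illustrated with a minimal sketch. It assumes $3^3 = 27$ attribute patterns (three attributes with three levels each), which is consistent with the 26 remaining patterns mentioned above; the function name `fixed_posterior` and the choice of index 0 as the true pattern are hypothetical and serve only as an illustration.

```python
def fixed_posterior(n_patterns, true_index, p_true):
    """Posterior vector placing p_true on the true pattern and
    spreading the remaining mass evenly over all other patterns."""
    p_other = (1.0 - p_true) / (n_patterns - 1)
    return [p_true if l == true_index else p_other for l in range(n_patterns)]

# K = 3 attributes with three levels each give 3**3 = 27 patterns.
n_patterns = 3 ** 3

# First analysis, J52/N1000: 0.95 on the true pattern (index 0 is arbitrary).
post_long_test = fixed_posterior(n_patterns, true_index=0, p_true=0.95)

# First analysis, J26/N2000: 0.70 on the true pattern.
post_short_test = fixed_posterior(n_patterns, true_index=0, p_true=0.70)
```

Each vector sums to one by construction, so it can stand in for an estimated posterior distribution when the item parameters are updated.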
The biases and RMSEs for the additional simulation study are shown in Table 8. It was found that, for each analysis, the test that was assigned more accurate attribute pattern estimates resulted in better item parameter estimates (i.e., smaller bias and RMSE), even when the sample size was supposedly smaller. For example, in the first and third analyses, the biases and RMSEs under $J52/N1000$ are smaller than those under $J26/N2000$. These results demonstrate how more informative (i.e., longer) tests calibrated with a smaller sample size can produce better item parameter estimates.
Table 8 Additional simulation study: Bias and root mean square error (RMSE)

 
Note: In the first analysis, the posterior probabilities for the true attribute patterns are set to be $0.95$ and $0.70$ for the conditions of $J52/N1000$ and $J26/N2000$, respectively. In the second analysis, they are $0.70$ and $0.95$ for the two conditions, respectively. In the third analysis, the probabilities are $0.601$ and $0.456$, respectively.
7 Real data example
7.1 Data and analysis
The data in this example consisted of the responses of 1,408 middle school students in Hong Kong to the PR assessment described earlier. The assessment uses 31 multiple-choice items measuring six PR attributes, namely, ($1$) prerequisite skills and concepts required in PR, ($2$) comparing and ordering fractions, ($3$) constructing ratios and proportions, ($4$) identifying a multiplicative relationship between sets of values, ($5$) differentiating a proportional relationship from a non-proportional relationship, and ($6$) applying algorithms in solving PR problems. Of these, the second and third attributes are polytomous with $M=3$, and the other attributes are dichotomous. The Q-matrix for the empirical example is provided in Table 9, where each item requires one to four attributes (i.e., $1 \leq K^*_j \leq 4$). This Q-matrix satisfies the identifiability conditions by Fang et al. (2019). The sp-CDM, fA-M, and pG-DINA model were fitted to the data. No monotonicity constraint was imposed in this analysis. The deviance, Akaike information criterion (AIC), and Bayesian information criterion (BIC) were used to compare the three fitted models.
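The relationship among the three fit statistics can be sketched as follows. The sketch assumes the conventional definitions (deviance $= -2\log L$, AIC $=$ deviance $+ 2p$, and BIC $=$ deviance $+ p\log N$), which are not spelled out in the text; the function name, the log-likelihood value, and the parameter count are hypothetical, and only $N = 1408$ comes from the data description.

```python
import math

def information_criteria(log_lik, n_params, n_persons):
    """Return (deviance, AIC, BIC) under the conventional definitions."""
    deviance = -2.0 * log_lik
    aic = deviance + 2.0 * n_params
    bic = deviance + n_params * math.log(n_persons)
    return deviance, aic, bic

# Hypothetical log-likelihood and parameter count, for illustration only;
# N = 1408 is the sample size reported for the PR data.
dev, aic, bic = information_criteria(log_lik=-20000.0, n_params=300, n_persons=1408)
```

Because $\log 1408 > 2$, the BIC penalizes each additional parameter more heavily than the AIC does, so the two criteria need not rank a saturated model and its reduced versions in the same order.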
Table 9 Q-matrix for the proportional reasoning data and the number of parameters under the sp-CDM, fA-M, and pG-DINA model

 
Note: #Par: number of parameters; $M_1$: sp-CDM, saturated generalized deterministic, input, noisy, “and” gate model for polytomous attributes; $M_2$: fA-M, fully additive model for polytomous attributes; $M_3$: pG-DINA, polytomous generalized DINA model.
7.2 Results
Table 10 shows the number of parameters and the fit statistics (i.e., deviance, AIC, and BIC) for the sp-CDM, fA-M, and pG-DINA models in the empirical example. All the fit statistics indicate that the sp-CDM fitted the data the best and the fA-M the worst.
Table 10 Fit statistics of the sp-CDM, fA-M, and pG-DINA model for the empirical example

Note: sp-CDM: saturated generalized deterministic, input, noisy, “and” gate model for polytomous attributes; fA-M: fully additive model for polytomous attributes; pG-DINA: polytomous generalized DINA model.
To quantify the discrepancies between the parameter estimates of item j, the root mean squared difference (RMSD) between any model pair is computed as follows:
$$ \begin{align} \text{RMSD}_j(M_1,M_2)=\sqrt{ \sum_{l=1}^{L_j} w_{jl} (P_{jlM_1} - P_{jlM_2})^2 }, \end{align} $$
where $M_1$ and $M_2$ are a pair of models, $w_{jl}$ and $L_j$ are the posterior probability of latent group l and the number of latent groups for item j based on the sp-CDM, respectively, and $P_{jlm}$ is the success probability of latent group l on item j based on a model m. Given in Table 11 are the results of RMSD for the 31 items, as well as the average RMSD for the entire test. On average, the fA-M and pG-DINA model had the most similar item parameter estimates (average RMSD = 0.171), whereas the sp-CDM and pG-DINA model had the most disparate estimates (average RMSD = 0.230). These results suggest that, at least for this empirical example, the two reduced models behaved more similarly to each other than they did to the saturated model. However, this pattern did not necessarily hold for all items. For example, the discrepancies between the fA-M and pG-DINA model turned out to be the largest for items 13 and 19.
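As a minimal sketch of the RMSD computation defined above, the following function takes the latent group weights and the success probabilities under two models for a single item. The function name `rmsd` and all numerical values are hypothetical and are not taken from the empirical results.

```python
import math

def rmsd(weights, p_model_a, p_model_b):
    """Weighted root mean squared difference between the success
    probabilities of two models across the latent groups of one item."""
    if not (len(weights) == len(p_model_a) == len(p_model_b)):
        raise ValueError("all inputs must have one entry per latent group")
    return math.sqrt(sum(w * (pa - pb) ** 2
                         for w, pa, pb in zip(weights, p_model_a, p_model_b)))

# Hypothetical item with L_j = 4 latent groups.
w = [0.1, 0.2, 0.3, 0.4]     # posterior probabilities of the latent groups
p_a = [0.2, 0.5, 0.6, 0.9]   # success probabilities under one model
p_b = [0.2, 0.4, 0.7, 0.9]   # success probabilities under another model
value = rmsd(w, p_a, p_b)
```

Because each squared difference is multiplied by its group weight, a large difference in a sparsely populated latent group contributes little to the RMSD, whereas a modest difference in a heavily populated group can dominate it.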
Table 11 Root mean squared differences between the sp-CDM, fA-M, and pG-DINA model for the empirical example

 
Note: $M_1$: sp-CDM, saturated generalized deterministic, input, noisy, “and” gate model for polytomous attributes; $M_2$: fA-M, fully additive model for polytomous attributes; $M_3$: pG-DINA, polytomous generalized DINA model.
To better understand what similar and disparate item parameter estimates look like, the side-by-side success probability bar graphs of two items requiring one dichotomous and one polytomous attribute are given in Figure 4. Item 17 had estimates that can be considered more similar, whereas item 22 had estimates that can be considered more disparate. It should be noted that because RMSD is computed using latent group weights, large differences in the success probability estimates can have a limited impact on the RMSD, and vice versa. As can be seen from the upper panel, despite having more similar item parameter estimates, the success probabilities of item 17 for some latent groups (i.e., $10$ and $11$) can be quite different. In contrast, the lower panel shows that item parameter estimates for item 22 were quite disparate, and large discrepancies in the success probabilities for latent groups $01$, $02$, and $11$ can be found, particularly between the sp-CDM and pG-DINA model.

Figure 4 Success probabilities of latent groups in two items in the empirical example. Note: sp-CDM: saturated polytomous cognitive diagnosis model; fA-M: fully additive model for polytomous attributes; pG-DINA: generalized deterministic input, noisy, “and” gate model for polytomous attributes with the specific attribute level mastery (SALM) assumption.
The above comparisons were not meant to establish in general the similarities or differences between the three polytomous CDMs. Rather, they sought to better understand how the three models behave for a very particular empirical data set, which may provide insights into how future studies can be designed for the different models to be compared in a more systematic and comprehensive way.
The results can have important practical implications for diagnosing the mastery status of students. As shown in Figure 4, based on the sp-CDM, latent groups $10$ and $11$ have slightly higher success probabilities on item $17$ than latent groups $00$, $01$, and $02$. This suggests that mastering the prerequisite skills and concepts increases the probability of answering item $17$ correctly; however, the success probability for this item is the highest when, in addition to mastering the prerequisite skills and concepts, an examinee also masters ordering fractions (level $2$ of $\alpha_2$). In comparison, again based on the sp-CDM, the estimated success probability on item $22$ is the highest for latent group $12$, whereas the success probabilities for other latent groups (i.e., $00$, $01$, $02$, $10$, and $11$) are similar to each other, indicating a conjunctive process for the item. Overall, an examinee should master levels $1$ and $2$ of the two required attributes, respectively, to optimize their success probabilities on these two items.
Of the possible 144 latent classes, 25 did not have any student. However, it should be noted that, although some latent classes had no observations, all the latent groups, from which estimates were derived, were nonempty, albeit some were small. For example, one of the 24 latent groups of item 3 had an expected size of 7.12. Of the remaining latent classes, $122111$ was the largest: about 35.5% of the students had this attribute pattern. Finally, the individual attribute prevalences were as follows: 88.2% mastered $\alpha_1$; 16.2% and 68.5% were in levels 1 and 2 of $\alpha_2$, respectively; 11.7% and 74.7% were in levels 1 and 2 of $\alpha_3$, respectively; and 75.4%, 70.5%, and 63.5% mastered $\alpha_4$, $\alpha_5$, and $\alpha_6$, respectively.
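The count of 144 latent classes follows from the attribute structure described in Section 7.1, with four dichotomous attributes and two three-level attributes giving $2^4 \times 3^2 = 144$ patterns. A minimal sketch that enumerates them is shown below; the variable names are hypothetical.

```python
from itertools import product
from math import prod

# Number of levels for attributes 1 to 6: attributes 2 and 3 have three
# levels (0, 1, 2), and the remaining attributes are dichotomous (0, 1).
levels = [2, 3, 3, 2, 2, 2]

# Total number of latent classes is the product of the level counts.
n_classes = prod(levels)

# Enumerate every attribute pattern as a string such as "122111".
patterns = ["".join(str(a) for a in pattern)
            for pattern in product(*(range(m) for m in levels))]
```

The pattern $122111$, reported above as the largest latent class, corresponds to the highest level on every attribute.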
8 Discussion
Finer-grained feedback in the form of polytomous attributes can better inform classroom instruction and learning. However, the existing CDMs for polytomous attributes are deemed not general enough because most of them focus on a specific and constrained CDM or were proposed with very stringent assumptions. To this end, a more general framework, referred to as the sp-CDM, was proposed. The proposed model is a straightforward extension of the pG-DINA model (Chen & de la Torre, 2013), which itself generalizes the G-DINA model to polytomous attributes. The extension relaxes the SALM assumption of the pG-DINA model, and the resulting model can be formulated using the identity, logit, and log link functions. As such, the sp-CDM includes all the existing CDMs for polytomous attributes as its special cases. This paper has also illustrated the relationships between the sp-CDM and the existing CDMs mathematically and graphically.
In addition to the theoretical illustration, the estimation of the proposed model and the consequences of using constrained polytomous-attribute CDMs were examined via a simulation study. The results showed that the parameters of the proposed model can be well recovered using the proposed estimation algorithms. On the other hand, improperly fitting a constrained polytomous-attribute CDM can lead to poor item parameter estimates and misdiagnosis of students’ true mastery levels, whereas unnecessarily fitting the complex sp-CDM does little harm to the item and person estimation, particularly when high-quality items are used. Moreover, the PR assessment example demonstrated the applicability of the proposed model to real data and its advantages over the constrained models.
Despite the promising results, this study is not without its limitations. First, although the simulation study manipulated several important factors, other relevant factors such as the number of attribute levels, the distribution of attribute patterns, and the link functions were fixed. Additional simulation studies are needed in the future to investigate the performance of the proposed model across a wider range of conditions. For example, the current simulation studies used the same number of attribute levels (i.e., three) across the attributes, which might not always be the case in practice. Nevertheless, both the model and estimation algorithms proposed in this work are sufficiently general to apply to varying and larger numbers of attribute levels, as evidenced by the empirical example. It would be interesting for future studies to extend the simulation design to incorporate attributes with more, as well as varying, levels. This extension would provide useful information on how the increased attribute levels, which will lead to a greater number of latent classes, affect the sample size and test length required for the parameters of the proposed model to be estimated accurately. Moreover, this work focused on uniformly distributed attribute patterns. Future studies can extend the proposed model to other attribute distributions (e.g., higher-order distribution; de la Torre & Douglas, 2004) to understand the model performance across a wider variety of conditions.
Second, it has been recognized that failure to satisfy the Q-matrix identifiability conditions can result in poor parameter estimates. The existing necessary and sufficient conditions for the identifiability of CDMs in the literature focus on dichotomous attributes. (For example, see Chen et al. (2015), Chen et al. (2018), Chiu et al. (2009), DeCarlo (2011), Gu and Xu (2019), Liu et al. (2012), and Xu and Zhang (2016) for the conditions for the DINA model, and Fang et al. (2019), Gu and Xu (2021), Köhn and Chiu (2018), and Xu (2017) for general models.) In contrast, at present, only Fang et al. (2019) have discussed the identifiability conditions for the Q-matrix for polytomous attributes. However, the relevant results (i.e., Theorem 4) were limited to the sufficient conditions. To optimize the process of developing assessments that involve polytomous attributes, further research is needed to establish both the necessary and sufficient conditions specific to the identifiability of the sp-CDM and potentially its special cases.
Third, the current work focuses on polytomous attributes used in conjunction with dichotomous responses. Future research should extend the sp-CDM to also cover polytomous responses (e.g., Ma & de la Torre, 2020), as well as develop the associated estimation algorithms and computer programs to implement such a model.
Fourth, although it has been noted that a polytomous attribute can be equivalently represented as a set of linearly structured dichotomous attributes, it is not clear to what extent the equivalence extends to methodologies that are specifically developed for each attribute type. For example, what modifications are needed for the empirical Q-matrix validation procedures developed for dichotomous attributes (e.g., de la Torre & Chiu, 2016) to be equivalent to Q-matrix validation procedures developed for polytomous attributes (e.g., de la Torre et al., 2021)? Incidentally, a more general Q-matrix validation procedure that can be used with the proposed model needs to be considered in future research.
Finally, this work proposes an MMLE/EM algorithm for estimating the sp-CDM and its special cases. The results of the simulation study demonstrate that the algorithm provides accurate estimates and is efficient in estimating the proposed models. However, challenges arise when the algorithm has to deal with the complexities associated with the sp-CDM in its most general form. For example, the estimation of standard errors in the saturated models becomes particularly challenging due to often encountered singular Hessian matrices.
Furthermore, parameter estimation becomes notably challenging when the sample size is small relative to the number of attributes and attribute levels. To investigate this, an additional simulation was conducted with a sample size of $500$, maintaining the same settings as the primary simulation study. The results are given in Figure B.3 and Table B.5 in Appendix B. It was found that, under the $K=3$ conditions, item parameter recovery and the PCA and PCV are satisfactory. However, for $K=5$, item parameter recovery and the PCA are only marginally acceptable, and the PCV exhibits a significant deterioration, particularly when the item quality is low. The deterioration in the PCV performance may be attributed to the sparse latent classes: the expected numbers of individuals per latent class are about 18 ($500/3^3$) and two ($500/3^5$) when $K=3$ and $K=5$, respectively. These findings suggest that the sp-CDM may not be well suited for stand-alone small-sample settings (e.g., classroom assessment). Nonetheless, small-sample applications are still possible provided items can be calibrated a priori using a sufficiently large pool of individuals. In future research, it would be beneficial to explore the use of alternative estimation procedures such as nonparametric methods (e.g., Chiu et al., 2018) or Bayesian modal estimation (Ma & Jiang, 2021) to obtain robust person classification when polytomous attributes and small sample sizes are involved. It can be noted that small sample sizes impact not only the quality of item parameter estimates and attribute classification accuracy, but more so the standard error estimates. Thus, exploring various estimators of the CDM standard errors (Philipp et al., 2018) in the context of the proposed model needs to be considered, particularly when the sample size is small.
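The sparsity argument above can be reproduced with a few lines of arithmetic. The sketch assumes uniformly distributed attribute patterns, as in the simulation design, and three levels per attribute; the function name is hypothetical.

```python
def expected_class_size(n_persons, n_attributes, n_levels=3):
    """Expected number of persons per latent class when all attribute
    patterns are equally likely."""
    return n_persons / n_levels ** n_attributes

size_k3 = expected_class_size(500, 3)  # 500 / 27 latent classes
size_k5 = expected_class_size(500, 5)  # 500 / 243 latent classes
```

With only about two persons expected per latent class when $K=5$, many classes are likely to be empty in any given sample, which is consistent with the deterioration in the PCV noted above.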
Funding statement
This research was funded by a General Research Fund grant (17602818) from the Hong Kong Research Grants Council.
Competing interests
The authors declare none.
Appendix A
Item parameter estimation for the saturated polytomous cognitive diagnosis model via marginal maximum likelihood with an expectation-maximization algorithm
This appendix provides details of item parameter estimation in the sp-CDM. Due to space constraints, it focuses on the identity-link sp-CDM in particular; the algorithms can be readily adapted to the sp-CDM with the logit and log link functions.
Based on the IRF of the saturated identity sp-CDM (i.e., Equation (1)) and assuming local independence, the marginalized likelihood of the data, denoted as $L(\textbf{X})$, is given by
$$ \begin{align} \textit{L}(\textbf{X})=\prod_{n=1}^{N} L( \textbf{X}_n)=\prod_{n=1}^{N} \sum_{l=1}^{L} L(\textbf{X}_n|\boldsymbol{\alpha}_l,\boldsymbol{\delta})p(\boldsymbol{\alpha}_l), \end{align} $$
where $L(\textbf{X}_n)$ is the marginalized likelihood of the response vector of examinee n, $L(\textbf{X}_n|\boldsymbol{\alpha}_l,\boldsymbol{\delta})$ is the joint probability of the examinee’s response vector $\textbf{X}_n$ conditional on the attribute vector $\boldsymbol{\alpha}_l$ and the item parameters $\boldsymbol{\delta}$, and $p(\boldsymbol{\alpha}_l)$ is the prior probability of the attribute vector $\boldsymbol{\alpha}_l$.
The logarithm of $L(\textbf{X})$ is
$$ \begin{align} \textit{LL}(\textbf{X})=\log L(\textbf{X})=\log\prod_{n=1}^{N}L( \textbf{X}_n)= \sum_{n=1}^{N}\log\sum_{l=1}^{L}L(\textbf{X}_n|\boldsymbol{\alpha}_l,\boldsymbol{\delta})p(\boldsymbol{\alpha}_l). \end{align} $$
To find the marginal likelihood equation for $\boldsymbol{\delta}$ of item j, set
$$ \begin{align} \frac{\partial }{\partial \boldsymbol{\delta}_j} LL( \textbf{X} ) = 0. \end{align} $$
Then,
$$ \begin{align} \frac{\partial }{\partial \boldsymbol{\delta}_j} LL( \textbf{X} ) & =\sum_{n=1}^{N}\frac{\partial }{\partial \boldsymbol{\delta}_j}(\log L(\textbf{X}_n)) \nonumber\\ & = \sum_{n=1}^{N} [ L(\textbf{X}_n)]^{-1} \frac{\partial }{\partial \boldsymbol{\delta}_j} L(\textbf{X}_n) \\ & = \sum_{n=1}^{N} [ L(\textbf{X}_n)]^{-1} \sum_{l=1}^{L} p(\boldsymbol{\alpha}_l) \frac{\partial }{\partial \boldsymbol{\delta}_j} L(\textbf{X}_n|\boldsymbol{\alpha}_l,\boldsymbol{\delta}).\nonumber \end{align} $$
Assuming local independence, the term $L(\textbf{X}_n|\boldsymbol{\alpha}_l,\boldsymbol{\delta})$ in Equation (A4) is given by
$$ \begin{align} L(\textbf{X}_n|\boldsymbol{\alpha}_l,\boldsymbol{\delta})= \prod_{j=1}^{J} P_j(\boldsymbol\alpha_{l}^*)^{X_{j}} Q_j(\boldsymbol\alpha_{l}^*)^{1-X_{j}}, \end{align} $$
where $P_j(\boldsymbol\alpha_{l}^*)$ is defined in Equation (1), $Q_j(\boldsymbol\alpha_{l}^*)=1-P_j(\boldsymbol\alpha_{l}^*)$, and $X_{j}=1$ for a correct answer of examinee n to item j and $0$ otherwise.
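The marginal log-likelihood and the conditional likelihood above can be sketched in a few lines of code. This is a minimal illustration rather than the estimation routine used in the study: it assumes that the success probabilities $P_j(\boldsymbol\alpha_{l}^*)$ have already been evaluated for every latent class and item, and the function name and toy values are hypothetical.

```python
import math

def marginal_log_likelihood(responses, success_prob, prior):
    """Marginal log-likelihood of dichotomous responses.

    responses[n][j]    : 0/1 response of examinee n to item j
    success_prob[l][j] : success probability of latent class l on item j
    prior[l]           : prior probability of latent class l
    """
    total = 0.0
    for x_n in responses:
        lik_n = 0.0
        for p_l, prior_l in zip(success_prob, prior):
            # Conditional likelihood: product of Bernoulli terms over items.
            cond = 1.0
            for x, p in zip(x_n, p_l):
                cond *= p if x == 1 else 1.0 - p
            lik_n += cond * prior_l
        # The log of a product over examinees is a sum of logs.
        total += math.log(lik_n)
    return total

# Toy example: two latent classes, two items, two examinees.
success_prob = [[0.2, 0.2], [0.8, 0.8]]
prior = [0.5, 0.5]
responses = [[1, 1], [0, 0]]
ll = marginal_log_likelihood(responses, success_prob, prior)
```

In the toy example each examinee has a marginal likelihood of $0.5 \times 0.04 + 0.5 \times 0.64 = 0.34$, so the log-likelihood is $2\log 0.34$.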
Hence, the derivative of $L(\textbf{X}_n|\boldsymbol{\alpha}_l,\boldsymbol{\delta})$ becomes
$$ \begin{align} \frac{\partial }{\partial \boldsymbol{\delta}_j} L(\textbf{X}_n|\boldsymbol{\alpha}_l,\boldsymbol{\delta}) &=\frac{\partial }{\partial \boldsymbol{\delta}_j} \left[P_1(\boldsymbol\alpha_{l}^*)^{X_{1}} Q_1(\boldsymbol\alpha_{l}^*)^{1-X_{1}} \dots P_{j}(\boldsymbol\alpha_{l}^*)^{X_{j}} Q_{j}(\boldsymbol\alpha_{l}^*)^{1-X_{j}} \dots P_{J}(\boldsymbol\alpha_{l}^*)^{X_{J}} Q_{J}(\boldsymbol\alpha_{l}^*)^{1-X_{J}} \right] \nonumber\\ &= \left[ \prod_{j'\neq j}^{J} P_{j'} (\boldsymbol\alpha_{l}^*)^{X_{j'}} Q_{j'}(\boldsymbol\alpha_{l}^*)^{1-X_{j'}} \right] \times \frac{\partial }{\partial \boldsymbol{\delta}_j} \left[ P_{j}(\boldsymbol\alpha_{l}^*)^{X_{j}} Q_{j}(\boldsymbol\alpha_{l}^*)^{1-X_{j}} \right], \end{align} $$
where the product is over $j'\neq j$ rather than j because the differentiation is with respect to $\boldsymbol{\delta}_j$.
The second term on the right-hand side of Equation (A6) is
$$ \begin{align} &\frac{\partial P_{j}(\boldsymbol\alpha_{l}^*)^{X_{j}}}{\partial \boldsymbol{\delta}_j} \times Q_{j}(\boldsymbol\alpha_{l}^*)^{1-X_{j}} + P_{j}(\boldsymbol\alpha_{l}^*)^{X_{j}} \times \frac{\partial Q_{j}(\boldsymbol\alpha_{l}^*)^{1-X_{j}}}{\partial \boldsymbol{\delta}_j} \nonumber\\ &=X_j P_{j}(\boldsymbol\alpha_{l}^*)^{X_{j}-1} Q_{j}(\boldsymbol\alpha_{l}^*)^{1-X_{j}} \frac{\partial P_{j}(\boldsymbol\alpha_{l}^*) }{\partial \boldsymbol{\delta}_j} + (1-X_{j}) P_{j}(\boldsymbol\alpha_{l}^*)^{X_{j}} Q_{j}(\boldsymbol\alpha_{l}^*)^{1-X_j-1} \frac{\partial Q_{j}(\boldsymbol\alpha_{l}^*)}{\partial \boldsymbol{\delta}_j} \\ &= P_{j}(\boldsymbol\alpha_{l}^*)^{X_{j}} Q_{j}(\boldsymbol\alpha_{l}^*)^{1-X_{j}} \frac{\partial P_{j}(\boldsymbol\alpha_{l}^*) }{\partial \boldsymbol{\delta}_j} \left[ \frac{X_{j}- P_{j}(\boldsymbol\alpha_{l}^*) } { P_{j}(\boldsymbol\alpha_{l}^*) Q_{j}(\boldsymbol\alpha_{l}^*) } \right].\nonumber \end{align} $$
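The closed form in Equation (A7) can be checked numerically. The sketch below is only a sanity check under a simplifying assumption: it differentiates with respect to the success probability itself, so the chain-rule factor $\partial P_j/\partial \boldsymbol{\delta}_j$ is set to 1. The function names are hypothetical.

```python
def item_term(p, x):
    """P^x * Q^(1 - x) for a single item, with Q = 1 - P."""
    return p ** x * (1.0 - p) ** (1 - x)


def analytic_derivative(p, x):
    """Last line of Equation (A7), with dP/d(delta) set to 1 (an assumption)."""
    return item_term(p, x) * (x - p) / (p * (1.0 - p))


def numeric_derivative(p, x, h=1e-6):
    """Central finite difference of item_term with respect to p."""
    return (item_term(p + h, x) - item_term(p - h, x)) / (2.0 * h)


# Compare the two derivatives for both response values and a few probabilities.
max_gap = max(
    abs(analytic_derivative(p, x) - numeric_derivative(p, x))
    for x in (0, 1)
    for p in (0.2, 0.5, 0.9)
)
print(max_gap)
```

For a binary response the term is linear in the success probability, so the derivative is $+1$ for a correct response and $-1$ for an incorrect one, which is what the bracketed ratio in Equation (A7) reproduces.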
Substituting Equation (A7) into the right-hand side of Equation (A6) and rearranging yields
$$ \begin{align} \frac{\partial }{\partial \boldsymbol{\delta}_j} L(\textbf{X}_n|\boldsymbol{\alpha}_l,\boldsymbol{\delta}) &= \left[ \prod_{j'\neq j}^{J} P_{j'} (\boldsymbol\alpha_{l}^*)^{X_{j'}} Q_{j'}(\boldsymbol\alpha_{l}^*)^{1-X_{j'}} \right] \times P_{j}(\boldsymbol\alpha_{l}^*)^{X_{j}} Q_{j}(\boldsymbol\alpha_{l}^*)^{1-X_{j}} \frac{\partial P_{j}(\boldsymbol\alpha_{l}^*) }{\partial \boldsymbol{\delta}_j} \left[ \frac{X_{j}- P_{j}(\boldsymbol\alpha_{l}^*) } { P_{j}(\boldsymbol\alpha_{l}^*) Q_{j}(\boldsymbol\alpha_{l}^*) } \right] \nonumber\\ &= \left[ \prod_{j'=1}^{J} P_{j'} (\boldsymbol\alpha_{l}^*)^{X_{j'}} Q_{j'}(\boldsymbol\alpha_{l}^*)^{1-X_{j'}} \right] \frac{\partial P_{j}(\boldsymbol\alpha_{l}^*) }{\partial \boldsymbol{\delta}_j} \left[ \frac{X_{j}- P_{j}(\boldsymbol\alpha_{l}^*) } { P_{j}(\boldsymbol\alpha_{l}^*) Q_{j}(\boldsymbol\alpha_{l}^*) } \right] \nonumber\\ &= L(\textbf{X}_n|\boldsymbol{\alpha}_l,\boldsymbol{\delta}) \frac{\partial P_{j}(\boldsymbol\alpha_{l}^*) }{\partial \boldsymbol{\delta}_j} \left[ \frac{X_{j}- P_{j}(\boldsymbol\alpha_{l}^*) } { P_{j}(\boldsymbol\alpha_{l}^*) Q_{j}(\boldsymbol\alpha_{l}^*) } \right]. \end{align} $$
Substituting Equation (A8) into Equation (A4) yields
$$ \begin{align} \frac{\partial }{\partial \boldsymbol{\delta}_j} LL( \textbf{X} ) &= \sum_{n=1}^{N} \left[ L(\textbf{X}_n) \right]^{-1} \sum_{l=1}^{L} p(\boldsymbol{\alpha}_l) L(\textbf{X}_n|\boldsymbol{\alpha}_l,\boldsymbol{\delta}) \frac{\partial P_{j}(\boldsymbol\alpha_{l}^*) }{\partial \boldsymbol{\delta}_j} \left[\frac{X_{j}- P_{j}(\boldsymbol\alpha_{l}^*) } { P_{j}(\boldsymbol\alpha_{l}^*) Q_{j}(\boldsymbol\alpha_{l}^*) } \right] \nonumber\\&= \sum_{l=1}^{L} \frac{\partial P_{j}(\boldsymbol\alpha_{l}^*) }{\partial \boldsymbol{\delta}_j} \left[\frac{1} {P_{j}(\boldsymbol\alpha_{l}^*) Q_{j}(\boldsymbol\alpha_{l}^*) } \right] \sum_{n=1}^{N} \frac{ L(\textbf{X}_n|\boldsymbol{\alpha}_l,\boldsymbol{\delta}) p(\boldsymbol{\alpha}_l) }{ L(\textbf{X}_n) } \left[ X_{j}- P_{j}(\boldsymbol\alpha_{l}^*) \right] \nonumber\\&= \sum_{l=1}^{L} \frac{\partial P_{j}(\boldsymbol\alpha_{l}^*) }{\partial \boldsymbol{\delta}_j} \left[\frac{1} {P_{j}(\boldsymbol\alpha_{l}^*) Q_{j}(\boldsymbol\alpha_{l}^*) } \right] \sum_{n=1}^{N} p( \boldsymbol\alpha_{l}^* | \textbf{X}_n ) \left[ X_{j}- P_{j}(\boldsymbol\alpha_{l}^*) \right] \\&= \sum_{l=1}^{L} \frac{\partial P_{j}(\boldsymbol\alpha_{l}^*) }{\partial \boldsymbol{\delta}_j} \left[\frac{1} {P_{j}(\boldsymbol\alpha_{l}^*) Q_{j}(\boldsymbol\alpha_{l}^*) } \right] \left[\sum_{n=1}^{N} p( \boldsymbol\alpha_{l}^* | \textbf{X}_n ) X_{j} - P_{j}(\boldsymbol\alpha_{l}^*) \sum_{n=1}^{N} p( \boldsymbol\alpha_{l}^* | \textbf{X}_n ) \right] \nonumber\\&= \sum_{l=1}^{L} \frac{\partial P_{j}(\boldsymbol\alpha_{l}^*) }{\partial \boldsymbol{\delta}_j} \left[\frac{1} {P_{j}(\boldsymbol\alpha_{l}^*) Q_{j}(\boldsymbol\alpha_{l}^*) } \right] \left[ R_{jl} - P_{j}(\boldsymbol\alpha_{l}^*) T_l \right],\nonumber \end{align} $$
where $p( \boldsymbol \alpha _{l}^* | \textbf {X}_n )= \frac { L(\textbf {X}_n|\boldsymbol {\alpha }_l,\boldsymbol {\delta }) p(\boldsymbol {\alpha }_l) }{ L(\textbf {X}_n) }$ is the posterior probability that examinee n is in the latent group $\boldsymbol \alpha _{l}^*$, $R_{jl}= \sum _{n=1}^{N} p( \boldsymbol \alpha _{l}^* | \textbf {X}_n ) X_{j}$ is the expected number of examinees in the latent group $\boldsymbol \alpha _{l}^*$ who answer item j correctly, and $T_l = \sum _{n=1}^{N} p( \boldsymbol \alpha _{l}^* | \textbf {X}_n )$ is the expected number of examinees in the latent group $\boldsymbol \alpha _{l}^*$.
Therefore, the marginal likelihood Equation (A3) can be written as follows:
$$ \begin{align} \sum_{l=1}^{L} \frac{\partial P_{j}(\boldsymbol\alpha_{l}^*) }{\partial \boldsymbol{\delta}_j} \left[\frac{1} {P_{j}(\boldsymbol\alpha_{l}^*) Q_{j}(\boldsymbol\alpha_{l}^*) } \right] \left[ R_{jl} - P_{j}(\boldsymbol\alpha_{l}^*) T_l \right]=0. \end{align} $$
Solving Equation (A10) yields the MML estimate of $P(\boldsymbol {\alpha }_{lj}^*)$, which can be expressed as
$$ \begin{align} \hat{P}(\boldsymbol{\alpha}_{lj}^*)= \frac{R_{\boldsymbol{\alpha}_{lj}^*}}{T_{\boldsymbol{\alpha}_{lj}^*}}. \end{align} $$
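The quantities in Equations (A9) and (A11) can be sketched for a toy data set as follows. This is an illustration under stated assumptions, not the authors' implementation: every latent group is treated as its own reduced group for the item, and the data, the provisional success probabilities `class_probs`, and the prior are all invented values.

```python
from math import prod


def posteriors(responses, class_probs, prior):
    """Posterior p(alpha_l | X_n) for one examinee.

    class_probs[l][j] is the provisional success probability of latent
    group l on item j (hypothetical values in this sketch).
    """
    joint = [
        prior[l] * prod(p if x == 1 else 1.0 - p
                        for x, p in zip(responses, class_probs[l]))
        for l in range(len(prior))
    ]
    marginal = sum(joint)  # L(X_n), the denominator of the posterior
    return [value / marginal for value in joint]


def expected_counts(data, class_probs, prior, item):
    """Return (R, T) for one item: expected correct counts and group sizes."""
    n_groups = len(prior)
    R = [0.0] * n_groups
    T = [0.0] * n_groups
    for responses in data:
        post = posteriors(responses, class_probs, prior)
        for l in range(n_groups):
            T[l] += post[l]
            R[l] += post[l] * responses[item]
    return R, T


# Hypothetical data: four examinees, two items, two latent groups.
data = [[1, 1], [1, 0], [0, 0], [1, 1]]
class_probs = [[0.2, 0.1], [0.9, 0.8]]
prior = [0.5, 0.5]

R, T = expected_counts(data, class_probs, prior, item=0)
p_hat = [r / t for r, t in zip(R, T)]  # Equation (A11): R divided by T
print(R, T, p_hat)
```

Because the posterior probabilities of each examinee sum to one, the entries of `T` sum to the number of examinees.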
To convert the estimates $\hat {P}(\boldsymbol {\alpha }_{lj}^*)$ into the item parameters $\boldsymbol {\delta }$, the design matrix $\textbf {M}_j$ is needed. With $\textbf {M}_j$, the estimates of $\boldsymbol {\delta }_j= \{\delta _{j0}, \delta _{jkm}, \ldots , \delta _{jkmk'm'}, \ldots , \delta _{j1{m_1}2{m_2} \dots {K_j^*}m_{K_j^*}}\}'$ can be computed as
$$ \begin{align} \hat{ \boldsymbol{\delta} }_j = (\textbf{M}_j^{'}\textbf{M}_j)^{-1}\textbf{M}^{'}_j\hat{\textbf{P}}_j, \end{align} $$
where $\hat {\textbf {P}}_j=\{\hat {P}(\boldsymbol {\alpha }_{lj}^*)\}$. This is identical to the design matrix used with the G-DINA model (de la Torre, 2011). Due to the important role the design matrix $\textbf {M}$ plays in the estimation of the sp-CDM, the following shows how $\textbf {M}$ can be constructed.
Under the sp-CDM, the dimension of the design matrix is $M^{K_j^*} \times P_j$, where $P_j$ is the number of parameters of the model of interest, which is $M^{K_j^*}$ when converting $\hat {\textbf {P}}_j$ to $\hat {\boldsymbol {\delta }}_j$. To illustrate, let $K_j^*=2$ and $M=3$ for item j. This item has one intercept parameter, four main effect parameters (one for each of the two nonzero levels of each of the two required attributes), and four two-way interaction parameters between the levels of the required attributes, leading to a total of nine parameters. Hence, $P_j=9$ in this example. The corresponding saturated design matrix is
$$ \begin{align} \textbf{M}_{9 \times 9} =\begin{pmatrix} 1&0&0&0&0&0&0&0&0 \\ 1&1&0&0&0&0&0&0&0 \\ 1&1&1&0&0&0&0&0&0 \\ 1&0&0&1&0&0&0&0&0 \\ 1&0&0&1&1&0&0&0&0 \\ 1&1&0&1&0&1&0&0&0 \\ 1&1&1&1&0&1&1&0&0 \\ 1&1&0&1&1&1&0&1&0 \\ 1&1&1&1&1&1&1&1&1 \end{pmatrix}, \end{align} $$
where rows 1 through 9 of $\textbf {M}$ correspond to the latent groups $00$, $10$, $20$, $01$, $02$, $11$, $21$, $12$, and $22$, respectively; columns 1 through 9 represent the intercept, followed by the four main effects, and then the four two-way interaction effects. For example, the success probability of the latent group $11$ (i.e., row 6) is $\delta _{j0}+\delta _{j1m_1}+\delta _{j2m_1}+\delta _{j1m_1 2m_1}$, whereas that of the latent group $22$ (i.e., row 9) is $\delta _{j0}+\delta _{j1m_1}+\delta _{j1m_2}+\delta _{j2m_1}+\delta _{j2m_2}+\delta _{j1m_1 2m_1}+ \delta _{j1m_2 2m_1}+\delta _{j1m_1 2m_2}+\delta _{j1m_2 2m_2}$.
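The design matrix in Equation (A13) and the conversion in Equation (A12) can be sketched for the $K_j^*=2$, $M=3$ example. The sketch assumes that an effect column equals 1 whenever the latent group has reached the level thresholds of that effect, which reproduces the matrix shown above; the success probabilities `p_hat` are invented values. Because the saturated matrix is square and invertible, $(\textbf{M}_j^{'}\textbf{M}_j)^{-1}\textbf{M}^{'}_j\hat{\textbf{P}}_j$ reduces to solving $\textbf{M}_j\boldsymbol{\delta}_j=\hat{\textbf{P}}_j$, which is done here with exact fractions.

```python
from fractions import Fraction

# Latent groups in the row order given in the text. The same pairs, read as
# (threshold on attribute 1, threshold on attribute 2), index the columns:
# (0, 0) is the intercept, (1, 0) and (2, 0) are main effects, and so on.
PATTERNS = [(0, 0), (1, 0), (2, 0), (0, 1), (0, 2), (1, 1), (2, 1), (1, 2), (2, 2)]


def design_matrix(patterns=PATTERNS):
    """Entry is 1 when the group has reached both thresholds of the effect."""
    return [[int(a1 >= t1 and a2 >= t2) for (t1, t2) in patterns]
            for (a1, a2) in patterns]


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with exact fractions."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


M = design_matrix()
# Hypothetical success probabilities for groups 00, 10, 20, 01, 02, 11, 21, 12, 22.
p_hat = [Fraction(k, 10) for k in (1, 3, 5, 2, 4, 5, 7, 7, 9)]
delta = solve(M, p_hat)

# Multiplying back recovers the success probabilities.
reconstructed = [sum(m * d for m, d in zip(row, delta)) for row in M]
print(delta)
```

In this sketch the intercept equals the success probability of group $00$, and each main effect is the increment in success probability from one level to the next when the other attribute is at level 0.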
The implementation of the MMLE/EM algorithm is as follows:
Step 1: The expectation (E) step
(1) Use Equation (A5) and the provisional item parameter estimates to compute the likelihood of each examinee’s response vector at each of the L attribute patterns.
(2) Use Equation (A1) to compute the likelihood of the whole data. For convenience, the uniform distribution is usually chosen to initialize the prior distribution, that is, $p(\boldsymbol {\alpha }_l) = 1/L$ in the first iteration. In subsequent iterations, the prior distribution is updated by replacing it with the posterior distribution, which itself is updated at the end of each iteration.
It is not uncommon for many examinees to have the same response vector to the J items in a data set, in which case the computations of $L(\textbf {X}_n|\boldsymbol {\alpha }_l,\boldsymbol {\delta })$ are often replicated. To increase the computational efficiency, an alternative way to compute the likelihood of the data starts from grouping the N examinees’ response vectors into the $2^J$ possible response patterns. Let the item response patterns be denoted as $U_s$ ($s=1,\ldots ,2^J$) and let the number of examinees possessing response pattern s be given by $f_s$. The likelihood of the data can then be calculated as $L(\textbf {X} )= \prod _{s=1}^{2^J} \left [ \sum _{l=1}^{L}L(U_s|\boldsymbol {\alpha }_l,\boldsymbol {\delta })p(\boldsymbol {\alpha }_l) \right ]^{ f_s}$.
(3) Compute the expected number of examinees in the latent group $\boldsymbol \alpha _{l}^*$ among the $M^{K_j^*}$ latent groups and the expected number of examinees in the latent group $\boldsymbol \alpha _{l}^*$ who answer item j correctly, and use them as the values of $T_{jl}$ and $R_{jl}$ in Equation (A9), respectively.
Step 2: The maximization (M) step
Solve the likelihood Equation (A3) using the values of $T_{jl}$ and $R_{jl}$. Because these values depend on $L(\textbf {X}_n|\boldsymbol {\alpha }_l,\boldsymbol {\delta })$, which in turn depends on the unknown item parameters, the likelihood equations are implicit and must be solved with an iterative procedure (e.g., the Newton-Raphson procedure).
The E step and M step are repeated until a stopping criterion is met (e.g., the change in likelihood between two successive cycles is less than 0.001, or the maximum number of iterations, say, 100, is reached).
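The E and M steps above can be sketched as a short loop. This is a toy illustration under stated assumptions rather than the estimation routine used in the paper: each latent group has its own unstructured success probability on every item, so the M step uses the closed form $R/T$ instead of Newton-Raphson; convergence is monitored on the log-likelihood; only observed response patterns are stored rather than all $2^J$; and the data and starting values are invented. The function name `em_latent_class` is hypothetical.

```python
from collections import Counter
from math import log, prod


def em_latent_class(data, class_probs, max_iter=100, tol=1e-3):
    """Toy EM for binary items; class_probs[l][j] must start inside (0, 1)."""
    n_groups, n_items = len(class_probs), len(class_probs[0])
    prior = [1.0 / n_groups] * n_groups             # uniform start, 1/L
    patterns = Counter(tuple(row) for row in data)  # f_s for each observed U_s
    n_total = sum(patterns.values())
    history = []
    for _ in range(max_iter):
        # E step: accumulate expected counts once per unique response pattern.
        R = [[0.0] * n_items for _ in range(n_groups)]
        T = [0.0] * n_groups
        loglik = 0.0
        for pattern, freq in patterns.items():
            joint = [prior[l] * prod(p if x else 1.0 - p
                                     for x, p in zip(pattern, class_probs[l]))
                     for l in range(n_groups)]
            marginal = sum(joint)
            loglik += freq * log(marginal)
            for l in range(n_groups):
                weight = freq * joint[l] / marginal  # f_s times the posterior
                T[l] += weight
                for j in range(n_items):
                    R[l][j] += weight * pattern[j]
        history.append(loglik)
        # Stop when the log-likelihood changes by less than tol.
        if len(history) > 1 and abs(history[-1] - history[-2]) < tol:
            break
        # M step: closed-form update, then replace the prior by the posterior.
        class_probs = [[R[l][j] / T[l] for j in range(n_items)]
                       for l in range(n_groups)]
        prior = [T[l] / n_total for l in range(n_groups)]
    return class_probs, prior, history


# Hypothetical data: 20 examinees, three items, two latent groups.
data = ([[1, 1, 1]] * 6 + [[0, 0, 0]] * 5 + [[1, 1, 0]] * 3
        + [[0, 0, 1]] * 2 + [[1, 0, 1]] * 2 + [[0, 1, 0]] * 2)
start = [[0.3, 0.3, 0.3], [0.7, 0.7, 0.7]]
fitted_probs, fitted_prior, history = em_latent_class(data, start)
print(len(history), fitted_prior)
```

Grouping by response pattern means the inner likelihood is evaluated once per distinct pattern and weighted by its frequency, which is the saving described in the E step.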
Appendix B
Q-matrix with five attributes and additional results in simulation study
Table B.1 Q-matrix for conditions of five attributes in simulation study


Figure B.1 Bias in parameter recovery with five attributes. Note: sp-CDM: saturated polytomous cognitive diagnosis models; fA-M: fully additive model for polytomous attributes; pG-DINA: generalized deterministic input, noisy “and” gate model for polytomous attributes with the specific attribute level mastery (SALM) assumption. J: test length; N: sample size.

Figure B.2 Root mean square error (RMSE) in parameter recovery with five attributes. Note: sp-CDM: saturated polytomous cognitive diagnosis models; fA-M: fully additive model for polytomous attributes; pG-DINA: generalized deterministic input, noisy “and” gate model for polytomous attributes with the specific attribute level mastery (SALM) assumption. J: test length; N: sample size.
Table B.2 Correctly classified attributes (PCA) and vectors (PCV) (in %) with five attributes and high quality items

Note: Gen: Generating model; EC: Evaluation criteria; 1: sp-CDM; 2: fA-M; 3: pG-DINA; PCAk: PCA of attribute k.
Table B.3 Correctly classified attributes (PCA) and vectors (PCV) (in %) with five attributes and moderate quality items

Note: Gen: Generating model; EC: Evaluation criteria; 1: sp-CDM; 2: fA-M; 3: pG-DINA; PCAk: PCA of attribute k.
Table B.4 Correctly classified attributes (PCA) and vectors (PCV) (in %) with five attributes and low quality items

Note: Gen: Generating model; EC: Evaluation criteria; 1: sp-CDM; 2: fA-M; 3: pG-DINA; PCAk: PCA of attribute k.

Figure B.3 Results of parameter recovery for the sp-CDM with $N=500$. Note: sp-CDM: saturated polytomous cognitive diagnosis models; J: test length; N: sample size.
Table B.5 Correctly classified attributes (PCA) and vectors (PCV) (in %) for the sp-CDM with $N=500$

Note: sp-CDM: saturated polytomous cognitive diagnosis models; EC: Evaluation criteria; H: High quality items; M: Moderate quality items; L: Low quality items; PCAk: PCA of attribute k.
 
 


























