This article describes a conceptual and psychometric scheme for distinguishing the categorical versus dimensional nature of psychological variables. By "psychological variables" we mean variables used to distinguish between entities in some psychological respect. These entities are commonly persons, but they can also be situations, tasks, test items, and so on. The scheme arose out of frustrations with instances of psychological research that had either assumed or "proved" that variables were of one kind or the other without examining the philosophical or empirical basis for doing so and without an overarching framework in which either was genuinely possible. In this article, we provide such an overarching framework, called the dimension/category framework (Dimcat), and provide empirical illustrations of its use.
A preliminary distinction in determining whether variables are category-like or dimension-like is the distinction between manifest variables and latent variables. Too often these two kinds of variables are confused, which can lead to inappropriate conclusions. Specifically, researchers may confuse manifest categories or dimensions, which are artifacts of the measurement approach, with latent categories or dimensions, which are typically the underlying psychological phenomena of interest.
The issue under consideration here is whether the latent nature of manifest variables is category-like or dimension-like. One assumption might be that the nature of the latent and manifest variables match. As discussed below, however, manifest dimensions can be turned into manifest categories (e.g., in segmentation into groups) and manifest categories into manifest dimensions (e.g., in sum scores on a test). Thus, the relations between different kinds of manifest variables and between different kinds of manifest and latent variables are not so simple as they might at first appear. Consequently, a conceptual and methodological framework that encompasses all of these possibilities is needed.
Manifest dimensions (or manifest continua) are common in psychological research, although their dimensional nature may be only a convenient fiction. For example, raw scores on a test (e.g., number of correct responses) are ordered manifest categories, yet they are commonly seen as approximating a manifest dimension. Items on a test are examples of indicators in the same way that symptoms in a diagnostic system are indicators, although these different kinds of indicators are typically put to very different uses. Whereas items are typically summed to produce a manifest dimension, symptoms are typically summed to produce a manifest category (a diagnosis). To complicate matters, a manifest dimension based on item sums may also be segmented (e.g., using a median split) to produce a manifest category, or the sum of symptoms may be used as an indicator of the extent to which patients show a syndrome. It should be apparent from this discussion that manifest categories and manifest dimensions can be functionally interchangeable and thus arbitrary.
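The interchangeability of manifest dimensions and manifest categories can be illustrated with a minimal sketch. The response matrix and the function names `sum_scores` and `median_split` are hypothetical and serve only as an illustration; the cut-off rule (strictly above the sample median) is one of several possible conventions.

```python
from statistics import median

# Hypothetical item responses (1 = correct, 0 = incorrect); one row per person.
responses = [
    [1, 1, 1, 0, 1],
    [0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 1],
    [1, 0, 1, 1, 0],
    [0, 0, 0, 0, 1],
]

def sum_scores(rows):
    """Manifest dimension: number of indicators scored 1 for each person."""
    return [sum(row) for row in rows]

def median_split(scores):
    """Manifest category: 'high' if strictly above the sample median, else 'low'."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]

scores = sum_scores(responses)   # ordered categories treated as a dimension
groups = median_split(scores)    # the same information segmented into a category
```

The same binary indicators thus yield either a quasi-continuous score or a two-valued category, depending only on the analyst's choice.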
Latent dimensions are quantitative variables with values that depend on the person and that in one way or another contribute to the observations, either directly or indirectly via the effect the quantitative variable has on the probability of the responses. For a discussion of the epistemological status of latent variables, see Borsboom, Mellenbergh, and van Heerden (2003). Latent dimensions are invoked as underlying quantities that determine data or functions thereof such as the sum score. For example, in classical test theory, a true score (latent dimension) is believed to be at the basis of the sum score of a test (manifest dimension), except for distortions due to the so-called error term. Latent dimensions are implicit whenever concepts like internal-consistency reliability are used--that is, in virtually all tests of psychological phenomena. The underlying variables in factor analysis models, structural equation modeling (SEM), and item response theory (IRT) are not manifest but latent dimensions.
Manifest categories are also common in psychological research, as independent or dependent variables. Regardless of whether the categorical variables are independent or dependent variables, they are often (but not always) rooted in, derived from, based on, or linked to some manifest or tacit indicators from the same domain. Indicators need to be either directly or indirectly observed in order to derive a manifest category from them.
A manifest category is commonly derived from indicators through either segmentation or expert judgment. Segmentation means that one indicator or a composite of indicators (e.g., a sum score on a test) is segmented into different manifest categories. Some segments may be omitted, as in the method of extreme groups, in which the middle segment is omitted. Expert judgment means that an expert attributes manifest categories on the implicit or explicit basis of knowledge regarding the values of indicators. For example, a psychiatric diagnosis is based on knowledge of the symptoms. Diagnostic systems such as the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV; American Psychiatric Association, 1994) provide the expert with explicit rules based on the sum score obtained from a list of symptoms, but often the expert does not literally follow such rules but rather relies on tacit indicators. Another example of expert judgment is when people judge themselves on a trait (e.g., "I am shy") or on an attitude (e.g., "I am against capital punishment"). In this case, people are thought to be expert judges regarding more specific, possibly tacit, indicators about themselves that indicate a trait, attitude, or another underlying variable.
One may wonder whether a manifest category resulting from segmentation or expert judgment is in any sense more than an arbitrary segmentation of an underlying dimension. For example, it is a common practice to determine cut-off scores, such as those used to distinguish between depressed and non-depressed persons. The fact that categories are used does not prove that the phenomenon the categories refer to is category-like. The use of categories may be purely pragmatic. When a category-like manifest variable is used, it is important to know the nature of this variable in order to interpret results obtained with it. Manifest categories (e.g., a diagnosis) can correspond to either qualitative differences or quantitative differences. The basic issue is whether the categories at the manifest level are category-like or dimension-like in the latent structure. The complementary issue, whether a manifest dimension (e.g., a sum score) is category-like or dimension-like in the latent structure, is a legitimate question but is not addressed directly here; its answer requires the use of latent class or latent profile models (e.g., see Wilson  for a discussion). Thus, the present paper is asymmetric: Given manifest categories, we attempt to answer whether they are really category-like in the latent structure. If they are category-like at the latent level, then they have the properties of latent categories.
The issue we want to investigate parallels an issue in cognitive psychology and linguistics, particularly with respect to the nature and meaning of the categories and words we use in daily life. For example, is the concept behind the category trees really category-like, or does it correspond better to a dimension of tree-ness? When the categories are persons (for example, the category of psychiatric patients) and when human cognition intervenes in category assignment (as with expert judgment), the similarity is even more relevant. The commonalities and differences between our research questions and those of the domain of concepts and categories are illustrative of what we plan to do. Hereafter, in Section 1, we summarize the research on categories and concepts and how it applies to our topic.
These conclusions contradict the so-called classical (Aristotelean) view. This classical view was described by Rosch (1978) and by Smith and Medin (1981) and defended by Sutcliffe (1993). Wittgenstein (1953) was the first prominent thinker to doubt the classical view. Rosch (1975, 1978) and Smith and Medin (1981) have clearly explained and demonstrated empirically why the classical view is invalid. Alternative theories have been developed in order to explain that people do not use definitions and that categories are gradual instead. These theories are also meant to explain a wide variety of phenomena such as category decisions, category learning, category-based induction, memory for exemplars, and so on (for overviews, see Komatsu, 1992; Medin & Coley, 1998; and Murphy, 2002). We concentrate here on decisions about category membership, because we want to investigate the nature of what we call a manifest category based on the attribution of a category label--for example, a personality disorder diagnosis, a self-description as being against capital punishment, or assignment to a developmental stage.
The first theory states that category membership is derived from the similarity of an element to the prototype of the category. This theory is called the prototype theory. For a description, see Hampton (1995). The similarity is based on a weighted sum of features present in the element in question. The features and their weights are the content of the prototype. The prototype is of an abstract nature, unless it is instantiated in an extant exemplar. The weighted sum is a continuous variable to be dichotomized in order to decide on category membership or to be used as an input for a choice rule if the decision is between two or more categories.
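The decision rule of the prototype theory can be sketched as follows. The features, weights, and threshold are hypothetical values chosen for illustration only, and the function names are not taken from any published implementation.

```python
def prototype_similarity(features, weights):
    """Weighted sum of the prototype features present in the element."""
    return sum(w * f for w, f in zip(weights, features))

def is_member(features, weights, threshold):
    """Dichotomize the continuous similarity at a cut-off."""
    return prototype_similarity(features, weights) >= threshold

# Hypothetical prototype for "bird": has wings, flies, has feathers, lays eggs.
weights = [1.0, 1.0, 2.0, 1.0]
threshold = 3.0

robin = [1, 1, 1, 1]
penguin = [1, 0, 1, 1]
bat = [1, 1, 0, 0]

similarities = {
    "robin": prototype_similarity(robin, weights),
    "penguin": prototype_similarity(penguin, weights),
    "bat": prototype_similarity(bat, weights),
}
```

The similarity values form a continuum (here robin, then penguin, then bat), and the category decision follows only once the cut-off is applied.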
The second theory states that category membership is determined on the basis of similarity with earlier encountered exemplars from one's (possibly unconscious) memory. This theory is called the exemplar theory. Two well-known elaborations of this view are the Context Model (Medin & Schaffer, 1978) and the Generalized Context Model (Nosofsky & Palmeri, 1997). It is assumed in these models that for a similarity to be high, it needs to be high on all features, and that high similarities have a larger weight than low similarities. The Generalized Context Model is formulated in a rather general way, using various kinds of free parameters, so that it can accommodate a wide range of phenomena while still having similarities to the category exemplars as its core. Empirical comparisons of the prototype theory and the exemplar theory for category decisions tend to favor the exemplar theory (e.g., Medin & Coley, 1998; Murphy, 2002), including when natural categories are studied (Smits, Storms, Rosseel, & De Boeck, 2002; Storms, De Boeck, & Ruts, 2000).
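A simplified sketch of the multiplicative similarity rule of the Context Model is given below. It assumes binary features, a single mismatch parameter per feature, and two contrasting categories; the stored exemplars and parameter values are hypothetical.

```python
from math import prod

def similarity(probe, exemplar, mismatch):
    """Multiplicative similarity: a matching feature contributes 1,
    a mismatching feature j contributes mismatch[j] (between 0 and 1)."""
    return prod(1.0 if p == e else s for p, e, s in zip(probe, exemplar, mismatch))

def p_category_a(probe, exemplars_a, exemplars_b, mismatch):
    """Probability of choosing category A: summed similarity to A exemplars
    relative to the summed similarity to all stored exemplars."""
    sim_a = sum(similarity(probe, ex, mismatch) for ex in exemplars_a)
    sim_b = sum(similarity(probe, ex, mismatch) for ex in exemplars_b)
    return sim_a / (sim_a + sim_b)

mismatch = [0.2, 0.2, 0.2]               # hypothetical similarity parameters
exemplars_a = [(1, 1, 1), (1, 1, 0)]     # hypothetical stored exemplars
exemplars_b = [(0, 0, 0), (0, 0, 1)]

prob_a = p_category_a((1, 1, 1), exemplars_a, exemplars_b, mismatch)
```

Because the feature contributions are multiplied, a single mismatch already reduces the similarity to 0.2 in this example, and mismatches on all three features reduce it to 0.008, so near-identical exemplars dominate the decision.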
A third theory is not formulated in a formalized way as are the previous two but must be seen as providing an explanation for the shortcomings of these two. This theory is called the knowledge approach (Murphy, 2002; Murphy & Medin, 1985) or the explanation-based theory (Komatsu, 1992). In this theory it is stressed that categories are embedded in a broader knowledge about the world and that this knowledge plays an important role in how we deal with categories and understand them.
Medin and Coley (1998) and Murphy (2002) noted that an important shortcoming of the prototype theory and the exemplar theory is their neglect of feature relations. That the internal structure of categories has been a neglected topic in the study of categories and concepts is not difficult to explain from the basic conjecture by Rosch, Mervis, Gray, Johnson, and Boyes-Braem (1976) that categories pick up correlations between features to maximize the informative value of categorization. Categories are clusters of entities based on the correlations between features in a much larger, between-category space. The implication is that categories explain the correlations away (in a statistical sense, not in a causal sense), so that not much correlation is left within the categories. The conjecture of Rosch et al. (1976) is primarily meant for so-called basic-level categories, not for so-called subordinate and superordinate categories. The association of categories with correlated features (in the between-category space) was empirically corroborated by Devlin et al. (1998) and Tyler, Moss, Dunant-Peatfield, and Levy (2000). Categories defined on the basis of correlated features were found to be more robust against cognitive and neuropsychological deficits--they seem to be stronger categories.
In contrast with the prototype theory and the exemplar theory, an interesting strength of the knowledge approach is that feature relations are recognized--they are part of the knowledge. For example, we know that wings help a lot to fly, so that a correlation between wings and flying is a quite natural cognition. This correlation is primarily based on between-category differences. Some categories of animals fly and have wings (various kinds of insects, bats, etc.), whereas other categories of animals do not fly and do not have wings either (elephants, spiders, snails, humans, etc.). But within-category correlation is also no problem for the knowledge approach. For example, for vegetables there is a correlation between being green and growing above the ground. The correlation is not perfect (for example, if one counts tomatoes as vegetables), but the exceptions are rare. Basic biological knowledge can explain the correlation between the green color and growing above the ground. The role that feature relations play in a knowledge approach is that they are quite natural and explained from knowledge one has about the world. No formal theory about feature correlations is developed within the knowledge approach, however, perhaps because there is no compelling evidence for within-category feature correlations to play a role in explaining typicality and category decisions (Murphy, 2002). The evidence in support of a feature-correlation effect is at best rather weak (Malt & Smith, 1984).
It can be concluded from this short overview that categories are considered heterogeneous in two senses: exemplars differ as to how typical they are of the category (typicality differences), and categories can have an internal structure that strongly deviates from a homogeneous uncorrelated structure (structural differences). The internal structure aspect has been somewhat neglected in the prototype theory and the exemplar theory, but it is stressed in the knowledge approach and in the more linguistic approaches, such as that of Lakoff (1987). Various kinds of internal structures have been described by Storms and De Boeck (1997): one that corresponds to a chainlike structure as described by Lakoff (1987), and another that corresponds to a within-category dimension-like structure: a triangular structure as in a Guttman scale.
Although the cognitive nature of categories and the linguistic meaning of lexicalized categories are not the topic of our investigation, the results we briefly discussed are nevertheless important because the ingredients are the same as for our topic of interest. In all of our studies, we have elements (persons) that are categorized (the manifest categories) on the basis of features (the indicators). The ingredients are the same, but our research question is different. We are not interested in the cognitive representation or the semantic structure of the categories but in their formal representation in a category-like or dimension-like structure. The two kinds of structure do not necessarily coincide. The issue we want to formulate more precisely in order to study it systematically is whether or not manifest categories (categories as assigned) can be represented as nothing more than following from cut-offs along a continuum. It is possible that this formal representation is not reflected in the cognitive representation. It has been speculated, for example, that humans tend to think in terms of internal essences (Medin, 1989), which would tend to predispose them toward category-like mental representations of concepts such as mental disorders, whereas the formal representation is an empirical question that may actually be dimension-like.
An interesting link between our research topic and the one from cognitive psychology is that, for both, two types of continua must be distinguished. The first type describes the typicality differences between category members without an internal structure for the category. To explain the first type, let us assume that all category exemplars are alike in that they show the category features with a probability of, say, .60 and that the features are uncorrelated. This is actually in line with the well-known latent class model (Goodman, 1972; Green, 1952; McCutcheon, 1987). All category members are equal at the latent level in that they share common feature probabilities. The implications of the assumptions are independence of features and heterogeneity of the exemplars in terms of the features. The features are independent within the category, because they are realized through a mechanism that is independent from feature to feature, following the assumption we made. That the features are uncorrelated also means that the category has no internal structure in the sense of within-category correlations between features.
Looking at the realized features, one will notice that the exemplars are heterogeneous, that they have quite different feature patterns, because of the stochastic nature of the feature realization. In fact, the probability for two exemplars to share a given feature is only .36. When a category decision is to be made, one can expect that the exemplars with more of the features (as a stochastic result) will be considered category members with more certainty, and that their typicality will be considered higher than that of exemplars with an accidentally lower number of features. The equivalent of this is the posterior probability of class membership given the feature realizations. This posterior probability continuum does not represent anything at the latent level; it merely picks up a characteristic of the realization of the latent structure. We will call the resulting kind of continuum a purely manifest continuum. It is the illusory effect of a homogeneous process, the same process that leads to independent features and lack of within-category structure.
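The purely manifest continuum can be made concrete with a small two-class sketch. The feature probability of .60 comes from the text; the contrast class with feature probability .20, the equal prior, and the number of features (10) are assumptions added here for illustration only.

```python
def posterior_membership(n_present, n_features, p_in=0.60, p_out=0.20, prior=0.50):
    """Posterior probability of belonging to the category, given how many of
    the independent features are present (the binomial coefficient cancels)."""
    like_in = p_in ** n_present * (1 - p_in) ** (n_features - n_present)
    like_out = p_out ** n_present * (1 - p_out) ** (n_features - n_present)
    return prior * like_in / (prior * like_in + (1 - prior) * like_out)

# Two exemplars realize a given feature independently, each with probability .60.
p_shared = 0.60 * 0.60

# Posterior membership for 0, 1, ..., 10 realized features.
posteriors = [posterior_membership(n, 10) for n in range(11)]
```

Under these assumptions the posterior rises monotonically with the number of realized features, although every member of the category has exactly the same latent feature probabilities.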
Remarkably, the kind of categories just described is in line both with the classical view and with the common belief that categories are gradual and have no clear cut-off, depending on the level at which one looks. All exemplars are alike at the latent level, which is in conformity with the classical view, and the exemplars show heterogeneity at the manifest level, which is in conformity with the now common belief that the classical view is wrong. Only the first of the types of heterogeneity mentioned earlier is realized, however (typicality differences). The aspect that is neglected in prototype theory and exemplar theory is neglected here as well (structural differences). Following the first type of heterogeneity, categories are heterogeneous in that not all exemplars are equally good exemplars, but not so far as the (latent) internal structure is concerned.
To explain the second kind of continuum, let us assume that the exemplars differ in the true probabilities of showing the category features. Suppose the probabilities are again high but that they depend on the exemplar (in the range from, say, .60 to .90), and that the feature realization mechanism is again independent from feature to feature. The exemplars are now heterogeneous at the latent level, because some have higher feature probabilities than others. The consequences of these assumptions are correlated features and even more heterogeneous feature patterns. The features are all positively correlated now because they all tend to occur more in some exemplars (because of their higher probability) and less in other exemplars (because of their lower probability). These correlations stem from differences in probability, notwithstanding the independence of the realization mechanism, which is called local or conditional independence in the statistical literature and is a basic assumption in most statistical models. The resulting categories now have an internal structure, a one-dimensional structure. When we make the more realistic assumption that not just the exemplars but also the features have an effect on the probability that an exemplar shows the feature, then the stochastic version of the earlier described triangular structure would be obtained. For example, for psychiatric diagnoses it would mean that some symptoms have higher probabilities than other symptoms. Mild symptoms commonly have a higher probability than severe symptoms. When patients differ in a systematic way, some patients may have the more severe symptoms as well as the milder ones, whereas others may have only the milder symptoms. A one-dimensional internal structure can be a rather good approximation of reality, for example for the borderline personality disorder. For example, Sanislow et al. (2002) showed that three latent dimensions underlay the borderline symptoms from the DSM-IV, but also that the intercorrelations of these dimensions are higher than .90, and can reach even .99.
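That latent heterogeneity produces correlated features despite local independence can be verified with an exact calculation. The sketch assumes, purely for illustration, four equally frequent types of exemplars whose feature probabilities are either all .60 or spread over .60, .70, .80, and .90, and two features with the same probability within an exemplar.

```python
def feature_correlation(probabilities):
    """Exact correlation between two features that are locally independent
    given each exemplar's probability, for equally likely exemplar types."""
    n = len(probabilities)
    mean_p = sum(probabilities) / n
    # P(both features present) averages p * p over the exemplar types.
    mean_p_squared = sum(p * p for p in probabilities) / n
    covariance = mean_p_squared - mean_p ** 2
    variance = mean_p * (1 - mean_p)
    return covariance / variance

homogeneous = feature_correlation([0.60, 0.60, 0.60, 0.60])
heterogeneous = feature_correlation([0.60, 0.70, 0.80, 0.90])
```

With identical probabilities the correlation is zero, whereas the spread from .60 to .90 yields a positive correlation of about .07. The covariance equals the variance of the latent probabilities across exemplars, which is why any latent heterogeneity of this kind induces a positive correlation.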
Looking at the realized feature patterns, three sources of differences now come into play. First, the exemplars differ randomly because of the stochastic nature of the feature realization. Second, the exemplars differ systematically because of the level of the generating probabilities. The number of category features an exemplar now shows reflects both the stochastic nature of the process and a systematic difference at the latent level. Third, the features can also have an effect. The second and the third source determine the probability a feature has for a given exemplar. This probability reflects something about the exemplar (how high its probabilities are overall) and something about the feature (how common it is). It is then possible to separate and estimate the contribution from the three sources: the stochastic source, systematic differences between exemplars, and systematic differences between features.
This idea of separating and estimating the three parts is exactly the idea behind a model from a quite different domain, called item response theory (IRT), as will be explained later. The newly derived continuum for the exemplars, their overall level of probability, is no longer a surface continuum or an illusory continuum--it is rooted in the underlying latent structure. The number of features is still a manifest continuum, but now it expresses more than a stochastic mechanism. It also reflects systematic underlying differences between the exemplars--the latent contributions of the exemplars to the feature probabilities. The continuum of the systematic underlying differences is not a manifest continuum but a latent continuum. It corresponds to the earlier mentioned second type of heterogeneity: structural differences.
This second formal theory of categories, which implies a latent continuum, is no longer in agreement with the classical view, because the exemplars are no longer homogeneous at either the manifest level or the latent level. The theory is in clear agreement, however, with the now common belief that cognitive categories are gradual and have no clear cut-off. Furthermore, both types of heterogeneity described earlier are now realized. Categories are heterogeneous not only in that not all exemplars are equally good exemplars but also due to the internal structure. This kind of within-category structure can be linked to the notion of "fuzzy categories," as discussed by Haslam and Kim (2002) and as tested empirically with taxometric methods by Haslam and Cleland (2002). It should be clear that what we mean by a latent continuum is variation at the latent level and not just at the manifest level. From the way the fuzziness is created by Haslam and Cleland (2002), it can be concluded that this condition is fulfilled.
Thus, one way of framing the issue of whether or not a category is basically dimension-like is by asking the question whether categories have a latent continuum or just a purely manifest continuum. The results of the studies on categories and concepts cannot answer this question so far as the cognitive representation is concerned, because, as explained, differences in how good exemplars are as exemplars and other effects can either stem from the stochastic nature of a homogeneous latent process or from a genuinely heterogeneous latent process and a similar stochastic component as for the homogeneous process.
Hereafter, in Section 2, a frame of reference is described for what it means for the latent structure behind a manifest category to be homogeneous or heterogeneous and for manifest categories to indicate qualitative differences or quantitative differences. Along with this frame of reference comes an approach for modeling data and deciding in what sense their structure is category-like or dimension-like. In Section 3, three empirical applications are described to illustrate the approach.
When the manifest categories are homogeneous, the qualitative differences cannot concern the discrimination of the indicators, because there is nothing to discriminate within the category. Only the indicator locations remain as a potential source of qualitative differences. Given that the locations refer to the levels of the indicators, qualitative differences imply that the indicator level profiles differ from one manifest category to another in more than just the overall level. For example, the symptom profile of the histrionic personality disorder may differ from that of the borderline personality disorder in a qualitative way and not just with respect to its overall lower level of borderline symptoms.
In the case of within-category heterogeneity, quantitative differences between manifest categories mean that the latent dimension is the same (same discriminations and/or locations) when applied to members of different manifest categories, and that the distribution of one manifest category is located at a lower level than the distribution of the other category on the same dimension. In the case of homogeneity (no variance in person locations), quantitative differences mean that the common category level of the indicator profiles differs depending on the manifest category. For example, it would be reasonable to expect that the preponderance of borderline symptoms is higher in the borderline category than in the histrionic category. The difference is that the quantitative differences can be explained as one manifest category having more or less of the same thing as the other, whereas qualitative differences never can be explained in this way. Qualitative differences concern the anchoring of dimensions with indicators (with respect to discriminations and/or locations): Differently anchored dimensions are different. Considering the contrast between quantitative and qualitative between-category differences, qualitative differences may be considered more category-like than quantitative differences and quantitative differences more dimension-like than qualitative differences.
These two contrasts--heterogeneity versus homogeneity and qualitative versus quantitative differences--can be crossed, as in Figure 1, to make a 2 x 2 classification. This classification is the framework that will later be used to explicate the relation between a category-like versus dimension-like latent structure for manifest categories.
In the upper left part, two different latent dimensions are shown, one for each of two different heterogeneous manifest categories. The heterogeneity is represented with a normal distribution for each category, although normality of the distributions is not required. In the upper right part, the heterogeneity is represented along one common latent dimension. The difference between the two manifest categories is either large (and abrupt) or small (and smooth), as will be explained hereafter. In the lower left part, two different latent dimensions are again shown, one for each manifest category, but now there are no individual differences within the manifest categories. The within-category homogeneity is represented with a narrow bar. Finally, in the lower right part, the two manifest categories are again located along one common latent dimension, but now the two manifest categories are homogeneous, as represented with two bars. Given that in the two lower parts the manifest categories are homogeneous, the between-category differences are abrupt, as will be explained hereafter.
In the upper right part of Figure 1, two pairs of normal distributions are shown. The distributions on the left are rather far apart, far enough for the distributions to result in a bimodal distribution when they are added into a joint distribution. The distributions on the right are close enough in order to result in a unimodal joint distribution.
The contrast between smooth and abrupt differences cannot be considered a fundamental dichotomy in the way that the heterogeneous/homogeneous and qualitative/quantitative dichotomies are. The smooth/abrupt dichotomy is only relevant for quantitative between-category differences and is entirely based on the size of the difference between manifest categories relative to their within-category standard deviations, and it is therefore of a gradual kind. But even when there is a gap between the distribution of two manifest categories along the latent continuum, the distinction between the two is still purely quantitative and can be expressed in terms of more or less of the same thing.
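The gradual nature of the smooth/abrupt contrast can be illustrated by counting the modes of a joint distribution. The sketch below assumes two equally sized categories with normal distributions and a common standard deviation; the grid search and the function names are illustrative choices, not part of the framework.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean, sd):
    """Density of a normal distribution."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))

def count_modes(mean_a, mean_b, sd, steps=2000):
    """Count local maxima of the equal-weight mixture on a fine grid."""
    lo = min(mean_a, mean_b) - 4 * sd
    hi = max(mean_a, mean_b) + 4 * sd
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    dens = [0.5 * normal_pdf(x, mean_a, sd) + 0.5 * normal_pdf(x, mean_b, sd)
            for x in xs]
    return sum(1 for i in range(1, steps) if dens[i - 1] < dens[i] > dens[i + 1])

far_apart = count_modes(0.0, 3.0, 1.0)   # means three standard deviations apart
close = count_modes(0.0, 1.0, 1.0)       # means one standard deviation apart
```

In this simplified equal-weight, equal-variance case, the joint distribution becomes bimodal only when the means differ by more than two standard deviations, so the same quantitative difference can appear abrupt or smooth depending on its size.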
The notation for raw scores is as follows:
$X_{pik} \in \{0, 1\}$, with
$p = 1, \ldots, P$, an index for the persons,
$i = 1, \ldots, I$, an index for the indicators, and
$k = 1, \ldots, K$, an index for the manifest category to which a person belongs.
When the indicators are symptoms, $X_{pik} = 1$ means that person $p$ from category $k$ is attributed indicator $i$. The notation for the manifest categories is $C_p = k$, meaning that person $p$ is assigned to manifest category $k$. In this model, persons are nested within manifest categories.
The manifest category, C, can be a random variable, or it can have fixed values. In a similar way, the parameters from the model to be presented can be either random or fixed. By convention, in formulas we will not condition on C or on parameters, as the conditioning makes sense only for random variables. The fact that the formulas are not given in their conditional format does not imply, however, that C or one or more parameters cannot be random variables.
$$P(X_{pik} = 1) = \frac{\exp(b_{ik})}{1 + \exp(b_{ik})}, \tag{1}$$
$$h_{pik} = b_{ik}, \tag{2}$$
where $h_{pik} = \log(P / Q)$, with $P$ denoting the probability $P(X_{pik} = 1)$ and $Q = 1 - P$.
It follows from Equation 2 that the b's are nothing more than logit (log-odds) transformations of the probabilities of showing symptom $i$ in category $k$: $b_{ik} = \log(P / Q)$. These transformed probabilities can be represented as the locations of the indicators that can function as anchors on a possible latent dimension. Thus far, all persons have the same set of probabilities. The b's will also be called prevalences, as they indicate the occurrence of symptoms.
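The transformation between symptom probabilities and indicator locations in Equations 1 and 2 can be sketched as follows. The symptom probabilities are hypothetical values for a single category.

```python
from math import exp, log

def logit(prob):
    """Equation 2 read backwards: b = log(P / Q), with Q = 1 - P."""
    return log(prob / (1 - prob))

def inverse_logit(b):
    """Equation 1: P(X = 1) = exp(b) / (1 + exp(b))."""
    return exp(b) / (1 + exp(b))

# Hypothetical probabilities of three symptoms within one manifest category.
symptom_probabilities = [0.80, 0.50, 0.20]
locations = [logit(p) for p in symptom_probabilities]
recovered = [inverse_logit(b) for b in locations]
```

A probability of .50 maps onto a location of 0, and probabilities of .80 and .20 map onto locations of equal size and opposite sign, which shows that the locations carry the same information as the probabilities on an unbounded scale.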
Person differences can be introduced into Equation 1 by substituting $q_{pk} - b_{ik}$ for $b_{ik}$, with $q_{pk}$ denoting the parameter of person $p$ from category $k$. This substitution locates the persons on the same scale as the indicators (for example, the patients on the same scale as the symptoms), so that the difference between the location of person $p$ and indicator $i$ determines the probability of a response 1 for a person in category $k$:
$$P(X_{pik} = 1) = \frac{\exp(q_{pk} - b_{ik})}{1 + \exp(q_{pk} - b_{ik})}, \tag{3}$$
$$h_{pik} = q_{pk} - b_{ik}. \tag{4}$$
With the inclusion of a q parameter, the values of the b's are identified only up to an additive constant--one can add a constant to all b's on the condition that the same constant is added to all q's. Note that the minus sign in Equations 3 and 4 is in a way arbitrary--it could as easily be a plus sign, but the minus sign is the usual convention.
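The indeterminacy of the scale origin can be checked numerically. In the sketch below, the person and indicator locations and the size of the shift are hypothetical values.

```python
from math import exp

def rasch_probability(q, b):
    """Equation 3: probability of a 1 given person location q and
    indicator location b."""
    return exp(q - b) / (1 + exp(q - b))

persons = [-1.0, 0.0, 1.5]       # hypothetical q values
indicators = [-0.5, 0.3, 1.2]    # hypothetical b values
shift = 2.0                      # arbitrary constant added to all q's and b's

original = [[rasch_probability(q, b) for b in indicators] for q in persons]
shifted = [[rasch_probability(q + shift, b + shift) for b in indicators]
           for q in persons]
```

The two probability tables coincide up to rounding error, because only the differences between person and indicator locations enter the model.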
For the example of personality disorders, qpk reflects the severity of person p on the latent dimension as it applies to diagnosis k, and -bik reflects the prevalence of symptom i for diagnosis k. They both contribute to hpik and the probability of a 1. One further complication is that the severity is not equally important for all symptoms. To reflect this difference, the equation is adapted as follows:
P(Xpik = 1) = exp(aikqpk - bik) / (1 + exp(aikqpk - bik)), (5)
hpik = aikqpk - bik , (6)
where aik denotes the weight of qpk in determining the probability of the Xpik values.
Equations 5 and 6 represent the two-parameter logistic (2PL) model (Birnbaum, 1968) for a manifest category k and constitute the most general model that we will use to illustrate Dimcat. Note that in the formulation of the 2PL as in Equation 6, a category-specific linear regression equation is obtained for each indicator, with the underlying within-category dimension as the predictor, aik as its weight, and -bik as the intercept. See the Appendix for an alternative parameterization.
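As a minimal sketch of Equations 5 and 6 (function names and parameter values are hypothetical), the logit is linear in q with slope a and intercept -b, and fixing a at 1 gives back Equation 3.

```python
import math

def eta_2pl(q, a, b):
    """Equation 6: logit as a linear function of the person location q."""
    return a * q - b

def p_2pl(q, a, b):
    """Equation 5: probability of a 1 under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-eta_2pl(q, a, b)))

a_ik, b_ik = 1.8, 0.4                                     # hypothetical indicator parameters
intercept = eta_2pl(0.0, a_ik, b_ik)                      # equals -b_ik
slope = eta_2pl(1.0, a_ik, b_ik) - eta_2pl(0.0, a_ik, b_ik)  # equals a_ik

# With a = 1 the model reduces to Equation 3.
p_rasch = p_2pl(0.9, 1.0, b_ik)
```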
All other models that we will consider follow from restrictions on Equation 6. This model is general in the sense that it can generate all cases in the framework of Table 1 by choosing the appropriate restrictions.
Before formulating these restrictions on Equation 6 in order to obtain distinct latent structures, we will make use of that equation to characterize two important but distinct features of a dimension: location equivalence and discrimination equivalence. Both types of equivalence are necessary for two dimensions to be identical (i.e., for dimension equivalence). A latent dimension is defined by the location of the indicators and, if individual differences appear, then also by the weights of the indicators. A difference between manifest categories in either the location of indicators or their weights, or in both, means that the dimensions differ, unless the difference can be attributed to varying reliability. This is a special case that we will not discuss here any further, but it will be taken into account in the Appendix.
1. Equivalent dimensions must have equal locations for the indicators. This property will be called location equivalence. Because the location parameters are identified only up to an additive constant, location equivalence refers to equality of the location parameters only up to an additive constant, implying that the differences between the indicator locations on the latent dimension are crucial for location equivalence. If marks on the scale do not correspond, then the meaning of the dimensions also differs. This first aspect of a latent dimension is independent of individual differences among persons.
2. Equivalent dimensions must have indicators with weights (discriminations) that do not depend on the manifest category. Equality of discrimination parameters is called discrimination equivalence. If the differentiation capacity of an indicator depends on the manifest category, then the meaning of the dimensions differs between the manifest categories. Note that the discriminations are identified only up to a multiplicative constant: multiplying the discriminations by a constant is compensated by dividing the variance of the underlying dimension by the squared value of the same constant. This second aspect of a dimension makes sense only if there are individual differences among persons, because the a's are the weights of latent individual differences (in terms of q).
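The two definitions can be expressed as simple checks on parameter values. This is only a sketch under stated assumptions: it compares known (not estimated) parameters of two manifest categories, it assumes nonzero discriminations, and the function names and tolerance are hypothetical. In empirical work the equivalences are tested statistically, not by exact comparison.

```python
def location_equivalent(b_k, b_k2, tol=1e-8):
    """True if two sets of indicator locations differ only by an additive constant."""
    diffs = [x - y for x, y in zip(b_k, b_k2)]
    return max(diffs) - min(diffs) <= tol

def discrimination_equivalent(a_k, a_k2, tol=1e-8):
    """True if two sets of discriminations differ only by a multiplicative constant."""
    ratios = [x / y for x, y in zip(a_k, a_k2)]
    return max(ratios) - min(ratios) <= tol

# Same spacing between indicators, shifted by a constant: location equivalent.
same_spacing = location_equivalent([0.0, 1.0, 2.0], [0.5, 1.5, 2.5])
# Two indicators swap places: the marks on the scale no longer correspond.
swapped = location_equivalent([0.0, 1.0, 2.0], [0.0, 2.0, 1.0])
# All weights doubled: discrimination equivalent.
rescaled = discrimination_equivalent([1.0, 2.0], [2.0, 4.0])
```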
The notions of location equivalence and discrimination equivalence are related to the notions of factorial equivalence, measurement invariance, and differential item functioning (DIF). Factorial equivalence is of relevance here, because Takane and de Leeuw (1987) showed that the factor model results when the normal-ogive function is used in place of the logistic function used above, and because the two functions are practically identical except for a different slope. Often in factor analysis one is not interested in the means, and the model is then formulated for within-category deviation values (with a mean of zero), so that factorial equivalence is limited to the factor loadings. We will refer to this notion of factorial equivalence as factorial equivalence in the limited sense. Both Reise et al. (1993) and Meredith (1993), however, pointed out that the full factor model includes an explanation for the means, so that factorial equivalence in this broader (and full) sense includes location equivalence as well. Reise et al. (1993) distinguished between full invariance and partial invariance (see Byrne, Shavelson, & Muthén, 1989). Both are related to the factor loadings, independently of the factor variances and covariances. Full invariance means category-invariant loadings for all variables, whereas partial invariance implies that a substantial number of the loadings are invariant so that a common metric can still be used. For binary indicators, the factor analytic or structural equation model for binary items would be equivalent to an IRT model but of the normal-ogive type instead of the logistic type (Bock, Gibbons, & Muraki, 1988; Muthén, 1984). A logistic variant is described by McKinley and Reckase (1983).
As to measurement invariance, Reise et al. (1993) referred to the same two aspects we have discerned in dimension equivalence. Meredith (1993) started from a definition stating that the cumulative distribution function of the measurement indicators may not depend on external factors beyond the underlying latent variables one assumes to explain the indicators. Simply stated, the measurement of intelligence, for example, may depend only on intelligence and not also on external factors, such as one's ethnicity. Invariance refers to all aspects of the cumulative distribution (expected value, variance, and higher moments), and implies both location and discrimination equivalence.
Lack of location equivalence is called uniform DIF in test theory, and lack of both location equivalence and discrimination equivalence is called non-uniform DIF (e.g., Holland & Wainer, 1993). Methods to detect DIF are described in the literature (Holland & Wainer, 1993; Millsap & Everson, 1993), but some of the DIF tests do not distinguish between unequal locations and discriminations.
In summary, discrimination equivalence refers to the indicator-specific slope of the equation (aik). Location equivalence refers to the indicator-specific intercept of the equation (-bik). Location equivalence is sometimes not investigated in empirical studies in the literature, because the factor model is used not in its full formulation but rather for deviation-transformed variables (Reise et al., 1993).
1. In the first type of latent structure (corresponding to the upper left cell in Table 1), the latent dimensions are qualitatively different depending on the manifest category, and the persons are heterogeneous within manifest categories. An example would be categories of athletes defined on the basis of the kind of sport, with performance levels as indicators. These categories would be between-sports categories with performance indicators. Within each category there are clear and systematic quantitative differences in their performances, and from one category to the other there are qualitative differences in the kind of performances at which they are good.
As far as the modeling is concerned, no restrictions on Equation 6 are introduced: for k and k', the bik are allowed to differ from the bik', and the aik are allowed to differ from the aik'. This first type will serve as the reference type in the presentation of the other types, given that all others can be defined as restrictions on this one. In this first type, there is continuity within each qualitatively distinct category. Because the latent dimension differs depending on the manifest category k, the differences between manifest categories are qualitative. Both the indicator locations (bik) and the indicator discriminations (aik) are allowed to be category-specific. Because individual differences among persons, as expressed in qpk, are allowed, the manifest categories are heterogeneous. A special case is one with category-specific locations but common discriminations: the locations of the indicators are category-dependent, but their discriminative power is not. Note that this type of difference would not be identified if factorial equivalence in the limited sense were the only criterion used to detect qualitative differences.
2. In the second type of latent structure (corresponding to the upper right cell in Table 1), the latent dimensions are quantitatively different depending on the manifest category, and the persons are heterogeneous within manifest categories. An example would be the categorization into a professional and a non-professional category of athletes within the same sport. One can expect that both professionals and non-professionals differ in how well they perform at various contests, but the professionals would be clearly better overall than the non-professionals. These manifest categories would be within-sport categories with performance indicators.
The second type of latent structure differs from the first in only one respect: For any pair of manifest categories, k ≠ k', the location of the manifest categories may differ only along a common underlying dimension. As a result of the absence of qualitative differences, all b's and all a's of each of the indicators are equal over manifest categories: bi1 = ... = bik = ... = biK = bi, and ai1 = ... = aik = ... = aiK = ai. The second type can be formulated as follows:
hpik = aiqpk - bi (7)
with bi denoting the common location parameters, with ai denoting the common discrimination parameters, and with μqk denoting the mean of the q's in manifest category k, where μqk ≠ μqk' for k ≠ k'.
Depending on how the manifest categories are distributed along the dimension, the differences between the manifest categories may be abrupt or smooth. There is no clear-cut criterion to distinguish between smoothness and abruptness, but two criteria that are often associated with abrupt differences are lack of overlap and bimodality. As discussed above, these criteria are less straightforward than one might think.
First, much depends on the kind of distribution one wants to assume for the two manifest categories. For example, lack of overlap can also look perfectly smooth, as when persons within each manifest category are distributed uniformly and the two distributions touch but do not overlap. Second, much depends on whether one looks at the manifest level or the latent level. For example, Grayson (1987) showed that, depending on the discriminations and on the locations of the indicators, it is possible for a bimodal distribution of sum scores to result from a unimodal distribution of person locations (q's). Although Grayson did not demonstrate the opposite--how a unimodal distribution of sum scores can result from a bimodal distribution of person locations (q's)--this is possible as well. All depends on the locations and discriminations.
3. In the third type of latent structure (corresponding to the lower left cell in Table 1), the latent dimensions are qualitatively different depending on the manifest category, and the persons are homogeneous within manifest categories. This type of latent structure differs from the first in only one respect: the manifest categories are homogeneous in their latent structure. An example would be the categories of athletes defined on the basis of their knowledge of the basic rules of the sport they practice. Within each category there is homogeneous knowledge of the basic rules (they all know the basic rules), although when questioned one may give a wrong answer now and then. The differences between the categories are qualitative in that the athletes differ in the kind of rules they know depending on the sport they practice. These categories are between-sport categories with rule knowledge indicators.
As a result of the homogeneity restriction, all q's within the same category are equal: qpk = qp'k = qk for all pairs of persons p and p' and for all values of k. In this type, the manifest categories do not have any dimension-like character: they are qualitatively different between categories and perfectly homogeneous. There is still an ordering possible for the indicators, but this means nothing more than that the probability of a certain response for a given indicator is different than the probability for other indicators. Note that when there are no individual differences within a manifest category, there is no longer any basis for using a discrimination parameter. The third type can therefore be formulated as follows:
hpik = qk - bik (8)
where for any pair of manifest categories, k ≠ k', the bik may differ from the bik', with qk denoting the location of all persons p with Cp = k.
4. In the fourth type of latent structure (corresponding to the lower right cell in Table 1), the latent dimensions are quantitatively different depending on the manifest category, and the persons are homogeneous within manifest categories. This type of latent structure differs from the first in two respects: the manifest categories are homogeneous (like the third type), and the differences between the manifest categories are quantitative (like the second type). An example would be the categories of persons who do versus do not play chess. Those who play chess would know all the basic rules, and those who do not would also be rather homogeneous in their lack of knowledge. They may guess and be correct on some of the rules, but no major systematic differences would exist. So the difference between the two categories is quantitative. The former category simply has a much higher knowledge than the latter. These categories are within-sport categories with rule knowledge indicators.
As a result, all person locations (q's) within the same manifest category are equal: qpk = qp'k = qk for all pairs of persons p and p', and for all values of k, as in the third type; all indicator locations (b's) are also equal: bi1 = ... = bik = ... = biK = bi. Again there is no basis for using a discrimination parameter. In this fourth type, homogeneous manifest categories are located within a latent dimension. The fourth type can be formulated as follows:
hpik = qk - bi (9)
with bi denoting the common location parameters.
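The way in which Equations 7, 8, and 9 follow from Equation 6 can be sketched by imposing the restrictions on a set of known parameter values. All values and names below are hypothetical. Taking the first category's values as the common ones and placing every person at the category mean are arbitrary illustrative choices; this is not an estimation procedure.

```python
def restrict(a, b, q, qualitative, heterogeneous):
    """Impose the restrictions of one latent structure on a[k][i], b[k][i], q[k][p]."""
    n_cat = len(b)
    if not qualitative:
        # Location and discrimination equivalence: one common set of values.
        a = [list(a[0]) for _ in range(n_cat)]
        b = [list(b[0]) for _ in range(n_cat)]
    if not heterogeneous:
        # No individual differences: every person sits at the category location,
        # and the discrimination parameter drops out (fixed to 1 here).
        q = [[sum(qk) / len(qk)] * len(qk) for qk in q]
        a = [[1.0] * len(ak) for ak in a]
    return a, b, q

def eta(a, b, q):
    """eta[k][p][i] = a[k][i] * q[k][p] - b[k][i] (Equation 6)."""
    return [[[a[k][i] * qp - b[k][i] for i in range(len(b[k]))]
             for qp in q[k]] for k in range(len(b))]

# Hypothetical values: two manifest categories, two indicators, three persons each.
a = [[1.0, 1.5], [0.7, 2.0]]
b = [[-0.5, 0.8], [0.4, -1.0]]
q = [[-1.0, 0.0, 1.0], [0.5, 1.5, 2.5]]

# (qualitative differences?, heterogeneous categories?) for Types 1 to 4.
types = {1: (True, True), 2: (False, True), 3: (True, False), 4: (False, False)}
eta_by_type = {t: eta(*restrict(a, b, q, *flags)) for t, flags in types.items()}
```

In the Type 4 result, all persons within a category share one row of logits, and the two categories differ by the same amount on every indicator, which is what a purely quantitative difference between homogeneous categories amounts to.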
The only type of latent structure that is thoroughly category-like is the homogeneous qualitative difference structure (Type 3). All other structures are at least partly dimension-like. The heterogeneous quantitative difference structures (Type 2) are thoroughly dimension-like if the differences between manifest categories are smooth. If the differences between manifest categories are abrupt, meaning that each manifest category has a distribution that is different enough along the single latent dimension, then heterogeneous quantitative differences are a hybrid structure. The second type of hybrid structure is the heterogeneous qualitative difference structure (Type 1), which is category-like because the differences between manifest categories are qualitative yet is also dimension-like because there is heterogeneity of persons within a descriptive dimension for each manifest category. The third type of hybrid structure is the homogeneous quantitative difference structure (Type 4), which is category-like because there is homogeneity of persons within the manifest categories yet is also dimension-like because the manifest categories differ as to their locations on a common descriptive dimension.
The features that define what it means to be category-like versus dimension-like can be realized to a stronger or weaker degree. That is, within-category homogeneity, between-category qualitative differences, and abrupt between-category differences can be small or large. First, for within-category homogeneity to be small or large means that the within-category variance is large or small, respectively. Second, what it means for between-category differences to be small or large is simple where the differences are quantitative: Small versus large differences correspond, respectively, to small versus large Cohen's d values (a standardized effect size measure), given that the distributions are normal (or symmetrical). The extent of qualitative differences is more complex. Qualitative differences between manifest categories are complex when they are not restricted to a few indicators or to a few principles. For locations, qualitative differences can be thought of as jumps of indicators on the descriptive dimension when going from one manifest category to another. When the jumps can be summarized with a few saltus parameters (when only a few indicators jump, or when groups of indicators each jump over the same distance), the differences are simple. When many saltus parameters are required, the differences are complex. Third, whether abrupt differences are small or large depends on the size of the two previous types of heterogeneity and on the distributional properties (e.g., bimodality, degree of overlap). We stress here that the differences between the four types of latent structures are gradual and not absolute. This is completely in line with the overall idea behind the framework that being categorical is not itself categorical.
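For the quantitative case, Cohen's d can be computed as in the following sketch. The pooled-standard-deviation variant is assumed here, and the two sets of person locations are hypothetical.

```python
import math
import statistics

def cohens_d(x, y):
    """Standardized mean difference between two groups, using the pooled SD."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(y) - statistics.mean(x)) / math.sqrt(pooled_var)

# Hypothetical person locations in two manifest categories.
category_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
category_2 = [3.0, 4.0, 5.0, 6.0, 7.0]
d = cohens_d(category_1, category_2)
```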
For the within-category aspects as well as for the between-category aspects, the methodology is necessarily relative, even when the "truth" would be absolute, because the methodology is always limited. The results depend on the choice of indicators and of alternative manifest categories. In order to study within-category homogeneity versus heterogeneity, a set of indicators is needed. For personality disorder categories, symptoms are an evident choice, but a difficult issue is how one can make sure that all relevant symptoms are included. For other types of manifest categories, it is often less evident what kind of indicators should be used. The choice of indicators (features) is also a difficult issue in the cognitive study of concepts and categories (Murphy, 2002, pp. 45-46). In general, one can never be certain whether the crucial indicators are included in the study. On the other hand, an a fortiori type of reasoning applies. If, for the indicators that are chosen, within-category heterogeneity is found, then one can conclude against homogeneity. The a fortiori argument is that the manifest category will remain heterogeneous when other indicators are added. If, however, a manifest category turns out to be homogeneous, then the conclusion can change if other indicators are added, given that these new indicators may reveal the heterogeneous nature of the manifest category.
For the between-category aspects, the conclusions may also depend on the choices one makes. The latent structure may be category-like in contrast with one alternative manifest category (for example, borderlines in contrast with normals) but not in contrast with a second alternative manifest category (for example, borderlines in contrast with histrionics). When there is more than one manifest category, another complication is which indicators one should consider: indicators of one of the manifest categories (and of which one?) or indicators of all manifest categories. For example, one may investigate the manifest categories "borderlines versus histrionics" with a set of borderline symptoms as indicators, with a set of histrionic symptoms as indicators, or with a set comprising both. The result may depend on which set of indicators is being used.
One cannot give an absolute answer to the general question whether the borderline personality disorder is category-like or dimension-like, because the answer may depend on the methodology: the indicators and the other groups one wants to consider (e.g., normals? which other personality disorders?). Thus, being category-like is not only a matter of degree, it is also relational. If only one manifest category is considered in isolation, then the relational character is less evident, because as soon as heterogeneity is found, the conclusion is that the manifest category is heterogeneous. If, however, a manifest category is studied in the context of other manifest categories, then the position on the horizontal axis of the framework depends on which other manifest categories are considered. It may turn out that a diagnosis A shows qualitative differences with diagnosis B, but only quantitative differences with diagnosis C. A general conclusion is not possible in that case, only a relative one: In relation to diagnosis B, diagnosis A shows qualitative differences, but not in relation to diagnosis C.
The horizontal axis: Quantitative differences versus qualitative differences. Qualitative differences can be of two types: differences in discrimination and differences in location. Discrimination equivalence and location equivalence are two ways in which qualitative differences can be lacking and are restrictions on Equation 6 (or Equation A1 in the Appendix). The following order of analyses is proposed. First, the general model of Equation 6 as parameterized in Equation A1 of the Appendix will be estimated without any restrictions. This model is called QUAL1&2-HET, because it describes qualitative differences of both types (QUAL1&2), and because the manifest categories are allowed to be heterogeneous. Second, the discriminations are restricted to be equal over the manifest categories (discrimination equivalence), yielding model QUAL2-HET, because qualitative differences are allowed only for the locations. The QUAL1&2-HET and QUAL2-HET models are variants of a Type 1 structure. Third, the locations are restricted to be equal over manifest categories (location equivalence), yielding model QUAN-HET, because only quantitative differences remain. The QUAN-HET model is a Type 2 model.
Heterogeneous quantitative differences (Type 2) are nested within heterogeneous qualitative differences (Type 1), and within Type 1, QUAL2-HET is nested within QUAL1&2-HET. We have chosen to estimate three models of decreasing complexity: QUAL1&2-HET, QUAL2-HET, and QUAN-HET, omitting the fourth possible model: QUAL1-HET, a model with location equivalence without discrimination equivalence. We believe it makes sense to restrict the discriminations first, because their estimation is less reliable than the estimation of the locations. Models QUAL1-HET (which will not be tested) and QUAL2-HET are not nested in one another.
In case of within-category homogeneity, the same logic applies, but now for homogeneous models (HOM instead of HET). The homogeneous models parallel their heterogeneous equivalents with respect to which models are nested in one another. The QUAL1&2-HOM and QUAL2-HOM models are variants of the Type 3 structure, whereas QUAN-HOM corresponds to Type 4.
The vertical axis: Heterogeneity versus homogeneity. As for the investigation of heterogeneity versus homogeneity, with the restriction of qpk to have zero variance for all values of k, models that parallel the heterogeneous ones are obtained: QUAL1&2-HOM, QUAL2-HOM, and QUAN-HOM. The homogeneous models are nested within their heterogeneous counterparts. Homogeneous qualitative differences (Type 3) are nested within heterogeneous qualitative differences (Type 1), and homogeneous quantitative differences (Type 4) are nested within heterogeneous quantitative differences (Type 2). Note that it is possible that one of the manifest categories is homogeneous and the other is not. This is not a serious complication, as it would mean for example that σ1² ≠ 0, whereas σ2² = 0 (with σk² denoting the variance of qpk within manifest category k), which is a less severe restriction than when both variances are restricted to zero.
In a preliminary and exploratory investigation of heterogeneity, one can use an internal consistency index, Cronbach's alpha, in each manifest category. High values of this coefficient are an indication of heterogeneity. Low values, however, can have two, possibly combined, causes: low heterogeneity and multidimensionality. Cronbach's alpha can be tested for statistical significance and thus can also be used in a hypothesis-testing approach.
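The coefficient can be computed per manifest category as in the following sketch. The small binary data set is hypothetical and deliberately extreme: all indicators agree perfectly within each person, so that persons differ strongly and consistently.

```python
import statistics

def cronbach_alpha(data):
    """Cronbach's alpha for data given as a list of persons, each a list of indicator scores."""
    n_items = len(data[0])
    item_vars = [statistics.variance([row[i] for row in data]) for i in range(n_items)]
    total_var = statistics.variance([sum(row) for row in data])
    return n_items / (n_items - 1) * (1.0 - sum(item_vars) / total_var)

# Hypothetical responses of four persons to three binary indicators.
consistent = [[0, 0, 0], [1, 1, 1], [0, 0, 0], [1, 1, 1]]
alpha_consistent = cronbach_alpha(consistent)
```

Note that the coefficient is undefined when the sum scores have zero variance, which is the limiting case of a perfectly homogeneous category in which all persons obtain the same sum score.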
Smooth versus abrupt differences. To distinguish within the top right cell of Figure 1 between smooth versus abrupt differences, we will simply plot the distributions of the q in the different manifest categories in order to inspect the joint distribution for multimodality. One should not just look at the plots without also taking the model estimation into account, however, because what appears as smooth may actually be a discrete latent process, as will be illustrated in the second application.
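The caution about relying on plots alone is related to the point attributed to Grayson (1987) earlier: the shape of a manifest sum-score distribution need not match the shape of the latent distribution. The sketch below assumes a standard normal (unimodal) distribution of q, ten hypothetical indicators with a common location, and a simple fixed grid for the numerical integration; all names and values are assumptions made for illustration.

```python
import math

def sum_score_pmf(a, b, n_grid=201, lo=-6.0, hi=6.0):
    """Marginal distribution of the sum score when q follows a standard normal."""
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + j * step for j in range(n_grid)]
    weights = [math.exp(-0.5 * q * q) for q in grid]
    total_w = sum(weights)
    pmf = [0.0] * (len(b) + 1)
    for q, w in zip(grid, weights):
        # Distribution of the sum score given q, built up one indicator at a time.
        cond = [1.0]
        for a_i, b_i in zip(a, b):
            p = 1.0 / (1.0 + math.exp(-(a_i * q - b_i)))
            new = [0.0] * (len(cond) + 1)
            for s, pr in enumerate(cond):
                new[s] += pr * (1.0 - p)
                new[s + 1] += pr * p
            cond = new
        for s, pr in enumerate(cond):
            pmf[s] += pr * w / total_w
    return pmf

n = 10
steep = sum_score_pmf([5.0] * n, [0.0] * n)   # highly discriminating indicators
flat = sum_score_pmf([0.5] * n, [0.0] * n)    # weakly discriminating indicators
```

With the highly discriminating indicators, most of the probability mass piles up at the two extreme sum scores, so the manifest distribution is bimodal although the latent one is unimodal. With the weakly discriminating indicators, the sum scores peak in the middle.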
Simple versus complex qualitative differences. To distinguish within the top left cell of Table 1 between simple versus complex qualitative differences, we will investigate whether the lack of location equivalence can be reduced to a few saltus parameters. In principle this method could be followed for discriminations as well as for locations, although it was originally formulated for locations, but it turned out that in our applications discrimination equivalence was a tenable assumption.
Statistical approaches to testing. A first aspect of testing is whether a model fits the data in an absolute sense, independently of a comparison with other models. We will follow two approaches to deal with this problem. In one application, a bootstrap method is used, and in the other applications, a Pearson χ² test is used for an equivalent conditional maximum-likelihood (CML) formulation of the selected model, because the CML framework has nicer statistical properties when it comes to testing absolute goodness of fit (Glas, 1988). Given that the issue here is to select the best-fitting model in order to identify the most appropriate latent structure (Type 1, 2, 3, 4), the absolute goodness of fit is less important than the relative goodness of fit. Second, a broad range of methods is available to test relative goodness of fit. The first kind of test is the likelihood-ratio test. This test is based on -2 log L (L for likelihood), also called the deviance. The test compares the deviance value of two models, one of which is nested into the other. The difference of the two deviances is χ²-distributed with a number of degrees of freedom equal to the reduction in the number of parameters of the nested model. Unfortunately, this test is no longer valid if one or more of the restrictions includes a boundary value, such as a variance that is fixed to zero. If the test is used nevertheless to test such zero-variance models, then the result is conservative (Verbeke & Molenberghs, 2000). We will use the conservative test as it did not make a difference in the applications whether the correct or the conservative test was used. The regular likelihood-ratio test can be used for the horizontal axis, but for the vertical axis (to distinguish between heterogeneity and homogeneity), the conservative test will be used.
The regular likelihood-ratio test can also be used to test simple versus complex qualitative differences, because the saltus models are a reduced form of general qualitative differences.
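The regular likelihood-ratio test can be sketched as follows. The deviances and parameter counts are hypothetical, and the helper names are assumptions. The tail probability is computed with a standard recurrence for integer degrees of freedom so that the example needs only the standard library.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        tail, k = math.exp(-half), 2                  # closed form for df = 2
    else:
        tail, k = math.erfc(math.sqrt(half)), 1       # closed form for df = 1
    while k < df:
        # Step from k to k + 2 degrees of freedom.
        tail += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return tail

def likelihood_ratio_test(dev_restricted, n_par_restricted, dev_general, n_par_general):
    """Deviance difference, its degrees of freedom, and the p value."""
    stat = dev_restricted - dev_general
    df = n_par_general - n_par_restricted
    return stat, df, chi2_sf(stat, df)

# Hypothetical comparison of a restricted model with a more general one.
stat, df, p_value = likelihood_ratio_test(2531.9, 14, 2510.4, 23)
```

A small p value speaks against the restrictions of the nested model. For restrictions on the boundary of the parameter space, such as a zero variance, the p value obtained in this way is conservative, as noted in the text.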
An important problem with model selection is that the more complex models by definition have a better chance of fitting the data, whereas the simpler models are more parsimonious. A good balance of the two qualities is desirable. This explains the popularity of so-called information criteria. The Akaike information criterion (AIC; Akaike, 1973) or Bayesian information criterion (BIC; Schwarz, 1978) can be used to compare models while taking their complexity into account. Both the AIC and BIC penalize models for a higher number of parameters, the penalization being more severe in the BIC, because it increases with the log of the number of persons, so that the BIC tends to favor the simpler models more than the AIC, especially for a large sample size. For both the AIC and BIC, lower values indicate better model fit.
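Both criteria are simple functions of the deviance. In the sketch below, the deviances, parameter counts, and sample size are hypothetical values chosen so that the two criteria disagree.

```python
import math

def aic(deviance, n_par):
    """Akaike information criterion: deviance plus 2 per parameter."""
    return deviance + 2 * n_par

def bic(deviance, n_par, n_persons):
    """Bayesian information criterion: deviance plus log(N) per parameter."""
    return deviance + n_par * math.log(n_persons)

# Hypothetical (deviance, number of parameters) for three nested models.
models = {
    "QUAL1&2-HET": (2500.0, 40),
    "QUAL2-HET": (2510.0, 31),
    "QUAN-HET": (2535.0, 22),
}
n_persons = 200

best_aic = min(models, key=lambda m: aic(*models[m]))
best_bic = min(models, key=lambda m: bic(*models[m], n_persons))
```

With these values, the AIC selects QUAL2-HET, whereas the BIC, with its heavier penalty, selects the more parsimonious QUAN-HET.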
It is also possible to test individual parameter values against their null hypothesis value using Wald tests--dividing a parameter estimate by its standard error. The resulting statistic follows a t-distribution, but for a high number of observations (as in our applications) it can be interpreted as a z-distribution (as asymptotically it is). Like the likelihood ratio test, Wald tests are conservative for the null hypothesis of zero variance.
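A Wald test with the large-sample normal approximation can be sketched as follows; the function name and the example values are hypothetical.

```python
import math

def wald_test(estimate, std_error, null_value=0.0):
    """Wald z statistic and two-sided p value under the normal approximation."""
    z = (estimate - null_value) / std_error
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided

# Hypothetical estimate of a location difference and its standard error.
z, p = wald_test(0.49, 0.25)
```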
As to the differentiation of the various types of structure, one should realize that this is an issue that is not specific to our approach, given that we use only extant item response models. A complicating factor is that the difference between the various structures is gradual, as we have explained, so that by definition the differentiation power will be small when the differences are small. Nevertheless, we have conducted a simulation study with 40 to 80 data sets per type of differentiation (Hidegkuti & De Boeck, 2004), and it was found that for the likelihood ratio test, AIC, and BIC, the differentiation power was very good for all but two differentiations, even when small data sets were used (2 x 100 respondents and just 10 indicators). The two more problematic cases were the following. (a) The 1PL model was preferred in about 35% of the cases when actually the 2PL model was the true model. (b) Discrimination equivalence was preferred over lack of discrimination equivalence in 30% of the cases in which the true model violated the equivalence. The two differentiations in the other direction did not yield any problem. When the standard deviation of the degree of discrimination was raised from .10 to .25 (so that the 1PL model was violated to a larger extent, and the lack of discrimination equivalence was stronger), however, these two differentiations were no longer problematic. We also performed a specific simulation study for the second application, because the difference between within-category homogeneity and within-category heterogeneity cannot always be distinguished by visual inspection of the histogram. The results will be reported with the second application, and they confirm the differentiation power of our modeling approach. Finally, Mislevy and Wilson (1996) also reported simulation results regarding the saltus model.
The software available for model estimation with MML includes general statistical software for nonlinear mixed models--for example, SAS PROC NLMIXED (SAS Institute Inc., 1999)--and IRT-specific software such as BILOG (Mislevy & Bock, 1989), MULTILOG (Thissen, 1997), and CONQUEST (Wu, Adams, & Wilson, 1998). An alternative is loglinear modeling, which uses CML estimation of indicator parameters. Using CML, the general program LOGIMO (Kelderman & Steen, 1993) can perform IRT loglinear analyses. The IRT-specific program OPLM (Verhelst, Glas, & Verstralen, 1994) is also based on CML. Both programs allow for a priori differences in indicator discriminations but not for the estimation of discrimination parameters. In the absence of a theory that specifies discrimination values a priori, such methods as pre-exploring the data (OPLM includes a subroutine for this purpose) could result in good approximate discrimination values.
Given that SAS is a widely used software package, we will use SAS PROC NLMIXED (SAS Institute, 1999). This procedure was developed for nonlinear mixed models (McCulloch & Searle, 2001). The IRT models we described are of this type (Rijmen, Tuerlinckx, De Boeck, & Kuppens, 2003). Our models are nonlinear in two ways: because of a nonlinear link function (e.g., a logistic function or a normal-ogive function), and because they are not linear in the parameters, as when products of parameters appear in the model (as in aikqpk). The models are mixed because they contain fixed effect parameters as well as random effect parameters. The a's, b's, and g's are fixed effect parameters in that they do not vary at random over individuals, but qpk is a random parameter. The nonlinear mixed models are generalizations of linear regression models. SAS provides not just the logistic variants of the models but also the normal-ogive variants, so that the factor-analytic versions of the models can also be estimated. It is shown in the Appendix how the estimation of models based on Equation A1 can be set up in SAS PROC NLMIXED. One can also consult Appendixes A and B in Rijmen et al. (2003).
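To make concrete what marginal maximum likelihood estimation has to evaluate, the following sketch computes the marginal probability of a response pattern under Equation 6 by integrating over a normally distributed q, and from it the deviance of a small data set. It is written in Python only for illustration and is not the SAS setup of the Appendix. The fixed integration grid, the function names, and all parameter values are assumptions.

```python
import math
from itertools import product

def marginal_pattern_prob(pattern, a, b, mean=0.0, sd=1.0, n_nodes=61):
    """P(response pattern) under Equation 6 with q ~ N(mean, sd^2), by grid integration."""
    nodes = [-6.0 + 12.0 * j / (n_nodes - 1) for j in range(n_nodes)]
    weights = [math.exp(-0.5 * z * z) for z in nodes]
    total = sum(weights)
    prob = 0.0
    for z, w in zip(nodes, weights):
        q = mean + sd * z
        like = 1.0
        for x, a_i, b_i in zip(pattern, a, b):
            p = 1.0 / (1.0 + math.exp(-(a_i * q - b_i)))
            like *= p if x == 1 else 1.0 - p
        prob += like * w / total
    return prob

def deviance(patterns, a, b, mean=0.0, sd=1.0):
    """-2 log L for a list of observed response patterns in one manifest category."""
    return -2.0 * sum(math.log(marginal_pattern_prob(x, a, b, mean, sd)) for x in patterns)

a = [1.0, 1.5, 0.8]     # hypothetical discriminations
b = [-0.5, 0.2, 1.0]    # hypothetical locations
all_patterns = list(product([0, 1], repeat=3))
total_prob = sum(marginal_pattern_prob(x, a, b) for x in all_patterns)
dev = deviance([(1, 0, 0), (1, 1, 0), (0, 0, 0)], a, b)
```

An estimation routine would minimize such a deviance over the fixed parameters and the mean and variance of q in each manifest category. Restricted models are obtained by tying parameters across categories or by fixing a variance to zero.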
1. The first extension is to allow for multidimensionality within manifest categories. This requires that θ_pk be given a dimension index: θ_pkr, r = 1, ..., R. Note that as presented the framework already allows for multidimensionality between manifest categories (such a structure would fall on the left side of Figure 1). In order to deal with multidimensionality within manifest categories, one either assigns indicators to specific dimensions, or one estimates the discriminations of indicators on each dimension (using dimension-specific weights, α_ikr, with r indicating the dimension: r = 1, ..., R). In the latter case, the problem of unreliable estimates of discriminations becomes serious, because there are now R sets of discriminations per manifest category, and possibly K x R sets for the total. We are not interested in the exact values of the α's, however, but in the test of whether the equality constraint on the α's makes a difference.
2. The second extension is to allow for polytomous indicators (instead of only binary indicators). Although several models for polytomous variables can be incorporated into the framework, robustness of estimation is improved when the structure of the indicator response categories is constrained. For example, in the rating scale model (Andrich, 1978), the steps from one category to another do not depend on the indicator, but in the partial credit model (Masters, 1982), a different location is specified for each response option within each indicator.
3. The third extension is to allow for latent categories (instead of only manifest categories). Latent categories cannot simply be identified on the basis of manifest variables. This extension implies a reformulation of the models in terms of latent classes (Mislevy & Wilson, 1996; Rost, 1990, 1991; Wilson, 1989). The latent classes do not necessarily correspond to the manifest categories--that is, the latent classes approach does not guarantee that the categorical variable of interest will emerge. Consequently, issues regarding the manifest categories cannot be dealt with directly. Furthermore, because latent classes are not defined a priori, they require interpretation before they can be labeled. A generalized approach to formulating such problems was described by Pirolli and Wilson (1998). As will be discussed further, the well-known taxometric approach is directed at latent categories while concentrating mainly on one feature of our framework.
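The contrast drawn in the second extension, between a rating scale structure and a partial credit structure for polytomous indicators, can be sketched in a few lines of Python. This is a minimal illustration using the usual textbook forms of the two models; the function names, the symbols (step locations, a common threshold vector), and the numerical values are hypothetical and are not taken from the article or its Appendix.

```python
import math


def pcm_probs(theta, step_locations):
    """Category probabilities 0..m under the partial credit model.

    step_locations[j] is the location of the step from category j to j + 1
    and is free to differ from indicator to indicator.
    """
    # Cumulative sums of (theta - step location); category 0 has sum 0.
    cum = [0.0]
    for d in step_locations:
        cum.append(cum[-1] + (theta - d))
    mx = max(cum)  # subtract the maximum for numerical stability
    expo = [math.exp(c - mx) for c in cum]
    total = sum(expo)
    return [e / total for e in expo]


def rsm_probs(theta, location, thresholds):
    """Rating scale model: one location per indicator plus thresholds
    that are shared by all indicators."""
    return pcm_probs(theta, [location + t for t in thresholds])


# Hypothetical 4-category indicators (three steps each).
shared_thresholds = [-1.0, 0.0, 1.0]
print(rsm_probs(0.3, 0.5, shared_thresholds))   # steps fixed by shared thresholds
print(pcm_probs(0.3, [-0.8, 0.9, 1.1]))         # steps specific to this indicator
```

The rating scale version needs one location per indicator plus one common set of thresholds, whereas the partial credit version needs a full set of step locations for every indicator, which is why the former is the more constrained structure.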
Except for the latent class extension, the extended models can in principle be estimated with SAS PROC NLMIXED, but in practice a model with a high dimensionality will prove difficult to estimate. Other IRT software is also available, but it would lead us too far afield to give an overview, and high dimensionality is also a problem for those programs. We will not dwell on Bayesian methods (e.g., Beguin & Glas, 2001; Janssen et al., 2000), because they are not broadly accessible to researchers in psychology, and because for high dimensionalities they require a large sample size.
A second method for distinguishing qualitative differences from quantitative differences is checking factorial equivalence in its limited sense across manifest categories. If, in different manifest categories, the same factor loadings are found, then it is concluded that the latent structure is dimension-like. This method has been applied quite often in the study of personality disorders, with the result that a dimension-like structure seems appropriate (e.g., Livesley & Schroeder, 1990; Livesley, Schroeder, Jackson, & Jang, 1994; Tyrer & Alexander, 1979). From the approach we have developed, however, it is clear that factorial equivalence in its limited sense is important but also that it is only half of the story. Strict factorial equivalence as defined by Meredith (1993) is required.
A third method for distinguishing qualitative differences from quantitative differences is the taxometric approach developed by Meehl (1973, 1995). Although the underlying model is not based on manifest categories, data from persons belonging to different manifest categories are often used in its application. Taxometric methods have been applied to many psychological variables, including borderline personality disorder (e.g., Trull, Widiger, & Guthrie, 1990), dissociation (e.g., Waller, Putnam, & Carlson, 1996; Waller & Ross, 1997), worry (e.g., A. M. Ruscio, Borkovec, & Ruscio, 2001), depression (e.g., Haslam & Beck, 1994; A. M. Ruscio & Ruscio, 2002; J. Ruscio & Ruscio, 2000), sexual orientation (e.g., Gangestad, Bailey, & Martin, 2000; Haslam, 1997), and personality (e.g., Gangestad & Snyder, 1985, 1991; Strube, 1989). The main findings were summarized by Haslam and Kim (2002), who concluded that several psychopathological variables are "taxonic" (the term used in taxometrics for category-like), such as schizotypy and the antisocial personality disorder, whereas other variables are "nontaxonic" (dimension-like), such as depression. As for personality variables, Type A personality seems taxonic, whereas the five-factor model traits and the Jungian traits seem nontaxonic.
The taxometric method called MAXCOV (Waller & Meehl, 1998) is based on two assumptions: (a) between latent categories, the indicators are correlated, and (b) within latent categories, the indicators are not correlated. Suppose now that there are two latent categories represented in a sample and that they have an overall effect on the indicators. Then, as a consequence of the two assumptions, the sum of the indicators can be a good indicator of category membership. Persons with high sum scores will mostly belong to one category, and persons with low sum scores will mostly belong to the other category. On the other hand, persons with moderate sum scores can come from both categories. Therefore, it is expected that the covariance between pairs of indicators will show a curvilinear relation with the sum score of the remaining indicators. In practice, the sum score is divided into intervals, and the covariances are determined for pairs of indicators within each interval. The interval with the maximum covariance (MAXCOV) is called the HITMAX interval. If the curve is flat, then the conclusion is that the latent structure is not category-like but dimension-like. Note that in correspondence with the distinction that was made earlier, the manifest categories do not play any role in the method, except to determine the samples.
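The MAXCOV procedure just described can be sketched as follows. This is a simplified illustration, not the implementation of Waller and Meehl (1998): the function names, the equal-width intervals, and the simulated two-class data (indicators independent within class, separated by two standard deviations between classes) are assumptions made for the example.

```python
import random


def covariance(xs, ys):
    """Sample covariance; returns NaN when fewer than two cases are available."""
    n = len(xs)
    if n < 2:
        return float("nan")
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def maxcov_curve(data, i, j, n_intervals=5):
    """Covariance of indicators i and j within intervals of the sum of the others."""
    rest = [sum(v for k, v in enumerate(row) if k not in (i, j)) for row in data]
    lo, hi = min(rest), max(rest)
    width = (hi - lo) / n_intervals or 1.0  # guard against a constant sum score
    groups = [[] for _ in range(n_intervals)]
    for row, s in zip(data, rest):
        idx = min(int((s - lo) / width), n_intervals - 1)
        groups[idx].append(row)
    return [covariance([r[i] for r in g], [r[j] for r in g]) for g in groups]


def simulate_two_classes(n, n_indicators=6, separation=2.0, seed=1):
    """Hypothetical data: two latent classes, indicators independent within class."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        shift = separation if rng.random() < 0.5 else 0.0
        data.append([shift + rng.gauss(0.0, 1.0) for _ in range(n_indicators)])
    return data


curve = maxcov_curve(simulate_two_classes(4000), i=0, j=1)
finite = [(c, k) for k, c in enumerate(curve) if c == c]  # drop NaN intervals
hitmax = max(finite)[1]  # index of the interval with the largest covariance
print(curve, "HITMAX interval index:", hitmax)
```

With category-like data of this kind, the covariance is expected to peak in the interval where the two classes are mixed and to be near zero in the extreme intervals; a flat curve would instead point to a dimension-like structure.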
Taxometric methods were later extended from a pairwise approach to a multivariate approach. Either the first eigenvalue in a principal components analysis is used as a criterion instead of the covariance between pairs of indicators (the MAXEIG method) (Waller & Meehl, 1998), or the distribution of factor scores on the first factor is checked for multimodality (the L-Mode method) (Waller & Meehl, 1998). Waller and Meehl (1998) showed that the MAXCOV, MAXEIG, and L-Mode methods are formally equivalent for the case of homogeneous taxa.
The taxometric approach focuses on whether taxa are homogeneous, which corresponds to the lower portion of Table 1 (i.e., the types that show within-category homogeneity). Within-category homogeneity is called an auxiliary assumption (Waller & Meehl, 1998, p. 17), because it is an ideal situation; nevertheless, simulation studies have shown that violations of this assumption can occur without detrimental effects for the approach (Beauchaine & Beauchaine, 2002; Waller & Meehl, 1998). Moderate correlations within categories (i.e., moderate within-category heterogeneity) do not hamper the application and power of the taxometric approach (Meehl & Golden, 1982). Large correlations within categories (i.e., large within-category heterogeneity) can be handled using an extension of the MAXCOV approach (Meehl, 1995). Still, the basic idea is that categories are relatively homogeneous by comparison with the between-category differences. Concentrating on relative homogeneity as the concept of category-likeness is reasonable; homogeneity is also a basic assumption in the latent class model. Two other features of the taxometric approach are its limitation to applications in which only two categories are investigated (Beauchaine & Beauchaine, 2002) and that it is less appropriate for binary data (Miller, 1996; Ruscio, 2000). According to Haslam and Kim (2002), about half of the studies to date make use of dichotomous indicators, and they concluded that taxometric methods are valid for dichotomous indicators as well, but they cautioned that large sample sizes are required, and they recommended that "researchers should use continuous indicators whenever possible, but not shrink from using dichotomous indicators when there is no alternative" (p. 306). This recommendation contrasts with the fact that Dimcat applies equally well to dichotomous and polytomous indicators.
In sum, various methods relate to our approach, and each stresses one aspect of category-like structure. They are based on an underlying concept of category-likeness in terms of abrupt between-category differences (multimodality), discrimination equivalence (factorial equivalence in its limited sense), or relative homogeneity of latent categories along a latent dimension (MAXCOV). Implicit in all of these approaches is the assumption of a mainly monothetic definition of category-likeness (but see the earlier quotation from Waller and Meehl [1998, p. 9]). The difference with our approach is that we explicitly include all of these aspects of category-likeness within a broader framework, one for manifest categories (in contrast with taxometrics). A category can be category-like in different ways, and a dimension can also be dimension-like in different ways. In this polythetic definition of category-likeness, being category-like is both complex and a matter of degree.
Psychiatric diagnosis has come to rely primarily on matching of features on a list provided by the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV; American Psychiatric Association, 1994). The syndromes defined by such features are supposed to be atheoretical and purely descriptive. The categories of the DSM-IV are not categories in the classical sense defined by singly necessary and jointly sufficient criteria; rather, they are more akin to prototypes, because they are defined by showing a certain number of features from a list, with each feature typically being equally weighted.
Researchers have shown how a prototype approach can be applied directly to the classification of psychopathology. The prototype view has been contrasted with the classical view of psychiatric diagnosis (Cantor, Smith, French, & Mezzich, 1980). For example, a prototype approach has been applied to the classification of borderline personality disorder (Clarkin, Widiger, Frances, Hurt, & Gilmore, 1983). Indeed, the concept of mental disorder itself has been speculated to comprise a prototype (Lilienfeld & Marino, 1995).
The DSM-IV (APA, 1994) reflects a revision such that diagnosis is based on showing a critical number of symptoms from a list, independently of the specific symptoms shown. This approach allows for heterogeneous symptom patterns, on the condition that they come from the list of symptoms associated with the disorder. The DSM-IV authors did not go so far as to reject the idea of categories altogether. One may wonder what is the basis for resistance against giving up the notion of personality disorder categories altogether. The resistance may be inspired by a cognitive bias toward thinking in categories, which may lead some to feel that categories of personality disorders tally with their experience of reality. Social psychology has a tradition of theories based on the assumption that people tend to categorize other people (e.g., Tajfel, 1981), and this is also the view in cognitive psychology (e.g., Smith & Medin, 1981). This argument has been invoked by Beauchaine and Waters (2003) to cast doubt on methods that are based on ratings.
The issue of whether disorders are category-like or dimension-like has become a topic of research and debate. A large majority of studies reject the categorical view in favor of the dimensional view. Three main empirical arguments have been presented for the dimensional view of personality disorders. First, personality disorders do not show bimodality (e.g., Kass et al., 1985; Nestadt et al., 1991; Zimmerman & Coryell, 1990). Second, personality disorders show factorial equivalence in its limited sense (e.g., Livesley et al., 1992). Third, personality disorders do not show relative homogeneity as derived from the MAXCOV methodology (e.g., Trull et al., 1990).
The issue we are raising is also deeper than the formal issue of whether one should treat personality disorders as category-like, dimension-like, or some combination. If substantial qualitative differences exist, then the meaning of a symptom differs depending on the group to which a person belongs. Thus, the issue has consequences for both the theory and assessment of symptoms and syndromes of psychopathology. A consequence for diagnostic purposes is that a simple score based on symptoms, such as a sum score, can no longer be compared from one group to another.
In the present study (based on Maesschalck, 1998), we focused on borderline personality disorder (BPD) as compared with two other personality disorders of Cluster B (the dramatic, erratic cluster): histrionic personality disorder (HPD) and antisocial personality disorder (APD). These three disorders were compared with respect to the DSM-IV symptoms for BPD. In this connection, we noted earlier that one aspect of the dimension/category issue is relativity to the groups compared.
Some words of caution are needed to see the study in the correct light. First, we used a particular selection of indicators, and the results may depend on the indicators considered. This is a basic feature of our approach and of all other current approaches. This is what we meant by deeming our approach "relational." Second, we used ratings by clinicians. Ratings do not necessarily reflect the truth. Because we relied on ratings and classifications, we should be aware that the ratings and the diagnosis are not perfectly reliable, although they come from experts. The less than perfect reliability may lead to a larger heterogeneity within the categories. We do not intend to investigate the true disorder categories, however, but the assigned disorder categories instead (i.e., manifest categories). This is of interest because most category-like variables in psychology are manifest. As a consequence, our conclusions must be seen as being based on categories that are assigned by experts, and we cannot claim more than cognitive relevance of the results. This brings us to the approach we took in the introduction when we described the cognitive approach to categories. Third, the manifest categories are not mutually exclusive. In psychopathology, overlap is called comorbidity. In our study we did not include patients with a multiple diagnosis for several reasons: (a) Overlap creates new manifest categories, so-called conjunctive categories (comprising patients with multiple diagnoses), and their structure is quite complex (Storms, De Boeck, Hampton, & Van Mechelen, 1999), so that it seems reasonable to start with pure manifest categories; (b) the inclusion of multiple diagnoses may confound the results in a way that cannot be detected, because there are not enough cases of each different multiple diagnosis; (c) it is not without importance to investigate the latent structure of pure manifest categories, because they reflect the disorder in an unconfounded way. 
As a consequence, we will not be able to generalize our results to the whole categories of the three diagnoses, but we believe that just as in a psychological experiment it may be of interest to create pure conditions.
Manifest categories. Axis I and II diagnoses were made in the three weeks after first admission or consultation, by one or more diagnosticians, usually including a senior psychiatrist. These diagnoses, which defined the manifest categories, were instances of expert judgment.
Indicators. Each patient was also rated by a clinician other than those on the initial diagnostic team on a list of nine DSM-IV symptoms of BPD. The rating clinicians were unaware of the original diagnosis. Note that this methodological feature of the study favors its objectivity, but at the same time makes it less relevant from a cognitive perspective. In order to draw conclusions of a cognitive kind, it is to be preferred that the same persons rate the indicators and do the categorization. Symptom ratings were based on information from charts, staff meetings, and contacts between the clinician and the patient. The symptoms were presented in a random order to be judged on a 4-point scale from 0 (least severe) to 3 (most severe). In the instructions, scale points 0 and 1 were defined as non-pathological, whereas scale points 2 and 3 were defined as pathological. Responses were later dichotomized, such that 0 and 1 were recoded as 0 (less severe, non-pathological), and 2 and 3 were recoded as 1 (more severe, pathological).
Analyses. The full modeling approach was followed as explained earlier, making use of SAS PROC NLMIXED for the dichotomized data, as explained in the Appendix. The BPD group was used as a reference category. The locations and discriminations in the other two categories were expressed as deviations from those in the BPD category. In order to test absolute goodness of fit, we used a bootstrap approach (Efron & Tibshirani, 1993). One of the aspects investigated is how well the correlations between indicators within each group could be explained from the model. Because we used unidimensional models within each diagnostic group, this bootstrap of correlations is also a test of the unidimensionality of the heterogeneous within-category structure. As mentioned earlier, Sanislow et al. (2002) presented a multidimensional model (but with extremely high correlations among the dimensions), so we wanted to make sure we did not have to expand our model to be multidimensional as well (within each of the manifest categories). Note that it is possible to find unidimensionality within manifest categories, although the single dimension is different depending on the manifest category, implying that for the total group the model is multidimensional. When the persons belong to different manifest categories and a joint analysis is performed, one can conclude that the structure is multidimensional, whereas in fact it is unidimensional within each manifest category. Such a result would be perfectly in agreement with a Type 1 structure.
Following the strategy presented in Figure 3 and explained in the section on Modeling, we began by investigating the nature of the between-category differences, based on three models. Using a likelihood-ratio test, it was found that the goodness of fit of the QUAL2-HET model was not statistically significantly worse than that of the QUAL1&2-HET model (χ² = 14.3, p > .10). This means we can assume discrimination equivalence. When the QUANT-HET model was compared with the QUAL2-HET model, however, it turned out that its goodness of fit was worse (χ² = 56.5, p < .001), which also implies that its goodness of fit was worse than that of the QUAL1&2-HET model. Therefore, we cannot conclude that we have location equivalence--there seemed to be qualitative differences in terms of locations between the manifest categories.
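The comparisons above are likelihood-ratio tests between nested models. As a reminder, in standard notation (the symbols L for the maximized likelihood and m for the number of free parameters are introduced here for illustration and are not the article's), the test statistic and its degrees of freedom are:

```latex
\chi^2 = -2\left[\ln L_{\text{restricted}} - \ln L_{\text{general}}\right],
\qquad
df = m_{\text{general}} - m_{\text{restricted}}
```

In the first comparison, QUAL2-HET (equal discriminations across categories) is the restricted model and QUAL1&2-HET is the general one; in the second, QUANT-HET (equal locations as well) is the restricted model and QUAL2-HET is the general one.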
In order to identify the location differences, we inspected the deviations of the locations in the HPD and APD groups from the BPD group using the reparameterization with α_i as a multiplication factor not just for θ but for the whole logit of item i, and with effect coding for the location deviance parameters (see the Appendix for this reparameterization). Several of the deviation parameter estimates were statistically significant. In the HPD group, this was the case for the symptoms "affective instability" and "avoidance of abandonment," with estimates of -0.747 (t = 2.12, p < .05) and -1.121 (t = 2.87, p < .01), respectively. In the APD group, a significant location deviance was found for "chronic feelings of emptiness" (-0.833, t = 2.38, p < .05), and a rather large but non-significant deviation was found for "avoidance of abandonment" (-0.599). For all four deviations, this implies that the corresponding symptoms were easier in the HPD group, in the APD group, or in both. When for these four location differences one common saltus parameter was used and all other locations were assumed to be equal over the three groups, the resulting saltus model was not significantly worse than the QUAL2-HET model (χ² = 14.4, p > .10), implying that it is sufficient to limit the location differences to this one saltus parameter and two symptoms in each manifest category. This result is very similar to the one we obtained without the reparameterization, where one fewer symptom was given a saltus parameter ("chronic feelings of emptiness" in the APD category), and the difference in goodness of fit with the QUAL2-HET model was slightly larger and significant. We also compared the models on the AIC and BIC criteria: the lower the values, the better the model.
The AIC value of the saltus model was slightly lower than that of the QUAL2-HET model (2,921.1 versus 2,927.7), and its BIC value was clearly lower (2,995.4 versus 3,045.1), so that it can be considered a good approximation. It should also be mentioned that the AIC and BIC values of the QUAL1&2-HET model were 2,937.4 and 3,101.8, respectively, both higher than the corresponding values of the QUAL2-HET and QUANT-HET models.
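The information criteria used in this comparison follow the standard definitions, which can be sketched as below. The functions `aic` and `bic` are illustrative helpers, and their arguments (log-likelihood, number of parameters, sample size) are not reported in this section, so the sketch only ranks the AIC and BIC values that the text does report.

```python
import math


def aic(loglik, n_params):
    """Akaike information criterion: -2 ln L + 2k."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2 ln L + k ln N."""
    return -2.0 * loglik + n_params * math.log(n_obs)


# Values reported in the text as (AIC, BIC); lower is better.
reported = {
    "saltus": (2921.1, 2995.4),
    "QUAL2-HET": (2927.7, 3045.1),
    "QUAL1&2-HET": (2937.4, 3101.8),
}
best_by_aic = min(reported, key=lambda m: reported[m][0])
best_by_bic = min(reported, key=lambda m: reported[m][1])
print(best_by_aic, best_by_bic)
```

Because BIC penalizes each parameter by ln N rather than by 2, it favors the more parsimonious saltus model more strongly than AIC does, which matches the larger BIC gap reported above.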
When the model was further restricted to have zero variance within the three groups, the goodness of fit was dramatically lower according to the likelihood-ratio test, which was conservative given the boundary value of the null hypothesis (χ² = 180, p < .001). Each of the variance estimates was highly significant in the QUAL2-HET model using a Wald test (which is also conservative in this case). Therefore we must conclude that the diagnostic groups were heterogeneous. This was corroborated by a statistically significant Cronbach's α in the three groups: .49 for BPD, .61 for HPD, and .67 for APD (all p's < .01). Taking together the conclusions regarding the vertical and the horizontal axes, we end up with a Type 1 structure: between-category qualitative differences and within-category heterogeneity. A reasonably good saltus model was found, so that the qualitative differences can be considered rather simple.
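For readers who want to reproduce the internal-consistency check on their own data, Cronbach's α can be computed with the standard formula, as in the minimal sketch below. The function names and the small data table are hypothetical, and the sketch returns only the coefficient, not the significance test reported above.

```python
def sample_variance(values):
    """Unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def cronbach_alpha(data):
    """Cronbach's alpha for a persons-by-indicators table of scores."""
    k = len(data[0])
    item_vars = [sample_variance([row[i] for row in data]) for i in range(k)]
    total_var = sample_variance([sum(row) for row in data])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


# Hypothetical dichotomized symptom ratings for five persons on three indicators.
ratings = [
    [1, 1, 1],
    [0, 0, 0],
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
]
print(cronbach_alpha(ratings))
```

A coefficient above zero within a diagnostic group indicates that the indicators covary among members of that group, which is what within-category heterogeneity along a common dimension implies.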
We will now further explore the model that came out as the best, the QUAL2-HET model, a model with discrimination equivalence but not with location equivalence. This model implies a 2PL model within each diagnosis with equal discriminations between diagnoses. To test this model, we applied a bootstrap methodology. Starting from the parameter estimates, we generated 2,000 new data sets, and in each of these data sets the following statistics were derived: Pearson correlations (phi's) between the indicators within each diagnostic group (yielding 21 x 3 correlations), and differences in assigned symptom proportions for the HPD and APD groups in comparison with the BPD group as the reference group (7 x 2 differences). Of the 63 correlations only two fell outside the bootstrap-based .01 confidence interval, and three more fell outside the corresponding .05 confidence interval. This is a remarkably good result, from which it can be concluded that the model and also its unidimensionality within groups should not be rejected. The result was even better where the proportion differences were concerned. All 14 differences fell right in the middle of the confidence interval, implying that the model captured the location differences very well. Based on this bootstrap result, we can accept the QUAL2-HET model.
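The logic of this bootstrap check can be sketched as follows for a single pair of indicators in a single group. This is a simplified illustration under assumptions: the logit is taken to be alpha_i * theta - beta_i, the parameter values stand in for real estimates, the "observed" data are themselves simulated, and 200 replications are used instead of the 2,000 reported above. All function names are hypothetical.

```python
import math
import random


def simulate_group(n_persons, alphas, betas, sigma, rng):
    """Draw binary data from an assumed model: logit = alpha_i * theta - beta_i."""
    data = []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, sigma)
        data.append([
            1 if rng.random() < 1.0 / (1.0 + math.exp(-(a * theta - b))) else 0
            for a, b in zip(alphas, betas)
        ])
    return data


def phi(data, i, j):
    """Pearson correlation between two binary indicators (0.0 if one is constant)."""
    n = len(data)
    mi = sum(r[i] for r in data) / n
    mj = sum(r[j] for r in data) / n
    cov = sum((r[i] - mi) * (r[j] - mj) for r in data) / n
    denom = math.sqrt(mi * (1.0 - mi) * mj * (1.0 - mj))
    return cov / denom if denom > 0 else 0.0


def percentile_interval(values, level=0.95):
    """Percentile interval taken from the sorted bootstrap values."""
    s = sorted(values)
    lo = math.floor((1.0 - level) / 2.0 * (len(s) - 1))
    hi = math.ceil((1.0 + level) / 2.0 * (len(s) - 1))
    return s[lo], s[hi]


# Hypothetical "estimates" for one diagnostic group with four indicators.
rng = random.Random(2024)
alphas, betas, sigma = [1.0, 1.2, 0.8, 1.5], [0.0, 0.5, -0.5, 1.0], 1.0
observed = simulate_group(150, alphas, betas, sigma, rng)  # stand-in for real data
replicates = [phi(simulate_group(150, alphas, betas, sigma, rng), 0, 1)
              for _ in range(200)]
low, high = percentile_interval(replicates, level=0.95)
print(low <= phi(observed, 0, 1) <= high)
```

Repeating the check for every pair of indicators in every group, and for the differences in symptom proportions, gives the kind of tally reported above: the fewer observed statistics that fall outside their intervals, the less reason there is to reject the model.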
Apart from the crucial aspects of this model to decide on the type of latent structure (in this case Type 1), some other aspects of the model are of interest. First, the variances in the three groups differed. The variance in the BPD group was fixed to 1.000 as an identification restriction, and the estimates in the other two groups were 1.292 (HPD) and 2.288 (APD). These differences were in agreement with the size order of the internal-consistency coefficients that were reported earlier. Larger variance typically means larger consistency. Second, HPD and APD were less borderline than BPD. The difference of HPD from BPD was -1.452 and the difference of APD from BPD was -2.748, and both difference estimates (on the θ-scale) were statistically significantly different from zero (p < .001), meaning that overall group effects were statistically significant. The most borderline group was BPD, as expected, followed by HPD and APD.
Similar studies were conducted on the diagnoses of HPD and APD, using histrionic and antisocial symptom lists from the DSM-IV, respectively (Maesschalck, 1998). For HPD, the result was similar, in that only simple qualitative differences in location were found. For APD, however, the qualitative differences could not be reduced to a few saltus parameters; the pattern of APD indicator values was quite different among the diagnostic groups, as in the left panel of Figure 2.
This conclusion cannot be taken as an absolute, because of the restrictions we mentioned earlier. The data concern only a limited number of indicators, although very important ones, and they are based on ratings by clinicians. Because of the latter, our conclusion must primarily relate to the dimension-like versus category-like nature of judgments made by clinicians. As such the results can also be looked upon from the cognitive perspective on categories. The clinicians' category of borderline personality disorder (independently of whether it reflects the true state of affairs) is a manifest category with a latent continuum, with some BPD members being better members of the category than others. The result may have been cognitively induced, although the experts who rated the indicators were different from those who made the diagnosis. The structure within the manifest category is unidimensional, the stochastic variant of what has been called a triangular structure. The HPD and APD patients not only are less borderline but also show some slight qualitative differences, enough to conclude that BPD is category-like in at least one respect: that of qualitative between-category differences.
Our findings regarding BPD may not generalize to other categories of personality disorders, as may be derived from taxometric studies (although they follow a different approach). For example, studies have found taxometric evidence for the taxonic nature of schizotypy (Golden & Meehl, 1979; Korfine & Lenzenweger, 1995; Lenzenweger, 1999; Lenzenweger & Korfine, 1992; Meehl, 1993) and of APD (Skilling, Quinsey, & Craig, 2001), whereas the evidence is more equivocal for BPD, as concluded by Haslam and Kim (2002). This shows that being category-like may depend on the personality disorder, which was also the case for the data we used (Maesschalck, 1998) showing that APD was more category-like than BPD and HPD.
The phenomena we identified at the latent level can be considered endophenotypes. These refer to the phenotype but go deeper than the manifest indicators. When category-like, endophenotypes comprise natural kinds, non-arbitrary discontinuities; when dimension-like, they comprise equally non-arbitrary continuities. Haslam (2002) noted,
Of course, a discrete psychopathological kind might arise out of an essence-like cause such as a genetic abnormality (e.g., Down's syndrome) or germ (e.g., general paresis). However, other non-essentialist models are also possible, for example developmental polarization, non-linear interactions of vulnerability factors (e.g., emergenesis), and threshold effects.
A continuous endophenotype, by contrast, is likely to result from divergent causes, such as polygenic influences, idiosyncratic environments, and "bad luck" (cf. Meehl, 1978). When an essence-like cause becomes known, an endophenotype becomes a closed concept, but, contrary to the essentialist beliefs of most laypersons (Haslam & Ernst, 2002), most endophenotypes in psychopathology (including category-like ones) have no essence-like cause and thus remain open concepts.
It is somewhat surprising that the identification of endophenotypes has not always been the primary concern in the classification of psychopathology. The operational approach was espoused in order to increase interjudge reliability. The consequent increase in reliability was purchased at the price of a decreased theoretical basis (e.g., Carson, 1991) and, more formally, a lack of interest in the latent structure. This contrasts with the explanatory approach in cognitive psychology, discussed earlier (e.g., Murphy & Medin, 1985), in which the glue that ties concepts together is a theory-based understanding of the world (e.g., Kim & Ahn, 2002). Although we did not investigate the theoretical basis of the diagnostic categories, we assessed the validity of several latent structure models of BPD. As far as the endophenotypes are concerned, we were able to find out what the BPD endophenotype is--not all aspects of it, but those related to the DSM-IV borderline indicators. Now the task remains of incorporating the open concept of the BPD endophenotype into a larger nomological network, including theories of its etiology, course, and treatment.
Beyond the question of the category-like versus dimension-like latent structure of psychiatric diagnoses, at least three controversial issues within psychopathology and treatment research could be addressed using Dimcat. The first issue is whether putatively distinct disorders are in fact identical. Consider several examples on the border between Axis I and Axis II: avoidant personality disorder and social phobia, schizotypal personality disorder and schizophrenia, borderline personality disorder and mood disorders, antisocial personality disorder and substance use disorders, and depressive personality disorder and dysthymia (Endler & Kocovski, 2002; Widiger & Shea, 1991). Frances, Widiger, and Fyer (1990) noted,
it is rarely clear, when a given symptom serves as a defining feature of two different categories, whether the resulting overlap between them reflects the true state of the relationship or is an unnecessary artifact based on the choice of the identical definitional items in both sets. (p. 47)
From the perspective of Dimcat, this question can be answered rather straightforwardly. A combined symptom list could be taken as indicators. Whether the symptoms overlap or not does not matter. If the disorders were qualitatively distinct, then they would obviously not be identical. If the disorders were only quantitatively distinct, then they would be identical if the difference between the distributions was of a magnitude considered pragmatically negligible.
The second issue is whether a psychiatric diagnosis can be adequately assessed by a self-report inventory. This issue has been debated with respect to using students who score highly on the Beck Depression Inventory as "analogs" of patients diagnosed with major depression (e.g., Coyne, 1994; Flett, Vredenburg, & Krames, 1997; Vredenburg, Flett, & Krames, 1993; A. M. Ruscio & Ruscio, 2002). From the perspective of Dimcat, this question can also be answered rather straightforwardly, taking the items in the inventory as indicators. If the diagnosis was qualitatively distinct from its absence, then the self-report inventory would not be an adequate representation of the diagnosis--it would mean that the inventory was measuring qualitatively distinct phenomena for persons with and without the diagnosis. If the diagnosis was only quantitatively distinct from its absence, however, then the latent dimension defined by the self-report inventory fulfills a necessary condition to be an adequate representation of the diagnosis.
The third issue is whether a stepped-care approach to treatment is appropriate. In a stepped-care approach, treatments are tailored to the level of severity of the disorder. Such an approach presupposes heterogeneity within the category of persons with a diagnosis. For example, a stepped-care approach to the treatment of DSM-IV nicotine dependence might be recommended, such that a stop-smoking pamphlet or telephone quitline might help some smokers, whereas others might require an antidepressant or extensive cognitive-behavioral treatment. This would make sense only if this manifest category were heterogeneous, with some nicotine-dependent smokers higher on nicotine dependence than others.
We studied attitudes toward different types of crimes varying in the following characteristics: murder or other crimes, sexual or non-sexual crimes, and child or adult victim. A group of respondents was interviewed and asked whether, in their opinions, persons who committed the kind of crime in question should be considered for capital punishment if it were legal.
Our first interest was whether the attitudes were qualitatively distinct. This kind of question is not uncommon for attitude research. Eagly and Chaiken (1993), for example, asked whether the relation between liberalism and conservatism, which might seem opposite poles of a single dimension, was actually more complex. One explanation for the latter structure would be that the two groups differ in the values considered relevant to an issue. In the present context, these may concern the unconditional value of human life, the acceptability of revenge, and the seriousness of a crime. The criteria for seriousness of a crime may include taking someone else's life, sexual abuse, and vulnerability of the victim. Differences in these criteria should result in a qualitatively different scale for seriousness of a crime between groups in favor of and opposed to capital punishment.
Our second interest was whether the attitudes were heterogeneous. Only if the attitude groups were heterogeneous could within-category person differences be observed.
Our third interest was in the capacity of our approach to differentiate between a purely manifest continuum and a latent continuum. The reason is that in this application it would not be a surprise if there were two clear-cut homogeneous attitudes in the latent structure. In the case of latent homogeneity, one can be misled by the heterogeneity that would show up not only in the sum scores but also in the estimates of individual θ's. The crucial test, however, is not in the sum score or θ estimate distributions that would result, but in (a) the likelihood-ratio test to compare a heterogeneous with a homogeneous model, and (b) the test of each of the variances regarding their difference from zero. Therefore, we set up a simulation study to investigate whether we can actually differentiate between categories being heterogeneous or homogeneous in their latent structure, notwithstanding the expected heterogeneity in the sum scores and the estimated θ's. This issue is also important because Haertel (1990) has shown that the 2PL model can be approached quite well with a latent class model.
Our fourth interest was related to the study of cognitive categories. Because the data in this application were self-rating data, and because the rating of the indicators and the classification were both made by the same respondents, a cognitive approach to the categories seemed relevant. This offered us an opportunity to test the Generalized Context Model, a model for how people decide on a category (Nosofsky & Palmeri, 1997), because it focuses on classification into two categories, and because the respondents both classified themselves in two categories and made the indicator ratings. Let us assume that the respondents decided on whether they were in favor of or against the legalization of capital punishment from what they heard from others. For example, they heard what other people said about various crimes and how the criminals should be treated. These other people can be considered the exemplars of the learning set, before the respondents decided on the classification of their own opinion. The alternative to the exemplar theory is that the self-classification in the two legalization opinions is based on two prototypes.
Indicators. The interview consisted of 10 questions, 9 of which referred to the following crimes, in this order: (a) serial murder, (b) murder of one's whole family, (c) murder of a family member, (d) sex murder of an adult, (e) sex murder of a child, (f) robbery with murder of an adult, (g) robbery with murder of a child, (h) rape of an adult, and (i) rape of a child. For each crime, the question was whether the respondent would consider capital punishment appropriate if it were legal ("yes" or "no"). The tenth question was whether the respondent was for or against the legalization of capital punishment.
Manifest categories. Two manifest categories of attitudes were distinguished on the basis of the tenth question: one in favor of legalization, one against legalization. These categories were based on expert judgment, with respondents considered experts on their own attitudes.
Analyses. The main part of the analyses was again based on Dimcat. Because we experienced estimation problems with the more complex models, most likely due to the manifest distribution of the data, we based part of the analysis on a conditional maximum-likelihood (CML) approach using the OPLM program (Verhelst et al., 1994).
Two additional analyses were run. First, for the simulation study we used two homogeneous categories with nine indicators and the following β's: .20, .20, .30, .15, .35, .30, .20, .35, and .25, and a .40 difference on the θ-scale between the two categories (one θ equals 0 and the other .40). Ten data sets were generated with these parameters and 300 persons in each category. The data were analyzed with the QUAN-HET model (with equal and unequal discriminations) and the QUAN-HOM model.
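The data-generating step of this simulation can be sketched as follows. The sketch treats the nine listed values as indicator locations and the two category values (0 and .40) as person locations, and it assumes a Rasch-type logistic response function with unit discrimination. The link function, the sign convention, the seed, and all function names are illustrative assumptions, not the procedure actually used.

```python
import math
import random

# Indicator locations and category locations as listed in the text.
ITEM_LOCATIONS = [0.20, 0.20, 0.30, 0.15, 0.35, 0.30, 0.20, 0.35, 0.25]
CATEGORY_LOCATIONS = [0.0, 0.40]  # homogeneous: one shared value per category
N_PER_CATEGORY = 300
N_DATA_SETS = 10


def response_probability(person_location, item_location):
    """Assumed link: logistic of (person location - indicator location)."""
    return 1.0 / (1.0 + math.exp(-(person_location - item_location)))


def simulate_data_set(rng):
    """Return a list of (category, responses) tuples for one data set."""
    data = []
    for category, location in enumerate(CATEGORY_LOCATIONS):
        for _ in range(N_PER_CATEGORY):
            responses = [
                1 if rng.random() < response_probability(location, b) else 0
                for b in ITEM_LOCATIONS
            ]
            data.append((category, responses))
    return data


rng = random.Random(2024)  # arbitrary seed, for reproducibility only
data_sets = [simulate_data_set(rng) for _ in range(N_DATA_SETS)]

# Sum scores vary within a category even though the latent value does not.
sum_scores = [sum(responses) for _, responses in data_sets[0]]
print(min(sum_scores), max(sum_scores))
```

Because every person within a category shares the same latent value, any spread in the sum scores produced by this sketch is purely manifest, which is the situation the statistical tests are meant to detect.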
Second, in order to analyze the data following the Generalized Context Model (Nosofsky & Palmeri, 1997), we made the (arbitrary) choice to select the response patterns of 40 randomly sampled respondents of each group as the learning stimuli and the remaining response patterns as the test stimuli. This is as if the respondents first had been informed about 40 people's opinion (through daily life discussions) before they decided on their own attitude category (in favor of or against legalization) based on what they think of how the criminals should be treated. The procedure was repeated five times, each time with a randomly sampled learning subset from each group, and with the remaining respondents as the subset of test stimuli. The prototypes for the two categories were defined on an a priori basis. As the prototype for the pro-legalization category, we took the overall 1-pattern for all indicators, and as the prototype for the anti-legalization category, we took the overall 0-pattern for all indicators (the complement of the first prototype). To compare the prototype model to the exemplar-based model, the same five sets of test stimuli were used for the two models.
For both models, the nine binary indicators were used as nine binary features or dimensions. The maximum-likelihood-based analysis was performed with two different similarity functions, one with an exponential decay (q = 1), another with a Gaussian decay (q = 2), and with a city-block metric (because of the binary features and a better goodness of fit than the Euclidean metric). Eleven parameters were estimated for both models: c (an overall scaling parameter--the higher its value, the larger the weight of close similarities), b (response bias toward the category in favor of legalization), and nine indicator weights (eight of which were free parameters, given that their sum is one).
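The classification rule shared by the two cognitive models can be sketched as follows. The sketch assumes the commonly used form of the Generalized Context Model: a weighted city-block distance d, a similarity exp(-c * d ** q), and a biased choice rule in which b weights the summed similarity to the category in favor of legalization. The function names, the example parameter values, and the exact form of the bias term are assumptions made for illustration.

```python
import math


def city_block_distance(x, y, weights):
    """Weighted city-block distance between two binary patterns."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, x, y))


def similarity(x, y, weights, c, q):
    """Similarity decays with distance: exponential (q=1) or Gaussian (q=2)."""
    return math.exp(-c * city_block_distance(x, y, weights) ** q)


def prob_pro(pattern, pro_refs, anti_refs, weights, c, b, q=1):
    """Probability of classifying `pattern` as in favor of legalization.

    pro_refs / anti_refs hold the stored exemplars of each category
    (exemplar model) or a single prototype each (prototype model).
    """
    s_pro = sum(similarity(pattern, r, weights, c, q) for r in pro_refs)
    s_anti = sum(similarity(pattern, r, weights, c, q) for r in anti_refs)
    return b * s_pro / (b * s_pro + (1 - b) * s_anti)


# Hypothetical parameter values, chosen only to make the example run.
WEIGHTS = [1 / 9] * 9
C, B = 2.0, 0.5

# Prototype model: the all-1 and all-0 patterns described in the text.
PRO_PROTOTYPE = [[1] * 9]
ANTI_PROTOTYPE = [[0] * 9]

test_pattern = [1, 1, 0, 1, 1, 0, 0, 1, 1]
p_q1 = prob_pro(test_pattern, PRO_PROTOTYPE, ANTI_PROTOTYPE, WEIGHTS, C, B, q=1)
p_q2 = prob_pro(test_pattern, PRO_PROTOTYPE, ANTI_PROTOTYPE, WEIGHTS, C, B, q=2)
print(p_q1, p_q2)
```

Under these assumptions, complementary prototypes and weights that sum to one make the two decay functions agree, because (1 - d)^2 - d^2 = 1 - 2d. For the exemplar model, the same function is called with lists of learning patterns instead of single prototypes.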
Next we estimated a QUAN-HET model with SAS PROC NLMIXED with the same fixed discrimination values (see Table 1) and also with location equivalence. The resulting deviance was 1,630.9, and the corresponding AIC and BIC values were 1,654.9 and 1,699.4, respectively. The deviance of this model was only slightly higher than that of the corresponding CML model (1,630.9 versus 1,625.9), so that the difference in distribution between the two approaches did not seem to play an important role in the goodness of fit for the QUAN-HET model. Note that the discriminations of the individual indicators cannot be estimated very reliably when the sample size is rather small. Because we will not interpret these individual discriminations, and because of the previous result, we constrained all discriminations to be equal within and between categories. Analogously, the θ-estimates for individual persons would perhaps not be very reliable when only nine indicators are used, but again we concentrate on overall features, such as the parameters of the θ-distribution(s). The result of the QUAN-HET model with equal discriminations was a deviance of 1,582.0, with corresponding AIC and BIC values of 1,606.0 and 1,650.4, respectively. From this result it seemed that equal discriminations were a good option when a normal distribution was assumed. Assuming equal discriminations for all indicators, we estimated a QUAL2-HET model in the next step, which is actually a step back in the order of testing. The resulting deviance was 1,576.7. Based on a likelihood-ratio test, this is not statistically significantly lower than the deviance of the QUAN-HET model with equal discriminations (χ² = 5.30, p > .10). Accepting location differences between the two groups did not seem to pay off. Therefore, we continued with the QUAN-HET model with equal discriminations for all indicators as the reference model.
We tested this model against the QUAN-HOM model in order to make a choice along the vertical axis in Table 1. The resulting deviance was 2,133.1, and the corresponding conservative likelihood-ratio test was statistically significant (χ² = 551.1, p < .001). The conclusion must be that the QUAN-HET model was the better one and that the groups were heterogeneous.
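The arithmetic behind these model comparisons can be retraced with a short sketch. It uses the standard definitions on the deviance scale (AIC = deviance + 2k, BIC = deviance + k ln n, and the likelihood-ratio statistic as a difference between the deviances of nested models). The parameter count of 12 is not stated in the text; it is inferred here from the reported gap of 24 between the AIC and the deviance. The sample size is not restated here either, so the BIC function is left unevaluated.

```python
import math


def aic(deviance, n_params):
    """Akaike information criterion on the deviance scale."""
    return deviance + 2 * n_params


def bic(deviance, n_params, n_obs):
    """Bayesian information criterion; n_obs is the sample size."""
    return deviance + n_params * math.log(n_obs)


def lr_statistic(deviance_restricted, deviance_general):
    """Likelihood-ratio statistic for two nested models."""
    return deviance_restricted - deviance_general


# Deviances reported in the text (equal discriminations throughout).
DEV_QUAN_HET = 1582.0
DEV_QUAL2_HET = 1576.7
DEV_QUAN_HOM = 2133.1

# Assumption: 12 parameters, inferred from the reported AIC - deviance = 24.
N_PARAMS_QUAN_HET = 12

print(round(aic(DEV_QUAN_HET, N_PARAMS_QUAN_HET), 1))
print(round(lr_statistic(DEV_QUAN_HET, DEV_QUAL2_HET), 1))
print(round(lr_statistic(DEV_QUAN_HOM, DEV_QUAN_HET), 1))
```

The three printed values reproduce the reported AIC of 1,606.0 and the two likelihood-ratio statistics of 5.3 and 551.1.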
From an inspection of the QUAN-HET parameter estimates, the two attitude groups seemed to differ in attitude level as well as in heterogeneity. When reporting the estimates, we mention the standard errors in parentheses. The estimate of the group effect on the latent continuum was –8.847 (.847), which was statistically significant. The group that was against capital punishment was located much lower on the attitude continuum than was the group that was for capital punishment. The variance of the two attitude groups was quite different: σ²_pro = 4.227 (.842) and σ²_anti = 17.391 (3.404). Both estimates were statistically significantly different from zero using the conservative Wald test for variances. This confirmed the earlier conclusion that the groups were heterogeneous. The difference between the two variances was also estimated (in a separate run). The result was 13.164 (3.444), which was statistically significant using a Wald test. The latent structure for the two groups seemed to be one with a relatively homogeneous group in favor that is rather far above a much more heterogeneous group against.
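As a rough check on these statements, the ratio of each estimate to its standard error can be computed directly. The sketch below uses a plain estimate-to-standard-error ratio and the two-sided 5% normal cutoff of 1.96. Both are assumptions made for illustration, since the exact form of the conservative Wald test is not spelled out here.

```python
# Estimates and standard errors as reported in the text.
ESTIMATES = {
    "group effect": (-8.847, 0.847),
    "variance, group in favor": (4.227, 0.842),
    "variance, group against": (17.391, 3.404),
    "difference between variances": (13.164, 3.444),
}

CRITICAL_Z = 1.96  # assumed cutoff: two-sided 5% value of the standard normal


def wald_ratio(estimate, standard_error):
    """Estimate divided by its standard error."""
    return estimate / standard_error


for label, (estimate, standard_error) in ESTIMATES.items():
    z = wald_ratio(estimate, standard_error)
    print(f"{label}: z = {z:.2f}, beyond cutoff: {abs(z) > CRITICAL_Z}")
```

All four ratios lie well beyond the cutoff, which is in line with the significance statements above.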
In order to have a better view on the latent distribution, a histogram of that distribution is shown in Figure 4, based on the estimated distribution parameters. Because the size of the groups clearly differs and may have a misleading visual effect, the histogram is constructed for groups with equal size (both n = 202, which is the size of the larger group). The distribution was clearly bimodal and corroborated the bimodality of sum scores. Because there was also a clear difference between the means of the two groups, the quantitative difference between the two attitude groups may be considered to be abrupt.
The data of the simulation study were first analyzed using equal discriminations (as they were generated and in conformity with the results on capital punishment), with the QUAN-HET and QUAN-HOM models. The results show smooth histograms without any gap for the sum scores as well as for the individual θ-estimates from the QUAN-HET model with no category main effect. One might be misled to conclude that the latent structure of the categories is heterogeneous. Using a likelihood-ratio test, however, the QUAN-HOM model was never rejected against the QUAN-HET model independently of the category main effect (all p's > .10 and differences in the deviance statistic that are smaller than 1.5), and in none of the 10 data sets was the variance statistically significantly different from zero (all p's > .10). Similar results were obtained with the 2PL model. This result also means that we can differentiate between a model with homogeneous classes and the 2PL (with or without a category main effect), so that our concern based on Haertel's (1990) study is met. The result of this small simulation study is reassuring for our approach. It shows how one can be misled by apparent heterogeneity in the sum scores and in the individual θ-estimates if one does not use statistical tests for features of the latent structure. Consequently, the bimodal distribution in Figure 4 should be seen in the light of the statistical tests.
As for testing the exemplar model and the prototype model with q = 1, the means of the log likelihoods were 78.6 and 78.9, respectively, and for q = 2 the corresponding values were 76.5 and 78.9, respectively. (The value of q will not make a difference for the prototype model because of the way the prototypes were defined.) This means that the two models performed about equally well. For the prototype model, the c estimate varied between 1.98 and 9.54, whereas the corresponding values for the exemplar model were more extreme--from 6.43 to 15.97 for q = 2, and even more extreme for q = 1. High values of c mean that close similarities weighed much more heavily in determining the classification decision. The b estimates were found to be in line with the fact that the group in favor was larger than the group against. Finally, the weights were more stable (over the five runs) for the prototype model than for the exemplar model. The highest average weights in the prototype model were found for serial murder (.349), murder of one's whole family (.239), and rape of a child (.132) (the same for the two values of q).
If one were to invoke the bimodal distributions as evidence for the existence of a latent categorical structure, then one should realize that the bimodality is a relative criterion, namely the size of the main effect of the group factor. All other aspects of the latent structure are dimension-like. Because there is no definitive way to tell how large the absolute difference should be, nor how large Cohen's d should be, and because the bimodality follows from the size of Cohen's d, the bimodality is at best a relative criterion. That the two categories appear as heterogeneous is not an artefact and neither is it derived from the histogram in Figure 4. It is based instead on the result of a likelihood-ratio test and a test of the variances. The discriminative power of our approach was corroborated through the results of a small simulation study.
The conclusion that the structure is dimension-like (apart from the abrupt difference) needs a word of caution. First, one can imagine that indicators could be used other than the nine we studied. For the personality disorder categories, the selection of indicators (the symptoms) had a strong basis in the DSM-IV. For the attitudes toward capital punishment, the choice was less evident. Second, ratings were again used, but now they were self-ratings instead of ratings by experts. Given that the indicator ratings and the classifications were made by the same persons, the conclusions may reflect the cognitive construction of attitudes by the respondents.
As to the relevance of the cognitive models for our data, there is no way to compare the goodness of fit of the exemplar-based and prototype models with the nonlinear mixed models that we estimated. The purpose and the structure of the models are totally different. In Dimcat, the classification (the manifest category) is a predictor for the indicators, whereas in the cognitive models, the indicator data are the predictors for the classification. The structure of the cognitive models is also quite different--for example, because of the crucial role of similarities between exemplars or of exemplars with the prototype. There is no counterpart of this in the nonlinear mixed model family.
The fact that the performance of the prototype model is about as good as that of the exemplar-based model is remarkable. It would be of interest from a cognitive-psychological viewpoint to compare two types of categories: one with manifest heterogeneity but no internal structure, another with manifest heterogeneity and latent heterogeneity, in order to investigate whether the superiority of the exemplar-based model generalizes to dimension-like (heterogeneous) categories. As discussed earlier, within-category structure has been neglected thus far in the cognitive literature. Our results could inspire studies to investigate the effect of the within-category structure on the validity of the exemplar model and the prototype model.
Second, the multitask approach (K. W. Fischer, Pipp, & Bullock, 1984) was developed to relax the limitation that stages need to be homogeneous, in order to capture micro-sequences within the stages. K. W. Fischer et al. (1984) made an interesting distinction between first-order versus second-order discontinuity, a distinction similar to our distinction between simple versus complex qualitative differences. A first-order discontinuity is a sudden leap in performance on all relevant problems (corresponding to quantitative differences), one that is equal for all problems, whereas a second-order discontinuity is a discordant leap (corresponding to qualitative differences), one that is large for some problems but not for others. K. W. Fischer et al. (1984) accepted the probabilistic link between stages and solving problems, but did not use the idea for formal modeling.
Third, the ordered latent class model (Croon, 1990) can be used to relax the deterministic nature of the model (and of the stages). It provides an explicit probabilistic link between stages and performance on problems. Within-stage homogeneity is still assumed, as in the scalogram model, albeit homogeneity of a stochastic kind. Although the classes (stages) are ordered, they can show qualitative differences, because problem locations can differ across classes. Indeed, the problem locations must meet certain inequality restrictions for the classes to be ordered (see also Hoijtink & Molenaar, 1997). The ordered latent class model is situated between Type 3 and Type 4 from Table 1, but for latent categories.
In contrast with these three models, the saltus model combines a probabilistic view of stages, the assumption of within-stage heterogeneity, and the possibility of modeling certain between-stage qualitative differences. The saltus model has a special type of parameter to distinguish between first-order and second-order discontinuities, the δ-parameters. A δ_kk′s ≠ 0 implies that, for stage k in comparison with stage k′, performance on a subset s of problems differs from performance on the complementary subset of problems. Differences of this kind are qualitative, because differences between problem locations are not equivalent across stages. When no saltus parameters are required (the saltus parameters are zero) and the stage main effects suffice, the discontinuities are of the first-order type and quantitative. For a first-order discontinuity to occur, the distance between groups of persons on the latent dimension (which is also the proficiency scale) must be large--for example, without overlap. In sum, the saltus model lacks the limitations of the previous models, and it allows for the distinction between two kinds of discontinuities. Furthermore, the saltus model is a particular specification of a Type 1 model from Table 1.
Saltus parameters can capture how some problems become much easier relative to others as persons add to or reconceptualize their knowledge. Saltus parameters can also capture how some problems actually become harder as persons progress from an earlier stage to a more advanced stage, because they previously gave the correct answer but for the wrong reasons. There are two ways to apply the saltus model. One way (in which it was originally developed) is to assume that class membership is a latent variable estimated from the data--we will call this the latent saltus model (Mislevy & Wilson, 1996; Wilson, 1989). A second way is to assume that class membership is an observed variable that is given by, for example, segmentation or expert judgment--we will call this the manifest saltus model (G. Fischer, 1992; Wilson, 1993). The assumption of manifest class membership makes estimation of the model simpler, and it may make interpretation more straightforward, but it also involves certain limitations (Wilson, 1993).
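The role of a saltus parameter in the manifest case can be illustrated with a minimal sketch. It assumes a logistic response function with unit discrimination in which the saltus parameter (called delta in the code) is added to the location of the problems in the affected subset, but only for persons in the later stage, so that a negative value makes those problems easier. The parameterization, the function name, and all numerical values are hypothetical.

```python
import math


def saltus_probability(theta, item_location, stage, in_subset, delta,
                       shifted_stage=2):
    """Probability of a correct response under an assumed saltus-type model.

    The problem location is shifted by `delta` only when the person is in
    `shifted_stage` and the problem belongs to the affected subset.
    """
    shift = delta if (stage == shifted_stage and in_subset) else 0.0
    return 1.0 / (1.0 + math.exp(-(theta - (item_location + shift))))


THETA = 0.0          # hypothetical person location
ITEM_LOCATION = 1.0  # hypothetical problem location
DELTA = -2.0         # hypothetical saltus parameter (negative = easier)

p_stage1 = saltus_probability(THETA, ITEM_LOCATION, 1, True, DELTA)
p_stage2 = saltus_probability(THETA, ITEM_LOCATION, 2, True, DELTA)
p_other = saltus_probability(THETA, ITEM_LOCATION, 2, False, DELTA)
print(p_stage1, p_stage2, p_other)
```

With the saltus parameter set to zero, the two stages would differ only through the person locations, which corresponds to a purely quantitative (first-order) difference.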
Siegler (1981) investigated the rule assessment theory with three experimental problems involving proportionality: a balance-scale problem, a projection-of-shadows problem, and a probability problem. We will concentrate on the balance-scale problem. Using problem analysis and by reference to previous empirical and theoretical work, Siegler posited a series of rules that children might use in tackling the problem. A child using Rule I will not consider the distances of the weights from the fulcrum; to such a child, only the amounts of the weights matter (weight is the dominant dimension). A child using Rule II will consider the distances of the weights from the fulcrum only when the weights are the same (distance is the subordinate dimension); otherwise the child will consider only the amounts of the weights. A child using Rule III is aware of his or her lack of understanding of the behavior of the balance scale when both weights and distances vary, and will use a cognitive strategy such as guessing or taking cues from the experimenter. A child using Rule IV will compute torques on either side of the balance beam and choose accordingly; this computation can be executed either by actual calculation or "by eye."
To distinguish between persons at these four rule levels, Siegler (1981) designed six types of problem, of which we will present three: dominant problems (D), with unequal values on the dominant dimension (weight) and equal values on the subordinate dimension (distance); subordinate problems (S), with equal values on the dominant dimension (weight) and unequal values on the subordinate dimension (distance); and conflict-equal problems (CE), with unequal values on both dimensions but with the two sides balanced (see Figure 5).
The six problem types yield different profiles for the four rules, and this difference was the basis for Siegler's classification. For the three kinds of problems we described, the differentiation is as follows. Rule I differentiates between D problems and S problems, because D problems can be solved when exclusively the dominant dimension is used, but S problems cannot. Rule II differentiates between D or S problems and CE problems, because taking the subordinate dimension into account in the case of equality on the first dimension helps a person solve S problems but not CE problems. Rule III differentiates in a similar way, except that a person will guess on CE problems. Finally, Rule IV also will lead a person to guess on CE problems, because the combination of distance and weight on both sides yields a tie. The three problem types considered here permit the distinction between adjacent rule levels: D versus S (Rule I versus higher), and S versus CE (Rule II versus higher). The three stages are differentiated on the basis of the hypothesized distances in difficulty between D, S, and CE problems. Rule I children should show a large distance between D on the one hand, and S and CE on the other hand (D----S-CE), Rule II children should show a large distance between D and S on the one hand, and CE on the other hand (D-S----CE), and finally, Rule III and Rule IV children should show a smaller distance between D and S on the one hand, and CE on the other hand (D-S-CE).
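The way Rules I and II separate the D, S, and CE problems can be made concrete with a small sketch. Each problem is coded as a weight and a distance on the left and right sides, and the correct answer is taken to be the side with the larger torque (weight times distance). The coding of the problems, the example values, the function names, and the assumption that a child answers "balance" when the attended dimension shows no difference are illustrative choices.

```python
def correct_answer(w_left, d_left, w_right, d_right):
    """Side that goes down according to the torques, or 'balance'."""
    left, right = w_left * d_left, w_right * d_right
    if left == right:
        return "balance"
    return "left" if left > right else "right"


def rule_one(w_left, d_left, w_right, d_right):
    """Rule I: only the weights (dominant dimension) are considered."""
    if w_left == w_right:
        return "balance"
    return "left" if w_left > w_right else "right"


def rule_two(w_left, d_left, w_right, d_right):
    """Rule II: distances are considered only when the weights are equal."""
    if w_left != w_right:
        return "left" if w_left > w_right else "right"
    if d_left == d_right:
        return "balance"
    return "left" if d_left > d_right else "right"


# Hypothetical example problems: (w_left, d_left, w_right, d_right).
PROBLEMS = {
    "D": (3, 2, 1, 2),   # unequal weights, equal distances
    "S": (2, 3, 2, 1),   # equal weights, unequal distances
    "CE": (3, 2, 2, 3),  # both unequal, torques equal
}

for name, problem in PROBLEMS.items():
    truth = correct_answer(*problem)
    print(name, rule_one(*problem) == truth, rule_two(*problem) == truth)
```

In this sketch Rule I succeeds only on the D problem and Rule II on the D and S problems, which matches the D----S-CE and D-S----CE profiles described above.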
Indicators. The presentation of analyses will be restricted to a comparison between two kinds of problems: D and S. Five D problems and five S problems were considered. Results were similar for comparisons between the other pair of consecutive problems (S and CE) and among all three problems (D, S, and CE).
Manifest categories. Students who scored 0 to 5 were assigned to the first stage (Rule I level), and those who scored 6 to 10 were assigned to the second stage (Rule II level). This method of defining manifest categories is an example of segmentation. More sophisticated methods of defining categories (e.g., using latent saltus class probabilities) can also be applied (Wilson, 1989).
Therefore, we tested three models: (a) a QUAL2-HET model with one task-independent overall degree of discrimination and with a person variance of one in both groups, (b) a QUAN-HET model with the same restrictions, and (c) a saltus model with a δ_S for the expected jump for the problems requiring that the subordinate dimension be used. The corresponding deviance values were 2,441.9, 2,729.9, and 2,448.5, respectively. The corresponding AIC and BIC values were 2,483.9 and 2,571.7 (QUAL2-HET), 2,753.9 and 2,804.1 (QUAN-HET), and 2,474.5 and 2,528.9 (saltus model), respectively. The likelihood-ratio test comparing the restricted QUAL2-HET with the restricted QUAN-HET was statistically significant (χ² = 288.0, p < .001), but when the saltus model was compared with the QUAL2-HET model, the difference in goodness of fit was not statistically significant (χ² = 6.6, p > .10). The saltus model seemed to capture all qualitative differences between the two groups. It was also the best model with respect to the AIC and BIC. The estimate of δ_S indicated the size of the jump of the S items for the Rule II group. The estimated jump from the Rule I to the Rule II level was -4.856 (.337), which was statistically significant given its standard error. The S problems were drastically easier at the Rule II level than at the Rule I level. No other differences were needed to approach the restricted QUAL2-HET model, so we concluded that the D tasks were equally easy for both groups.
After the assessment of between-category differences, we tested for within-category differences, in line with the vertical axis of Dimcat. The saltus model with homogeneity yielded a deviance of 2,507.5, with AIC and BIC values of 2,531.5 and 2,609.3, respectively, and a statistically significant (conservative) likelihood-ratio test when compared with the corresponding model with heterogeneity (χ² = 59, p < .001). The goodness of fit could be improved considerably, however, when the discrimination for the Rule I level was fixed to zero (implying homogeneity in one group). The resulting deviance of 2,428.6 was also better than that of the corresponding full heterogeneity model. This partially heterogeneous model was in fact the best model of all those that could be estimated with good results. The AIC and BIC values were 2,454.6 and 2,509.0, respectively. Because the overall discrimination for the Rule II group was statistically significant, 1.551 (.116), we concluded that there was homogeneity at the Rule I level, and heterogeneity at the Rule II level. This finding was interesting, because it was the first time among our three applications that a manifest category turned out to be homogeneous.
We replicated the comparisons above for the S and CE problems, and also for the D, S, and CE problems (in the latter case, using a segmentation that yielded three manifest categories when all three kinds of problems were analyzed). The results for D and S problems replicated the above results, meaning that the difference was again qualitative, and that again the manifest saltus model could explain this qualitative difference. For S and CE, one saltus parameter was again needed. To fit the data from the D, S, and CE problems, two saltus parameters were needed, one for the difference between D and S, and one for the difference between D and CE.
Another interesting finding is that the stages (or rule assessment classes) as defined by our segmentation rule are heterogeneous at the manifest level but not necessarily at the latent level. The Rule II stage seems to exhibit the micro-sequence phenomenon noted by K. W. Fischer et al. (1984), but the Rule I stage does not. A speculation to explain this result is that each stage shows the so-called micro-sequence phenomenon, implying within-stage quantitative development until a homogeneous end-state within the stage is reached, followed by a qualitative jump to the next stage, where again within-stage quantitative development occurs. The results can be explained by assuming that the Rule I students have reached the end-state of the Rule I level and that the other students are at different points of their quantitative development with respect to Rule II.
The second important result is that heterogeneity, even when captured by a descriptive dimension, does not necessarily imply that the manifest categories are only quantitatively different. To a small extent in the first and to a large extent in the third application, there was evidence for qualitative differences. Thinking of manifest categories as being dimension-like while still reflecting qualitative differences may seem contradictory, but as we have shown, qualitative differences and heterogeneity relate to different features of what it means to be dimension-like. In this situation, the use of the saltus parameters gives us a way to describe qualitative differences for a dimension-like structure.
The third important result is that, when the differences are quantitative, the abruptness of the difference can be investigated at the latent level, so that one need not rely on the distribution of manifest variables, such as sum scores. In particular, in Application 2, where quantitative differences were found, the manifest distribution and the latent distribution were both clearly bimodal. This correspondence is not guaranteed, however, as shown by Grayson (1987) and as corroborated in our simulation study, which showed that a structure with two categories with latent homogeneity can generate a smooth distribution, albeit one that can be identified as an artifact when the appropriate statistical tests are performed.
The fourth important result is that qualitative differences between manifest categories can sometimes be captured in a simple way. This is either because the qualitative differences are only minor (as in Application 1) or because a simple principle applies (as in Application 3). The latter is of special interest, because it allows one to test a theory of qualitative differences. In Application 3, the theory is Piaget's theory of cognitive development.
It is of interest to note that in our applications a large variety of latent structures were found, often with strong evidence against alternative structures. In all cases, we started from a rather simple manifest categorical variable, either based on expert judgment or on segmentation. The implication of our findings is that manifest categories can differ a lot in their underlying structure. Without an investigation such as we conducted, one would perhaps not be aware of the quite different underlying status of the categorical variable one is using.
The differences between the different types of structure we found often turned out to be quite drastic, in all cases when within-category homogeneity versus heterogeneity was considered, and also in the third application with respect to qualitative differences. Looked upon from this practical viewpoint, differentiating between the different types of structures was often not a problem. Although the issue of differentiating power remains an important one, it was shown in two simulation studies that, for the kinds of differentiation that are relevant in our applications, the modeling approach we followed has good differentiation power and that modeling can correctly differentiate what the eye cannot.
Our approach hinges on the indicators that are selected, on the method of observation (e.g., ratings), and on the alternative manifest categories. For the study of personality disorders, the selection of the indicators was rather self-evident, given that both the indicators (symptoms) and the manifest categories (diagnoses) were based on the DSM-IV. For the study of attitudes, several alternatives were available. We could have referred to the circumstances of the crimes and to characteristics of the criminals, and one cannot tell whether these would have yielded the same results. For the study of cognitive development, the indicators certainly make sense, given that they are well-known tasks from this domain of study, but alternative tasks have been used. Perhaps the most severe limitation is that in the first two applications ratings were used, so that a cognitive bias may have affected the results. The conclusions must therefore be stated in terms of the manifest categories as used by raters. The situation is different for the developmental application, in which objective data were used. The choice of alternatives to a reference category is also an important issue. In some cases the choice is evident, as for the application on attitudes toward capital punishment and for the developmental application. However, for the personality disorder study, the category of people without any personality disorder would be a meaningful alternative category. The true nature of a category does not depend on the alternative categories it is compared with, but the alternative categories are an important methodological feature that restricts what one can or cannot find. For example, we believe that before one can come to a well-founded conclusion on personality disorders, it seems worthwhile to compare a given disorder with alternative disorders and with normality. One should also realize that our conclusions are restricted to pure personality disorders. Although we had reasons to use only pure categories, doing so prevents us from generalizing the results to the disorder categories as a whole.
Our approach is based on nonlinear mixed models for categorical data and as such it is a very broad one, encompassing most IRT models and more. Analogous approaches can be developed rather easily for continuous data and for latent categories, but such developments would have a different scope. Instead we opted for bringing in another approach that is directed toward manifest categories and categorical indicators, one that is used in cognitive psychology: the Generalized Context Model.
The link we made with the cognitive study of concepts and categories can be considered a mutually inspiring one. Our applications point to the need to include within-category heterogeneity and structure in studies on the cognitive representation of categories. In principle, one can analyze an element-by-feature matrix with elements from different categories, in the same way we did. On the other hand, the cognitive models are a good basis to investigate the way raters (experts and lay persons) come to a category-like decision on other persons or themselves. The cognitive models should be tried out more for heterogeneous manifest categories, given that our results differ from those obtained with stimuli from categories without an internal structure (without correlated features).
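As a reminder of how an exemplar model operates on an element-by-feature matrix, the following minimal sketch shows the usual form of a Generalized Context Model classification rule: similarity decays exponentially with a weighted city-block distance, and the probability of a category is its summed similarity relative to the total. The sketch assumes equal attention weights, no response bias, and a hypothetical toy data set. It is not the parameterization or the data used in our applications.

```python
from math import exp


def similarity(x, y, weights, c):
    """Exponentially decaying similarity over a weighted city-block distance.
    c is the sensitivity parameter; weights are attention weights."""
    distance = sum(w * abs(a - b) for w, a, b in zip(weights, x, y))
    return exp(-c * distance)


def gcm_probabilities(probe, exemplars_by_category, weights, c=1.0):
    """Probability of assigning the probe to each category: summed similarity
    to that category's exemplars divided by the summed similarity to all
    stored exemplars (no response-bias parameters in this sketch)."""
    summed = {
        category: sum(similarity(probe, ex, weights, c) for ex in exemplars)
        for category, exemplars in exemplars_by_category.items()
    }
    total = sum(summed.values())
    return {category: s / total for category, s in summed.items()}


if __name__ == "__main__":
    # Hypothetical element-by-feature rows (1 = feature present).
    exemplars = {
        "A": [[1, 1, 1, 0], [1, 1, 0, 1]],
        "B": [[0, 0, 0, 1], [0, 0, 1, 0]],
    }
    equal_weights = [0.25, 0.25, 0.25, 0.25]
    print(gcm_probabilities([1, 1, 1, 1], exemplars, equal_weights, c=1.0))
```

Because every stored exemplar contributes separately, within-category heterogeneity and correlated features enter the prediction directly, which is why such models are a natural candidate for heterogeneous manifest categories.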
We believe the approach we have formulated and applied is rather general and workable. It complements several other approaches, which can be deemed more specialized in one or another aspect of the concept of category-likeness. For example, the taxometric approach is specialized in detecting discreteness between latent categories along a dimension, and it concentrates on pairs of categories. Another example is the set of methods for investigating factorial equivalence in its limited sense (checking only the factor loadings), which concentrate on discrimination equivalence, one aspect of qualitative versus quantitative differences. We do not claim that our framework is all-encompassing, but we believe that there is not just one feature that is distinctive for category-likeness, and that the meta-category of category-likeness is itself polythetic, as most categories are. It was our aim to leave room for such a polythetic view of category-likeness, and that room was needed to explain our data.
References
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov & F. Csaki (Eds.), Second international symposium on information theory (pp. 267-281). Budapest, Hungary: Akademiai Kiado.
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.
Andrich, D. (1978). A rating scale formulation for ordered response categories. Psychometrika, 43, 567-573.
Beauchaine, T. P., & Beauchaine, R. J., III (2002). A comparison of maximum covariance and k-means cluster analysis in classifying cases into known taxon groups. Psychological Methods, 7, 245-261.
Beauchaine, T. P., & Waters, E. (2003). Pseudotaxonicity in MAMBAC and MAXCOV analyses of rating-scale data: Turning continua into classes by manipulating observer's expectations. Psychological Methods, 8, 3-15.
Beguin, A. A., & Glas, C. A. W. (2001). MCMC estimation and some model-fit analysis of multidimensional IRT models. Psychometrika, 66, 541-561.
Birnbaum, A. (1968). Some latent trait models. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397-424). Reading, MA: Addison-Wesley.
Bock, R. D., Gibbons, R., & Muraki, E. (1988). Full-information item factor analysis. Applied Psychological Measurement, 12, 261-280.
Borsboom, D., Mellenbergh, G., & van Heerden, J. (2003). The theoretical status of latent variables. Psychological Review, 110, 203-219.
Byrne, B. M., Shavelson, R. J., & Muthén, B. (1989). Testing for the equivalence of factor covariance and mean structures: The issue of partial measurement invariance. Psychological Bulletin, 105, 456-466.
Cantor, N., Smith, E. E., French, R. D., & Mezzich, J. (1980). Psychiatric diagnosis as prototype categorization. Journal of Abnormal Psychology, 89, 181-193.
Carson, R. C. (1991). Dilemmas in the pathway of the DSM-IV. Journal of Abnormal Psychology, 100, 302-307.
Clarkin, J. F., Widiger, T. A., Frances, A., Hurt, S. W., & Gilmore, M. (1983). Prototypic typology and the borderline personality disorder. Journal of Abnormal Psychology, 92, 263-275.
Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.
Coyne, J. C. (1994). Self-reported distress: Analog or ersatz depression? Psychological Bulletin, 116, 29-45.
Croon, M. (1990). Latent class analysis with ordered latent classes. British Journal of Mathematical and Statistical Psychology, 43, 171-192.
de Leeuw, J., & Verhelst, N. (1986). Maximum likelihood estimation in generalized Rasch models. Journal of Educational Statistics, 11, 183-196.
Devlin, J. T., Gonnerman, L. M., Andersen, E. S., & Seidenberg, M. S. (1998). Category-specific semantic deficits in focal and widespread brain damage: A computational account. Journal of Cognitive Neuroscience, 10, 77-94.
Eagly, A. H., & Chaiken, S. (1993). The psychology of attitudes. Orlando, FL: Harcourt Brace.
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.
Endler, N. S., & Kocovski, N. L. (2002). Personality disorders at the crossroads. Journal of Personality Disorders, 16, 487-502.
Fischer, G. (1992). The 'saltus model' revisited. Methodika, 6, 87-98.
Fischer, K. W., Pipp, S. L., & Bullock, D. (1984). Detecting discontinuities in development: Methods and measurement. In R. N. Emde & R. Harmon (Eds.), Continuities and discontinuities in development (pp. 95-121). Norwood, NJ: Ablex.
Flett, G. L., Vredenburg, K., & Krames, L. (1997). The continuity of depression in clinical and nonclinical samples. Psychological Bulletin, 121, 395-416.
Frances, A., Widiger, T., & Fyer, M. R. (1990). The influence of classification methods on comorbidity. In J. D. Maser & C. R. Cloninger (Eds.), Comorbidity of mood and anxiety disorders (pp. 41-59). Washington, DC: American Psychiatric Press.
Gangestad, S. W., Bailey, J. M., & Martin, N. G. (2000). Taxometric analyses of sexual orientation and gender identity. Journal of Personality and Social Psychology, 78, 1109-1121.
Gangestad, S., & Snyder, M. (1985). "To carve nature at its joints": On the existence of discrete classes in personality. Psychological Review, 92, 317-349.
Gangestad, S. W., & Snyder, M. (1991). Taxometric analysis redux: Some statistical considerations for testing a latent class model. Journal of Personality and Social Psychology, 61, 141-146.
Glas, C. A. W. (1988). The derivation of some tests for the Rasch model from the multinomial distribution. Psychometrika, 53, 525-546.
Golden, R. R., & Meehl, P. E. (1979). Detection of the schizoid taxon with MMPI indicators. Journal of Abnormal Psychology, 88, 217-233.
Goodman, L. A. (1972). A general model for the analysis of surveys. American Journal of Sociology, 77, 1035-1086.
Grayson, D. A. (1987). Can categorical and continuous views of psychiatric illness be distinguished? British Journal of Psychiatry, 151, 355-361.
Green, B. F. (1952). Latent structure analysis and its relation to factor analysis. Journal of the American Statistical Association, 47, 71-76.
Guttman, L. A. (1944). A basis for scaling qualitative data. American Sociological Review, 9, 139-150.
Haertel, E. H. (1990). Continuous and discrete latent structure models for item response data. Psychometrika, 55, 477-494.
Hampton, J. A. (1995). Testing the prototype theory of concepts. Journal of Memory and Language, 34, 686-708.
Haslam, N. (1997). Evidence that male sexual orientation is a matter of degree. Journal of Personality and Social Psychology, 73, 862-870.
Haslam, N. (2002). Natural kinds, practical kinds, and psychiatric categories. Psycoloquy, 13(001).
Haslam, N., & Beck, A. T. (1994). Subtyping major depression: A taxometric analysis. Journal of Abnormal Psychology, 103, 686-692.
Haslam, N., & Cleland, C. (2002). Taxometric analysis of fuzzy categories: A Monte Carlo study. Psychological Reports, 90, 401-404.
Haslam, N., & Ernst, D. (2002). Essentialist beliefs about mental disorders. Journal of Social and Clinical Psychology, 21, 628-644.
Haslam, N., & Kim, H. C. (2002). Categories and continua: A review of taxometric research. Genetic, Social, and General Psychology Monographs, 128, 271-320.
Hidegkuti, I., & De Boeck, P. (2004). The differentiation of Dimcat models: A simulation study. Unpublished manuscript, K. U. Leuven, Belgium.
Hoijtink, H., & Molenaar, I. W. (1997). A multidimensional item response model: Constrained latent class analysis using the Gibbs sampler and posterior predictive checks. Psychometrika, 62, 171-189.
Holland, P. W., & Wainer, H. (Eds.). (1993). Differential item functioning. Hillsdale, NJ: Erlbaum.
Janssen, R., De Boeck, P., Viaene, M., & Vallaeys, L. (1999). Simple mental addition in children with and without mild mental retardation. Journal of Experimental Child Psychology, 74, 261-281.
Janssen, R., Tuerlinckx, F., Meulders, M., & De Boeck, P. (2000). A hierarchical IRT model for criterion-referenced measurement. Journal of Educational and Behavioral Statistics, 25, 285-306.
Kass, F., Skodol, A. E., Charles, E., Spitzer, R., & Williams, J. B. W. (1985). Scaled ratings of DSM-III personality disorders. American Journal of Psychiatry, 142, 627-630.
Kelderman, H., & Steen, R. (1993). LOGIMO [computer software]. Groningen, the Netherlands: ProGAMMA.
Kiers, H. A. L. (1990). SCA: A program for simultaneous analysis of variables measured in two or more populations [computer software and manual]. Groningen, the Netherlands: ProGAMMA.
Kim, N. S., & Ahn, W. K. (2002). Clinical psychologists' theory-based representations of mental disorders predict their diagnostic reasoning and memory. Journal of Experimental Psychology: General, 131, 451-476.
Kofsky, E. (1966). A scalogram study of classificatory development. Child Development, 37, 191-204.
Komatsu, L. U. (1992). Recent views of conceptual structure. Psychological Bulletin, 112, 500-526.
Korfine, L., & Lenzenweger, M. F. (1995). The taxonicity of schizotypy: A replication. Journal of Abnormal Psychology, 104, 26-31.
Lakoff, G. (1987). Women, fire, and dangerous things: What categories reveal about the mind. Chicago: University of Chicago Press.
Lenzenweger, M. F. (1999). Deeper into the schizotypy taxon: On the robust nature of maximum covariance analysis. Journal of Abnormal Psychology, 108, 182-187.
Lenzenweger, M. F., & Korfine, L. (1992). Confirming the latent structure and base rate of schizotypy: A taxometric analysis. Journal of Abnormal Psychology, 101, 567-571.
Lilienfeld, S. O., & Marino, L. (1995). Mental disorder as a Roschian concept: A critique of Wakefield's harmful dysfunction analysis. Journal of Abnormal Psychology, 104, 411-420.
Livesley, W. J., Jackson, D. N., & Schroeder, M. L. (1992). Factorial structure of traits delineating personality disorders in clinical and general population samples. Journal of Abnormal Psychology, 101, 432-440.
Livesley, W. J., & Schroeder, M. L. (1990). Continua of personality disorder: The DSM-III-R Cluster A diagnoses. Journal of Nervous and Mental Disease, 178, 627-635.
Livesley, W. J., Schroeder, M. L., Jackson, D. N., & Jang, K. L. (1994). Categorical distinctions in the study of personality disorders: Implications for classification. Journal of Abnormal Psychology, 103, 6-17.
Maesschalck, C. (1998). A psychometric modelling framework for testing categorical and/or continuous aspects of the borderline, histrionic, and antisocial personality disorders. Unpublished doctoral dissertation, K. U. Leuven, Belgium.
Malt, B. C., & Smith, E. E. (1984). Correlated properties in natural categories. Journal of Verbal Learning & Verbal Behavior, 23, 250-269.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174.
McKinley, R. L., & Reckase, M. D. (1983). MAXLOG: A computer program for the estimation of the parameters of a multidimensional logistic model. Behavior Research Methods and Instrumentation, 15, 389-390.
McCulloch, C. E., & Searle, S. R. (2001). Generalized, linear, and mixed models. New York: Wiley.
McCutcheon, A. L. (1987). Latent class analysis. Newbury Park, CA: Sage.
Medin, D. L. (1989). Concepts and conceptual structure. American Psychologist, 44, 1469-1481.
Medin, D. L., & Coley, J. D. (1998). Concepts and categorization. In J. Hochberg & J. E. Cutting (Eds.), Perception and cognition at century's end: Handbook of perception and cognition (pp. 403-439). San Diego, CA: Academic.
Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psychological Review, 85, 207-238.
Meehl, P. E. (1973). MAXCOV-HITMAX: A taxonomic search method for loose genetic syndromes. In Psychodiagnosis: Selected papers (pp. 200-224). Minneapolis, MN: University of Minnesota Press.
Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806-834.
Meehl, P. E. (1979). A funny thing happened to us on the way to the latent entities. Journal of Personality Assessment, 43, 563-581.
Meehl, P. E. (1995). Bootstraps taxometrics: Solving the classification problem in psychopathology. American Psychologist, 50, 266-275.
Meehl, P. E. (1999). Clarifications about the taxometric method. Applied & Preventive Psychology, 8, 165-174.
Meehl, P. E., & Golden, R. R. (1982). Taxometric methods. In P. Kendall & J. Butcher (Eds.), Handbook of research methods in clinical psychology (pp. 127-181). New York: Wiley.
Meehl, P. E., & Yonce, L. J. (1994). Taxometric analysis: I. Detecting taxonicity with two quantitative indicators using means above and below a sliding cut (MAMBAC procedure). Psychological Reports, 74, 1059-1274.
Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58, 525-543.
Miller, M. B. (1996). Limitations of Meehl's MAXCOV-HITMAX procedure. American Psychologist, 51, 554-556.
Millsap, R. E., & Everson, M. T. (1993). Methodology review: Statistical approaches for assessing measurement bias. Applied Psychological Measurement, 17, 297-334.
Mislevy, R. J. (1984). Estimating latent distributions. Psychometrika, 49, 359-381.
Mislevy, R. J., & Bock, R. D. (1989). PC-BILOG 3: Item analysis and test scoring with binary logistic models [computer software]. Mooresville, IN: Scientific Software.
Mislevy, R. J., & Wilson, M. (1996). Marginal maximum likelihood estimation for a psychometric model of discontinuous development. Psychometrika, 61, 41-71.
Murphy, G. L. (2002). The big book of concepts. Cambridge, MA: MIT Press.
Murphy, G. L., & Medin, D. L. (1985). The role of theories in conceptual coherence. Psychological Review, 92, 289-316.
Muthén, B. (1984). A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika, 49, 115-132.
Nestadt, G., Romanoski, A. J., Brown, C. H., Chahal, R., Merchant, A., Folstein, M. F., Gruenberg, E. M., & McHugh, P. R. (1991). DSM-III compulsive personality disorder: An epidemiological survey. Psychological Medicine, 21, 461-471.
Nosofsky, R. M., & Palmeri, T. J. (1997). An exemplar-based random walk model of speeded classification. Psychological Review, 104, 266-300.
Pirolli, P., & Wilson, M. (1998). A theory of the measurement of knowledge content, access, and learning. Psychological Review, 105, 58-82.
Reise, S. P., Widaman, K. F., & Pugh, R. H. (1993). Confirmatory factor analysis and item response theory: Two approaches for exploring measurement invariance. Psychological Bulletin, 114, 552-566.
Rijmen, F., Tuerlinckx, F., De Boeck, P., & Kuppens, P. (2003). A nonlinear mixed model framework for item response theory. Psychological Methods, 8, 185-205.
Rosch, E. (1975). Cognitive representations of semantic categories. Journal of Experimental Psychology: General, 104, 192-233.
Rosch, E. (1978). Principles of categorization. In E. Rosch & B. B. Lloyd (Eds.), Cognition & categorization (pp. 27-48). Hillsdale, NJ: Erlbaum.
Rosch, E., Mervis, C. B., Gray, W., Johnson, D., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8, 382-439.
Rost, J. (1990). Rasch models in latent classes: An integration of two approaches to item analysis. Applied Psychological Measurement, 14, 271-282.
Rost, J. (1991). A logistic mixture distribution model for polychotomous item responses. British Journal of Mathematical and Statistical Psychology, 44, 75-92.
Ruscio, A. M., Borkovec, T. D., & Ruscio, J. (2001). A taxometric investigation of the latent structure of worry. Journal of Abnormal Psychology, 110, 413-422.
Ruscio, A. M., & Ruscio, J. (2002). The latent structure of analogue depression: Should the Beck Depression Inventory be used to classify groups? Psychological Assessment, 14, 135-145.
Ruscio, J. (2000). Taxometric analysis with dichotomous indicators: The modified MAXCOV procedure and a case removal consistency test. Psychological Reports, 87, 929-939.
Ruscio, J., & Ruscio, A. M. (2000). Informing the continuity controversy: A taxometric analysis of depression. Journal of Abnormal Psychology, 109, 473-487.
Sanislow, C. A., Grilo, C. M., Morey, L. C., Bender, D. S., Skodol, A. E., Gunderson, J. G., Shea, M. T., Stout, R. D., Zanarini, M. C., & McGlashan, T. H. (2002). Confirmatory factor analysis of DSM-IV criteria for borderline personality disorder: Findings from the Collaborative Longitudinal Personality Disorder Study. American Journal of Psychiatry, 159, 284-290.
SAS Institute, Inc. (1999). SAS online doc (Version 8) [software manual on CD-ROM]. Cary, NC: SAS Institute, Inc.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461-464.
Siegler, R. S. (1981). Developmental sequences within and between concepts. Monographs of the Society for Research in Child Development, 46, 1-4.
Skilling, T. A., Quinsey, V. L., & Craig, W. M. (2001). Evidence of a taxon underlying serious antisocial behavior in boys. Criminal Justice and Behavior, 28, 450-470.
Smith, E. E., & Medin, D. L. (1981). Categories and concepts. Cambridge, MA: Harvard University Press.
Smits, T., Storms, G., Rosseel, Y., & De Boeck, P. (2002). Fruits and vegetables categorized: An application of the generalized context model. Psychonomic Bulletin & Review, 9, 836-844.
Sörbom, D. (1974). A general method for studying differences in factor means and factor structures between groups. British Journal of Mathematical and Statistical Psychology, 28, 229-239.
Storms, G., & De Boeck, P. (1997). Formal models for intra-categorical structure that can be used for data analysis. In K. Lamberts & D. Shanks (Eds.), Knowledge, concepts, and categories (pp. 439-459). London: UCL Press.
Storms, G., De Boeck, P., Hampton, J., & Van Mechelen, I. (1999). Predicting conjunction typicalities by component typicalities. Psychonomic Bulletin & Review, 6, 677-684.
Storms, G., De Boeck, P., & Ruts, W. (2000). Prototype and exemplar based information in natural language categories. Journal of Memory & Language, 42, 51-73.
Strube, M. J. (1989). Evidence for the type in Type A behavior: A taxometric analysis. Journal of Personality and Social Psychology, 56, 972-987.
Sutcliffe, J. P. (1993). Concept, class and category in the tradition of Aristotle. In I. Van Mechelen, J. Hampton, R. S. Michalski, & P. Theuns (Eds.), Categories and concepts: Theoretical views and inductive data analysis (pp. 35-65). London: Academic.
Tajfel, H. (1981). Human groups and social categories: Studies in social psychology. Cambridge, MA: Harvard University Press.
Takane, Y., & de Leeuw, J. (1987). On the relationship between item response theory and factor analysis of discretized variables. Psychometrika, 52, 393-408.
Taylor, J. R. (1995). Linguistic categorization: Prototypes in linguistic theory (2nd ed.). Oxford, England: Oxford University Press.
Thissen, D. (1997). MULTILOG [Computer software]. Mooresville, IN: Scientific Software.
Trull, T. J., Widiger, T. A., & Guthrie, P. (1990). Categorical versus dimensional status of borderline personality disorder. Journal of Abnormal Psychology, 99, 40-48.
Tyler, L. K., Moss, H. E., Durrant-Peatfield, M. R., & Levy, J. P. (2000). Conceptual structure and the structure of concepts: A distributed account of category-specific deficits. Brain and Language, 75, 195-231.
Tyrer, P., & Alexander, J. (1979). Classification of personality disorders. British Journal of Psychiatry, 135, 163-167.
van Maanen, L., Been, P., & Sijtsma, K. (1989). The linear logistic test model and heterogeneity of cognitive strategies. In E. E. Roskam (Ed.), Mathematical psychology in progress (pp. 267-287). New York: Springer-Verlag.
Verbeke, G., & Molenberghs, G. (2000). Linear mixed models for longitudinal data. New York: Springer.
Verhelst, N. D., Glas, C. A. W., & Verstralen, H. H. F. M. (1994). OPLM: Computer program and manual [Computer software]. Arnhem, The Netherlands: CITO.
Vredenburg, K., Flett, G. L., & Krames, L. (1993). Analogue versus clinical depression: A critical reappraisal. Psychological Bulletin, 113, 327-344.
Waller, N. G., & Meehl, P. E. (1998). Multivariate taxometric procedures: Distinguishing types from continua. London: Sage.
Waller, N. G., Putnam, F. W., & Carlson, E. B. (1996). Types of dissociation and dissociative types: A taxometric analysis of dissociative experiences. Psychological Methods, 1, 300-321.
Waller, N. G., & Ross, C. A. (1997). The prevalence and biometric structure of pathological dissociation in the general population: Taxometric and behavior genetic findings. Journal of Abnormal Psychology, 106, 499-510.
Widiger, T. A. (1992). Categorical versus dimensional classification: Implications from and for research. Journal of Personality Disorders, 6, 287-300.
Widiger, T. A., & Shea, T. (1991). Differentiation of Axis I and Axis II disorders. Journal of Abnormal Psychology, 100, 399-406.
Wilson, M. (1989). Saltus: A psychometric model of discontinuity in cognitive development. Psychological Bulletin, 105, 276-289.
Wilson, M. (1993). The "saltus model" misunderstood. Methodika, 7, 1-4.
Wittgenstein, L. (1953). Philosophical investigations. Oxford, England: Blackwell.
Wu, M. L., Adams, R. J., & Wilson, M. (1998). ACER Conquest: Generalized item response modelling software [Computer software]. Melbourne, Australia: Australian Council for Educational Research.
Zimmerman, M., & Coryell, W. H. (1990). DSM-III personality disorder dimensions. Journal of Nervous and Mental Disease, 178, 686-692.