
Ida Marais
- Doctor of Psychology
- Senior Researcher at The University of Western Australia
Based at the Psychometric Laboratory, UWA Medical School, I undertake research on measurement and assessment in health.
90 publications, 13,552 reads, 1,158 citations
Publications (90)
A meta-metre is estimated on two well-known and established intelligence tests, the Stanford-Binet and the Raven’s Progressive Matrices. The first is a test of general intelligence and the second a test of a more specific variable of spatial reasoning.
The essential properties of measurement, and in particular those articulated by Thurstone (J Abnormal Soc Psychol 21:384–400; 1927b, Psychol Rev 34:278–286, 1927c; Am J Sociol 33:529–554, 1928a; Psychol Rev 35:175–197, 1928b), are summarised in this Chapter, and then shown to be advanced by the work of Rasch (Probabilistic models for some intellige...
Educational attainment tests have a long history, with one of their earliest uses being the selection into, and graduation from, universities during medieval times. Intelligence testing has been the subject of study from around the turn of the twentieth century, precipitated to a large extent by mandated universal education in government sanctioned...
This Chapter presents the approach taken in this monograph to study growth on intellectual variables that mirrors the approach of the physical sciences. This approach is consistent with West’s approach to studying physiological growth, which was anticipated in the 1950s by Rasch. The essence of Rasch’s approach is to find a mode of growth in time,...
Complementing the importance of reading proficiency in a literate society is mathematics proficiency. As with reading in the previous Chapter, this Chapter applies the methods of studying growth to two longitudinal data sets with five times of measurement, one from Kindergarten to Grade 8 and the other from Grade 1 to Grade 9. Individual data in th...
Proficiency in reading is seen as central to functioning in literate societies such that schools devote substantial time to teaching it, which generally begins in kindergarten. In the sense of being taught explicitly, the variable of reading comprehension is different from the variables of intelligence. The Chapter applies the methods of studying g...
The study of growth on variables such as intelligence and attainment tests from schooling, generally, does not follow closely the methods of the physical sciences. In order to provide a wider context for the study of growth in some intelligence and attainment tests from the perspective carried out in Part I of the monograph, this Chapter surveys th...
To provide a transition from the study of growth with physiological variables to intellectual variables applying a meta-metre, with the necessary adaptation because of an arbitrary origin in the latter, this Chapter shows analyses of two physiological variables. The first shows a reanalysis of data from the growth in weight of babies in their first...
Like the physiological variables, all the intellectual variables analysed showed early rapid but decelerating growth with excellently estimated meta-metres. This Chapter elaborates the implications of three features of this deceleration for the attainment variables. It shows: first, that the meta-metre implies that differences between the attainmen...
Means and Standard Deviations for Early Maladaptive Schemas (EMS) in Three Psychiatric and a Non-Clinical Group. Total Sample (N = 857).
The capacity of the Young Schema Questionnaire (YSQ) to predict psychopathology in specific clinical groups has consistently produced mixed findings. This study assessed three versions of the Young Schema Questionnaire (YSQ), including the long form (YSQ-L3), short form (YSQ-S3), and the recent Rasch-derived version, the YSQ-R, and their subscales,...
Background
Screening for depressive symptoms during adolescence is of high clinical significance. The shorter 12-item version of the Children's Depression Inventory (CDI 2:SR[S]) was specifically developed for this purpose. Evaluations of the CDI 2:SR[S] psychometrics are limited, however. The purpose of this study was to validate the CDI 2: SR[S]...
The aim of this study was to refine the YSQ-L3 by identifying the most statistically and clinically appropriate items for each Early Maladaptive Schema (EMS) using Rasch analysis.
Method: A Rasch analysis was undertaken on a large sample (N = 838) that included a heterogeneous clinical sample (N = 574) and a smaller non-clinical group (N = 264).
Re...
Purpose:
The psychometric properties of the Perth A-loneness Scale (PALs) have been extensively validated using classical test theory, but to date no studies have applied a Rasch analysis. The purpose of this study was to validate the four PALs subscales using Rasch analysis.
Methods:
Responses from 1484 adolescents (58% female, mean age = 12.8...
Modern test theory models focus on the interaction of a person with an item rather than upon a total score as in CTT. Rasch specified a two-way frame of reference of items and persons. In the dichotomous Rasch model, the probability of a person’s response to an item is a function of the difference between two model parameters, the item’s location (...
One type of fit which examines the consistency of the data with the model begins with the difference between the observed and expected score at the person–item response level, given the parameter estimates. Then a fit statistic can be obtained at both the person (person fit-residual) and item (item fit-residual) level following a parallel sequence...
The one-parameter logistic (1PL) or Rasch model involves a single person parameter and one item parameter: location. The two-parameter logistic (2PL) model involves two item parameters: location and discrimination. The three-parameter logistic (3PL) model involves three item parameters: location, discrimination and guessing.
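The relationship between the three models can be sketched in a few lines of Python. This is an illustrative sketch only, not code from the publication: the function name `icc` and the argument names `a` (discrimination) and `c` (guessing) are assumptions chosen here, and the logistic form is the standard textbook one.

```python
import math

def icc(beta, delta, a=1.0, c=0.0):
    """Probability of a correct response for person location beta and item location delta.

    a is the item discrimination and c the guessing parameter (both names are
    assumptions for this sketch). With a=1 and c=0 this is the 1PL (Rasch)
    model; with c=0 it is the 2PL model; otherwise it is the 3PL model.
    """
    logistic = 1.0 / (1.0 + math.exp(-a * (beta - delta)))
    return c + (1.0 - c) * logistic

p_1pl = icc(0.5, 0.5)                # person located exactly at the item
p_2pl = icc(0.5, 0.5, a=2.0)         # discrimination does not move this point
p_3pl = icc(0.5, 0.5, a=2.0, c=0.2)  # guessing raises the lower asymptote
```

When the person and item locations coincide, the 1PL and 2PL probabilities are both 0.5, while a non-zero guessing parameter lifts the 3PL probability above 0.5.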
In the Rasch model, the probability that a person answers one of two dichotomous items correctly, on the condition that only one is answered correctly and the other incorrectly, depends only on the relative difficulties \( \delta_{1} \) and \( \delta_{2} \) of the items and is independent of the proficiency \( \beta \) of the person. Because the Ra...
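A worked derivation may make the cancellation of \( \beta \) concrete. It assumes the standard dichotomous Rasch form \( \Pr\{X_i = 1\} = e^{\beta - \delta_i} / (1 + e^{\beta - \delta_i}) \) and local independence of the two responses; the notation \( X_1, X_2 \) for the item responses is introduced here for illustration.

```latex
\begin{align*}
\Pr\{X_1 = 1, X_2 = 0 \mid X_1 + X_2 = 1\}
  &= \frac{\dfrac{e^{\beta-\delta_1}}{1+e^{\beta-\delta_1}} \cdot \dfrac{1}{1+e^{\beta-\delta_2}}}
          {\dfrac{e^{\beta-\delta_1}}{1+e^{\beta-\delta_1}} \cdot \dfrac{1}{1+e^{\beta-\delta_2}}
           + \dfrac{1}{1+e^{\beta-\delta_1}} \cdot \dfrac{e^{\beta-\delta_2}}{1+e^{\beta-\delta_2}}} \\[4pt]
  &= \frac{e^{\beta-\delta_1}}{e^{\beta-\delta_1} + e^{\beta-\delta_2}}
   = \frac{e^{-\delta_1}}{e^{-\delta_1} + e^{-\delta_2}}
   = \frac{e^{\delta_2}}{e^{\delta_1} + e^{\delta_2}}
\end{align*}
```

The common denominators cancel in the first step and the factor \( e^{\beta} \) cancels in the second, so the result depends only on \( \delta_{1} \) and \( \delta_{2} \).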
To estimate the difference in difficulty between two dichotomous items the equation conditioning on the total score and eliminating the person parameter is applied. In the solution equation, the proportion of persons who have answered one item correctly and the other incorrectly is an estimate of the probability of answering that item correctly, gi...
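As a rough illustration of this conditional approach, the sketch below estimates \( \delta_{2} - \delta_{1} \) from counts of persons who answered exactly one of the two items correctly. The function name and the count names `n10` and `n01` are hypothetical, and the log-ratio estimator shown is the standard consequence of the conditional probability under the dichotomous Rasch model, not necessarily the exact solution equation used in the chapter.

```python
import math

def pairwise_difficulty_difference(n10, n01):
    """Estimate delta_2 - delta_1 for two dichotomous items.

    n10: number of persons with item 1 correct and item 2 incorrect.
    n01: number of persons with item 1 incorrect and item 2 correct.
    Persons with both or neither item correct carry no information
    about the difference and are not used.
    """
    if n10 <= 0 or n01 <= 0:
        raise ValueError("both counts must be positive for a finite estimate")
    # n10 / (n10 + n01) estimates exp(delta_2) / (exp(delta_1) + exp(delta_2)),
    # so the log of the odds n10 / n01 estimates delta_2 - delta_1.
    return math.log(n10 / n01)

difference = pairwise_difficulty_difference(n10=30, n01=10)
```

A positive value indicates that item 2 is estimated to be more difficult than item 1.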
In the 3PL model, a guessing parameter is estimated for each item, in addition to parameters for the item’s location and discrimination. The Rasch model makes no provision for guessing behaviour. Therefore, guessing affects item and person estimates. A person is most likely to show guessing if an item is very difficult for the person. Guessing can b...
There are two approaches that can be taken with regard to a data–model relationship. The first and most common approach is that of item response theory (IRT), where the model describes the data. If the data do not fit the model, another model with more parameters is chosen. The second and less common approach is that of Rasch measurement theory, w...
A category coefficient \( \kappa_{k} \) is the sum of the exceeded thresholds for response category k. The PRM can be rewritten in terms of principal components (Guttman) for the thresholds instead of the thresholds themselves. The principal component \( \lambda \) characterizes the spread of the responses, the principal component \( \eta \) charac...
The Rasch models routinely handle missing data. It is possible for subsets of persons to respond to different subsets of items which have been established to be in the same frame of reference, and have person estimates on the same scale. As a result, test equating when only some items are common between two assessments assessing the same variable i...
Ordered categories are taken as analogous to physical measurement with a continuum partitioned by successive thresholds. However, unlike physical measurements, the thresholds which form the categories are not assumed to be equidistant. Thresholds are defined by minimum proficiencies required to succeed at the thresholds. The threshold form of the p...
In CTT, reliability is defined as the ratio of true score variance to total variance. It is most often estimated using the coefficient \( \alpha \). This index assumes the instrument is unidimensional and is not a test of unidimensionality. Construct validation addresses the substantive dimension of the variable assessed.
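A minimal sketch of coefficient \( \alpha \), assuming complete data arranged as one list of item scores per person. The function name and data layout are illustrative choices; the formula is the standard one based on item variances and total score variance.

```python
from statistics import pvariance

def coefficient_alpha(scores):
    """Coefficient alpha for a persons-by-items table of scores.

    scores: list of persons, each a list of item scores of equal length
    (no missing data is assumed in this sketch).
    """
    k = len(scores[0])
    # Variance of each item across persons.
    item_variances = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of the total scores across persons.
    total_variance = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Two items that always agree give alpha = 1.
alpha_consistent = coefficient_alpha([[1, 1], [0, 0], [1, 1], [0, 0]])
```

A high value here says nothing about whether the items form a single dimension, which is the point made in the summary above.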
DIF occurs when items do not function in the same way for different groups of people who otherwise have the same value on the trait. DIF can be identified graphically through the ICC. Different locations of the curves for the groups but similar slopes indicate uniform DIF. Different slopes for the groups indicate non-uniform DIF. DIF can be confirm...
Fit of responses to the model can be assessed graphically and also statistically. Assessing fit has two aspects: assessing person fit and item fit. Comparing observed proportions in class intervals with the ICC is a graphical test of item fit. The \( \chi^{2} \) test performs the same comparison for an item statistically.
The total score \( r_{n} \) of person n on a set of items in the Rasch model is a sufficient statistic. Sufficiency implies that there is no further information about the person’s proficiency \( \beta_{n} \) in the pattern of the person’s responses. If the response patterns fit the Rasch model, then they are likely to be close to the Guttman patter...
The sum of the theoretical means (probabilities) of the number of times each item would be answered correctly should be equal to the number of items that are answered correctly \( r_{n} \) by person n. From this equation \( \beta_{n} \) can be estimated. Equations are solved iteratively until convergence. The same total score, for the same items co...
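The iterative solution can be sketched as follows for the dichotomous Rasch model with known item difficulties. Newton-Raphson is used here as one common choice of iteration; the summary does not say which algorithm the authors use, and the function and parameter names are hypothetical.

```python
import math

def estimate_beta(r, deltas, tol=1e-8, max_iter=100):
    """Solve sum_i P(correct | beta, delta_i) = r for beta.

    r: total score of the person on the items.
    deltas: item difficulties, assumed known for this sketch.
    """
    if not 0 < r < len(deltas):
        raise ValueError("extreme scores (0 or maximum) have no finite estimate")
    beta = 0.0  # starting value
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(-(beta - d))) for d in deltas]
        expected = sum(probs)                       # expected total score
        information = sum(p * (1.0 - p) for p in probs)
        step = (r - expected) / information         # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break
    return beta

beta_hat = estimate_beta(2, [-1.0, 0.0, 1.0])
```

Because only the total score `r` enters the equation, every response pattern with the same total on the same items yields the same estimate.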
In addition to the two parameters \( \beta \) and \( \delta \), the Rasch model for ordered response categories contains additional parameters called thresholds, denoted by \( \tau \). A threshold is a point on the measurement continuum at which responses in two adjacent categories are equally likely. In the case of a dichotomous response, the...
The standard two-facet, person-by-item design and application of the Rasch model has been extended to a three-facet design and a three-facet parametrization of the model. Applications of this parametrization include estimates of judge severity, the diagnosis of a halo effect, and studying change with repeated measurements. Repeated measurements can...
Derivation of the CTT equations of Chap. 3 and coefficient α. In particular, derivation of covariance, standard error of measurement, equation for predicting the true score from the observed score and coefficient α.
Revision of the principles of Rasch measurement theory—invariance of comparisons, item and threshold locations, tests of statistical fit between responses and the Rasch model.
Assessment involves the engagement of an entity with some instrument and the recording of observations of the engagement according to some protocol. Measurement involves some kind of transformation of assessments and is defined as the estimation of the amount of a unidimensional latent trait relative to a unit. A scale is a linear continuum partit...
Classical test theory (CTT) rests on the assumption of a normal distribution of scores in some population and assumes scores are not at the extremes of the possible range. In CTT, a person’s observed test score is a sum of a true score and an error score. The test’s reliability is the central index of CTT and is the ratio of true score variance to...
In the rating scale parameterization of the polytomous Rasch model (PRM), one set of thresholds is estimated for all items. In the partial credit parameterization of the PRM, a different set of thresholds is estimated for each item. A category coefficient \( \kappa_{k} \) is the negative sum of the exceeded thresholds for response category k. The slope of...
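A small sketch of how category probabilities follow from the category coefficients, assuming the usual PRM form in which the probability of category x is proportional to \( \exp(\kappa_{x} + x(\beta - \delta)) \) with \( \kappa_{0} = 0 \). The function name and the use of `taus` for the thresholds are notational assumptions made here.

```python
import math

def prm_category_probabilities(beta, delta, taus):
    """Category probabilities for one item under the polytomous Rasch model.

    taus: thresholds of the item, expressed relative to the item location
    delta. Category x has coefficient kappa_x equal to the negative sum of
    the first x thresholds, with kappa_0 = 0.
    """
    kappas = [0.0]
    for tau in taus:
        kappas.append(kappas[-1] - tau)
    numerators = [math.exp(kappa + x * (beta - delta))
                  for x, kappa in enumerate(kappas)]
    total = sum(numerators)
    return [n / total for n in numerators]

# A person located exactly at the first threshold (-1.0 relative to delta = 0).
probs = prm_category_probabilities(beta=-1.0, delta=0.0, taus=[-1.0, 1.0])
```

In this example the person is located at the first threshold, so categories 0 and 1 come out equally likely. Under the rating scale parameterization the same `taus` would be reused for every item, whereas under the partial credit parameterization each item would have its own list.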
The Rasch model implies statistical independence of responses, generally referred to as local independence. Local independence can be violated in two generic ways: multidimensionality, when person parameters other than \( \beta \) are involved in the response, and response dependence, when the response to one item might depend on the response to a...
RMT and CTT are not incompatible theories but RMT can be seen as an elaboration of CTT.
Items hypothesized to be dependent can be combined into higher order polytomous items and the data reanalysed using the PRM. If the reliability estimate from an analysis where items hypothesized to be dependent are combined into higher order polytomous items is lower than the reliability estimate from the original analysis, dependence is present. T...
There are non-Rasch models used for analysing responses to items with ordered categories. Their application follows the item response theory, rather than the Rasch measurement theory, paradigm. There are two classes of models used. The first class specializes algebraically to the PRM; the second class is structurally different from the PRM.
During the item development stage, a pool of items is produced by experienced item writers according to the instrument specifications, which may include the number of the items to measure each aspect or content area and the format of each item. After items have been developed they are typically refined through a number of item trials. Instruments...
(i) Conformance of responses to the Guttman structure with items of different difficulty operationalizing the continuum of a construct; (ii) the Guttman structure is cumulative and more or less difficult items reflect more or less of the construct on the continuum; (iii) the total score of a person reproduces the unique pattern of responses across...
The residual is the difference between a person’s response to an item and the response that is expected according to the model. When it is referenced to its standard deviation, it is a standardized residual. The residual distributions produced in RUMM2030 can be helpful in interpreting residuals. Correlations between item residuals can be helpful i...
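For a dichotomous item, the residual and its standardized form can be sketched as below. This assumes the dichotomous Rasch model, where the expected response is the probability of a correct response and its variance is p(1 - p); it is not a reproduction of RUMM2030 output, and the function name is hypothetical.

```python
import math

def standardized_residual(x, beta, delta):
    """Standardized residual for an observed dichotomous response x (0 or 1).

    beta and delta are the person and item location estimates.
    """
    p = 1.0 / (1.0 + math.exp(-(beta - delta)))  # expected response
    residual = x - p                             # observed minus expected
    return residual / math.sqrt(p * (1.0 - p))   # referenced to its standard deviation

z_correct = standardized_residual(1, beta=0.0, delta=0.0)
z_incorrect = standardized_residual(0, beta=0.0, delta=0.0)
```

When the person and item are at the same location the expected response is 0.5, so a correct response gives a standardized residual of 1 and an incorrect response gives -1.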
This study set out to examine the range of legibility demonstrated by Western Australian students required to handwrite tasks of increasing intrinsic cognitive load. A representative sample of students in Years 1, 2 and 3 (N = 437) was recruited for a cross-sectional study and teachers administered handwriting tasks. Year 1 students were administered...
The 10-item Emotion Regulation Questionnaire (ERQ) was developed to measure individual differences in the tendency to use two common emotion regulation strategies: cognitive reappraisal and suppression. The current study examined the psychometric properties of the ERQ in a heterogeneous mixed sample of 713 (64.9% female) community residents using t...
Even though guessing biases difficulty estimates as a function of item difficulty in the dichotomous Rasch model, assessment programs with tests which include multiple‐choice items often construct scales using this model. Research has shown that when all items are multiple‐choice, this bias can largely be eliminated. However, many assessments have...
Establishing the internal validity of psychometric instruments is an important research priority, and is especially vital for instruments that are used to collect data to guide public policy decisions. The Warwick-Edinburgh Mental Well-Being Scale (WEMWBS) is a well-established and widely-used instrument for assessing individual differences in well...
Background
Awareness of sport-related concussion (SRC) is an essential step in increasing the number of athletes or parents who report on SRC. This awareness is important, as there is no established data on medical care at youth-level sports, and such care may be limited to individuals with only first aid training. In this circumstance, aside from the coach,...
The Young Schema Questionnaire (YSQ) was developed to measure Early Maladaptive Schemas (EMS), a construct central to Schema Therapy (ST). Traditionally YSQ items were placed in a grouped format for each schema but in recent versions of the questionnaire, items are presented in a random order. This study investigates the effect of item placement on...
This study explores the development, description and illustration of inherent requirement statements (IR) to make explicit the requirements for performance on an Initial Teacher Education (ITE) practicum. Through consultative group processes with stakeholders involved in ITE, seven inherent requirement domains were identified. From interviews with...
This paper presents results of an investigation into the relationship between Kenyan Sign Language (KSL) and English literacy skills. It is derived from research undertaken towards an MEd degree awarded by The University of Western Australia in 2011. The study employed a correlational survey strategy. Sixty upper primary deaf students from four res...
This study presents a Rasch-derived short form of the Warwick-Edinburgh Mental Well-Being Scale for use as a screening tool in the general population. Data from 2,005 18- to 69-year-olds revealed problematic discrimination at specific thresholds. Estimation of model fit also deviated from Rasch model expectations. Following deletion of 4 items, the...
The Mindful Attention Awareness Scale was developed to measure individual differences in the tendency to be mindful. The current study examined the psychometric properties of the Mindful Attention Awareness Scale in a heterogeneous sample of 565 nonmeditators and 612 meditators using the polytomous Rasch model. The results showed that some items di...
Recent research has shown how the statistical bias in Rasch model difficulty estimates induced by guessing in multiple-choice items can be eliminated. Using vertical scaling of a high-profile national reading test, it is shown that the dominant effect of removing such bias is a nonlinear change in the unit of scale across the continuum. The consequ...
The psychometric properties of the General Functioning subscale of the McMaster Family Assessment Device were examined using the Rasch Model (N = 237 couples). Mothers’ and fathers’ ratings of the General Functioning subscale of the McMaster Family Assessment Device are recommended, provided these are analyzed separately. More than a quarter of cou...
Large scale testing programs often involve a number of assessments that include multiple choice items administered to students in different grades. The Rasch model is sometimes used to transform the raw test scores onto a common vertical scale of proficiency. However, with multiple choice items students may guess and the Rasch model makes no provis...
Andrich, Marais, and Humphry showed formally that Waller's procedure, which removes responses to multiple choice (MC) items that are likely to be guessed, eliminates the bias in the Rasch model (RM) estimates of difficult items and makes them more difficult. The former did not study any consequences on the person proficiency estimates. This article sh...
This chapter offers a discussion of multidimensionality in health outcome scales and describes methods that can help indicate if there is multidimensionality in a data set. Two different situations are considered: confirmatory analysis where an a priori hypothesis is tested regarding which items measure what latent construct, and exploratory analys...
The unidimensional Rasch model for dichotomous items and the unidimensional Rasch model for more than two ordered categories rely on the assumption of local independence. This chapter discusses tests of the assumption of local independence of responses and the implications of violations of this assumption in data. Local dependence is related to sub...
Graffiti is often viewed as a nuisance ‘kids’ crime, an act of youthful resistance and, as such, it is sometimes given a lower policing prioritisation level than more ‘serious’ crimes. In this study, the three-year offending histories of 798 graffitists were extracted from the Western Australian Police Information Management System database. To add...
Andersen (1995, 2002) proves a theorem relating variances of parameter estimates from samples and subsamples and shows its use as an adjunct to standard statistical analyses. The authors show an application where the theorem is central to the hypothesis tested, namely, whether random guessing to multiple choice items affects their estimates in the...
Models of modern test theory imply statistical independence among responses, generally referred to as local independence. One violation of local independence occurs when the response to one item governs the response to a subsequent item. Expanding on a formulation of this kind of violation as a process in the dichotomous Rasch model, this article g...
Skate-parks serve as hang-out hubs for juveniles engaged in both lawful leisure pursuits (e.g. skateboarding, inline skating, bike/scooter riding and urban artistry) and illegal activities (e.g. graffiti-writing, underage drinking or substance abuse). Thus, proposed skate-park builds often can produce polarized community debate. Such debate typical...
Wiseman and Watt’s short scales of positive and negative superstitions have attracted attention in the literature. Using a representative survey of the Australian state of Queensland, the six scale items were applied to 1243 respondents. Initial investigation using Cronbach’s alpha showed that one of the scales did not function properly. A factor a...
The 'halo effect' may be unique to different raters or common to all raters. When common to all raters, halo is not detectable through standard fit indices of the three-facet Rasch model used to account for differences in rater severities. Using a formulation of halo as a violation of local independence, a halo effect common to all raters is simula...
An analysis of distractors in measuring achievement provides information about students’ understanding of the variable being measured in the classroom environment. To be able to provide information, a distractor should contain some aspect of the correct answer. Also, the proficiency required to choose the distractor with information should be less...
Because of confounding effects that can mask change when persons respond to the same items on more than one occasion, the measurement of change is a challenge. The specific effect on change studied in this paper is that observed when responses of persons to items at time 2 are dependent statistically on their responses at time 1. In addition, becau...
In Perth, the capital city of the state of Western Australia, there is a growing move towards the use of urban art as graffiti deterrence. This paper reports on an empirical evaluation of a commissioned urban art project. A former graffiti hot-spot (three bus underpass walls at a commuter train station) and a one square kilometre area surrounding t...
Local independence in the Rasch model can be violated in two generic ways that are generally not distinguished clearly in the literature. In this paper we distinguish between a violation of unidimensionality, which we call trait dependence, and a specific violation of statistical independence, which we call response dependence, both of which violat...
By adding items with responses identical to a selected item, Smith (2005) investigated the effect of the response dependence on person and item parameter estimates in the dichotomous Rasch model. By varying the magnitude of response dependence among selected items, rather than their having perfect dependence, this paper provides additional insights...
Participants responded to probe letters after sets of two, four, and six letters were memorized (Sternberg, 1966, 1969b). Spatial attention was controlled by central arrow cues and stimuli were presented in a clear or a visually degraded form. Overall RT was shorter for attended than for unattended locations, and shorter for clear than for degraded...
Participants performed a memory-scanning task (Sternberg, 1966, 1969a) in which probe letters were displayed unilaterally or bilaterally after sets of two, four, or six letters were memorized. The mean response time (RT) to bilateral presentations was significantly longer than the mean RT to unilateral presentations, but the slope of the set-size f...