Article

Introduction to Educational and Psychological Measurement Using R

Authors:
To read the full-text of this research, you can request a copy directly from the author.

Abstract

This book provides an introduction to the theory and application of measurement in education and psychology. Topics include test development, item writing, item analysis, reliability, dimensionality, and item response theory. These topics come together in overviews of validity and, finally, test evaluation. Validity and test evaluation are based on both qualitative and quantitative analysis of the properties of a measure. This book addresses the qualitative side using a simple argument-based approach. The quantitative side is addressed using descriptive and inferential statistical analyses, all of which are presented and visualized within the statistical environment R (R Core Team 2017). The intended audience for this book includes advanced undergraduate and graduate students, practitioners, researchers, and educators. Knowledge of R is not a prerequisite to using this book. However, familiarity with data analysis and introductory statistics concepts, especially ones used in the social sciences, is recommended.


... We then conducted a factor analysis to identify multiple unobserved variables, or factors, that explained the correlations among our observed variables (in this case test item scores). 21 In an exploratory factor analysis we explored the possible number and types of factors that explained correlations within the questionnaire and how these factors related to each other by calculating eigenvalues (numerical values that express the total variance explained by each factor). We first fitted a 10-factor structure and evaluated the results in a scree plot, which visualizes factors in decreasing order of eigenvalue, where values of 1 or more are considered the cutoff for acceptability. ...
... We first fitted a 10-factor structure and evaluated the results in a scree plot, which visualizes factors in decreasing order of eigenvalue, where values of 1 or more are considered the cutoff for acceptability. 21 We then decided upon the optimal number of factors based on the strength of underlying constructs (scree effect) and how well each item related or loaded, on these hypothesized factors. We therefore consecutively fitted a 1-factor, 2-factor, 3-factor, and 4-factor analysis respectively and discussed which structure was most appropriate to explain possible constructs or subdomains within the 2 questionnaire. ...
Article
Full-text available
Physician-oriented online education could be a pathway to improve care for patients with heart failure; however, it is difficult to measure the impact of such education. Self-efficacy is a potential outcome measure. In this article, we develop a methodology for analyzing an educational intervention for general practitioners (GPs) using self-efficacy as a concept. This study was partly conducted within the setting of an observational study, IMPACT-B, where we developed online education for GPs. We designed and refined a 24-item questionnaire using item analysis, and exploratory and confirmatory factor analysis. Ninety-one GPs completed the questionnaire before and after the online education. Follow-up data after 6 months was available for 13 GPs. Item analysis revealed a high degree of internal consistency (coefficient alpha 0.95) and validity. Each additional year of experience was associated with an average baseline self-efficacy score of 0.50 points (95% CI [0.21-0.80]), and each additional patient in HF follow-up with an average score of 2.0 points (95% CI [0.48-3.5]). Items that differentiated most between GPs with high and low self-efficacy were the treatment of congestion as well as titrating medication and MRA in heart failure with reduced ejection fraction. Factor analysis reduced the number of questions to 14, mapping to three factors (diagnosis, treatment, and follow-up), and improved the model fit as measured by the goodness-of-fit indicator comparative-fit-index (from 0.83 to 0.91). We demonstrated a method to assess the impact of online education on general practitioners. This led to a questionnaire that was reliable, valid, and convenient to use in an implementation context.
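The eigenvalue check described in the excerpts above (eigenvalues plotted in decreasing order, with values of 1 or more treated as the cutoff, often called the Kaiser criterion) can be sketched as follows. This is a minimal illustration, not the cited study's analysis: the simulated responses, the loadings, and the function names `scree_eigenvalues` and `kaiser_count` are all assumptions made for the example.

```python
import numpy as np

def scree_eigenvalues(scores):
    """Eigenvalues of the item correlation matrix, largest first."""
    corr = np.corrcoef(scores, rowvar=False)  # columns are items
    eigvals = np.linalg.eigvalsh(corr)        # symmetric input, ascending order
    return eigvals[::-1]

def kaiser_count(eigvals, cutoff=1.0):
    """Number of eigenvalues at or above the cutoff."""
    return int(np.sum(np.asarray(eigvals) >= cutoff))

# Hypothetical data: 200 respondents, 6 items driven by 2 latent factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 2))
loadings = np.array([[0.8, 0.0], [0.7, 0.0], [0.9, 0.0],
                     [0.0, 0.8], [0.0, 0.7], [0.0, 0.9]])
scores = factors @ loadings.T + 0.5 * rng.normal(size=(200, 6))

eigvals = scree_eigenvalues(scores)
print(np.round(eigvals, 2))   # values to plot as a scree plot
print(kaiser_count(eigvals))  # factors retained under the cutoff of 1
```

Because the eigenvalues of a correlation matrix sum to the number of items, each one can be read as the share of total item variance a factor accounts for. The cutoff is only a first screen; the excerpts above also weigh the shape of the scree plot and the item loadings.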
... These categories include content validity, face validity, criterion validity, and construct validity (Bolarinwa, 2015;Sürücü & Maslakci, 2020). Construct validity is currently the most commonly used validity testing method (Albano, 2020). Therefore, this study mainly uses construct validity to prove the scale validity. ...
Article
Full-text available
In light of the escalating environmental pollution and the pressing issue of climate change attributed to vehicle emissions, a growing number of electric vehicle companies have emerged in China. Despite various study endeavors focusing on consumers' purchase intention for electric vehicles, there remains a significant gap when it comes to exploring the purchase intentions specifically related to Chinese brand electric vehicles. This study seeks to bridge this gap by examining the relationship between brand image, perceived benefits, attitude, and the purchase intention of potential consumers in the context of Chinese brand electric vehicles. The study was conducted through the distribution of questionnaires, both in offline stores specializing in Chinese brand electric vehicles and online via WenJuanXing, a survey platform. To ensure the data collected is reflective of the target audience, a purposive sampling technique was employed, gathering 187 valid questionnaires from individuals between the ages of 18 and 70 who have a genuine interest in electric vehicles and intend to make a purchase in the near future. The gathered data was analyzed using SPSS and SmartPLS. This study's results prove that brand image, encompassing functional, symbolic, and experiential aspects, plays a crucial role in shaping consumers' perceived benefits and attitude, thereby significantly positively influencing their purchase intention. This study holds immense significance for car sellers and industry stakeholders. By unraveling the dynamics of brand image, perceived benefits, and consumer attitude toward Chinese brand electric vehicles, the findings provide essential insights to empower companies to align their strategies, thereby significantly motivating potential consumers to make their purchase intention a reality. Ultimately, this study contributes to the growing field of electric vehicles, particularly within the unique context of the Chinese market.
... f1: Importance of Non-Technical Skills in Engineering; f2: Sense of Belonging in Engineering; f3: Academic Self-Confidence and Self-Efficacy; f4: Understanding of the Broad Nature of Engineering; f5: Attitudes toward Persisting and Succeeding in Engineering; f6: Importance of Technical Skills in Engineering. Factor 6, Importance of Technical Skills in Engineering, was omitted from the SEM analysis primarily because only two survey items loaded onto this factor. While there is no strict rule for the minimum number of items that should be in each factor, a commonly suggested guideline is to have at least three items per factor to ensure more stability and reliability in factor analysis, which helps to retain more robust and interpretable results [83]. Moreover, the topic of factor 6, a student's recognition of the importance of technical skills in engineering, was deemed somewhat superfluous to this current study. ...
... Furthermore, the entries MT1, SR1, AP10, and AP9 had cross-loadings; these were eliminated as well (Maskey et al., 2018). After removing all items that had cross-loadings or did not meet the factor loading threshold of .5, the EFA was performed again (Albano, 2020). Table 3 provides the 51-item questionnaire on the Impact of AI in Education, which includes: AI Dependency (9 items); Social Interaction (9 items); Career Guidance (10 items); Academic Performance (6 items); Learning Experience (6 items); Self-reliance (6 items); and Motivation (5 items). ...
Article
The role of artificial intelligence (AI) in education remains incompletely understood, demanding further evaluation and the creation of robust assessment tools. Despite previous attempts to measure AI's impact in education, existing studies have limitations. This research aimed to develop and validate an assessment instrument for gauging AI effects in higher education. Employing various analytical methods, including Exploratory Factor Analysis, Confirmatory Factor Analysis, and Rasch Analysis, the initial 70-item instrument covered seven constructs. Administered to 635 students at Nueva Ecija University of Science and Technology – Gabaldon campus, content validity was assessed using the Lawshe method. After eliminating 19 items through EFA and CFA, Rasch analysis confirmed the construct validity and led to the removal of three more items. The final 48-item instrument, categorized into learning experiences, academic performance, career guidance, motivation, self-reliance, social interactions, and AI dependency, emerged as a valid and reliable tool for assessing AI's impact on higher education, especially among college students.
... Given that guidelines typically recommend that factors have no fewer than three items (Albano, 2020), an initial set of four multilevel EFA models, ranging from one to two between- and within-person factors, were fit including all eight cannabis craving items. An oblique (i.e., Geomin) rotation was used. ...
Article
Given the popularity and ease of single-item craving assessments, we developed a multi-item measure and compared it to common single-item assessments in an ecological momentary assessment (EMA) context. Two weeks of EMA data were collected from 48 emerging adults (56.25% female, 85.42% White) who frequently used cannabis. Eight craving items were administered, and multilevel factor analyses were used to identify the best fitting model. The resulting scale’s factors represented purposefulness/general desire and emotionality/negative affect craving. Convergent validity was examined using measures of craving, cannabis use disorder symptoms, frequency of use, cannabis cue reactivity, cannabis use, negative affect, and impulsivity. The scale factors were associated with cue-reactivity craving, negative affect, impulsivity, and subfactors of existing craving measures. For researchers interested in using a single item to capture craving, one item performed particularly well. However, the new scale may provide a more nuanced assessment of mechanisms underlying craving.
... Measurement is a fundamental quality of science that can be defined as the assignment of values to an object in such a way as to correspond to different degrees of a quality or property of some object, person, or event (Duncan, 1984;Stevens, 1946). Thus, for measurement to occur, three things are necessary (Albano, 2017): (a) one needs an object or thing that is being measured (in matters of social and public policy, this is often people); (b) a variable for which a property or quality is being measured for an object (i.e., a construct); and (c) a value or units in which measurement is captured within a variable (i.e., concrete assessment). How measured variables represent abstract concepts can take on many approaches using numerous measurement instruments (for a review, see DeVellis, 2017). ...
Technical Report
Full-text available
Data science research often involves ingesting and linking disparate sources of secondary data. While these sources can often be cleaned and wrangled into a usable form for analysis, robust documentation on how variables are created, and their intrinsic meaning, might not always be readily apparent. Without such meaning applied, data can lack the context necessary to understand the best ways to use and analyze it and risk misinterpretation. We introduce conceptual and methodological profiling processes into the data science pipeline as a qualitative tool to help researchers derive additional meaning and understanding from their data. Conceptual and methodological profiling uses various taxonomies to categorize variables and produce metadata to inform about how variables were created or recorded and the concepts they represent. To help explicate these processes, we first broadly describe these approaches and their place in the data science pipeline, then present a real-world example applying these techniques in our research using disparate data sources from the U.S. Army. Lastly, we discuss how researchers can find agreement while conducting these qualitative processes. We hope that the processes outlined here will provide data scientists additional tools to know their data better and how best to use it.
... There is also a growing body of books focusing on measurement with practical examples in R. These include Y. Li and Baron (2011), Revelle (2015), and Baker and Kim (2017), or the more recent books of J. D. Brown (2018), Desjardins and Bulut (2018), Mair (2018), T. Albano (2020), and Debelak, Strobl, and Zeigenfuse (2022). ...
... In a psychosocial instrument, item score provides information on a specific aspect of a complex construct while summary score -a combination of some or all item scores -represents a more comprehensive measure and tends to be more statistically reliable [31]. Hence, two predictor models were evaluated: one model with only item scores and other with only summary scores of the psychosocial scales at baseline (FTND, MNWS, PANAS and QSU-B). ...
Article
Full-text available
Background: Research on risk factors for neuropsychiatric adverse events (NAEs) in smoking cessation with pharmacotherapy is scarce. We aimed to identify predictors and develop a prediction model for risk of NAEs in smoking cessation with medications using Bayesian regularization. Methods: Bayesian regularization was implemented by applying two shrinkage priors, Horseshoe and Laplace, to generalized linear mixed models on data from 1203 patients treated with nicotine patch, varenicline or placebo. Two predictor models were considered to separate summary scores and item scores in the psychosocial instruments. The summary score model had 19 predictors or 26 dummy variables and the item score model 51 predictors or 58 dummy variables. A total of 18 models were investigated. Results: An item score model with Horseshoe prior and 7 degrees of freedom was selected as the final model upon model comparison and assessment. At baseline, smokers reporting more abnormal dreams or nightmares had 16% greater odds of experiencing NAEs during treatment (regularized odds ratio (rOR) = 1.16, 95% credible interval (CrI) = 0.95 - 1.56, posterior probability P(rOR > 1) = 0.90) while those with more severe sleep problems had 9% greater odds (rOR = 1.09, 95% CrI = 0.95 - 1.37, P(rOR > 1) = 0.85). The prouder a person felt one week before baseline resulted in 13% smaller odds of having NAEs (rOR = 0.87, 95% CrI = 0.71 - 1.02, P(rOR < 1) = 0.94). Odds of NAEs were comparable across treatment groups. The final model did not perform well in the test set. Conclusions: Worse sleep-related symptoms reported at baseline resulted in 85%-90% probability of being more likely to experience NAEs during smoking cessation with pharmacotherapy. Treatment for sleep disturbance should be incorporated in smoking cessation program for smokers with sleep disturbance at baseline. 
Bayesian regularization with Horseshoe prior permits including more predictors in a regression model when there is a low number of events per variable.
... Moreover, and specific to mental health promotion, many measures of positive mental health and wellbeing remain underdeveloped or un-validated, as was the case with several of the measures adopted for the present study (e.g., peer and adult attachment, civic engagement). An additional challenge with currently available measures of positive mental health is that many do not have sufficient item discrimination, that is, the items are framed in a way that produces little variability in scores across participants (47). This was an issue in our study alongside ceiling effects, wherein the overall group of participants scored high on measures of positive mental health at baseline, leaving little room to demonstrate improvement over time. ...
Article
Full-text available
Introduction Protecting and promoting the mental health of youth under 30 years of age is a priority, globally. Yet investment in mental health promotion, which seeks to strengthen the determinants of positive mental health and wellbeing, remains limited relative to prevention, treatment, and recovery. The aim of this paper is to contribute empirical evidence to guide innovation in youth mental health promotion, detailing the early outcomes of Agenda Gap, an intervention centering youth-led policy advocacy to influence positive mental health for individuals, families, communities and society. Methods Leveraging a convergent mixed methods design, this study draws on data from n = 18 youth (ages 15 to 17) in British Columbia, Canada, who contributed to pre- and post-intervention surveys and post-intervention qualitative interviews following their participation in Agenda Gap from 2020-2021. These data are supplemented by qualitative interviews with n = 4 policy and other adult allies. Quantitative and qualitative data were analyzed in parallel, using descriptive statistics and reflexive thematic analysis, and then merged for interpretation. Results Quantitative findings suggest Agenda Gap contributes to improvements in mental health promotion literacy as well as several core positive mental health constructs, such as peer and adult attachment and critical consciousness. However, these findings also point to the need for further scale development, as many of the available measures lack sensitivity to change and are unable to distinguish between higher and lower levels of the underlying construct. Qualitative findings provided nuanced insights into the shifts that resulted from Agenda Gap at the individual, family, and community level, including reconceptualization of mental health, expanded social awareness and agency, and increased capacity for influencing systems change to promote positive mental health and wellbeing. 
Discussion Together, these findings illustrate the promise and utility of mental health promotion for generating positive mental health impacts across socioecological domains. Using Agenda Gap as an exemplar, this study underscores that mental health promotion programming can contribute to gains in positive mental health for individual intervention participants whilst also enhancing collective capacity to advance mental health and equity, particularly through policy advocacy and responsive action on the social and structural determinants of mental health.
... The internal consistency of the total AFI scale was 0.87 in the current study. The internal consistency of the three subscales was also acceptable (Albano, 2020): effective actions (0.94), attentional lapses (0.80), and interpersonal effectiveness (0.62). ...
Article
Full-text available
CoSAGE Community Advisory and Ethics Committee; Age-related hearing impairment yields many negative outcomes, including alterations in mental health, functional impairments, and decreased social engagement. The purpose of the current study was to examine perceived hearing impairment and its relationship with person-centered outcomes among adults in a rural community setting. A cross-sectional, descriptive correlational design was used. Survey packets of validated instruments were distributed following all weekend services at a rural community church; 72 completed surveys were returned (26% response rate). Descriptive and inferential statistics, including Spearman's rank correlations (rs), were used to address the study aims. Mean age of participants was 54 years (SD = 17 years), 58% were female, and 97% attended church regularly. Thirty-one percent of respondents reported moderate to severe hearing impairment. Perceived hearing impairment was associated with more depressive symptoms (rs = 0.24, p = 0.052), poorer attentional function (rs = −0.29, p = 0.016), and decreased quality of life in the mental health domain (rs = −0.21, p = 0.081). Findings expand evidence supporting the relationship between hearing and person-centered outcomes, including a functional measure of cognition. These results serve as a foundation for the design of a community-driven, church-based hearing health intervention. [Research in Gerontological Nursing, 16(1), 21–32.]
... According to Ferketich as cited by Albano (2018), the best-case scenario when administering an initial pool of candidate test items must at least be twice as large as the final number of items needed. With the 45 items generated in the Kinematics test, the 110 respondents were satisfied, and therefore, can be considered suitable for item analysis. ...
Article
Full-text available
Kinematics, a fundamental structure in Mechanics, is a critical concept that needs to be realized by students for a more complex analysis of subsequent topics in Physics. One way to determine the effectiveness of Physics teachers in teaching at these trying times is to measure the conceptual understanding of Grade 12-Senior High School (SHS) students in Science, Technology, Engineering, and Mathematics (STEM) track. With the goal of establishing a valid and reliable test questionnaire in Kinematics that can be administered either in a paper-and-pencil approach (asynchronous learning) or online approach (synchronous learning), this study focused on the development and validation process of a 45-item conceptual test in Kinematics. Adhering to the Most Essential Learning Competencies (MELC) set by the Department of Education (DEPED), the initial pool of items was pilot tested using a Google form to 110 SHS students after the items had undergone face and content validation by a panel of experts. Furthermore, Classical Item Analysis, calculating the difficulty and discrimination indices, was conducted to establish test validity. Reliability analysis was also conducted using Cronbach’s alpha (α = 0.758) and the Kuder-Richardson formula (KR-20 = 0.761), which resulted in the deletion of 15 items. In general, this Physics concept test in Kinematics showed an acceptable standard of measurement for classroom use which can be utilized by teachers as a form of diagnostic, formative, and summative tests.
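The classical item statistics named in this abstract (difficulty, discrimination, Cronbach's alpha, KR-20) can be sketched as below. This is a hedged illustration only: the response matrix is hypothetical, the function names are invented for the example, and the corrected item-total correlation is one common discrimination index assumed here, since the abstract does not say which index was used.

```python
import numpy as np

def item_difficulty(scores):
    """Proportion of respondents answering each item correctly."""
    return np.asarray(scores, dtype=float).mean(axis=0)

def item_discrimination(scores):
    """Corrected item-total correlation (assumed index): each item
    correlated with the total score of the remaining items."""
    scores = np.asarray(scores, dtype=float)
    total = scores.sum(axis=1)
    return np.array([
        np.corrcoef(scores[:, j], total - scores[:, j])[0, 1]
        for j in range(scores.shape[1])
    ])

def cronbach_alpha(scores):
    """Coefficient alpha from item variances and total-score variance."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def kr20(scores):
    """Kuder-Richardson formula 20 for items scored 0/1."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    p = scores.mean(axis=0)
    total_var = scores.sum(axis=1).var(ddof=0)  # matches p * (1 - p)
    return k / (k - 1) * (1 - (p * (1 - p)).sum() / total_var)

# Hypothetical 0/1 responses: 6 students (rows) by 4 items (columns).
responses = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
])

print(item_difficulty(responses))
print(item_discrimination(responses))
print(cronbach_alpha(responses), kr20(responses))
```

For items scored 0/1, alpha and KR-20 are the same quantity as long as the item and total-score variances use the same denominator, which is why the two functions agree on this matrix.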
... The comparison between eigenvalue to the number of items, and can identify the factors that found the most variability in scores. We used EFA as the first step of validating the structure to examine the relevance factor of subjective well-being and to get the initial number of factors according to the analysis of the current data (Albano, 2018). There is no index in EFA to reveal which factor is better, which is the main restriction that would be the part of the next step of the analysis. ...
Article
Full-text available
Bringing up a child with disabilities has particular challenges and demands as these children's disabilities may cause certain impacts on their caregivers' well-being. In most of the studies, caregivers exhibited high scores of negative emotions that led to low subjective well-being. Efforts to improve caregivers' well-being are being carried out, and one of the ways is through subjective well-being research. Diener et al. (2009) define Subjective Well-Being (SWB) as the person’s evaluation of their life events in terms of cognitive and affective aspects. The higher the rating score of these aspects, the higher the person's level of SWB. The aspects of SWB could be well measured if the instrument has good psychometric properties. The validity of the instruments is crucial to produce good quality research. In the present study, we examined the construct validity of the SWB using Satisfaction with Life Scale (SWLS) and Scale of Positive and Negative Experience (SPANE) scales. The data was collected from 209 parents who had children with intellectual disability in Tangerang and Jakarta. The construct was validated by exploratory factor analysis (EFA) and Confirmatory Factor Analysis (CFA) in the software R version 3.6.2. The EFA results showed that the construct consisted of four factors: one for the cognitive aspect, one for positive affect, and two for negative affects. The CFA results further demonstrated that this model fitted the empirical data.
... This theory also estimates the individual's marks/grades in these traits. A group of models has developed from this theory, known as the latent traits model (Albano, 2017;Steel & Klingsieck, 2016;Zanon et al., 2016). ...
Article
Full-text available
This study aims to compare the effect of test length on the degree of ability parameter estimation in the two-parameter and three-parameter logistic models, using the Bayesian method of expected prior mode and maximum likelihood. The experimental approach is followed, using the Monte Carlo method of simulation. The study population consists of all subjects with the specified ability level. The study includes random samples of subjects and of items. Results reveal that estimation accuracy of the ability parameter in the two-parameter logistic model according to the maximum likelihood method and the Bayesian method increases with the increase in the number of test items. Results also show that with long and average length tests, the effectiveness is related to the maximum likelihood method and to all conditions of the sample size, whereas in short tests, the Bayesian method of prior mode outperformed in all conditions. Results indicate that the increase of the ability parameter in the three-parameter logistic model increases with the increase of test items number. The Bayesian method outperforms with respect to the accuracy of estimation at all conditions of the sample size, whereas in long tests the maximum likelihood method outperforms at all different conditions. Received: 17 September 2021 / Accepted: 24 November 2021 / Published: 3 January 2022
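For readers unfamiliar with the two models compared in this abstract, the standard two-parameter and three-parameter logistic item response functions can be sketched as follows. The parameter values are hypothetical, the function names are invented for the example, and the sketch omits the optional scaling constant D = 1.7 that some presentations include; the study's own estimation code is not shown in the source.

```python
import math

def irf_3pl(theta, a, b, c=0.0):
    """Probability of a correct response under the 3PL model.

    theta: respondent ability
    a: item discrimination, b: item difficulty, c: lower asymptote (guessing)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def irf_2pl(theta, a, b):
    """The 2PL model is the 3PL model with the guessing parameter fixed at 0."""
    return irf_3pl(theta, a, b, c=0.0)

# Hypothetical item: discrimination 1.2, difficulty 0.5, guessing 0.2.
for theta in (-2.0, 0.5, 2.0):
    print(theta, round(irf_2pl(theta, 1.2, 0.5), 3),
          round(irf_3pl(theta, 1.2, 0.5, 0.2), 3))
```

When ability equals the item difficulty, the 2PL probability is exactly 0.5, while the 3PL probability is halfway between the guessing parameter and 1. Ability estimation, whether by maximum likelihood or a Bayesian method, then searches for the theta that best explains a respondent's pattern of responses under these functions.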
... The course duration in this study is short, with only 16 weeks; a more prolonged period can help reduce random errors. Another limitation is the sample size, with 83 students; we believe that a larger sample size could increase the generalizability of the result (nevertheless, our sample size meets the minimum requirement suggested by [2]). We will expand the study in the following semesters for future work, and we expect that researchers can conduct related studies to confirm our findings. ...
Chapter
Full-text available
Having a good Human-Computer Interaction (HCI) design is challenging. Previous works have contributed significantly to fostering HCI, including design principle with report study from the instructor view. The questions of how and to what extent students perceive the design principles are still left open. To answer this question, this paper conducts a study of HCI adoption in the classroom. The studio-based learning method is adapted to teach 83 graduate and undergraduate students over 16 weeks with four activities. A standalone presentation tool for instant online peer feedback during the presentation session is developed to help students justify and critique others' work. Our tool provides a sandbox, which supports multiple application types, including Web-applications, Object Detection, Web-based Virtual Reality (VR), and Augmented Reality (AR). After presenting one assignment and two projects, our results show that students acquired a better understanding of the Golden Rules principle over time, which is demonstrated by the development of visual interface design. The Wordcloud reveals the primary focus was on the user interface and sheds light on students’ interest in user experience. The inter-rater score indicates the agreement among students that they have the same level of understanding of the principles. The results show a high level of guideline compliance with HCI principles, in which we witness variations in visual cognitive styles. Regardless of diversity in visual preference, the students present high consistency and a similar perspective on adopting HCI design principles. The results also elicit suggestions into the development of the HCI curriculum in the future.
... The course duration in this study is short, with only 16 weeks; a more prolonged period can help reduce random errors. Another limitation is the sample size, with 83 students; we believe that a larger sample size could increase the generalizability of the result (nevertheless, our sample size meets the minimum requirement suggested by [2]). We will expand the study in the following semesters for future work, and we expect that researchers can conduct related studies to confirm our findings. ...
Preprint
Full-text available
Having a good Human-Computer Interaction (HCI) design is challenging. Previous works have contributed significantly to fostering HCI, including design principle with report study from the instructor view. The questions of how and to what extent students perceive the design principles are still left open. To answer this question, this paper conducts a study of HCI adoption in the classroom. The studio-based learning method was adapted to teach 83 graduate and undergraduate students over 16 weeks with four activities. A standalone presentation tool for instant online peer feedback during the presentation session was developed to help students justify and critique others' work. Our tool provides a sandbox, which supports multiple application types, including Web-applications, Object Detection, Web-based Virtual Reality (VR), and Augmented Reality (AR). After presenting one assignment and two projects, our results showed that students acquired a better understanding of the Golden Rules principle over time, which was demonstrated by the development of visual interface design. The Wordcloud reveals the primary focus was on the user interface and shed some light on students' interest in user experience. The inter-rater score indicates the agreement among students that they have the same level of understanding of the principles. The results show a high level of guideline compliance with HCI principles, in which we witnessed variations in visual cognitive styles. Regardless of diversity in visual preference, the students presented high consistency and a similar perspective on adopting HCI design principles. The results also elicited suggestions into the development of the HCI curriculum in the future.
... They are critical components of any assessment framework because they reflect the selected SC concept and definition. As indicated by Albano (2017), scale systems are categorized in four classes that range, in terms of the conveyed information, from very general to specific scale values. These classes are: (1) nominal (descriptive names), (2) ordinal (ranking without meaningful intervals), (3) interval (meaningful intervals with relative benchmark), and (4) ratio (meaningful intervals with absolute benchmark) scaling systems. ...
Conference Paper
Full-text available
As a response to the challenges of population and urban growth, the concept of smart city/community (SC) promises more intelligent, sustainable, and resilient communities that provide better services and quality of life. However, the SC as an ecosystem is an evolving concept; hence, there is no universally-shared definition or assessment tool. Additionally, each municipality worldwide has its own unique characteristics, challenges, and opportunities. Therefore, any SC definition and assessment method should be adopted or developed specifically for each city and agreed participatively by the SC initiative leaders. In terms of an SC assessment, most of the available tools are based on evaluating the performance of urban systems. Hence, the developed indicators are mainly used for ranking or comparison purposes. However, these performance and ranking indicators face many challenges due to the broad, multidisciplinary, and rapidly evolving and changing nature of SCs. For instance, due to the rapid technological evolution of SCs, some of the currently-accepted performance indicators will be obsolete in just a few years. Therefore, our research attempts to adapt a generic SC definition with three dimensions. These dimensions include the "connectivity" that can be achieved through intelligent technologies, "sustainability" in terms of long-term viable performance, and "resiliency" in terms of preventive and proactive considerations. Based on these dimensions, a maturity-based scale that is compatible with the evolving nature of SC is proposed for SC maturity assessment. The significance of the research outcome is that it will help the public and managers of the municipalities focus on advancing city maturity which is essential for continuously improving citizens' well-being.
... Thus, we estimate two temporal phase annotations (upper and lower face) per video. We evaluate accuracy with a standard annotation evaluation metric [1], namely Pearson's correlation. Specifically, we correlate the estimated upper (lower) face annotation with the temporal phase annotations of all the upper (lower) facial action units (AUs) [13,42], and consider an estimation successful if the correlation is at least 0.85. ...
Conference Paper
Finding the largest subset of sequences (i.e., time series) that are correlated above a certain threshold, within large datasets, is of significant interest for computer vision and pattern recognition problems across domains, including behavior analysis, computational biology, neuroscience, and finance. Maximal clique algorithms can be used to solve this problem, but they are not scalable. We present an approximate, but highly efficient and scalable, method that represents the search space as a union of sets called ϵ-expanded clusters, one of which is theoretically guaranteed to contain the largest subset of synchronized sequences. The method finds synchronized sets by fitting a Euclidean ball on ϵ-expanded clusters, using Jung's theorem. We validate the method on data from the three distinct domains of facial behavior analysis, finance, and neuroscience, where we respectively discover the synchrony among pixels of face videos, stock market item prices, and dynamic brain connectivity data. Experiments show that our method produces results comparable to, but up to 300 times faster than, maximal clique algorithms, with speed gains increasing exponentially with the number of input sequences.
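The success criterion quoted in the excerpt for this paper (a Pearson correlation of at least 0.85 between an estimated annotation and a reference annotation) can be illustrated with a minimal sketch. The function names and the sample series below are hypothetical and are not taken from the paper; only the 0.85 threshold comes from the excerpt.

```python
from math import sqrt

THRESHOLD = 0.85  # cutoff quoted in the excerpt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def is_successful(estimated, reference, threshold=THRESHOLD):
    """True when the estimate correlates with the reference at or above the threshold."""
    return pearson_r(estimated, reference) >= threshold


# Hypothetical annotation curves, for illustration only.
estimated = [0.0, 0.2, 0.6, 1.0, 0.7, 0.1]
reference = [0.0, 0.3, 0.7, 1.0, 0.6, 0.0]
print(is_successful(estimated, reference))
```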
... IRT models provide information about item parameters and the latent traits of test respondents, helping to gain insight into their performance as well as into the items. They are also useful for test development, item analysis, equating, item banking, and computerized adaptive testing (CAT) [5]. As a group of statistical models with probabilistic and stochastic procedures, IRT connects the pattern of responses to a group of items to predict a latent trait/ability, and then converts discrete item responses into probability-based estimates of each respondent's level or location on the latent trait [6,7]. ...
Article
This paper explores a way to apply Item Response Theory (IRT), one of the popular statistical methodologies in measurement and psychometrics, to evaluate Financial Transmission Rights (FTR) paths in the U.S. electricity market. FTR is an energy derivative product to hedge congestion cost risks inherent in constrained transmission lines. In New England, with about 1200 pricing locations, the theoretical combinations of FTR paths amount to 1.4 million in prevailing flows alone. With capital constraints, it is imperative that FTR market participants build the capability to evaluate FTR paths to bid on. IRT provides a framework of how well tests work, and how individual items work on tests, estimating respondents’ latent abilities, and individual item parameters. IRT is utilized to analyze historical electricity data of 2019 for a daily congestion cost of eight customer load zones and one hub in the U.S., New England, for the evaluation of FTR paths. In the analysis, an item represents an FTR path, while item difficulty, item discrimination, and a latent trait variable for the path correspond to the path profitability, risk level, and daily congestion ability, respectively. This paper explores the experimental procedures by which IRT, a psychometric tool, may also be applicable in complex energy markets, providing a consistent and standardized analytical framework to address the issues of selection and prioritization among multiple opportunities. FTR path evaluation is conducted in three steps to determine bid priority paths in FTR auctions: parameter significance tests, ranking on path profitability and risk level, and weighting scores of individual rankings on the two criteria.
... When the two remaining cases were coded by the first and third authors, and combined with the first three, IRR remained strongly positive. The final number of agreements (n = 38) divided by the total possible agreements (n = 45) and multiplied by 100 yielded a final IRR of 84% across all cases (Albano, 2017). Scorers then met to assess IRR and discuss disagreements until discrepancies were resolved. ...
Article
With transition litigation on the rise in recent years, educators need access to current legal trends in special education. Traditionally, educators have been dependent on researchers and attorneys to report on the implications of legal cases to guide the education and services for students with disabilities. In response to this, the Three Dimensions of FAPE Rubric (FAPE3DR) was created to help educators analyze legal cases in a timely manner. Specifically, the authors applied this rubric to five recent legal cases that were decided in favor of the family or transition-age youth. Findings are reported within the scope of broader transition issues.
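The inter-rater reliability figure quoted in the excerpt for this article is a simple percent agreement: 38 agreements out of 45 possible, multiplied by 100. A minimal sketch of that arithmetic follows; the function name is hypothetical, and only the two counts come from the excerpt.

```python
def percent_agreement(agreements: int, total_possible: int) -> float:
    """Percent agreement: agreements / total possible agreements * 100."""
    if total_possible <= 0:
        raise ValueError("total_possible must be positive")
    if not 0 <= agreements <= total_possible:
        raise ValueError("agreements must lie between 0 and total_possible")
    return 100.0 * agreements / total_possible


irr = percent_agreement(38, 45)
print(f"{irr:.1f}%")  # 84.4%, which the excerpt reports as 84%
```

Percent agreement does not correct for agreement expected by chance, which is why chance-corrected statistics such as Cohen's kappa are often reported alongside it.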
Preprint
The COVID-19 pandemic led to widespread school closures and the suspension of in-person learning, affecting billions of students globally. Education in emergencies during this crisis seeks to address the unique challenges posed by the pandemic and ensure the continuity of education through alternative modes of delivery. Remote and online learning became crucial tools in facilitating access to education, but they also exposed existing inequities in digital infrastructure and access to technology, exacerbating educational inequalities. In Bhutan, education in emergencies aims to provide continuity in learning, protect the rights of children, and promote psychosocial well-being during crises. It required collaborative efforts from the government, educational institutions, civil society organizations, and international partners. The Ministry of Education (MOE), in collaboration with relevant stakeholders, implemented strategies and interventions to ensure that education services were maintained during emergencies. Bhutan emphasized proactive planning to mitigate the impact of emergencies on education. This involved developing policies, frameworks, and guidelines to ensure a rapid and coordinated response to crises. The school closures necessitated the use of adapted and prioritized curricula based on the pandemic situation. The MOE recognized the importance of addressing the emotional and psychological well-being of students affected by crises. Accordingly, psychosocial support programs were implemented to help students cope with trauma and to provide a supportive learning environment. The crisis also accelerated the use of blended learning and the flipped classroom by leveraging digital technologies. Continuous professional development programs were conducted for teachers to enhance their capacity to deliver quality education in emergency contexts.
Teachers were equipped with skills to handle psychosocial support, trauma-informed teaching, and flexible instructional methods. The challenges faced in implementing education in emergencies during the COVID-19 pandemic include technological limitations, the digital divide, learning loss, and the socioeconomic impact on vulnerable populations. However, the crisis has also provided an opportunity to reimagine and innovate education delivery, leading to the development of new pedagogical approaches and the recognition of the importance of flexible and adaptable education systems.
Article
Early identification of language delay is important because such delay has a serious impact on a child's educational, social, and emotional development. Some early language screening tools are parent-administered; however, they are not culturally appropriate or freely available. This article documents the development and preliminary validation of a quick and easy-to-administer language screening tool for babies from 6 to 18 months of age. Parents of 100 babies ranging in age from 6 to 21 months were included in the study. The babies were classified into five screening levels according to their age. The items of the Screening Test of Early Language Development-Test version (STELD-T) were created and validated through expert opinion. The STELD-T was administered along with the Receptive Expressive Emergent Language Scale (REELS). Internal consistency using the Kuder-Richardson Formula-20 ranged from 0.457 to 0.853 across the five levels, which was considered acceptable given the short tool length and item heterogeneity. Kappa coefficients for agreement between the STELD-T and the REELS ranged from 0.459 to 0.875, indicating satisfactory criterion validity. After calculating the percentage of babies with a "refer" result as well as Kappa statistics with three different pass-refer criteria, a pass-refer criterion of 75% seemed to be appropriate for screening. The STELD seems to be a reliable and valid tool to screen language development in babies from 6 months to 18 months of age in urban areas of Maharashtra. Items representing a range of language skills, including pragmatics, make it a unique tool.
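The Kuder-Richardson Formula-20 mentioned in this abstract estimates internal consistency for dichotomously scored (0/1) items as KR-20 = k/(k-1) * (1 - sum(p_i * q_i) / variance of total scores). The sketch below shows the computation on a small, invented response matrix; it is not the study's data, and the use of the population variance (dividing by n) for total scores is an assumption, since some texts use the sample variance instead.

```python
from statistics import pvariance


def kr20(responses):
    """KR-20 for a list of respondents, each a list of 0/1 item scores."""
    n_people = len(responses)
    n_items = len(responses[0])
    if n_people < 2 or n_items < 2:
        raise ValueError("need at least 2 respondents and 2 items")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every respondent must answer the same number of items")

    # Variance of total scores (assumption: population variance).
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)
    if var_total == 0:
        raise ValueError("KR-20 is undefined when all total scores are equal")

    # Sum of item variances p * q, where p is the proportion scoring 1.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people
        sum_pq += p * (1 - p)

    return n_items / (n_items - 1) * (1 - sum_pq / var_total)


# Hypothetical data: 5 respondents by 4 items.
data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(f"{kr20(data):.3f}")  # 0.800
```

With few items, as in short screening levels, the k/(k-1) term and the limited total-score variance tend to pull the coefficient down, which is consistent with the abstract's remark about short tool length.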
Article
The present paper aims to analyze the psychological properties of the fourth-grade mathematics items for Omani and Iranian students using IRT and CDM item analysis. The samples were drawn from all Omani and Iranian fourth-grade students who took the TIMSS 2015 mathematics test. The research method was secondary analysis. The IRT results showed that no items were difficult for both countries. However, 22 items were identified as very easy for both Oman and Iran. Furthermore, IRT showed that only 5 items were appropriate for both Omani and Iranian students. In contrast, the CDM approach identified the 9 most difficult items for both Omani and Iranian students. Consequently, CDM analysis better captured the psychological properties of the items.
Conference Paper
Virtual 3D conferences are emerging as a communication channel that substitutes for face-to-face meetings, owing to advances in technology and the COVID-19 pandemic. Current efforts focus on bringing content into 3D virtual space, while delivering it to people with color vision deficiency has not been taken into account. To alleviate this issue, this paper presents a prototype that lets color-blind people have the same experience as people with normal color vision. Our method helps users 1) understand the presented content through adjusted color filtering, so that similar colors can be differentiated by brightness, and 2) distinguish apparently identical colors through color transformation. Our proposed prototype is demonstrated through three use cases set up in a virtual meeting room: traffic lights, fruit color differentiation, and graph reading. A pilot study conducted with 29 participants shows that our proposed method can improve color differentiation and accuracy for color-blind
Book
The basic issues in psychometrics for MA students in educational and behavioral sciences are explained. Classical Test Theory and Item Response Theory are the major focus.
Article
Objectives: Emergency department thoracotomy (EDT) is a rare and challenging procedure. Emergency medicine (EM) residents have limited opportunities to perform the procedure in clinical or educational settings. Standardized, reliable, validated checklists do not exist to evaluate procedural competency. The objectives of this project were twofold: 1) to develop a checklist containing the critical actions for performing an EDT that can be used for future procedural skills training and 2) to evaluate the reliability and validity of the checklist for performing EDT. Methods: After a literature review, a preliminary 22-item checklist was developed and disseminated to experts in EM and trauma surgery. A modified Delphi method was used to revise the checklist. To assess usability of the checklist, EM and trauma surgery faculty and residents were evaluated performing an EDT while inter-rater reliability was calculated with Cohen's kappa. A Student's t-test was used to compare the performance of participants who had or had not performed a thoracotomy in clinical practice. Item-total correlation was calculated for each checklist item to determine discriminatory ability. Results: A final 22-item checklist was developed for EDT. The overall inter-rater reliability was strong (κ = 0.84) with individual item agreement ranging from moderate to strong (κ = 0.61 to 1.00). Experts (attending physicians and senior residents) performed well on the checklist, achieving an average score of 80% on the checklist. Participants who had performed EDT in clinical practice performed significantly better than those that had not, achieving an average of 80.7% items completed versus 52.3% (p < 0.05). Seventeen of 22 items had an item-total correlation greater than 0.2. Conclusions: A final 22-item consensus-based checklist was developed for the EDT. Overall inter-rater reliability was strong. 
This checklist can be used in future studies to serve as a foundation for curriculum development around this important procedure.
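The inter-rater statistic reported in this abstract, Cohen's kappa, corrects observed agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch for two raters scoring the same checklist items; the ratings are invented for illustration and are not the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical ratings of the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)

    # Observed agreement: share of items on which both raters match.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal proportions, per category.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical ratings: 1 = step completed, 0 = step missed, for 10 checklist items.
rater_1 = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]
rater_2 = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(f"{cohens_kappa(rater_1, rater_2):.2f}")  # 0.80
```

Here the raters agree on 9 of 10 items (p_o = 0.9) and chance agreement is 0.5, giving kappa = 0.8.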
Thesis
Teaching computational thinking as early as Basic Education is of utmost importance for preparing students for the challenges of the 21st century. This creates the need to assess the competencies acquired in relation to computational thinking. Assessment by analyzing the code a student creates as the result of open-ended activities makes it possible to verify which concepts were effectively applied in the teaching-learning process. Although some isolated approaches already exist, mainly for the Scratch programming language, there is not yet a more comprehensive and systematically validated assessment model. The objective of this work is therefore to systematically develop a generic assessment model, independent of any particular visual programming language (VPL), based on the literature and the state of the art. The model is instantiated by a rubric for assessing programs created with the VPL App Inventor and is implemented by evolving the web tool CodeMaster. The reliability and validity of the model are assessed through a large-scale evaluation with more than 88 thousand applications developed with App Inventor. The results of the evaluation indicate that the model is valid and reliable. By making the model available, the work is expected to facilitate and reduce the effort required to assess programming activities in the context of computing education in Basic Education, thereby supporting its broad application in Brazilian schools.