Article

Respondent rationale for neither agreeing nor disagreeing: Person and item contributors to middle category endorsement intent on Likert personality indicators

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

The current study examines intentions behind middle category endorsement in personality assessment, and investigates person and item antecedents to these intentions. Participants verbally explained their responses to 100 personality items and completed personality, self-concept clarity, and cognitive ability measures. Talked-through items were scaled with respect to clarity, complexity, and need for contextualization. Verbal protocols suggest that the predominant respondent orientation when selecting the Likert middle category is "it depends". Candidate item and person antecedents indicate that middle category endorsement intentions are more closely attributable to item than to respondent characteristics. These findings suggest that consecutive integer scoring algorithms may result in personality scale attenuation, particularly with instruments that contain indicators reflecting an ambiguous or unspecified context.


... Sustainability is the process of maintaining change in a balanced fashion, in which the exploitation of resources, the direction of investments, the orientation of technological development and institutional change are all in harmony and enhance both current and future potential to meet human needs and aspirations (Kulas et al., 2013). Sustainability is most often defined as meeting the needs of the present without compromising the ability of future generations to meet theirs. ...
... E-assessment simply means electronic assessment. According to Kulas et al. (2013), e-assessment is seen as the end-to-end electronic assessment process where ICT is used for the presentation of assessment activities and the recording of responses. It means that all aspects of assessment, from the planning and setting of examination papers to marking, recording and statistical analysis, are done electronically. ...
... E-assessment includes the end-to-end assessment processes from the perspectives of the learners, teachers, learning establishments or institutions, awarding bodies and regulators, and the general public. E-assessment is sustainable due to the stability and speed of the on-line assessment system: the e-assessment or on-line assessment system is stable while setting up tests; it is stable while students complete a test (even with a large number of students); answers can be saved in real time (if there is a power failure, the answers must be saved up to that point); the speed of delivery of the test from the server to the work station is acceptable; the speed of presenting each question per work station is acceptable; and the speed of presenting videos and graphics per work station is acceptable (Kulas et al., 2013). The assessment system should indicate what the trainees answered as well as the correct answer; extra time can be set for trainees to work through the feedback after test completion, and the score per question can be displayed in the feedback. ...
Article
A number of issues arise from Nigeria's lack of a regulated and effective system for evaluating the credentials and abilities of artisans in the building sector, including ineffective verification, subjective evaluation, restricted accessibility, a lack of standardisation, and security issues. The study aimed to assess the severity of skill shortages among Nigerian building construction artisans. The study adopted a descriptive survey design with a quantitative approach; data were collected through a questionnaire survey using a simple random sampling technique and analysed with SPSS version 22. Descriptive analysis using mean ranking and percentages was used. The study revealed that practicing building construction artisans in Nigeria lack the requisite anger management skills at their workplaces, and pointed to delays in building project delivery in Nigeria. Also, the findings that practicing building construction artisans in Nigeria lack the requisite health, safety and housekeeping skills at their workplaces, and that the lack of institutionalization of the NSQ leads to skill shortages for the industry, were the major points of consideration when looking at skill shortages in building construction delivery in Nigeria. Provision of adequate competency-based trainers was rated the most important strategy, suggesting that qualified trainers are essential for effective skill development. Periodic capacity building for practicing artisans, in the form of regular training and development programs for existing artisans, was considered crucial for maintaining and enhancing their skills. Also, enactment of enabling legislation for national building code enforcement, together with a strong regulatory framework, is seen as necessary to ensure industry standards and compliance.
... Respondents verbalize moment-by-moment thoughts that are normally silent, thereby avoiding retrospective interpretations. The think-aloud method has previously been applied to evaluate the construct validity of questionnaires (Darker & French, 2009), understand responses on a Likert-scale (Kulas & Stachowski, 2013), and the process of faking on personality inventories (Hauenstein, Bradley, O'Shea, Shah, & Magill, 2017; Robie, Brown, & Beaty, 2007). To our knowledge, no research has used the think-aloud method to capture the process that respondents go through while completing SD scale items to assess their validity. ...
... Peering into the minds of respondents is especially valuable for questionnaires that attempt to assess constructs which are otherwise difficult to measure. Faking behavior in particular could capitalize on further think-aloud studies or other forms of qualitative studies, as several examples have already shown (Hauenstein et al., 2017; König, Merz, & Trauffer, 2012; Kulas & Stachowski, 2013; Robie et al., 2007). ...
... Nonetheless, in terms of responding, Likert scales may bring different considerations than the True/False format used in the present study. For example, the presence of a neutral mid-point may prompt respondents to make new considerations (Kulas & Stachowski, 2013). Additionally, SD scales are also frequently used with Likert-scales (Lambert et al., 2016). ...
Article
Full-text available
Social Desirability (SD) scales are sometimes treated, by researchers, as measures of dishonesty and, by practitioners, as indicators of faking on self-report assessments in high-stakes settings, such as personnel selection. Applying SD scales to measure dishonesty or faking, however, remains a point of contention among the scientific community. This two-part study investigated if SD scales, with a True/False response format, are valid for these purposes. Initially, 46 participants completed an SD scale and 12 personality items while under instruction to “think aloud”, that is, to verbalize all the thoughts they had. These spoken thoughts were recorded and transcribed. Next, 175 judges rated the participants’ honesty in relation to each SD item, based on the participants’ transcribed spoken thoughts and their selected response to the item. The results showed that responses keyed as “socially desirable responding” were judged as significantly less honest than those not keyed as such. However, the effect size was very small, and the socially desirable responses were still being judged as somewhat honest overall. Further, participants’ SD scale sum scores were not related to the judges’ ratings of participant honesty on the personality items. Thus, overall, SD scales appear to be a poor measure of dishonesty.
... Another question that faces psychological scale developers, and one about which there is relatively more opinion than data, is whether Likert-type items should include an even or odd number of response options (Kulas & Stachowski, 2013; Nadler, Weston, & Voyles, 2015). With an odd number of response options, the middle option (i.e., neither agree nor disagree) has ambiguous meaning, which should increase measurement error if respondents use that option in different ways that do not reflect their perceived standing on the characteristic being measured. ...
... With an odd number of response options, the middle option (i.e., neither agree nor disagree) has ambiguous meaning, which should increase measurement error if respondents use that option in different ways that do not reflect their perceived standing on the characteristic being measured. Kulas and Stachowski (2013) recently studied the reasons respondents may have for selecting the middle option on an oddly numbered Likert scale, which may include (a) the response reflects moderate standing on the item/trait (which arguably would be the ideal), (b) the respondent has difficulty deciding his or her standing on the item, (c) the respondent is confused about the item's meaning, and/or (d) the respondent feels that his or her response is context-dependent (i.e., what the authors called the "it depends" reason for using the middle option). The latter three options are less than ideal, since they represent construct-irrelevant reasons for using a particular response option and thus should increase measurement error and thus attenuate validity. ...
... Given the ambiguity associated with the use of the middle option on odd-numbered response scales (Kulas & Stachowski, 2013), we predicted that odd-numbered Likert scales would show no advantage, psychometrically speaking, over matched even-numbered scales. This result was generally supported, as alphas and criterion validity correlations generally revealed no advantage for odd-numbered scales relative to matched even-numbered scales. ...
Article
Full-text available
Psychological tests typically include a response scale whose purpose it is to organize and constrain the options available to respondents and facilitate scoring. One such response scale is the Likert scale, which initially was introduced to have a specific 5-point form. In practice, such scales have varied considerably in the nature and number of response options. However, relatively little consensus exists regarding several questions that have emerged regarding the use of Likert-type items. First, is there a "psychometrically optimal" number of response options? Second, is it better to include an even or odd number of response options? Finally, do visual analog items offer any advantages over Likert-type items? We studied these questions in a sample of 1,358 undergraduates who were randomly assigned to groups to complete a common personality measure using response scales ranging from 2 to 11 options, and a visual analog condition. Results revealed attenuated psychometric precision for response scales with 2 to 5 response options; interestingly, however, the criterion validity results did not follow this pattern. Also, no psychometric advantages were revealed for any response scales beyond 6 options, including visual analogs. These results have important implications for psychological scale development. (PsycINFO Database Record (c) 2019 APA, all rights reserved).
... It consisted of 19 five-point Likert-scaled questions (verbally anchored response categories 1=strongly agree; 5=strongly disagree) (see attachment 1 for complete survey), seven open-ended questions, three dichotomous questions and two three-point Likert-scaled questions (1=entirely; 3=not at all). We deliberately used an odd number of response options to allow students to reflect moderate standing to an item [37]. The survey was based upon a frequently used standard survey for assessment of teaching quality at LMU Munich (see attachment 1). ...
... It consisted of 19 questions with a five-point Likert scale (verbally anchored response categories 1=strongly agree; 5=strongly disagree), seven open-ended questions, three binary questions and two questions with a three-point Likert scale (1=entirely; 3=not at all). We deliberately chose an odd number of response options to give students the opportunity to select a moderate position on a question [37]. The evaluation was based on a frequently used standard evaluation for assessing teaching quality at LMU Munich (see attachment 1). After completing the course, the students of both cohorts received a link to an online evaluation. ...
Article
Full-text available
Objective: Obtaining a systematic medical history (MH) from a patient is a core competency in medical education and plays a vital role in the diagnosis of diseases. At the Faculty of Medicine at LMU Munich, students have their first course in MH taking during their second year. Due to the COVID-19 pandemic, the traditional bedside MH taking course had to be transformed into an online course (OC). Our objectives were to implement an online MH taking course, to evaluate its feasibility and to compare the evaluation results to a historic cohort that had undertaken the traditional bedside teaching course (BTC). Methods: 874 second-year students participated in the OC (BTC=827). After teaching the theoretical background via asynchronous online lectures, students participated in a practical exercise with fellow students using the video communication platform Zoom where they were able to practice taking a MH on the basis of fictitious, text-based patient cases. Students were then asked to evaluate the course through a standardized online survey with 31 questions on teaching quality and self-perceived learning success, which had also been used in previous years. The survey results were compared to the results of the historic cohort using the Mann-Whitney U test. Results: A total of n=162 students (18.5%) evaluated the OC. In the historic cohort, n=252 (30.5%) completed the survey. 85.3% of the OC respondents thought that the atmosphere during the practical exercise was productive and 83.0% greatly appreciated the flexibility in terms of time management. Moreover, they appreciated the online resources as well as having the opportunity to undertake a MH taking course during the COVID-19 pandemic. 27.7% of the respondents thought that traditional BTCs should be supplemented through more online activities in the future. 
With respect to the ability of independently taking a MH upon completion of the course, the OC was rated significantly lower relative to the BTC (mean OC=2.4, SD=±1.1 vs. mean BTC=1.9, SD=±1.1 (1=strongly agree; 5=strongly disagree); p<0.0001). Conclusion: OCs are a feasible format and seem to convey the theory and practical implementation in a peer-exercise format of MH taking to medical students. The theoretical background can be acquired with great flexibility. Nevertheless, the students' self-appraisal suggested that the traditional teaching format was more effective at teaching MH taking skills. Thus, we propose a blended learning concept, combining elements of both formats. In this context, we suggest prospective, randomized trials to evaluate blended learning approaches.
... First, we adopted the 4 Likert-point of the original Ainley et al.'s (1986) QSL scale. Secondly, the inclusion of an odd number with a neutral response option such as 5-point Likert-scale could create ambiguous meaning that leads to an increase in the measurement error (Kulas & Stachowski, 2013). This is because responding to the neutral option might not reflect their real perceived standing on the characteristic being measured (Kulas & Stachowski, 2013). ...
Article
While the quality of school life is expected to be cultural and context-specific, fewer studies have been conducted to investigate the conceptualisation of quality of school life in a multi-ethnic and multicultural context. This study aims to compare Malay and Chinese primary school students’ perceptions on the quality of school life in Malaysia. This study employed a quantitative cross-sectional survey research design. Survey data were collected from 594 Grade 5 students. Findings revealed that both Malay and Chinese students ranked the highest score on the opportunity dimension and the lowest scores on the negative affect dimension. The Malay students scored higher means in all dimensions of quality of schools than the Chinese students, except the findings revealed cultural differences in the perceptions of quality of school life.
... A Likert scale with few anchor points was preferable, as the questionnaire had to be quick (Preston and Colman, 2000), that is, students typically without long attention spans were responding during limited class time, and the questionnaire had to be quick to understand, i.e. not leading respondents to skip categories, when they are unable to differentiate between seemingly similar options (Chang, 1994). To avoid the use of a midpoint as a dumping ground or easy way out (Kulas and Stachowski, 2013), it is recommended to omit a midpoint, when respondents are unfamiliar with the survey topic or not expected to have formed their opinion about the topic or when they are under strong social desirability pressures (Raaijmakers et al., 2000). Hence, a 4-point Likert scale was employed. ...
Article
Purpose: Entrepreneurial self-efficacy (ESE) has a dark side largely ignored in the field of entrepreneurship education. Research in educational psychology indicates that self-efficacy is prone to misjudgment, with novice learners often displaying overconfidence. Furthermore, this misjudgment is gendered; studies suggest that men are more likely to display overconfidence and less likely to correct erroneous self-assessments. However, realistic self-assessments are essential for effective learning strategies, pivotal for performance in the ambiguous entrepreneurial context. Therefore, this study explores whether entrepreneurship education helps mitigate overconfidence, and if this impact varies by gender. Design/methodology/approach: Common in educational psychology, but new in the field of entrepreneurship education, a calibration design captures discrepancies between perceived and actual performance. Data from before and after an introductory undergraduate entrepreneurship course (N = 103) inform descriptive analyses, statistical comparison tests and calibration plots. Findings: As expected, nearly all novice students showed significant overconfidence. Curiously, gender difference was only significant at the end of the course, as overconfidence had decreased among female students and increased among male students. Originality/value: The paper advocates a more nuanced stance toward ESE, and introduces ESE accuracy as a more fitting measure of entrepreneurial overconfidence. The findings flag the common use of self-perception as a proxy for actual competence, and evoke new research avenues on (gender differences in) learning motivations of aspiring entrepreneurs. Finally, the study shares guidance for entrepreneurship educators on fostering a “healthier” level of self-efficacy for better entrepreneurial learning.
... The Likert scale has higher validity when the number of response categories is high (Taherdoost, 2019). An open issue with the Likert scale is whether researchers should provide an odd or even number of response choices (Kulas & Stachowski, 2013; Nadler et al., 2015; Taherdoost, 2019). Odd-numbered Likert scales are better than even-numbered ones because respondents can choose a neutral position and are not forced to take a side (Colman et al., 1997; Taherdoost, 2019). ...
Article
Full-text available
Given that Muslims make up the majority in Indonesia, students' perspectives on science are influenced by their religious beliefs. This research aims to analyze the differences in attitudes toward science and religion between national and Islamic schools. This study employed a survey method to look at the views of students in national and Islamic schools. The research sample comprises 420 Indonesian secondary school students in two groups: 212 students from national schools and 208 from the renowned "Pesantren" Islamic school in Kota Bandung, West Java. The results show significant differences in general aspects between the national schools (3.5) and the Islamic school (3.8). The aspects that show significant differences are competitiveness, critical thinking, religiosity, trust in scientists, interest in doing science, extrinsic motivation for science, general value of science, awareness of environmental issues, science self-concept, science removing the need for God, compatibility between science and religion, and perceptions of science lessons. Factors with no significant differences include attitudes toward theistic faith, creationism, the public value of science, and scientism. Additionally, the national schools and the Islamic school each have strengths and weaknesses, such as the time of the science lessons, lab equipment, internet access, etc. It could be concluded that national and Islamic schools have strengths and weaknesses related to science and religion.
... It offers a high granularity inviting respondents to make less polarized choices (Simms et al., 2019) and allows participants to adopt a neutral position. Despite debates on the ambiguity of using the middle option on odd-numbered response scales (Kulas and Stachowski, 2013), it seems worthwhile to leave the opportunity to indicate neutrality in Table 1. Description of the four content areas of empathy in design in the construction of the Empathy in Design Scale (based on Hess and Fila, 2016b; Kouprie and Sleeswijk Visser, 2009; Rogers, 1959) ...
Article
To design user-centered services, it is essential to build empathy toward users. It is hence strategic to trigger empathy for users among professionals concerned with shaping service user experiences. There is, however, a lack of quantitative tools to measure empathy in design. Through two studies, we report on the development and validation of the Empathy in Design Scale (EMPA-D). The tool aims to measure service employees’ empathy toward users. Grounded in theories from psychology and design, we first generated and tested a pool of items through expert inspection and cognitive interviews. In Study 1, we administered 16 items to 406 full-time service employees from various industries, including employees in customer-facing positions. In Study 2, we iterated on additional items and administered a revised scale to 305 service employees. The selected model consists of 11 items and has a three-factor structure (Emotional interest/Perspective-taking, Personal experience and Self-awareness), which showed an adequate model fit and good internal consistency. Evidence of convergent validity was provided by moderate correlations of the EMPA-D scale with empathy measures in psychology (SITES, Empathy Quotient, Interpersonal Reactivity Index), whereas discriminant validity was demonstrated by low correlations with the narcissism measure Narcissistic Personal Inventory. We outline how this self-reported empathy measure can support organizations in enhancing their services and discuss potential limitations of quantitatively measuring empathy in service teams. Research Highlights We present the development and validation of the Empathy in Design Scale (EMPA-D), a self-report measure of employees’ empathy toward users of a service. We report on two validation studies and document the psychometric properties of the scale. The selected model consists of 11 items and a three-factor structure (Emotional interest/Perspective-taking, Personal experience and Self-awareness). 
The resulting EMPA-D scale contributes to filling the gap in metrics to assess empathy in the service design context. In industry, measuring employees’ empathy supports the selection of appropriate empathic interventions to foster service user-centeredness.
... If not otherwise stated, participants responded to the questionnaire items on a 6-point Likert scale from 1 (completely disagree) to 6 (completely agree). For the self-constructed scales, we chose the 6-point response format to avoid misuse of a neutral middle category [24]. Due to the small sample size of the pilot study, we refrained from reliability analyses. ...
Conference Paper
As we shift to designing AI agents as teammates rather than tools, the social aspects of human-AI interaction become more pronounced. Consequently, to develop agents that are able to navigate the social dynamics that accompany cooperative teamwork, evaluation criteria that refer only to objective task performance will not be sufficient. We propose perceived cooperativity and teaming perception as subjective metrics for investigating successful human-AI teaming. Corresponding questionnaire scales were developed and tested in a pilot study employing the collaborative card game Hanabi, which has been identified as a unique setting for investigating human-AI teaming. Preliminary descriptive results suggest that rule-based and reinforcement learning-based agents differ in terms of perceived cooperativity and teaming perception. Future work will extend the results in a large user study to psychometrically evaluate the scales and test a conceptual framework that includes further aspects related to social dynamics in human-AI teaming.
... Moreover, a six-point rating scale has been broadly endorsed in the research as the most suitable option due to its precision and user-friendly nature (e.g., Taherdoost, 2019). The two shorter rating scales do not include a middle category because individuals often misuse this category by refusing to provide any responses (Kulas & Stachowski, 2013; Lyu & Bolt, 2022; Murray et al., 2016; Nadler et al., 2015). Additionally, empirical findings have shown that even-numbered rating scales possess similar levels of reliability and validity as their odd-numbered counterparts (Alwin et al., 2018; Donnellan & Rakhshan, 2023; Simms et al., 2019). ...
Article
Full-text available
Rating scales are susceptible to response styles that undermine the scale quality. Optimizing a rating scale can tailor it to individuals’ cognitive abilities, thereby preventing the occurrence of response styles related to a suboptimal response format. However, the discrimination ability of individuals in a sample may vary, suggesting that different rating scales may be appropriate for different individuals. This study aims to examine (1) whether response styles can be avoided when individuals are allowed to choose a rating scale and (2) whether the psychometric properties of self-chosen rating scales improve compared to given rating scales. To address these objectives, data from the flourishing scale were used as an illustrative example. MTurk workers from Amazon’s Mechanical Turk platform (N = 7042) completed an eight-item flourishing scale twice: (1) using a randomly assigned four-, six-, or 11-point rating scale, and (2) using a self-chosen rating scale. Applying the restrictive mixed generalized partial credit model (rmGPCM) allowed examination of category use across the conditions. Correlations with external variables were calculated to assess the effects of the rating scales on criterion validity. The results revealed consistent use of self-chosen rating scales, with approximately equal proportions of the three response styles. Ordinary response behavior was observed in 55–58% of individuals, which was an increase of 12–15% compared to assigned rating scales. The self-chosen rating scales also exhibited superior psychometric properties. The implications of these findings are discussed.
... Both types have advantages and disadvantages, and their choices depend on the research questions (Joshi et al., 2015). Studies have used scales with both even and odd numbers of options (e.g., Rico-Juan et al. (2018); Liu and Carless (2006); Kulas and Stachowski (2013); Huisman et al. (2020)), and a good comparison of scale types and usage scenarios can be found in Taherdoost (2019). ...
Article
50 days free access to full paper: https://authors.elsevier.com/c/1iZVL1HucdZtcM Abstract: Peer assessment is a process in which students rate their peers which has many benefits for both the assessor and the assessed. It actively engages students, increases motivation by giving a sense of ownership of the assessment process, encourages autonomy and critical analysis skills, broadens their understanding of the topic, enhances problem-solving and self-assessment abilities as well as develops soft skills. Peer assessment is also beneficial to the teachers as it reduces the strain of the repetitive grading process thus opening more resources for teaching and the development of course materials, especially in large courses with hundreds of students. Sometimes, peer assessment is the only viable option, as in MOOCs (Massive Open Online Courses). Peer assessment has long been studied at all education levels and is gaining traction in recent years, where MOOCs and even the Covid pandemic, which instigated the development of digital competencies, had a positive influence. However, peer assessment is still a demanding process to carry out, but one that can be assisted by modern Internet technology. This paper has a twofold contribution: we present the peer assessment module of our open-source automated programming assessment system Edgar which has been heavily used and developed for the last six years, while peer assessment has been used for the previous three years. Additionally, we present a methodology used and a two-year case study of peer assessment of open-ended assignments in the undergraduate Databases course where 500+ students per season had to provide an entity-relationship model for a given domain and assess their peers' submissions. We discuss our grading methodology, provide in-depth data analysis, and present the students' opinions of the process acquired through an anonymous questionnaire. 
We find that the process is both demanding in terms of the design of assignments and assessment questionnaires and rewarding in the assessment phase, where the students’ grades turned out to be of high quality.
... Responses were collected using a 1 (completely disagree) to 8 (completely agree) Likert-type format. The eight-item scale was intended to provide a high degree of granularity while removing options for the selection of a neutral, noncommittal position (Kulas & Stachowski, 2013; Simms et al., 2019). The following instructions were provided for the set: ...
Article
Full-text available
Whereas existing data verify the importance of support networks in facilitating resilience following trauma, the sociocultural perceptions of posttrauma difficulties that provide context for these interactions remain largely unexplored. Folk psychiatry models propose that lay explanations of mental illness can be quantified along distinct moralizing, medicalizing, and psychologizing dimensions. The current project aimed to develop a trauma-specific measure capturing lay explanations of posttraumatic stress disorder (PTSD) based on this framework. Data were collected from three samples of Mechanical Turk respondents (N1 = 367; N2 = 365; N3 = 401) as well as an independent sample of university students (N4 = 311). Factor analysis of the final, 13-item Folk Psychiatry Measure–PTSD (FPM-P) indicated close fit of a correlated three-factor model in MTurk and student respondents. Across samples, moralizing beliefs about PTSD (e.g., people with PTSD lack a moral compass) evidenced moderate-to-strong correlations with general attitudes toward those with mental illness, including positive associations with authoritarianism, social restrictiveness, blame, anger, and perceived dangerousness. Negative associations with benevolence and support for community-based care were also noted. Medicalizing beliefs (e.g., PTSD is caused by a chemical imbalance) demonstrated more modest associations with negative attitudes, as noted through weak correlations with increased authoritarianism, anger, and lower benevolence toward those experiencing psychological difficulties. Finally, psychologizing explanations (e.g., people with poor relationships and low social support are at greater risk of developing PTSD) evidenced weak but positive associations with benevolence and pity for those with mental health concerns. Implications and cultural-based nuances of the scale are discussed.
... These individuals, who had expertise in psychometric research design and scale development, were asked to evaluate each item using a scale Based on expert reviews, the Likert scale was changed from a five-point scale to a 4-point scale [85] (i.e., 1= No Confidence at All to 4= High Confidence). Although the structure of response sets is a perennial debate among survey researchers, evidence suggests that excluding the midpoint (i.e., neutral choice) is appropriate when exploring topics (e.g., perceptions of self-efficacy) for which respondents will or should hold an opinion [86,87] . Eliminating non-neutral options is also useful for mitigating low motivation, as respondents must give careful thought to each question. ...
Article
Full-text available
Despite growing demand by employers across industries to hire employees with soft skills, a soft skill mismatch exists between skill sets that companies are seeking and those possessed by top talent. Thus, collegiate academic programs must equip students with both discipline-based knowledge and the necessary soft skills to meet industry needs and achieve success in their careers. This study investigates instructors' role in facilitating college students' soft skill development using two theoretical foundations. The five-scale survey instrument was refined through expert review and then disseminated at a U.S. university. The 410 respondents were students from all class levels and a range of disciplines. We used nonparametric bootstrap analysis to examine underlying mechanism of the relationship between instructional effectiveness and students' soft skill development, through the mediating effects of career self-efficacy and career motivation, as well as the moderating role of students' attitude toward higher education. Results and findings are discussed.
... However, there are situations where the researchers can have less biased results on applying the middle point on the scale. On vague topics, a neutral opinion can be desired (Johns, 2005); a neutral point can improve instrumental confidence when measuring psychological characteristics (Adelson and McCoach, 2010); some measures can be taken to decrease the erroneous use of the neutral point by improving the clarity of the questionnaire items (Kulas and Stachowski, 2013). ...
Article
Full-text available
Each software context has its specificities. One specific type of context is the multi-touch, where two or more touches are recognized by the software at the same time. Still in this context, User eXperience (UX) and Usability are relevant criteria related to multi-touch systems quality, and evaluation technologies are used to assess this quality. But which are these technologies? This is the central question that the Systematic Mapping Study (SMS) seeks to answer. Besides this question, another 18 sub-questions were addressed to find out more about the peculiarities of the technologies and how they can affect the state of the art of evaluating multi-touch systems. This SMS returned 622 papers, which were analyzed using two filters. Finally, 65 papers had their data extracted through the 18 sub-questions. These extractions raised information such as the software used, data collection methods, and aspects evaluated by the technologies. Through these data, we noticed an absence of technologies explicitly built for the multi-touch context. Other gaps were also perceived, such as the need for technologies that jointly assess quantitative and qualitative data; and technologies that focus on jointly evaluating Usability and UX. Perceiving the lack of synthesized content and characteristics about the questionnaires, we also performed a benchmark over the questionnaires identified in the SMS to serve as a future guide when applicators must choose the better technology for their context.
... In a study by Nadler et al. (2015), a neutral response option was interpreted 16 different ways including: no opinion (15% of participants), don't care (14%), or unsure (13%). Furthermore, middle response categories can act as a proxy for not applicable responses (Kulas et al., 2008; Kulas & Stachowski, 2009) and can lead to personality scale attenuation (i.e., when a measure includes too few options to reflect a respondent's actual behavior, opinion, or belief; Kulas & Stachowski, 2013). Midpoint endorsement habitude has also been shown not to be an artifact of measurement but is trait-like, possessing qualities of temporal stability and criterion-related validity (Hernández et al., 2004; Sun et al., 2019). ...
Article
Full-text available
Introduction: The purpose of this study was to compare the structural validity of Holland’s circular/hexagonal model in two versions of a commonly used Realistic, Investigative, Artistic, Social, Enterprising, Conventional (RIASEC) occupational interest measure – one that used items with a neutral response option (unsure) and one that used items without a neutral response option. Method: These comparisons were made using a sample of 1,025 undergraduate university business majors. A two-group Cosine Function Model (CFM) implemented in standard structural equation modeling software was used to investigate the circumplex fit across the two versions of the assessment. Results: CFM analyses suggested a high level of equivalence across versions such that (1) the correlation matrix of each group shows a good fit to the RIASEC circumplex, and (2) the two correlation matrices are essentially identical. Conclusion: The neutral response option (or lack thereof) did not seem to affect model fit. The generalizability of these results should be explored in future studies.
... Also, it has been argued that midpoints are not "dumping grounds" and instead the phenomenon can be attributed to a lack of question clarity by the research team. [69] Therefore, researchers should carefully consider the clarity of their Likert questionnaire, which can be accomplished through pilot testing. [38] It is also suggested to include as few items as possible within the survey, as a large amount of items is associated with lower response rates. ...
Article
Full-text available
The use of the Delphi technique is prevalent across health sciences research, and it is used to identify priorities, reach consensus on issues of importance and establish clinical guidelines. Thus, as a form of expert opinion research, it can address fundamental questions present in healthcare. However, there is little guidance on how to conduct them, resulting in heterogenous Delphi studies and methodological confusion. Therefore, the purpose of this review is to introduce the use of the Delphi method, assess the application of the Delphi technique within health sciences research, discuss areas of methodological uncertainty and propose recommendations. Advantages of the use of Delphi include anonymity, controlled feedback, flexibility for the choice of statistical analysis, and the ability to gather participants from geographically diverse areas. Areas of methodological uncertainty worthy of further discussion broadly include experts and data management. For experts, the definition and number of participants remain issues of contention, while there are ongoing difficulties with expert selection and retention. For data management, there are issues with data collection, defining consensus and methods of data analysis, such as percent agreement, central tendency, measures of dispersion, and inferential statistics. Overall, the use of Delphi addresses important issues present in health sciences research, but methodological issues remain. It is likely that the aggregation of future Delphi studies will eventually pave the way for more comprehensive reporting guidelines and subsequent methodological clarity.
... Several studies demonstrate the advantages of online surveys, such as reducing the time, cost, and mistakes of the data collection process on a nationwide scale while maintaining the anonymity of the respondents [69]. A five-point Likert scale is used to allow respondents to express a neutral opinion [41] and to increase reliability [1]. The collected data were tested with Cronbach's alpha to verify internal consistency; the result, 0.7649, is an acceptable reliability value as it falls between 0.70 and 0.90, ...
Article
Full-text available
This study aims to understand the pivotal factors of housing satisfaction and mobility according to the demographic characteristics with its hindrances in Indonesia. Several studies prove the residents refuse to move despite experiencing housing dissatisfaction by adjusting the housing or adapting to the housing mostly because of the poor financial capacity or have realistic housing preferences to cope with the experienced housing dissatisfaction. This study employs a quantitative research method by collecting 534 respondents through an online questionnaire. According to the regression analysis, this study finds sex, age, monthly income, and marital status are the major demographic characteristics for driving housing satisfaction and mobility in the Indonesian context. In both sex categories, the increasing age tends to increase the monthly income and enter marriage, which enables the respondents to deliver housing mobility.
... Logically, then, there is a neutral midpoint between both ends. However, the neutral midpoint (i.e., neither approve nor disapprove) can create ambiguity in interpreting what is meant when this option is selected (Garland, 1991; Guy & Norvell, 1977; Krosnick, 2002; Krosnick & Presser, 2009; Kulas & Stachowski, 2013; Nadler, Weston, & Voyles, 2015; Schaeffer & Dykema, 2020; Shulruf et al., 2008; Weems & Onwuegbuzie, 2001; Weijters, Cabooter, & Schillewaert, 2010). Despite the face value of a neutral midpoint, this option may not help identify respondents' true perspectives. ...
Chapter
Full-text available
The development of knowledge depends, in part, on obtaining theoretically structured self-report data. Since the introduction of the balanced, 5-point, agreement rating scale and its summated scoring logic (Likert, 1932), researchers have often adopted this summated rating approach because of its simplicity. The aim of this chapter is to alert researchers to challenges in designing a response scale and provide guidance as to how the response options can best be designed. Because respondents assume that the response options have been intelligently selected to frame normal attitudes, their responses are guided by those options (Schwarz, 1999). Thus, it is important that survey researchers consider the pros and cons of how they structure the response option system. In this chapter, we address a number of common issues in the design of rating scales under three major categories: scale length, option wording, and statistical analysis.
... These two self-efficacy scales were constructed based on the guidelines of Cohen, Manion, and Morrison (2011, pp. 402-403) and the instrument development process described by Gentry et al. (2014). Participants were asked to self-assess their self-efficacy on a 4-point Likert scale to avoid middle category endorsement (Kulas and Stachowski 2013). The final part included the following open-ended question about the teachers' perceived role in high-tech ILEs: 'How do you perceive your own role in the context of a Fablab/Makerspace?' The respondents were invited to describe in their own words how they perceive their role, for example, by giving examples of what they would do in such an environment, or by highlighting how they would support the students or collaborate with staff at the ILE. ...
Article
Full-text available
Informal learning environments (ILEs) like Fablabs and Makerspaces have potential to facilitate development of STEM skills. However, these environments might be difficult for teachers to adopt in their teaching because of teaching approaches grounded in constructionism, where the role of the teacher changes from a transmissive instructor to an active co-creator, and because of the use of high-tech equipment not normally found in schools. Purpose: The aim is to investigate teachers’ self-efficacy and perceived role when teaching STEM in Fablabs and Makerspaces. This is investigated related to teaching in ILEs and using high-tech equipment. The study was conducted in two countries/regions, Flanders (Belgium) and Sweden. We also compare differences between teachers depending on nationality, gender, and years of teaching experience. Sample: A total of 347 secondary school teachers completed an online survey. Quantitative analyses were used for all questions in the survey, except one open-ended question, which was analyzed through inductive thematic coding. Results: The teachers reported moderate self-efficacy for teaching in ILEs, and low self-efficacy for using high-tech equipment. Some teachers described themselves as having active roles as a coach or as co-learner during visits with their students. Others saw themselves as having a passive role. Many teachers did not know what kind of role to take. The teachers who perceived an active role as a teacher in high-tech ILEs reported higher self-efficacy to teach in these environments than other teachers. Conclusions: This study shows that a constructionist approach to teaching is important if teachers are to develop self-efficacy to teach in high-tech ILEs. Thus, developing teacher practices in line with constructionism in relation to teaching in high-tech ILEs is imperative in teacher education. The results also highlight that staff in Fablabs and Makerspaces are important for handling high-tech equipment. Hence, collaboration between staff in ILEs and teachers is of importance.
... Nevertheless, the use of 5-point Likert scale may promote social desirability bias [90], as respondents may use the midpoint to avoid selecting socially undesirable options [91]. In this study, we tried to avoid this issue by clearly explaining the survey items as suggested by Kulas et al. [92]. Studies demonstrate the reliability of 5-point Likert scale as compared to other approaches [93][94][95]. ...
Article
Full-text available
Background: Almost 80% of the population in sub-Saharan Africa relies on traditional biomass for cooking, which is typically associated with negative environmental, health, economic, and social impacts. Thus, many stakeholders, including development agencies and national governments in the Global South, are promoting the use of the improved cookstove in order to save cooking time, save financial assets, maximize fuel efficiency, and reduce indoor air pollution. However, little attention is paid to the heating practices among households, which can determine food safety levels. Specifically, cooked food should be kept at temperatures above the danger zone (from 5 to 57 °C) prior to its consumption to prevent its contamination by bacteria and other unhealthy contaminants. In general, many studies address food preparation and storage separately, despite their being complementary. In this study, we attempt to understand whether the use of an improved cookstove combined with a heat retention box would result in improvements with regard to fuel and time saving, and adequate food storage temperatures. Furthermore, we examine the acceptability of food prepared with these two systems based on consumers’ preference analysis. Involving 122 participants, the study was conducted in Gurué district, central Mozambique. Results: The use of the improved cookstove resulted in energy savings of 9% and 17% for cooking maize porridge and beans curry, respectively. The overall time consumption for cooking decreased by 14% (beans curry) and 24% (maize porridge). The heat retention boxes show a better heat retention ability as compared to the locally used heat retention systems (leftovers, banana leaves). Conclusions: The study concludes that the improved cookstove is a sustainable means of saving cooking time and fuel. The heat retention box has the potential to maintain adequate food storage temperatures. Both the improved cookstove and the heat retention box present a superior performance compared to traditional technologies; thus, they can easily be diffused, as they do not affect the quality of food.
... and projects sometimes distract me from previous ones" ]). Rather than using the 5-point Likert scale for response choices, a 6-point scale was created to prevent the neutral option from being interpreted ambiguously due to social desirability bias (Garland, 1991) or situationally specific item-response patterns (Kulas & Stachowski, 2013). The six response options for each item ranged from 全く当てはまらない (mattaku atehamaranai, "Not like me at all" ) to とても 当てはまる (totemo atehamaru, "Very much like me" ). ...
Article
Full-text available
In this paper, we review the concept of grit, operationalized by Duckworth, Peterson, Matthews, and Kelly (2007), and discuss an initial correlational investigation of how well grit predicted performance on two tasks, vocabulary learning (n = 21) and extensive reading (ER) (n = 58), that were thought to require Japanese university EFL students to demonstrate grit over a long period of time. A modified version of Duckworth et al.'s (2007) original 12-item Grit Scale was administered in Japanese and examined using Rasch analysis (1960), followed by a correlational analysis with the dependent variables of summed vocabulary quiz scores (over one semester) and words read through extensive reading (over one year). Neither result was statistically significant, with a moderate effect size for the relationship between grit and weekly vocabulary quiz scores, and a weak effect size between grit and the amount of extensive reading. Abstract (translated from Japanese): Drawing on the concept of grit (sustained motivation), this study analyzed the correlation between grit and both vocabulary learning and extensive reading. Grit is necessary for Japanese learners over long periods of study, and we consider how well grit serves as an indicator predicting good or poor performance. Duckworth et al.'s (2007) 12-item questionnaire was adapted into a Japanese version and administered, and the data were examined with the Rasch model (1960). A correlational analysis of vocabulary test scores over one semester (n = 21) and the number of words read through extensive reading over one year (n = 58) found a moderate correlation between grit and vocabulary learning, but a weak correlation between grit and extensive reading.
... The results of the mean plot above show that the odd option scale is higher than the even option scale on the variance of the validity of the environmental personality (big-five Personality) item. Research by Kulas and Stachowski (2013) indicates that the odd-numbered scale will show an advantage, psychometrically, over the even-numbered scale. This result is generally supported, because the correlation between alpha and criterion validity generally shows an advantage for odd scales relative to even scales. ...
Article
Full-text available
Validity is one of the important characteristics in any psychological measuring tool. Measurement tools in his research more generally include a variety of observations and usually include responses that aim to regulate and limit the choices available to respondents and assessments. There are many studies that have assessed how the number of response options on a scale affects validity and reliability, but fewer have discussed whether the midpoint should be included as a response option, or whether the scale is even. Regardless of where the response scale is in psychological measurements, the environmental personality (big-five personality) scale has been expanded and elaborated in various ways in previous research since the introduction of dreams. The big-five personality model is the most extensive model for measuring environmental personality. This study aims to determine whether there is a difference between the number of odd and even option scales on the validity variance of students' environmental personality (big-five personality) items, and to see which scale of the number of options is better or suitable for the environmental personality (big-five personality) instrument. The calculation data were analyzed using the one-way ANOVA method. The results of this study indicate that the scale of the number of odd options (five options) and the scale of the number of even options differ significantly in the variance of the validity of the environmental personality items (big-five Personality). The mean plot results show that the odd option scale higher than the even scale. This research can be carried out for further research in using the odd scale on the environmental personality measurement tool (Big-five Personality) with the expansion being studied.
... The longer one does research, the more one understands that a lot depends on how a construct is specifically measured. Although there is now increasing attention to the field of psychometrics, such as the effect of the response format (e.g., Ackerman, Donnellan, Roberts, & Fraley, 2016; Rammstedt & Krebs, 2007), and the response scale options (e.g., Dalal, Carter, & Lake, 2014; Kulas & Stachowski, 2013) on study results, this line of research is still underdeveloped as, for instance, demonstrated by Chapter 4. Chapter 4 described and investigated the too little/too much (TLTM) scale as an innovation in rating scale methodology. Different than the traditional Likert scale ranging from 1 (totally disagree) to 5 (totally agree), the TLTM response format ranges between -4 (much too little), 0 (the right amount) and +4 (much too much). ...
... Is the interval between the neutral midpoint (Neither Disagree nor Agree) and Agree the same as the interval between Agree and Strongly Agree (e.g., Norman, 2010)? What does the neutral midpoint actually mean: somewhere between agreement and disagreement, or "it depends" (Kulas & Stachowski, 2009, 2013; Kulas et al., 2008)? Given the ambiguity of agreement rating scales, we suggest using the extent of agreement rating scale or frequency scale unless initial scale validation indicated otherwise. ...
Chapter
It seems paradoxical, but much research into creativity is done in a noncreative, standardized way. Survey scales are one of the most common ways of assessing creativity, particularly in social psychology, social science and management research. In this chapter, we will discuss the method’s background, how it has been used to measure creativity, the advantages and disadvantages, and the implications for researchers. In particular, we will share some data which combines the most commonly used creativity survey scales and discuss overlaps, reliability and validity.
... As for the optimal number and wording of response options, Rasch analysis of category functioning revealed that patients were able to correctly discern among the four response options (strongly agree, agree, disagree, and strongly disagree) of the original version [3,5,17,18,22]. Conversely, the central category 'Neither Agree nor Disagree' did not accurately reflect a halfway point along the agreement-disagreement continuum, but rather expressed indecision, indifference, or nonattitude, as observed in many other scales [24,25]. The presence of this central category renders the consecutive integer scoring of the Likert scale problematic and even inappropriate. ...
Article
Patient's satisfaction with device is an important clinical outcome in prosthetics and orthotics. The Client Satisfaction with Device (CSD) - one of the five modules of the Orthotics and Prosthetics Users' Survey (OPUS) - has been defined as the only outcome measure specifically developed to measure user satisfaction with a prosthesis or an orthosis. The aim of this study was to provide a comprehensive review of the psychometric properties of the CSD, summarizing the present evidence on this measure, and verifying if the scoring system is consistent in the literature. A systematic literature search was conducted utilizing PRISMA guidelines. Articles were searched in PubMed and Scopus databases using search terms relating to the psychometric properties of the CSD. Thirteen articles assessing the psychometric properties of the CSD met the inclusion criteria for this review. The CSD has been translated and validated in several languages. However, these versions are not consistent across the studies since they include different number of items, with different number of response options, and scoring systems. The CSD - where used in its eight-item version, rated with a four-point rating scale - can be judged as a tool with acceptable psychometric properties for assessing satisfaction with devices in prosthesis and orthosis users. This CSD version seems the best one for optimizing coverage and psychometric quality with the fewest number of items. Further studies are warranted to assess the degree of suitability of this scale in specific populations of users of prostheses or orthoses and to analyze its psychometric properties in further cultural contexts.
... They may search for evidence for recall and then increment their JOL from 0 upwards as evidence accumulates, with lower JOLs signifying less evidence; here, intermediate ratings indicate intermediate odds of recall, much like a meteorologist uses a "50% chance of rain" to express an intermediate objective probability. Or, subjects may instead use extreme JOLs (e.g., 10%, 90%) to convey high confidence about future performance and intermediate values (e.g., 50%) to convey low confidence; here, intermediate ratings indicate uncertainty, much like a student uses a "50% chance of passing this course" to express reservation about the outcome (for evidence of this "middle response" reporting style in questionnaires, see DuBois & Burns, 1975; Kulas & Stachowski, 2013; Presser & Schuman, 1980). To adjudicate between these possibilities, Dunlosky et al. (2005) had subjects rate their confidence after each JOL. ...
Article
Full-text available
Psychology faces a measurement crisis, and mind-wandering research is not immune. The present study explored the construct validity of probed mind-wandering reports (i.e., reports of task-unrelated thought [TUT]) with a combined experimental and individual-differences approach. We examined laboratory data from over 1000 undergraduates at two U.S. institutions, who responded to one of four different thought-probe types across two cognitive tasks. We asked a fundamental measurement question: Do different probe types yield different results, either in terms of average reports (average TUT rates, TUT-report confidence ratings), or in terms of TUT-report associations , such as TUT rate or confidence stability across tasks, or between TUT reports and other consciousness-related constructs (retrospective mind-wandering ratings, executive-control performance, and broad questionnaire trait assessments of distractibility–restlessness and positive-constructive daydreaming)? Our primary analyses compared probes that asked subjects to report on different dimensions of experience: TUT-content probes asked about what they’d been mind-wandering about, TUT-intentionality probes asked about why they were mind-wandering, and TUT-depth probes asked about the extent (on a rating scale) of their mind-wandering. Our secondary analyses compared thought-content probes that did versus didn’t offer an option to report performance-evaluative thoughts. Our findings provide some “good news”—that some mind-wandering findings are robust across probing methods—and some “bad news”—that some findings are not robust across methods and that some commonly used probing methods may not tell us what we think they do. Our results lead us to provisionally recommend content-report probes rather than intentionality- or depth-report probes for most mind-wandering research.
... Third, the even-numbered rating format of the employability scale that excludes a midpoint and the use of differential anchors for its subscales need to be acknowledged as possible limitations. The advantages and disadvantages of even-numbered scales that lack a midpoint have been the topic of an active debate in the psychometric literature (e.g., Kulas & Stachowski, 2013; Nadler, Weston, & Voyles, 2015). Recent studies have shown no advantage of odd-numbered Likert scales (such as the commonly used 5- and 7-point ones) over matched even-numbered (such as 6-point) scales (e.g., Simms, Zelazny, Williams, & Bernstein, 2019). ...
Article
Full-text available
Are there career benefits to leaders and followers agreeing about the quality of their leader‐member exchange (LMX) relationship? Is LMX disagreement always detrimental for a follower's career? Can the examination of LMX agreement as a substantive variable help us cast new light on some of the inconclusive findings of past research on LMX and career outcomes? These questions motivate our research. Using theories of social exchange and sponsorship, and responses from 967 leader–follower dyads of Information and Communication Technology (ICT) professionals in seven European countries, we examined the role of LMX agreement on subjective and objective career outcomes. After conducting polynomial regression combined with response surface analysis, we found that both follower‐rated and leader‐rated employability were higher when the leader agreed with the follower at a high level of LMX (versus a low level of LMX). In case of disagreement, strong support was found for leader‐rated employability being higher when the leader's perceptions of LMX exceeded those of their follower. Furthermore, follower‐rated employability was found to mediate the relationship between LMX (dis)agreement and perceived career success, promotions, salary, and bonuses. Support was also found for the mediating role of leader‐rated employability in the case of perceived career success, promotions, and salary but not for bonuses. Our findings highlight the importance of LMX (dis)agreement for career outcomes and further point to the possibility of employability offering an alternative explanation for the mixed findings of past LMX‐career research.
... For example, it would be plausible to think that a respondent who only "disagrees a little" with the item "Jews have created the virus to collapse the economy for financial gain" might also "agree a little" with the extreme belief, but he or she would only be able to select one option. Sutton & Douglas add further imprecision with their use of the notoriously ambiguous midpoint response of "neither agree nor disagree", known to be selected for many different reasons by respondents (Kulas & Stachowski, 2013). If we were to finesse our scale, we would consider adding a "Don't know" response option, although this too is not without complications since there is a decision to make about how to treat such responses in analyses. ...
Article
Full-text available
Do letters about conspiracy belief studies greatly exaggerate? A reply to Sutton & Douglas - Daniel Freeman, Felicity Waite, Laina Rosebrock, Ariane Petit, Emily Bold, Sophie Mulhall, Lydia Carr, Ashley-Louise Teale, Lucy Jenner, Anna East, Chiara Causier, Jessica Bird, Sinéad Lambe
... and projects sometimes distract me from previous ones" ]). Rather than using the 5-point Likert scale for response choices, a 6-point scale was created to prevent the neutral option from being interpreted ambiguously due to social desirability bias (Garland, 1991) or situationally specific item-response patterns (Kulas & Stachowski, 2013). The six response options for each item ranged from 全く当てはまらない (mattaku atehamaranai, "Not like me at all" ) to とても 当てはまる (totemo atehamaru, "Very much like me" ). ...
Article
Full-text available
In this paper, we review this concept of grit, operationalized by Duckworth, Peterson, Matthews, and Kelly (2007), and discuss an initial correlational investigation of how well grit predicted performance on two tasks, vocabulary learning (n = 21) and extensive reading (ER) (n = 58), that were thought to require Japanese university EFL students to demonstrate grit over a long period of time. A modified version of Duckworth et al.'s (2007) original 12-item Grit Scale was administered in Japanese and examined using Rasch analysis (1960), followed by a correlational analysis with the dependent variables of summed vocabulary quiz scores (over one semester) and words read through extensive reading (over one year). Both results were statistically insignificant, with a moderate effect size for the relationship between grit and weekly vocabulary quiz scores, and a weak effect size between grit and the amount of extensive reading.
... Third, the even-numbered rating format of the employability scale that excludes a midpoint and the use of differential anchors for its subscales need to be acknowledged as possible limitations. The advantages and disadvantages of even-numbered scales that lack a midpoint have been the topic of an active debate in the psychometric literature (e.g., Kulas & Stachowski, 2013; Nadler, Weston, & Voyles, 2015). Recent studies have shown no advantage of odd-numbered Likert scales (such as the commonly used 5- and 7-point ones) over matched even-numbered (such as 6-point) scales (e.g., Simms, Zelazny, Williams, & Bernstein, 2019). ...
... The latter has been verified especially with the incorporation of an intermediate category in dichotomous items (Morales, 2006). Opponents of a middle response option argue that including it harms the psychometric properties of the instrument, since its selection could be influenced by social desirability (Johns, 2005), by certain characteristics of the item wording such as ambiguity and lack of context (Kulas & Stachowski, 2013), and by personality traits (Murray, Booth, & Molenaar, 2016). ...
Article
Full-text available
The psychometric properties observed in a Confidence in Mathematics test are compared when using Likert response formats with and without a central category. The instrument measures a set of student beliefs about their difficulties in responding to the skills that mathematics demands. The study involved 939 psychology students (81% women), who completed the instrument with two Likert formats: (a) 5 options with the middle category "Neither agree nor disagree" and (b) 6 options with two central categories ("Somewhat disagree" and "Somewhat agree"). Varying the Likert scale did not substantially affect the validity evidence based on internal structure (confirmatory factor analysis and fit to the Partial Credit Model) or the relationship with other variables. The relative efficiency function revealed that similar information is obtained at all levels of the trait.
... The independent variable willingness to share ($WtS) is estimated from survey participant ratings on an eight-point, bipolar semantic scale, labeled at each anchor point: 1 = Extremely Unwilling, 2 = Very Unwilling, 3 = Unwilling, 4 = Somewhat Unwilling, 5 = Somewhat Willing, 6 = Willing, 7 = Very Willing, and 8 = Extremely Willing. This scale omits the midpoint, such as "Indifferent" or "Unsure," which can produce scale attenuation when responses are prone to cluster, and which can indicate vague or ambiguous contexts rather than a respondent's attitude (Kulas and Stachowski 2013). ...
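The eight-point coding quoted in the excerpt above can be sketched in a few lines of Python. This is only an illustrative sketch, not the cited authors' code: the names `WTS_LABELS`, `WTS_CODES`, `encode_wts`, and `is_willing` are hypothetical, and the split into an unwilling half (1 to 4) and a willing half (5 to 8) is an assumption that follows from the scale having no midpoint.

```python
# Hypothetical encoding of the eight-point willingness-to-share scale
# quoted in the excerpt; all names here are illustrative assumptions.
WTS_LABELS = [
    "Extremely Unwilling", "Very Unwilling", "Unwilling", "Somewhat Unwilling",
    "Somewhat Willing", "Willing", "Very Willing", "Extremely Willing",
]

# Map each anchor label to its integer code, starting at 1.
WTS_CODES = {label: code for code, label in enumerate(WTS_LABELS, start=1)}


def encode_wts(label: str) -> int:
    """Return the 1-8 code for an anchor label (case-insensitive)."""
    normalized = label.strip().title()
    return WTS_CODES[normalized]


def is_willing(code: int) -> bool:
    """With no midpoint, codes 1-4 sit on the unwilling side, 5-8 on the willing side."""
    if not 1 <= code <= 8:
        raise ValueError(f"code must be between 1 and 8, got {code}")
    return code >= 5
```

Because the scale has an even number of points, every response falls on one side or the other, which is the property the excerpt relies on when it notes that the midpoint is omitted.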
Article
Personal data is increasingly collected and used by companies to tailor services to users, and to make financial, employment, and health-related decisions about individuals. When personal data is inappropriately collected or misused, however, individuals may experience violations of their privacy. Historically, government regulators have relied on the concept of risk in energy, aviation and medicine, among other domains, to determine the extent to which products and services may harm the public. To address privacy concerns in government-controlled information technology, government agencies are advocating to adapt similar risk management frameworks to privacy. Despite the recent shift toward a risk-managed approach for privacy, to our knowledge, there are no empirical methods to determine which personal data are most at-risk and which contextual factors increase or decrease that risk. To this end, we introduce an empirical framework in this article that consists of factorial vignette surveys that can be used to measure the effect of different factors and their levels on privacy risk. We report a series of experiments to measure perceived privacy risk using the proposed framework, which are based on expressed preferences, and which we define as an individual's willingness to share their personal data with others given the likelihood of a potential privacy harm. These experiments control for one or more of the six factors affecting an individual's willingness to share their information: data type, computer type, data purpose, privacy harm, harm likelihood, and individual demographic factors, such as age range, gender, education level, ethnicity, and household income. To measure likelihood, we introduce and evaluate a new likelihood scale based on construal level theory in psychology. The scale frames individual attitudes about risk likelihood based on social and physical distance to the privacy harm. 
The findings include predictions about the extent to which the above factors correspond to risk acceptance, including that perceived risk is lower for induced disclosure harms when compared to surveillance and insecurity harms as defined in Solove's Taxonomy of Privacy. We also found that participants are more willing to share their information when they perceive the benefits of sharing. In addition, we found that likelihood was not a multiplicative factor in computing privacy risk perception, which challenges conventional theories of privacy risk in the privacy and security community.
Article
Full-text available
Purpose: Existing measures of rumination assess ruminative thought without reference to the content of ruminations. The present studies describe the construction and validation of the Rumination Domains Questionnaire, a new measure of rumination which considers the domain specificity of ruminative thought. Methods: A theoretical definition of rumination and domains of life were formulated through a literature review. Items were based on these domains, clinical/counselling case studies, and expert feedback. In Study 1, 106 preliminary items were reduced to 60 items through empirical analyses. In Study 2, the content and structural validity were assessed. Results: Items were retained based on empirical criteria and the final scale demonstrated acceptable fit for both a 10-factor model and a hierarchical model. Content validity and criterion validity were supported, and both 10-factor and hierarchical models demonstrated acceptable fit. Conclusions: Overall, we present strong evidence supporting the validity of the RDQ.
Article
Full-text available
We describe three studies that together provide a first approximation to a comprehensive taxonomy of unique personality facets. In Study 1, we semantically sorted, removed synonyms, and factor analysed 1772 personality items taken from seven major omnibus personality inventories and four narrow inventories. Study 1 identified 61 base facets. In Study 2, we conducted a systematic review of the literature to identify facets missing from the 61 base facets. We identified 16 novel facets. We then created standardised, open access items for the 77 facets. In Study 3, we administered the items to a novel sample (N = 1096) and assessed the psychometric properties of the facets. The ultimate result was 70 personality facet scales that are open access, psychometrically robust, unidimensional, and discriminant. We call this inventory the Facet-level Multidimensional Assessment of Personality or Facet MAP, version 1. The Facet MAP contains scales equivalent to almost all scales present in major personality inventories, and in most cases, many more as well. As the Facet MAP develops, we hope it will eventually provide a comprehensive taxonomy of personality facets, which will prove useful in reducing construct proliferation and facilitating numerous avenues of important personality research. The Facet MAP items and user manual can be found at: facetmap.org.
Article
Full-text available
Millions of poultry are farmed intensively every year across the United Kingdom (UK) to produce both meat and eggs. There are inevitable situations that require birds to be emergency killed on farm to alleviate pain and suffering. In Europe and the UK, emergency methods are regulated by the European Council Regulation (EC) No. 1099/2009 and The Welfare of Animals at the Time of Killing Regulations (England 2015; Scotland 2012; Wales and Northern Ireland 2014). Cervical dislocation has been reported to be the most widely used method prior to these legislative changes which took place from 1 January 2013. Based on limited scientific evidence and concern for bird welfare, these legislative changes incorporated restrictions based on bird weight for both manual (≤3 kg) and mechanical (≤5 kg) cervical dislocation, and introduced an upper limit in the number of applications for manual cervical dislocation (up to 70 birds per person per day). Furthermore, it removed methods which showed evidence of crushing injury to the neck. However, since legal reform new scientific evidence surrounding the welfare consequences of cervical dislocation and the development of novel methods for killing poultry in small numbers on farm have become available. Whether the UK poultry industry have adopted these novel methods, and whether legislative reform resulted in a change in the use of cervical dislocation in the UK remains unknown. Responses from 215 respondents working across the UK poultry industry were obtained. Despite legal reform, manual cervical dislocation remains the most prevalent method used across the UK for killing poultry on farm (used by 100% of farms) and remains the preferred method amongst respondents (81.9%). The use of alternative methods such as Livetec Nex® and captive bolt guns were available to less than half of individuals and were not frequently employed for broilers and laying hens. 
Our data suggests there is a lack of a clear alternative to manual cervical dislocation for individuals working with larger species and a lack of gold standard methodology. This risks bird welfare at killing and contributes to inconsistency across the industry. We suggest providing stakeholders with practical alternatives prior to imposing legislative changes and effective knowledge transfer between the scientific community and stakeholders to promote positive change and protect bird welfare.
Article
Empirical studies of tourist socio-cultural aversions and their influence on tourist consumption are limited. A socio-cultural aversion describes the avoidance associated with an ingrained dislike for, and distancing from something representative of a specific social or cultural group's identity, and can be implicit or explicit, aggressive or passive. This study explores socio-cultural aversions in the context of Indigenous tourism. The study reveals that while xenophobia and racism can predict the attitudes of both domestic and international tourists towards Indigenous tourism, this does not necessarily result in a non-willingness to participate. However, regarding self-congruity bias, the less participants relate to or identify with Indigenous tourism, the less likely they are willing to participate. This has implications for the appeal and marketing of tourism products, especially those underpinned by socio-cultural components such as Indigenous tourism. The study proposes marketing and product development solutions for destination marketers and tourism operators seeking to enhance appeal for Indigenous tourism experiences.
Article
Full-text available
Historically, the “?” response category (i.e., the question mark response category) has been criticized because of the ambiguity of its interpretation. Previous empirical studies of the appropriateness of the “?” response category have generally used methods that cannot disentangle the response style from target psychological traits and have also exclusively focused on Western samples. To further develop our understanding of the “?” response category, we examined the differing use of the “?” response category in the Job Descriptive Index (JDI) between U.S. and Korean samples by using the recently proposed item response tree (IRTree) models. Our research showed that the Korean group more strongly prefers the “?” response category, while the U.S. group more strongly prefers the directional response category (i.e., Yes). In addition, the Korean group tended to interpret the “?” response category as mild agreement, while the U.S. group tended to interpret it as mild disagreement. Our study adds to the scientific body of knowledge on the “?” response category in a cross-cultural context. We hope that our findings presented herein provide valuable insights for researchers and practitioners who want to better understand the “?” response category and develop various psychological assessments in cross-cultural settings.
Article
Trucking occupies the largest Indonesian transportation market share, making this sector a crucial contributor to the Indonesian supply chain. Any problem in freight transportation impacts the entire supply chain. This study explores how trucking companies were affected by the COVID-19 pandemic and what factors contributed to their resilience. A total of 190 Indonesian trucking companies were involved in this research. This study demonstrates that trucking companies’ performance was significantly affected by both the COVID-19 pandemic and companies' resilience. Their resilience was affected by various factors, including the adoption of digital technologies, strength of financial resources, risk and business continuity management, and relationships with customers. Even though financial resource management directly affected company performance, the effect was more significant when company resilience was a mediating factor between financial resources management and company performance. Hence, survival frameworks and managerial action should emphasize these factors to enhance company resilience and performance. However, successful application of this in the trucking industry still requires further exploration.
Preprint
Full-text available
This study provides a comprehensive assessment of the associations of personality and intelligence. It presents a meta-analysis (N = 162,636, k = 272) of domain, facet, and item-level correlations between personality and intelligence (general, fluid, and crystallized) for the major Big Five and HEXACO hierarchical frameworks of personality: NEO PI-R, Big Five Aspect Scales (BFAS), BFI-2, and HEXACO PI R. It provides the first meta-analysis of personality and intelligence to comprehensively examine (a) facet-level correlations for these hierarchical frameworks of personality, (b) item-level correlations, (c) domain- and facet-level predictive models. Age and sex differences in personality and intelligence, and study-level moderators, are also examined. The study was complemented by four of our own unpublished datasets (N = 26,813) which were used to assess the ability of item-level models to provide generalizable prediction. Results showed that openness (ρ = .20) and neuroticism (ρ = -.09) were the strongest Big Five correlates of intelligence and that openness correlated more with crystallized than fluid intelligence. At the facet-level, traits related to intellectual engagement and unconventionality were more strongly related to intelligence than other openness facets, and sociability and orderliness were negatively correlated with intelligence. Facets of gregariousness and excitement seeking had stronger negative correlations, and openness to aesthetics, feelings, and values had stronger positive correlations with crystallized than fluid intelligence. Facets explained more than twice the variance of domains. Overall, the results provide the most nuanced and robust evidence to date of the relationship between personality and intelligence.
Article
Full-text available
This study provides a comprehensive assessment of the associations of personality and intelligence. It presents a meta-analysis (N = 162,636, k = 272) of domain, facet, and item-level correlations between personality and intelligence (general, fluid, and crystallized) for the major Big Five and HEXACO hierarchical frameworks of personality: NEO Personality Inventory–Revised, Big Five Aspect Scales, Big Five Inventory–2, and HEXACO Personality Inventory–Revised. It provides the first meta-analysis of personality and intelligence to comprehensively examine (a) facet-level correlations for these hierarchical frameworks of personality, (b) item-level correlations, (c) domain and facet-level predictive models. Age and sex differences in personality and intelligence, and study-level moderators, are also examined. The study was complemented by four of our own unpublished data sets (N = 26,813) which were used to assess the ability of item-level models to provide generalizable prediction. Results showed that openness (ρ = .20) and neuroticism (ρ = −.09) were the strongest Big Five correlates of intelligence and that openness correlated more with crystallized than fluid intelligence. At the facet level, traits related to intellectual engagement and unconventionality were more strongly related to intelligence than other openness facets, and sociability and orderliness were negatively correlated with intelligence. Facets of gregariousness and excitement seeking had stronger negative correlations, and openness to aesthetics, feelings, and values had stronger positive correlations with crystallized than fluid intelligence. Facets explained more than twice the variance of domains. Overall, the results provide the most nuanced and robust evidence to date of the relationship between personality and intelligence.
Article
Full-text available
This study aims to understand the determining factors of housing satisfaction among Indonesian adolescents, who are potential homebuyers. It investigates the factors of housing satisfaction in multiple stages, covering socio-demographic attributes and housing attributes. Using a quantitative method, it helps to reveal the distinctive and prominent housing attributes of residents according to the socio-demographic attributes that determine their housing satisfaction. Among the 534 respondents, age and monthly income are the pivotal socio-demographic factors of housing satisfaction. Location and neighborhood are the housing norms that contribute consistently to housing satisfaction across the age and monthly income groups, while space and expenditure vary in both groups. These findings also provide a general understanding of the important physical and social features of each housing norm for meeting residents' housing satisfaction. They are useful to city authorities, planners, and architects as a reference for formulating appropriate regulations, programs, planning, and design of housing provision for a particular social group.
Article
This study sought to analyse different rating scales used in surveys including two ordinal scales with numerals, an ordinal scale with face images, and a ratio scale. First, statistical analyses were performed such as mean comparison, skewness, and kurtosis. Subsequently the scales’ critical points, such as the extreme levels and middle points, were analysed. Finally, scale preferences were compared according to gender, age, and education. The study was carried out with 595 people, and the results showed a tendency for scales with fewer response items to obtain higher values. Furthermore, there was a higher incidence of responses at extreme levels, mainly at higher levels. It was possible to identify differences between the scale preferences within groups, and it was observed that the easier scale does not necessarily have more rapport with respondent’s feelings.
Article
Objective: To examine the psychometric properties of the Fear-Avoidance Beliefs Questionnaire (FABQ) and its two subscales, in subjects with chronic low back pain (LBP). Design: Methodological research based on a cross-sectional observational study. Methods: A convenience sample of 155 Italian subjects with chronic LBP (57% men; mean age: 43±11 years; mean pain duration: 23±32 months) completed the FABQ. Rasch analysis was used to investigate dimensionality of the entire scale and key psychometric properties of its two subscales. Results: The FABQ-Physical Activity (FABQ-PA) and FABQ-Work (FABQ-W) subscales showed two distinct unidimensional structures. Their 7-option rating categories were malfunctioning, but after collapsing problematic categories and omitting the central one ("Unsure") the new 4 categories (completely disagree; disagree; agree; completely agree) functioned as intended. After that and accommodation of local response dependency between two items in a testlet solution, each of the two subscales presented acceptable fit to the Rasch model (just one FABQ-W item was slightly underfitting). Person separation reliability was acceptable but not high (0.69 for FABQ-PA, and 0.79 for FABQ-W). Conclusions: FABQ-PA and FABQ-W have adequate unidimensionality. A simplification of the response options of both subscales is strongly recommended to improve the technical quality of the scale. The reliability indexes suggest FABQ-PA and FABQ-W can be used for group judgements about level of fear-avoidance beliefs, but not for clinical decision-making in individuals. The selection of their items is acceptable, although, if future studies corroborate our results, there is room for some refinements to improve the general measurement quality. Clinical rehabilitation impact: Fear-avoidance beliefs are associated with reduction of physical activity, and development of disability and deconditioning. 
This study examined the measurement properties of the two FABQ subscales, showing their essential unidimensionality, recommending the simplification of the rating categories, and discussing strengths and weaknesses of item selection. Our results extend the evidence for FABQ as a satisfactory (but improvable) measure of fear-avoidance beliefs in chronic LBP.
Article
Full-text available
Self-efficacy encompasses the professional and personal language goals of learners as their progress depends upon a strong motivation to put practical language skills to use when the real world requires it. Intercultural communication and effectiveness are of interest to the professional and personal language goals of learners as their progress depends upon a strong motivation to put practical language skills to use when the real world requires it. Studying or working abroad and engaging in intercultural training are two such contexts that bind research in learner characteristics between applied linguistics and positive psychology as they provide a substrate of concrete interactions, transformative experiences characterized by opportunities for changes in self-concept, negotiations with values and authenticity, and forms of interpersonal development underwritten by intercultural communication as an ability. A tool to capture this domain-specific intercultural communication was previously developed with sojourner educational professionals for use among English speaking populations. However, the original study lacked confirmatory analyses of internal and external validity that would clarify model identification and applicability for research that deals with intercultural communication competence across populations with diverse sample characteristics. A total of 876 teachers (M age = 37.48, SD = 10.81) and 266 university students (M age = 19.48, SD = 0.74) in Japan responded to items from the SEIC instrument. Acceptable model fit was supported for the eight-item short form. Metric invariance was observed for individuals from a sample of sojourning English language teachers similar to the original validation and a nationwide survey of Japanese teachers of English, offering indications of cross-cultural validity. Degrees of equivalence were also found for the Japanese items as extending fitness for use to students from two universities in Japan. 
Concurrent validity was supported for SEIC measured by the scale with intercultural effectiveness competencies and speaking and listening self-efficacy constructs used in classroom contexts. Together, this study offers a tool of valid indicators for researchers and practitioners who aim to observe self-efficacy in positive education, intercultural training, or international programs that intersect with language learning and intercultural communication.
Article
Full-text available
One of the most important research tools is the questionnaire. Decision makers and researchers across all academic and industry sectors conduct surveys and questionnaires to uncover answers to specific, significant questions. Questionnaires and surveys can be effective tools for collecting the data required for research and evaluation. To develop a survey or questionnaire, the researcher should first decide how to collect the required data. In this regard, scaling is the branch of measurement that involves the construction of an instrument. Attitude scales are among the most widely used scaling methods, and the Likert scale is one of the most fundamental and frequently used psychometric tools in sociology, psychology, information systems, politics, economics, and many other research fields. However, the research methodology literature has not clearly indicated which rating scale is best for a given study. This study provides an overview of the Likert scale and compares rating scales of different lengths. The results will help researchers decide how many Likert scale points to use in their surveys and questionnaires. Taken as a whole, this study suggests using a seven-point rating scale; if respondents need to be directed to one side, a six-point scale might be the most suitable.
Article
Full-text available
When job satisfaction is measured in national panel surveys using a rating scale that consists of many response categories the psychometric quality of the data obtained is often reduced. One reason lies in an inappropriate category use (e.g., in terms of response styles or ignoring superfluous categories), which occurs when respondents are faced with an overwhelmingly large number of response options. The use of response styles can also be triggered by stable respondent characteristics. The objective of the present between-subject experimental study is to explore the impact of rating scale length on the occurrence of inappropriate category use and scale reliability. In addition, this study investigates which stable respondent characteristics and job-related factors consistently predict the use of a particular response style across all experimental conditions. A sample of MTurk workers (N = 7042) filled out a 12-item online questionnaire on aspects of job satisfaction, with a 4-, 6-, or 11-point rating scale randomly assigned. Considering the three-dimensional structure of the job satisfaction measure, we applied a multidimensional extension of the restricted mixed generalized partial credit model to explore category use patterns within each condition. The results show a similar configuration of three response-style classes in all conditions. Nevertheless, the proportion of respondents who used the rating scale inappropriately was lower in the conditions with fewer response categories. An exception was the extreme response style, which showed a similar prevalence rate in all conditions. Furthermore, we found that the use of extreme response style can be explained by a high level of general self-efficacy and perceived job autonomy, regardless of rating scale length. The findings of the study demonstrate that the prevalence of inappropriate category use can be reduced by administering rating scales with six or four response categories instead of eleven. 
These findings may be extended to other domains of life satisfaction.
Article
Full-text available
According to many seasoned survey researchers, offering a no-opinion option should reduce the pressure to give substantive responses felt by respondents who have no true opinions. By contrast, the survey satisficing perspective suggests that no-opinion options may discourage some respondents from doing the cognitive work necessary to report the true opinions they do have. We address these arguments using data from nine experiments carried out in three household surveys. Attraction to no-opinion options was found to be greatest among respondents lowest in cognitive skills (as measured by educational attainment), among respondents answering secretly instead of orally, for questions asked later in a survey, and among respondents who devoted little effort to the reporting process. The quality of attitude reports obtained (as measured by over-time consistency and responsiveness to a question manipulation) was not compromised by the omission of no-opinion options. These results suggest that inclusion of no-opinion options in attitude measures may not enhance data quality and instead may preclude measurement of some meaningful opinions.
Article
Full-text available
Evaluation researchers frequently obtain self-reports of behaviors, asking program participants to report on process and outcome-relevant behaviors. Unfortunately, reporting on one’s behavior poses a difficult cognitive task, and participants’ reports can be profoundly influenced by question wording, format, and context. We review the steps involved in answering a question about one’s behavior and highlight the underlying cognitive and communicative processes. We alert researchers to what can go wrong and provide theoretically grounded recommendations for pilot testing and questionnaire construction.
Article
Full-text available
Five split-ballot experiments, plus replications, were carried out in several national surveys to compare the effects of offering or omitting a middle alternative in forced-choice attitude questions. Explicitly offering a middle position significantly increases the size of that category, but tends not to otherwise affect univariate distributions. The relation of intensity to the middle position is somewhat greater on Offered forms than on Omitted forms (less intense respondents being more affected by question form than those who feel more strongly), but in general form does not alter the relationship between an item and a number of other respondent characteristics. Finally, in one instance there is evidence that form can change the conclusion about whether two attitude items are related, but the results are of uncertain reliability.
Article
Full-text available
Reports an error in the original article by Jennifer D. Campbell et al ( Journal of Personality and Social Psychology, 1996[Jan], Vol 70[1], 141–156). On page 145, item 10 in Table 1 contains a typographical error. The item should read: "Even if I wanted to, I don't think I could tell someone what I'm really like." (The following abstract of this article originally appeared as record 1996-01707-011). Self-concept clarity (SCC) references a structural aspect of the self-concept: the extent to which self-beliefs are clearly and confidently defined, internally consistent, and stable. This article reports the SCC Scale and examines (a) its correlations with self-esteem (SE), the Big Five dimensions, and self-focused attention (Study 1); (b) its criterion validity (Study 2); and (c) its cultural boundaries (Study 3). Low SCC was independently associated with high Neuroticism, low SE, low Conscientiousness, low Agreeableness, chronic self-analysis, low internal state awareness, and a ruminative form of self-focused attention. The SCC Scale predicted unique variance in 2 external criteria: the stability and consistency of self-descriptions. Consistent with theory on Eastern and Western self-construals, Japanese participants exhibited lower levels of SCC and lower correlations… (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
We begin this article with the assumption that attitudes are best understood as structures in long-term memory, and we look at the implications of this view for the response process in attitude surveys. More specifically, we assert that an answer to an attitude question is the product of a four-stage process. Respondents first interpret the attitude question, determining what attitude the question is about. They then retrieve relevant beliefs and feelings. Next, they apply these beliefs and feelings in rendering the appropriate judgment. Finally, they use this judgment to select a response. All four of the component processes can be affected by prior items. The prior items can provide a framework for interpreting later questions and can also make some responses appear to be redundant with earlier answers. The prior items can prime some beliefs, making them more accessible to the retrieval process. The prior items can suggest a norm or standard of comparison for making the judgment. Finally, the prior items can create consistency pressures or pressures to appear moderate. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
Self-concept clarity (SCC) references a structural aspect of the self-concept: the extent to which self-beliefs are clearly and confidently defined, internally consistent, and stable. This article reports the SCC Scale and examines (a) its correlations with self-esteem (SE), the Big Five dimensions, and self-focused attention (Study 1); (b) its criterion validity (Study 2); and (c) its cultural boundaries (Study 3). Low SCC was independently associated with high Neuroticism, low SE, low Conscientiousness, low Agreeableness, chronic self-analysis, low internal state awareness, and a ruminative form of self-focused attention. The SCC Scale predicted unique variance in 2 external criteria: the stability and consistency of self-descriptions. Consistent with theory on Eastern and Western self-construals, Japanese participants exhibited lower levels of SCC and lower correlations between SCC and SE than did Canadian participants.
Article
Full-text available
This article examines how the exclusion of a neutral or fence-sitting option changes an expressed attitude or preference judgment. Over a series of six studies, we find that the exclusion of a neutral response option (1) affects the judgment of extreme options (strong positive and negative features) more significantly than the judgment of options that are average on all features, (2) results in respondents favoring the option superior on the more important attribute, and (3) results in more risk aversion. We also provide evidence for the underlying process and show that our findings are moderated by individual differences on need for cognition and tolerance for ambiguity. Copyright 2002 by the University of Chicago.
Article
Full-text available
Using a self-administered questionnaire, 149 respondents rated service elements associated with a recently visited store or restaurant on scales that differed only in the number of response categories (ranging from 2 to 11) and on a 101-point scale presented in a different format. On several indices of reliability, validity, and discriminating power, the two-point, three-point, and four-point scales performed relatively poorly, and indices were significantly higher for scales with more response categories, up to about 7. Internal consistency did not differ significantly between scales, but test-retest reliability tended to decrease for scales with more than 10 response categories. Respondent preferences were highest for the 10-point scale, closely followed by the seven-point and nine-point scales. Implications for research and practice are discussed.
Article
Full-text available
The effects of faking on criterion-related validity and the quality of selection decisions are examined in the present study by combining the control of an experiment with the realism of an applicant setting. Participants completed an achievement motivation measure in either a control group or an incentive group and then completed a performance task. With respect to validity, greater prediction error was found in the incentive condition among those with scores at the high end of the predictor distribution. When selection ratios were small, those in the incentive condition were more likely to be selected and had lower mean performance than those in the control group. Implications for using personality assessments from select-in and select-out strategies are discussed.
Article
Full-text available
A new social desirability scale was constructed and correlated with MMPI scales. Comparison was made with correlations of the Edwards Social Desirability scale. The new scale correlated highly with MMPI scales and supported the definition of social desirability. Ss need to respond in "culturally sanctioned ways."
Article
Full-text available
Researchers have studied whether there are classes of people who differ systematically in the way they respond to polytomous ordered scales with a middle category such as "?". The mixed-partial credit model was fitted to a number of scales of a personality questionnaire. Most of the scales fit better with the use of 2 latent subpopulations. The most consistent difference among the latent classes was related to the functioning of the middle response category. For most of the examinees, the probability of choosing the middle category was very close to zero, but a nonnegligible percentage of people selected this category with much higher probability. The total scores from the 2 subpopulations were incommensurate. Some personality factors contributed to explaining class membership.
Article
Evaluation researchers frequently obtain self-reports of behaviors, asking program participants to report on process and outcome-relevant behaviors. Unfortunately, reporting on one’s behavior poses a difficult cognitive task, and participants’ reports can be profoundly influenced by question wording, format, and context. We review the steps involved in answering a question about one’s behavior and highlight the underlying cognitive and communicative processes. We alert researchers to what can go wrong and provide theoretically grounded recommendations for pilot testing and questionnaire construction.
Article
If respondents to a Thurstone-type attitude scale can make a middlemost response, in addition to agreement or disagreement, what concept underlies such responses when they are made? Ambivalence, neutrality, and uncertainty are three processes that can determine choice of the middlemost response. Definitions reflecting one of these processes or an innocuous control definition were presented to subjects as appropriate for a middlemost response on each of two attitude scales. The definitions presented differentially affected use of the middlemost response on one of the two scales. On that scale, an ambivalence definition yielded the greatest use of the middlemost response and differed from an uncertainty definition, which yielded the least use.
Article
Although typically scored as indicating moderate or neutral trait standing, personality assessment respondents endorse the Likert-scale middle response category for a variety of reasons. Through the application of a cognitive processing model and an item characteristic orientation, middle category endorsements were found to exhibit a relatively high response latency, an “it depends” connotation, and a strong, negative relationship with item clarity. These general associations stress the importance of retaining unambiguous items for trait identification but also offer a tool to the personality assessment researcher – investigating the number of elicited middle category endorsements to identify trait indicators in possible need of contextualization.
Article
Although most scaling formats include an intermediate or neutral response category, little research has been devoted to the analysis of the meaning respondents attach to this category. Results obtained from ten different scales, across two types of item formats (Likert and Polar Choice), support the traditional method of scoring the "?" answer. Although the meaning respondents imply when selecting the "?" is not more ambiguous than the meaning implied in the selection of the other response categories, there does exist evidence for the presence of a variety of uses of the "?", including response styles, ambivalence, and indifference. Various suggestions are made for further research and alternate methods of approach to the meaning of the question mark response category.
Article
Results of two studies indicate that the way in which Likert scale data are scored can make a difference when statistical significance tests are used. The studies raise a number of questions about the use of Likert scales in communication research. (GT)
Article
One of the most common activities of psychologists and other researchers is to construct Likert scales and then proceed to analyze them as if the numbers constituted an equal interval scale. There are several alternatives to this procedure (Thurstone & Chave, 1929; Muthen, 1983) that make normality assumptions but which do not assume that the answer categories as used by subjects constitute an equal interval scale. In this paper a new alternative is proposed that uses additive conjoint measurement. It is assumed that subjects can report their attitudes towards stimuli in the appropriate rank order. Neither within-subject nor between-subject distributional assumptions are made. Nevertheless, interval level stimulus values, as well as response category boundaries, are extracted by the procedure. This approach is applied to three sets of attitude data. In these three cases, the equal interval assumption is clearly wrong. Despite this, arithmetic means seem to closely reflect group attitudes towards the stimuli. In one data set, the normality assumption of Thurstone and Chave (1929) and Muthen (1983) is supported, and in the two others it is supported with reservations.
Article
discuss cognitive interviews in the laboratory and give examples of the wide range of ways in which we have implemented them / discuss respondent debriefing as a method for taking the laboratory out into the field / present concluding remarks about the ways in which the 2 methods complement each other (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Social desirability is used in reference (1) to scale values of personality statements and (2) to the tendency of subjects to attribute to themselves statements which are desirable and reject those which are undesirable. Topics developed are (1) Comparability of social desirability scale values (SDSV) derived from different groups of judges. (2) Probability of endorsement and SDSV. (3) The SD Scale. (4) Scale value items in MMPI. (5) SD and faking. (6) Forced-choice inventory. (7) SD in Q-technique. (8) Implications for personality assessment and research. Extensive bibliography. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
A primary goal of scale development is to create a valid measure of an underlying construct. We discuss theoretical principles, practical issues, and pragmatic decisions to help developers maximize the construct validity of scales and subscales. First, it is essential to begin with a clear conceptualization of the target construct. Moreover, the content of the initial item pool should be overinclusive and item wording needs careful attention. Next, the item pool should be tested, along with variables that assess closely related constructs, on a heterogeneous sample representing the entire range of the target population. Finally, in selecting scale items, the goal is unidimensionality rather than internal consistency; this means that virtually all interitem correlations should be moderate in magnitude. Factor analysis can play a crucial role in ensuring the unidimensionality and discriminant validity of scales. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
This paper proposes that when optimally answering a survey question would require substantial cognitive effort, some respondents simply provide a satisfactory answer instead. This behaviour, called satisficing, can take the form of either (1) incomplete or biased information retrieval and/or information integration, or (2) no information retrieval or integration at all. Satisficing may lead respondents to employ a variety of response strategies, including choosing the first response alternative that seems to constitute a reasonable answer, agreeing with an assertion made by a question, endorsing the status quo instead of endorsing social change, failing to differentiate among a set of diverse objects in ratings, saying ‘don't know’ instead of reporting an opinion, and randomly choosing among the response alternatives offered. This paper specifies a wide range of factors that are likely to encourage satisficing, and reviews relevant evidence evaluating these speculations. Many useful directions for future research are suggested.
Article
Two studies examined whether the middle response option in graphic rating scales indicates a moderate standing on a trait/item, or rather a “dumping ground” for unsure or non-applicable (N/A) responses. Study One identified middle response-option dysfunction. Study Two indicated that respondents use the middle response option as an N/A proxy, even under implicit ‘skip if you do not know’ instructional sets. Although middle response category ‘misuse’ did not adversely affect reliability and validity in these studies, it is recommended that assessment developers (especially in on-line administration contexts) regularly include an N/A response option when administering graphic rating scales.
Article
The major purpose of this paper is to propose a comprehensive model describing the effects of response sets within the theory framework of the stages of responding to questionnaires, and taking into account the effects of collectivist and individualist attributes within cross-cultural contexts. The introduction of this model aims to provide a construct that may help minimize biases in questionnaire-based research as well as providing new directions for theoretical and empirical research in the field of response sets.
Article
Although don't know replies are often observed in survey research in sizable frequencies, relatively little systematic attention has been focused on understanding the nature of such responses across multiple topic areas. Decisions about the meaning of don't know responses have important ramifications in designing research methodology, conducting data analysis, and interpreting statistical findings. This article 1) outlines consequences of these decisions in survey research analysis, 2) investigates demographic and involvement correlates of don't know responses, and 3) examines the noncontent nature of don't know responses measured in terms of within-subject tendencies to give such responses across topic areas.
Article
A procedure for estimating the reliability of sets of ratings, test scores, or other measures is described and illustrated. This procedure, based upon analysis of variance, may be applied both in the special case where a complete set of ratings from each of k sources is available for each of n subjects, and in the general case where k_1, k_2, ..., k_n ratings are available for each of the n subjects. It may be used to obtain either a unique estimate or a confidence interval for the reliability of either the component ratings or their averages. The relations of this procedure to others intended to serve the same purpose are considered algebraically and illustrated numerically.
Article
Variable construction requires careful attention to substantive issues: a theory guiding its development, a hierarchy of illustrative items constructed to define the variable, the subsequent production of item difficulties and person measures, and the analysis of fit. Rasch measurement practitioners should give careful attention to these matters, so practical suggestions are given for designing variables based on theory, item construction, and Rasch models for the analysis of data. Variable maps are emphasized to guide variable construction and interpret the results.
Article
For the first time in decades, conventional wisdom about survey methodology is being challenged on many fronts. The insights gained can not only help psychologists do their research better but also provide useful insights into the basics of social interaction and cognition. This chapter reviews some of the many recent advances in the literature, including the following: New findings challenge a long-standing prejudice against studies with low response rates; innovative techniques for pretesting questionnaires offer opportunities for improving measurement validity; surprising effects of the verbal labels put on rating scale points have been identified, suggesting optimal approaches to scale labeling; respondents interpret questions on the basis of the norms of everyday conversation, so violations of those conventions introduce error; some measurement error thought to have been attributable to social desirability response bias now appears to be due to other factors instead, thus encouraging different approaches to fixing such problems; and a new theory of satisficing in questionnaire responding offers parsimonious explanations for a range of response patterns long recognized by psychologists and survey researchers but previously not well understood.
Costa, P. T., & McCrae, R. R. (1992). Revised NEO Personality Inventory (NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI). Professional Manual. Odessa, FL: Psychological Assessment Resources.
Crowne, D. P., & Marlowe, D. (1960). A new scale of social desirability independent of psychopathology. Journal of Consulting Psychology, 24, 349–354. http://dx.doi.org/10.1037/h0047358.
Donnay, D. A. C., Morris, M. L., Schaubhut, N. A., & Thompson, R. C. (2005). Strong Interest Inventory manual: Research, development, and strategies for interpretation. Mountain View, CA: CPP Inc.
Schneider, W., Eschman, A., & Zuccolotto, A. (2002). E-prime user's guide. Pittsburgh: Psychology Software Tools, Inc.
Schwarz, N., & Oyserman, D. (2001). Asking questions about behavior: Cognition, communication and questionnaire construction. American Journal of Evaluation, 22, 127–160.
Warwick, D. P., & Lininger, L. A. (1975). Psychology of reasoning: Structure and content. Cambridge, MA: Harvard University Press.
Wonderlic (2002). Wonderlic personnel test & scholastic level exam user's manual. Libertyville, IL: Wonderlic, Inc.
Wonderlic (2007). Wonderlic personnel test normative report. Libertyville, IL: Wonderlic, Inc.