Article

Information Transmission in the Survey Interview: Number of Response Categories and the Reliability of Attitude Measurement


Abstract

This paper examines the relation between the number of response categories used to measure attitudes in survey interviews and the reliability of such attitude measurements. I review and criticize the hypothesis that reliability increases with the "information carrying capacity" of a response scale. I also review the literature on the relationship between the number of scale points and measurement reliability. This leads to a set of predictions regarding the relationship between the number of scale points and reliability of measurement, which I then examine using results obtained from three-wave panel studies conducted by the General Social Survey and the National Election Study. Reliability estimates were obtained via several programs (LISREL, EQS, and LISCOMP) employing a variety of statistical-estimation approaches: maximum likelihood (Jöreskog 1979), generalized least squares based on Browne's (1984) asymptotically distribution-free (ADF) approach, and estimation based on categorical variable methods (CVM) (Jöreskog 1990; Muthén 1984). With one important exception, reliability is generally higher for attitudes measured using more response categories. Reliability is relatively higher when attitudes are assessed using two-category rather than three-category response scales, but the evidence consistently supports the view that for scales with four or more categories, reliability increases with the number of response categories, but at a decreasing rate. I also examine the hypothesis that reliability can be enhanced by combining three-category response forms with other types of questions to measure the direction and intensity of attitudes, i.e., via unfolding methods. Support for this hypothesis is lacking, but more research is necessary before firm conclusions can be drawn.
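The abstract's reliability estimates come from three-wave panels analyzed with structural equation software. The sketch below is a simplified, hypothetical illustration only. It is not the LISREL, EQS, or LISCOMP estimation used in the paper. It uses Heise's (1969) closed-form solution for a three-wave quasi-simplex model to show how reliability can be separated from true-score change using nothing but the three test-retest correlations. It assumes equal reliability at every wave and lag-1 (Markov) change in the true score. The function name and input correlations are made up for the example.

```python
def heise_reliability(r12: float, r23: float, r13: float) -> dict:
    """Heise (1969) three-wave estimates from test-retest correlations.

    Assumes equal reliability at all waves and a lag-1 (Markov) process
    for true-score change, so that r13 = rho * s12 * s23.
    """
    if not all(0 < r <= 1 for r in (r12, r23, r13)):
        raise ValueError("correlations must lie in (0, 1]")
    return {
        "reliability": r12 * r23 / r13,          # rho
        "stability_12": r13 / r23,               # true-score stability, waves 1-2
        "stability_23": r13 / r12,               # true-score stability, waves 2-3
        "stability_13": r13 ** 2 / (r12 * r23),  # true-score stability, waves 1-3
    }

# Hypothetical correlations for one attitude item across three waves.
est = heise_reliability(r12=0.60, r23=0.60, r13=0.48)
print(round(est["reliability"], 3))  # 0.75
```

Because the wave 1 to wave 3 correlation is attenuated by both unreliability and true change, dividing it out in this way keeps low stability from being mistaken for low reliability.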


... Specifically, the biases induced by using various numbers of ordinal data points to calculate means, covariances, correlations, and reliability coefficients were derived by Krieg (1999), who concluded that the more points the better, with a continuous scale being the optimal choice. Furthermore, researchers hold a wide variety of views on how to determine the appropriate number of response categories for Likert-type scales (Alwin, 1992; Cox, 1980; McKelvie, 1978; Preston & Colman, 2000; Viswanathan, Bergen, Dutta, & Childers, 1996). Alwin (1992) argued that scales with more response categories are more reliable and more valid. Using only a few response categories restricts respondents' ability to convey precisely how they feel (Viswanathan et al., 1996). ...
... The VAS-RRP proposed in this study offers a fifth approach for overcoming the difficulties researchers encounter. In addition to freeing researchers and practitioners from concern over the optimal number of points (categories) on a Likert-type scale (Alwin, 1992; Cox, 1980; McKelvie, 1978; Preston & Colman, 2000), the VAS-RRP's finer-grained measurements improved the psychometric properties of Likert-type scales: Cronbach's alpha, parameter recovery, and composite reliability values were all substantially enhanced. These findings provide further converging evidence for previous claims (e.g., Babakus et al., 1987; Krieg, 1999) that coarse-grained ordinal data, such as those produced by Likert-type scales, are more prone to measurement error and reduced reliability. ...
Article
Traditionally, the visual analogue scale (VAS) has been proposed to overcome the limitations of ordinal measures from Likert-type scales. However, whether VASs can also overcome the response-style problems of Likert-type scales has not yet been addressed. Previous research using ranking and paired comparisons to compensate for the response styles of Likert-type scales has suffered from limitations. For example, the total score of ipsative measures is a constant, so such measures cannot be analyzed with many common statistical techniques. In this study we propose a new scale, called the Visual Analogue Scale for Rating, Ranking, and Paired-Comparison (VAS-RRP), which can be used to collect rating, ranking, and paired-comparison data simultaneously while avoiding the limitations of each of these data collection methods. The characteristics, use, and analytic method of VAS-RRPs, as well as how they overcome the disadvantages of Likert-type scales, ranking, and VASs, are discussed. On the basis of analyses of simulated and empirical data, this study showed that VAS-RRPs improved reliability, reduced response style bias, and improved parameter recovery. Finally, we have also designed a VAS-RRP Generator that researchers can use to construct and administer their own VAS-RRPs.
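The ipsativity limitation mentioned in this abstract can be checked directly. When each respondent ranks the same n items, every row of ranks sums to n(n+1)/2. Each item's covariance with that constant row total is therefore zero, so every row of the item covariance matrix sums to zero and the matrix is singular. The rankings below are hypothetical. The helper uses exact fractions so that the zero sums are not blurred by floating-point rounding.

```python
from fractions import Fraction

# Hypothetical rankings: each row is one respondent ranking 4 items (1 = most preferred).
ranks = [
    [1, 2, 3, 4],
    [2, 1, 4, 3],
    [4, 3, 1, 2],
    [3, 4, 2, 1],
    [1, 3, 2, 4],
]

n_items = len(ranks[0])
expected_total = n_items * (n_items + 1) // 2  # 10 for four items
row_totals = {sum(row) for row in ranks}
print(row_totals == {expected_total})  # True: every respondent has the same total

def covariance_matrix(data):
    """Sample covariance matrix of the columns of `data`, computed exactly."""
    n, k = len(data), len(data[0])
    means = [Fraction(sum(row[j] for row in data), n) for j in range(k)]
    return [
        [sum((row[a] - means[a]) * (row[b] - means[b]) for row in data) / (n - 1)
         for b in range(k)]
        for a in range(k)
    ]

cov = covariance_matrix(ranks)
# Each row sums to zero, so the vector of ones lies in the null space:
# the covariance matrix is singular and cannot be inverted.
print([int(sum(row)) for row in cov])  # [0, 0, 0, 0]
```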
... Sudman and Bradburn (1974) were pioneers in their effort to quantify the "response effects" of various question forms. More recently, several efforts have been made to specify empirical criteria of data quality, for example, using the MTMM approach to reliability and validity assessment or longitudinal methods of reliability assessment (see Alwin 1992, 2007; Alwin and Krosnick 1991; Andrews 1984; Saris and Andrews 1991; Saris and Gallhofer 2007; Saris and van Meurs 1990; Scherpenzeel 1995; Scherpenzeel and Saris 1997). ...
... The concept of reliability has been applied to survey measurement previously (e.g., Alwin 1989, 1992, 2007, 2010; Alwin and Krosnick 1991; Marquis and Marquis 1977), and it has proved useful as a measure of data quality (Biemer et al. 1991; Groves 1989); however, many survey methods experts are reluctant to evaluate questions in terms of their reliability (e.g., see Krosnick and Presser 2010; Schaeffer and Dykema 2011). In general, the psychometric approach defines the observed score as a function of a true score and an error score, that is, as y = t + e, where E(e) = E(te) = 0 and E(t) = E(y). ...
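Under these classical assumptions, reliability is the share of observed-score variance that is due to true scores. The derivation below is standard classical test theory rather than anything specific to the cited works. The last line additionally assumes two parallel measures, y1 = t + e1 and y2 = t + e2, with equal error variances and uncorrelated errors. That assumption is what allows a test-retest correlation to be read as a reliability.

```latex
\begin{aligned}
y &= t + e, \qquad E(e) = E(te) = 0 \;\Rightarrow\; \operatorname{Cov}(t, e) = 0 \\
\operatorname{Var}(y) &= \operatorname{Var}(t) + \operatorname{Var}(e) \\
\rho_{y} &= \frac{\operatorname{Var}(t)}{\operatorname{Var}(y)} = 1 - \frac{\operatorname{Var}(e)}{\operatorname{Var}(y)} \\
\operatorname{Cov}(y_1, y_2) &= \operatorname{Var}(t) \;\Rightarrow\; \operatorname{Corr}(y_1, y_2) = \frac{\operatorname{Var}(t)}{\operatorname{Var}(y)} = \rho_{y}
\end{aligned}
```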
... Estimation of reliability from reinterview designs makes sense only if we can rule out memory as a factor in the covariance of measures over time, so the occasions of measurement must be separated by enough time to rule out the operation of memory. Where the remeasurement interval is too short to permit appropriate estimation of the reliability of the data, the reliability estimate will most likely be inflated (see Alwin 1989, 1992; Alwin and Krosnick 1991), and the results of these studies suggest that longer remeasurement intervals, such as those used here, are highly desirable. ...
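A small simulation can show why memory inflates reinterview estimates. Suppose part of the error a respondent makes at the first interview carries over to the second because the earlier answer is remembered. The two error terms then become correlated, and the test-retest correlation overstates reliability. The parameters below are illustrative assumptions: a true reliability of 0.6, an error correlation of 0.5, and a perfectly stable true score. They are not taken from the cited studies.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def retest_correlation(reliability, memory, n=20_000, seed=1):
    """Test-retest correlation for a stable true score with Var(t) = 1.

    `memory` is the correlation between the two error terms (0 = no recall).
    """
    rng = random.Random(seed)
    error_sd = math.sqrt(1 / reliability - 1)
    y1, y2 = [], []
    for _ in range(n):
        t = rng.gauss(0, 1)
        e1 = rng.gauss(0, error_sd)
        # e2 reuses part of e1 (the remembered answer) but keeps variance error_sd**2.
        e2 = memory * e1 + math.sqrt(1 - memory ** 2) * rng.gauss(0, error_sd)
        y1.append(t + e1)
        y2.append(t + e2)
    return pearson(y1, y2)

print(round(retest_correlation(0.6, memory=0.0), 2))  # close to the true 0.60
print(round(retest_correlation(0.6, memory=0.5), 2))  # close to 0.80: inflated
```

With these settings, the expected retest correlation with memory is (1 + 0.5 × 2/3) / (1 + 2/3) = 0.8, even though reliability is 0.6.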
Article
Writings on the optimal length for survey questions are characterized by a variety of perspectives and very little empirical evidence. Where evidence exists, support seems to favor lengthy questions in some cases and shorter ones in others. However, on the basis of theories of the survey response process, the use of an excessive number of words may get in the way of the respondent’s comprehension of the information requested, and because of the cognitive burden of longer questions, there may be increased measurement errors. Results are reported from a study of reliability estimates for 426 (exactly replicated) survey questions in face-to-face interviews in six large-scale panel surveys conducted by the University of Michigan’s Survey Research Center. The findings suggest that, at least with respect to some types of survey questions, there are declining levels of reliability for questions with greater numbers of words and provide further support for the advice given to survey researchers that questions should be as short as possible, within constraints defined by survey objectives. Findings reinforce conclusions of previous studies that verbiage in survey questions—either in the question text or in the introduction to the question—has negative consequences for the quality of measurement, thus supporting the KISS principle (“keep it simple, stupid”) concerning simplicity and parsimony.
... Some scholars argue that increasing the number of scale points improves respondents' ability to discriminate, whereas others contend that increasing the number of alternatives would exceed respondents' ability to discriminate between scale points (Jacoby and Matell, 1971). However, Alwin (1992) emphasizes the researcher's aim as the main determinant of the optimal number of response categories. According to Alwin (1992), when the aim is to measure the direction of an attitude, two points would be enough; however, if the aim is to measure the intensity of the attitude, more points are required. As the length of a scale increases, it naturally becomes more sensitive, but beyond a certain point, adding answer categories can confuse respondents, because the difference between two similar categories may not be clear (Kieruj and Moors, 2013). ...
... That is, there is no decrement to reliability estimates due to the presence of specific variance in the error term; here, specific variance is contained in the true score. This approach is possible using modern structural equation modeling (SEM) methods for longitudinal data and is discussed further in related literature (see Alwin 1989, 1992, 2007; Alwin and Krosnick 1991; Saris and Andrews 1991). ...
... This model is frequently considered to be unnecessarily restrictive because it involves a stronger set of assumptions than the Wiley-Wiley model. However, it often provides a more realistic fit to the data (see Alwin and Thornton 1984; Alwin 1989, 1992, 2007; Alwin and Krosnick 1991). ...
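The Wiley-Wiley model referred to here relaxes the equal-reliability assumption of Heise's (1969) three-wave model. It assumes instead that error variances are equal across waves, so reliability can differ from wave to wave as observed variances change. A minimal sketch of its closed-form solution from a three-wave covariance matrix follows. The covariance values were constructed for the example (true-score variances 1.0, 1.25, and 1.3; error variance 0.5). In practice such models are estimated with SEM software rather than by hand.

```python
def wiley_wiley(cov):
    """Wiley-Wiley (1970) three-wave estimates from a 3x3 covariance matrix.

    Assumes a lag-1 true-score process and equal error variance at all waves.
    Returns (common error variance, [reliability at waves 1, 2, 3]).
    """
    s11, s22, s33 = cov[0][0], cov[1][1], cov[2][2]
    s12, s23, s13 = cov[0][1], cov[1][2], cov[0][2]
    true_var_2 = s12 * s23 / s13   # true-score variance at wave 2
    error_var = s22 - true_var_2   # common error variance (theta)
    reliabilities = [(s - error_var) / s for s in (s11, s22, s33)]
    return error_var, reliabilities

# Hypothetical observed covariance matrix for one item measured at three waves.
cov = [
    [1.50, 0.90, 0.72],
    [0.90, 1.75, 1.00],
    [0.72, 1.00, 1.80],
]
theta, rel = wiley_wiley(cov)
print(round(theta, 3), [round(r, 3) for r in rel])  # 0.5 [0.667, 0.714, 0.722]
```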
Article
This article reports an investigation of errors of measurement in self-reports of financial data in the Health and Retirement Study (HRS), one of the major social science data resources available to those who study the demography and economics of aging. Results indicate significantly lower levels of reporting reliability of the composite variables in the HRS relative to those found for "summary" income approaches used in other surveys. Levels of reliability vary by type of income source: reports of monthly benefit levels from sources such as Social Security or the Veterans Administration achieve near-perfect levels of reliability, whereas less regular sources of household income that vary in amount over time are measured less reliably. One major area of concern arising from this research, and one that HRS users may benefit from knowing about, involves the use of imputation in the handling of missing data. We found that imputation of values for top-end open income brackets can produce a substantial number of outliers that affect sample estimates of relationships and levels of reliability. Imputed income values in the HRS should be used with great care.
... The extent to which the results were due to labeling versus branching is unclear. Using a three-wave panel, Alwin (1992) estimated that the reliability of a composite measure of party identification was only slightly greater than the reliability of the valence component and modestly greater than the intensity component. The reliability of the composite was substantially greater than the reliability of the other 7-point scales in that analysis, but, as in Krosnick & Berent, the items compared differed in extent of labeling (as well as in content). ...
... Based largely on psychophysical studies, the standard advice has been to use five to nine categories (Miller 1956; Cox 1980), although even that number of categories can be difficult to administer in telephone interviews. Both Alwin & Krosnick (1991) and Alwin (1992) found evidence that the reliability of individual rating scales appeared to increase as the number of categories grew, up to approximately seven or nine categories, with the exception that reliability was greater with two than with three categories. Their results must be interpreted cautiously, however, because the questions that were compared differed not only in the number of categories but also in a large variety of other ways. ...
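The mechanical side of this pattern can be illustrated under strong simplifying assumptions: reliability rises with the number of categories, but at a decreasing rate. The sketch coarsens a continuous response into k equal-width categories and recomputes the test-retest correlation. It assumes a standard-normal latent response with reliability 0.7, a stable true score, and respondents who map their latent position mechanically onto categories. Because it models no respondent behavior, it cannot reproduce the two-versus-three-category exception reported here. All names and parameters are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def categorize(value, k, low=-3.0, high=3.0):
    """Map a continuous value onto categories 1..k using equal-width bins on [low, high]."""
    width = (high - low) / k
    index = int((value - low) // width) + 1
    return min(max(index, 1), k)  # values outside [low, high] go to the end categories

def coarsened_retest(k, reliability=0.7, n=20_000, seed=2):
    """Test-retest correlation after coarsening both waves into k categories."""
    rng = random.Random(seed)  # same seed for every k: same latent draws
    t_sd, e_sd = math.sqrt(reliability), math.sqrt(1 - reliability)  # Var(y) = 1
    y1, y2 = [], []
    for _ in range(n):
        t = rng.gauss(0, t_sd)
        y1.append(categorize(t + rng.gauss(0, e_sd), k))
        y2.append(categorize(t + rng.gauss(0, e_sd), k))
    return pearson(y1, y2)

for k in (2, 3, 5, 7, 9, 11):
    print(k, round(coarsened_retest(k), 3))
```

For k = 2 the cut falls at the median, so the result should be near the known value (2/π)·arcsin(0.7) ≈ 0.49. As k grows, it approaches the latent 0.7 ever more slowly.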
Article
Survey methodologists have drawn on and contributed to research by cognitive psychologists, conversation analysts, and others to lay a foundation for the science of asking questions. Our discussion of this work is structured around the decisions that must be made for two common types of inquiries: questions about events or behaviors and questions that ask for evaluations or attitudes. The issues we review for behaviors include definitions, reference periods, response dimensions, and response categories. The issues we review for attitudes include bipolar versus unipolar scales, number of categories, category labels, don't know filters, and acquiescence. We also review procedures for question testing and evaluation.
... a five-item Feeling Thermometer (Alwin, 1992; Saguy et al., 2009) was used as the Attitude measure. Participants rated their feelings toward immigrants on five opposite pairs of evaluative dimensions (Warm-Cold, Negative-Positive, Friendly-Unfriendly, Suspicious-Trustworthy, Admiration-Disgust) using a nine-point vertical scale (e.g., the top, 1 = _______ to 9 = _______). ...
... Participants next completed the five-item Feeling Thermometer (Alwin, 1992; Saguy et al., 2009) as the Attitude measure (M = 4.03, SD = 1.55, α = 0.95). They then completed five additional attitude items that were being pilot tested for a later study. ...
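The α = 0.95 reported here is Cronbach's alpha. Readers who want to reproduce such a figure from raw item scores can use the standard formula α = k/(k − 1) · (1 − Σσ²ᵢ / σ²_total), where k is the number of items, σ²ᵢ the variance of item i, and σ²_total the variance of the summed score. A minimal implementation follows. The five-respondent, three-item data are made up for illustration and are unrelated to the study's data.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items matrix of scores (sample variances)."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical ratings: 5 respondents x 3 items.
scores = [[2, 3, 3], [4, 4, 5], [3, 3, 4], [5, 4, 5], [1, 2, 1]]
print(round(cronbach_alpha(scores), 3))  # 0.944
```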
Article
The political divide between liberals and conservatives has become quite large and stable, and there appear to be many reasons for disagreements on a wide range of issues. The current research sought to explain these divides and to extend to intergroup relations the Uncertainty-Threat Model, which predicts that greater dispositional perceived threat and uncertainty-avoidance will be related to greater political conservatism. Given that conservatism is also often related to greater negativity toward low-status groups such as immigrants, the relationship between political ideology and negative attitudes toward immigrants may be mediated by threat and uncertainty-avoidance. Study 1 tested this mediational hypothesis in a correlational design and showed that both uncertainty-avoidance and perceived realistic and symbolic threat significantly mediated the relationship between political ideology and attitudes toward immigrants, and that perceived threat was the more influential mediator. Study 2 extended threat management to perceived threats from unspecified outgroups, as opposed to the immigrant outgroup, and replicated all significant mediations. Study 3 replicated the mediations observed in Studies 1 and 2 from political ideology to attitudes toward immigrants, with uncertainty-avoidance and perceived threat from immigrants as mediators. It further replicated the mediations to the negative attitudes measure used in Study 2 and extended them to an objective and indirect bias measure, the Affect Misattribution Procedure (AMP). Overall, almost all of the results supported the idea that perceived threat and uncertainty-avoidance both mediate the relationship between political ideology and attitudes toward immigrants, and that threat management, as opposed to negativity bias, may be a central concern separating liberals and conservatives.
Within all three studies, we also observed more evidence for the Uncertainty-Threat Model predictions than we did for the alternative Extremity Hypothesis, which predicted a quadratic relationship between political ideology and threat and uncertainty, and between political ideology and attitudes toward immigrants.
... The branched questions produced stronger test-retest results, and were strongly related to other political attitudes. Similarly, using a branched measure of party loyalty from the ANES, Alwin (1992) found that the reliability of this measure was higher than most other attitudinal items. Krosnick and Berent (1993) showed stronger test-retest results for fully labeled branched questions compared with partially labeled unbranched questions, suggesting higher reliability for the branched questions. ...
... We know that labeling has an impact on responses (Andrews 1984;Alwin and Krosnick 1991), so we cannot be sure it is the branching alone that increases reliability. Alwin (1992) does not compare a branched question with an unbranched one, so when he finds the reliability of responses to the branched question to be high, we cannot compare this with the reliability of an unbranched version. ...
Article
The format of a survey question can affect responses. Branched survey scales are a question format that is increasingly used but little researched. It is unclear whether branched scales work in a different way than unbranched scales. Based on the decomposition principle (Armstrong, Denniston, and Gordon 1975), if breaking a decision task up into component decision parts increases the accuracy of the final decision, one could imagine that breaking an attitudinal item into its component parts would increase the accuracy of the final report. In practice, this is applied by first asking the respondent the direction of their attitude, then using a follow-up question to measure the intensity of the attitude (Krosnick and Berent 1993). A split-ballot experiment was embedded within the Understanding Society Innovation Panel, allowing for a comparison of responses between branched and unbranched versions of the same questions. Reliability and validity of both versions were assessed, along with the time taken to answer the questions in each format. In a total survey costs framework, this allows establishing whether any gains in reliability and validity are outweighed by additional costs incurred because of extended administration times. Findings show evidence of response differences between branched and unbranched scales, particularly a higher rate of extreme responding in the branched format. However, the differences in reliability and validity between the two formats are less clear cut. The branched questions took longer to administer, potentially increasing survey costs.
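The decomposition described in this abstract can be combined into a single bipolar score: a direction question followed by an intensity follow-up. The sketch assumes a hypothetical 7-point coding (direction: agree, disagree, or neither; intensity: slightly, somewhat, or strongly) chosen only for illustration. It is not the coding used in the Understanding Society Innovation Panel experiment.

```python
from typing import Optional

# Hypothetical coding scheme for a branched (unfolding) item.
DIRECTION = {"agree": 1, "disagree": -1}
INTENSITY = {"slightly": 1, "somewhat": 2, "strongly": 3}

def branched_score(direction: str, intensity: Optional[str] = None) -> int:
    """Combine a direction answer and an intensity follow-up into a -3..+3 score."""
    if direction == "neither":
        return 0  # neutral respondents are not asked the intensity follow-up
    if direction not in DIRECTION or intensity not in INTENSITY:
        raise ValueError(f"unexpected answers: {direction!r}, {intensity!r}")
    return DIRECTION[direction] * INTENSITY[intensity]

answers = [("agree", "strongly"), ("disagree", "slightly"), ("neither", None)]
print([branched_score(d, i) for d, i in answers])  # [3, -1, 0]
```

Keeping DIRECTION and INTENSITY as separate lookups also makes it easy to analyze the two components on their own alongside the combined score.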
... We built two sets of models, one per dependent variable: respondents' feeling temperatures towards Greeks and those towards homosexuals. We operationalized prejudice levels towards Greeks and homosexuals with the feeling temperature thermometer, an instrument suggested by Duane Alwin (1992), and widely applied in the prejudice literature to measure attitudes along a response continuum from 'very cold, unfavourable feelings' (0) to 'very warm, favourable feelings' (100). Variable values are responses to the question: 'In a 0-100 scale, where 0 represents the coldest feeling and 100 represents the warmest feeling, how cold/warm do you feel towards Greeks/homosexuals?' ...
Contributors to the education-as-enlightenment approach maintain that education helps create less prejudiced individuals. This conclusion emanates mainly from data collected in Western democracies that apply multicultural education. It might not apply to countries where education's primary goal is to establish a sense of national unity and belonging. On the one hand, nationalist education could reduce prejudice against groups not targeted by the ethno-nationalist narrative, e.g., through positive comments about them or by not mentioning them at all. On the other hand, education might produce more prejudice toward groups targeted as the hostile Other through a nation-building narrative. We test this argument with a simple random sample from a cellphone public opinion survey conducted in Albania in 2015. Framing our analysis within intergroup contact theory, we build two sets of models, the first explaining respondents' prejudice levels toward Greeks and the second explaining their prejudice levels toward homosexuals. We found that more education predicted higher prejudice levels toward Greeks, a group targeted by the Albanian ethno-nationalist narrative as the hostile Other, whereas it did not significantly affect prejudice toward homosexuals.
... Peabody (1962), Lunney (1970), and Jacoby and Matell (1971) reported that scale items with 2 or 3 response categories can provide the necessary information about the abstract phenomenon under study, and that adding more response categories has no significant effect on the information gained. Alwin (1992), drawing a more detailed distinction, argued that if the aim is to measure only the direction of an attitude, 2 response categories are sufficient, but if the intensity of the attitude is also to be measured, more than two response categories are needed. Revilla et al. (2014), in a large-sample study of 50,000 participants, reported that the optimal number of response categories is 5 and that data quality declines when there are more than 5 response categories. ...
Article
Abstract: The aim of this study is to systematically examine and statistically test whether data characteristics (normal distribution, skewness, kurtosis), the level of internal consistency (Cronbach's Alpha), inter-scale correlation coefficients, and the structures of covariance matrices are sensitive to the number of response categories. The research was carried out on three different data sets obtained by convenience sampling from students of the Faculty of Economics and Administrative Sciences at Eskişehir Osmangazi University. The results showed that as the number of response categories increases, the level of internal consistency increases systematically, and that the difference between 5-point and 11-point response formats is statistically significant. Findings also indicated that the inter-scale correlation coefficient increases systematically, but that the increase is not statistically significant. The structures of the inter-variable covariance matrices differed significantly depending on the number of response categories, but the path coefficients in the covariance-based structural equation model did not. Keywords: number of response categories, Likert, data characteristics, data quality. (This article is an extended version of the paper presented at the 19th National Marketing Congress.)
Article
The objective of this study is to systematically analyze and statistically test whether data characteristics (normal distribution, skewness, and kurtosis), the level of internal consistency (Cronbach's alpha), inter-scale correlation coefficients, and the structures of covariance matrices are sensitive to the number of points on the response scale. The research was carried out using three different data sets obtained by convenience sampling of students at the Faculty of Economics and Administrative Sciences, Eskişehir Osmangazi University. According to the results, the level of internal consistency increases systematically as the number of scale points increases, and the difference between the 5-point and 11-point scales was statistically significant.
Furthermore, the inter-scale correlation coefficient was found to increase systematically, although this increase was not statistically significant. It was also observed that the inter-variable covariance matrices change significantly with the number of scale points, yet the path coefficients in covariance-based structural equation modeling do not change significantly.
... The main problem under discussion concerns methodological issues surrounding the similarities and differences between various estimation methods applied to data collected on a 7-point Likert scale: maximum likelihood (ML), maximum likelihood mean adjusted (MLM), maximum likelihood mean-variance adjusted (MLMV), weighted least squares (WLS), weighted least squares mean adjusted (WLSM), and weighted least squares mean-variance adjusted (WLSMV). Issues pertaining to the selection of the optimal number of categories within the scale are therefore not discussed, as they have been described thoroughly in the literature (Alwin 1992; Dawes 2008; Revilla, Saris, and Krosnick 2014; Tarka 2016). However, what one can infer from such studies is that a Likert scale with 7 categories ensures higher-quality information and offers the greatest advantage compared with the other variants (3, 4, 5, 6, 8, 9, and 10 categories), not only in the data collection phase but also in yielding better results when assessing the CFA models used to measure the respective latent variables. ...
Article
In this article, the author discusses the issues and problems associated with the influence of different estimation methods on the obtained parameters and goodness-of-fit of a Structural Equation Model (SEM) for data measured on a 7-point Likert scale. The objective of the analysis was thus to compare selected estimation methods, namely maximum likelihood (ML), maximum likelihood mean adjusted (MLM), maximum likelihood mean-variance adjusted (MLMV), weighted least squares (WLS), weighted least squares mean adjusted (WLSM), and weighted least squares mean-variance adjusted (WLSMV). The comparison was based on the respective parameter statistics, for which the quality of the SEM model fit was assessed. Finally, the best estimation procedure among the presented methods was selected. The empirical study concerns consumers' opinions about the unethical behavior of companies in the area of marketing.
... This point of view is supported by information theory, which states that if more response categories are added to a scale, more information about the variable of interest can be obtained. For example, when Alwin (1992) considered a set of hypotheses related to information theory and tested them with panel data, he found that, except for 2-point scales, "the reliability is generally higher for measures involving more response categories" (p. 107). ...
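In the information-theoretic framing, the most information a single k-category response can carry is log₂ k bits. That ceiling is reached only if respondents use all categories equally often. The short computation below tabulates the ceiling and compares it with the Shannon entropy of a hypothetical skewed 5-category response distribution. It does not show how much of the capacity respondents actually use in real surveys.

```python
import math

def max_bits(k: int) -> float:
    """Upper bound on information (in bits) carried by one k-category response."""
    return math.log2(k)

def entropy_bits(proportions):
    """Shannon entropy, in bits, of an observed response distribution."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)

for k in (2, 3, 5, 7, 11):
    print(k, round(max_bits(k), 2))

# Hypothetical skewed distribution over 5 categories: uses less than the log2(5) ceiling.
skewed = [0.05, 0.10, 0.20, 0.40, 0.25]
print(round(entropy_bits(skewed), 2), "of", round(max_bits(5), 2), "bits")
```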
Chapter
In this chapter, the author presents the results of a comparative analysis of scales based on 5, 7, 9, and 11 response categories. An attempt was made to find the optimal number of response categories among these scales with regard to the assumptions underlying the Confirmatory Factor Model and the MultiTrait-MultiMethod (MTMM) model. For this purpose, data were collected from a sample of young consumers (n = 200) studying at universities in Poland. The research focused specifically on their attitudes toward different aspects of companies' unethical behavior in marketing activities. To compare the scales, the author applied four models derived from the generalized CFA-MTMM model. This model allowed the recommendation of the best scale. It also helped to evaluate how using a particular type of scale affects the alternative CFA-MTMM models and the factors extracted through them.
... Satisfactory values for reliability and validity are found from five categories onwards [24,26,27]. Research suggests that this trend levels off from seven options upward [26,28,29]. ...
Article
In the past, steering feel in trucks was a compromise between acceptable steering torques at low velocities and a direct steering feel at higher velocities. Today, steering characteristics can be specifically adapted to the current driving situation due to the use of electric power steering systems. Following the human-centred design for interactive systems (ISO 9241-210) [1], the end user of a product must be included into the development process. Thus, a questionnaire to measure steering feel in truck drivers was designed. Therefore, evaluation criteria were derived from interview studies (N = 76 drivers) and the literature. Afterwards, vehicle dynamics experts supplied them with specific descriptions to maximize comprehensibility and determined the ideal situations in which the criteria can be assessed. Based on another interview study (N = 98 drivers) and a subsequent cluster analysis, the criteria were allocated to dimensions of steering feel. Lastly, a driving study (N = 41 drivers) was conducted to evaluate and further improve the questionnaire. In future research, it can be used to measure and optimize steering feel from the truck drivers’ perspective.
... Therefore, more information can be obtained by using longer scales and midpoints. However, recommendations about how many points should be used vary in the literature (Likert 1932; Alwin 1992; Dawes 2008). ...
Article
The formulation of theories and hypotheses is done at the level of concepts. These concepts are often tested by operationalizing them using survey questions. However, measurement errors make it impossible for survey questions to measure the concepts of interest perfectly. To correct for measurement errors, information is needed about their size, or about the size of their complement, the quality. For the USA and Europe, much is already known about the quality of questions, but quality has not yet been studied in some other parts of the world. In this paper, we use a multitrait-multimethod approach to estimate the quality of 27 questions in Mexico and Colombia. These initial results for Central and Latin American countries show that the relationships between quality estimates and scale characteristics are relatively similar to those observed in the USA and Europe. © 2015 GIGA German Institute of Global and Area Studies. All Rights Reserved.
... Second, scholars are increasingly encouraged to check the internal consistency of the underlying indicators when a given measure of a concept is an index based on multiple survey items (Foster et al. 2013). Third, there is an ongoing debate about the 'optimal' number of points in a Likert-scale item (Alwin 1992; Cummins and Gullone 2000; Lozano et al. 2008; Krosnick and Presser 2010; Weijters et al. 2010). This debate covers, for example, whether including (or excluding) middle categories in ordered responses can bias results or affect the validity of measurement (Moors 2008; Sturgis et al. 2014), or simply whether more categories improve or degrade the measurement itself (Dawes 2008; Lozano et al. 2008). ...
Article
The European Quality of Government Index (EQI) is the only measure of institutional quality available at the regional level in the European Union. The index, published in 2010 and again in 2013, is based on an ad hoc survey that measures three broad aspects of governance within countries: corruption, impartiality and quality. This paper assesses the EQI for the first time by means of Rasch modelling, a popular Item Response Theory method. It demonstrates that Rasch modelling allows for a wide range of validity and consistency tests of surveys of this kind. The analysis helped strengthen the survey, and consequently the index, by highlighting areas for improvement that can be applied to future rounds of the EQI survey. For instance, it allowed for testing the equivalence of questions across countries and across respondents' socio-demographic backgrounds, the validity and fit of each question's measurement scale, and the internal consistency of the EQI domains of corruption, impartiality and quality. Several of the shortcomings highlighted by the Rasch analysis will be addressed in the upcoming round of data collection for the third edition of the EQI. The analysis is therefore expected to have a positive impact on improving the first measure of quality of government in the European Union regions. © 2017 Springer Science+Business Media B.V., part of Springer Nature
... Some empirical results seem to support this theory. For instance, Alwin (1992) considers a set of hypotheses related to this theory of information. Testing them with panel data, he finds that, except for 2-point scales, "the reliability is generally higher for measures involving more response categories" (p. ...
Article
Although agree-disagree (AD) rating scales suffer from acquiescence response bias, entail greater cognitive burden, and yield data of lower quality, these scales remain popular with researchers for practical reasons (e.g., ease of item preparation, speed of administration, and reduced administration costs). This article shows that researchers who want to use AD scales should offer 5 answer categories rather than 7 or 11, because the longer scales yield data of lower quality. This is shown using data from four multitrait-multimethod experiments implemented in the third round of the European Social Survey. The quality of items with different rating scale lengths was computed and compared.
... The minimum number of categories is two, for example, the Yes or No response options to the question "The light in this room is too bright", as used by Boyce et al. [43]. A two-point scale is sufficient to measure attitude direction: longer response scales add information about intensity but may also encourage rating scale biases [100]. A brief study using response ranges of five, six, seven, and eight points found that these different scale formats did not lead to significant differences in central tendency: the same conclusion about population opinion on the environment would be drawn with any of these scales [72]. Given these mixed results, the number of points in the response range was not used to screen previous studies in the current review. ...
Article
Light sources are available in a variety of spectral power distributions (SPDs) and this affects spatial brightness in a manner not predicted by quantities such as illuminance. Tuning light source SPD to better match the sensitivity of visual perception may allow the same spatial brightness but at lower illuminance with potential reductions in energy consumption. Consideration of experimental design was used to review 70 studies of spatial brightness. Of these, the 19 studies considered to provide credible evidence of SPD effects were used to explore metrics for predicting the effect of SPD but did not provide conclusive evidence of a suitable metric, in part because of incomplete reporting of SPD characteristics. For future work, these data provide an independent database for validating proposed metrics.
... Although the α's for the items comprising the final factors are not very high, reliability coefficients > 0.6 have been considered acceptable for descriptive research (Moore and Carpenter, 2008; Robinson et al., 1991) and > 0.5 for explorative research (Hair et al., 2007; Nunnally, 1978). The α's can indeed be deemed satisfactory for our explorative purposes, particularly because lower alphas are to be expected with three-point Likert scales than with five- or seven-point scales (Alwin, 1992). ...
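The α values in this excerpt are Cronbach's alpha coefficients. As a minimal illustrative sketch (not the cited authors' code), alpha for k items can be computed as α = k/(k − 1) · (1 − Σ var(item) / var(total)). The function name `cronbach_alpha` is hypothetical. The sketch assumes a complete respondents-by-items matrix with no missing values and uses sample (n − 1) variances throughout.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Hypothetical helper: assumes no missing values and uses sample (n - 1) variances.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Variance of each respondent's total (summed) score
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Three respondents answering two three-point items
print(round(cronbach_alpha([[1, 2], [2, 2], [3, 3]]), 3))  # 0.857
```

For this toy data the item variances are 1 and 1/3 and the total-score variance is 7/3, so α = 2 · (1 − 4/7) = 6/7.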
Article
Purpose – The purpose of this paper is to examine managerial styles of Russian managers in the context of the institutional and economic environment of contemporary Russia. Design/methodology/approach – The study is based on a sample of 482 line and middle managers covering eight geographic regions, 14 industries and 80 organizations in Russia. Findings – Employing factor and cluster analyses, the paper identifies four distinct managerial styles: paternalistic, exploitative, performance oriented and passive. In addition, the paper analyzes a number of contingent characteristics of these types of Russian managers, such as their age, career development, and regional, industrial and organizational presence. Originality/value – The analysis enriches the understanding of managerial style idiosyncrasy, heterogeneity and evolution in Russia. The identified plurality of managerial styles, differentially related to a number of contingency variables, indicates that it pays for Western companies to avoid stereotypes when dealing with their Russian counterparts and instead to employ deliberate strategies when recruiting managers for their Russian operations.
... This basic assumption about the underlying structure of attitudes underpins the methods typically used to measure and analyse them in survey research. Most survey attitude measures attempt to assess both the direction of the evaluation and its intensity, using response scales that capture these two dimensions simultaneously (Alwin, 1992). Probably the most widely used attitude measure of this type is the bipolar response scale, in which respondents are asked to rate the extent to which they agree or disagree with a statement intended to capture positive or negative aspects of the attitude object (Likert 1932). ...
Article
A persistent problem in the design of bipolar attitude questions is whether or not to include a middle response alternative. On the one hand, it is reasonable to assume that people might hold opinions that are 'neutral' with regard to issues of public controversy. On the other, question designers suspect that offering a midpoint may attract respondents with no opinion, or those who lean to one side of an issue but do not wish to incur the cognitive costs required to determine a directional response. Existing research into the effects of offering a middle response alternative has predominantly used a split-ballot design, in which respondents are assigned to conditions that offer or omit a midpoint. While this body of work has been useful in demonstrating that offering or excluding a midpoint substantially influences the answers respondents provide, it does not offer any clear resolution to the question of which format yields more accurate data. In this paper, we use a different approach. We use follow-up probes administered to respondents who initially select the midpoint to determine whether they selected this alternative to indicate opinion neutrality or to indicate that they do not have an opinion on the issue. We find that the vast majority of these responses turn out to be what we term 'face-saving don't knows', and that reallocating them from the midpoint to the don't-know category significantly alters descriptive and multivariate inferences. Contrary to the survey-satisficing perspective, we find that this tendency is greatest among those who express more interest in the topic area.
... This variability, which can be high (see, e.g., Lee and Jeon, 2011; Genta et al., 2013), may be due to individual differences in cognitive ability or preference. Studies of category scale design have indicated that data quality (e.g., reliability, sensitivity) tends to improve as the number of answer categories increases (e.g., Alwin, 1992). An alternative seven-point scale for rating LD was proposed by Gover and Bradley (2007), and a five-point scale attempting to address the saturation issue but not the variation issue was proposed by Genta et al. (2013), who suggested on the basis of their results that there was a need for alternative implementations of the method. ...
Article
Conversational speech produced in noise can be characterised by increases in intelligibility relative to such speech produced in quiet. Listening difficulty (LD) is a metric that can be used to evaluate speech transmission performance more sensitively than intelligibility scores in situations in which performance is likely to be high. The objectives of the present study were to evaluate the LD of speech produced in different noise and style conditions, to evaluate the spectral and durational speech modifications associated with these conditions, and to determine whether any of the spectral and durational parameters predicted LD. Nineteen subjects were instructed to speak at normal and loud volumes in the presence of background noise at 40.5 dB(A) and babble noise at 61 dB(A). The speech signals were amplitude-normalised, combined with pink noise to obtain a signal-to-noise ratio of -6dB, and presented to twenty raters who judged their LD. Vowel duration, fundamental frequency and the proportion of the spectral energy in high vs. low frequencies increased with the noise level within both styles. LD was lowest when the speech was produced in the presence of high level noise and at a loud volume, indicating improved intelligibility. Spectrum balance was observed to predict LD.
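The abstract above mentions combining speech with noise at a fixed signal-to-noise ratio (-6 dB). Below is a minimal sketch of the usual power-based approach. It is not the authors' processing chain and rests on assumptions the study does not state: the signals are equal-length lists of samples, power is the mean square over the whole signal, and the noise (rather than the speech) is rescaled. The function names are hypothetical.

```python
import math


def mean_power(x):
    """Mean-square power of a list of samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) == snr_db, then add it.

    Hypothetical helper; returns (mixed_signal, scaled_noise).
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_s = mean_power(speech)
    p_n = mean_power(noise)
    if p_n == 0:
        raise ValueError("noise has zero power")
    # Target noise power is P_s / 10^(snr/10); the gain acts on amplitude, hence sqrt
    gain = math.sqrt(p_s / (p_n * 10 ** (snr_db / 10)))
    scaled = [gain * v for v in noise]
    return [s + n for s, n in zip(speech, scaled)], scaled


speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, -0.5, 0.5, -0.5]
mixed, scaled = mix_at_snr(speech, noise, -6.0)
print(round(10 * math.log10(mean_power(speech) / mean_power(scaled)), 6))
```

At -6 dB the rescaled noise carries roughly four times the power of the speech.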
... In their studies, Peabody (1962), Lunney (1970), and Jacoby and Matell (1971) reported that scale items with 2 or 3 response options can provide the necessary information about the abstract phenomenon under investigation, and that offering more response options would not significantly increase the information obtained. Alwin (1992), drawing a more detailed distinction, argued that 2 response options are sufficient if the aim is to measure only the direction of an attitude, but that more than two options are needed if the intensity of the attitude is also to be measured. Revilla et al. (2014), in a large-sample study of 50,000 participants, reported that the optimal number of response options is 5 and that data quality declines when more than 5 response options are offered. ...
Article
The aim of this study is to systematically examine and statistically test whether data characteristics (normal distribution, skewness, kurtosis), the level of internal consistency (Cronbach's alpha), inter-scale correlation coefficients, and the structures of covariance matrices are sensitive to the number of response options. The study was conducted with three different data sets obtained by convenience sampling from students of the Faculty of Economics and Administrative Sciences at Eskişehir Osmangazi University. The results show that internal consistency increases systematically as the number of response options increases, and that the difference between 5 and 11 response options is statistically significant. The findings also indicate that inter-scale correlation coefficients increase systematically, but that this increase is not statistically significant. Finally, the structures of the inter-variable covariance matrices differ significantly with the number of response options, whereas the path coefficients in the covariance-based structural equation model do not.
... Another reason that has been proposed for using branching is its apparent ability to explore the midpoint information (Miller, 1984; Krosnick and Berent, 1993; Malhotra, Krosnick, and Thomas, 2009). In addition, branching bipolar questions combine the effectiveness of two-category scales in reliably capturing direction with that of longer scales in measuring extremity (Alwin, 1992). Krosnick and Berent (1993) found that the branching format has higher validity and reliability than the non-branching format. ...
Article
Past research has recognized the effectiveness of verbally labelled branching questions with two aims: to simplify the judgement task, and to explore and optimize the midpoint answer. Beyond these aims, I recommend branching because positive and negative extremity answers are asked in the second step, both with positive labels, which results in the same unit of measurement. Numerical scales are usually inappropriate for measuring attitudes because they exclude the negative part of the continuum. Accordingly, I conducted a split-ballot experiment (SAQ; N=146) to study the effect of branching with numerical scales, comparing 1-10 scales of trust toward 16 different institutions with their branching versions (i.e., a dichotomous question followed by a 5-point scale of extremity). Only a few studies have attempted to disambiguate the meaning of numerical scales by using negative numbers. Their findings showed a positivity bias effect, which significantly shifts distributions to the positive side. I argue that there is also an avoidance effect with negative numbers, because respondents do not use them in everyday life. Branching is the solution that includes negative answers without using negative numbers on rating scales.
... Besides that, optimizing and satisficing are important in describing how respondents answer: satisficing becomes more likely as task difficulty increases and as respondents' ability and motivation decrease. These factors are important for keeping random error low when respondents do not use all of the response levels (Alwin, 1992). ...
... Respondents in POLPAN were offered a five-point rating scale, while respondents in RO-WVS were offered a shorter, four-point scale. The following linear transformation can be applied to put the scales on the same metric: for a source n-point scale with values k ranging from 1 to n, k may be recoded to a new scale with values l from 1 to m, so that $l = \frac{m-1}{2n} + k\,\frac{m-1}{n}$. As the methodology literature indicates, scales that have a midpoint and are longer usually perform better (Gardner 1960, Alwin 1992, Krosnick and Fabrigar 1997, Østerås et al. 2008, Dawes 2008, Lundmark, Gilljam, and Dahlberg 2016; for a notable exception, see Revilla, Saris, and Krosnick 2014). In the case of institutional trust, the five-point scale would be preferable to the four-point scale (for a specific comparison of four- and five-point rating scales, see Dawes 2002). ...
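The recoding formula quoted in this excerpt can be sketched in a few lines of Python. This sketch assumes the formula's first term is (m − 1)/(2n). The excerpt's layout leaves the grouping ambiguous, so that reading is our assumption. The function name `rescale` is hypothetical.

```python
def rescale(k, n, m):
    """Recode value k (1..n) of an n-point scale toward an m-point metric.

    Implements l = (m - 1)/(2n) + k * (m - 1)/n; reading the first term as
    (m - 1)/(2n) is an assumption about the quoted formula's grouping.
    """
    if not 1 <= k <= n:
        raise ValueError("k must lie between 1 and n")
    return (m - 1) / (2 * n) + k * (m - 1) / n


# RO-WVS four-point trust item recoded to the POLPAN five-point metric
print([rescale(k, 4, 5) for k in range(1, 5)])  # [1.5, 2.5, 3.5, 4.5]
```

In the four-to-five-point case discussed in the text, the recoded values 1.5 to 4.5 are the midpoints of four equal-width bins spanning the five-point range from 1 to 5.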
Article
If researchers wish to use surveys to understand the attitudes and behaviors of those who live in former State Socialist countries, they face a research landscape densely populated by cross-sectional studies. Panel surveys with individuals as the units of analysis, which are ideal for understanding change within people over time, are rare. As a service to researchers, this article presents possibilities for cross-national comparison via two sets of publicly available panel data: the Polish Panel Survey POLPAN (focusing on its 2013 and 2018 waves) and the novel Romanian World Values Survey Panel RO-WVS (2012 and 2018 waves), which is the only panel version of the World Values Survey (WVS). We present the research designs of each and explore their ex-post harmonization. Conceptual overlap between these sources occurs mainly (but not only) in major socio-demographics and in political attitudes and behavior, including interest in politics, political participation, democratic values, and institutional trust. Whereas POLPAN is relatively well known, we argue that the RO-WVS panel stands out as a unique resource that provides data on the dynamic nexus between social structure and cultural context. Keeping RO-WVS alive for a long period would help researchers understand Romanian society in the European context and allow future comparisons between it and its neighbors.
... Initially, the instrument comprised 39 items, one for each CFF (listed in Table 2), each measured on a five-point ordinal scale (1 = least frequently occurring, 5 = most frequently occurring). This measurement scale was selected for two reasons: first, an ordinal scale allows respondents to communicate finer distinctions in their judgements across the response categories of the items (Krosnick 2018); and second, the purpose of the instrument was to gather the frequency of occurrence of CFFs across the CI stages, not to assess the magnitude of their impact (Alwin 1992). ...
Article
Organisations implement various Continuous Improvement (CI) practices such as Total Quality Management, Lean, Six Sigma and Lean Six Sigma (LSS) to improve their processes. Drawing on the success and failure stories of these structured CI practices, scholars have enumerated Critical Success Factors and Critical Failure Factors (CFFs). This study empirically examines the occurrence of various CFFs across different stages of CI deployment. Further, from a contingency-theoretic perspective, this study investigates their associations with contextual variables by collecting survey data from 213 business units in the USA, the UK, China, and India. Principal Component Analysis is used to group CFFs across five CI deployment stages, leading to an empirically refined framework for CI. Crosstab analysis using the chi-square likelihood ratio revealed associations of CFFs with contextual variables. Findings reveal significant differences in the occurrence of CFFs across countries. There is evidence that LSS is less prone to failure than TQM, Lean and Six Sigma. The occurrence of CFFs has been relatively lower in small and medium-sized enterprises operating in the service sector. Findings also reveal that lessons learned from each CI deployment cycle contribute to organisational learning and thence lead to success at the strategic CI level of maturity.
... The review found a marked difference in the time between surveys of hypothetical and actual values, with 65% conducted concurrently and 25% with a gap of more than 4 weeks between the surveys. A two-week interval is the generally recommended retest period to enhance the reliability of the values obtained (Duane, 1992). However, while longer intervals could introduce recall bias, short intervals mean that respondents may remember what they said in the hypothetical survey and deliberately repeat the value to appear publicly consistent. ...
Article
Background: The contingent valuation (CV) method is used to estimate the willingness to pay (WTP) for services and products to inform cost-benefit analyses (CBA). A long-standing criticism, that stated WTP estimates may be poor indicators of actual WTP, calls into question their validity and the use of such estimates for welfare evaluation, especially in the health sector. Available evidence on the validity of CV studies is so far inconclusive. We systematically reviewed the literature to (1) synthesize the evidence on the criterion validity of WTP/willingness to accept (WTA), (2) undertake a meta-analysis, pooling evidence on the extent of variation between stated and actual WTP values, and (3) explore the reasons for the variation. Methods: Eight electronic databases were searched, along with citations and reference reviews. 50 papers detailing 159 comparisons were identified and reviewed using a standard proforma. Two reviewers each were involved in the paper selection, review and data extraction. Meta-analysis was conducted using random effects models for ratios of means and percentage differences separately. Meta-bias was investigated using funnel plots. Results: Hypothetical WTP was on average 3.2 times greater than actual WTP, with a range of 0.7-11.8 and 5.7 (0.0-13.6) for ratios of means and percentage differences respectively. However, key methodological differences between surveys of hypothetical and actual values were found. In the meta-analysis, high levels of heterogeneity existed. The overall effect size for mean summaries was 1.79 (1.56-2.04) and 2.37 (1.93-2.80) for percent summaries. Regression analyses identified mixed results on the influence of the different experimental protocols on the variation between stated and actual WTP values. Results indicating publication bias did not account for differences in study design.
Conclusions: The evidence on the criterion validity for CV studies is more mixed than authors are representing because substantial differences in study design between hypothetical and actual WTP/WTA surveys are not accounted for.
... The TSE was also administered as a measure of self-efficacy. Following best practices in measurement theory, the response options were reduced from the original nine to five before the survey was administered (Alwin, 1992). Four independent t tests (overall and by subscale) were conducted. ...
Article
Alternative pathways to teaching licensure were developed to address teacher shortages. These programs differ widely, making it difficult to generalize their effects. This study compares the impact of two alternative licensure programs on the development of fundamental elements of science teacher preparation and persistence. The fast‐track programs include a 6‐month teacher preparation program and a one‐year residency teacher preparation program. The study concluded that licensure type was not associated with teaching self‐efficacy, beliefs about teacher‐focused/student‐focused teaching, preferences for inquiry instructional practices, or experiences with student misbehavior. However, the study revealed that licensure type was associated with a number of other variables: residency students had more confidence in their ability to provide quality instruction, preferred inquiry‐based instruction more often, and may be better prepared for the high‐needs classroom. Those in the 6‐month program were more likely to score higher on practical versus theoretical approaches to teaching, and while they had a more realistic idea of how to measure success in the high‐needs classroom, the residency students had more knowledge of educational theory and how to apply it. Findings suggest that more traditionally licensed teachers may be more inclined to use the inquiry‐based methods suggested in current reforms.
... A high number of items is generally acknowledged to be a means of improving reliability. However, if the list is too extensive, it is assumed to have a negative effect (although this has yet to be demonstrated), since the informant tires of the number of items (Alwin 1992; Alwin and Krosnick 1991; Böhme and Stöhr 2014; Cannell, Miller, and Oksenberg 1981; Gummer and Roßmann 2015). After reviewing this guidance, and in light of the circumstances and objectives of our study, we decided that an interval between 20 and 25 impacts was appropriate (from the technical point of view) and realistic (from the point of view of the circumstances and needs of ESIA and the participatory application of MCDA). ...
Article
Environmental and social impact assessment (ESIA) can be an extremely useful tool for identifying and evaluating the repercussions of a wide range of initiatives. Typically when the project and its impacts are highly complex, an ESIA can detect a large number of issues that need to be prioritized so that they can be effectively and efficiently addressed. This article presents a mixed-methodology proposal for impact prioritization in ESIA, divided into four phases: (1) creation of the stakeholders' platform; (2) identification and assessment of impacts; (3) impact categorization; and (4) impact assessment and prioritization using multi-criteria decision analysis (MCDA). This procedure was applied as an ex-post evaluation of a golf-based tourism project in the southwest of the Iberian Peninsula (Huelva, Spain), but can also potentially be used to conduct ex-ante assessments. The main contribution of the study is in the design and testing of a parsimonious procedure, which condenses a large amount of qualitative information into relatively simple operations using MCDA. The process is grounded in the constructivist social impact assessment (SIA) paradigm through stakeholder evaluation of impacts and criteria.
... We think this may have been due to the very low prevalence of certain forms of behavior, such as drug use and suicidal behavior, in this setting, which meant that certain response options were often redundant. Indeed, the number of response categories for all of these items was within the recommended range of four to nine options [48]. Rattray and colleagues [49] advise that although redundant response options may warrant deletion of an item, it is always crucial to refer to the original research question and retain items that are thought to reflect important underlying theoretical domains. ...
Article
Background: Health risk behavior (HRB) is of concern during adolescence. In sub-Saharan Africa, reliable, valid and culturally appropriate measures of HRB are urgently needed. This study aims to assemble and psychometrically evaluate a comprehensive questionnaire on HRB of adolescents in Kilifi County on the coast of Kenya. Methods: The Kilifi Health Risk Behavior Questionnaire (KRIBE-Q) was assembled using items on HRB identified from a systematic review, by consulting 85 young people through 11 focus group discussions, and through in-depth interviews with 10 key informants such as teachers and employees of organizations providing various services to young people in Kilifi County. The assembled list of HRB items was forward- and back-translated between English and Swahili and harmonized by a panel of experts. A total of 164 adolescents completed the assembled Swahili questionnaire at baseline, and two weeks later 85 of them completed the questionnaire again. A classical test theory approach was used for psychometric evaluation. We computed the amount of missing data at item level to verify data quality. Scaling was evaluated by the spread of responses across options at item level. Test-retest reliability was assessed with Gwet's AC1 coefficient, using data from the 85 adolescents who answered the questionnaire twice. For the non-psychometric evaluation of the KRIBE-Q, administered in Swahili via audio computer-assisted self-interview (ACASI) to 40 adolescents, we used observations and a brief questionnaire. Results: The KRIBE-Q showed high data quality, good spread of responses across options and very good test-retest reliability (Gwet's AC1 = 0.82).
It comprised 8 components with acceptable test-retest reliability: behavior resulting in unintentional injury and violence (0.85); tobacco use (0.85); alcohol and drug use (0.96); sexual behaviors (0.94); dietary behaviors (0.60); physical activity (0.74); gambling (0.73); and hygiene behavior (0.89). About 96% of the adolescents found the ACASI private and easy to use. Prevalence of bullying (32%), physical fights (40%) and engagement in gambling (26%) was high. Conclusion: The KRIBE-Q assembled in this study is a psychometrically sound instrument for adolescents in rural coastal Kenya and feasible to administer via ACASI. This measure may be useful for surveys and planning interventions in similar settings.
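The KRIBE-Q abstract reports test-retest reliability as Gwet's AC1. Below is a minimal sketch of the standard two-rating form of AC1, applied here to the test and retest answers of the same respondents to one item. It assumes complete pairs and a known list of q ≥ 2 possible categories. The formula is AC1 = (p_a − p_e)/(1 − p_e). Here p_a is the observed agreement and π_c is the mean proportion of ratings in category c across both occasions. The chance agreement is p_e = Σ π_c(1 − π_c)/(q − 1). The function name and inputs are hypothetical, and the study's own software and item aggregation are not described here.

```python
def gwet_ac1(first, second, categories):
    """Gwet's AC1 for two ratings (e.g., test and retest) of the same subjects.

    Hypothetical helper: `categories` lists all q possible response options.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty rating lists of equal length")
    q = len(categories)
    if q < 2:
        raise ValueError("AC1 needs at least two categories")
    n = len(first)
    # Observed proportion of identical answers on the two occasions
    p_a = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement from the average category proportions pi_c
    p_e = 0.0
    for c in categories:
        pi_c = (first.count(c) + second.count(c)) / (2 * n)
        p_e += pi_c * (1 - pi_c)
    p_e /= q - 1
    return (p_a - p_e) / (1 - p_e)


# Four respondents, binary item ("no" = 1, "yes" = 2), answered twice
print(round(gwet_ac1([1, 1, 1, 2], [1, 1, 2, 2], [1, 2]), 4))  # 0.5294
```

In this example p_a = 0.75 and p_e = 0.46875, which gives AC1 = 9/17.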
... Much effort has been devoted to finding an optimal number of response categories for a Likert scale to maximize its psychometric properties, but agreement has been hard to reach. On the one hand, a larger number of response categories transmits more bits of information (Alwin, 1992) and allows a finer mapping of respondents' attitudes on a latent continuum onto the limited response options of the Likert scale. On the other hand, too many response options would exceed respondents' ability to discriminate, invite satisficing behaviors such as selecting an option merely to reduce cognitive burden, and hence introduce more response error (Krosnick & Fabrigar, 1997). ...
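The "bits of information" argument rests on Shannon's measure. A response with K categories can carry at most log2 K bits, and it reaches that bound only when all categories are equally likely. The minimal sketch below (hypothetical function names) computes that upper bound and the entropy of an observed response distribution. Both figures describe capacity, not reliability. Whether reliability tracks capacity is the separate question Alwin's analysis examines.

```python
import math


def max_bits(k):
    """Upper bound on the information (bits) a k-category response can carry: log2(k)."""
    return math.log2(k)


def entropy_bits(probs):
    """Shannon entropy (bits) of an observed distribution of responses over categories."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


for k in (2, 3, 5, 7, 11):
    print(k, round(max_bits(k), 3))

# A skewed three-category distribution uses less than the log2(3) bound
print(round(entropy_bits([0.7, 0.2, 0.1]), 3), "<", round(max_bits(3), 3))
```

Because log2 grows ever more slowly, each added category raises the bound by less than the one before.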
Article
Likert‐type rating scales are among the most widely used tools in psychological research. Different numbers of response categories are likely to affect response style, data distribution, reliability, and construct validity, yet there is little research on factor structure invariance across Likert scales with different numbers of categories. The purpose of this study is to examine the effects of varying numbers of Likert points (4–11) on scale properties such as factor structure, external validity, and latent means, based on the Rosenberg Self‐Esteem Scale (M. Rosenberg, 1989). The sample consists of 1,807 students from secondary schools in Macau. Confirmatory factor analysis shows that the correlated two‐factor model is the most appropriate one; longitudinal invariance analysis reveals that measurement invariance across Likert scales was satisfied at the scalar level. In addition, latent mean scores on the two factors as well as observed means on the subscales are comparable across Likert scales. Moreover, the measurement model exhibits similar external validity across Likert scales. Although psychometric properties are mostly similar across different numbers of points, the 4‐point Likert scale is not recommended because of its higher skewness and lower loadings; the 11‐point Likert scale from 0 to 10 is slightly preferred for its higher loadings and composite reliability.
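The abstract above favors the 11-point format partly for its higher loadings and composite reliability. A common definition of composite reliability for a one-factor model is CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances). The sketch below implements that formula; the function name and the loadings are hypothetical illustrations, not the study's estimates, and it assumes a standardized solution when error variances are not supplied.

```python
def composite_reliability(loadings, error_variances=None):
    """Composite reliability of a unit-weighted sum under a one-factor model.

    CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)
    """
    if error_variances is None:
        # Assumed standardized solution: error variance of item i is 1 - loading_i^2.
        error_variances = [1 - lam ** 2 for lam in loadings]
    if len(error_variances) != len(loadings):
        raise ValueError("one error variance per loading is required")
    common = sum(loadings) ** 2
    return common / (common + sum(error_variances))


# Hypothetical five-item factor: modestly higher loadings raise CR.
print(round(composite_reliability([0.7] * 5), 3))  # 0.828
print(round(composite_reliability([0.8] * 5), 3))  # 0.899
```

Because loadings enter both the numerator and the error variances, even a small gain in loadings from a finer response format shows up directly in CR.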
... The Feeling Thermometer (Alwin, 1992) measured adult children's ingroup bias via their attitudes toward racial outgroups. Participants were instructed to choose a "temperature reading," ranging from very cold and unfavorable (0) to very warm and favorable (100), that best represents their attitudes. ...
Article
The purpose of this study was to examine the influence of family communication environments on children’s intergroup socialization. Adult children (n = 200) reported on their parents’ conformity and conversation orientations and their own racial attitudes and intergroup orientations. Results evidenced ingroup bias, social dominance, and identification with parent as mediators of the positive relationship between conformity orientation and racial prejudice and the negative relationship between conformity orientation and racial tolerance. Results also revealed that children from consensual and protective families harbor the most racial prejudice and least racial tolerance. Future directions related to intergroup contact interventions, racially diverse families, and qualitative assessments of parent–child interactions are discussed.
Article
In this paper, we use follow-up probes administered to respondents who initially select the mid-point to determine whether they selected this alternative to indicate opinion neutrality or to indicate that they do not have an opinion on the issue. We find that the vast majority of these responses turn out to be what we term 'face-saving don't knows', and that reallocating them from the mid-point to the don't know category significantly alters descriptive and multivariate inferences. Our findings have important implications for the design and analysis of bipolar rating scales.
Chapter
This chapter deals with the various ways in which test takers' answers to test items/questions can be recorded and coded ("response formats"). These give rise to different item types. Weighing their advantages and disadvantages, the free response format is contrasted with the constrained response format. Within the latter, ordering and selection tasks as well as continuous and discrete rating tasks are widespread item types. Drawing on numerous examples, many practical aspects of item construction are addressed and discussed with reference to various objectives. Decision aids for choosing the item type round off the chapter.
Article
Attitude toward the ad is an important construct frequently measured in advertising and marketing research. However, how attitude toward the ad should be quantified on a numerical scale remains somewhat ambiguous. This study examines the practice and effect of using arbitrary scales when measuring attitude toward the ad (i.e., 1–5, 1–7, 1–9, 0–10, and 0–100). A longitudinal experiment with thousands of adult participants (Time 1: n = 2,366, Time 2: n = 1,165) was conducted. The experimental results revealed that different scales generally lead to consistent findings, but if the conventional p < .05 criterion is used, the study's conclusion may differ. Based on these findings, theoretical discussions and practical recommendations are provided.
Article
While extensive efforts have been made to harness the benefits of computing technologies in education, little attention has been paid to how such efforts lead to students' positive attitudes toward science and technology. Building on the technology acceptance model and motivation literature, the current study proposed that hands-on experiences with computing devices allow students to perceive their technology use as useful and enjoyable, which in turn leads to positive attitudes toward science and technology in general. Data collected from a pedagogical intervention support our predictions regarding the role of utility perception and enjoyment. Furthermore, the results suggest that students' prior attitudes toward science and technology and the type of device used in the intervention influence the perceived usefulness and enjoyment of classroom computing. When using education-specific devices, students' prior attitudes were positively associated with postintervention attitudes as well as with utility perception and enjoyment. When using general-purpose devices, however, students' prior attitudes were not related to those outcomes. These results imply that distributing technologies to schools may improve attitudes toward science and technology, particularly in populations that have so far been underrepresented in the fields of science and technology.
Article
The anorexic voice (AV) is defined as a critical internal dialogue which has been implicated in the development and maintenance of anorexia nervosa (AN). Systematic research to explore this further requires a valid and reliable measurement tool. This study aimed to develop and assess the validity of the Experience of an Anorexic VoicE Questionnaire (EAVE‐Q). EAVE‐Q items were developed and checked for face and content validity through cognitive interviews with seven individuals diagnosed with AN. Participants with a diagnosis of AN (N = 148) completed the EAVE‐Q, sociodemographic questions and measures of mood and quality of life to assess internal consistency and construct validity. Forty‐nine participants completed the EAVE‐Q twice more to assess test‐retest reliability. The EAVE‐Q had good face and content validity and good acceptability. Principal axis factoring resulted in an 18‐item scale organised into five domains with high internal consistency (α = .70 to α = .85). Domains correlated significantly with eating disorder symptoms, psychological distress and quality of life. The EAVE‐Q did not discriminate between participants on the basis of body mass index. Test‐retest reliability was moderate. Although the factor structure of the EAVE‐Q requires replication in other AN samples, the EAVE‐Q is the first measure of a critical internal dialogue in AN. It is hoped that it will aid future research to increase understanding of AN and the continued development of person‐centred treatments.
Article
This special issue of Sociological Methods & Research contributes to recent trends in studies that exploit the availability of multiple measures in sample surveys in order to detect the level and patterning of measurement errors. Articles in this volume focus on one (or some combination) of three areas: (1) developing and testing theoretical hypotheses regarding the behavior of measurement errors under specific conditions of measurement, (2) the methodological problems encountered in designing data collection that permits the estimation of measurement models, and (3) evaluating existing models for detecting and quantifying the nature of measurement errors. The designs used in these investigations include follow-up probes, record-check studies, multitrait-multimethod designs, longitudinal designs, and latent class models for assessing measurement errors in categorical variables.
Article
In the Information Age, individuals are expected to reconstruct what they read by analyzing and synthesizing it. In this respect, open reading, which suits a behaviorist approach, has lost its role, and creative reading, which suits a constructivist approach, has taken its place. Whereas open reading is a process of memorizing the information in a text without engaging it in cognitive processes, creative reading re-creates the text by taking one's own life into consideration. For creative reading to succeed, both teachers and students should hold positive attitudes toward it; more positive perceptions of creative reading, especially among elementary school students, will increase their creative reading achievement. Accordingly, this study developed a perception scale for creative reading. Validity and reliability analyses yielded a scale of 25 questions in 3 dimensions. Its Cronbach's alpha coefficient was 0.87. The reliability analysis also gave a Spearman-Brown value of 0.73 and a Guttman split-half value of 0.72; the split-half reliability coefficient of 0.73 was considered adequate. The scale uses a three-point Likert format because it was designed for 4th-grade elementary school students. Keywords: Scale development, creative reading, validity, reliability.
Article
What explains the recent rise in extremely negative feelings towards presidential candidates? Using the American National Election Studies survey data from 1984 to 2016, we show that negative feelings towards presidential candidates have grown steadily in recent elections, with unusually large numbers of zero ratings on candidate thermometers in 2004, 2012, and, especially, 2016. Such evaluations are primarily reserved for candidates of the other party and shown to be strongly related to partisan polarization. Importantly, however, candidate traits have long played and continue to play major roles in candidate evaluations, though their effects vary by year. Indeed, the unprecedented number of the most negative scores in 2016 appears due more to increases in negative perceptions of candidate leadership, competence and empathy than to polarization. Clinton and Trump are just as much to blame for the public's animosity as the rising tide of polarization.
Article
This paper provides empirical evidence regarding the causal effects that upgrading slum dwellings has on the living conditions of the extremely poor. In particular, we study the impact of providing better houses in situ to slum dwellers in El Salvador, Mexico and Uruguay. We experimentally evaluate the impact of a housing project run by the NGO TECHO (“roof”), which provides basic pre-fabricated houses to members of extremely poor population groups in Latin America. The main objective of the program is to improve household well-being. Our findings show that better houses have a positive effect on overall housing conditions and general well-being: the members of treated households are happier with their quality of life. In two countries, we also document improvements in children’s health; in El Salvador, slum dwellers who have received the TECHO houses also feel that they are safer. We do not find this result, however, in the other two experimental samples. There are no other noticeable robust effects in relation to the possession of durable goods or labor outcomes. Our results are robust in terms of both their internal and external validity because they are derived from similar experiments in three different Latin American countries.
Article
This article provides a summary of the literature's suggestions on survey design research. In doing so, it points researchers toward question formats that appear to yield the highest measurement reliability and validity. Using the American National Election Studies as a starting point, it shows the general principles of good questionnaire design, desirable choices to make when designing new questions, biases in some question formats and ways to avoid them, and strategies for reporting survey results. Finally, it offers a discussion of strategies for measuring voter turnout in particular, as a case study that poses special challenges. Scholars designing their own surveys should not presume that previously written questions are the best ones to use. Applying best practices in questionnaire design will yield more accurate data and more accurate substantive findings about the nature and origins of mass political behavior.
Article
The aim of this study is to systematically examine and statistically test whether data characteristics (normal distribution, skewness, kurtosis), the level of internal consistency (Cronbach's alpha), inter-scale correlation coefficients, and the structure of covariance matrices are sensitive to the number of agreement-level response options. The study was conducted on three different data sets obtained by convenience sampling from students of the Faculty of Economics and Administrative Sciences at Eskişehir Osmangazi University. The results showed that as the number of response options increased, internal consistency increased systematically, and that the difference between 5-point and 11-point response formats was statistically significant. The findings also indicated that inter-scale correlation coefficients increased systematically, but that the increase was not statistically significant. The structure of the covariance matrices among variables differed significantly with the number of response options, whereas the path coefficients in the covariance-based structural equation model did not. Keywords: number of response options, Likert, data characteristics, data quality. (This article is an extended version of a paper presented at the 19th National Marketing Congress.)
Article
Subjective well-being may not improve in step with increases in material well-being due to hedonic adaptation, a psychological process that attenuates the long-term emotional impact of a favorable or unfavorable change in circumstances. As a result, people’s degree of happiness eventually returns to a stable reference level. We use a multicountry field experiment to examine the impact on subjective measures of well-being of the provision of improved housing to extremely poor populations in order to test whether they exhibit hedonic adaptation when their basic housing needs are met. After 16 months, we find that subjective perceptions of well-being improve substantially for recipients of improved housing but that, after, on average, eight additional months, 60% of that gain has dissipated. Extrapolation achieved through estimation of a structural model of hedonic adaptation suggests that the decay rate of the treatment effect is 20% per month. As a result, after 28 months of treatment exposure, we forecast that the entire treatment effect will have disappeared. (JEL: D0, I31)
Article
The article analyzes the cognitive accessibility of measurement scales by comparing the answers respondents give to the same question presented with scales that have different numbers of categories. The degree of cognitive control a respondent has over a scale is inferred from the consistency of his answers: a scale that is difficult to master reveals itself through answers that are inconsistent with those given on the other scales. The first part of the article gives a formal definition of consistency and shows how to construct a formula for the level of beyond-chance consistency of answers. The second part presents the results of empirical research conducted with a group of students taking statistical analysis classes. The results provide initial confirmation of the hypothesis that increasing the number of scale categories increases the difficulty a scale presents to the respondent.
Chapter
This chapter deals with some aspects of two sources of systematic error in surveys: non-response and measurement error. The quality of the obtained response is discussed first, with a focus on non-response bias estimation and non-response bias reduction. Measurement error is studied by evaluating the quality of registered responses through question wording, question order, and response scale effects. The different approaches to measurement error are discussed. Practical examples of dealing with bias and measurement error are offered. These include the evaluation of (non-)response rates and response enhancement strategies, and a comparison of cooperative and reluctant respondents on non-response issues based on an analysis of the European Social Survey. Concerning measurement error, the split-ballot approach and the multitrait-multimethod approach are discussed extensively, including theoretical concepts and methods/models, along with an example of acquiescence when a balanced set of items is available. The chapter also presents some debates concerning theoretical constructs and model developments in the respective fields, and emphasizes that the two types of error are strongly related: both can seriously affect response distributions, correlations, and regression parameters.
Conference Paper
As people's living standards and incomes rise, price matters less to customers when shopping, while brand effects become increasingly evident. This paper treats customers' brand cognitive level as the independent variable. Using a simulated shopping scenario, it records with an eye tracker the eye movements of participants at different cognitive levels as they choose and buy different brands, then applies scientific methods to process and analyze the data and identify the relationship between customers' cognitive level and their perceived value of the brand. The results show that improving customers' cognitive level of a brand also improves the brand's perceived value to them.
Article
This article is based on data from the 1980 National Election Study surveys. It reports findings concerning the rates at which voters become familiar with presidential candidates and their policy positions, trends in public opinion during the 1980 presidential campaign, and the dynamics of individual attitudes that underlie those trends. The impact of political attitudes on the individual vote decision is assessed within the context of a simultaneous equation model. In addition, the net effects of attitudinal distributions on the election outcome are estimated. The analysis yields support for the retrospective voting model and provides no evidence for the contention that Reagan's victory was the result of his policy or ideological positions.
Article
Errors can be introduced into scientific research when continuous concepts are measured on scales that rank the concepts into a few categories. This presents a potential problem because measures of association between two variables may differ depending on whether continuous or collapsed measures are used. We analyzed simulated data and examined differences in the correlation between two normally distributed continuous variables and the same two variables collapsed into a small number of categories. In general, the differences in correlation coefficients computed on continuous variables and the same variables collapsed into a few categories are small. The greatest differences in the correlations between the two types of variable occur when the continuous variables' correlation is high and only a few categories are used for the collapsed variables. When as few as five categories are used to approximate the continuous variables, the correlation coefficients and their standard deviations for the collapsed and continuous variables are very close. These findings suggest that under certain conditions it may be justifiable to analyze categorical data as if it were continuous.
Article
A method is proposed for empirically testing the appropriateness of using tetrachoric correlations for a set of dichotomous variables. Trivariate marginal information is used to obtain a set of one-degree-of-freedom chi-square tests of the underlying normality. It is argued that such tests should preferably precede further modeling of tetrachorics, for example, by factor analysis. The assumptions are tested in some real and simulated data.
Article
Recent work by Jacoby and Matell [6] has suggested that three-point Likert scales are sufficient to meet criteria of test-retest reliability, concurrent validity, and predictive validity. Green and Rao [3], using the criterion of data configuration recovery, concluded that six- or seven-point scales are preferable, and the authors are "skeptical about the ability of large numbers of such scales (three- or two-point scales) to 'make up' for the limited information provided by each scale separately." In a reply to Green and Rao, Benson [1] argued that the frequent applicability and practical convenience of two- or three-point scales are strong points in their favor. Moreover, the focus of marketing research on population averages, rather than individuals, suggests that scales with few categories are adequate. This article delineates the conditions under which a two- or three-point scale may be good enough.
Article
Through the use of computer simulations, Labovitz's (1970) examination of the effects of "randomly stretching" measurement scales on the correlation between these stretched scales and an equal distance scoring system are reformulated and extended. Specifically, we examine the effects of the number of rank categories (C) for rank-order variables on the product moment correlation (r) between stretched scales and an equal distance scoring system. The stretched scales are drawn from three types of distributions: (1) the uniform distribution (i.e., the one used by Labovitz), (2) the normal distribution, and (3) a skewed distribution (log-normal distribution). We find that the average correlation (r̄) between the equal distance scoring system and the stretched scale is quite high for both the uniform and normal distributions, and that r̄ increases with C only when C is greater than four or five. Thus, contrary to suggestions in the literature, r̄ is not a monotonic function of C. More importantly, for the skewed distribution, r̄ is a monotonically decreasing function of C and is substantially smaller than r̄'s based on uniform and normal distributions. The implications of these findings for the use of Pearson's r with rank-order values are discussed.
Article
The author questions the procedure and the advice given researchers in a previously published analysis of simulated data.
Article
Managers and researchers concerned with marketing and attitudinal research frequently encounter the problem of determining the number of rating scales to use and the number of response categories to provide for each scale that is used. This article approaches the problem through a numerical simulation designed to measure the sensitivity of "solution recovery" to changes in these variables.
Article
Subjects (N = 236) rated 20 foods as to preference using rating scales containing 2, 3, 5, 7, and 9 categories. Test reliability (summed ratings for each subject) and rater reliability (summed ratings for each food) were computed for each scale. Test reliability was constant over the entire range of categories and was very similar to reliabilities found in another study. Rater reliability was constant from five to nine categories, but was slightly lower at two and slightly higher at three categories. It was concluded that test reliability is independent of the number of scale categories, and that rater reliability is relatively constant but warrants further research.
Article
In a Monte Carlo study, the number of response categories, number of items, covariance among items, and item "error" were varied to simulate scores following classical true score assumptions. Despite considerable literature examining the optimal number of response categories, this variable accounted for very little variance in the correlation of fallible composite scale scores and known "true" scores. In no situation did correlations substantially increase with the use of more than 5 response categories. The effects of the 4 variables were largely additive. The relative importance of the variables differed, however, according to whether an internal consistency or a stability estimate was used as the dependent variable. Results are discussed in terms of possible trade-offs for applied researchers.
Article
Discusses the contradictions and confusion in the literature on determining the optimal number of scale points in a rating scale, and suggests a mathematical model that allows for the simulation of the rating situation. The model involves generating data with different item variance-covariance structures and with different numbers of scale points. Such data were generated and used to calculate 3 reliability measures. The effects of different numbers of scale points and different covariance structures upon these reliability measures are examined, and the results help explain a large number of empirical studies exploring the "optimal number of scale points" problem.
Article
The distribution of an observer's (O's) absolute judgments, in which O identifies a stimulus as having a particular value, may indicate how much information O obtained about which of several alternative stimuli occurred at a particular time. The amount of information conveyed to O can be measured in bits. This measure may give an estimate of the minimum number of stimulus categories that will transmit the maximum amount of information. A technique is described for constructing a scale of equal discriminability to select the stimuli for maximum information transmission.
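Information transmitted in an absolute-judgment task is usually computed as the mutual information between stimulus and response, estimated from a stimulus-by-response table of counts. The sketch below uses the plug-in (observed-proportion) estimate, which is known to be biased upward in small samples; the function name and the example matrix are hypothetical. The result can never exceed log2 of the number of stimulus or response categories, which is the sense in which more categories raise the "information carrying capacity" of a scale.

```python
from math import log2


def transmitted_information(confusion):
    """Bits transmitted, from a stimulus x response matrix of counts."""
    total = sum(sum(row) for row in confusion)
    row_p = [sum(row) / total for row in confusion]          # p(stimulus)
    col_p = [sum(col) / total for col in zip(*confusion)]    # p(response)
    bits = 0.0
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            if count:  # empty cells contribute nothing
                p = count / total
                bits += p * log2(p / (row_p[i] * col_p[j]))
    return bits


# Hypothetical task: 4 stimuli, errors only between adjacent categories.
counts = [
    [18, 2, 0, 0],
    [3, 14, 3, 0],
    [0, 3, 14, 3],
    [0, 0, 2, 18],
]
print(round(transmitted_information(counts), 3), "bits of a possible", log2(4))
```

Perfect identification of four equally likely stimuli transmits exactly 2 bits, and a response pattern unrelated to the stimulus transmits 0 bits; confusions between neighboring categories fall in between.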
Article
A review of methodological research on the semantic differential (SD) shows that: (a) metric assumptions in SD scales are in some ways inaccurate but adequate for many applications; (b) biased errors may arise because of social desirability effects or scale-checking styles; (c) a substantial portion of variation in SD ratings is due to individual differences and temporal variations in responses; (d) basic dimensions of average response on SD scales are evaluation, potency, and activity, and no extensive proliferation of basic dimensions beyond these can be expected; (e) there are individual differences in the size and character of the semantic space; (f) the appearance of scale-concept interactions frequently is a methodological artifact which would not occur in adequately designed studies; and (g) the existence of real scale-concept interactions demands tailoring the SD to different stimulus domains, but these studies must be carried out with care. (58 ref.)
Article
The purpose of this study was to determine whether mean occupation evaluation ratings would differ as a function of 7 variations in rating-scale format. 60 basic airmen rated 15 occupations on 9 occupation-requirement factors for each format. A 3-way analysis of variance (occupations, factors, scale format) resulted in statistically significant terms for each of the main effects and for all 4 interaction terms. It was concluded that rating-scale format was a determiner of the judgment of raters in this sample and that selection of an optimal format should be based upon capability to predict a criterion.
Article
Ratings of parent behavior were obtained by means of a graphic rating scale. The scale was divided into various numbers of units, and the score of each subject was computed using each scale. As the number of intervals was increased up to about 12 reliability increased markedly. A less marked increase was manifested up to about 30 intervals, beyond which point there was a slight decrease. This curve does not correspond to that expected on the basis of Symonds' function for the optimal number of scale divisions.
Article
This paper considers the problem of applying factor analysis to non-normal categorical variables. A Monte Carlo study is conducted in which five prototypical cases of non-normal variables are generated. Two normal-theory estimators, ML and GLS, are compared with Browne's (1982) ADF estimator. A categorical variable methodology (CVM) estimator of Muthén (1984) is also considered for the most severely skewed case. Results show that the ML and GLS chi-square tests are quite robust but yield values that are too large for variables that are severely skewed and kurtotic. ADF, however, performs well. Parameter estimates appear unbiased for all estimators. Results also show that ML and GLS estimated standard errors are biased downward; no such bias was found for ADF. The CVM estimator appears to work well when applied to severely skewed variables that have been dichotomized. ML and GLS results for a kurtosis-only case showed no distortion of chi-square or parameter estimates and only a slight downward bias in estimated standard errors. The results are compared with those of other related studies.
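Why dichotomized variables call for a categorical correction can be seen in the simplest case. A minimal sketch, assuming a zero-mean bivariate normal pair with an arbitrarily chosen correlation `rho` that is split at both medians: the probability that both variables fall above the median is 1/4 + arcsin(rho) / (2*pi), so the Pearson (phi) correlation of the two dichotomies is attenuated to (2/pi) * arcsin(rho), and inverting that relation recovers rho. This is only the median-split special case of the tetrachoric correlation, not Muthén's CVM estimator itself; the function names are hypothetical.

```python
import numpy as np


def median_split_phi(x, y):
    """Pearson (phi) correlation after splitting each variable at zero."""
    return np.corrcoef((x > 0).astype(float), (y > 0).astype(float))[0, 1]


def rho_from_median_split_phi(phi):
    """Invert phi = (2 / pi) * arcsin(rho), which holds for a zero-mean
    bivariate normal pair split at both medians."""
    return np.sin(np.pi * phi / 2.0)


rng = np.random.default_rng(1)
rho = 0.6  # assumed latent correlation for the illustration
x, y = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=200_000).T
phi = median_split_phi(x, y)
print(round(phi, 3))                             # attenuated, near (2/pi) * arcsin(0.6)
print(round(rho_from_median_split_phi(phi), 3))  # close to 0.6
```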
Article
This paper proposes that when optimally answering a survey question would require substantial cognitive effort, some respondents simply provide a satisfactory answer instead. This behaviour, called satisficing, can take the form of either (1) incomplete or biased information retrieval and/or information integration, or (2) no information retrieval or integration at all. Satisficing may lead respondents to employ a variety of response strategies, including choosing the first response alternative that seems to constitute a reasonable answer, agreeing with an assertion made by a question, endorsing the status quo instead of endorsing social change, failing to differentiate among a set of diverse objects in ratings, saying ‘don't know’ instead of reporting an opinion, and randomly choosing among the response alternatives offered. This paper specifies a wide range of factors that are likely to encourage satisficing, and reviews relevant evidence evaluating these speculations. Many useful directions for future research are suggested.
Article
The project conceived in 1929 by Gardner Murphy and the writer aimed first to present a wide array of problems having to do with five major "attitude areas": international relations, race relations, economic conflict, political conflict, and religion. The questionnaire material falls into four classes: yes-no, multiple choice, propositions to be responded to by degrees of approval, and a series of brief newspaper narratives to be approved or disapproved in various degrees. The monograph aims to describe a technique rather than to give results. The appendix, covering ten pages, shows the method of constructing an attitude scale. A bibliography is also given.
Article
Compared the effectiveness with which job-task anchored equal-appearing interval scales could be used, in contrast with scales anchored only by simple numerical benchmarks. Two groups of judges rated identical lists of job-task statements on both types of scales. Ratings were made on 5 sensory/physical dimensions of job activities. The reliabilities of ratings for all scales were computed with an analysis-of-variance approach. In a test of statistical significance across all 5 scale dimensions, job-task anchored scales could generally be used with significantly greater reliability than simple numerically anchored scales.
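One common analysis-of-variance reliability for ratings uses a two-way layout (statements by judges) and computes (MS_items - MS_residual) / MS_items, the reliability of the mean rating over judges. This is Hoyt's coefficient, which equals Cronbach's alpha with judges treated as items. The abstract does not say which ANOVA formula the study used, so this is a minimal sketch of one standard choice; `anova_reliability` is a hypothetical name, and the input is assumed to be a complete items-by-judges matrix with one rating per cell.

```python
import numpy as np


def anova_reliability(ratings):
    """Reliability of the mean rating over judges from a two-way ANOVA.

    ratings: complete array of shape (n_items, n_judges), one rating per cell.
    Returns (MS_items - MS_residual) / MS_items.
    """
    r = np.asarray(ratings, dtype=float)
    n_items, n_judges = r.shape
    grand = r.mean()
    ss_items = n_judges * ((r.mean(axis=1) - grand) ** 2).sum()
    ss_judges = n_items * ((r.mean(axis=0) - grand) ** 2).sum()
    ss_total = ((r - grand) ** 2).sum()
    ss_residual = ss_total - ss_items - ss_judges  # items-by-judges interaction
    ms_items = ss_items / (n_items - 1)
    ms_residual = ss_residual / ((n_items - 1) * (n_judges - 1))
    return (ms_items - ms_residual) / ms_items


# Judge 2 is always one point above judge 1: perfect consistency, reliability 1.
print(anova_reliability([[1, 2], [3, 4], [5, 6]]))
```

Because the judge main effect is removed before the residual is formed, a judge who is uniformly more lenient does not lower this coefficient; only inconsistency in how judges order the statements does.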
Article
Methods for obtaining tests of fit of structural models for covariance matrices and estimator standard errors which are asymptotically distribution free are derived. Modifications to standard normal theory tests and standard errors which make them applicable to the wider class of elliptical distributions are provided. A random sampling experiment to investigate some of the proposed methods is described.
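The central ingredient of the asymptotically distribution-free approach is an estimate of Gamma, the asymptotic covariance matrix of the distinct sample covariances. Its (ij, kl) element is sigma_ijkl - sigma_ij * sigma_kl, a fourth-order central moment minus a product of covariances, and the ADF fit function weights the residuals s - sigma(theta) by the inverse of Gamma. The sketch below computes only the sample Gamma, using divide-by-n moments and a lower-triangle ordering of the covariance elements; `adf_gamma` is a hypothetical name, and a full estimator would also need a model sigma(theta) and a numerical optimizer.

```python
import numpy as np


def adf_gamma(data):
    """Sample estimate of Gamma for ADF estimation.

    data: array of shape (n_obs, p). Returns a (p*(p+1)/2)-square matrix whose
    (ij, kl) element is m_ijkl - s_ij * s_kl, where m_ijkl is a fourth-order
    central sample moment and s_ij a (divide-by-n) sample covariance.
    """
    x = np.asarray(data, dtype=float)
    n, p = x.shape
    centered = x - x.mean(axis=0)
    # Distinct covariance elements (i, j) with i >= j, in lower-triangle order.
    rows, cols = np.tril_indices(p)
    # d[t, m] = product of observation t's deviations for the m-th (i, j) pair;
    # the column means of d are the sample covariances s_ij.
    d = centered[:, rows] * centered[:, cols]
    dev = d - d.mean(axis=0)
    return dev.T @ dev / n


rng = np.random.default_rng(2)
sample = rng.standard_normal((200_000, 2))
gamma = adf_gamma(sample)
print(gamma.shape)             # (3, 3)
print(round(gamma[0, 0], 2))   # near 2 for standard normal data
```

For multivariate normal data Gamma reduces to sigma_ik * sigma_jl + sigma_il * sigma_jk, so for a single standard normal variable its one element is close to 2; large departures from the normal-theory values signal the kurtosis that ADF is designed to accommodate.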
Article
Subjects "rated themselves on their knowledge of 12 foreign countries on rating scales with 3, 5, 7, 9, or 11 categories and with these scales verbally anchored either in the center, at both ends, or at both center and ends. The data were analyzed within an information theory syntax as to the effect of variations in number of scale categories and amount of verbal anchoring upon the information transmitted by the scale. Results indicated an increase in the absolute amount of transmitted information as the number of scale categories was increased. Increased verbal anchoring of the rating scale resulted in a slight increase in the information transmitted by the scale."
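Transmitted information here is the mutual information between the stimulus (for example, the country being rated) and the response category chosen: T = H(stimulus) + H(response) - H(stimulus, response), in bits. A minimal sketch computing it from a table of counts; `transmitted_information` is a hypothetical name, and this simple plug-in estimate is known to be biased upward in small samples.

```python
import numpy as np


def transmitted_information(counts):
    """Transmitted information (mutual information, in bits) from a
    stimulus-by-response table of counts: T = H(S) + H(R) - H(S, R)."""
    joint = np.asarray(counts, dtype=float)
    joint = joint / joint.sum()

    def entropy(probs):
        probs = probs[probs > 0]  # 0 * log 0 is taken as 0
        return float(-(probs * np.log2(probs)).sum())

    h_stimulus = entropy(joint.sum(axis=1))  # rows: stimuli
    h_response = entropy(joint.sum(axis=0))  # columns: response categories
    return h_stimulus + h_response - entropy(joint.ravel())


# Perfect one-to-one mapping of 4 stimuli onto 4 categories: log2(4) = 2 bits.
print(transmitted_information(np.eye(4) * 25))
# Responses unrelated to stimuli: 0 bits, up to rounding error.
print(transmitted_information(np.ones((3, 3))))
```

Because T can never exceed H(response), which is at most log2 of the number of categories, adding categories raises the ceiling on what a scale can transmit; whether respondents actually use the extra capacity is the empirical question these studies address.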
Article
A variety of studies are examined from the standpoint of information theory. It is shown that the unaided observer is severely limited in the amount of information he can receive, process, and remember. However, it is shown that by the use of various techniques, e.g., use of several stimulus dimensions, recoding, and various mnemonic devices, this informational bottleneck can be broken. 20 references.
Article
Information transmission measures and discriminability scaling procedures can be used to aid in the evaluation of the effects of the number of categories of rating scales. An illustrative set of data clarifies the application of these methods and shows their close relationship. Both information transmission and discriminability increase monotonically with an increase in the number of rating categories.
Article
Various measures in psychology use a set of response categories extending in two directions about some neutral point. Scores for several responses are summed or averaged to give a composite score. Examples include some of the methods of absolute judgment in psychophysics, the Likert-type attitude scale, Osgood's semantic differential, and rating scales used in personality assessment. The object of measurement for the composite scores may thus be some attribute of physical stimuli, of verbal concepts, of other persons, or of oneself. Two components can be distinguished in the responses: (a) the direction from the neutral point, representing the basic dichotomy (heavy versus light, agreement versus disagreement with the statement, etc.); (b) the degree or extremeness of the response from the neutral point. For example, in a six-point Likert scale, the subject writes +1, +2, +3 or -1, -2, -3 to indicate slight, moderate, and strong agreement or slight, moderate, and strong disagreement. The basic dichotomy of direction is determined by whether he agrees or disagrees (writes + or -); extremeness by whether he indicates slight, moderate, or strong feelings (writes 1, 2, or 3). A neutral response category may or may not be provided: Likert scales traditionally permitted a neutral response (writing 0), but many recent Likert scales do not. When scores for several responses are summed or averaged to give a composite score, the two components become (a) the number (more generally, the proportion) of responses scored in each direction and (b) the mean extremeness of the responses in each direction. The present paper derives a method for estimating the relative contribution of these two components to composite scores and applies the method to data from several Likert attitude scales.
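The two components can be computed directly from signed responses. The abstract does not give the paper's estimation method, so this is only a minimal descriptive sketch, assuming responses coded -3 to -1 and +1 to +3 with no neutral code: for each respondent it returns the proportion of positive responses (direction), the mean absolute response (extremeness), and the mean signed response (composite). The function name `direction_extremeness` is hypothetical.

```python
import numpy as np


def direction_extremeness(responses):
    """Split signed Likert responses into direction and extremeness components.

    responses: array of shape (n_respondents, n_items) coded -3..-1 or +1..+3
    (no neutral 0 code). Returns three arrays, one value per respondent:
      direction   -- proportion of responses on the positive (agree) side
      extremeness -- mean absolute value of the responses
      composite   -- mean signed response (the usual averaged score)
    """
    r = np.asarray(responses, dtype=float)
    if np.any(r == 0):
        raise ValueError("this sketch assumes no neutral (0) responses")
    direction = (r > 0).mean(axis=1)
    extremeness = np.abs(r).mean(axis=1)
    composite = r.mean(axis=1)
    return direction, extremeness, composite


responses = [
    [3, 3, -1, 2],
    [1, 1, -1, 1],
    [-2, -3, -3, 1],
    [2, -1, 1, 1],
]
direction, extremeness, composite = direction_extremeness(responses)
print(direction)
print(extremeness)
print(composite)
```

Correlating each component with the composite across respondents, e.g. `np.corrcoef(direction, composite)[0, 1]`, gives a rough descriptive sense of which component drives composite scores in a given data set.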
Bartholomew, David J., and Karl F. Schuessler. 1991. "Reliability of Attitude Scores Based on a Latent-Trait Model." Pp. 97-123 in Sociological Methodology 1991, edited by Peter V. Marsden. Oxford: Basil Blackwell.
Bendig, A. W. 1953. "The Reliability of Self-Ratings as a Function of the Amount of Verbal Anchoring and the Number of Categories on the Scale." Journal of Applied Psychology 37:38-41.
Bentler, Peter M. 1989. EQS-Structural Equations Program Manual. Los Angeles: BMDP Statistical Software Inc.
Birkett, Nicholas J. 1986. Selecting the Number of Response Categories for a Likert-Type Scale. Proceedings of the American Statistical Association, Section on Survey Research Methods. Washington, DC: American Statistical Association.
Block, Jack. 1965. The Challenge of Response Sets. New York: Appleton-Century-Crofts.
Bradburn, Norman M., and Catalina Danis. 1984. "Potential Contributions of Cognitive Research to Questionnaire Design." Pp. 104-29 in Cognitive Aspects of Survey Methodology: Building a Bridge Between Disciplines, edited by T. B. Jabine et al. Washington, DC: National Academy Press.
Campbell, Angus, Philip Converse, Warren Miller, and Donald Stokes. 1971. American Panel Study: 1956, 1958, 1960. Ann Arbor: Inter-University Consortium for Political and Social Research.
Converse, Jean M., and Howard Schuman. 1984. "The Manner of Inquiry: An Analysis of Survey Question Form Across Organizations and Over Time." Pp. 283-316 in Surveying Subjective Phenomena, vol. 2, edited by Charles F. Turner and Elizabeth Martin. New York: Russell Sage Foundation.
Converse, Philip E., and Gregory B. Markus. 1979. "Plus ça Change...: The New CPS Election Panel Study." American Political Science Review 73:32-49.
Coombs, Clyde H. 1964. A Theory of Data. New York: Wiley.
Cox, Eli P. 1980. "The Optimal Number of Response Alternatives for a Scale: A Review." Journal of Marketing Research 17:407-22.
Cronbach, Lee J. 1946. "Response Sets and Test Validity." Educational and Psychological Measurement 6:475-94.
Cronbach, Lee J. 1950. "Further Evidence on Response Sets and Test Design." Educational and Psychological Measurement 10:3-31.
Cronbach, Lee J. 1951. "Coefficient Alpha and Internal Structure of Tests." Psychometrika 16:297-334.
Erikson, R. S. 1978. "Analyzing One-Variable, Three-Wave Panel Data: A Comparison of Two Models." Political Methodology 5:151-61.
Greene, Vernon L., and Edward G. Carmines. 1979. "Assessing the Reliability of Linear Composites." Pp. 160-75 in Sociological Methodology 1980, edited by Karl F. Schuessler. San Francisco: Jossey-Bass.
Jabine, Thomas B., Miron L. Straf, Judith M. Tanur, and Roger Tourangeau. 1984. Cognitive Aspects of Survey Methodology: Building a Bridge Between Disciplines. Report of the Advanced Research Seminar on Cognitive Aspects of Survey Methodology. Washington, DC: National Academy Press.
Jöreskog, Karl G. 1970. "Estimation and Testing of Simplex Models." British Journal of Mathematical and Statistical Psychology 23:121-45.
Jöreskog, Karl G. 1971. "Statistical Analysis of Sets of Congeneric Tests." Psychometrika 36:109-33.
Jöreskog, Karl G. 1974. "Analyzing Psychological Data by Structural Analysis of Covariance …
… Method of Maximum Likelihood. User's Guide, Version 6. Chicago: Scientific Software.
Krosnick, Jon A., and Matthew K. Berent. 1990. "The Impact of Verbal Labeling of Response Alternatives and Branching on Attitude Measurement Reliability in Surveys." Paper presented at the Annual Meetings of the American Association for Public Opinion Research, Lancaster, PA.