
Lawrence J. Stricker
Educational Testing Service (ETS) · Division of Research and Development
107 Publications · 13,364 Reads
1,946 Citations
Publications (107)
This chapter is the first of two that present an account of a portion of ETS research conducted in cognitive, personality, and social psychology since the organization’s inception. The topics covered include, in cognitive psychology, the structure of abilities; in personality psychology, response styles and social and emotional intelligence; and in...
Developmental psychology was a major area of research at ETS from the late 1960s to the early 1990s. This work was a natural extension of the programs in cognitive, personality, and social psychology that had begun shortly after the organization’s founding in 1947, consistent with Henry Chauncey’s vision of investigating intellectual and personal q...
This study explores stereotype threat on low-stakes tests used in a large-scale assessment, namely, the math and reading tests in the Education Longitudinal Study of 2002 (ELS). Issues identified in laboratory research (though not observed in studies of high-stakes tests) were assessed: whether inquiring about their race and gender is related to the performanc...
Previous research assessing Obama's effectiveness as a role model in alleviating the effects of stereotype threat on Black Americans' test performance yielded provocative though conflicting results. A field study with research participants observed that Black–White mean differences were not detectable at points in his 2008 presidential campaign whe...
Nathan Kogan, professor emeritus of psychology at the New School for Social Research and visiting scholar at the Educational Testing Service (ETS), died on April 28, 2013, in Princeton, New Jersey, at the age of 86. The son of immigrants from Poland and Ukraine, Nat was born in Bethlehem, Pennsylvania, on May 2, 1926. Nat described himself as a gen...
This is an account of a portion of the research on cognitive, personality, and social psychology at ETS since the organization's inception. The topics in cognitive psychology are the structure of abilities; in personality psychology, response styles and social and emotional intelligence; and in social psychology, prosocial behavior and stereotype t...
This study assessed the relationships between characteristics of biographical items from the Armed Services Applicant Profile and the items' validity in predicting the retention of enlisted military personnel. Item characteristics were appraised with ratings by expert judges and test takers, word and alternative counts, and response latencies. Item...
The principal aims of this study, a conceptual replication of an earlier investigation of the TOEFL® computer-based test, or TOEFL CBT, in Buenos Aires, Cairo, and Frankfurt, were to assess test takers' reported acceptance of the TOEFL Internet-based test, or TOEFL iBT™, and its associations with possible determinants of this acceptance and with te...
This construct validation study investigated the factor structure of the Test of English as a Foreign Language™ Internet-based test (TOEFL® iBT). An item-level confirmatory factor analysis was conducted for a test form completed by participants in a field study. A higher-order factor model was identified, with a higher-order general factor (...
This study assessed the invariance in the factor structure of the Test of English as a Foreign Language™ Internet-based test (TOEFL® iBT) across subgroups of test takers who differed in native language and exposure to the English language. The subgroups were defined by (a) Indo-European and Non-Indo-European language family, (b) Kachru's classifica...
The present study investigated the factor structure of a field trial sample of the Test of English as a Foreign Language™ Internet-based test (TOEFL® iBT). An item-level confirmatory factor analysis (CFA) was conducted for a polychoric correlation matrix of items on a test form completed by 2,720 participants in the 2003–2004 TOEFL iBT Field Study....
This is a rebuttal to Danaher & Crandall's (2008) rejoinder to Stricker & Ward's (2004) article. The article reports 2 field experiments concerning the effects of inquiring about gender and ethnicity on the performance of women and Black students taking operational tests. The rejoinder's conclusion (after reanalyzing our data) that several thousand...
The aim of this study was to assess test takers' attitudes and beliefs about an admissions test used extensively in graduate schools of business in the United States, the Graduate Management Admission Test (GMAT), and the relationships of these attitudes and beliefs to test performance. A set of attitude and belief items was administered by compute...
Steele and Aronson (1995) found that the performance of Black research participants on ability test items portrayed as a problem-solving task, in laboratory experiments, was affected adversely when they were asked about their ethnicity. This outcome was attributed to stereotype threat: Performance was disrupted by participants' concerns about fulfi...
A measure of social status was devised: a multiple-choice question that obtains occupational information and is scalable with both the Duncan Socioeconomic Index and the Siegel Prestige Scale. The correlations of this device with other measures of social status and with tests of general ability were assessed in a study of white and black Navy recru...
Several studies have found a substantial association between subjects' judgments about the co‐occurrence of personality items and the items' actual co‐occurrence, lending support to the validity of implicit personality theory. This work has recently been called into question by an investigation of Mirels (1976) that reported a nonsignific...
The Tzeng and Tzeng (1982) criticisms of the assumptions underlying the Jackson, Chan, and Stricker (1979) study of Implicit Personality Theory fail to take account of the relevant empirical data. For example, the key contention of Tzeng and Tzeng–that the measures of judged and empirical trait-co-occurrence used by Jackson et al. were not comparab...
This study evaluated the connection between gender differences in examinees’ familiarity, interest, and negative emotional reactions to items on the Advanced Placement Psychology Examination and the items’ gender differential item functioning (DIF). Gender DIF and gender differences in interest varied appreciably with the content of the items. Gend...
The aim of this study was to appraise whether different forms of the SAT used since the mid-1970s varied in their correlations with academic performance criteria in the same cohort of examinees. A 1975 form and a 1985 form were administered to two random samples of high school juniors, and self-reported grade point average and high school rank wer...
This study assessed the factor structure of the LanguEdge™ test and the invariance of its factors across language groups. Confirmatory factor analyses of individual tasks and subsets of items in the four sections of the test, Listening, Reading, Speaking, and Writing, were carried out for Arabic-, Chinese-, and Spanish-speaking test takers. Two fact...
A biographical inventory has been used in the selection of students for naval aviation training since World War II, and its validity in predicting their retention in this training has been well established. This study investigated the constructs underlying the inventory and their relations to student retention criteria. A factor analysis of the item...
The purpose of this study was to replicate previous research on the construct validity of the paper-and-pencil version of the TOEFL test and extend it to the computer-based TOEFL. Two samples of GRE test takers were used: native speakers of English specially recruited to take the computer-based TOEFL, and ESL test takers who had routinely taken the...
This study investigated (a) the ability to minimize or eliminate stereotype threat by reducing the difficulty of items administered via a computer-adaptive version of the Graduate Record Examinations General Test; and (b) the generalizability of these findings for Black students as well as women, and for verbal as well as quantitative sections of t...
The principal aim of this study was to assess test takers’ acceptance of the computer-based version of the Test of English as a Foreign Language (TOEFL), and the links between this acceptance, general attitudes about admissions tests, other possible determinants, and test performance. A secondary goal was to evaluate differences in the pattern of r...
The aim of this study was to demonstrate the feasibility of biographical inventories free of the limitations common to many current biographical measures by constructing and validating an inventory composed of homogeneous scales, with item content that is factual and fair, to assess personality traits predictive of leadership. The experimental inve...
This study investigated the extent and nature of preparation for the Pre-Professional Skills Tests (PPST), and the reasons for preparing or not preparing. The PPST, a high-stakes test with a high failure rate, is used for admission to teacher education programs and for teacher licensing. Recent test takers were surveyed. Preparation was limited and...
This study explored individual differences in educational disadvantage—deficits in formal and informal education in the school, home, and elsewhere—in the SAT® test-taking population. Data on variables that reflect educational disadvantage were obtained from SAT I: Reasoning Test takers via a mail survey and from archival records for their schools...
This study investigated the extent and nature of preparation for the Pre-Professional Skills Tests (PPST®), the reasons for preparing or not preparing, and differences in these results for White and minority-group test takers and for middle-class and working-class test takers. Recent PPST test takers were surveyed. Preparation for the PPST was limi...
Measures of accomplishments (notable attainments that have been publicly recognized) have promise for research and practice in education and other applied fields. In this study we investigated sex and ethnic group differences on these measures for students bound for graduate school. Examined were 6 accomplishments scales (Academic Achievement, Leader...
The principal aim of this study was to assess examinees' acceptance of the TOEFL-CBT, and its associations with possible determinants of this acceptance and with test performance. A secondary goal was to evaluate differences in the pattern of results for examinees from different countries. A questionnaire concerning attitudes about the test, famili...
This study explored the value of obtaining a just noticeable difference (JND) for a test, that is, the difference in scores needed before observers detect a difference in examinees' behavior, as a means of interpreting the practical meaning of scores. Classical psychophysical methods were adapted and applied to the scores of foreign teaching assistants (TAs...
This study assessed the usefulness of response latency data for biographical inventory items in enhancing the inventory's validity. The Armed Services Applicant Profile (ASAP) was administered by computer to Navy recruits, and the regular score, latency-weighted scores, and measures of deviant latencies were obtained. The latency-weighted scores...
Steele and Aronson (1995) found that the performance of African-American subjects on test items portrayed as a problem-solving task, in a laboratory experiment, was adversely affected when they were asked about their ethnicity. This outcome was attributed to “stereotype threat”: performance was disrupted by the subjects' concerns about fulfilling t...
Laboratory experiments by Steele and Aronson (1995) found that African-American subjects' performance on difficult verbal items, described as a verbal problem-solving task, was adversely affected when they were asked about their ethnicity just before working on the items. These results were attributed to stereotype threat: asking about ethnicity pr...
This study evaluated the connection between gender differences in Advanced Placement (AP) Psychology students' familiarity, interest, and negative emotional reaction to items on the AP Psychology test and the items' gender differential item functioning (DDF). Gender DDF and gender differences in interest and emotional reaction varied appreciably wi...
This study explored the value of obtaining a Just Noticeable Difference (JND), that is, the difference in scores needed before observers discern a difference in examinees' English proficiency, for the current Test of Spoken English (TSE®) as a means of interpreting scores in practical terms, using college students' ratings of their international teaching...
This study identified the salient work values of college-bound and non-college-bound high school students from their perceptions of occupations. Multidimensional scaling analyses of the students' judgments about the similarity of occupations extracted three factors in the college-bound sample and seven in the non-college-bound sample. The same blue c...
This study appraised the utility of SAT scores in combination with grades in high school courses and the number and quality of these courses in predicting college grades in various fields of study, with the object of providing students with predictions of their academic performance for guidance purposes. The possible impact of this feedback on the...
This study examined three questions about measures of accomplishments (notable attainments that have been publicly recognized): their pseudoipsativity, the correspondence between quantity and quality scores, and their dimensionality. Comparable samples of graduate students described their accomplishments on a questionnaire or judged the similarity o...
The relations between examinee background characteristics and performance on the Graduate Record Examinations (GRE) General Test were appraised by a structural equation modeling analysis. The examinees' initial characteristics (gender, ethnicity, parental education, geographic region, and age) had modest relations with their test performance. Of th...
This study appraised the validity of SAT scores, in combination with grades in high school courses and the number and difficulty level of these courses, in predicting college grades in various fields of study, with the objective of providing SAT takers with predictions of their academic performance in different fields for guidance purposes. The pos...
In 1992, Electricité de France (EDF) decided to improve the degree to which radiological protection was incorporated in the overall management of the utility and set itself the objective of ensuring the same level of protection both for contractors' and EDF's workers. This decision was taken in a context marked by a deterioration in exposure values...
Compared the effectiveness of several methods for statistically adjusting college grade point average (GPA) criteria for course and departmental differences in grading standards, using 1st-semester grades from an entire entering class at a large state university. Most of the adjusted GPAs produced by these methods functioned similarly and, despite...
The aim of this study was to delineate departmental differences in the length of time that doctoral students take to receive their degrees and the institutional characteristics linked with it. Variables describing graduate departments in three disciplines (chemistry, English, and psychology) and their parent universities were obtained from availabl...
Appraised 2 explanations for sex differences in over- and underprediction of college grades by the Scholastic Aptitude Test: sex-related differences in (1) the nature of the grade criterion and (2) the variables associated with academic performance. An entire freshman class at a large state university was studied. Women's GPA was underpredicted (an...
This study compared the effectiveness of several existing and proposed methods for statistically adjusting college GPAs for course and departmental differences in grading standards, using first-semester grades from an entire entering class at a large state university. Most of the adjusted GPAs produced by these methods functioned similarly and, des...
This study examined the role that sex-related differences in the nature of the grade criterion and in variables associated with academic performance play in the over- and underprediction of college grades by the SAT when the test is used alone and in combination with high school grades. An entire freshman class at a large state university was studi...
This study assessed the usefulness of response latency data for biographical inventory items in improving the inventory's validity. The Armed Services Applicant Profile (ASAP) was computer administered to Navy recruits, and the regular score, latency-weighted scores, and measures of deviant latencies were obtained. The latency-weighted scores did n...
The goal of this study was to appraise the construct validity of a videotape-based situational test, the Interpersonal Competence Instrument (ICI), that calls for the examinee to take the role of a superior talking to a subordinate in a business setting. The ICI was administered to college students along with a battery of other devices: self-rating...
A recent multidimensional scaling analysis of item response data for the TOEFL® identified clusters of items in the test sections and suggested that these clusters might be more homogeneous and more distinct than their parent sections, and hence better suited for diagnostic use. The present study explored the feasibility and value of using such cl...
The aim of this study was to appraise whether different forms of the SAT used since the mid-1970s varied in their correlations with academic performance criteria in the same cohort of examinees. A 1975 form and a 1985 form were administered to equivalent samples of high school juniors, and self-reported grade-point average and high school rank were...
Multidimensional scaling was used to analyze item response data for the Test of English as a Foreign Language (TOEFL) to uncover the dimensions underlying the test. Four dimensions were identified for samples varying in native language and level of English proficiency: 3 corresponded to the test's sections and 1 was an end-of-test phenomenon. Dimen...
The aim of this study was to construct and validate a biographical inventory to measure personality traits that are predictive of leadership. The experimental inventory, consisting of tentative scales for Dominance, Emotional Stability, Need for Achievement, Self-Confidence, and Sociability, was administered to 642 incoming midshipmen at the Naval...
Responses on the TOEFL® test may reflect both the influence of the examinees' native language and their level of English proficiency. The aim of this study was to appraise the effect of these examinee variables on the structure of the test. The interrelations among TOEFL items, using all of the information provided by the various responses to the i...
The aim of this study was to appraise the extent to which the Graduate Record Examinations (GRE) General Test measures the same constructs for older test takers that it does for younger examinees. Confirmatory factor analyses of the items in the test were carried out for three samples of examinees, 20 to 29 years old, 30 to 39 years old, and 40 to...
The membership of the Society for Personality and Social Psychology was surveyed in order to determine its opinions about a variety of matters concerning the Society, including the relations of the Society and its members with the American Psychological Association, the Society's structure, its publications, and its convention program. How the surv...
The aim of this study was to appraise the extent to which the GRE General Test has the same construct validity for older test takers as it has for other examinees by comparing the interrelationships among the test items for subgroups of examinees defined by age. The items in the test were factor analyzed for three samples of examinees, 20-29 years...
The stability of a partial correlation index and two other methods (comparisons of item characteristic curves and comparisons of item difficulties) in assessing race (white vs. black) and sex differences in the performance of verbal items on the GRE Aptitude Test was evaluated. In general, the partial correlation index, like the other indexes, exhibi...
This paper is the second in a series of reports emanating from a four‐year research program meant to further knowledge of college and graduate admissions testing and handicapped people. The purpose of this paper is to document existing research on the test performance of handicapped people with respect to admissions and other similar tests. In addi...
The aim of this study was to evaluate the effects of disclosing a Scholastic Aptitude Test (SAT) form on the retest performance of examinees who initially took the disclosed form and subsequently took a different form. Retest performance was compared for three random samples of examinees who took the SAT as high school juniors in the May 1981 admi...
Verbal items on the GRE Aptitude Test were analyzed for race (white vs. black) and sex differences in their functioning, using a new procedure—item partial correlations with subgroup standing (race or sex), controlling for total score—as well as two standard methods—comparisons of subgroups' item characteristic curves and item difficulties. The par...
The aim of this study was to determine the major dimensions of social stratification for whites as well as blacks. A survey was conducted with household heads of both races in the Toledo, Ohio, area, using an interview that covered a comprehensive set of potentially important variables. Eighteen first-order factors were found for whites and 19 for b...
A prototype measure of interpersonal competence designed to measure effectiveness in dealing with other people, the Interpersonal Competence Instrument (ICI), was developed. The ICI is based on the videotape presentation of scenes of subordinates talking to a superior in a business setting. The examinee takes the role of the superior, his or her ta...
The value of noncognitive measures in medical school admissions was assessed in light of the existing literature. These measures appear to have limited usefulness in predicting success in academic work but may be valuable in forecasting both performance in clinical training and performance as a physician, as well as forecasting choice of the typ...
The items on the GRE Aptitude Test were analyzed for race (white vs. black) and sex differences in their functioning, using three procedures for identifying items that perform differently: item partial correlations with subgroup membership (race or sex), controlling for total score; comparisons of item characteristic curves for subgroups; and plots...
This study is concerned with the ability of standard indexes of socioeconomic status and similar devices to measure major dimensions of social stratification for whites and blacks. Factor analyses of survey data for samples of both races uncovered five matching factors and two others unique to a single sample, all of which seem to represent import...
This study determines major dimensions of social stratification for whites and blacks and explores the existence of distinct social classes. It is based on a survey conducted on 225 white and 206 black household heads in Toledo, Ohio, using a highly structured interview that included variables reflecting major theoretical dimensions of stratificati...
Examined the factor structure of the Personality Research Form (PRF) and its relations with response styles. Ss were 71 11th and 12th graders who also completed a battery of response bias measures. In general, the PRF content scales correlated moderately with each other and with measures of acquiescence, social desirability, and defensiveness respo...
To assess the validity of naive Ss' implicit personality theories, the correspondence among the theories, and the influence of social desirability on them, 18 female high school seniors classified the items from the Minnesota Multiphasic Personality Inventory Psychopathic Deviate scale into clusters representing different traits. These clusters agr...
This study's aim was to explore the relationship of acquiescence, social desirability (SD), and defensiveness response styles with first, second, and any higher order factors on the 16 PF. All the various kinds of response bias indexes were appreciably correlated with the first order factor scales. Each kind of response style measure predominantly...
Presents a fable that was suggested by Richard H. Willis's models of social influence and his response to the authors' findings about them. The main topics discussed in this fable include social influence and conformity.
Examined the dimensionality of responses of high school students to group pressure and its generality across procedures employing different kinds of social situations and experimental tasks. 2 group pressure procedures were based on the Asch situation (counting clicks and responding to attitude items), and 2 others involved questionnaires with fict...
Explored test-wiseness on self-report personality scales, using measures of accuracy in estimating the frequency of endorsement of personality items, estimating their social desirability, and identifying and "keying" items that measured the same factor, as well as indexes of ability to change scores on standard personality scales when they were adm...
This study examined the dimensionality of responses to group pressure and their generality across procedures employing different kinds of social situations and experimental tasks. Two group pressure procedures were based on the Asch situation (counting clicks and responding to attitude items), and two procedures involved questionnaires with fictiti...
Reviews studies on the use of deception in psychological research, indicates other directions that such investigations might take, and suggests solutions to the problems posed by this tactic. Deception is widely used, but its efficacy is rarely evaluated. Ss' suspicion is a useful index of effectiveness and the only aspect that has been investigate...
The research reported in this study explores two problematic avenues of conformity research: (1) the widely assumed generality of diverse measures of group pressure, and (2) the dimensionality of conformity, anticonformity, and independence. Two conformity situations, present and nonpresent norm groups, were used with two tasks (an objective counting o...
This study investigated the usefulness of desirability judgments of personality items as an indirect measure of the judge's personality. Self-reports and judgments were compared on separate forms of five personality scales. With a few striking exceptions, the two kinds of measures were generally unrelated, and they had markedly different patterns o...
Reviews published studies to appraise the prevalence and effectiveness of deception in psychological research. Some substantive areas rely heavily on deception and are highly consistent in their use of certain kinds of deceptions. Few studies using this tactic reported any information about Ss' suspicions of the deceptions, regardless of the studi...
Ss' suspicions were appraised about 2 conformity procedures: a simulated-group version of the Asch situation and questionnaires with fictitious norms. Many Ss suspected that the purpose of both procedures was to determine whether their responses would be influenced by others, that they did not hear spontaneous responses by others in the simulated g...
Replicated and extended earlier studies which found that 2 indirect measures of compulsivity (the Strong Accountant scale and a ratio score of reading speed to vocabulary) moderated the correlations of other Strong interest scales with grade-point average (GPA) for male engineering freshmen: the correlations were higher for the less compulsive stud...
This study, the last in a series by the present authors, investigated the ability of the Myers-Briggs Type Indicator, a self-report inventory, to predict grades and dropout at Wesleyan and Caltech, both for classes and, using estimates, for Caltech applicants. Continuous scores derived from the Indicator's four scales had some ability to predict th...
This study examined the relationship between the difficulty of items on an achievement-type test of report writing and (a) the items' correlations with the test's criticalness response style score, and (b) the items' correlations with the test's content score. Item readability and item format variables were also analyzed. The major findings were th...
The Myers-Briggs Type Indicator is a self-report inventory which is intended to measure four variables stemming from the Jungian personality typology: extraversion-introversion, sensation-intuition, thinking-feeling, and judging-perceiving. The construct validity of each of its scales was assessed in a series of studies which investigated the scale...