Looking for evidence of the Dunning-Kruger effect: an analysis of 2400 online test takers

Submitted: 15th of February 2021 DOI: 10.26775/OP.2021.08.29
Published: 29th of August 2021 ISSN: 2597-324X
Emil O. W. Kirkegaard Arjen Gerritsen
OpenPsych
Abstract
The Dunning-Kruger effect is a well-known psychological finding. Unfortunately, there are two aspects of the finding, one
a trivial, statistically necessary empirical pattern, and the other an unsupported theory that purports to explain this
pattern. Recently, Gignac & Zajenkowski (2020) suggested two ways to operationalize and test the proposed theory. They
found no support for the theory’s predictions. We carried out a replication of their study using archival data from a
large dataset of online subjects (n = 2,413). We used two measures of self-estimated ability: estimated sumscore
(correct responses), and estimated own-centile. Both had strong correlations with objective performance (r’s .50 and
.54), but we find no evidence of nonlinearity for either. We find some limited evidence of heteroscedasticity for
self-centile estimates, but not raw score estimates. Overall, the evidence was mostly incongruent with Dunning-Kruger
theory.
Keywords: Dunning-Kruger effect, regression towards the mean, intelligence, science knowledge, self-estimated
intelligence, self-perception, replication
1 Introduction
The Dunning-Kruger effect is one of the most popular psychological findings. The original study has collected
about 6,600 citations on Google Scholar since being published in 1999 (Kruger & Dunning, 1999). The typical
Dunning-Kruger pattern is shown in Figure 1.
In the typical Dunning-Kruger pattern, there is a positive relationship between measured ability and
self-estimated (“perceived”) ability or performance. However, it can be seen by comparing the two lines that below
average persons tend to overestimate themselves quite strongly, while above average persons underestimate
themselves, but less strongly than the below average persons overestimate themselves. The theory advocated to
explain this pattern is that below average subjects not only suffer from a lack of ability to perform well, but also
suffer from below average metacognitive ability to recognize their own poor performance.
However, criticisms of the original study were soon published, though these were mostly ignored (Ackerman et al.,
2002; Krueger & Mueller, 2002; see Schimmack, 2020, for a review). In fact, the familiar Dunning-Kruger pattern
arises from two simple facts. First, self-estimated ability is positively, but imperfectly, correlated with actual
ability. A large meta-analysis found a mean observed r = .33 (Freund & Kasten, 2012). Thus, from a regression
perspective, the expected true ability level of a person is much closer to the mean than their self-estimate, which
is why this has sometimes also been referred to as an example of regression towards the mean (Krueger & Mueller,
2002; see in general, Dalliard, 2017). Second, there is a general tendency to overestimate one’s own performance
(called the illusory superiority effect or the better-than-average effect; Zell et al., 2020). When these
two facts are combined, they yield the familiar Dunning-Kruger pattern, shown in Figure 2.
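
To make the two-facts argument concrete, here is a minimal simulation sketch in Python (the paper's own simulation is the web app cited in the Figure 2 caption, not this code). It assumes standard normal true ability, a self-rating that correlates r = .30 with it, a constant 0.50 SD overestimation bias, and 4 equal-sized bins of true ability, matching the values given for Figure 2. The function name and its parameters are hypothetical.

```python
import random
import statistics

def simulate_dunning_kruger(n=40_000, r=0.30, bias_sd=0.50, n_bins=4, seed=1):
    """Simulate true and self-rated ability (z-scores) and bin by true ability.

    Returns one (mean true score, mean self-rated score) pair per bin,
    ordered from the lowest to the highest bin of true ability.
    """
    rng = random.Random(seed)
    noise_weight = (1 - r ** 2) ** 0.5
    pairs = []
    for _ in range(n):
        true = rng.gauss(0, 1)
        # Self-rating correlates r with true ability, plus a constant upward bias.
        self_rated = r * true + noise_weight * rng.gauss(0, 1) + bias_sd
        pairs.append((true, self_rated))
    pairs.sort(key=lambda p: p[0])
    size = n // n_bins
    bins = []
    for i in range(n_bins):
        chunk = pairs[i * size:(i + 1) * size]
        bins.append((statistics.fmean([t for t, _ in chunk]),
                     statistics.fmean([s for _, s in chunk])))
    return bins

for i, (true_mean, self_mean) in enumerate(simulate_dunning_kruger(), start=1):
    print(f"bin {i}: true = {true_mean:+.2f}, self-rated = {self_mean:+.2f}, "
          f"gap = {self_mean - true_mean:+.2f}")
```

Under these assumptions the lowest bin overestimates itself by roughly 1.4 SD while the highest bin underestimates itself by roughly 0.4 SD, even though no group was given worse metacognition than any other.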
The simulated results closely approximate the real results. Though everybody has some accuracy, the below
average subjects are more in error than the above average subjects. Those high in ability tend to underestimate
themselves, but less so than the below average subjects overestimate themselves. Because this pattern
arises from the two simple statistical facts mentioned above, there is nothing for the metacognitive theory advanced
by Dunning, Kruger and others to explain, leaving it in an uncertain position.
Figure 1: Typical Dunning-Kruger pattern. Reproduced from Kruger & Dunning (1999).
Figure 2: An example of the Dunning-Kruger pattern. Based on simulated data from
http://emilkirkegaard.dk/understanding_statistics/?app=DunningKruger. In this case, a correlation of .30 is assumed
between true and self-rated scores, as well as a 7.5 IQ (0.50 d) overestimation bias. The values are then binned into
4 bins.
Ulster Institute for Social Research, United Kingdom, Email: emil@emilkirkegaard.dk
Independent Researcher, The Netherlands, Email: arjengerritsen@icloud.com
Recently, however, Gignac & Zajenkowski (2020) proposed two different ways to test the theory. The core claim
of the Dunning-Kruger theory is that below average subjects on some trait are lacking in a metacognitive ability
to estimate themselves correctly in some sense. The purported evidence for this is the greater difference between
these subjects’ centile estimates and their actual centiles. Since this arises trivially from the above two facts,
this is not evidence of the Dunning-Kruger theory.¹ However, a different way to operationalize this theory, that is,
to derive a testable prediction from it, is to note that it implies that below average persons should have a weaker
association between their self-estimates and their measured ability. In statistical terms, the relationship should
exhibit heteroscedasticity with greater residual variance in the below average ability region (‘dual burden’). A
second derived prediction is that the association between self-estimated and real ability should diverge from the
overall trend, with a weaker slope in the below average region. Both of these predictions involve the below average
persons being worse in some sense at predicting their own performance level. These predictions are testable using
existing methods and data. Gignac & Zajenkowski (2020) tested both predictions in a dataset of 929 subjects who had
taken the Raven’s advanced progressive matrices test (a standard nonverbal intelligence test) and who had
estimated themselves on a 1-25 scale. First, they found no evidence of heteroscedasticity using the Glejser test.
This test involves saving the residuals from the linear model (self-estimated ability ∼ objectively measured
ability, where ∼ denotes “regressed on”), converting them to absolute values, and correlating them with the
predictor (i.e., objectively measured ability). The correlation was -.05 with a 95 % confidence interval of -.11 to
.02. Second, they looked for a nonlinear association using a model comparison with a quadratic model. The model
comparison found no incremental validity of the nonlinear model (incremental R² < 1 %). They plotted the data using
a smoothing function (local regression, LOESS), which also showed no notable deviation from linearity. The
purpose of this paper was to replicate the findings of Gignac & Zajenkowski (2020) in a new and larger sample
using more robust methods for testing for heteroscedasticity and nonlinearity.
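
The Glejser procedure described above can be sketched in a few lines of Python. This is an illustration on simulated data, not the code used by Gignac & Zajenkowski or in the present study. The helper names, the slope of 0.5, and the noise levels are hypothetical choices made only to contrast a homoscedastic case with a 'dual burden' case in which self-estimates are noisier below average ability.

```python
import random

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def glejser_r(ability, self_estimate):
    """Correlate |residuals| of self_estimate ~ ability with ability."""
    n = len(ability)
    mean_x, mean_y = sum(ability) / n, sum(self_estimate) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(ability, self_estimate))
             / sum((a - mean_x) ** 2 for a in ability))
    abs_resid = [abs((b - mean_y) - slope * (a - mean_x))
                 for a, b in zip(ability, self_estimate)]
    return pearson_r(ability, abs_resid)

def simulate(n, noise_sd_low, noise_sd_high, seed):
    """Hypothetical data: noise SD differs below vs. above average ability."""
    rng = random.Random(seed)
    ability = [rng.gauss(0, 1) for _ in range(n)]
    estimate = [0.5 * a + rng.gauss(0, noise_sd_low if a < 0 else noise_sd_high)
                for a in ability]
    return ability, estimate

r_equal = glejser_r(*simulate(5000, 1.0, 1.0, seed=1))   # same noise everywhere
r_burden = glejser_r(*simulate(5000, 1.5, 0.5, seed=1))  # noisier below average
print(f"Glejser r, homoscedastic: {r_equal:+.2f}")
print(f"Glejser r, dual burden:   {r_burden:+.2f}")
```

A Glejser correlation near zero, as in the first case, is what the reported -.05 looks like. The metacognitive theory would instead predict a clearly negative value, as in the second case.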
2 Data
We used archival data from an online pilot test of a new 25-item science knowledge scale under development
(the items can be found in the supplementary materials). During test development, we posted a link to a
questionnaire on Twitter with a science knowledge scale. The tweet was retweeted by some prominent users
and went viral, resulting in about 2400 subjects taking the test. Aside from their performance on the items,
they filled out a few related questions such as their age and educational attainment. Two of these asked them to
assess their own performance:
The previous page featured 25 questions testing your knowledge. How many correct answers do you
think you gave?
With regards to your knowledge of science, what percentile of the general population do you think you
are in?
All data, R analysis code, and materials are available at https://osf.io/fhqap/. The R notebook can also be
viewed at https://rpubs.com/EmilOWK/Dunning_Kruger_2021.
3 Results
We scored the cognitive ability data using both simple sum scores (sum of correct answers) and item response
theory (IRT) analysis (DeMars, 2010), using the 2PL model as implemented in the mirt package for R (Chalmers
et al., 2020). The latent variable extracted this way we labelled g, since it approximates the general factor of
intelligence (Jensen, 1998). Figure 3 shows the distribution of scores by scoring method. Their correlation was
.95. The estimated reliability was .74 for the actual data, and .74 with an assumed perfect normal distribution.²
These values are probably underestimates of the test-retest reliability. For instance, the retest reliability for the
similar WAIS-R information scale was .81 in a sample of 101 elderly persons who were tested again after 1 year
(Snow et al., 1989), and reached .92 in a representative sample of 100 Australians (Shores & Carstairs, 2000).
Cronbach’s alpha for the same data was .68.
Figure 3: Distributions of scientific knowledge by scoring method. Left panel shows sum scores, and the right panel,
item response theory standard scores (density curve overlaid).
¹ In the Bayesian sense, to be evidence of something, the data must be more probable on some model compared to
another model. However, since the statistical artifact model (overestimation + imperfect linear association)
predicts the exact same Dunning-Kruger pattern, this pattern is not evidence of any Dunning-Kruger model, but must
be neutral. Insofar as the simple model involves only known facts and does not require positing a new mechanism, its
prior probability is higher and thus the posterior is also higher given the available evidence.
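
For readers unfamiliar with the Cronbach's alpha statistic mentioned above, the following Python sketch shows the standard formula, alpha = k/(k-1) × (1 - sum of item variances / variance of the sum score). The tiny response matrix is hypothetical and only there to make the arithmetic easy to follow; it is not the study's data, and the IRT-based reliabilities from mirt are computed differently.

```python
import statistics

def cronbach_alpha(item_scores):
    """item_scores: one list per respondent, holding that person's item scores."""
    k = len(item_scores[0])
    item_vars = [statistics.variance([row[i] for row in item_scores])
                 for i in range(k)]
    total_var = statistics.variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical 0/1 responses: 5 respondents x 3 items.
toy_responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]
print(f"alpha = {cronbach_alpha(toy_responses):.3f}")
```

Alpha rises with the number of items and with how strongly the items covary, which is one reason a short 25-item scale can be expected to have a modest value.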
Both distributions were close to normal despite the somewhat unusual recruitment method and the previously untested
test. Since we lack normative data, we don’t know which Greenwich IQ (British norms) the results compare to, but
this is likely an above average group, as it was recruited from the followers of Twitter users who post a lot about
science. Table 1 shows descriptive statistics for the numerical variables.
Table 1: Descriptive statistics for numerical variables. SD = standard deviation, MAD = median absolute deviation from
the median. g = general intelligence factor.
Var               n     Mean   Median  SD      MAD    Min    Max      Skew   Kurtosis
score             2413  15.28  15.00   3.48    2.97   0.00   25.00    -0.15  0.03
g                 2412  0.00   0.00    1.00    1.04   -3.84  2.66     -0.03  -0.23
centile guess     2406  68.26  70.00   20.21   19.27  0.00   100.00   -0.76  0.16
score guess       2408  14.06  14.00   4.29    4.45   0.00   25.00    -0.21  -0.27
age               2412  35.92  33.00   13.44   11.86  12.00  130.00   1.54   5.36
time taken (min)  2413  20.19  8.85    169.65  3.78   0.37   5619.52  27.39  821.36
The average centile guess was 68th, which is above average (d = 0.47 above the 50th centile). However, since this
was a self-selected sample and the question asked about the general population, this estimate is not necessarily too
high. The mean score guess was 14 and the mean score obtained was 15.3, thus subjects slightly underestimated
their ability level. The mean age was 36 with a standard deviation of 13.4. This is somewhat younger than the
adult population average (40-45), but much closer to representative than typical college/university student
samples. The questionnaire did not ask about sex/gender, so the distribution is unknown. However, based on
prior surveys by the author, it is likely to be about 90 % male. The median time to take the survey was 8.9
minutes. The educational attainment levels were: 37 % bachelor’s, 25 % master’s, 12 % doctorate. Considering
that much of the sample was too young to have completed their education entirely, this is a relatively elite
sample.
Figure 4: Scatterplot of relationship between two different self-estimated ability variables.
² For details of the calculation, see documentation for marginal_rxx() and empirical_rxx() in the mirt package.
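
The d = 0.47 figure for the 68th centile follows from the inverse of the normal cumulative distribution function. The short Python sketch below shows the conversion. It assumes a normally distributed trait, and the IQ conversion (mean 100, SD 15) is added only for illustration. The function name is hypothetical.

```python
from statistics import NormalDist

def centile_to_d(centile):
    """Distance from the 50th centile in SD units, assuming a normal trait.

    Only defined for centiles strictly between 0 and 100.
    """
    return NormalDist().inv_cdf(centile / 100)

d = centile_to_d(68)
print(f"68th centile: d = {d:.2f}, or about IQ {100 + 15 * d:.0f}")
```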
We had two different sets of self-estimates: one based on the raw score and another based on the centile relative
to the general population. Surprisingly, the estimates correlated only at .61, as shown in Figure 4. In an
ideal world, these two variables should be near perfectly correlated. However, since they were not, this opens
questions about differential validity and perhaps incremental validity.
The distributions of the self-estimates were otherwise unremarkable, as shown in Figure 5.
Moving on to the main tests, since we had two measures of self-estimated ability, we had two main tests of the
Dunning-Kruger effect. Figure 6 (6a-6b) shows the results for the nonlinear fits.
The correlations are relatively strong: r’s .52 and .48, for score ∼ self-estimated score, and g ∼ self-estimated
ability centile, respectively (both p’s < .001). The relationship between self-estimated ability and objectively
measured ability is close to linear with the exception of the area below about -2 z. The upward pattern there is
caused by a few outliers with very poor scores and maximum self-rated ability. These are likely not serious survey
responses (Alexander, 2013). We left the outliers in the dataset here to illustrate the dangers of running tests
without first plotting the data. Figure 7 shows the same plots but with data points beyond 2.5 z in either direction
removed (21 cases removed).
With the outliers removed, the association in the plots is near-perfectly linear. To be fair, when tested for
nonlinearity using a model comparison (natural spline model vs. linear model) with a likelihood ratio test
(specifically, lrtest() from the rms package; Harrell, 2019), one finds small p values (p = .004 and p < .001), and
thus evidence for nonlinearity, but the deviation from linearity was too small to be of practical importance (model
adj. R² changes: 0.3 % and 1.5 %). Thus, contrary to the predictions of the Dunning-Kruger theory, we find
essentially no nonlinearity in the estimates.
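
The point that a small p value can accompany a negligible gain in fit is easy to reproduce. The Python sketch below is an illustration only: it swaps the natural spline for a single quadratic term so that the likelihood ratio statistic has 1 degree of freedom and its p value can be computed with the standard library. The simulated slope, curvature, and noise level are hypothetical, and this is not the rms::lrtest() analysis reported above.

```python
import math
import random

def residualize(y, x):
    """Residuals of y after a simple OLS regression on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return [(b - mean_y) - slope * (a - mean_x) for a, b in zip(x, y)]

def linear_vs_quadratic(x, y):
    """Likelihood ratio test of y ~ x against y ~ x + x^2 (one extra term)."""
    n = len(x)
    e_y = residualize(y, x)                    # left over after the linear fit
    e_q = residualize([a * a for a in x], x)   # part of x^2 not explained by x
    rss_linear = sum(e * e for e in e_y)
    # Frisch-Waugh-Lovell: drop in RSS from adding x^2 to the linear model.
    drop = sum(a * b for a, b in zip(e_y, e_q)) ** 2 / sum(b * b for b in e_q)
    mean_y = sum(y) / n
    tss = sum((b - mean_y) ** 2 for b in y)
    lr = n * math.log(rss_linear / (rss_linear - drop))
    p_value = math.erfc(math.sqrt(lr / 2))     # chi-square upper tail, 1 df
    return {"delta_r2": drop / tss, "lr": lr, "p": p_value}

def simulate(n, curvature, seed):
    """Hypothetical data: mostly linear relation with an optional x^2 term."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [0.5 * a + curvature * a * a + rng.gauss(0, 0.85) for a in x]
    return x, y

result = linear_vs_quadratic(*simulate(2400, curvature=0.05, seed=3))
print(f"delta R^2 = {result['delta_r2']:.4f}, LR = {result['lr']:.2f}, "
      f"p = {result['p']:.4f}")
```

With a sample of about 2,400, a curvature that adds well under 1 % of explained variance can still be detectable by the test, which is why the size of the R² change is the more informative quantity here.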
Figure 5: Distributions of self-estimated science knowledge.
Figure 6: Scatterplots showing linear (orange) and LOESS (blue; locally estimated scatterplot smoothing) fits for
self-estimated ability and measured ability. Left plot: sumscore and self-estimated sumscore. Right plot: item
response theory g score and self-estimated ability centile.
Figure 7: Scatterplots showing linear (orange) and LOESS (blue; locally estimated scatterplot smoothing) fits for
self-estimated ability and measured ability. Left plot: sumscore and self-estimated sumscore. Right plot: item
response theory g score and self-estimated ability centile. Outliers beyond 2.5 z removed.
Turning to the question of heteroscedasticity, we employed the same method as in Kirkegaard (2021). The
approach is as follows:
First, the model of interest is fit. This is the statistical model that one wants to evaluate for heteroscedasticity.
Second, the residuals are saved, standardized, and then converted to positive (absolute) values.
Third, linear and nonlinear models are then fit to the residuals using the predictor of interest to look for
evidence of heteroscedasticity.
In the simulation study carried out by Kirkegaard (2021), it was found that this approach was able to detect
real heteroscedasticity, and without excessive false positives. It can also detect the difference between linear and
nonlinear heteroscedasticity, though not with optimal statistical properties (elevated false positive rates with
regards to confusion between types of heteroscedasticity). Figure 8 illustrates the concept of heteroscedasticity.
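
The three-step approach can be sketched as follows in Python. This is a hedged illustration rather than the authors' R implementation: the nonlinear model is assumed to be a quadratic, the helper names are hypothetical, and the data are simulated so that the residual spread is larger at both ends of the predictor (the nonmonotonic type). The example is chosen because a purely linear check on the absolute residuals misses this shape.

```python
import random

def ols_residuals(columns, y):
    """Residuals from least squares of y on an intercept plus the given columns."""
    X = [[1.0, *row] for row in zip(*columns)]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, solved by Gauss-Jordan elimination.
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(k)]
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[pivot] = A[pivot], A[i]
        A[i] = [v / A[i][i] for v in A[i]]
        for r in range(k):
            if r != i:
                A[r] = [a - A[r][i] * b for a, b in zip(A[r], A[i])]
    beta = [A[i][k] for i in range(k)]
    return [t - sum(b * v for b, v in zip(beta, row)) for row, t in zip(X, y)]

def r_squared(columns, y):
    resid = ols_residuals(columns, y)
    mean_y = sum(y) / len(y)
    return 1 - sum(e * e for e in resid) / sum((t - mean_y) ** 2 for t in y)

def heteroscedasticity_check(x, y):
    # Step 1: fit the model of interest (here y ~ x) and keep its residuals.
    resid = ols_residuals([x], y)
    # Step 2: standardize the residuals, then take absolute values.
    sd = (sum(e * e for e in resid) / (len(resid) - 1)) ** 0.5
    abs_z = [abs(e / sd) for e in resid]
    # Step 3: fit linear and nonlinear (assumed quadratic) models of the
    # absolute residuals on the predictor.
    return {"linear_r2": r_squared([x], abs_z),
            "quadratic_r2": r_squared([x, [v * v for v in x]], abs_z)}

def simulate(n, u_shaped, seed):
    """Hypothetical data; noise SD grows with x^2 when u_shaped is True."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [0.5 * a + rng.gauss(0, (0.5 + 0.5 * a * a) if u_shaped else 1.0)
         for a in x]
    return x, y

for label, flag in [("homoscedastic", False), ("U-shaped spread", True)]:
    res = heteroscedasticity_check(*simulate(5000, flag, seed=11))
    print(f"{label}: linear R^2 = {res['linear_r2']:.3f}, "
          f"quadratic R^2 = {res['quadratic_r2']:.3f}")
```

Comparing the two R² values is what allows linear and nonlinear heteroscedasticity to be told apart, at the cost of the confusion between types noted above.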
Figure 8: Types of heteroscedasticity. 1) no heteroscedasticity (homoscedasticity), 2) monotonic linear increasing
heteroscedasticity, and 3) nonmonotonic nonlinear heteroscedasticity.
For the g and centile guess relationship, we find strong evidence of linear heteroscedasticity, and it is
concentrated as increased residual variance in the below average region. Figure 9 shows the estimated 10th and 90th
centiles, using quantile generalized additive model smoothing.
Figure 9: Estimated 10th and 90th centiles of ability variables for the complete dataset.
It can be seen that the left plot shows essentially no heteroscedasticity (i.e., the spread around the regression
line is constant), while the right plot shows some nonmonotonic nonlinear heteroscedasticity. However, we
reasoned this was likely due to the outliers at the very low end of the ability range, as was seen in the prior
section. Thus, we reran the tests on the reduced dataset. Figure 10 shows the results.
Figure 10: Estimated 10th and 90th centiles of ability variables for the dataset with outliers removed.
The pattern for the centiles (right plot) is now simpler. The effect size of the heteroscedasticity seen is not large:
about 2 % of the variance in the residuals can be explained by the predictor variable (p < .0001, linear rank
data test). In contrast, the model adj. R² for the sumscores is 0 % (p = .07, linear rank data test, left plot). Thus,
we find very little heteroscedasticity in the estimates, contrary to predictions of the Dunning-Kruger theory.
Finally, we modeled the data to see if the two ways of measuring self-estimated ability had incremental validity
to predict actual ability. Table 2 shows the correlation matrix between the ability variables, while Table 3 shows
the model results.
Table 2: Correlation matrix. Above diagonal: results based on outlier-filtered dataset. Below diagonal: results based
on all datapoints. All correlations p < .0001.
               Score  g     Score guess  Centile guess
Score          1.00   0.95  0.53         0.47
g              0.95   1.00  0.54         0.50
Score guess    0.52   0.53  1.00         0.61
Centile guess  0.46   0.48  0.61         1.00
Table 3: Regression model results for incremental validity tests. *** = p < .001. Outcome variable in the 3 leftmost
models = sumscore, outcome variable in 3 rightmost models = item response theory g factor score. A dash marks a
predictor not included in the model.
Predictor      Sumscore +       Sumscore +       Sumscore         g + centile      g + raw          g combined
               raw guess        centile guess    combined         guess            guess
Intercept      0.00 (0.017)     0.00 (0.018)     0.00 (0.017)     0.00 (0.018)     0.00 (0.017)     0.00 (0.017)
score guess    0.53 (0.017***)  -                0.38 (0.021***)  -                0.54 (0.017***)  0.38 (0.021***)
centile guess  -                0.47 (0.018***)  0.23 (0.021***)  0.49 (0.018***)  -                0.26 (0.021***)
R2 adj.        0.279            0.221            0.313            0.246            0.295            0.337
N              2388             2386             2386             2386             2388             2386
The model results show that in each case, there is notable incremental validity in using more than one measure
of self-estimated ability. The best predictor of own sumscore was the estimated sumscore, but adding
the estimated centile explained another 3 to 4 % of variance (adj. R² .279 to .313 for the sumscore, and .295 to
.337 for the g factor scores).
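
The combined-model coefficients can be approximately recovered from the correlation matrix alone, using the textbook formulas for a regression with two standardized predictors. The Python sketch below applies them to the outlier-filtered correlations for the sumscore from Table 2 (above the diagonal). The function name is hypothetical, and because the inputs are rounded to two decimals and Table 3 reports adjusted R², small discrepancies are expected.

```python
def two_predictor_fit(r_y1, r_y2, r_12):
    """Standardized betas and R^2 for y ~ x1 + x2 from the three correlations."""
    denom = 1 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    r2 = beta1 * r_y1 + beta2 * r_y2
    return beta1, beta2, r2

# Table 2, outlier-filtered: score ~ score guess (.53), score ~ centile guess (.47),
# and score guess ~ centile guess (.61).
beta_score, beta_centile, r2 = two_predictor_fit(r_y1=0.53, r_y2=0.47, r_12=0.61)
print(f"beta score guess = {beta_score:.2f}, beta centile guess = {beta_centile:.2f}, "
      f"R^2 = {r2:.3f}, gain over score guess alone = {r2 - 0.53 ** 2:.3f}")
```

This gives betas of about 0.39 and 0.23 and an R² of about .315, close to the 0.38, 0.23 and adjusted R² of .313 reported for the combined sumscore model. The second predictor adds only modestly because the two self-estimates correlate at .61 with each other.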
4 Discussion
We carried out a large replication study looking for evidence of the Dunning-Kruger metacognitive theory using
two methods. We found no convincing evidence in favor of the theory. What little is found seems to be mostly
due to a small set of outliers who likely provided dishonest data. Our replication sample was about 2.6 times
as large as that of the prior study by Gignac & Zajenkowski (2020; n = 929). For this reason, we have more statistical
precision and our study carries more weight. However, both studies found the same result, namely that the
patterns the Dunning-Kruger theory should generate were not found. Thus, we successfully replicated Gignac
& Zajenkowski (2020). Our study thus casts further doubt on the Dunning-Kruger metacognitive theory.
We found substantially stronger correlations between self-estimated ability and measured ability than indicated
by a prior meta-analysis on this topic, which found r = .33 (Freund & Kasten, 2012), while we find r’s .50 and
.54 (outliers removed). This meta-analysis did not account for known differences in the reliability of ability
measures, or range restriction, as the authors were not able to find these in the published studies. The present
study used a relatively brief cognitive test (25 items), so it seems unlikely our measure was overall more reliable
than their average test (reliability was estimated at .74). Thus, we are not sure why we find a substantially
higher correlation. The moderator analysis in the meta-analysis found that using relative self-ratings (such
as centiles; estimated r = .33 + .09 = .42) produced stronger correlations, whereas we find the raw score
estimate produced slightly stronger correlations. They did not study self-estimates on knowledge tests (such
as ours), but they found no evidence that verbal tests produced more accurate self-estimates. In fact, they
found that numerical tests produced the strongest correlations (estimated r = .33 + .16 = .49). There is also a
large body of research in industrial-organizational psychology that has investigated the role of question formats in
self- and other-estimated abilities. Regrettably, uncertainty exists regarding what has been found, and what should
be used in practice (DeNisi & Murphy, 2017). As such, this body of literature, though large, is unfortunately
not immediately applicable to the interpretation of our results. Taken together then, it is unclear why we find
larger correlations between measured and self-rated ability scores compared to other studies.
References
Ackerman, P. L., Beier, M. E., & Bowen, K. R. (2002). What we really know about our abilities and our
knowledge. Personality and Individual Differences, 33(4), 587–605. doi: 10.1016/S0191-8869(01)00174-X
Alexander, S. (2013, April 12). Lizardman’s constant is 4%. Slate Star Codex. Retrieved from
https://slatestarcodex.com/2013/04/12/noisy-poll-results-and-reptilian-muslim-climatologists-from-mars/
Chalmers, P., Pritikin, J., Robitzsch, A., Zoltak, M., Kim, K., Falk, C. F., . . . Oguzhan, O. (2020). mirt:
Multidimensional item response theory (1.32.1) [computer software]. Retrieved from
https://CRAN.R-project.org/package=mirt
Dalliard, M. (2017, July 1). Measurement error, regression to the mean, and group differences. Human
Varieties. Retrieved from
https://humanvarieties.org/2017/07/01/measurement-error-regression-to-the-mean-and-group-differences/
DeMars, C. (2010). Item response theory. Oxford University Press.
DeNisi, A. S., & Murphy, K. R. (2017). Performance appraisal and performance management: 100 years of
progress? The Journal of Applied Psychology, 102(3), 421–433. doi: 10.1037/apl0000085
Freund, P. A., & Kasten, N. (2012). How smart do you think you are? A meta-analysis on the validity of
self-estimates of cognitive ability. Psychological Bulletin, 138(2), 296–321. doi: 10.1037/a0026556
Gignac, G. E., & Zajenkowski, M. (2020). The Dunning-Kruger effect is (mostly) a statistical artefact: Valid
approaches to testing the hypothesis with individual differences data. Intelligence, 80, 101449. doi:
10.1016/j.intell.2020.101449
Harrell, F. E. (2019). rms: Regression modeling strategies (5.1-3.1) [computer software]. Retrieved from
https://CRAN.R-project.org/package=rms
Jensen, A. R. (1998). The g factor: The science of mental ability. Praeger.
Kirkegaard, E. O. W. (2021). Are there complex assortative mating patterns for humans? Analysis of 340 Spanish
couples. Mankind Quarterly, 61(3), 578–598. doi: 10.46469/mq.2021.61.3.12
Krueger, J., & Mueller, R. A. (2002). Unskilled, unaware, or both? The better-than-average heuristic and statistical
regression predict errors in estimates of own performance. Journal of Personality and Social Psychology, 82(2),
180–188. doi: 10.1037/0022-3514.82.2.180
Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one’s own
incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77(6), 1121–1134.
doi: 10.1037//0022-3514.77.6.1121
Schimmack, U. (2020, September 13). The Dunning-Kruger effect explained. Replicability-Index. Retrieved from
https://replicationindex.com/2020/09/13/the-dunning-kruger-effect-explained/
Shores, E. A., & Carstairs, J. R. (2000). The Macquarie University Neuropsychological Normative Study
(MUNNS): Australian norms for the WAIS-R and WMS-R. Australian Psychologist, 35(1), 41–59. doi:
10.1080/00050060008257467
Snow, W. G., Tierney, M. C., Zorzitto, M. L., Fisher, R. H., & Reid, D. W. (1989). WAIS-R test-retest reliability
in a normal elderly sample. Journal of Clinical and Experimental Neuropsychology, 11(4), 423–428. doi:
10.1080/01688638908400903
Zell, E., Strickhouser, J. E., Sedikides, C., & Alicke, M. D. (2020). The better-than-average effect in comparative
self-evaluation: A comprehensive review and meta-analysis. Psychological Bulletin, 146(2), 118–149. doi:
10.1037/bul0000218
10
... The results show excellent self-knowledge (r = .68) in line with recent results using the same question phrasing with a 25-item science knowledge test (Kirkegaard & Gerritsen, 2021). We also asked participants about their ability level in a different way: "Compared to the other Prolific survey users who took this survey, how well do you think you did?" (in centiles). ...
... Heteroscedasticity was tested using the method used in(Kirkegaard & Gerritsen, 2021). This method is based on the absolute valued residuals and is implemented in the test_HS() function from the kirkegaard package. ...
Article
Full-text available
Vocabulary is one of the best measures of general intelligence (g). However, readily available English vocabulary tests are often based on outdated words or are proprietary. To remedy this problem and to provide a large item pool for computerized adaptive testing (CAT), we sought to construct a new English vocabulary test. We constructed a total of 224 items using different formats (choose 1 of 5, choose 2 of 5, and choose 3 of 5). After initial experimentation, we validated and normed the test using an online US sample from Prolific (N = 441). The resulting full test had near-perfect overall reliability, 0.97. The reliability was high for a wide range of ability, above 0.90 for z scores from-3.89 to 2.32. Differential item functioning testing found negligible bias for sex. The total test score showed a small advantage in favour of women (d = 0.16) in the White sample but virtually no sex difference in the entire sample (d = 0.07). Abbreviated versions of the test were constructed based on multiple algorithms.
... However, utilizing the CSEOF method to analyze spatiotemporal sea level reveals the windowing effect in CSEOF trend mode (Multiplication of CSLVs that explain most of the trend in the altimetry sea level 330 anomalies and respective PC), as discussed by Hamlington et al. (2019). This effect exhibits nonmonotonic, nonlinear heteroscedasticity, characterized by elevated standard deviations at the start and end of extracted trend components, decreasing towards the middle (Kirkegaard and Gerritsen, 2021). To clarify this issue, we use CSEOF decomposition on AVISO altimetry sea level anomalies and estimate the regional average of the trend modes, as shown in Figure 4. Notably, areas like Northern Europe from 1993 to 1997 ...
Preprint
Full-text available
Rapidly rising sea level is one of the major adverse consequences of anthropogenic climate change. Sea level rise poses an existential threat to coastal populations, particularly for urban settlements with accelerating growth rates. Contemporary empirical sea level reconstructions have been used to conflate short-term (~3 decades) satellite altimetry geocentric sea level data and long-term (50 years or longer) tide gauge records to better estimate reliable sea level rise towards multi-decadal to centennial time scales. However, adequate separations and quantifications of low-frequency climate patterns and sea level trends globally at regional scales remain elusive. Here, we propose a new sea level reconstruction framework that incorporates Empirical Orthogonal Function (EOF) into the contemporary Cyclostationary EOF with Reduced Space Optimal Interpolation (CSEOF-OI) algorithm to better reconstruct sea level fields. Using 225 selected long-term gap-filled tide gauge records with vertical land motion adjusted and satellite altimetry, our global reconstructed monthly sea level time series, January 1950– January 2022, exhibit distinct delineations between modeled climate patterns and sea level trends at regional scales. The separated sea level patterns include trends, modulated annual cycles, the El Niño Southern Oscillation (ENSO), and the Pacific Decadal Oscillation (PDO). The third principal component of the reconstructed sea level exhibits a Pearson correlation coefficient of 0.87 with the Niño 3.4 ENSO index, and the fourth principal component correlates at 0.75 with the PDO index, indicating excellent agreement. The global mean sea level trend, accounting for the predominant climate periodicities, is 1.9 ± 0.2 mm yr⁻¹ (95 % confidence), and the estimate during the satellite altimetry era (January 1993–December 2021) is 3.2 ± 0.3 mm yr⁻¹ (95 % confidence). 
Compared with previous studies, we conclude that our 72-year sea-level reconstruction allows us to better separate the ENSO and PDO climate patterns, as well as the sea level they induced. Finally, we show that the short-term (5-year) rates of ENSO and PDO patterns significantly affect sea level both on a global and regional scale, altering global mean sea level trends by up to 1.1 ± 0.5 mm yr¹ (January 2011–January 2016). Over the past seven decades, the climate pattens exerted a minor impact on sea level trends, but substantially modulated apparent regional sea level accelerations, particularly in the western Pacific (e.g., 0.09 ± 0.05 mm yr⁻² at the Kuroshio Current), and in the east and central equatorial Pacific Ocean (e.g., −0.04 ± 0.03 mm yr⁻² near Costa Rica). The reconstructed sea level and analysis results datasets are available at https://doi.org/10.5281/zenodo.15288817 (Wang, 2025).
... This is based on the empirical_rxx() function. 2 Census values from https://www.census.gov/quickfacts/fact/table/US/PST045223. 3 The tests were based on the squared residuals method, as used inKirkegaard & Gerritsen, (2021). ...
Article
Full-text available
A prior study found that general intelligence (g) is highly predictable from the items of the MMPI-2 (Minnesota Multiphasic Personality Inventory), achieving a cross-validated correlation of .85 in the Vietnam Experience Study, a sample of American military men. Here, the validity of a reduced version of this prediction model with 107 MMPI items was tested in a true out-of-sample dataset using a newly collected online (Prolific) sample of American adults (n = 499) who took a 226-item English vocabulary test. The model had an accuracy of r = .55 compared to the .81 in the original sample. There was some evidence of prediction bias by sex, which is not surprising because the model was trained only on men. A retrained version of the model was fit on the current dataset and achieved an accuracy of .65. Potential applications are discussed. Keywords: Machine learning, Personality, Psychopathology, Intelligence
Article
Full-text available
People tend to hold overly favorable views of their abilities in many social and intellectual domains. The authors suggest that this overestimation occurs, in part, because people who are unskilled in these domains suffer a dual burden: Not only do these people reach erroneous conclusions and make unfortunate choices, but their incompetence robs them of the metacognitive ability to realize it. Across 4 studies, the authors found that participants scoring in the bottom quartile on tests of humor, grammar, and logic grossly overestimated their test performance and ability. Although their test scores put them in the 12th percentile, they estimated themselves to be in the 62nd. Several analyses linked this miscalibration to deficits in metacognitive skill, or the capacity to distinguish accuracy from error. Paradoxically, improving the skills of participants, and thus increasing their metacognitive competence, helped them recognize the limitations of their abilities.
Article
Assortative mating for both physical and psychological traits is well-established in many animal species, including humans. Most studies, however, only compute linear measures of mate similarity, typically Pearson correlations. However, it is possible that trait similarity, or dissimilarity, has complex patterns missed by the correlation metric. We investigated a dataset of 340 Spanish couples for evidence of relationships across 7 traits: age, educational attainment, intelligence, and the scales of the Eysenck Personality Questionnaire: Extroversion, Psychoticism, Neuroticism, and the Lie scale. We replicated well known linear assortative mating for age, intelligence and education. Like most studies, we find weak to no assortative mating for the personality traits. Analysis of nonlinear patterns using regression splines failed to reveal anything beyond the linear relations. Finally, we examined cross-trait variation for couples but we found little of note. Overall, it does not appear that there are complex patterns for traits in human couples.
Article
The better-than-average-effect (BTAE) is the tendency for people to perceive their abilities, attributes, and personality traits as superior compared with their average peer. This article offers a comprehensive review of the BTAE and the first quantitative synthesis of the BTAE literature. We define the effect, differentiate it from related phenomena, and describe relevant methodological approaches, theories, and psychological mechanisms. Next, we present a comprehensive meta-analysis of BTAE studies, including data from 124 published articles, 291 independent samples, and more than 950,000 participants. Results indicated that the BTAE is robust across studies (dz = 0.78, 95% CI [0.71, 0.84]), with little evidence of publication bias. Further, moderation tests suggested that the BTAE is larger in the case of personality traits than abilities, positive as opposed to negative dimensions, and in studies that (a) use the direct rather than the indirect method, (b) involve many rather than few dimensions, (c) sample European Americans rather than East-Asians (especially for individualistic traits), and (d) counterbalance self and average peer judgments. Finally, the BTAE is moderately associated with self-esteem (r = .34) and life satisfaction (r = .33). Results from selection model analyses clarify areas of the BTAE literature in which publication bias may be of elevated concern. Discussion highlights theoretical and empirical implications. (PsycINFO Database Record (c) 2019 APA, all rights reserved).
Article
We review 100 years of research on performance appraisal and performance management, highlighting the articles published in JAP, but including significant work from other journals as well. We discuss trends in eight substantive areas: (1) scale formats, (2) criteria for evaluating ratings, (3) training, (4) reactions to appraisal, (5) purpose of rating, (6) rating sources, (7) demographic differences in ratings, and (8) cognitive processes, and discuss what we have learned from research in each area. We also focus on trends during the heyday of performance appraisal research in JAP (1970-2000), noting which were more productive and which potentially hampered progress. Our overall conclusion is that JAP’s role in this literature has not been to propose models and new ideas, but has been primarily to test ideas and models proposed elsewhere. Nonetheless, we conclude that the papers published in JAP made important contributions to the field by addressing many of the critical questions raised by others. We also suggest several areas for future research, especially research focusing on performance management.
Article
This paper reports on the conorming of the WAIS-R and WMS-R on a stratified random sample of 399 healthy young adults in the age range 18 to 34 years in Sydney, Australia. These data were collected as part of the Macquarie University Neuropsychological Normative Study (MUNNS). In deriving the norms, the influences of age, sex, and education were taken into account, and tables for deriving demographically adjusted scaled scores are provided. Tables for the IQ and memory index scores are provided, with values for the 90% and 95% confidence interval bands around the predicted true scores. The confidence intervals for both test and retest conditions were derived using standard errors of estimation and prediction. Base-rate tables of both within-test and between-test differences are presented, as well as tables for predicting memory index scores from the IQ scores. A hypothetical set of data demonstrates the differences in results using the American norms versus the demographically adjusted Australian norms. Finally, a step-by-step example is presented to demonstrate the use of the MUNNS tables.
Article
Individuals' perceptions of their own level of cognitive ability are expressed through self-estimates. They play an important role in a person's self-concept because they facilitate an understanding of how one's own abilities relate to those of others. People evaluate their own and other persons' abilities all the time, but self-estimates are also used in formal settings, such as, for instance, career counseling. We examine the relationship between self-estimated and psychometrically measured cognitive ability by conducting a random-effects, multilevel meta-analysis including a total of 154 effect sizes reported in 41 published studies. Moderator variables are specified in a mixed-effects model both at the level of the individual effect size and at the study level. The overall relationship is estimated at r = .33. There is significant heterogeneity at both levels (i.e., the true effect sizes vary within and between studies), and the results of the moderator analysis show that the validity of self-estimates is especially enhanced when relative scales with clearly specified comparison groups are used and when numerical ability is assessed rather than general cognitive ability. The assessment of less frequently considered dimensions of cognitive ability (e.g., reasoning speed) significantly decreases the magnitude of the relationship. From a theoretical perspective, Festinger's (1954) theory of social comparison and Lecky's (1945) theory of self-consistency receive empirical support. For practitioners, the assessment of self-estimates appears to provide diagnostic information about a person's self-concept that goes beyond a simple "test-and-tell" approach. This information is potentially relevant for career counselors, personnel recruiters, and teachers.
Article
The Dunning-Kruger hypothesis states that the degree to which people can estimate their ability accurately depends, in part, upon possessing the ability in question. Consequently, people with lower levels of the ability tend to self-assess their ability less well than people who have relatively higher levels of the ability. The most common method used to test the Dunning-Kruger hypothesis involves plotting the self-assessed and objectively assessed means across four categories (quartiles) of objective ability. However, this method has been argued to be confounded by the better-than-average effect and regression toward the mean. In this investigation, it is argued that the Dunning-Kruger hypothesis can be tested validly with two inferential statistical techniques: the Glejser test of heteroscedasticity and nonlinear (quadratic) regression. On the basis of a sample of 929 general community participants who completed a self-assessment of intelligence and the Advanced Raven's Progressive Matrices, we failed to identify statistically significant heteroscedasticity, contrary to the Dunning-Kruger hypothesis. Additionally, the association between objectively measured intelligence and self-assessed intelligence was found to be essentially entirely linear, again, contrary to the Dunning-Kruger hypothesis. It is concluded that, although the phenomenon described by the Dunning-Kruger hypothesis may be to some degree plausible for some skills, the magnitude of the effect may be much smaller than reported previously.
Article
This book addresses an important issue for the design of survey instruments, which is rarely taught in graduate programs beyond those specifically for statisticians. Item Response Theory is used to describe the application of mathematical models to data from questionnaires and tests as a basis for measuring abilities, attitudes, or other variables. It is used for statistical analysis and the development of assessments, often for high stakes tests such as the Graduate Record Examination. This volume includes examples of both good and bad write-ups for the methods sections of journal articles.
Article
Recently, it has become popular to state that “people hold overly favorable views of their abilities in many social and intellectual domains” [Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: how difficulties in recognising one's own incompetence lead to inflated self-assessments. Journal of Personality & Social Psychology, 77(6), (1999) 1121]. Research that supports this point tells only half of the story—in a manner documented by Cronbach’s [Cronbach, L. J. (1957). The two disciplines of scientific psychology. American Psychologist, 12, (1957) 671] classic article on the “two disciplines of scientific psychology.” That is, the recent research has only documented the experimental side of the scientific divide (which focuses on means and ignores individual differences). The current paper shows that research from the other side of the scientific divide, namely the correlational approach (which focuses on individual differences), provides a very different perspective for people’s views of their own intellectual abilities and knowledge. Previous research is reviewed, and an empirical study of 228 adults between 21 and 62 years of age is described where self-report assessments of abilities and knowledge are compared with objective measures. Correlations of self-rating and objective-score pairings show both substantial convergent and discriminant validity, indicating that individuals have both generally accurate and differentiated views of their relative standing on abilities and knowledge.
Article
Despite the relatively short history of psychology as a science, few psychological constructs endure 90 years after their formulation and, moreover, remain fully relevant today. The "g" factor is undoubtedly one of those rare examples, and to confirm its current relevance one need only note its preeminent place in the most widely accepted factorial models of intelligence today, either as a third-order factor in hierarchical models or identified with a second-order factor in the model of the recently deceased R. B. Cattell.