Genetic ancestry and social race are nearly interchangeable

  • Ulster Institute for Social Research


It has been claimed that social race and genetic ancestry are at best weakly related. Here we test this claim by applying predictive modeling in both directions, i.e., predicting genetic ancestry from social race(s), and predicting social race(s) from genetic ancestry. We utilize the public Pediatric Imaging, Neurocognition, and Genetics (PING) dataset (n = 1,391), so that others may examine the data as well. In the simple scenario where we are only concerned with self-identified White, Black, and mixed (Black-White) race individuals (571 Whites, 140 Blacks, 25 mixed), model accuracy was very high. Predicting social race from genetic ancestry resulted in an area under the curve (AUC) of .994, an overall accuracy (concordance) of 98.0 %, and a pseudo-R² of .951. Conversely, predicting genetic ancestry from social race gave an adjusted model R² of .992. Using the full dataset, there are 8 census-type categories of social race. Using cross-validated multinomial regression to predict social race from 6 genetic ancestry variables, we find that the AUC is .89. Using Dirichlet regression to predict ancestries from social race, we find an overall correlation of .94 (R² = .884). Further analyses using more sophisticated methods (random forest, support vector machine) found similar results. In conclusion, social race and genetic ancestry are nearly interchangeable.
Submitted: 15th of November 2021
Published: 22nd of December 2021
DOI: 10.26775/OP.2021.12.22
ISSN: 2597-324X
Emil O. W. Kirkegaard
Keywords: race, genetics, ethnicity
1 Introduction
There is no lack of books and articles arguing that race is a social construct (Evans, 2019; Gould, 1981; Montagu,
1942; Sussman, 2014). Representative headlines in the media include “Race Is Real, But It’s Not Genetic” from
Discover Magazine (Goodman, 2020), while in The Atlantic, we learn that “people’s racial identity may be
statistically correlated with their ancestry, albeit unreliably” (Holmes, 2018), and in Scientific American that
“Racial categories are weak proxies for genetic diversity and need to be phased out” (Gannon, 2016). There was
recently an entire special issue of National Geographic about the supposed unreality or social construction of
race (National Geographic, 2018; Nyborg, 2019). However, such works do not actually examine or report the
strength of the statistical associations between social race and genetic ancestry. Similarly, surveys of geneticists
and anthropologists do not find any consensus about how strong the relationship is (Nelson et al., 2018; Wagner
et al., 2017). Thus, there is a need to quantify how strong the relationship is between social race and genetic
ancestry.
2 Data
We used data from the Pediatric Imaging, Neurocognition, and Genetics (PING) dataset
(Jernigan et al., 2016). This choice was motivated by the availability of the dataset for public use.
Though the dataset does not appear to be in the public domain or deliberately designated for public use, Noble
et al. (2015) used the dataset in a study. As part of their publication, they attached large parts of the dataset to
the journal website, thus making it freely available for others’ use. The fact that the dataset is thus de facto
public means that others will be able to replicate our analyses to verify they are correct or carry out follow-up
analyses.
Ulster Institute for Social Research, United Kingdom, Email:
¹ By social race, we mean here human-designated racial classification of persons, whether by themselves or by others.
Published: 22nd of December 2021 OpenPsych
The dataset itself consists of 1,493 American children and youths (ages 3-20, mean 11.7) who underwent
detailed phenotyping including surveys, neuroscientific (MRI), cognitive testing (NIH Toolbox Cognition
Battery), and genetic testing. The subjects were recruited through “local postings and outreach activities
conducted in the greater metropolitan areas of Baltimore, Boston, Honolulu, Los Angeles, New Haven, New
York, Sacramento, and San Diego”, and as such, are not perfectly representative of the American population of
this age group. Only 1,391 subjects were available in the public dataset.
As part of the interviewing, the child or their primary guardian was asked which of the following racial
categories they identified with:
1. Hispanic or Latino
2. Pacific Islander, Samoan, or Hawaiian
3. Asian
4. African American or Black
5. American Indian or Native American
6. White
7. Other
Thus, for every person, there is a set of 7 binary social race variables available for study. We coded the data in two
different ways. The first is the standard simplified census approach: anyone who responds yes to
Hispanic is classified as Hispanic, anyone else who selects only a single option is classified as that option, and anyone
who selects multiple options is classified as multiracial. This produced 8 categories (the 7 available options +
multiracial). The second is the common combinations approach with lumping: every combination of chosen races is combined
into a single compound group, and all groups with fewer than 20 subjects are then lumped together in a
remainder category. This approach resulted in 11 categories, shown in Table 1.
Table 1: Distribution of social races by the common combinations coding, with n = 20 as the minimum group size. Encoding
was done with forcats::fct_lump_min().

Group                             Count   Percent
White                               571     41.05
Remainder                           177     12.72
African American                    140     10.06
Hispanic, White                     140     10.06
Asian                               122      8.77
Hispanic                             71      5.10
Asian, White                         60      4.31
Pacific Islander, Asian, White       32      2.30
Hispanic, African American           29      2.08
African American, White              25      1.80
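The two coding schemes described above can be sketched in a few lines. The original encoding was done in R with forcats::fct_lump_min(); the Python below is only an illustration of the same logic. The function names, the shortened option labels, and the treatment of an empty selection as missing are assumptions, and the toy data are invented rather than taken from PING.

```python
from collections import Counter

# Option labels shortened for illustration; order follows the questionnaire list.
OPTIONS = ["Hispanic", "Pacific Islander", "Asian", "African American",
           "American Indian", "White", "Other"]


def census_code(selected):
    """Simplified census-style coding of one person's set of chosen options."""
    if "Hispanic" in selected:
        return "Hispanic"          # any Hispanic response overrides the rest
    if len(selected) == 1:
        return next(iter(selected))
    if len(selected) > 1:
        return "Multiracial"
    return None                    # assumption: no selection is treated as missing


def combination_label(selected):
    """Compound label listing every chosen option in questionnaire order."""
    return ", ".join(opt for opt in OPTIONS if opt in selected)


def lump_small_groups(labels, min_n=20, other="Remainder"):
    """Pool every group with fewer than min_n members into one remainder level."""
    counts = Counter(labels)
    return [lab if counts[lab] >= min_n else other for lab in labels]


# Toy data (not PING): 25 White, 21 Hispanic + White, 3 Asian + White.
people = [{"White"}] * 25 + [{"Hispanic", "White"}] * 21 + [{"Asian", "White"}] * 3
census = Counter(census_code(p) for p in people)
combos = Counter(lump_small_groups([combination_label(p) for p in people]))
print(census)
print(combos)
```

In the toy data, the 21 Hispanic + White respondents become "Hispanic" under the census coding but keep their own "Hispanic, White" group under the combinations coding, while the 3 Asian + White respondents fall below the threshold and are pooled into the remainder.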
The genetic testing consisted of a standard microarray measurement (Illumina Human660W-Quad BeadChip,
550k variants). The PING group carried out ancestry analysis and assigned each subject’s global ancestries to 6
large clusters: African, Central Asian, East Asian, European, Native American, and Oceanic. The estimation
was done using ADMIXTURE. These variables were also released as part of the dataset by Noble et al. (2015).
3 Results
Results are presented in two parts. In the first part, we examine only the African-European ancestry subset of
the data. Then in the second part, we extend the analysis to the entire dataset. All analyses were done with R
Figure 1: Probability of social race categories as a function of genetic ancestry.
Table 2: Confusion matrix for model predictions (rows: true values; columns: predicted values).

True value       White   White + Black   Black
White              496               0       1
White + Black        5              13       4
Black                0               3     127
3.1 African-Europeans
The first subset consists of subjects who selected only African or European races, and whose genetic ancestries
from these two clusters sum to at least 95 % (n = 649; 497 Whites, 130 Blacks, 22 mixed). For this dataset, the
African and European ancestry components are nearly perfectly negatively correlated (r = -1.00), and thus the
genetic data is effectively one-dimensional. The outcome variable is the ordered factor of social race with the
mixed group being the intermediate level. Thus, in this simplified scenario, the data can be modeled using
ordinal logistic regression. For predicting genetic ancestry, a simple linear regression is sufficient. The logistic
model had an area under the curve (AUC) of .994 and a pseudo-R² of .932. Figure 1 shows the model results.
In the figure, we see that the mixed race group is not centered at 50 %, as one might naively expect. The
modal value is instead at 40 % African and 60 % European. We can think of two explanations for this. The first
is that it is a remnant of the one-drop rule, wherein any amount of African ancestry would classify a person as
African by some US state laws (Liz, 2018). The second is that many individuals identified as mixed (by themselves or
by their parents) are first-generation mixes between an African American and a White. Since African Americans have about
80 % African ancestry, and White Americans about 0 %, the offspring will have about 40 %, which is the modal
value observed (thanks to Gerhard Meisenberg for this suggestion). Either way, our finding replicates the
prior results of Lasker et al. (2019), who found that the mixed Black-White group was intermediate in European
ancestry, with a mean European ancestry of 79.6 %.
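The arithmetic behind the second explanation can be written out explicitly. This is a minimal sketch that assumes a child's expected global ancestry is the average of the two parents' ancestries, using the approximate parental values quoted in the text; the symbol a (African ancestry proportion) is introduced here only for illustration.

```latex
\[
E\left[a_{\mathrm{child}}\right]
  = \frac{a_{\mathrm{parent\,1}} + a_{\mathrm{parent\,2}}}{2}
  \approx \frac{0.80 + 0.00}{2}
  = 0.40
\]
```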
With regards to the model predictions, it is informative to look at the confusion matrix, which shows the
concordance between model predictions and true values. This is shown in Table 2.
Overall, the concordance was 98.0 %. It was 99.8 % for Whites, 59.1 % for mixed, and 97.7 % for Blacks.
These results thus closely match those found by Tang et al. (2005). If we reduce the modeling outcome to
just predicting whether a subject reported being Black or not, the concordance was 99.1 %, AUC = .995, and
pseudo-R² = .951. Thus, the main difficulty of the model comes from telling the mixed from the Blacks, which can
also be seen in the confusion matrix.
Figure 2: Violin plots of African ancestry by social race. The average ancestry proportions are .00, .40, and .82, respectively,
for Whites, White + Blacks, and Blacks.
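The overall and per-group concordance values for the three-category model can be recomputed directly from the counts in Table 2. The short Python sketch below does only that; the variable names are illustrative, and the original analysis was carried out in R.

```python
# Rows are true values, columns are predictions, both ordered
# White, White + Black, Black (counts copied from Table 2).
labels = ["White", "White + Black", "Black"]
confusion = [
    [496, 0, 1],
    [5, 13, 4],
    [0, 3, 127],
]

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(labels)))
overall = correct / total

# Per-group concordance: share of each true group that was predicted correctly.
per_group = {lab: confusion[i][i] / sum(confusion[i]) for i, lab in enumerate(labels)}

print(f"overall: {overall:.1%}")
for lab, value in per_group.items():
    print(f"{lab}: {value:.1%}")
```

The diagonal holds 636 of the 649 cases, which gives the reported 98.0 %, and the row-wise shares give 99.8 %, 59.1 %, and 97.7 %.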
Conversely, predicting genetic ancestry from social race requires merely a linear model. The fit is excellent,
with an adjusted R² of .955. This model is just the average ancestry for each of the three groupings, as shown in
Figure 2.
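To illustrate why a linear model with a single categorical predictor reduces to group averages, the sketch below fits such a model by hand and computes R² and adjusted R². The ancestry values are invented for illustration and are not PING data; the function name and the use of n - k residual degrees of freedom (k groups, intercept included) are assumptions of this sketch.

```python
from statistics import mean


def group_mean_fit(groups, values):
    """Fit value ~ group by least squares, i.e. predict each group's mean."""
    by_group = {}
    for g, v in zip(groups, values):
        by_group.setdefault(g, []).append(v)
    means = {g: mean(vs) for g, vs in by_group.items()}

    grand = mean(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_resid = sum((v - means[g]) ** 2 for g, v in zip(groups, values))
    r2 = 1 - ss_resid / ss_total

    n, k = len(values), len(means)          # k groups = k estimated coefficients
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
    return means, r2, adj_r2


# Toy African-ancestry proportions (invented for illustration, not PING data).
groups = ["White"] * 4 + ["White + Black"] * 3 + ["Black"] * 4
values = [0.00, 0.01, 0.00, 0.02, 0.35, 0.40, 0.45, 0.78, 0.85, 0.80, 0.88]
means, r2, adj_r2 = group_mean_fit(groups, values)
print(means, round(r2, 3), round(adj_r2, 3))
```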
3.2 The full sample
With the results from the previous section in mind, we are now ready to examine the full dataset. There are now
11 categories to predict, and they cannot be coded as an ordinal variable. Thus, one cannot use binary logistic or ordinal
regression. Multinomial regression is the standard parametric approach for this kind of data. In this approach,
the probability of a case belonging to each category is estimated based on the input variables, which in this
case are the genetic ancestry variables. We used the nnet implementation of this model through
tidymodels (Kuhn et al., 2020; Ripley & Venables, 2021). To avoid overfitting, we used 20-fold cross-validation.
The estimated model accuracy was AUC = .925, with a strict concordance of 76.7 % versus 41.0 % by guessing
the largest group.²
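Two pieces of this setup are easy to check by hand: the 41.0 % baseline is simply the share of the largest group (571 Whites among the 1,391 subjects), and k-fold cross-validation holds every case out exactly once. The Python sketch below illustrates both. The fold-assignment scheme, the seed, and the function name are assumptions made for illustration; the paper itself used the resampling tools in tidymodels.

```python
import random

# Largest-group baseline: always guess the most common category.
n_total = 1391          # subjects in the public dataset
n_largest = 571         # White, the largest group in Table 1
baseline = n_largest / n_total
print(f"baseline concordance: {baseline:.1%}")


def kfold_indices(n, k=20, seed=1):
    """Yield (train, test) index lists so that every case is held out exactly once."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Collect all held-out indices to confirm that the folds cover the sample once.
held_out = sorted(idx for _, test in kfold_indices(n_total) for idx in test)
print(held_out == list(range(n_total)))
```

A model would be fit on each `train` list and scored on the matching `test` list, with the reported accuracy averaged over the 20 held-out folds.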
The 6 genetic ancestries sum to 1, and thus using multivariate multiple regression is probably inappropriate
because the predicted values are not constrained to [0, 1], nor do they necessarily sum to 1. The standard
approach to this is to use Dirichlet regression, which is made to model such proportional data and accomplishes
this using data rescaling (Douma & Weedon, 2019). Dirichlet regression is implemented in the DirichletReg
package (Maier, 2021), which we used here to fit the data. Dirichlet regression does not provide any overall
model fit statistic akin to R², but one can examine the predicted values compared to the true values for each dimension,
as shown in Figure 3.
The correlations for each ancestry are: African .92, Amerindian .74, Central Asian .19, East Asian .86, European
.88, and Oceanian .77. Overall, the correlation between any prediction and the true value is .92, which thus
corresponds to an R² of .85. Few persons had Oceanian or Central Asian ancestry in our dataset, so it is not
surprising the correlations are weaker for them, though all are far beyond chance levels (all p’s < 10
). In
addition, no person in our sample was an unadmixed Amerindian.
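One way to read the "overall" correlation is as a Pearson correlation computed after pooling every predicted and true ancestry proportion across persons and components. The sketch below shows that calculation on invented rows, and also the conversion from r to R². The pooling scheme is an assumption about how the overall figure was obtained, and the toy values are not PING data.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy rows (invented): each row holds ancestry proportions that sum to 1.
true_rows = [[0.80, 0.15, 0.05], [0.02, 0.95, 0.03], [0.10, 0.30, 0.60], [0.45, 0.50, 0.05]]
pred_rows = [[0.75, 0.20, 0.05], [0.05, 0.90, 0.05], [0.15, 0.35, 0.50], [0.40, 0.45, 0.15]]

# Pool every (prediction, truth) pair across persons and ancestry components.
true_flat = [v for row in true_rows for v in row]
pred_flat = [v for row in pred_rows for v in row]
r = pearson(pred_flat, true_flat)
print(round(r, 3), round(r ** 2, 3))

# The reported overall r = .92 squares to .8464, i.e. about .85.
print(round(0.92 ** 2, 2))
```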
² We used the Hand and Till variant of the AUC generalization for multiclass data, as this was the default in tidymodels.
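For readers unfamiliar with the Hand and Till generalization, the sketch below shows the idea: compute an AUC for every pair of classes, using only the cases from those two classes, and average over all pairs. This is an illustrative pure-Python version on invented data, not the tidymodels implementation; the function names and the handling of ties (counted as half) are assumptions of this sketch.

```python
from itertools import combinations


def pairwise_auc(scores_pos, scores_neg):
    """Probability that a positive case outscores a negative one (ties count half)."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(scores_pos) * len(scores_neg))


def hand_till_auc(y_true, probs, classes):
    """Multiclass AUC in the style of Hand and Till: mean AUC over all class pairs."""
    total = 0.0
    pairs = list(combinations(range(len(classes)), 2))
    for i, j in pairs:
        rows_i = [p for y, p in zip(y_true, probs) if y == classes[i]]
        rows_j = [p for y, p in zip(y_true, probs) if y == classes[j]]
        # A(i|j): separate classes i and j using the predicted probability of i.
        a_ij = pairwise_auc([p[i] for p in rows_i], [p[i] for p in rows_j])
        # A(j|i): the same two classes, using the predicted probability of j.
        a_ji = pairwise_auc([p[j] for p in rows_j], [p[j] for p in rows_i])
        total += (a_ij + a_ji) / 2
    return total / len(pairs)


# Toy example (invented): three classes, two cases each, predicted probabilities per class.
classes = ["A", "B", "C"]
y_true = ["A", "A", "B", "B", "C", "C"]
probs = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.2, 0.6, 0.2],
         [0.4, 0.5, 0.1], [0.1, 0.2, 0.7], [0.3, 0.3, 0.4]]
print(hand_till_auc(y_true, probs, classes))
```

Because each pairwise AUC uses only the two classes involved, the measure does not depend on how common each class is, which is relevant when group sizes are as unequal as in Table 1.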
Figure 3: Model predictions from Dirichlet regression for predicting 6 genetic ancestries.
We tried some variations on the analyses in this section. First, we tried using the standard census simplified
social race encodings instead, thereby reducing the group count to 8 and allowing for very small groups (there
were only 4 American Indians). For predicting social race, the AUC was .915 and the concordance was 84.4 %
versus 41.0 % expected by guessing the modal value. For predicting ancestry, the overall correlation was .90.
These results are practically identical to the ones with the more complicated coding. Thus, the specific coding
was not important for the strength of the results.
Second, it is possible that nonlinear relationships or interactions between variables were important. To try
to capture these effects, we used a random forest model to predict social race as encoded by the common
combinations. This model produced a model fit of AUC = .930 and a concordance of 77.4 % (versus 41.0 %
by guessing the modal value). A multivariate random forest predicting genetic ancestries from the social race
variables produced an overall correlation of r = .94. Both results are very slightly better than those using the
simpler additive models. Similarly, using a radial support vector machine to predict social race, we attained a
concordance of 75.7 % and an AUC of .851. Thus, in general, we do not find that nonlinear effects or interactions
are important.
Third, to assess whether standard multiple regression produced inappropriate results, we fit the multivariate
ordinary least squares ancestry model. The results were mostly sensible, though some out-of-bounds predictions
were produced (15 % of values were below 0, none above 1). Overall, the model accuracy was more or less the
same as for Dirichlet regression, r = .92.
4 Discussion
We examined the statistical relationship between social race and genetic ancestry in a moderate-sized but
diverse American sample of children and youths. Despite popular claims to the contrary, we found that the
associations between the variables were extremely strong. When predicting social race, AUCs were above .90 for
every model except the radial support vector machine (.851). According to a guideline for the interpretation of AUC
values from a statistics textbook, values above
.90 are considered “outstanding” compared to merely “excellent” in the span .80 to .90 (Hosmer & Lemeshow,
2000, p. 162). In the simplified scenario of only African-European mixes, where we could compute a pseudo-R²,
this was .932, again extremely strong. Such values are rarely encountered in applied research with individuals
(Gignac & Szodorai, 2016; Nuijten et al., 2020). Whether we focused on the simplified situation of only
African-European mixes or looked at the full sample, the accuracies remained very high. The
empirical results stand in stark contrast to the various claims of weak or nonexistent associations that we quoted
earlier. The results in the present study were very similar to those reported in prior studies that carried out a
similar, but more limited analysis (Fang et al., 2019; Lasker et al., 2019; Tang et al., 2005). Thus, it is not likely
that our sample is an outlier among other samples.
5 Acknowledgements and supplementary resources
We wish to thank the PING consortium for their dataset. Special thanks to Noble et al. (2015) for releasing the
dataset to the public.
The project files can be found at
, and the data can be downloaded from the journal
website of Noble et al. (2015)’s study (
References
Douma, J. C., & Weedon, J. T. (2019). Analysing continuous proportions in ecology and evolution: A practical
introduction to beta and Dirichlet regression. Methods in Ecology and Evolution, 10(9), 1412–1430. doi:
Evans, G. (2019). Skin deep: Journeys in the divisive science of race. Oneworld Publications.
Fang, H., Hui, Q., Lynch, J., Honerlaw, J., Assimes, T. L., Huang, J., . . . Tang, H. (2019). Harmonizing genetic
ancestry and self-identified race/ethnicity in genome-wide association studies. Am. J. Hum. Genet., 105(4),
763–772. doi: 10.1016/j.ajhg.2019.08.012
Gannon, M. (2016, February 5). Race is a social construct, scientists argue. Retrieved from
Gignac, G. E., & Szodorai, E. T. (2016). Effect size guidelines for individual differences researchers. Pers. Individ.
Dif., 102, 74–78. doi: 10.1016/j.paid.2016.06.069
Goodman, A. (2020, June 25). Race is real, but it’s not genetic. Discover Magazine. Retrieved from
Gould, S. J. (1981). The mismeasure of man (1st ed.). Norton.
Holmes, I. (2018, April 25). What happens when geneticists talk sloppily about race. The Atlantic. Retrieved from
Hosmer, D. W., & Lemeshow, S. (2000). Applied logistic regression (2nd ed.). Wiley.
Jernigan, T. L., Brown, T. T., Hagler, D. J., Jr, Akshoomoff, N., Bartsch, H., Newman, E., . . . Pediatric Imaging,
Neurocognition and Genetics Study (2016). The pediatric imaging, neurocognition, and genetics (PING) data
repository. Neuroimage, 124(Pt B), 1149–1154. doi: 10.1016/j.neuroimage.2015.04.057
Kuhn, M., Wickham, H., & RStudio. (2020). tidymodels: Easily install and load the “tidymodels” packages
(0.1.0). Computer software. Retrieved from
Lasker, J., Pesta, B. J., Fuerst, J. G. R., & Kirkegaard, E. O. W. (2019). Global ancestry and cognitive ability.
Psych, 1(1), 431–459. doi: 10.3390/psych1010034
Liz, J. (2018). “The fixity of whiteness”: Genetic admixture and the legacy of the one-drop rule. Critical
Philosophy of Race, 6(2), 239–261.
Maier, M. J. (2021). DirichletReg: Dirichlet regression (0.7-1). Computer software. Retrieved from
Montagu, A. (1942). Man’s most dangerous myth: The fallacy of race (1st ed.). Columbia University Press.
National Geographic. (2018). Black and white. Retrieved from
Nelson, S. C., Yu, J.-H., Wagner, J. K., Harrell, T. M., Royal, C. D., & Bamshad, M. J. (2018). A content analysis
of the views of genetics professionals on race, ancestry, and genetics. AJOB Empir. Bioeth., 9(4), 222–234. doi:
Noble, K. G., Houston, S. M., Brito, N. H., Bartsch, H., Kan, E., Kuperman, J. M., . . . Sowell, E. R. (2015). Family
income, parental education and brain structure in children and adolescents. Nature Neuroscience, 18(5),
773–778. doi: 10.1038/nn.3983
Nuijten, M. B., van Assen, M. A. L. M., Augusteijn, H. E. M., Crompvoets, E. A. V., & Wicherts, J. M. (2020).
Effect sizes, power, and biases in intelligence research: A meta-meta-analysis. Journal of Intelligence, 8(4), 36.
doi: 10.3390/jintelligence8040036
Nyborg, H. (2019). Race as social construct. Psych, 1(1), 139–165. doi: 10.3390/psych1010011
Ripley, B., & Venables, W. (2021). nnet: Feed-forward neural networks and multinomial log-linear models
(7.3-16). Computer software. Retrieved from
Sussman, R. W. (2014). The myth of race: The troubling persistence of an unscientific idea. Harvard University
Press.
Tang, H., Quertermous, T., Rodriguez, B., Kardia, S. L. R., Zhu, X., Brown, A., . . . Risch, N. J. (2005). Genetic
structure, self-identified race/ethnicity, and confounding in case-control association studies. Am. J. Hum.
Genet., 76(2), 268–275. doi: 10.1086/427888
Wagner, J. K., Yu, J.-H., Ifekwunigwe, J. O., Harrell, T. M., Bamshad, M. J., & Royal, C. D. (2017). Anthropologists’
views on race, ancestry, and genetics. Am. J. Phys. Anthropol., 162(2), 318–327. doi: