This paper is under review for the journal Science in Context. This version dates Nov 12th 2020
Representing vulnerable populations in genetic studies: The case of the Roma
Veronika Lipphardt* / Gudrun Rappold° / Mihai Surdu*
*Albert Ludwig University Freiburg
° University of Heidelberg
Moreau (2019) has raised concerns about the usage of DNA data obtained from vulnerable
populations, such as the Uighurs in China. We discuss another case, situated in Europe and
with a research history dating back 100 years: genetic investigations of Roma. While Moreau
is mainly concerned with ethical issues, especially regarding informed consent, and though we
share his concerns, here we focus on problems surrounding representativity. We claim that
many of the ca. 440 publications in our sample neglect the methodological and conceptual
challenges of representativity. Moreover, authors do not account for problematic
misrepresentations of Roma resulting from the conceptual frameworks and sampling schemes they use.
We question the representation of Roma as a “genetic isolate” and the underlying rationales,
with a strong focus on sampling strategies. We discuss our results against the optimistic
prognosis that the “new genetics” could help to overcome essentialist understandings of groups.
‘All European Roma’, a genetic publication from 2015 states, ‘appear to descend from a low
number of founders, and to have diverged into socially distinct endogamous groups after their
arrival in Europe.’ (Martinez-Cruz et al., 2015: 2)
This grand claim may provoke questions in readers not trained in genetics. How can someone
make such a general claim about “all European Roma”? And why is this knowledge framed in
such puzzling terms? Or, to start off more broadly, what, after all, is known about ‘the Roma’
(Plájás, M’charek and van Baar, 2019)? How are the boundaries of Europe defined? What
would be necessary in order to provide reliable knowledge about the Roma? Who is a Roma,
or what criteria would allow one to distinguish Roma from others? And why, or how, can genetics
contribute to all of this?
Far from being able to answer these questions, we instead wish to examine how some
geneticists have answered them in DNA studies over the last thirty years. These answers include
frequent statements about how genetics can contribute to one’s understanding of the Roma;
how little had been known about the history of Roma before these studies; and how much
there is still to learn through future genetic investigations of Roma.
For example, a press release announced a new genetic study in 2012 (Mendizabal et al., 2012): ‘The
Romani people’, it said, ‘lack written historical records on their origins and dispersal’ (Cell
Press 2012). To ‘fill in the gaps’, geneticists had ‘gathered genome-wide data from 13 Romani
groups collected across Europe to confirm an Indian origin for European Romani, consistent
with earlier linguistic studies.’ The release quotes co-author David Comas: ‘Their marginalized
situation in many countries also seems to have affected their visibility in scientific studies’.
Co-author Manfred Kayser, a geneticist who mainly publishes in forensic genetics, is quoted with
an evaluation of the wider usefulness of genetic data from Roma:
‘Our study clearly illustrates that understanding the Romani's [sic] genetic legacy is necessary to
complete the genetic characterization of Europeans as a whole, with implications for various fields,
from human evolution to the health sciences.’ (Cell Press 2012)
An unknown “Romani’s genetic legacy”; Romanis’ Indian origin, their marginalization and
invisibility; gaps in their historical records; Europeans’ demand to understand themselves; human
evolution and human health: all those appeals to knowledge are woven together in a single
statement that invokes the unknown and emphasizes the potential.
Yet Roma were quite “visible”: a continuous stream of publications focusing on their heredity
and genetics has appeared over the past hundred years, since 1921,
resulting in more than 440 publications by 2019. Most of them, 75% (ca. 340), were published
in the past thirty years; ca. 220 of them in the field of medical genetics, ca. 75 in population
genetics and ca. 45 in forensic genetics. Those studies analyze, compare, discuss or
otherwise draw conclusions on DNA data obtained from individuals labeled “Roma”. In forensic
genetics journals, Roma have been the most intensely studied population in Europe over the
past thirty years. It seems the geneticists who pursue those studies can build on extensive
data and knowledge in their field.
In this paper, we examine how exactly those authors make their claims about what is known and
what is not known about the Roma, and how conclusions about Roma groupness and ethnicity
are drawn in those genetic studies.
We understand our approach as firmly rooted in Science and Technology Studies (STS). In
this field, numerous publications have tackled issues of genetics and society, and our work
draws on and contributes to this strand of research. With this paper, we wish to address a
broad audience – geneticists as much as social scientists as well as humanities scholars –
some of whom would perhaps not be ready to follow the kind of theoretical debate
that usually introduces an STS paper. Accordingly, we contextualize our findings within the
relevant STS scholarship only sparsely in this section and more fully towards the end of this paper.
Our goal is to involve colleagues from all relevant disciplines in an interdisciplinary debate.
Hence, both our approach and vocabulary need to be one that facilitates communication a-
This, however, is challenging. Already the abstract of this text has probably triggered reactions
in some readers. Some scholars from the social sciences and humanities, following their first
impulse, might consider any genetic research on minority groups such as Roma as ethically
problematic. On a fundamental level, from their perspective, approaching and describing Roma
with genetic concepts, terms and methods seems like a bad repercussion of dark historical
As justified as such concerns might seem to those readers, others who have training in
genetics might think differently. Geneticists involved in these studies would think that they apply
the same methodological tools as for any other ethnic group. After all, many of these studies
state they have passed the appropriate ethical procedures or have been approved by relevant
institutions, such as ethical committees. After much public debate about ethics in human
genetics and human variation research over the past three decades, members of the genetics
community would certainly say they seek the ethically most appropriate, or the least offensive
and harmful approach they can think of. Within those boundaries, the object of curiosity is
justified by its significance, or by its informative value. In the case of the Roma, according to
many genetic studies, this value is believed to be especially high, as we shall demonstrate
One could approach these differences from the perspective of ethics of science, a field of
growing importance, or from the perspective of science policy, highlighting international
agreements for ethical standards in genetic research. For the sake of brevity, these perspectives are
not taken here but in another paper that discusses ethical aspects of genetic studies of Roma
(Lipphardt and Surdu, submitted).
Yet between questions about ethical approval procedures and worries about repercussions of
historical moments, there is another level of critical awareness for potential shortcomings of
genetic studies of vulnerable groups. Coming from the perspective of sociology and
epistemology, scholars have warned against the essentializing and reifying effects of representing
ethnic groups, and have asked pressing questions about how legitimate an object of scientific
curiosity ethnic groups can be. Many social sciences and humanities scholars, as well as
interdisciplinary author panels, have discussed the risks of geneticization, essentialism and genetic
determinism in this context. Most of these critics have not simply rejected genetic studies of
vulnerable populations, but oftentimes seek a more nuanced and differentiated approach and
call for heightened ethical awareness.
Rogers Brubaker, a UCLA-based sociologist, writes about population genetics and the ‘newly
respectable biological objectivism about race’ (Brubaker, 2015: 54). He warns against simply
‘reasserting the usual mantra that there are no biologically significant differences between
socially defined racial categories.’ (Brubaker, 2015: 55) Rather, Brubaker explains, ‘it is not that
socially defined racial categories are entirely arbitrary, bearing no relation to biogeographic
and biogenetic ancestry. Since social understandings of race and ethnicity emphasize origins
and descent, it would be surprising if socially defined ethnic and racial categories did not
capture, in a crude way, some information about biogeographic and biogenetic ancestry.’
(Brubaker, 2015: 83)
To be sure, Brubaker does not ask us to simply adopt the idea that social groups neatly
overlap with racial categories. Rather, he claims that genomics can – at least crudely – infer
self-identified ancestry or race from genotype, but that the inference does not work the other
way around (Brubaker, 2015: 82): ‘If there is information on the self-identified ancestry or race
of an individual, one cannot infer their genotype. This is because genetic variation does not
take the form of discrete and sharply bounded groups.’ (Brubaker, 2015: 83)
Others have warned more strongly against neglecting the disconnects between ethnic or racial
labels and genotypes, and pointed to the importance of considering sampling as a critical
moment in genetic research on human variation (Fujimura et al., 2014; Nash, 2013).
Brubaker, too, is well aware of the risks of the “new objectivism”: ‘By providing a natural
foundation for social identities, geneticization can essentialize, even absolutize understandings of
difference.’ But Brubaker hopes that there is also something to gain from embracing the new
objectivism – a kind of de-essentialization that could ultimately help to fight racism:
‘Yet by highlighting the genetic heterogeneity within any collectivity, the dominance of within-group
over between-group variation, and the histories – ancient and modern – of migration, gene flow, and
admixture, geneticization can undermine understandings of pure, internally homogeneous, externally
bounded groups.’ (Brubaker, 2015:54)
Genetic variation understood as non-discrete, not sharply bounded, not pure and not
structured into homogeneous groups: without doubt, many geneticists would subscribe to this
understanding of human genetic differences. Some population geneticists may be deeply engaged
in a research agenda along these lines. And it fits well with Brubaker’s groundbreaking book
“Ethnicity without groups” (Brubaker, 2004), with its influential critique of essentialist and
deterministic understandings of groups and ethnicities.
But seen from the perspective of the groups or ethnicities that have been studied as distinct
genetic populations, genetics does not look unified in this regard: Non-essentialist
understandings of genetic variation were often not the guiding principle for the geneticists. Some
classical examples of genetically isolated populations have never been framed in other than
essentialist ways; they have invariably been described as discrete, sharply bounded, more or less
homogeneous groups. The Roma are but one example for which Brubaker’s hope is not
justified: the undermining of essentialist understandings of groupness by genetic studies has not
worked in their case.
A number of studies from the social sciences and from STS (science and technology studies)
on special populations in genetics and genomics have appeared in the past decade, for
example on populations in Brazil (Santos, Da Silva and Gibbon, 2015), Mexico (Benjamin, 2009),
and other Latin American countries (Kent et al., 2015); Iceland (Pálsson, 2008), Finland
(Tupasela, 2016), Quebec (Hinterberger, 2012), Singapore (Ong, 2016), and Taiwan (Tsai,
2010). Reardon (2005) and M’charek (2005) have written about isolated populations in the
Human Genome Diversity Project (HGDP). Munsterhjelm (2014) demonstrated how the
Karitiana, a small indigenous group in Brazil, “became famous in forensic circles”: not because
they were so overtly criminal, but because their genomes, accessed without their consent,
were so readily available to forensic geneticists, and such effective research tools due to their
supposed isolation and inbreeding.
Genetic studies on Roma, however, have hardly been the focus of social scientists, or at least
not in comparable depth (Cazacu et al., 2013; Myers, 2019). But that focus can add new
insights: Geneticists have conceptualized Roma as different from other genetic isolates, as a
very specific isolate indeed. Similar to Jews, they are seen as a transnational isolate (Jobling,
2014: 448) or a diaspora group. But in contrast to Jews, the authors of these studies believe,
Roma have no “written records” of their own history, and in contrast to religious communities
such as the Amish, they have no genealogies (Floersch, Longhofer and Latta, 1997). In
contrast to “Native Americans”, who are viewed as indigenous groups (Tallbear, 2013), Roma are
depicted as a foreign population, while the comparison groups are seen as “indigenous” or
“autochthonous populations”. Unlike the Karitiana with only a few hundred individuals, the Roma
in Europe officially number several million, which allows for very different research designs. The
Finns, another so-called genetic isolate in Finland, have attracted much of the geneticists’
interest since the 1980s (Tupasela, 2016; Tarkkala and Tupasela, 2018), but the Finns are not
seen as a “transnational isolate”, for obvious reasons.
Other differences between the Roma and other so-called “genetic isolates” might strike the
social scientist much more than the geneticist: The history of Roma being studied as a genetic
isolate started a hundred years ago; and much more than the Finns, Roma are a vulnerable
minority heavily discriminated against to this day. In contrast to the Saami, who
established a Saami council in 1956, there is no Roma constituency that could prompt or preclude
research, or successfully claim some of its economic benefits.
The geneticists studying Roma would say that genetics regards them as a genetically isolated
population (see, for example, a separate chapter in a textbook on human evolutionary genetics
by Mark Jobling, 2014: 448), and therefore they are justifiably viewed as a genetically bounded
group. Yet this view misses an important point: Adopted as a conceptual premise, and then
turned into a sampling strategy for genetic studies, the rationale of the “isolated population”
becomes a circular logic, a self-fulfilling prophecy. To highlight this tautology is the main aim
of this paper.
An advanced social sciences approach would firstly check the geneticists’ groupness claims
against state-of-the-art academic literature about Roma, and secondly ask about the
representation of Roma in the genetic studies. An advanced life sciences approach would seek to
reproduce research results with DNA data and then ask for a thorough inspection in terms of
scientific standards. Regarding ethical questions, there would likely be a convergence of life science
and social science critics of genetic essentialization.
In what follows, we leave aside most of these questions, and concentrate on one aspect we
deem to be of interest for all sides: representativity. Whether a phenomenon is captured well
in a scientific study depends much on adequate methodological considerations about how to
represent it. If the main unit under investigation is “all European Roma”, or “European Roma”,
or, as we can find it in studies until today, “Gypsies”, then the obvious question is how to
represent this main unit in a scientifically sound and satisfying way.
In a social sciences methodology course, students learn what a main investigation unit (or
population or universe) is, how it should be represented, and what methodological flaws one
must avoid. In political science, students would learn how citizens of a nation state are to be
represented. For geneticists, however, this kind of social or political representativity (how
can a small number of people represent a large number of people such as a nation state’s
population?) has not been the center of their concerns in the past. Asking test subjects to
self-assign to an ethnic category has become routine in biomedical research, but this is not
the same as representing populations and their history.
What is known and knowable about the Roma through genetic studies, then, depends on how
Roma are recruited, sampled and represented in DNA databases. Whether the Roma (the
European Roma, or the Roma in any given country such as Bulgaria, Romania, Hungary or
Spain) are represented adequately in DNA databases is hence as much an issue for scholars
from the social sciences and humanities as it is an issue for geneticists. No discipline alone
can come to a conclusive judgement on this issue without consulting the other.
With a qualitative approach to genetic studies of Roma published in the past three decades,
we aim to point out the conceptual challenges of representativity. From several hundred, we
have selected a handful of studies for a focused analysis, most of them from the field of
population genetics with a main interest in the migration history of Roma. (Some medical genetic
studies are included and indicated as such; we are aware that the challenges of
representativity are not the same in population and medical genetics.) We have selected these studies for
their recent publication date and their academic and public impact, and not because we think
that they belong to the ethically and methodologically most problematic papers in our sample.
We do not discuss the selected genetic studies and their results in depth, but concentrate on
sampling practices and representativity. Ethical questions, although unavoidably popping up
in these papers, need to be postponed for upcoming publications. In spite of the problematic
aspects we are going to point out, it is still important for us to state that in studies from the past
ten years, we have noticed a trend towards more transparency regarding ethical procedures,
self-assignment in recruitment, more cautious wording and more balanced methodologies. The
papers we discuss embrace up-to-date critical awareness to varying degrees, but nevertheless
reveal a lack of societal awareness that has consequences for research designs,
methodologies and findings.
For simplicity, in what follows, the geneticists involved in the genetic studies on
Roma are sometimes called “authors”.
What do authors of genetic studies claim to know about the Roma?
Our examination of the authors’ epistemic claims about Roma begins with numbers. How many
Roma are there in Europe? Nobody knows, and there is no good way of knowing (Surdu, 2016,
2019). States count their citizens, which leads to more or less accurate numbers, but there is
no such count for all European Roma. Where Roma census numbers exist, state authorities
(but also scholars, international organizations or Roma leaders) do not trust them to be correct:
Roma, census takers claim, oftentimes do not self-identify as Roma because they fear
discrimination (Surdu 2016). That is why their total number is said to be uncertain; estimates
vary between 4 and 12 million. However, if there is no certainty about Roma population size,
this will affect claims to representativity, as well as the reliability of figures for the prevalence
of mutations in this group.
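The effect of this uncertainty can be made concrete with a little arithmetic. The sketch below is purely illustrative: only the range of 4 to 12 million is taken from the estimates cited above, while the sample size, the prevalence value and the function names are hypothetical assumptions chosen for the example.

```python
# Illustrative sketch only. The population range (4 to 12 million) comes from
# the estimates cited in the text; SAMPLE_SIZE and PREVALENCE are hypothetical.

def sampling_fraction(sample_size: int, population_size: int) -> float:
    """Share of the population that a sample of the given size covers."""
    return sample_size / population_size


def implied_carriers(prevalence: float, population_size: int) -> int:
    """Absolute number of carriers implied by a prevalence estimate."""
    return round(prevalence * population_size)


LOW_ESTIMATE = 4_000_000    # lower bound of the cited estimates
HIGH_ESTIMATE = 12_000_000  # upper bound of the cited estimates
SAMPLE_SIZE = 200           # hypothetical number of sampled individuals
PREVALENCE = 0.01           # hypothetical prevalence of a mutation (1%)

for population in (LOW_ESTIMATE, HIGH_ESTIMATE):
    fraction = sampling_fraction(SAMPLE_SIZE, population)
    carriers = implied_carriers(PREVALENCE, population)
    print(f"population {population:>10,}: "
          f"sample covers {fraction:.6%}, implied carriers {carriers:,}")
```

Under these assumptions, the same sample covers a share three times smaller, and the same prevalence implies three times as many carriers, depending only on which population estimate is adopted.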
The study by Martinez-Cruz et al., for example, admits that ‘social and political factors preclude the
collection of precise census on the Roma’ (Martinez-Cruz et al., 2015: 1). ‘They are
acknowledged as the largest ethnic minority of Europe, with a population of up to 10 million people
spread across the continent and mostly concentrated in Central and South-Eastern Europe’
(Martinez-Cruz et al., 2015: 1). The authors’ representation of a group of humans in a genetic
framework, then, extends to up to 10 million people, distributed over thousands of kilometers. All this
is based on estimates that are not produced by any academically recognized methodology,
which is, indeed, a challenge in terms of representativity.
The central premise, the pre-assumption or starting point for most of the studies is that Roma
in Europe are an isolated population. Some authors call it a “diaspora”, others a “genetic isolate”.
In the past decade, some authors in population genetic studies have admitted a certain degree
of “admixture” from the majority society, but the overall notion of a rather more than less
genetically isolated population still holds implicitly and explicitly. For example, the press release
mentioned above quotes Manfred Kayser:
‘From a genome-wide perspective, Romani people share a common and unique history that consists
of two elements: the roots in northwestern India and the admixture with non-Romani Europeans
accumulating with different magnitudes during the out-of-India migration across Europe’ (Cell Press, 2012).
This description speaks of a well-defined process that can be modeled and quantified. In
Nature, the same study is described in the section “Research highlights”, under the title “Romani
have Indian ancestry”:
‘The 11 million members of Europe’s largest minority group, the Romani […], are descended from a
single population that left India some 1,500 years ago and dispersed across Europe through the
Balkans. [The research team] analysed the genomes of 152 Romani individuals from across Europe and
compared them with those of populations worldwide. European Romani probably originated from
northern and northwestern India. Genetic analysis suggests that, after leaving India, Romani ancestors
interbred with local populations on the way to the Balkans before beginning to spread throughout
Europe around 900 years ago. Since then, Romani have interbred with local populations in Europe.’
(Nature 492, 2012)
While this text does not give any indication of the extent of admixture, and while a large extent
would contradict the assumption that they are “descended from a single population”, the study
itself finds considerable evidence for admixture, recent and long ago, but nevertheless depicts
it as rather limited. The conclusions state:
‘Our data suggest that European Romani share a common genetic origin, which can be broadly
ascribed to north/northwestern India around 1.5 kya. After a modest genetic contribution from the
populations encountered through their rapid diaspora from India toward the European continent, our data
indicate that the Romani dispersed from the Balkan area around 0.9 kya. We further observe evidence
of secondary founding bottlenecks and small population sizes, together with isolation and strong
endogamy.’ (Mendizabal et al., 2012)
In this description, the Roma are still a sharply bounded, discrete genetic group, isolated and
strongly endogamous, yet not pure and homogeneous. The extent of admixture, however, is
depicted as quite limited, temporally, geographically and dimensionally:
‘Our data further imply that in more recent times, temporally and geographically variable admixture
events with non-Romani Europeans have left a footprint in the Romani genomes. Overall, our analyses
suggest that despite the relatively short time span, the demographic history of the Romani is rich and
complex. Further studies with more dedicated geographical sampling and resequencing data would
help in defining the Indian parental population of the Romani, as well as further details of their migration
and subsequent history in Europe.’ (Mendizabal et al., 2012)
According to this account, the Roma have “Romani genomes” with a “footprint” of admixture in
“recent times”. What has priority for the authors, however, is to define the “Indian parental
Geneticists writing about Roma history usually take advantage of prior literature and a
hypothesis to test, for example a hypothesis that entails a grand narrative about Roma history.
If the knowledge gained does not resonate with widespread societal and cultural narratives, it
is more difficult to find support.
Accordingly, these articles sometimes build on unsubstantiated evidence (e.g. medieval
chronicles or folk myths, as in Kalaydjieva et al., 2005: 1086), or often on academic knowledge from
the humanities: Linguistic and anthropological studies are cited frequently and cursorily, as in
the press release quoted above, and mostly as evidence for the Indian origin of Roma. Rarely
do these references point to cutting-edge research, but often to articles and books published
some decades ago. In most cases, these accounts are used as a starting point or as a
historical source for the Indian origin of Roma and for their migration routes in Europe.
Further information is needed on the lifestyle of Roma, or else the studies could not build on
their central hypotheses of isolation. Cultural and ethnographic studies are referred to, often
without citations or only cursorily, as providing knowledge on their cultural characteristics:
‘Studies investigating Roma culture revealed significant similarities between Roma and Indian culture
including the caste system and endogamic habits that means exclusive marriage within Roma
sub-ethnic groups (clans).’ (Melegh 2017:1)
‘Traditional social organization based on strict and complex rules of endogamy and particular
population history of the Roma have similarly shaped Romani population structure as well as epidemiology
and molecular architecture of single-gene disorders.’ (Salihovic et al., 2011)
‘The majority of Roms still preserve their language, traditions, and lifestyle, and their communities
remain almost totally genetically isolated not only from the surrounding population but also from one
another. Endogamy is a strict rule, consanguineous marriages are frequent (15-45%), and the
inbreeding coefficient ranges among the highest worldwide.’ (Plášilová et al., 1999: 293)
A cultural tradition of endogamy: In this interpretation, any separation of Roma from the rest of
society is self-inflicted, voluntary, precisely what Roma culture dictates. The genetic isolation
is hence depicted as a consequence of self-determined social separation, implying that,
typically, Roma have offspring with Roma because they prefer to choose their marriage partners
among themselves. Other factors that also contributed to the societal isolation of Roma, such
as discrimination, ghettoization, stigmatization, exclusion or persecution, are rarely taken into
account in this narrative.
Some geneticists studying Roma could now say: well, there might be different reasons for
isolation, but the reasons don’t matter. Isolation is just isolation; the result will always be an
isolated population. However, factors like discrimination or ghettoization would not only lead
to societal isolation followed by genetic isolation – they would contribute to genetic isolation
differently: If voluntary endogamy is the causal factor for isolation, then the criteria of the
community determine who is considered an acceptable marriage partner and who is not. If
discrimination is the causal factor of isolation, the majority society determines the criteria for
exclusion. These are not the same: For example, in many Jewish families and communities, being
Jewish requires being born to a Jewish mother, but this criterion is not shared by all majority
societies of the Jewish diaspora. Also, some majority societies have excluded and ghettoized
Roma together with other groups of “undesired” people – but these groups would not
necessarily have been among the acceptable marriage partners for Roma families and communities.
In some countries or regions, Roma were not the only ones affected by exclusion, and
excluded groups were relocated to separate settlements – ghettos – together. Also, a suppressed
minority typically cannot maintain its own “traditions” with regard to marriage and
reproduction: Being enslaved, for example, strongly limits one’s reproductive freedom. These
complexities, however, varying considerably from country to country and from region to region, are
not considered in the genetic studies.
What does representativity mean, and why does it matter for genetic studies of populations?
For each of the subfields – forensic, medical and population history genetics – representing a
group comes with different challenges, particularly with regard to the application contexts. For
example, a medical geneticist needs to know how prevalent a mutation is in a population. A
forensic geneticist who wants to build up a reference database, for checking allele frequencies
in order to estimate how frequent a profile of a suspect is in a given place, needs to know
whether she has tapped into substructure, or whether there might be real-world populations
unrepresented in the database. A population history geneticist wants to collect DNA samples
from individuals who most likely represent a supposed historical group, that is, whose
ancestors have only married within their own group since historical times. Yet ultimately, in any of
these cases, the geneticists who want to make claims about “all European Roma” must
consider questions of representativity. Otherwise, they risk making claims that only hold for a small
subgroup, or, for that matter, do not hold at all.
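Why sampling from a single subgroup can mislead may be illustrated with a small calculation. The following sketch assumes, hypothetically, that a population consists of three subgroups whose population shares and allele frequencies are known. All numbers and names are invented for illustration; for the Roma, no such shares are actually available, which is precisely the difficulty.

```python
# Illustrative sketch only: every number below is hypothetical.
# Each subgroup is a pair (share of total population, allele frequency).
HYPOTHETICAL_SUBGROUPS = [
    (0.2, 0.10),  # subgroup A: 20% of the population, allele frequency 10%
    (0.5, 0.02),  # subgroup B: 50% of the population, allele frequency 2%
    (0.3, 0.01),  # subgroup C: 30% of the population, allele frequency 1%
]


def pooled_frequency(subgroups):
    """Population-weighted allele frequency across all subgroups."""
    total_share = sum(share for share, _ in subgroups)
    weighted_sum = sum(share * freq for share, freq in subgroups)
    return weighted_sum / total_share


whole_population = pooled_frequency(HYPOTHETICAL_SUBGROUPS)
only_subgroup_a = HYPOTHETICAL_SUBGROUPS[0][1]

print(f"frequency if only subgroup A is sampled: {only_subgroup_a:.3f}")
print(f"population-weighted frequency:           {whole_population:.3f}")
```

With these invented values, a study that recruits only from subgroup A would report a frequency roughly three times higher than the population-weighted figure, without any error in the laboratory work itself.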
How would one plan to represent “all European Roma”, or 10 million Roma, in a DNA
database? Handed this task, many social scientists would perhaps first
define criteria to decide who qualifies for the sample and who does not, while
aiming at an appropriate sample size. This would cause considerable discussion, as the task
is complex and raises fundamental questions.
To provide an example from biomedical research, “all Germans” would become defined as “all
German citizens”; hence, only people with a German passport would qualify. In fact, biomedical
large-scale studies such as the “National Cohort” do consider representativity issues, as they
recruit through the registration offices in order to ensure that the “results of the investigation
will be transferable to the overall population [of Germany]” (NaKo Gesundheitsstudie website).
This may still prompt discussions about representativity; however, there is a clear framework
with statistical data available for contextualization and comparison.
There is, however, no Roma nation state citizenship to turn to for that criterion. Language
would not be a reliable identifier either: Romani is a language spoken by some, but not by all
people considered (or self-assigning as) Roma. Other criteria are even more questionable: A
“Romani culture” is impossible to nail down; Romani surnames don’t work either. Being
discriminated against as a Roma or “Gypsy” by others is also no solid criterion. In addition, these criteria do
Cutting-edge historical and sociological evidence demonstrates that Roma have no common
language, territory, religion, cultural practices or social status; some scholars
call them a “superdiverse” group (Tremlett, 2014). Scholars from the social sciences and
humanities today largely agree on the fluidity, complexity and situatedness of Roma identity (e.g.
Bogdal 2011; Jonuz 2009; Kovats 2013; Law and Kovats, 2018; Stewart, 2013; Surdu, 2016;
Surdu and Kovats, 2015; Vermeersch 2005). The self-assignment as “Roma” is no good proxy
for the external assignment, and vice versa. “Gypsy” cannot be viewed as synonymous with
“Roma”. Social historians consider the term “Gypsy” a construction imposed with differing
rationales by national governments and administrations, through a long history of labeling,
stigmatizing and repressive control, up until the genocide under the National Socialist (NS) regime
and beyond (Lucassen 1991, 1997; Lucassen/Willems/Cottaar 1998; Mayall 2004; Willems
1997). In a number of case studies, scholars have shown that marrying partners from outside
of the group is relatively common (e.g. Okely 1983; Fraser 1992; Achim 2004; Stewart 1997).
Framing Roma as one generic group is hence seen as a form of racialization, or
essentialization (e.g. Law and Kovats 2018; Surdu, 2016; Yıldız and De Genova, 2017).
On the one hand, in many countries or regions, Roma have experienced long and repeated
phases of integration – leading to what the geneticists would call “mixing”. On the other hand,
in different places, Roma were (and are) segregated, ghettoized and forced into societally and
geographically marginal places by decision makers and authorities (About, 2012; Donert,
2008; Filhol, 2013; Berescu, 2019; Kóczé, 2018; Picker, 2017; van Baar, 2015, 2018; Vincze,
The vast existing scholarship on past and present integration and exclusion of Roma in different
countries suggests that it is very difficult to sample or represent the Roma as a group. In
census taking and in the social sciences, self-assignment is viewed as the most advisable basis
for data collection and for research on identity building. Hence, many social scientists would
only admit people to the sample who self-identify as Roma. The same holds true for much of
biomedical research today, since the US has introduced census categories (based on self-
identification of race, including multiple racial belongings) for test subject recruitment (Epstein,
However, self-identifying as Roma rarely comes with benefits in societal contexts where Roma
are discriminated against (Jonuz, 2009). This is also the reason why census takers distrust the data
they have collected on Roma: Roma are believed to „hide“ their „true“ identity – that is, the
identity census takers would have ascribed to them. However, individuals may identify with
various population labels because they are well integrated; self-assignment may situationally
emphasize one or the other category of belonging. For geneticists, this makes the sampling
criterion of self-assignment problematic.
External identification, that is, the identification of Roma by others, such as doctors, nurses,
social workers, teachers, police officers, community leaders, neighbors etc., would be seen as
problematic by most social scientists both on methodological and ethical grounds. In their
influential empirical research drawing on a large set of data, Ladányi and Szelényi (2001) de-
monstrated that self-identification as Roma and external identification do not overlap, or are
not equivalent to each other. Instead, understandings of Roma ethnicity vary greatly across
There is also considerable variation depending on the classificatory work invested by experts
and fieldworkers in survey practices: for example, two sets of fieldwork are likely to produce
incongruent classifications of Roma (Ladányi and Szelényi 2001). Representativity also has a
strong technical dimension (Fujimura and Rajagopalan 2011): If one is to use some of the
standard software on two populations in order to examine their genetic relations, one needs to
make sure the two populations have been sampled in a similar way, that their sizes are of the
same order of magnitude, and that the two populations were sampled under the same concep-
tual framework (in our case, that of a genetic isolate). None of these conditions seems fulfilled
in genetic studies of Roma: For studies from Hungary, for example, reference data of Hunga-
rians are drawn from a national database, not from some isolated rural settlements.
Beyond these methodological issues, using external appearance as a criterion for recognizing
Roma – a criterion sometimes used by geneticists – would be considered much more
problematic, and even racist, by social scientists.
How do the geneticists handle this complicated issue? After all, their investigations and the
validity of their results completely depend on the chosen samples.
The population genetic studies on Roma from the past three decades are strikingly tight-lipped
about their sampling criteria and practices; sampling schemes are not made explicit. Issues of
representativity are rarely mentioned, and if so, not in an informative way. „The donors were
real representatives of the entire population, as they were collected in a nationwide project“, a
study of allele frequencies says (Magyari et al., 2014: 149). Mendizabal et al (2012) state:
„Alternatively, mixed couples may leave the Romani communities and integrate into the non-
Romani societies, and thus would not be sampled from Romani groups in these countries.“
This suggests that „mixed“ and „unmixed“ couples segregate neatly; it marginalizes cases of
„unmixed“ couples leaving the community and „mixed“ couples staying within the community.
The silence on sampling is a relatively recent phenomenon. In their seroanthropological
publications from the 1920s to the 1980s, geneticists were much more explicit regarding their
sampling procedures. Many of their efforts were aimed at avoiding “mixed” individuals because
they were interested in “pure Gypsies”, who were, as they admit, hard to find and recruit. The
idealized test subject was the nomad, even though nomadism was a marginal phenomenon;
nomads were seen as the most isolated from society and hence optimal for genetic studies,
though they were rarely willing to cooperate. Towns of all sizes, where people tend to “mix”, were
avoided. Potential test subjects were excluded if, upon being interviewed for recruitment, they said
they were born to a mixed couple, or if they otherwise failed to meet the recruiters’ expectations
regarding Roma lifestyle, culture or outlook. Two studies (Clarke 1973; Rex-Kiss et al. 1972)
explicitly state that one crucial criterion of selection was the visual inspection of the ‘external
somatic features’ of the recruited subjects (Rex-Kiss et al. 1972: 358). His sampling strategy
led him to prisons; other researchers turned to other institutions that, for one purpose or
another, classified and treated Roma separately from other citizens.
We firmly assume that sampling would be done differently today. How exactly it would be
done, though, remains unclear. If sampling information is given, it is vague and abstract. Many
studies rely on data shared by other teams; when it comes to describing the sampling scheme,
the authors point to the team that collected the data. Following up on the latter’s
publications, one sometimes finds no information on sampling there either. Rather, the
information given there is sometimes even vaguer. In some cases, following those
references back through the literature shows that the data have in fact never been published.
Yet sampling practices are not completely opaque. Approaching individuals for recruitment in
population genetic studies, the DNA collectors still want to make sure they do not include
people with no or too little Roma ancestry. From some studies, isolated hints can be gathered
as to how this might have happened, and we have checked these observations in two expert
interviews. It seems that, in some cases, questionnaires ask for an individual’s self-assignment,
for the ethnicity or self-assignment of their four grandparents, for lifestyle parameters,
for their mother tongue, for cultural traditions, certificates or registries. It seems not unusual
that scientists rely on the external assignment by a third person or institution: for example, a
doctor, a community senior, a state official. Physical appearance is yet another selection cri-
terion, still used today, even if, perhaps, not systematically and not explicitly.
Self-assignment is mentioned in some studies as a sampling criterion. For example, in one
study, samples are described as deriving from „27 self-declared Romani“ (Gomez-Carballa et al.,
2013: 2). On the other hand, it is seen as rather unreliable information. Mendizabal et al.
(2012) state in the supplement: „All individuals included in this study were self-identified as
Romani [sic]. Importantly, the self-identification as Romani is a delicate matter in some Eu-
ropean countries due to the social stigma attached to Romani identity; hence additional infor-
mation obtained in sampling can be scant“ (Mendizabal 2012, Supplement, 1). How the teams
overcome this problem, what other criteria they use instead, remains unclear.
Sampling issues: family relationships, small samples, privacy and social pressure
Population genetic studies need to account for the risk of tapping into population substructure,
particularly when they concern isolated populations (Ehler and Vanek, 2017). If samples are taken
from the same family or neighborhood, the risk of sampling bias is considerable.
This is relevant for considerations of representativity in our case, especially in the context of
small samples that were collected in small, ghettoized Roma settlements. What
is taken to be a representative sample of Roma in a specific country, or even of “European
Roma”, may, in the worst case, be based on a limited number of related community members
in a limited number of locations that have become exclusive sampling sites for genetic studies
of Roma over the last decades: several locations in some East European countries have been
long-term sampling sites for genetic studies (e.g. Baranya county in Hungary; Kosice in
Slovakia). Even more problematic, some of these samples have been used and shared for decades.
Some of the studies give information on how and where sampling took place, such as the
names of villages or city quarters. Martinez-Cruz (2015), for example, recruited 110 subjects
from seven neighboring villages in Greece, which seems problematic if family relations need
to be avoided. (Whether publishing this information, including the villages’ names, complies
with privacy and anonymity obligations is yet another open question.)
In other cases, samples were collected in clinics, doctors’ offices or medical care institutions,
or in health care schemes addressing Roma or people with a specific health problem. While
health care systems seem to ease access to individuals who have previously been labeled
„Roma“ by that very system, it is in many cases unclear whether those people would self-
identify as Roma, or under what circumstances they would (not). Several forensic genetic pa-
pers using DNA data from Roma list co-authors affiliated with police, investigative or military
forces (for details see Lipphardt and Surdu, submitted). Three forensic studies explicitly men-
tion that their samples were collected by medical doctors (Nagy et al. 2007: 25; Saiz et al.,
2014, re-using the data of Novokmet and Pavčec, 2007). The ethnic categorization of samples
by the collectors indicates that systematic ethnic labelling for Roma is in place in medical in-
stitutions in some countries.
When such data collected in health care settings is used to address population genetic ques-
tions, this can have problematic consequences for representativity: after all, specified health
care programs attract communities and families nearby, as well as, plausibly, people with si-
milar genetic dispositions to particular non-genetic conditions. In the case of health care
schemes addressing genetic diseases, it is plausible to assume that relatives will show up at
the medical institution to get health care. Third degree cousins, for example, aware or not
aware of their kinship, are genetically more similar to each other than unrelated individuals.
Hence the necessity to account for the risk of sampling bias, especially since geneticist authors
describe Roma communities as „inbreeding“, „consanguineous“, „endogamous“, or as large,
complex family networks. Test subjects are evidently often asked for family information, so
that known relationships could be detected already in the doctor-patient conversation; but
doing this without infringing privacy is a challenge if family members do not come to the care
facility together. To put it differently, „inbreeding“ and „isolation“, the very features
that make such families interesting for researchers, are also what undermines the
validity of sampling, not just from a biological point of view, but in an ethical sense as well.
Technical controls for kinship are mentioned in some studies. The 2013 study from Spain fo-
cused on ‘27 self-declared Romani within the framework of ESIGEM [...]; all these individuals
had suffered from meningococcal disease.’ (Gomez-Carballa et al., 2013: 2) The authors
checked for family relationships by identity-by-state analysis and found only one pair of
individuals (among the 27) matching the criterion of „closely related“. Another pair showed
statistically significant evidence for second-degree relatedness. That is four individuals out of 27 – and
methods more sensitive than those genetic epidemiologists use today would probably find
even more family relationships.
This could particularly be the case if very small sample sizes were used as representative for
a national minority population. For example, Mendizabal et al (2012) use a sample of eight
Lithuanian individuals for their study of European Romani. The complete sampling information
reads: ‘The Lithuanian Romani were sampled in the "Kirtimai" tabor (Roma settlement) in Vil-
nius. They belong to Verchnij tabor group and are mostly Polish speakers.’ (Mendizabal, 2012,
supplement, 1) Kirtimai is a neighborhood only a few kilometers south of Vilnius’ old city center.
It is the only compact Roma settlement in Lithuania with a population of 354 to 500 people,
depending on the source of the estimate (Poviliunas, 2011). Eight individuals from one single
city neighborhood, which is a small compact Roma settlement, are probably easy to re-identify.
It seems unlikely that there are no family relationships between them. Mendizabal et al (2012)
state that they have used „Tukey’s outliers detection“ to remove ‘individuals either showing a
higher amount of inbreeding or larger than average identity-by-state distances in their sampling
population’. (Mendizabal, 2012, supplement, 1). This, however, is probably not sufficient to
exclude kinship in this specific community. A small sample from a small, societally excluded
community carries a high risk of so-called “cryptic relatedness”. This would make the
inference of the history of a larger Roma population from that sample even more questionable,
because the risk of capturing population substructure is larger in such a small local sample.
The Kirtimai sample of eight is perhaps representative of a locally specific population
substructure rather than of Lithuanian Roma in general. Lithuania, after all, has a population of ca.
2,500–3,000 self-declared Roma, and they do not all live in Kirtimai.
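The limits of a relative outlier rule can be illustrated with a small sketch. The Python example below is a toy illustration only, not a reconstruction of the procedure used by Mendizabal et al. (2012): the inbreeding values are invented, the function name `tukey_outliers` is hypothetical, and the conventional fence multiplier of 1.5 is an assumption. It suggests why Tukey's fences can flag an individual who deviates from the rest of a sample, yet flag no one when relatedness is uniformly elevated across a small sample.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the indices of values lying outside Tukey's fences.

    The fences are Q1 - k*IQR and Q3 + k*IQR, computed from the sample itself.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Invented per-individual inbreeding estimates (illustrative values only).
# Case 1: seven individuals with low values and one who stands out.
mostly_unrelated = [0.00, 0.01, 0.00, 0.01, 0.02, 0.01, 0.00, 0.09]
# Case 2: eight individuals whose values are all similarly elevated.
uniformly_related = [0.06, 0.07, 0.06, 0.08, 0.07, 0.06, 0.07, 0.08]

print(tukey_outliers(mostly_unrelated))   # [7]
print(tukey_outliers(uniformly_related))  # []
```

In the second case nothing is flagged, because the fences are derived from the sample itself: a filter of this kind detects deviation from the sample, not relatedness shared by the sample as a whole.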
Notably, it is unclear under what conditions these eight samples were collected: Mendizabal
et al. (2012) do not give any reference. Gomez-Carballa et al. (2013) also use a Lithuanian
sample and refer to Gresham et al. (2001). Gresham et al (2001) use 20 Lithuanian samples
without giving a reference, implying this is primary data. As Vaidutis Kucinskas from the De-
partment of Human and Medical Genetics at the Faculty of Medicine of Vilnius University is co-
author on all three studies, it seems plausible that he has contributed these samples, but with-
out any information on the sampling, it is hard to tell what exactly these 20 or 8 samples repre-
sent. Without any information on data attrition, it is also hard to tell why the sample has been
reduced to eight. This is only one out of many examples; in order to learn more about such
instances, one would need to contact co-authors in dozens of cases in which the relevant
information is lacking.
Excluding “mixed” individuals; removing data sets from the samples
While many of the sampling practices in the field seem to aim at narrowing the sample down to those
individuals who represent the descendants of the „proto-Romani“, i.e. the group that departed
from India some 1,000 years ago, for some geneticists there still seems to be too much noise
in the samples the recruiters bring to the lab. In particular, we found instances in which
researchers attempted to avoid „mixed“ individuals in their samples: individual data sets might be
excluded from a sample in the lab after the DNA analysis yielded a result that does not accord
with what was genetically expected of a Roma.
For example, Melegh et al. (2017) use genome-wide SNP data from 179 „Roma samples“.
Twenty-seven of the samples had been documented in another study (Moorjani et al., 2013),
which states that most individuals were „from Hungary“ (Moorjani 2013: 8). Those 27
participants underwent extensive interviews before giving written informed consent; about their
self-assignment, the authors state: ‘Roma individuals self-reported as being descendants of the same
tribe for at least three generations’ (Moorjani et al., 2013:8). These 27 samples were then
merged with the dataset of 152 samples from Mendizabal et al. (2012) discussed above – in
which sampling criteria were not described in any detail –, and the overall dataset was treated
as one Roma dataset without any further subdivision.
The sampling rationale clearly favoured isolated groups – “tribes” – and Indian origin. But in
spite of their sampling strategy to include only „descendants of the same tribe for at least three
generations“, the authors state: ‘Our results showed that Roma have on average 81.08% +/-
0.53% West Eurasian related ancestry.’ (Melegh et al. 2017) And yet, to arrive there, the
authors had to do much more than ask for tribal affiliations:
„Based on PCA and clustering methods, we removed Roma individuals from the merged Roma
dataset, which showed significant admixture with non-Roma Europeans. The merged dataset
contained 158 Roma samples featuring 599,472 autosomal SNPs.“ (Melegh et al, 2017: 2)
This means that, even after excluding 11% of all participants on the basis of DNA results as
“admixed” individuals (namely those who had too much European ancestry in the eyes of the
authors), “West Eurasian” ancestry still dominated heavily.
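The share of excluded samples follows directly from the figures quoted above. The short Python sketch below merely redoes that arithmetic; the variable names are illustrative, and it assumes that the merged dataset initially comprised all 27 + 152 = 179 samples.

```python
# Sample counts as reported in the text above
moorjani_samples = 27       # samples documented in Moorjani et al. (2013)
mendizabal_samples = 152    # samples taken from Mendizabal et al. (2012)
merged_before = moorjani_samples + mendizabal_samples  # 179 in total
merged_after = 158          # samples left after removing "admixed" individuals

removed = merged_before - merged_after
share_removed = removed / merged_before

print(f"{removed} of {merged_before} samples removed ({share_removed:.1%})")
# prints: 21 of 179 samples removed (11.7%)
```

The result, between 11% and 12%, is in line with the share given above.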
Put differently, in order to demonstrate the Indian ancestry of Roma, the authors of some of
these studies have removed the samples of those individuals whom they consider not “Indian
enough”. They are left with, unsurprisingly, some Indian ancestry, but only as a minority subset
within a bigger sample with a huge amount of admixed ancestry, mostly from Europe. And yet
the minority subset is seen as the authentic, autochthonous part representing Roma while the
larger sample with European ancestry is seen as the admixed interference. In other words, on
top of the already restrictive sampling strategy, another layer of filtering is added to ensure that
only a subset of individuals with some Indian ancestry is retained for analysis.
Excluding „mixed“ individuals is a concern that applies to many studies. In a series of publica-
tions, Hungarian authors documented concerns about the representativity of a large shared
sample: Kosa et al. (2015:303) claim that their sample is not representative because, firstly,
“assimilated” Roma had not been included, and secondly, because some of the Hungarians
sampled for the comparison group might have been Roma themselves. They conclude that
this may have ‘slightly diluted the true difference between the populations’ (Kosa et al., 2015:
303). No matter how confusing social realities proved to be, no matter how well integrated or
„mixed“ people in Hungary were, the authors remained concerned with the „true“ difference,
the most clear-cut difference, so to speak, here referring to the genetic difference.
Similarly, Nagy et al. (2017), using Kosa’s data, conclude that ‘the presence of participants
with mixed Roma/non-Roma ancestry […] may result in a slight underestimation of the
differences between the populations’ (Nagy et al., 2017: 455). Piko et al. (2017) and Fiatal et al.
(2016) both use Kosa’s data and state that ‘those Roma who have, to various degrees,
assimilated with the Hungarian general population’ have been excluded from the sample.
Furthermore, both studies state that ‘because many people are reluctant to self-define their
ethnicity as Roma, this constraint would be very difficult to overcome’ (Piko et al., 2017:124;
Fiatal et al., 2016:2265).
Along with the data or DNA samples, the pattern of thinking is shared as well: If the „representative
sample“ of the general Hungarian population included some people who were „Roma“ in the
eyes of the authors, this had been revealed by the DNA analysis; no matter how those indi-
viduals would self-identify, genetically they had to be considered Roma. The Hungarians, then,
were assumed to be sampled for their „unmixed“ Hungarian ancestry; and if the sampling was
successful, according to the authors, genetically speaking, there should be no “Roma” in that
As STS scholar Star (1983) has demonstrated, filtering data in the laboratory is sometimes
part (but should not be) of the scientific work of transforming „ill structured” problems into “well
structured” problems: by ignoring complexities and making choices in all stages of the research
process, often under conditions of scarce resources and pressure to deliver significant results.
Terminology and wording
Representation also happens through language. In many studies, social, cultural and political
separation comes to be re-interpreted in genetic terminology. Martinez-Cruz et al. (2015:1) hold
that Roma are ‘[...] an excellent model to evaluate the consequences of recent, multiple, and
widespread dispersals and founder events’, which sounds like a self-confident statement of
solid knowledge.
Similarly, a medical genetic study states: ‘The Gypsies are a young founder population com-
prising multiple genetically differentiated sub-isolates with strong founder effect and limited
genetic diversity.’ (Kaneva et al., 2008:191) The population is supposed to have ‘a substructure
that can greatly facilitate the mapping and identification of disease genes.’ (Kaneva et al.,
2003:105) ‘Endogamy and inbreeding [...]’, another publication states, ‚lead to the accumula-
tion of hereditary disorders’. (Tournev, 2016:95) And in another study: ‚The proportion of slight-
ly deleterious genetic variants accumulates during bottleneck events as the efficiency of puri-
fying selection is diminished in small populations.’ (Mendizabal et al., 2013:198) The mapping
and identification of disease genes is the puzzle the scientists aim to tackle by employing their
supposedly well-established model of an isolated population.
These quotes may sound like purely technical terminology – except for the population label
„Gypsy“ –, but in fact, such statements are as much about society as they are about biology.
Each term stands for a specific interpretation of societal situations Roma have experienced.
Furthermore, there seems to be a mismatch between the positive appraisal of usefulness on the
one hand (‘greatly facilitate the mapping and identification of disease genes’; ‘Our special "re-
search tool" will be the unique genetic heritage of Gypsies’, Jordanova n.d.) and the negative
connotations of inbreeding, deleterious genetic variants and selection on the other. Wordings
such as these seem to speak for an instrumentalizing approach rather than one driven by
empathy with people in miserable health conditions. While it is arguably not a priority for ge-
netic publications to demonstrate empathy, there are role models in the field who manage to
convey empathy in their scientific publications.
Some genetic studies speak of “Gypsy disorders”, “Gypsy mutations”, and even of “Gypsy
chromosomes” (e.g. Morar et al., 2004). To call the population under investigation “Gypsy” or
“Roma” is evidently seen as a scientifically irrelevant decision by many scientists. In
conversations, we are told that this is just a question of sensitivity; many geneticists seem to strive
to use the label least offensive to their test subjects. However, this cautious approach
might apply to consent forms and personal contact, but whether it also applies to scientific
publications is not so clear: Would geneticists expect the individuals in question to
read these publications? Would this be different for a societally well-established minority, as
compared to a poor minority that faces discrimination and has high levels of illiteracy? In any case, quite
a few authors of these studies find it unproblematic to call the population “Gypsy”, although
many Roma would find this offensive. In conversations, we are sometimes told that it
seems justified to ignore “political correctness” because even some Roma call themselves
Viewed from the perspective of representativity, the following groups are not congruent and
differ greatly in size: individuals who are willing to identify as “Gypsies” in private and public;
individuals who are willing to identify as “Roma” in private and public; individuals who are called
“Gypsies” by others; individuals who are called “Roma” by others. If genetic studies do not
detail their sampling strategies in this regard, it is unclear what they represent. What is
clear, however, is that these authors, when using the term “Gypsy”, risk offending many
of those they wish to represent.
The main unit: „All European Roma“; or rather „those with ancestors from India“?
Before we continue reporting on the genetic studies’ sampling practices and representativity,
we include here an intermediate reflection to make it easier to follow the rest of this
For social scientists who are not familiar with the relevant STS literature, the observation that
these genetic studies seem to ignore obviously problematic aspects of representativity can be
puzzling, even disturbing. In agreement with relevant STS literature (e.g. Fujimura and
Rajagopalan, 2011; Fujimura et al., 2014; Nash, 2013; Gannett, 2014; Bliss 2015, 2018), we
offer a differentiated explanation, one on the level of conceptual differences: Geneticists car-
rying out these studies seem to have a different understanding of „population“, namely, pri-
marily a genetic one. Moreover, the genetic boundaries are the ones to define the boundaries
of a group – that is, an individual is to be considered a Roma if the individual has biological
ancestors from medieval Romani groups, and/or if genetic findings make the case.
To be sure, there might also be groups that are genetically quite closely related, but share no
common idea of belonging; they may even have been enemies for centuries. But if genetic
boundaries seem to match to some extent with some widely known social boundaries,
population geneticists would view this as a successfully identified population structure. Social
division perceived in society and biological difference studied in science seem to explain one
another: That way, they reinforce the conceptual frame in which they both have been produced.
What the geneticists involved in these studies aim to explain are genetic differences between
populations of today – or, more precisely, genetic differences between groups they consider
as populations. Their epistemic object is not the social reality of Roma, but a genetically boun-
ded population that, for them, seems to overlap strongly with the social group of Roma. The
Roma seem to them one of the examples where genetic and social cohesion go hand in hand.
The focus is on the genetic group and any of the social markers used for recruitment are seen
as powerful proxies for that group. That the social groupness of “Roma” could pose problems
for their demarcation of the genetic population might not be a concern.
A significant boundary in genetic population structure can be made plausible by a historical
explanation, a story for the readers to understand how the group came to be genetically diffe-
rent. The public will find a story plausible that fits their understanding of groups and groupness.
It is more difficult to publish counter-intuitive population histories, in particular if these stories
do not resonate with what human evolutionary genetics textbooks say (e.g. Jobling 2014:
In addition, the Roma – understood as a genetic population – are perceived to have common
ancestry components supposedly making them distinguishable from „Europeans“. The
focus of most of these genetic studies is on the Indian origin. The authors aim at demonstrating
genetic continuity between the group that migrated from India to Europe in medieval times and
today’s Roma. Some authors call the group that departed from India „proto-Romani“. Romani
who have been living in Europe ever since are all viewed as descendants of these
„proto-Romani“. Social integration, or any other social situation leading to „genetic mixing“,
makes the task for the geneticists more difficult. However, if one assumes that „mixing“ has
been negligible and that it has not eroded the group coherence as such – or that at least the
core part of that group has remained intact and „unmixed“ – then, „mixing“ can presumably be
controlled for in a model:
‘The basic common model considers a proto-Romani population that splits from a given population of
the Indian subcontinent (Pakistan and India) and can admix with a hypothetical (unsampled) Central
Asian, or Near or Middle Eastern population, as well as with non-Romani Europeans after arriving in
Europe.’ (Mendizabal 2012, 2345)
When the authors of these population genetic studies look at DNA data collected from living
Roma individuals over the past thirty years, what they take themselves to be looking at is a proxy for
a hypothetical, ancestral „proto-Romani” population (see the quote above). Studying Roma
migration routes over Europe, they are interested in dispersal and subfounder effects in single
national contexts. In each case, they look for individuals who would most closely resemble the
medieval Indian-Romani arrivals in that country, coming from Eastern Europe or the Near East.
Of course, the most revealing markers for this would again be „Indian signatures“ (Gomez-
Carballa et al., 2013), marking those who are descendants of the first Romani arrivals in the
However, finding “Indian signatures” does not mean that large or significant parts of the
genome resemble genomes of people from India, rather than those of others. Neither do
readers learn about the results from the most powerful markers. In fact, in studies considering
“mixture”, an overwhelmingly large proportion of the genome of an average person sampled
as Roma does not resemble “Indian signatures”, but “European” or other “signatures”. Yet, in
the research questions, research designs and interpretation of the results, in reports on their
findings in the conclusions or in press releases, the authors emphasize and focus on “Indian
signatures” above all others (Mendizabal 2012; Melegh 2017), whereas “signatures” from other
regions are mentioned only briefly and marginally. For further research, what seems most pro-
mising to them is to explore the Indian ancestry further.
Mendizabal et al. (2012), for example, speak of admixture along the way from India to Europe
and within Europe, but this seems secondary and negligible for the authors. Because their
priority is to find a more specified regional origin of the Roma population within India, the au-
thors match their Roma DNA data with DNA data from different areas in India. The applied
computer program suggests that the „parental population“ came from Kashmir or north/northwest
India. As the authors admit, they are struggling with a lack of samples precisely from that
region. Hence, ‘[...] future dedicated sampling across linguistic and social strata in this Indian
subregion is needed to identify the actual parental population of the European Romani from
that Indian subregion’ (Mendizabal et al., 2012:2347).
To be sure, we do not maintain that there was no migration from India to Europe in medieval
times. We also do not simply reject the claim that medieval migrants from India are among the
ancestors of some of the Roma living in Europe today. However, this is only one source of
their ancestry, and not even the dominant one; like other populations, Roma have multiple
sources of ancestry. Their genetic ancestry is manifold and complex and does not allow
for a single historical narrative. Furthermore, in India and Europe, in between these regions
and around the world, many people might have comparable ancestry but are not considered Roma.
The example of the Bulgarian Romani
Mendizabal et al. (2012) aimed at determining the temporal sequence of arrivals in various
countries. Therefore, they ‘attempted to identify the current Romani population that is
genetically the most similar to the putative founder population of all European Romani groups’
(Mendizabal et al., 2012:2345), that is, the „actual parental population“ in India (Mendizabal et
al., 2012: 2346). As a result, the authors state that Bulgarian Romani seem to be most similar.
In this study, the estimated 750,000 Bulgarian Roma (roughly 10% of the Bulgarian population)
are represented by 18 individuals. The sampling information in the supplement is more detailed
for the Bulgarians than for all other Roma populations. ‘The Bulgarian Romani samples were
collected from the two major groups around the country: Wallachian and Yerli, and some of
their subgroups (Dassikane, Horohane, Kaldarashi, Kopanari and Reshetari).’ (Mendizabal
2012, supplement, 1). No reference is given for the DNA data from Bulgaria, but in the
acknowledgements, Ivailo Tournev is thanked for the „recruitment of Romani samples from
In 2016, Ivailo Tournev published an account of his two decades of sampling endeavours
in Bulgaria. The text’s subtitle reads „Neuromuscular disorders in Roma (Gypsies) – collaborative
studies, epidemiology, community-based carrier testing program and social activities“
(Tournev, 2016). Starting in 1994, Tournev and his team collected the most detailed information
on Bulgarian Roma, in cooperation with ethnographers:
‘The main sources for collecting the epidemiological information were the field work studies. A neuro-
logical screening of hereditary neuromuscular disorders using the method “door to door” was perfor-
med in 2500 towns and villages (having predominantly Roma population) in the country. Those towns
and villages where pedigrees with hereditary neuromuscular disorders resided were visited from 2 to
10 times with the aim of collecting pedigree information, blood samples for genetic studies and
neurological examination of the patients. The field work studies covered a period of 20 years (1994–
2014). 97% of the Roma population living in compact Gypsy quarters was encompassed. An eth-
nographical and linguistic examination was performed in every quarter using a semi-standard interview
for identification of various Roma groups and subgroups. In those towns and villages where Roma
people live in several quarters or more than one Roma group resides, the ethnographical and linguistic
examinations were performed in every quarter and in every separate group. The field studies were
performed with the support of the local ‚Roma’ foundations and Roma health mediators from different
parts of the country.’ (Tournev 2016, 99)
If 97% of the Roma population living in „compact quarters“ were encompassed, as Tournev
acknowledges in this quote, one could perhaps call this a „genetic census“. Tournev gives a
detailed account of the terrible living conditions of most Bulgarian Roma, including their segregation
from majority society. More so than in other countries, Bulgarian Roma live in isolated
settlements, and Tournev is explicit about the majority’s role in creating genetic isolates by
Hence, data on genetic isolation are rich in Bulgaria, to an extent that allows for a more
differentiated view of individual small groups: some of them have been more isolated than others,
for different reasons. The results point to a complex substructure, more complex and more
variegated perhaps than elsewhere. How the 18 individuals from Mendizabal’s study were recruited,
or how their data were selected from many thousands, is an open question.
Tournev’s text is rich in information, particularly about neuromuscular genetic diseases among
Bulgarian Roma. A large number of studies were published from the collected data, both in
medical and population genetics, with Luba Kalaydjieva the most prolific lead and senior
author. Kalaydjieva involved two ethnologists in the research: one of their tasks was to work
out a classification for the different Roma groups. The instances of data sharing with other
teams across Europe and the world are numerous; there is hardly any population genetic study
on Roma that does not include data from the Bulgarian large-scale collection of Gresham et al.
(2001). Many studies have built „European Roma” samples by adding a small number of
samples from other countries to the already existing Bulgarian data.
Accordingly, DNA data from Bulgarian Romani provide the most detailed, variegated and rich
data collection, and they have come to be interpreted and used in many different ways. It is hard
to imagine that Kalaydjieva, who has time and again emphasized the great and complex genetic
diversity of Bulgarian Romani in a number of publications, would agree to represent Bulgarian
Roma with 18 individuals from a low number of subgroups. Also, if no rationale is given
for the selection of those 18 individuals from a supposedly huge number of data sets, it is hard
to tell what these individuals stand for.
After all, the samples from Bulgarian Romani of today, or of 1994, cannot stand for any other
national, transnational or regional group. Neither can they represent the „ancestral“ Bulgarian
Roma population, the one that supposedly arrived in medieval times in the region that is today
the nation state of Bulgaria: if Bulgarian Roma have been isolated in many small communities,
in varying constellations over the centuries, their current genetic diversity may have been shaped by
complex drift processes and is not the result of neatly definable, linear historical processes.
The underlying history is inextricably complex and locally contingent.
Merging datasets across countries
A number of genetic studies attempt to span Europe, and for that purpose, they merge,
share, transfer and reduce data of different provenance, collected with different sampling
strategies. What does such a merging strategy imply? And what can such a merged sample
In each medieval principality, in each early modern empire, and in each modern nation
state, foreigners – including those whom geneticists regard as the ancestors of today’s Roma –
were treated, named and registered differently. Over the centuries, with political upheavals and
wars, state borders and registration procedures changed and shifted, including and excluding
minorities in different ways. Yet, assuming genetic continuity, the population geneticists involved
tend to ignore such complexities, extract samples from a number of countries and look
for patterns on maps of Europe. Their aim is to investigate Roma migration routes along with
Of the ca. 75 population genetic studies published after 1990, two might suffice to demonstrate
this attempt at representing European Roma with a merged data set. Mendizabal et al.
(2012) include a map of Europe with the nation states of today. It shows how many Roma live
in each country, how many individuals were sampled per country, and at what historical date
Roma were first mentioned in that country. The overall data set of 152 individuals, collected
from 13 Roma groups in 13 countries, includes small samples, such as 7 individuals from
Wales (but these were excluded from some of the analyses because they seemed too admixed),
10 from Spain, 9 from Portugal, 8 from Lithuania or 7 from Estonia; the largest sample
contains the 18 from Bulgaria, followed by 14 from Romania. The sample sizes bear no
relation to the size of the respective Roma population, nor to its proportion of the overall
national population. Following up the numbers, one learns that from each national sample
some individuals were excluded due to familial relationships.
No sample, in this or in any other population genetic study on Roma, comes from
Germany, France, Italy, Belgium, the Netherlands, Poland, Austria or Switzerland. This is not
due to a small Roma population: some of these countries have much larger Roma populations
than Lithuania, Portugal or Ukraine. No reason is given for this sampling decision in any of the
However, in one study (Morar et al., 2004), titled “Mutation history of the Roma/Gypsies”,
samples from Germany, France and Italy are mentioned, yet not marked as such in the data
analysis. In order to account for this, we need to first look into general sampling decisions: „In
this study“, the authors state in the abstract,
‘[…] we have used five disease loci harboring private Gypsy mutations to examine some missing
historical parameters and current structure. We analyzed the frequency distribution of the five
mutations in 832–1,363 unrelated controls, representing 14 Gypsy populations, and the diversification
of chromosomal haplotypes in 501 members of affected families.’ (Morar 2004: 596).
Representing a population by mutations presents further problems for representativity in a
population history study (“private Gypsy mutations” will be discussed in more detail below).
If a third of all recruited individuals carry one of five mutations understood by geneticists
as „private Gypsy mutations“, their data have likely been collected either in large-scale screening
programs or in doctors’ offices and clinics.
The study comes from Kalaydjieva’s lab; Bulgarian samples make up a large part of the dataset.
„Self reported identity“ (598) in terms of „historical and cultural-anthropological classifications“
was used in order to sort the individuals into group categories. In total, a table states,
1175 individuals from 14 „Gypsy groups“ were sampled and their data were assembled in three
large „migrational/linguistic categories“: 419 individuals labeled „Balkan“; 366 labeled „Vlax“;
and 390 labeled „Western European“ (Morar 2004: 598).
While the former two categories are subdivided into groups with local or professional names
(„Musicians“, „Kalajdjii“), the latter, „Western European“, is subdivided into national groups:
Hungarian (283 individuals), Lithuanian (20), Spanish (87) – in total, as stated above, 390
However, the reader also learns that
‘individuals from Hungary, Slovenia, the Czech Republic, Lithuania, Germany, France, Italy, Spain,
and Portugal, for whom information on Gypsy group identity was unavailable, partial, or contradictory,
were classified together as “western European.”’ (Morar 2004: 598)
This raises further questions: Were the individuals from Germany, France and Italy put in the
Hungarian, the Lithuanian or the Spanish sample? Based on what significant criteria? How
many were there? Were they approached as mutation carriers in health care facilities, and
then asked to identify as „Gypsies“? Or were they approached as „Gypsies“, and if so, under
which sampling scheme? What does it say about „self reported identity“ (598) if information on
„Gypsy group identity“ was „unavailable, partial or contradictory“? Why was this the case for
all individuals from these three countries? In none of these countries does census data collection
include ethnicity. Did this play a role? Or, one could also ask: Would a German patient self-identify
as „Gypsy“? Would a German ethics board be comfortable approving applications
for projects involving „Gypsies“? And would a German self-reported Roma approve of
being sorted into a Hungarian, Lithuanian or Spanish „Gypsy“ sample? And where are these
samples and datasets today?
As noted above, such studies merge, share, transfer and reduce data of different provenance,
collected with different sampling strategies. What the recruited individuals probably all
have in common is the fact that they are not well integrated into their nation states’ societies.
The sampling practices in many of these studies seem to have favored „societally deprived
groups“, serving as a proxy for „genetic isolates“. If this rationale underlies sampling decisions,
it is hardly surprising that most research results confirm genetic isolation. If research teams
simply use data sampled by another team without questioning the sampling strategy, they will
reproduce the first team’s results and biases (as for example Iovita and Schurr, 2004, who take
a differentiated approach but confirm isolation on the basis of shared data).
One important sampling rationale seems to have been maximizing the likelihood of finding genetic
variants that have also been found in India. One can perhaps increase that probability by focusing
on people who have a certain genetic disease, who look Indian, speak Romani, and
identify as Roma. But the merged sample cannot represent „European Roma“; it represents
isolated groups, families or neighborhoods that are labeled „Gypsy“ or „Roma“, and a certain
proportion of the recruited individuals may even identify as Roma. How many, and under what
circumstances, remains unknown.
„Private Gypsy mutations“
The Roma are described as an isolate in which various “private mutations” for all kinds of
diseases, especially neuromuscular diseases, have “accumulated”, more than in any other
group. Brubaker argues that „rare variants“ would „not be definitive of any socially defined
racial category“ (2015:82).
The term “private mutation” is viewed as a purely technical term by human geneticists: it is meant
to denote any novel mutation that has been found in a narrow social group, for example, in a
family, between relatives or in an isolated rural settlement. Speaking of „private Gypsy mutations“,
however, as many of the medical genetic studies on Roma do, gives the term a different
resonance: “private”, in this combination, takes on a more metaphorical meaning and also
invites an interpretation of mutation carriers as belonging to that ethnic group, or at least as
having ancestors from that group.
And indeed, in some studies, the ethnic attribute “Roma” is assigned on the basis of
disease mutations even though the patients have self-declared a different ethnicity. For
example, the authors write about a patient with Hereditary Motor and Sensory Neuropathy
(HMSN), a rare genetic disease attributed to Roma: ‘[…] the family was not aware of their
Roma ancestor’. (Brožková et al. 2016:2) Neither do the authors consider that the mutation
could also occur in non-Roma individuals. Similarly, Colomer et al. (2000: 578) examine three
Spanish patients with HMSN Lom disease and argue that the patients ‘belong to a non-
consanguineous family with Gypsy background although they were unaware of the details of
their ancestry’. Some deleterious genetic mutations are referred to as “Gypsy mutations” even
though ‘mutation screening in 359 Eastern-European Gypsies failed to identify any carriers’
(Barca-Tierno et al., 2011: 1218). Speaking of „private Gypsy mutations” also implies that the
Roma are the population in which that mutation first emerged; or, that they are the source
population of a mutation brought over from India to Europe. Of the 220 biomedical studies
reviewed, only a handful mention that a rare disease could have been introduced into Roma
communities from outside.
This general depiction of Roma is, of course, misleading. Far from all mutations labeled „private
Gypsy“ or „private Roma“ are shown to be more prevalent in India, or to be confined to
Seen from the perspective of human genetics and evolution, mutations
can also first occur in a surrounding majority population; a subsequent ghettoization of „unde-
sired“ population groups – disrespectfully excluded as foreigners, poor, diseased, disabled,
deviant – can lead to the amplification of mutations in a societally isolated community. In spite
of ethnic and social complexity, such a community may nevertheless be labeled „Gypsy“ or
„Roma“ by the majority in a society. Writing the history of Roma migration routes by means of
mutations, then, is a representational challenge.
A disagreement on representativity
As we have demonstrated, genetic studies that claim to have produced research results about
„the Roma“ or „the Gypsies“ cannot formally represent the overall Roma population. The
sampling schemes and practices do not satisfy the standards of representativity, either in the
social sciences or in some branches of the life sciences. The authors of these studies would
either have to state precisely what the sampling decisions are aimed at and what the sample
then represents: for example, people living in most isolated places, or people who might have
Indian ancestry, or people who carry a mutation for one or more genetic diseases, or all of
these, if applicable. Or, if the authors stick to representing „all European Roma“, they would
have to adopt a whole new sampling scheme, one that takes on the challenge of representing
a relatively large and „superdiverse“ group. This could not be done without extensive discussion
with social sciences and humanities scholars, and, even more importantly, not without
active involvement of Roma. Who would come to represent Roma in such a participatory endeavor
cannot be foreseen. But with other vulnerable populations considered interesting for genomics,
such participatory options are already underway (e.g. Kowal and Radin, 2015).
However, we believe that a closer look at the heart of the disagreement about representativity
is warranted. The conflation of the population geneticists’ research objects – genetically bounded
populations – and social or political population labels runs deeper and is more widespread
than in studies on isolated populations alone. A vast literature addresses this topic (Fujimura
and Rajagopalan, 2011; Fujimura et al., 2014; Nash, 2013; Fortier, 2012; Koenig, Lee and
Richardson (eds.), 2008; Schramm, Skinner and Rottenburg (eds.), 2012). The four-grandparents
sampling approach is practiced widely and documented in recruitment guidelines and
consent sheets of, for example, the 1000 Genomes Project. Nash describes how the “People
of the British Isles Project”, by aiming at recruiting people with ancestry that fit the rationale,
focused on the rural, “rooted”, and white (Nash, 2013:201).
As Nash (2013) warns, one needs to watch out carefully for the omissions such sampling
schemes entail. Of course, such a sample cannot represent any population, neither a historical
nor a present one. It only represents a certain portion of a population that practices a specific
social behaviour. At many times, in many places, many people did not live close to where their
grandparents lived, but migrated or were displaced, or practiced mobile or commuting
lifestyles. Excluding these people from sampling schemes means excluding specific parts of a
population, namely those who are unwilling or unable to identify under the sampling criteria.
Exclusions in the lab, if an individual DNA data set fails to meet the expectations, are also not
restricted to the Roma studies. The famous Novembre et al. study of 2008, claiming the repre-
sentation of „European population structure“, stated: „We applied various stringency criteria to
avoid sampling individuals from outside of Europe, to create more even sample sizes across
Europe, to exclude individuals with grandparental ancestry from more than [one] location, and
to avoid potential complications of SNPs in high linkage disequilibrium.“ And: „These numbers
exclude individuals who reported mixed grandparental ancestry, who are typically assigned to
locations between those expected from their grandparental origins (results not shown)“ (No-
vembre et al., 2008:98).
If such data cleansing operations are implemented, the scope of the claim cannot extend to
represent a living population, e.g. all Europeans, or European population structure in general.
Such samples only represent people whose ancestors all come from the same group or region;
regions where marriage (or reproduction) between partners from two geographically distant
regions is rare – as in rural regions, for example – come to be better represented than others.
If, however, population geneticists stick to their goal of representing certain populations as
groups genetically bounded over long time periods, then the small sample sizes they deem
sufficient require further thought, if tapping into population substructure is to be avoided.
Contextualizing our case study in the critical interdisciplinary literature on human genetic variation
research, we note that it provides an extreme case of what those specialists have warned
against, from both an ethical-political and a conceptual-methodological perspective. It is an
extreme case of problematic extrapolation, as Fujimura et al. (2014:215) have described:
‘[…] human geneticists make decisions about which subset of individuals to use to “represent” a “race”
or “national group” in their sampling procedures and in their cluster analysis. The subsets they use are
obviously extremely small compared to the number of individuals who identify with that race or
nationality label. They thus extrapolate their results from a small number of individuals to make
inferences about a vastly larger number of individuals who self-identify with the same race or nationality
label and whose genetics have not been studied.’
It is a demonstrative case of what Catherine Nash (2013:203) has described as a problematic
trend in which “continental and regional ancestries” are ‘genetically identified and described as
bounded natural categories.’
Our case study on the “new genetics” (Schramm, Skinner and Rottenburg, 2012), focusing on
a strand of research on Roma, adds to the growing STS literature criticizing essentialist and
racialized versions of grouping humans through genetic accounts. STS scholars have focused
on particularities of geneticization of minority identities and its interplay with social and political
understandings of race and ethnicity (e.g. Egorova, 2010; Kowal, Radin and Reardon 2013;
Kyllingstad 2012; Tallbear, 2013; Wade et al 2014). Yet, so far, Roma have been omitted from this
critical perspective, even though genetic studies on Roma have shown stunning conceptual
continuities since their inception almost a century ago (Lipphardt, 2016).
For our own perspective on groupness, we follow Hacking’s (1999) “dynamic nominalism”,
which asserts that categorization and labeling are constitutive of a group’s social formation
and dynamics. To be sure, such a position is not one of a naïve constructivism denying
groups as real entities (Hacking, 1999); as groups are socially defined, historically contingent
and changing, codified into legal systems, embedded in administrative and techno-scientific
assemblages, self-internalized or rejected and ubiquitous objects of everyday politics, they are
indeed real and consequential.
From an STS perspective, “populations” (the postwar conceptual replacement of “race” in hu-
man population genetics) are not natural kinds; their genetic profiles follow from the models
and technologies used to measure similarity and difference, as well as from the assumptions
and decisions taken along the research process. Sampling strategies, genetic markers and
reference groups chosen for comparisons may all shape the ethnic groups which
genetic work assumes to merely describe (M’charek, 2005). The alignment of genetic data with
socio-politically relevant racial and ethnic categories appears to be less an effect of data aggregation
than a reflection of the practical, pragmatic, conceptual, methodological, theoretical and
socio-politically relevant choices that researchers make (Bolnick, 2008; Duster, 2015; Fullwiley, 2008;
Gannett, 2003; Lee et al., 2001; M’charek, 2005). As STS scholars have demonstrated, genetic
classification and social order are not separate endeavors but are co-produced in entangled
projects of race and ethnicity (Reardon, 2005; Tallbear, 2013). Notably, in some cases,
genetic research targeting minority groups is carried out by geneticists who self-identify as
members of these groups, for whom they seek social justice and political and medical attention
(Fullwiley 2008, Bliss, 2015). However, while genetic research on minority groups can have
empowering effects, the case of the Roma is just one of many cases in which it has no such
effect, but the potential to feed into socially divisive processes (Egorova, 2014; Kent et al.,
2015; Santos et al 2014; Wade et al 2017).
The genetic studies we have examined could be the starting point for further questions and
debates, ranging from methodological and conceptual ones to ethical and societal ones. For
Roma, STS or humanities scholars could discuss a wealth of questions that have been raised
with regard to other populations in the past years; to name but a few, the branding and com-
modification of unique populations (Reardon, 2017; Tupasela, 2016; Tarkkala and Tupasela,
2018), or the creation of biovalue and biocapital around the many existing cell lines and bio-
banks with tissue samples from these groups (Birch and Tyfield, 2013), or the “biomedicalisa-
tion” (Clarke, 2014) of a much discriminated minority, its history and corporeality, or how the
genetic framing of Roma will transform the policies of nation states or the European Union.
One could also contextualize this case study within the work on other isolated populations,
their specific ethical challenges (Mascalzoni et al., 2010), and how they have been met with
more sensitivity in other cases (Floersch, Longhofer and Latta, 1997; Lindee, 2005). We be-
lieve that the case of the Roma offers new insights and perspectives for this strand of research;
in turn, we also hope that STS has something to offer to the Roma.
Clearly, many of the problematic aspects we have pointed out are not specific to the research
in Roma communities, but of a more general scope in biomedical research: How to best collect,
curate and share data, how to best protect the privacy of donors, how to gain and hold the trust
of patients and donors, how to deal with issues of property, how to handle attrition and docu-
mentation, how to speak to and about patients or test subjects in sensitive and responsible
ways, how to involve them and how to guarantee benefit sharing, how to meet the reproduci-
bility crisis: these are issues widely discussed in research institutions and ethics committees
as well. All large-scale research projects involving humans have to grapple with these problems.
Yet for the genetic studies about Roma, or other such vulnerable groups, we note three specificities.
First, in a regular national health study in, say, Germany, a participant does not, or at
least not to the same extent, risk being stigmatized on the basis of their nationality or „origin“
or „ancestry“. Second, their privacy risk is a general given – no data is perfectly safe – but
not a heightened one: information on their home village will not be published along with their
genetic data. Third, in the studies on Roma, many of the problematic issues named appear
at once and sometimes cumulatively.
A population that is represented along these lines is at heightened risk: not only for stigmati-
sation and privacy violation, but also for application errors in medical and forensic genetics.
Representativity is hence a matter of concern: for the scientists who want to avoid flawed re-
search results and negative consequences thereof for the contributing volunteers; for the
group, in this case the Roma, who have a right to benefit from scientific progress without being
exposed to further harm; and for social sciences and humanities scholars writing about the
risks of essentialization.
What can be said about the usefulness of this strand of genetic research? What is its potential?
Who should have an interest in genetic studies of vulnerable groups?
Apart from the obvious benefits for the scientists involved – interesting research questions,
topics for PhD theses, data to draw on – and the benefits for society – increased understanding
of genetic diseases, development of new therapies – the Roma themselves could benefit from
this research if that were a built-in goal on the side of researchers, health care providers and
public support programs. Inclusion of Roma in the sense of “patient empowerment”, by providing
access to all the data dealing with those disorders that are particularly prevalent in Roma
groups, would allow them to use all the new findings, e.g. within the scope of genetic diagnostics.
This, of course, requires free access to the respective health services and increased resources
for supporting these families and communities. Furthermore, for an individual to seek the ad-
vice of a genetic counselor, trust is a crucial prerequisite that their data being handled with
While at least theoretically, Roma are among those who could benefit from medical therapies
developed on the basis of their data, a geneticist who has studied DNA data from Roma acknowledges:
“Most studies have remained in the realm of scientific exploration, away from the
health needs of the Roma” (Kalaydjieva, Gresham and Calafell, 2001:2).
With regard to forensic genetics, Roma could benefit if the databases used for frequency ass-
essment were not biased against them (Lipphardt and Surdu, submitted). Roma could also
have an interest in learning about their own history from population history genetics; however,
the narrative currently dominating in these studies risks having essentializing, stigmatizing and
excluding effects, and other scenarios that could perhaps also be backed by the data are not considered.
Providing one’s data in the hope of becoming a recipient of benefit rather than of harm is a
matter of trust. Trust in genomics is obviously not easy for vulnerable and isolated groups: not
just because they are vulnerable, but because their privacy and stigmatization risks are much
higher than those of majority populations, and because geneticists might be so fascinated by
a specific population that they lose sight of the community’s perspective. Hence we do not
share the optimism with regard to the de-essentializing power of genomics: as long as vulnerable
groups are exposed to the kind of representation we have observed, there is still a lot of
(inter- and transdisciplinary) work to be done. For example, it takes extended efforts to assure
vulnerable groups that they will not be misrepresented.
One could ask a wealth of historical questions about overlaps and continuities between race
science and population genetics. As the sampling schemes of these studies target an „unmixed“ core
proportion of an otherwise much fuzzier population, the authors seem to assume and favor
an ideal type of representative individual for that population. Whether the conceptual justifica-
tion draws on the term „race“ or on the term „population“, whether the data consists of anthro-
pometric measurements or DNA data: This is typological thinking in the paradigm of population
It would also be legitimate and necessary, in our view, to ask how this research
could relate to racist discrimination against Roma. More urgent, perhaps, is a discussion of genetic
essentialism, or genetic determinism, and the many possible implications and consequences
for those who are labeled Roma. Applications of genetic technologies are to be expected – if
they are not already in place – in citizenship issues, in law enforcement, and on the biomedical
market. These studies abound in statements praising how valuable
a tool, how unique a resource the Roma are for geneticists, for the investigation of rare
and complex diseases, and for forensic technologies; and, one might add, also for patient
organizations, pharma companies, and biomedical investment.
For the sake of focus, we have deliberately not raised these questions here, but we think they
need to be discussed. Though we have very selectively focused on questions of representati-
vity and sampling, we emphasize that many of those methodological themes are tightly con-
nected to questions of ethics, privacy, justice, and benefits. We do not mean to suggest that
the practices behind these studies per se have an unethical bias. There might indeed be
respectful and supportive engagement with individuals in these studies. As we said in the
beginning, in spite of the manifold problems we have pointed out, we do note a growing awareness
in the genetic studies over the past twenty years towards more regular ethical procedures,
caution in terminology, and self-identification as a basic principle in recruitment.
Perhaps, in some publications, the nebulous silence around sampling is in fact an expression
of care or concern about how to protect individuals from harm, or from being exposed. However,
we argue that silence, in this case, contributes to misrepresentation. Not addressing the
challenges does not reduce the harm; rather, it adds to the risk of inflicting more harm.
While isolated groups are among the most promising ones for genetic research, they are
often also the most vulnerable. Recognizing that such communities, families and
neighborhoods exist does not mean accepting that “all European Roma” can be represented
by samples from those places. More importantly, though, recognizing that such
communities, families and neighborhoods exist under the most precarious conditions comes with
immense responsibilities. How should one engage with these communities and individuals, how support
them, how protect them, how represent them fairly and correctly? Which way, according
to the Roma themselves, would be the best way to find out about their genetic
“history”? Is it in their interest at all to know about it? And is it of interest to them to have data
on genetic predispositions to specific diseases possibly enriched in the genomes of some
members of some families or communities? With whom else would they agree to share such
knowledge? What are the risks and downsides of knowing?
Increased awareness and new institutional forms of recognition are needed as the challenges
of representing vulnerable populations have not yet gained the full attention they deserve.
We are grateful for a DFG grant on our former project The Genetic Construction of Roma Groupness
and its Interdisciplinary Entanglements (2016-2019). It was during this project that we built up a large
part of our database with genetic studies on Roma and we developed contextual information relevant to
our analysis. Many thanks to Anja Reuss and her colleagues from the Zentralrat der Deutschen Sinti
und Roma, and to Frank Reuter from the Forschungsstelle für Antiziganismus. We thank Yves Moreau
for the insightful conversations on ethical aspects related to genetic data collection. We thank Leon
Kokkoliadis and Eric Llaveria Caselles, student assistants at Max Planck Institute for the History of
Science, who contributed by collecting genetic papers focusing on Roma at the beginning of our project
(2014-2015). We thank Cedric Bradbury and Sarah Weitz, student assistants at University College Frei-
burg, who helped with collecting, sorting and organizing the database of genetic studies of Roma (2016-
2020). We are grateful to Silvia Stößer who helped us with various administrative tasks. We had inspiring
discussions about genetic studies on Roma with Anna Lipphardt, Nicholas Buchanan, Amade M'charek,
Huub van Baar, Ildikó Plájás, and many colleagues at the University of Freiburg and in other places. Our
understanding of the genetic studies benefitted from FRIAS funding during 2018/2019, and particularly
from very fruitful and insightful encounters with our colleagues from the interdisciplinary project group at
Freiburg Institute for Advanced Studies (FRIAS): Anna Köttgen, Anne-Christine Mupepele, Peter Pfaffelhuber
and Fabian Staubach. We are grateful for numerous conversations on this topic with Matthias Wienroth,
Sabine Lutz-Bonengel, Ulrike Schmidt, Gudrun Rappold, Till Andlauer, Tino Plümecke and Thomas Schulze.
We have received helpful comments on this text from Denise Syndercombe Court, Nils
Ellebrecht, Tino Plümecke, Nicholas Buchanan, Anna Lipphardt and our students of the seminar “Ge-
netic studies of vulnerable populations”. We are grateful for advice from the DFG-Senatskommission für
Grundsatzfragen in der Genforschung, in particular from Brigitte Schlegelberger.
About I (2012). Underclass Gypsies: An Historical Approach on Categorization and
Exclusion in France in the Nineteenth and Twentieth Centuries. In M. Stewart (Ed.),
The Gypsy 'Menace': Populism and the New Anti-Gypsy Politics (pp.95-117).
Achim V (1998) The Roma in Romanian History. Budapest: Central European
Acton TA (2015) Scientific racism, popular racism and the discourse of the Gypsy Lore
Society. Ethnic and Racial Studies 39(7): 1187-1204.
Angelicheva D et al. (1999) Congenital cataracts facial dysmorphism neuropathy
(CCFDN) syndrome: a novel developmental disorder in Gypsies maps to
18qter. European Journal of Human Genetics 7(5): 560-566.
Angelicheva D et al. (1997) Cystic fibrosis mutations and associated haplotypes in
Bulgaria–a comparative population genetic study. Human Genetics 99(4): 513-520.
Barca-Tierno V et al. (2011) Identification of a Gypsy SHOX mutation (p. A170P) in
Léri-Weill dyschondrosteosis and Langer mesomelic dysplasia. European Journal
of Human Genetics 19(12): 1218-1225.
Benjamin R (2009) A lab of their own: Genomic sovereignty as postcolonial science
policy. Policy and Society 28(4): 341-355.
Berescu C (2019) How Many Ghettos Can We Count? Identifying Roma Neighbour-
hoods in Romanian Municipalities. In Vincze E et al. (Eds.) Racialized Labour in
Romania. Cham: Palgrave Macmillan, pp. 179-205.
Birch K and Tyfield D (2013) Theorizing the Bioeconomy: Biovalue, Biocapital, Bioeconomies
or… What? Science, Technology and Human Values 38(3): 299-327.
Bliss C (2015) Science and Struggle: Emerging Forms of Race and Activism in the
Genomic Era. The ANNALS of the American Academy of Political and Social
Science 661(1): 86-108.
Bliss C (2018) Social by nature: the promise and peril of sociogenomics. Stanford:
Stanford University Press.
Bolnick DA (2008) Individual ancestry inference and the reification of race as a
biological phenomenon. In Koenig BA, Lee SSJ and Richardson SS (eds.),
Revisiting race in a genomic age, pp. 70-85. Rutgers University Press.
Bogdal KM (2011) Europa erfindet die Zigeuner: Eine Geschichte von Faszination und
Verachtung. Berlin: Suhrkamp.
Brophy L (2005) Gypsies the focus to genetic cure. UWAnews 24 (6), May 16.
Brožková D et al. (2016) HMSN Lom in 12 Czech patients, with one unusual case due
to uniparental isodisomy of chromosome 8. Journal of Human Genetics 62 (3): 431.
Brubaker R (2015) Grounds for difference. Harvard University Press.
Brubaker R (2004) Ethnicity without groups. Harvard University Press.
Castella M et al. (2011) Origin, functional role, and clinical impact of Fanconi anemia
FANCA mutations. Blood 117(14): 3759-3769.
Cazacu C et al. (2013) Personalized medicine for whom? The situation of Romani
people. Revista Romana de Bioetica, 11(3): 84-91.
Cell Press (2012) European Romani exodus began 1,500 years ago, DNA evidence
shows. EurekAlert, 6 December 2012.
Clarke VA (1973) Genetic factors in some British Gypsies.
In Roberts DF and Sunderland E (Eds.). Genetic Variation in Britain (pp.181-195).
London: Taylor & Francis.
Clarke AE (2014) Biomedicalization. The Wiley Blackwell encyclopedia of health,
illness, behavior, and society, pp.137-142.
Colomer J et al. (2000) Hereditary motor and sensory neuropathy-Lom (HMSNL) in a
Spanish family: clinical, electrophysiological, pathological and genetic studies.
Neuromuscular Disorders 10(8): 578-583.
Council of Europe (CoE) (2012) Descriptive glossary of terms relating to Roma issues.
on%2018%20May%202012.pdf (accessed on 12.11.2020).
de Pablo R et al. (1992) Distribution of HLA antigens in Spanish Gypsies: a
comparative study. Tissue Antigens 40 (4): 187-196.
Desviat LR, Pérez B and Ugarte M (1997) Phenylketonuria in Spanish Gypsies:
prevalence of the IVS10nt546 mutation on haplotype 34. Human Mutation 9(1): 66-
Donert C (2008) ‘The struggle for the soul of the Gypsy’: marginality and mass
mobilization in Stalinist Czechoslovakia. Social History 33(2): 123-144.
Duster T (2015) A post-genomic surprise. The molecular reinscription of race in
science, law and medicine. The British Journal of Sociology 66(1): 1-27.
Egorova Y (2014) Theorizing “Jewish genetics”: DNA, culture, and historical narrative.
In Roth L and Valman N (eds) The Routledge handbook of contemporary Jewish
cultures. Abingdon, Oxon: Routledge, pp. 353-364.
Egorova Y (2010) De/geneticizing caste: population genetic research in South Asia.
Science as Culture 18(4): 417-434.
Ehler E and Vanek D (2017) Forensic genetic analyses in isolated populations with
examples of central European Valachs and Roma. Journal of Forensic and Legal
Medicine 48: 46-52.
Epstein S (2007) Inclusion: The politics of difference in medical research. Chicago:
University of Chicago Press.
ERA-Net for Research Programmes on Rare Diseases website. Session chair: "Exome
and whole genome sequencing studies in rare diseases". Professor Béla Melegh.
Available at: http://www.erare.eu/speaker/professor-b%C3%A9la-melegh
(accessed 27 February 2018).
Fiatal S et al. (2016) High prevalence of smoking in the Roma population seems to have
no genetic background. Nicotine & Tobacco Research 18(12): 2260-2267.
Filhol E (2013) Le contrôle des Tsiganes en France (1912-1969). Paris: KARTHALA
Floersch J, Longhofer J and Latta K (1997) Writing Amish culture into genes: biological
reductionism in a study of manic depression. Culture, Medicine and Psychiatry
Fortier AM (2012) Genetic Indigenisation in ‘The People of the British Isles’. Science
as Culture 21(2): 153-175.
Fraser AM (1992) The Gypsies. Oxford, UK: Blackwell.
Fujimura JH et al. (2014) Clines without classes: How to make sense of human
variation. Sociological Theory 32(3): 208-227.
Fujimura JH and Rajagopalan R (2011) Different differences: The use of ‘genetic an-
cestry’ versus race in biomedical human genetic research. Social Studies of Science
Fullwiley D (2015) Race, genes, power. The British Journal of Sociology 66(1): 36-45.
Fullwiley D (2008) The Biologistical Construction of Race: ‘Admixture’ Technology and
the New Genetic Medicine. Social Studies of Science 38(5): 695-735.
Gannett L (2014) Biogeographical ancestry and race. Studies in History and Philoso-
phy of Science Part C: Studies in History and Philosophy of Biological and Biome-
dical Sciences 47: 173-184.
Gannett L (2003) Making populations: Bounding genes in space and in time. Philoso-
phy of Science 70(5): 989-1001.
Gomez-Carballa A et al. (2013) Indian signatures in the westernmost edge of the European
Romani diaspora: new insight from mitogenomes. PLoS One 8(10): e75397.
Gresham D et al. (2001) Origins and divergence of the Roma (gypsies). The American
Journal of Human Genetics 69(6): 1314-1331.
Hacking I (1986) Making Up People. In Heller TL, Sosna M and Wellbery DE (eds.)
Reconstructing Individualism. Stanford: Stanford University Press.
Hacking I (1999) The social construction of what? Harvard University Press.
Hinterberger A (2012) Investing in life, investing in difference: Nations, populations and
genomes. Theory, Culture & Society 29(3): 72-93.
Iovita RP and Schurr TG (2004) Reconstructing the origins and migrations of diasporic
populations: the case of the European Gypsies. American Anthropologist, 106(2):
Jobling M et al. (2014). Human evolutionary genetics: origins, peoples & disease (2nd
edition). Garland Science.
Jonuz E (2009) Stigma Ethnizität. Wie zugewanderte Romafamilien der Ethnisierungs-
falle begegnen. Budrich: Leverkusen.
Kalanin J et al. (1994) Gypsy phenylketonuria: a point mutation of the phenylalanine
hydroxylase gene in Gypsy families from Slovakia. American Journal of Medical Genetics
49(2): 235-239.
Kalaydjieva L, Gresham D and Calafell F (2001) Genetic studies of the Roma (Gypsies):
a review. BMC Medical Genetics 2(1): 2-5.
Kaneva R et al. (2003) A genome-wide linkage scan of bipolar disorder in three
extended Gypsy families (P164). In Barden N et al. (eds.), Abstracts for the XIth
World Congress of Psychiatric Genetics Quebec City Convention Centre October 4-
8, 2003 Sponsored by the International Society of Psychiatric Genetics. American
Journal of Medical Genetics Part B 112.1.
Kaneva R et al. (2009) Bipolar disorder in the Bulgarian Gypsies: Genetic heterogene-
ity in a young founder population. American Journal of Medical Genetics Part B:
Neuropsychiatric Genetics 150.2: 191-201.
Kent M et al. (2015) Building the genomic nation: ‘Homo Brasilis’ and the ‘Genoma
Mexicano’ in comparative cultural perspective. Social Studies of Science 45(6):
Koenig BA, Lee SSJ and Richardson SS (eds.) (2008) Revisiting race in a genomic
age. Rutgers University Press.
Kósa Z et al. (2015) Prevalence of metabolic syndrome among Roma: a comparative
health examination survey in Hungary. The European Journal of Public Health 25(2):
Kowal E and Radin J (2015) Indigenous biospecimen collections and the cryopolitics
of frozen life. Journal of Sociology 51(1): 63-80.
Kowal E, Radin J and Reardon J (2013) Indigenous body parts, mutating temporalities,
and the half-lives of postcolonial technoscience. Social Studies of Science, 43(4),
Kovats M (2013) Integration and the Politicisation of Roma Identity. In A. Biró & W.
Guy (Eds.), From Victimhood to Citizenship. The Path of Roma Integration (pp. 260-
342). Budapest: Kossuth Kiadó.
Kóczé A (2018) Race, migration and neoliberalism: distorted notions of Romani migra-
tion in European public discourses. Social Identities 24(4): 459-473.
Kyllingstad J (2012) Norwegian Physical Anthropology and the Idea of a Nordic Master
Race. Current Anthropology 53(S5): 46-56.
Ladányi J and Szelényi I (2001) The social construction of Roma ethnicity in Bulgaria,
Romania and Hungary during market transition. Review of Sociology 7(2): 79-89.
Law I and Kovats M (2018) Rethinking Roma: Identities, Politicisation and New Agen-
Lee SSJ, Mountain J and Koenig BA (2001) The meanings of race in the new
genomics: implications for health disparities research. Yale Journal of Health,
Policy, Law & Ethics 1(1): 33-75.
Leslie S et al. (2015) The fine-scale genetic structure of the British population. Nature
Lindee MS (2005) Moments of truth in genetic medicine. JHU Press.
Lipphardt V and Surdu M (submitted, 2020) DNA Data from Roma in Forensic Genetic
Studies and Databases: Risks and Challenges. American Journal of Bioethics,
preprint available on bioRxiv, the preprint server for biology.
Lipphardt V (2019) Über den allzu sorglosen Umgang mit population labels und
sampling schemes. Zeitschrift für Geschichte der Wissenschaften, Technik und
Medizin 27(2): 167-178.
Lipphardt V (2016) The Body as a Substrate of Differentiation. Shifting the Focus from
Race Science to Life Scientists' Research on Human Variation. Varia
Historia 33(61): 109-133.
Lipphardt V (2014) “Geographical Distribution Patterns of Various Genes”: Genetic
studies of human variation after 1945. Studies in History and Philosophy of Science
Part C: Studies in History and Philosophy of Biological and Biomedical Sciences 47:
Lipphardt V and Niewöhner J (2007) Producing difference in an age of biosociality.
Biohistorical narratives, standardisation and resistance as translations. Science,
Technology & Innovation Studies 3(1): 45-65.
Lucassen L (1991) The power of definition. Stigmatisation, minoritisation and ethnicity
illustrated by the history of Gypsies in the Netherlands. Netherlands Journal of
Social Sciences 27: 80-91.
Lucassen L (1997) «Harmful tramps» Police professionalization and Gypsies in Ger-
many, 1700-1945. Crime, Histoire & Sociétés/Crime, History & Societies 1(1): 29-
Lucassen L, Willems W and Cottaar AM (1998) Gypsies and other itinerant groups: A
socio-historical approach. London: Palgrave Macmillan.
Mack SJ et al. (2006) 13th International Histocompatibility Workshop Anthropology/Human
Genetic Diversity Joint Report: Methods used in the generation and preparation
of data for analysis in the 13th International Histocompatibility Workshop.
In Hansen JA (Ed.). Immunobiology of the Human MHC: vol. 1. International
Histocompatibility Workshop and Conference. Victoria, Ca; Seattle USA, Histocompatibility
Working Group Press, 2006. p. 1-16.
Magyari L et al. (2014) Marked differences of haplotype tagging SNP distribution, lin-
kage, and haplotype profile of IL23 receptor gene in Roma and Hungarian popula-
tion samples. Cytokine 65(2): 148-152.
Martínez-Cruz B et al. (2015) Origins, admixture and founder lineages in European
Roma. European Journal of Human Genetics 24(6): 937-943.
Mascalzoni D et al. (2010) Comparison of participant information and informed consent
forms of five European studies in genetic isolated populations. European Journal of
Human Genetics 18(3): 296-302.
Mašindová I et al. (2015) MARVELD2 (DFNB49) mutations in the hearing impaired Central
European Roma population-prevalence, clinical impact and the common origin. PLoS
One 10(4): 1-13.
Mayall D (2004) Gypsy identities 1500-2000: From Egipcyans and Moon-men to the
ethnic Romany. Routledge.
M'charek A (2005) The Human Genome Diversity Project: an ethnography of scientific
practice. Cambridge University Press.
M’charek A, Schramm K and Skinner D (2014) Topologies of race: Doing territory,
population and identity in Europe. Science, Technology, & Human Values 39(4):
Melegh BI et al. (2017). Refining the South Asian Origin of the Romani people. BMC
genetics 18(1): 82.
Mendizabal I et al. (2013) Implications of population history of European Romani on
genetic susceptibility to disease. Human Heredity 76: 194-200.
Mendizabal I et al. (2012) Reconstructing the population history of European Romani
from genome-wide data. Current Biology 22(24): 2342-2349.
Molnar MZ et al. (2012) Roma ethnicity and clinical outcomes in kidney transplant
recipients. International Urology and Nephrology 44(3): 945-954.
Morar B and Kalaydjieva L (2008) Roma/Gypsies: Footprints in the Genome. In VT
Koven (ed.) Population Genetic Research Progress, pp. 229-244, Nova Biomedical
Morar B et al. (2004) Mutation history of the Roma/Gypsies. American Journal of Hu-
man Genetics 75(4): 596-609.
Moreau Y (2019) Crack down on genomic surveillance. Nature 576: 36-38.
Moorjani P et al. (2013) Reconstructing Roma history from genome-wide data. PLoS
One 8(3): e58633.
Munsterhjelm M (2014) Beyond the Line: Violence and the Objectification of the Kari-
tiana Indigenous People as Extreme Other in Forensic Genetics. International
Journal for the Semiotics of Law-Revue internationale de Sémiotique juridique
Myers M (2019) An inheritance of exclusion: Roma education, genetics and the turn to
biosocial solutions. Research in Education, p.0034523719880205.
Nagy K et al. (2017) Distinct Penetrance of Obesity-Associated Susceptibility Alleles
in the Hungarian General and Roma Populations. Obesity Facts 10(5): 444-457.
Nagy M et al. (2007) Searching for the origin of Romanies: Slovakian Romani, Jats of
Haryana and Jat Sikhs Y-STR data in comparison with different Romani populations.
Forensic Science International 169(1): 19-26.
NaKo Gesundheitsstudie website. Available at: https://nako.de/allgemeines/glossar/
(accessed on 11.03.2020).
Novokmet N and Pavčec Z (2007) Genetic polymorphisms of 15 AmpFlSTR identifiler
loci in Romani population from Northwestern Croatia. Forensic Science International
Nash C (2013) Genome geographies: Mapping national ancestry and diversity in
human population genetics. Transactions of the Institute of British Geographers
Nature (2012) Romani have Indian ancestry. Nature 492: 156.
Novembre J et al. (2008) Genes mirror geography within Europe. Nature 456(7218):
Okely J (1983) The Traveller-Gypsies. Cambridge: Cambridge University Press.
Ong A (2016) Fungible life: Experiment in the Asian city of life. Duke University Press.
Pálsson G (2008) The rise and fall of a biobank: the case of Iceland. In Gottweis H and
Petersen A (eds.) Biobanks. Governance in comparative perspective, pp. 41-56,
Parson W and Roewer L (2010) Publication of population data of linearly inherited DNA
markers in the International Journal of Legal Medicine. International Journal of Legal
Medicine 124(5): 505-509.
Picker G (2017) Racial cities: Governance and the segregation of Romani people in
urban Europe. London and New York: Routledge.
Pikó P et al. (2017) Genetic factors exist behind the high prevalence of reduced high-
density lipoprotein cholesterol levels in the Roma population. Atherosclerosis 263:
Plášilová M et al. (1999) Identification of a single ancestral CYP1B1 mutation in Slovak
Gypsies (Roms) affected with primary congenital glaucoma. Journal of Medical
Genetics 36(4): 290-294.
Plájás IZ, M’charek A and van Baar H (2019) Knowing “the Roma”: Visual technologies
of sorting populations and the policing of mobility in Europe. Environment and
Planning D: Society and Space 37(4): 589-605.
Poveda A, Ibáñez M E and Rebato E (2014) Common variants in BDNF, FAIM2, FTO,
MC4R, NEGR1, and SH2B1 show association with obesity-related variables in
Spanish Roma population. American Journal of Human Biology 26(5): 660-669.
Poviliunas A (2011) Lithuania. Promoting Social Inclusion of Roma. A Study of
National Policies. On behalf of the European Commission DG Employment, Social
Affairs and Inclusion.
Radin J and Kowal E (2015). Indigenous blood and ethical regimes in the United States
and Australia since the 1960s. American Ethnologist 42(4): 749-765.
Rajagopalan RM, Nelson A and Fujimura JH (2017). In Felt U et al. (eds) The Hand-
book of Science and Technology Studies. Fourth Edition, Cambridge: MIT Press,
Ramal LM et al. (2001) HLA class II allele distribution in the Gypsy community of
Andalusia, southern Spain. Tissue Antigens 57(2): 138-143.
Reardon J (2017) The postgenomic condition: ethics, justice, and knowledge after the
genome. Chicago: University of Chicago Press.
Reardon J (2005) Race to the Finish: Identity and Governance in an Age of Genomics.
Princeton University Press.
Regueiro M et al. (2011) Divergent patrilineal signals in three Roma populations.
American Journal of Physical Anthropology 144(1): 80-91.
Rex-Kiss B, Szabó L and Szabó S (1972a). Blood group investigations among the
Gypsy population of Hungary. I. Examination of ABO, MN and Rh blood groups.
Annales immunologiae Hungaricae 16: 355-370.
Saiz MS et al. (2014) Action protocols in DNA identification of isolated populations. J
Forensic Res 5(218): 2.
Salihović MP et al. (2011) The role of the Vlax Roma in shaping the European Romani
maternal genetic history. American Journal of Physical Anthropology 146(2): 262-
Santos RV, Da Silva GO and Gibbon S (2014) Pharmacogenomics, human genetic
diversity and the incorporation and rejection of color/race in Brazil. Biosocieties
Schramm K, Skinner D and Rottenburg R (eds.) (2012) Identity politics and the new
genetics: Re/creating categories of difference and belonging. Berghahn Books.
Schwartz-Marín E et al. (2015) Colombian forensic genetics as a form of public
science: The role of race, nation and common sense in the stabilization of DNA
populations. Social Studies of Science 45(6): 862-885.
Surdu M (2016) Those Who Count. Budapest, New York: Central European University
Surdu M and Kovats M (2015) Roma identity as an expert-political construction. Social
Inclusion 3(5): 5-18.
Surdu M (2019) Why the “real” numbers on Roma are fictitious: Revisiting practices of
ethnic quantification. Ethnicities 19 (3): 486–502.
Star SL (1983) Simplification in scientific work: An example from neuroscience re-
search. Social Studies of Science 13(2): 205-228.
Stewart M (1997) The Time of the Gypsies. Boulder, Colo: Westview Press.
Stewart M (2013). Roma and Gypsy "Ethnicity" as a Subject of Anthropological Inquiry.
Annual Review of Anthropology 42: 415-432.
Szakony, Balogh and Muszbek (1999) Simultaneous occurrence of follicular lymphoma
in two monozygotic twins. British Journal of Haematology 107: 461-465.
TallBear K (2013) Native American DNA: Tribal belonging and the false promise of
genetic science. Minneapolis: University of Minnesota Press.
Tarkkala H and Tupasela A (2018) Shortcut to success? Negotiating genetic unique-
ness in global biomedicine. Social Studies of Science 48(5): 740-761.
Tournev I (2016) The Meryon Lecture at the 18th Annual Meeting of the Meryon So-
ciety Wolfson College, Oxford, UK, 12th September 2014: Neuromuscular disorders
in Roma (Gypsies)–collaborative studies, epidemiology, community-based carrier
testing program and social activities. Neuromuscular Disorders 26(1): 94-103.
Tremlett A (2014) Making a difference without creating a difference: Super-diversity as
a new direction for research on Roma minorities. Ethnicities 14(6): 830-848.
Tsai YY (2010) Geneticizing Ethnicity: A study on the “Taiwan Bio-Bank”. East Asian
Science, Technology and Society: An International Journal 4(3): 433-455.
Tupasela A (2016) Populations as brands in medical research: placing genes on the
global genetic atlas. BioSocieties 12(1): 47-65.
van Baar H (2018) Contained mobility and the racialization of poverty in Europe: The
Roma at the development–security nexus. Social Identities 24(4): 442-458.
van Baar H (2015). The Perpetual Mobile Machine of Forced Mobility: Europe’s Roma
and the Institutionalization of Rootlessness. In Jansen Y, Celikates R and de Bloois
J (Eds.), The Irregularization of Migration in Contemporary Europe: detention, de-
portation, drowning (pp. 71-86). London: Rowman & Littlefield International.
Varszegi D et al. (2014) Hodgkin disease therapy induced second malignancy suscep-
tibility 6q21 functional variants in Roma and Hungarian population samples.
Pathology & Oncology Research 20(3): 529-533.
Vincze E (2019) Ghettoization: The Production of Marginal Spaces of Housing and the
Reproduction of Racialized Labour. In Vincze E et al. (Eds.) Racialized Labour in
Romania. Cham: Palgrave Macmillan, pp. 63-95.
Vermeersch P (2005) Marginality, Advocacy, and the Ambiguities of Multiculturalism:
Notes on Romani Activism in Central Europe. Identities: Global Studies in Culture
and Power 12: 451-478.
Wade P (2017) Liberalism and Its Contradictions: Democracy and Hierarchy in Mesti-
zaje and Genomics in Latin America. Latin American Research Review 52(4): 623–
Wade P et al. (eds.) (2014) Mestizo genomics: race mixture, nation, and science in
Latin America. Durham and London: Duke University Press.
Willems W (1997) In Search of the True Gypsy: From Enlightenment to Final Solution.
London: Frank Cass.
Yıldız C and De Genova N (2017) Un/Free mobility: Roma migrants in the European
Union. Social Identities 24 (4): 425-441.
The grouping of Roma in one homogenous category has a long history; it is beyond the scope of this article
to retrace it. The term „Roma“ was introduced in political and academic discourse to replace the term “Gypsies”
after 1971 when the first World Romani Congress was held. “Gypsies” is considered a pejorative term by many who
identify as Roma, though others self-identify as “Gypsies” in censuses, interviews or other situations. Even though
ethnic self-identification is considered state-of-the-art in census taking, some state administrations still employ the
term “Roma” for classifying people who would not self-identify as such. For example, some groups who are counted
under the category “Roma” would self-identify as Egyptians and Ashkali (in Macedonia, Kosovo and Albania),
Boyash or Rudars (in Romania, Serbia, Croatia and Hungary). The Council of Europe’s (CoE) understanding of the
term “Roma” is widely used in policy reports and academic publications: “The term ‘Roma’ used at the Council of
Europe refers to Roma, Sinti, Kale and related groups in Europe, including Travellers and the Eastern groups (Dom
and Lom), and covers the wide diversity of the groups concerned, including persons who identify themselves as
Gypsies (CoE, 2012:4)”. Further, the CoE (2012:7) comes close to an essentialist definition of Roma, linking ethnic
belonging with ancestral origin: “The term ‘Roma’, as used internationally, denotes all groups sharing a common
Indian origin (Roma, Sinti, Kale), and the communities who refer to themselves as Roma, found mainly in the
Balkans and central and eastern Europe, but also throughout the world”. Although we consider it problematic to use
the term “Roma” to ascribe ancestral origins to people who do not self-identify accordingly, or as an umbrella category
in censuses and expert estimates (see Surdu, 2016), we use it in this article to refer to the persons addressed as
Roma, “Gypsies” or “Roma/Gypsies” in genetic studies and those who are subsumed as “European Roma
population” in these studies.
Not included in this analysis are ca. 70 seroanthropological studies published between 1921 and 1994.
For an overview of ethically problematic aspects of using DNA data of Roma in forensic contexts, see Lipphardt
and Surdu (submitted), “DNA Data from Roma in Forensic Genetic Studies and Databases: Risks and Challenges”.
Further publications are in the making. In Freiburg, we closely collaborate with colleagues from the life
sciences (biologists, epidemiologists) and mathematicians.
To name but a few who have demonstrated that cultural, political and social presuppositions about human
groups not only inform the research designs, group labels and the collection of DNA data, but also reinforce
existing stereotypical group notions concerning ethnic or racial minorities, vulnerable and marginalized groups:
Fujimura et al., 2014; Bliss, 2015, 2018; Duster, 2015; Fullwiley, 2015; Gannett, 2014; Lipphardt, 2014, 2019;
M’charek, Schramm and Skinner, 2014; Radin and Kowal, 2015; Munsterhjelm, 2014; Rajagopalan, Nelson and
Fujimura, 2017; Reardon, 2017; Schwartz-Marín et al., 2015; TallBear, 2013; and Wade et al., 2014.
More specifically, some have sought to overcome the limitations of small samples by technological solutions. But
these solutions draw on the reconstruction of a supposed biological population; that is, a number of people
sharing common biological ancestors, and not a politically or socially defined population.
To be sure, this is standard practice in English-speaking countries such as the US, Canada, and Great
Britain, but not or much less so in other countries.
In subsequent publications, we will use quantitative methods for a statistical analysis of our text col-
An overwhelming number of scholars, representatives of Roma NGOs and international organizations,
policy makers and politicians consider that a census based on self-assignment cannot produce a reliable count of
Roma. Some argue that Roma “hide” their “true” identity and choose to self-identify with other ethnic labels. This,
however, undermines the concept of self-identification as such and implicitly subscribes to an essentialised
perspective of Roma ethnicity based on allegedly objective criteria.
Such grand “biohistorical narratives” can be understood as stories constitutive of social formations such as
ethnic groups and nations, told in the language of evolutionary biology with concepts such as mutation, selection,
drift, founder events and admixture (Lipphardt and Niewöhner, 2007). These are intertwined with personal family
stories of heritage and kinship.
In genomics, so-called hypothesis-free methods are highly valued for achieving novel and unexpected insights.
Analysing the genomes of patients affected by the same disease symptoms for commonalities, one hopes for a
significant finding or correlation. STS scholars would argue that the sampling of a patient group (i.e. before the
experiment is run) cannot be hypothesis-free – as the patients are hypothesized to represent a group affected by
the same condition. For a genetic history study, the hypothesis is to be found in the sampling as well, but also in
the assumptions about the “ancestral” population; the equivalent to the “experiment” is the population’s history.
The recounted narrative of that population history is also a hypothesis about how the observed genetic structure
has emerged over time.
Some genetic papers make references to articles published by the Gypsy Lore Society mostly before
1945. Morar et al. (2004), Kalaydjieva et al. (2005), Gresham et al. (2001) and Tournev (2014) refer to a
publication from 1915-1916; Moorjani et al. (2013) refer to a publication from 1927; de Pablo et al. (1992) and
Ramal et al. (2001) refer to a publication from 1923; Regueiro et al. (2011) refer to a publication from
1941. Historically, the publications of the Gypsy Lore Society were a major source of “scientific racism” until the
late 1970s (Acton, 2015).
Angus Fraser’s book “The Gypsies” (1992) is the most frequently cited publication from the humanities; in
most cases it is cited in support of claims about endogamy as a cultural tradition among Roma. However, Fraser also
suggests that “mixing” was very frequent. Those of Fraser’s statements that contradict the conceptualization of
Roma as a genetic isolate are not cited in the genetic studies.
Noteworthy exceptions are the contributions of the ethnographers Marushiakova and Popov as co-authors of
the genetic studies of Gresham et al. (2001) and Martinez-Cruz et al. (2015).
In the territories of present-day Romania, the country with the largest Roma population, Roma were
enslaved from the 13th until the mid-19th century.
In the past, geneticists have found various ways to overcome these problems, such as offering
incentives. If there were effective incentives to self-identify as Roma, the geneticists would need to consider what
this implies in terms of building a sample, as these incentives would perhaps attract people they did not expect to
We thank Peter Pfaffelhuber, Department of Mathematical Stochastics, Freiburg University, for his
insightful comment on the non-comparability of the samples.
The dataset from the study of Gresham et al. (2001) has been shared with other research teams at least
20 times. In some cases we observed unexplained attrition of data.
Considerable definitional ambiguity surrounds the three terms “endogamy”, “inbreeding” and “incest”, both between
genetics and the social sciences and within them. “Endogamy” and “incest” can be viewed as the two end points of a
spectrum of in-group parenthood. In between the two, there is a vast range of parenthood between more or less
closely related partners. In genetics, “inbreeding” is used interchangeably with the other two terms, and covers
phenomena across the whole spectrum. Social scientists understand endogamy mainly as in-group marriage
and parenthood between partners from unrelated families. “Inbreeding” would rather be understood as overlapping
with “incest”. Of course, “relatedness” and “incest” are culturally contingent concepts and differ between
countries and societies.
The text reads: “from Hungary (3 linguistically and culturally separated sub-groups: 7 samples from Olah
(Vlah), 4 samples from Beas (Boyash) and 4 samples from Romungro), 4 samples from Romania, 4 samples from
Spain and 4 samples from Slovakia [...].”
A leading forensic journal stated in 2010: ‘It is therefore of utmost importance to carefully describe the
sampled population correctly and in detail with respect to geographic origin and demographic background
applying termini from molecular anthropology and population genetics. This includes the use of a correct ethonym
(e.g. “Roma” instead of “Gypsy”, “Europeans” instead of “Caucasians”, etc.), the definition of the linguistic, and (if
applicable) cultural groups (e.g. casts) and subgroups.’ (Parson and Roewer, 2010: 506).
Or “Caucasians” (mostly used in biomedical studies), “Whites” (e.g. Castella et al., 2011; Varszegy
et al., 2014), “indigenous” (e.g. Pamjav et al., 2011), or “non-Roma”. “Non-Roma” and “Caucasian” are sometimes
used interchangeably (e.g. Masindova et al., 2015 and Molnar et al., 2012). This interchangeable use of
categories suggests that in some cases “non-Roma” is used as a way of coding racial division.
Interestingly, if unusually, Hungary, Slovenia and Lithuania are included in “Western European”.
Angelicheva et al., 1997; Desviat, Perez and Ugarte (1997:67); Morar and Kalaydjieva, 2008; Kalanin et
A good dozen such “private” mutations are reported in the literature.