Vincent A. Traag's research while affiliated with Leiden University and other places

Publications (82)

Preprint
The study of biases, such as gender or racial biases, is an important topic in the social and behavioural sciences. However, the concept of bias is not always clearly defined in the literature. Definitions of bias are often ambiguous, or definitions are not provided at all. To study biases in a precise way, it is important to have a well-defined co...
Preprint
Citations in science are being studied from several perspectives. On the one hand, there are approaches such as scientometrics and the science of science, which take a more quantitative perspective. In this chapter I briefly review some of the literature on citations, citation distributions and models of citations. These citations feature prominent...
Preprint
Theoretical arguments and empirical investigations indicate that a high proportion of published findings are false or do not replicate. The current position paper provides a broad perspective on this scientific error, focusing both on reform history and on opportunities for future reform. Talking points are organised along four main themes: methodo...
Article
Full-text available
This paper introduces a framework for understanding complex temporal interaction patterns in large-scale scientific collaboration networks. In particular, we investigate how two key concepts in science studies, scientific collaboration and scientific mobility, are related and possibly differ between fields. We do so by analyzing multilayer temporal...
Article
Full-text available
Articles in high-impact journals are, on average, more frequently cited. But are they cited more often because those articles are somehow more “citable”? Or are they cited more often simply because they are published in a high-impact journal? Although some evidence suggests the latter, the causal relationship is not clear. We here compare citations...
Article
Full-text available
Most scientometricians reject the use of the journal impact factor for assessing individual articles and their authors. The well-known San Francisco Declaration on Research Assessment also strongly objects to this way of using the impact factor. Arguments against the use of the impact factor at the level of individual articles are often based...
Article
Full-text available
As the COVID-19 pandemic unfolds, researchers from all disciplines are coming together and contributing their expertise. CORD-19, a dataset of COVID-19 and coronavirus publications, has been made available alongside calls to help mine the information it contains and to create tools to search it more effectively. We analyse the delineation of the pu...
Article
Full-text available
[This corrects the article DOI: 10.1098/rsos.190207.].
Preprint
Full-text available
In the past decades, many countries have started to fund academic institutions based on the evaluation of their scientific performance. In this context, peer review is often used to assess scientific performance. Bibliometric indicators have been suggested as an alternative. A recurrent question in this context is whether peer review and metrics te...
Article
Examining coauthorship networks is key to studying scientific collaboration patterns and structural characteristics of scientific communities. Here, we studied coauthorship networks of sociologists in Italy, using temporal and multi-level quantitative analysis. By looking at publications indexed in Scopus, we detected research communities among Italia...
Preprint
Full-text available
As the COVID-19 pandemic unfolds, researchers from all disciplines are coming together and contributing their expertise. CORD-19, a dataset of COVID-19 and coronavirus publications, has recently been published alongside calls to help mine the information it contains, and to create tools to search it more effectively. Here, we focus on the delineati...
Article
Full-text available
Citation networks of scientific publications offer fundamental insights into the structure and development of scientific knowledge. We propose a new measure, called intermediacy, for tracing the historical development of scientific knowledge. Given two publications, an older and a more recent one, intermediacy identifies publications that seem to p...
Preprint
Articles in high-impact journals are by definition more highly cited on average. But are they cited more often because the articles are somehow "better"? Or are they cited more often simply because they appeared in a high-impact journal? Although some evidence suggests the latter, the causal relationship is not clear. We here compare citations of pu...
Chapter
This chapter is concerned with signed networks, where each link is associated with either a positive (+) or negative sign (‐). Blockmodeling, as a way of partitioning social networks, started with a clear substantive rationale expressed in terms of social roles. However, the availability of algorithms for partitioning (unsigned) networks, based on...
Article
Full-text available
Minority integration is a highly contested topic in public debates, and assimilationist actors appear to have gained discursive ground. However, it remains difficult to accurately depict how power relations in debates change and evolve. In this study, the public debates on minority integration in Flanders and the Netherlands between 2006 and 2012 a...
Article
Full-text available
Community detection is often used to understand the structure of large and complex networks. One of the most popular algorithms for uncovering community structure is the so-called Louvain algorithm. We show that this algorithm has a major defect that largely went unnoticed until now: the Louvain algorithm may yield arbitrarily badly connected commu...
Article
Full-text available
When performing a national research assessment, some countries rely on citation metrics whereas others, such as the UK, primarily use peer review. In the influential Metric Tide report, a low agreement between metrics and peer review in the UK Research Excellence Framework (REF) was found. However, earlier studies observed much higher agreement bet...
Preprint
Citation networks of scientific publications offer fundamental insights into the structure and development of scientific knowledge. We propose a new measure, called intermediacy, for tracing the historical development of scientific knowledge. Given two publications, an older and a more recent one, intermediacy identifies publications that seem to p...
Preprint
Community detection is often used to understand the structure of large and complex networks. One of the most popular algorithms for uncovering community structure is the so-called Louvain algorithm. We show that this algorithm has a major defect that largely went unnoticed until now: the Louvain algorithm may yield arbitrarily badly connected commu...
Preprint
When performing a national research assessment, some countries rely on citation metrics whereas others, such as the UK, primarily use peer review. In the influential Metric Tide report, a low agreement between metrics and peer review in the UK Research Excellence Framework (REF) was found. However, earlier studies observed much higher agreement bet...
Article
Full-text available
Signed networks appear naturally in contexts where conflict or animosity is apparent. In this book chapter we review some of the literature on signed networks, especially in the context of partitioning. Most of the work is founded in what is known as structural balance theory. We cover the basic mathematical principles of structural balance theory....
Article
Most scientometricians reject the use of the journal impact factor for assessing individual articles and their authors. The well-known San Francisco Declaration on Research Assessment also strongly objects to this way of using the impact factor. Arguments against the use of the impact factor at the level of individual articles are often based...
Article
Protesters are usually young, relatively well-educated, middle-class people who are politically engaged. But where do protesters come from? We here show, based on mobile phone data, that distance is an important impediment to protest attendance. Most protesters come from nearby regions, suggesting distance forms an obstacle to participation. Althou...
Article
This paper elaborates a relational approach to examine discursive contention. We develop a network method to identify groups forming through contentious interactions as well as relational measures of polarization, leadership, solidarity and various aspects of discursive power. The paper analyzes how an assimilationist movement confronted its advers...
Article
Full-text available
Mobile phone data have been used extensively in recent years to study social behavior. However, most of these studies are based on only partial data whose coverage is limited both in space and time. In this paper, we point to an observation that the bias due to the limited coverage in time may have an important influence on the results of the a...
Data
Results for presidential candidates, election 2000. Daily donors (a)—raw data is transparent, smoothed data is solid—and cumulative donors (b), probability effect of (c) donor degree, (d) common community donor degree, (e) donor communities, (f) source diversity and (g) previous donation. Logistic regression results (h) general effects, (i) network...
Data
Logistic regression results for republican candidates. Results for 2000–2012 for donation to the republican candidate as dependent variable. (TIFF)
Data
Results for presidential candidates, election 2004. Daily donors (a)—raw data is transparent, smoothed data is solid— and cumulative donors (b), probability effect of (c) donor degree, (d) common community donor degree, (e) donor communities, (f) source diversity and (g) previous donation. Logistic regression results (h) general effects, (i) networ...
Data
Results for presidential candidates, election 2008. Daily donors (a)—raw data is transparent, smoothed data is solid— and cumulative donors (b), probability effect of (c) donor degree, (d) common community donor degree, (e) donor communities, (f) source diversity and (g) previous donation. Logistic regression results (h) general effects, (i) networ...
Data
Results for parties, election 2008. Daily donors (a)—raw data is transparent, smoothed data is solid— and cumulative donors (b), probability effect of (c) donor degree, (d) common community donor degree, (e) donor communities, (f) source diversity and (g) previous donation. Logistic regression results (h) general effects, (i) network effects for ne...
Data
Results for presidential candidates, election 2012. Daily donors (a)—raw data is transparent, smoothed data is solid— and cumulative donors (b), probability effect of (c) donor degree, (d) common community donor degree, (e) donor communities, (f) source diversity and (g) previous donation. Logistic regression results (h) general effects, (i) networ...
Data
Cross-cutting logistic regression results for democratic party. Results for 2000–2012 for donation to the democratic party as dependent variable, including cross-exposure effects (i.e. effect of exposure to republican donors on democratic donations). (TIFF)
Data
Logistic regression results for democratic candidates. Results for 2000–2012 for donation to the democratic candidate as dependent variable. (TIFF)
Data
Results for parties, election 2012. Daily donors (a)—raw data is transparent, smoothed data is solid— and cumulative donors (b), probability effect of (c) donor degree, (d) common community donor degree, (e) donor communities, (f) source diversity and (g) previous donation. Logistic regression results (h) general effects, (i) network effects for ne...
Data
Results for parties, election 2004. Daily donors (a)—raw data is transparent, smoothed data is solid— and cumulative donors (b), probability effect of (c) donor degree, (d) common community donor degree, (e) donor communities, (f) source diversity and (g) previous donation. Logistic regression results (h) general effects, (i) network effects for ne...
Data
Logistic regression results for democratic party. Results for 2000–2012 for donation to the democratic party as dependent variable. (TIFF)
Data
Cross-cutting logistic regression results for republican party. Results for 2000–2012 for donation to the republican party as dependent variable, including cross-exposure effects (i.e. effect of exposure to democratic donors on republican donations). (TIFF)
Data
Logistic regression results for republican party. Results for 2000–2012 for donation to the republican party as dependent variable. (TIFF)
Article
Full-text available
Money is central in US politics, and most campaign contributions stem from a tiny, wealthy elite. Like other political acts, campaign donations are known to be socially contagious. We study how campaign donations diffuse through a network of more than 50000 elites and examine how connectivity among previous donors reinforces contagion. We find that...
Data
Cross-cutting logistic regression results for republican candidates. Results for 2000–2012 for donation to the republican candidate as dependent variable, including cross-exposure effects (i.e. effect of exposure to democratic donors on republican donations). (TIFF)
Data
Results for parties, election 2000. Daily donors (a)—raw data is transparent, smoothed data is solid— and cumulative donors (b), probability effect of (c) donor degree, (d) common community donor degree, (e) donor communities, (f) source diversity and (g) previous donation. Logistic regression results (h) general effects, (i) network effects for ne...
Data
Cross-cutting logistic regression results for democratic candidates. Results for 2000–2012 for donation to the democratic candidate as dependent variable, including cross-exposure effects (i.e. effect of exposure to republican donors on democratic donations). (TIFF)
Data
Data for replication. The Excel file donations.xls contains detailed donation records for the presidential campaigns for studying whether the complex contagion of donations is driven by cohesive reinforcement or independent reinforcement. It also contains the aggregate statistics for other campaigns to predict the total amount of money raised. The...
Chapter
Social networks have been of much interest in recent years. We here focus on a network structure derived from co-occurrences of people in traditional newspaper media. We find three clear deviations from what can be expected in a random graph. First, the average degree in the empirical network is much lower than expected, and the average weight of a...
Article
This paper presents a new method of identifying a nation's political elite using computational techniques on digitised newspaper articles. It begins by describing the three most widely used methods of identifying political elites: positional, decisional and reputational. It then introduces the "reported elite method", exploring the kinds of elites...
Article
Many complex networks exhibit a modular structure of densely connected groups of nodes. Usually, such a modular structure is uncovered by the optimisation of some quality function. Although flawed, Modularity remains one of the most popular quality functions. The Louvain algorithm was originally developed for optimising Modularity, but has been app...
Article
Nodes in real-world networks are repeatedly observed to form dense clusters, often referred to as communities. Methods to detect these groups of nodes usually maximize an objective function, which implicitly contains the definition of a community. We here analyze a recently proposed measure called Surprise, which assesses the quality of the partiti...
Article
This paper introduces the Elite Network Shifts (ENS) project to the Asian Studies community where computational techniques are used with digitised newspaper articles to describe changes in relations among Indonesian political elites. Reflecting on how "political elites" and "political relations" are understood by the elites, as well as across the d...
Article
We present a new computational methodology to identify national political elites, and demonstrate it for Indonesia. On the basis that elites have an "organised capacity to make real and continuing political trouble", we identify them as those individuals who occur most frequently in a large corpus of politically-oriented newspaper articles. Doing t...
Article
Studies of human attention dynamics analyse how attention is focused on specific topics, issues or people. In online social media, there are clear signs of exogenous shocks, bursty dynamics, and an exponential or power-law lifetime distribution. We here analyse the attention dynamics of traditional media, focussing on co-occurrence of people in new...
Article
The rise of social media allowed for rich analyses of their content and their network structure. As traditional media (i.e. newspapers and magazines) are being digitized, similar analyses can be undertaken. This provides a glimpse of the elite, as the news mostly revolves around the more influential members of society. We here focus on a network st...
Chapter
Although the field of community detection is relatively young, quite a few methods and algorithms have already been introduced in the literature. In this chapter, we will review several of these methods, and provide some algorithms for implementing these methods. We will derive most of these methods from a relatively general framework, to which we r...
Chapter
The field of community detection has a short but rich history, and communities have been found to be useful in many different settings. We here review two applications of community detection, and in the process we will show how previously discussed problems appear and are addressed.
Chapter
The distinction between positive and negative links is not often made. Nonetheless, it can be essential for understanding the network structure. We here review an old theory from sociology, known as social balance theory. The idea is similar to the old adage of “the enemy of my enemy is my friend”. We will derive some of the classical results, whic...
Chapter
Most methods for community detection assume that the weight of links is positive. However, there are many situations in which it is natural to use negative weights, for example, for modelling conflict or hatred, or correlations. We briefly address this issue in this chapter, and see that some methods are better able to cope with negative weights th...
Chapter
Some multi-resolution methods may be able to overcome the issue of the resolution-limit. Nonetheless, it remains difficult to find “meaningful” or “good” resolution values. In addition, it is not always clear whether the observed partition is really different from what can be observed in a random graph. We here introduce the notion of the significa...
Chapter
The evolution of cooperation is a long-standing problem that has baffled biologists and sociologists alike. The problem is that not cooperating often allows for a higher immediate benefit, so why should cooperation take place? Nonetheless, we often observe cooperative behaviour, especially so in humans. One of the theories is that people use a repu...
Chapter
Although modularity has been one of the most frequently used methods over the past decade, it suffers from some drawbacks. We will review these drawbacks here, and see whether the other methods reviewed in this thesis suffer from similar drawbacks. One of the most well-known problems is that of the resolution-limit, and we will introduce a more formal a...
Chapter
Social balance theory states that signed social networks should tend to split in two factions, each faction having only positive links within and negative links between the two. Although the theory has long been concerned with finding evidence of such groupings in social networks, little attention has been devoted to what dynamics may give rise to...
Chapter
In many online settings, such as in online markets or peer-to-peer applications, we prefer to deal with trustworthy partners. Usually, by letting users rate each other, some indication of trustworthiness is obtained. However, the ratings of users that are themselves not trustworthy should not be trusted. We here suggest a method for solvin...
Article
We argue that theories regarding the relationship between trade and conflict could benefit greatly from accounting for the networked structure of international trade. Indirect trade relations reduce the probability of conflict by creating (1) opportunity costs of conflict beyond those reflected by direct trade ties; and (2) negative externalities f...
Article
Full-text available
Many complex networks show signs of modular structure, uncovered by community detection. Although many methods succeed in revealing various partitions, it remains difficult to detect at what scale some partition is significant. This problem shows foremost in multi-resolution methods. We here introduce an efficient method for scanning for resolution...
Data
Phase portrait of system S12-S13. Circular orbits in the upper half plane (a > 0) are traversed counter clockwise, whereas circular orbits in the lower half plane (a < 0) are traversed clockwise. (TIFF)
Data
Proofs and details of statements in the main paper. (PDF)
Article
Full-text available
Social life coalesces into communities through cooperation and conflict. As a case in point, Shwed and Bearman (2010) studied consensus and contention in scientific communities. They used a sophisticated modularity method to detect communities on the basis of scientific citations, which they then interpreted as directed positive network ties. They...
Article
Full-text available
Mobile phone datasets allow for the analysis of human behavior on an unprecedented scale. The social network, temporal dynamics and mobile behavior of mobile phone users have often been analyzed independently from each other using mobile phone datasets. In this article, we explore the connections between various features of human behavior extracted...
Article
Full-text available
Social networks with positive and negative links often split into two antagonistic factions. Examples of such a split abound: revolutionaries versus an old regime, Republicans versus Democrats, Axis versus Allies during the second world war, or the Western versus the Eastern bloc during the Cold War. Although this structure, known as social balance...
Conference Paper
The unprecedented amount of data from mobile phones creates new possibilities to analyze various aspects of human behavior. Over the last few years, much effort has been devoted to studying the mobility patterns of humans. In this paper we will focus on unusually large gatherings of people, i.e. unusual social events. We introduce the methodology o...
Article
Detecting communities in large networks has drawn much attention over the years. While modularity remains one of the more popular methods of community detection, the so-called resolution limit remains a significant drawback. To overcome this issue, it was recently suggested that instead of comparing the network to a random null model, as is done in...
Conference Paper
Explaining how cooperation can emerge, and persist over time in various species is a prime challenge for both biologists and social scientists. Whereas cooperation in non-human species might be explained through mechanisms such as kinship selection or reciprocity, this is usually regarded as insufficient to explain the extent of cooperation observe...
Conference Paper
Networks have attracted a great deal of attention over the last decade, and play an important role in various scientific disciplines. Ranking nodes in such networks, based on for example PageRank or eigenvector centrality, remains a hot topic. Not only does this have applications in ranking web pages, it also allows peer-to-peer systems to have effectiv...
Article
Detecting communities in complex networks accurately is a prime challenge, preceding further analyses of network characteristics and dynamics. Until now, community detection took into account only positively valued links, while many actual networks also feature negative links. We extend an existing Potts model to incorporate negative links as well,...

Citations

... In this case, we should not normalise citations for the journal J, because doing so would most likely make the normalised citations a less accurate indicator for quality Q, not a more accurate indicator for quality Q. In fact, based on this observation, the journal J might be a more accurate indicator of Q than the citations C, as suggested by Waltman and Traag (2020). Now suppose that author prestige A and departmental prestige P affect acceptance, so that A → J and P → J, for which there is some evidence, as we saw in section II A. If we assume that A and P are independent of Q, we might want to normalise for those effects, so that the normalised citations are not biased by A or P, but a more accurate reflection of Q. ...
... However, this conclusion is problematic if there is a causal effect of where a paper is published on how frequently it is cited. Being published in a high-ranked journal will affect the subsequent citations (Traag, 2021), and the citations do not necessarily reflect whether peer review is predictive; they just reflect the causal effect of being published in a certain venue. A similar problem arises in a recent analysis of the predictive validity of peer review when highlighting publications in a journal (Antonoyiannakis, 2021). ...
... Bibliometric studies have been widely applied in multiple scholarly areas, with few successfully guiding decision-making across the respective thematic fields (19–21). At present, a plethora of scientometric and bibliometric studies have been published aiming at gaining more insights on the landscape of publications related to COVID-19 (13, 22–26). However, only a small portion of bibliometric analyses have explored the pandemic temporally in terms of research output during the first months (27–30), and additionally, the economic aspect driving scholarly productivity has not been systematically examined. ...
... In doing so, it would prevent Italian sociology from being structured by genuine scientific controversies and theoretical oppositions, contributing to a certain sterilisation of the discipline. This is all the more so because, as the main quantitative studies on the subject have confirmed, based on the analysis of mutual citations and co-authorships among Italian academics (Riviera, 2015; Akbaritabar et al., 2020), the division of the national space of sociology into components partly overlaps with the division linked to thematic specialisations: the Mi-To, for example, is overrepresented among sociologists of inequality, the economy, politics and social movements; the "Catholics" among sociologists of culture, migration, social intervention and communication; and the "Romans" among methodologists. ...
... If one searches for publications that mention terms related to the virus anywhere in the text, one finds 58,000 publications. Compared with earlier epidemics, this is several thousand publications more [301]. The majority of these publications were not studies with data but expressions of opinion [302], which is not surprising given the scarcely available empirical basis in the early phase of the pandemic. ...
... Citation analysis as a tool for evaluating a scientific journal was developed by Eugene Garfield when he worked at the Institute for Scientific Information (ISI); he introduced the WoS database (belonging to the ISI) and published the Journal Citation Reports (JCR) in 1976 [10]. In a report of his from more than 50 years ago, he referred to citation analysis as a valid and valuable means of creating accurate historical descriptions of scientific fields [11]. Citations are always an influence of the area of knowledge, especially for areas of great scientific impact, from which it is possible to determine the most relevant studies [12]; therefore, to know the impact of a journal, the citations of its publications must be analysed. ...
... The line (deletion) index of balance measures the minimum number of links whose removal results in balance. Since then, apart from subsequent works focusing on this index [280,281], many other approaches have been proposed, such as measures of balance in terms of simple cycles [279,282–285], in terms of inconsistent links within the signed blockmodel framework [286,287], walk-based measures [288–290], energy-based measures [291], measures based on algebraic topology tools [292], and on solution of correlation clustering problems [293,294]. For a recent comparison of these measures, see [295,296]. ...
... Single-cell clustering is always an important work in the field of single-cell analysis, which allows us to infer the identity of cells. PhenoGraph is applied as the clustering method in CITEMO framework, which uses the Leiden algorithm as an emerging clustering method designed specifically for single-cell data [77,78]. Especially, PhenoGraph is optimized for the clusters with broken links in Leiden clustering distribution, giving a more reasonable clustering result with more subpopulations. ...
... Studies have reported wildly varying correlations, ranging from as low as 0.3 to as high as 0.97. There are two major factors that explain the differences in these results (Traag and Waltman, 2019). The first factor is what level of aggregation is being studied. ...
... Literature [10] adopted human body detection based on FAST-CNN. Literature [11] proposed an attitude partitioning network for node detection and intensive regression. ...