Gianmarco De Francisci Morales’s research while affiliated with ISI Foundation and other places


Publications (139)


Figure 1: Probabilistic Graphical Model (left) and description of nodes (right). Circles represent stochastic variables; gray nodes are observed and white ones are latent.
Figure 3: Mean values of the estimated parameters, with 95% credible intervals. (a) The coefficients are the causal effects of the parents of S, Equation (3). These contribute to the weighted average of the mean of the normal distribution from which S is drawn. (b) The coefficients are the log-odds in Equation (6). (c) The coefficients are the log β interaction coefficients in Equation (4). They represent the interaction term between D_sub and S. When the coefficient is positive, an increase of S increases the odds of writing in a subreddit with a positive score in that sociodemographic category. (d) The coefficients are log β_I1 in Equation (5). They represent the interaction term between P_S and D_sub. When the coefficient is positive, an increase of P_S · D_sub increases the odds of interacting with an activist.
Causal Modeling of Climate Activism on Reddit
  • Preprint
  • File available

October 2024 · 36 Reads

Luca Maria Aiello · Gianmarco De Francisci Morales

Climate activism is crucial in stimulating collective societal and behavioral change towards sustainable practices through political pressure. Although multiple factors contribute to the participation in activism, their complex relationships and the scarcity of data on their interactions have restricted most prior research to studying them in isolation, thus preventing the development of a quantitative, causal understanding of why people approach activism. In this work, we develop a comprehensive causal model of how and why Reddit users engage with activist communities driving mass climate protests (mainly the 2019 Earth Strike, Fridays for Future, and Extinction Rebellion). Our framework, based on Stochastic Variational Inference applied to Bayesian Networks, learns the causal pathways over multiple time periods. Distinct from previous studies, our approach uses large-scale and fine-grained longitudinal data (2016 to 2022) to jointly model the roles of sociodemographic makeup, experience of extreme weather events, exposure to climate-related news, and social influence through online interactions. We find that among users interested in climate change, participation in online activist communities is indeed influenced by direct interactions with activists and largely by recent exposure to media coverage of climate protests. Among people aware of climate change, left-leaning people from lower socioeconomic backgrounds are particularly represented in online activist groups. Our findings offer empirical validation for theories of media influence and critical mass, and lay the foundations to inform interventions and future studies to foster public participation in collective action.

Figure 1: Two multigraphs with the same degree sequence and JCM.
Figure 7: The graphs G′ (upper) and G′′ (lower) used in the counterexample.
Dataset characteristics: number of vertices, number of edges, average and median degree, and average and median color frequency.
Polaris: Sampling from the Multigraph Configuration Model with Prescribed Color Assortativity

September 2024 · 8 Reads

We introduce Polaris, a network null model for colored multigraphs that preserves the Joint Color Matrix (JCM). Polaris is specifically designed for studying network polarization, where vertices belong to a side in a debate or a partisan group, represented by a vertex color, and relations have different strengths, represented by an integer-valued edge multiplicity. The key feature of Polaris is preserving the JCM of the multigraph, which specifies the number of edges connecting vertices of any two given colors. The JCM is the basic property that determines color assortativity, a fundamental aspect in studying homophily and segregation in polarized networks. By using Polaris, network scientists can test whether a phenomenon is entirely explained by the JCM of the observed network or whether other phenomena might be at play. Technically, our null model is an extension of the configuration model: an ensemble of colored multigraphs characterized by the same degree sequence and the same JCM. To sample from this ensemble, we develop a suite of Markov Chain Monte Carlo algorithms, collectively named Polaris-*. It includes Polaris-B, an adaptation of a generic Metropolis-Hastings algorithm, and Polaris-C, a faster, specialized algorithm with higher acceptance probabilities. This new null model and the associated algorithms provide a more nuanced toolset for examining polarization in social networks, thus enabling statistically sound conclusions.
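
As an illustration of the statistic this null model preserves, the JCM of a small colored multigraph can be tallied directly from an edge list. This is a minimal sketch and not the Polaris implementation: the function name `joint_color_matrix`, the `(u, v, multiplicity)` edge format, and the toy data are all assumptions made for the example.

```python
from collections import Counter


def joint_color_matrix(edges, color):
    """Count edges between every unordered pair of colors.

    edges: iterable of (u, v, multiplicity) triples (hypothetical format).
    color: dict mapping each vertex to its color label.
    """
    jcm = Counter()
    for u, v, mult in edges:
        # Sort the color pair so (a, b) and (b, a) share one entry.
        pair = tuple(sorted((color[u], color[v])))
        jcm[pair] += mult
    return jcm


# Toy data (made up): two "red" and two "blue" vertices.
color = {1: "red", 2: "red", 3: "blue", 4: "blue"}
edges = [(1, 2, 2), (1, 3, 1), (2, 4, 3), (3, 4, 1)]
jcm = joint_color_matrix(edges, color)
```

Under these assumptions, any two multigraphs in the same ensemble would yield the same counter (and the same degree sequence), which is what lets the JCM act as the fixed baseline when testing for additional structure.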


Moral Judgments in Online Discourse are not Biased by Gender

August 2024 · 44 Reads

The interaction between social norms and gender roles prescribes gender-specific behaviors that influence moral judgments. Here, we study how moral judgments are biased by the gender of the protagonist of a story. Using data from r/AITA, a Reddit community with 17 million members who share first-hand experiences seeking community judgment on their behavior, we employ machine learning techniques to match stories describing similar situations that differ only by the protagonist's gender. We find no direct causal effect of the protagonist's gender on the received moral judgments, except for stories about “friendship and relationships”, where male protagonists receive more negative judgments. Our findings complement existing correlational studies and suggest that gender roles may exert greater influence in specific social contexts. These results have implications for understanding sociological constructs and highlight potential biases in data used to train large language models.


FIG. 1. Construction of directed hypergraph configuration models. (a) A directed hypergraph (top) and its representation as a bipartite graph (bottom). The left vertices (circles) correspond to hypergraph nodes, while the right vertices (hexagons) correspond to hyperedges. Dotted lines in the directed hypergraph separate the head and tail of each hyperedge, with arrows pointing toward the tail. (b) The characteristics of the observed hypergraph preserved by DHDM and DHJM: left and right in- and out-degree sequences (top) and JOINT (joint out-in-degree tensor) (bottom). The right in-degree sequence corresponds to the head-size sequence, while the right out-degree sequence corresponds to the tail-size sequence.
FIG. 3. Density of infected nodes in contact networks. We show the values of ρ* in the stationary state of contagion dynamics on the observed hypergraph, and on 33 samples generated by NUDHY-DEGS and NUDHY-JOINT, varying infection rate λ and nonlinearity parameter ν, for LYON, HIGH, EMAIL-EU, and EMAIL-ENRON. We also report the output of the AMEs as defined in Ref. [67]. The infection rate is rescaled with the invasion threshold λ_c. Error bars correspond to 1 standard deviation.
FIG. 5. (a) Bipartite graphs obtained from Fig. 1(a) after the application of the PSO (1, green hexagon, +1), (6, orange hexagon, +1) ⟶ (1, orange hexagon, +1), (6, green hexagon, +1), and of the RPSO (2, green hexagon, −1), (5, orange hexagon, −1) ⟶ (2, orange hexagon, −1), (5, green hexagon, −1). The edges involved in the operations are highlighted in red. Left nodes with the same in- and out-degree are outlined with the same color. Right nodes with the same in- and out-degree are outlined with the same pattern. (b) Changes in the neighborhood of a left node after the application of a sequence of PSOs and of RPSOs. PSOs preserve the number of ingoing and outgoing edges of each node. RPSOs also preserve the in- and out-degree of the nodes connected to each node.
Higher-Order Null Models as a Lens for Social Systems

August 2024 · 42 Reads · 1 Citation

Physical Review X

Despite the widespread adoption of higher-order mathematical structures such as hypergraphs, methodological tools for their analysis lag behind those for traditional graphs. This work addresses a critical gap in this context by proposing two microcanonical random null models for directed hypergraphs: the directed hypergraph degree model (DHDM) and the directed hypergraph JOINT model (DHJM). These models preserve essential structural properties of directed hypergraphs such as node in- and out-degree sequences and hyperedge head- and tail-size sequences, or their joint tensor. We also describe two efficient Markov chain Monte Carlo algorithms, NUDHY-DEGS and NUDHY-JOINT, to sample random hypergraphs from these ensembles. To showcase the interdisciplinary applicability of the proposed null models, we present three distinct use cases in sociology, epidemiology, and economics. First, we reveal the oscillatory behavior of increased homophily in opposition parties in the U.S. Congress over a 40-year span, emphasizing the role of higher-order structures in quantifying political group homophily. Second, we investigate a nonlinear contagion in contact hypernetworks, demonstrating that disparities between simulations and theoretical predictions can be explained by considering higher-order joint degree distributions. Last, we examine the economic complexity of countries in the global trade network, showing that local network properties preserved by the null models explain the main structural economic complexity indexes. This work advances the development of null models for directed hypergraphs, addressing the intricate challenges posed by their complex entity relations, and providing a versatile suite of tools for researchers across various domains. Published by the American Physical Society 2024


Conspiracy theories and where to find them on TikTok

July 2024 · 101 Reads

TikTok has skyrocketed in popularity over recent years, especially among younger audiences, thanks to its viral trends and social challenges. However, concerns have been raised about the potential of this platform to promote and amplify online harmful and dangerous content. Leveraging the official TikTok Research API and collecting a longitudinal dataset of 1.5M videos shared in the US over a period of 3 years, our study analyzes the presence of videos promoting conspiracy theories, providing a lower-bound estimate of their prevalence (approximately 0.1% of all videos) and assessing the effects of the new Creator Program, which provides new ways for creators to monetize, on the supply of conspiratorial content. We evaluate the capabilities of state-of-the-art open Large Language Models to identify conspiracy theories after extracting audio transcriptions of videos, finding that they can detect harmful content with high precision but with overall performance comparable to fine-tuned traditional language models such as RoBERTa. Our findings are instrumental for content moderation strategies that aim to understand and mitigate the spread of harmful content on rapidly evolving social media platforms like TikTok.


Fig. 1. Top-10 targeting criteria by total spending.
Fig. 2. a) Distributions of impressions-per-EUR across the ads of each party. The cross indicates the mean of the distribution. b) Difference between average impressions-per-EUR of a party and average impressions-per-EUR in the overall sample. c) Distribution of impressions-per-EUR across the ads of each party. The cross indicates the mean of the distribution.
Fig. 3. a) Discrepancy in the age distribution between the actual and target audience (in %). We find that ads by most parties (except AfD) are seen by more users between 25 and 34 than originally intended. b) Comparison of actual and target audience by age for political ads published by AfD. A red color (i.e. proportion of target audience is larger than actual audience) indicates areas where the difference between the actual and targeted audience is negative (green for positive, i.e. proportion of target audience smaller than actual audience). Younger and older users see the ads less often than originally intended by the party. c) Discrepancy in the gender distribution between the actual and target audience (in %). We find large differences between male and female audiences for right-wing parties (e.g. Union, AfD), implying that ads are seen by considerably fewer females than originally intended due to the algorithmic ad delivery.
Fig. 5. Average difference between actual vs. predicted impressions-per-EUR based on our machine learning model over 10 runs.
Systematic discrepancies in the delivery of political ads on Facebook and Instagram

June 2024 · 43 Reads · 4 Citations

PNAS Nexus

Political advertising on social media has become a central element in election campaigns. However, granular information about political advertising on social media was previously unavailable, thus raising concerns regarding fairness, accountability, and transparency in the electoral process. In this paper, we analyze targeted political advertising on social media via a unique, large-scale dataset of over 80,000 political ads from Meta during the 2021 German federal election, with more than 1.1 billion impressions. For each political ad, our dataset records granular information about targeting strategies, spending, and actual impressions. We then study (i) the prevalence of targeted ads across the political spectrum; (ii) the discrepancies between targeted and actual audiences due to algorithmic ad delivery; and (iii) which targeting strategies on social media attain a wide reach at low cost. We find that targeted ads are prevalent across the entire political spectrum. Moreover, there are considerable discrepancies between targeted and actual audiences, and systematic differences in the reach of political ads (in impressions-per-EUR) among parties, where the algorithm favors ads from populists over others.




Impossibility result for Markov chain Monte Carlo sampling from microcanonical bipartite graph ensembles

May 2024 · 5 Reads · 1 Citation

Physical Review E

Markov Chain Monte Carlo (MCMC) algorithms are commonly used to sample from graph ensembles. Two graphs are neighbors in the state space if one can be obtained from the other with only a few modifications, e.g., edge rewirings. For many common ensembles, e.g., those preserving the degree sequences of bipartite graphs, rewiring operations involving two edges are sufficient to create a fully connected state space, and they can be performed efficiently. We show that, for ensembles of bipartite graphs with fixed degree sequences and number of butterflies (K_{2,2} bicliques), there is no universal constant c such that a rewiring of at most c edges at every step is sufficient for any such ensemble to be fully connected. Our proof relies on an explicit construction of a family of pairs of graphs with the same degree sequences and number of butterflies, with each pair indexed by a natural number c, and such that any sequence of rewiring operations transforming one graph into the other must include at least one rewiring operation involving at least c edges. Whether rewiring this many edges is sufficient to guarantee the full connectivity of the state space of any such ensemble remains an open question. Our result implies the impossibility of developing efficient, graph-agnostic MCMC algorithms for these ensembles, as the necessity to rewire an impractically large number of edges may hinder taking a step on the state space.
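
To make the preserved quantity concrete, the number of butterflies in a small bipartite graph can be counted by checking, for every pair of left vertices, how many right neighbors they share. The sketch below is only illustrative and is not taken from the paper: the function name `count_butterflies`, the adjacency-dictionary representation, and the toy graph are assumptions, and the pairwise loop is a simple quadratic approach rather than an optimized counting algorithm.

```python
from itertools import combinations
from math import comb


def count_butterflies(adj):
    """Count K_{2,2} bicliques in a bipartite graph.

    adj: dict mapping each left vertex to the set of its right neighbors
    (hypothetical representation chosen for this example).
    """
    total = 0
    for u, v in combinations(adj, 2):
        shared = len(adj[u] & adj[v])
        # Every pair of shared right neighbors closes one butterfly.
        total += comb(shared, 2)
    return total


# Toy graph (made up): left vertices a, b, c; right vertices 1, 2, 3.
adj = {"a": {1, 2, 3}, "b": {1, 2}, "c": {2, 3}}
n_butterflies = count_butterflies(adj)
```

A sampler for the ensemble described above would have to keep both this count and the degree sequences unchanged after every rewiring step.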


Hyper-distance oracles in hypergraphs

The VLDB Journal

We study point-to-point distance estimation in hypergraphs, where the query is parameterized by a positive integer s, which defines the required level of overlap for two hyperedges to be considered adjacent. To answer s-distance queries, we first explore an oracle based on the line graph of the given hypergraph and discuss its limitations: the line graph is typically orders of magnitude larger than the original hypergraph. We then introduce HypED, a landmark-based oracle with a predefined size, built directly on the hypergraph, thus avoiding the materialization of the line graph. Our framework supports approximate answers to vertex-to-vertex, vertex-to-hyperedge, and hyperedge-to-hyperedge s-distance queries for any value of s. A key observation at the basis of our framework is that as s increases, the hypergraph becomes more fragmented. We show how this can be exploited to improve the placement of landmarks, by identifying the s-connected components of the hypergraph. For this latter task, we devise an efficient algorithm based on the union-find technique and a dynamic inverted index. We experimentally evaluate HypED on several real-world hypergraphs and demonstrate its versatility in answering s-distance queries for different values of s. Our framework allows answering such queries in fractions of a millisecond while allowing fine-grained control of the trade-off between index size and approximation error at creation time. Finally, we demonstrate the usefulness of the s-distance oracle in two applications, namely hypergraph-based recommendation and the approximation of the s-closeness centrality of vertices and hyperedges in the context of protein-protein interactions.
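
The fragmentation effect mentioned in this abstract can be seen on a toy hypergraph by grouping hyperedges into s-connected components with union-find. This is a simplified sketch and not the HypED algorithm: it uses a plain static inverted index instead of the dynamic one the abstract refers to, treats two hyperedges as s-adjacent when they share at least s vertices, and the function name `s_connected_components` and the example data are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def s_connected_components(hyperedges, s):
    """Group hyperedge indices into s-connected components.

    hyperedges: list of vertex sets (hypothetical input format).
    s: minimum number of shared vertices for two hyperedges to be adjacent.
    """
    parent = list(range(len(hyperedges)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Inverted index: vertex -> indices of the hyperedges containing it.
    index = defaultdict(list)
    for i, edge in enumerate(hyperedges):
        for v in edge:
            index[v].append(i)

    # Count shared vertices only for pairs that co-occur in the index.
    overlap = defaultdict(int)
    for members in index.values():
        for i, j in combinations(members, 2):
            overlap[(i, j)] += 1

    # Merge every pair whose overlap reaches the threshold s.
    for (i, j), shared in overlap.items():
        if shared >= s:
            parent[find(i)] = find(j)

    groups = defaultdict(set)
    for i in range(len(hyperedges)):
        groups[find(i)].add(i)
    return sorted(groups.values(), key=min)


# Toy hypergraph (made up): four hyperedges over vertices 1..6.
hyperedges = [{1, 2, 3}, {2, 3, 4}, {4, 5}, {5, 6}]
components_s1 = s_connected_components(hyperedges, 1)
components_s2 = s_connected_components(hyperedges, 2)
```

In this example all four hyperedges form a single component for s = 1, while for s = 2 only the first two remain connected, which mirrors the observation that larger s yields a more fragmented hypergraph.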


Citations (64)


... In this sense, directed hypergraphs enhance modeling by distinguishing between source and target sets in each hyperedge [34]. Tools to study directed hypergraphs are largely underdeveloped, with notable exceptions in areas such as null models [35], synchronization [36], overlapping patterns between two hyperedges of limited size [37], and some early proposals to define reciprocity [38,39]. ...

Reference:

The microscale organization of directed hypergraphs
Higher-Order Null Models as a Lens for Social Systems

Physical Review X

... Facebook in particular encourages greater reliance on its algorithmic tools, such as Lookalike Audiences and Meta Advantage+ audience, when removing the targeting abilities [23,81,85,94]. As we discussed in Section 2.2, algorithmic identification of the right audience and machine-learning-driven ad delivery optimization are processes that can (and do) lead to echo chambers in political ad delivery [3,18] and discrimination in opportunity advertising [2,55,56,59,110], even when the advertiser targets a politically or racially balanced audience. Furthermore, recent work shows that for ads with greater reliance on algorithmic targeting tools, the user-facing explanations and controls are inaccurate and ineffective [21]. ...

Systematic discrepancies in the delivery of political ads on Facebook and Instagram

PNAS Nexus

... Many such constrained swap algorithms have been shown to be disconnected under double-edge swaps or have unknown connectivity [18,31,51]. We believe that the use of k-edge swaps, while less efficient, can still bring answers about the viability of sampling from the graph models. ...

Impossibility result for Markov chain Monte Carlo sampling from microcanonical bipartite graph ensembles
  • Citing Article
  • May 2024

Physical Review E

... Recent literature on Probabilistic Generative Agent-Based Models (PGABMS) proposed the use of Bayesian Networks to leverage observed longitudinal data for learning the causal mechanisms to be encoded into those models [19], including for example the backfire effect [20,27]. Similarly, probabilistic generative frameworks have been used to model community engagement to study dynamics such as social influence, the echo chamber effect, or political ideology [26,28]. ...

Likelihood-Based Methods Improve Parameter Estimation in Opinion Dynamics Models
  • Citing Conference Paper
  • March 2024

... Two further studies published in 2023 similarly drew upon MFT to examine COVID-19 masking and other prophylactic behaviors. Mejova, et al. conducted a large study of Twitter messages to examine individual responses to government mask mandates in the United States during 2020 [49]. The study confirmed that anti-mask attitudes were associated most with people of conservative leanings. ...

Authority without Care: Moral Values behind the Mask Mandate Response
  • Citing Article
  • June 2023

Proceedings of the International AAAI Conference on Web and Social Media

... Nevertheless, the data from social media is invaluable for research. The information gathered from social media platforms is timely and relevant for studying substance use characteristics longitudinally [17][18][19]. As a result, the data obtained from social media offers a unique opportunity to observe substance use behaviors within a broad population, enabling a deeper comprehension of the underlying patterns and factors influencing substance use behaviors. ...

The Pursuit of Peer Support for Opioid Use Recovery on Reddit
  • Citing Article
  • June 2023

Proceedings of the International AAAI Conference on Web and Social Media

Duilio Balsamo · Gianmarco De Francisci Morales · [...]

... Many studies have also analyzed discussions in Reddit communities to understand how interactions among participants influence behavior. For example, Petruzzellis et al. exploited the r/ChangeMyView subreddit to analyze changes in online information consumption behavior arising after opinion changes [13]. In [14], Cauteruccio et al. investigated the emotional experiences in eSports spectatorship using the r/leagueoflegends subreddit: they show that spectators supporting the same team tend to engage in cohesive discussions, while interactions among those supporting different teams are less salient. ...

On the Relation between Opinion Change and Information Consumption on Reddit
  • Citing Article
  • June 2023

Proceedings of the International AAAI Conference on Web and Social Media

... Time-series analysis or state-transition models are also practical methods for analyzing dynamic characteristics from data. For example, Monti et al. (2023) employed time series analysis to improve agent income estimation in their agent-based model of the housing market. ...

On learning agent-based models from data

... A combination of public pressure (3) and regulatory efforts (e.g. the Honest Ads Act in the United States (29) and the Digital Services Act in the E.U. (30)) have pushed social media platforms to strengthen their transparency efforts around political advertising. Indeed, Meta has launched the Meta Ad Library, which provides public access to all political and social ads published on Facebook and Instagram, and allows researchers to study political advertising at scale (31)(32)(33)(34)(35) (see Supplementary Material S1 for a comprehensive overview of the literature). However, existing analyses had only limited access to political ads, since crucial information about targeting was missing. ...

The Thin Ideology of Populist Advertising on Facebook during the 2019 EU Elections
  • Citing Conference Paper
  • April 2023