Marian-Andrei Rizoiu

University of Technology Sydney | UTS · Faculty of Engineering and Information Technology

PhD

About

116 Publications
32,228 Reads
1,602 Citations
Introduction
I am an Associate Professor leading the Behavioral Data Science lab at the University of Technology Sydney. My research blends computer science, psycholinguistics, digital communication and stochastic modelling to understand human attention dynamics in the online environment, the emergence of influence and opinion polarization. I am the recipient of the prestigious Excellence Award and Academic of the Year at the 2023 Australian Defence Industry Awards.
Additional affiliations
May 2014 - March 2016
National ICT Australia Ltd
Position
  • Researcher
September 2013 - May 2014
Lumière University Lyon 2
Position
  • PostDoc Position
October 2009 - June 2013
Lumière University Lyon 2
Position
  • PhD Student

Publications (116)
Preprint
Full-text available
Misinformation is often viewed as an information problem, attributed to people’s poor understanding of consensus-based facts. While this is a contributing factor, we propose that misinformation and conspiracy are simply more engaging explanations for our world. Can we combat misinformation by making facts fun? We present the findings of a controlle...
Preprint
Full-text available
The spread of content on social media is shaped by intertwining factors on three levels: the source, the content itself, and the pathways of content spread. At the lowest level, the popularity of the sharing user determines its eventual reach. However, higher-level factors such as the nature of the online item and the credibility of its source also...
Article
Online extremism has severe societal consequences, including normalizing hate speech, user radicalization, and increased social divisions. Various mitigation strategies have been explored to address these consequences. One such strategy uses positive interventions: controlled signals that add attention to the opinion ecosystem to boost certain opin...
Article
In this paper, we ask how effective Meta's content moderation strategy was on its flagship platform, Facebook, during the COVID-19 pandemic. We analyse the performance of 18 Australian right-wing/anti-vaccination pages, posts and commenting sections collected between January 2019 and July 2021, and use engagement metrics and time series analysis to...
Article
In response to the rise of various fringe movements in recent years, from anti-vaxxers to QAnon, there has been increased public and scholarly attention to misinformation and conspiracy theories and the online communities that produce them. However, efforts at understanding the radicalisation process largely focus on those who go on to commit viole...
Article
The multivariate Hawkes process (MHP) is widely used for analyzing data streams that interact with each other, where events generate new events within their own dimension (via self-excitation) or across different dimensions (via cross-excitation). However, in certain applications, the timestamps of individual events in some dimensions are unobserva...
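To make self- and cross-excitation concrete, here is a minimal Python sketch of the conditional intensity of a multivariate Hawkes process. It assumes an exponential kernel with a single decay rate `beta` shared by all dimension pairs; the function name `mhp_intensity` and the parameter values are illustrative choices, not taken from the paper, which additionally deals with dimensions whose event timestamps are unobserved.

```python
import math


def mhp_intensity(t, mu, alpha, beta, events):
    """Conditional intensity of every dimension at time t (illustrative sketch).

    Assumed parameterisation (hypothetical, exponential kernel):
      mu[i]       background rate of dimension i
      alpha[i][j] expected number of events in dimension i triggered by
                  one event in dimension j
      beta        decay rate of the kernel, shared by all (i, j) pairs
      events[j]   event times observed in dimension j
    """
    d = len(mu)
    intensities = []
    for i in range(d):
        rate = mu[i]
        for j in range(d):
            for s in events[j]:
                if s < t:  # only strictly past events excite the process
                    # i == j is self-excitation, i != j is cross-excitation;
                    # beta * exp(-beta * u) integrates to 1 over u >= 0
                    rate += alpha[i][j] * beta * math.exp(-beta * (t - s))
        intensities.append(rate)
    return intensities


# Usage: one event at time 1.0 in dimension 0, nothing in dimension 1
lam = mhp_intensity(2.0, mu=[0.1, 0.2], alpha=[[0.5, 0.0], [0.3, 0.4]],
                    beta=1.0, events=[[1.0], []])
```

In this example, the single event in dimension 0 raises its own intensity through `alpha[0][0]` (self-excitation) and the intensity of dimension 1 through `alpha[1][0]` (cross-excitation), both decayed by a factor e^(-1) one time unit later.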
Article
Full-text available
Today, the internet is an integral part of our daily lives, enabling people to be more connected than ever before. However, this greater connectivity and access to information increase exposure to harmful content, such as cyber-bullying and cyber-hatred. Models based on machine learning and natural language offer a way to make online platforms safe...
Article
Full-text available
Startup companies solve many of today’s most challenging problems, such as the decarbonisation of the economy or the development of novel life-saving vaccines. Startups are a vital source of innovation, yet the most innovative are also the least likely to survive. The probability of success of startups has been shown to relate to several firm-level...
Article
Full-text available
In 2022, the European Union introduced the Digital Services Act (DSA), new legislation to report and moderate harmful content on online social networks. Trusted flaggers are mandated to identify harmful content, which platforms must remove within a set delay (currently 24 h). Here, we analyze the likely effectiveness of EU-mandated mechanisms f...
Preprint
Full-text available
Biomedical summarization requires large datasets to train for text generation. We show that while transfer learning offers a viable option for addressing this challenge, in-domain pre-training does not always offer advantages in a BioASQ summarization task. We identify a suitable model architecture and use it to show a benefit of a general-domai...
Preprint
Full-text available
Understanding the relationship between emerging technology and research and development has long been of interest to companies, policy makers and researchers. In this paper, new sources of data and tools are combined with a novel technique to construct a model linking a defined set of emerging technologies with the global leading R&D spending compan...
Preprint
Full-text available
Startup companies solve many of today’s most complex and challenging scientific, technical and social problems, such as the decarbonisation of the economy, air pollution, and the development of novel life-saving vaccines. Startups are a vital source of social, scientific and economic innovation, yet the most innovative are also the least likely to...
Preprint
Full-text available
Startup companies solve many of today's most complex and challenging scientific, technical and social problems, such as the decarbonisation of the economy, air pollution, and the development of novel life-saving vaccines. Startups are a vital source of social, scientific and economic innovation, yet the most innovative are also the least likely to...
Article
Full-text available
Mobile phones contain a wealth of private information, so we try to keep them secure. We provide large-scale evidence that the psychological profiles of individuals and their relations with their peers can be predicted from seemingly anonymous communication traces – calling and texting logs that service providers routinely collect. Based on two ext...
Preprint
Full-text available
Social media is being increasingly weaponized by state-backed actors to elicit reactions, push narratives and sway public opinion. These are known as Information Operations (IO). The covert nature of IO makes their detection difficult. This is further amplified by missing data due to user and content removal and privacy requirements. This work...
Chapter
The rapid advances in automation technologies, such as artificial intelligence (AI) and robotics, pose an increasing risk of automation for occupations, with a likely significant impact on the labour market. Recent socio-economic studies suggest that nearly 50% of occupations are at high risk of being automated in the next decade. However, the lac...
Preprint
Full-text available
The rapid advances in automation technologies, such as artificial intelligence (AI) and robotics, pose an increasing risk of automation for occupations, with a likely significant impact on the labour market. Recent socio-economic studies suggest that nearly 50% of occupations are at high risk of being automated in the next decade. However, the la...
Technical Report
Full-text available
The motorways along the Sydney region are equipped with traffic count detectors, which record the number of vehicles passing by in a given time slot. The data collected from these sensors will be referenced in the paper as 'motorway traffic flow'. One important problem is how, based on the collected motorway traffic flow data, to predict...
Preprint
Full-text available
Automatic identification of hateful and abusive content is vital in combating the spread of harmful online content and its damaging effects. Most existing works evaluate models by examining the generalization error on train-test splits on hate speech datasets. These datasets often differ in their definitions and labeling criteria, leading to poor m...
Preprint
Full-text available
Recent years have seen the rise of extremist views in the opinion ecosystem we call social media. Allowing online extremism to persist has dire societal consequences, and efforts to mitigate it are continuously explored. Positive interventions, controlled signals that add attention to the opinion ecosystem with the aim of boosting certain opinions,...
Preprint
Full-text available
The political opinion landscape, in a democratic country, lays the foundation for the policies that are enacted, and the political actions of individuals. As such, a reliable measure of ideology is an important first step in a river of downstream problems, such as understanding polarization, opinion dynamics modeling, and detecting and intervening...
Article
Qualitative research provides methodological guidelines for observing and studying communities and cultures on online social media platforms. However, such methods demand considerable manual effort from researchers and can be overly focused and narrowed to certain online groups. This work proposes a complete solution to accelerate the qualitative a...
Thesis
Full-text available
The spread of disinformation in the 21st century has become of enormous concern for the integrity of democracy, the way we relate to each other online, and in extreme cases, the health and safety of individuals. This project explores how we can utilise information from disinformation campaigns in the past to predict disinformation as it arises into...
Preprint
Full-text available
This work introduces a novel multivariate temporal point process, the Partial Mean Behavior Poisson (PMBP) process, which can be leveraged to fit the multivariate Hawkes process to partially interval-censored data consisting of a mix of event timestamps on a subset of dimensions and interval-censored event counts on the complementary dimensions. Fi...
Thesis
Full-text available
The popularisation of social media has led to widespread occurrences of echo chambers, selective exposure and misinformation. This is particularly concerning with regard to contentious topics, where a lack of interaction with opposing views can lead to complacency or stubbornness. We build on past work in an attempt to determine how exposure to dif...
Thesis
Full-text available
Recently, social media has been blamed for the increasingly polarised nature of political discourse in our society. The ability to measure and combat political polarisation on social media is of significant importance if we wish to prevent polarisation from degrading the functioning of democracy and social cohesion. Stance detection provides a viab...
Preprint
Full-text available
Social influence pervades our everyday lives and lays the foundation for complex social phenomena. In a crisis like the COVID-19 pandemic, social influence can determine whether life-saving information is adopted. Existing literature studying online social influence suffers from several drawbacks. First, a disconnect appears between psychology appr...
Preprint
Full-text available
Qualitative research provides methodological guidelines for observing and studying communities and cultures on online social media platforms. However, such methods demand considerable manual effort from researchers and may be overly focused and narrowed to certain online groups. In this work, we propose a complete solution to accelerate qualitative...
Article
Full-text available
Job security can never be taken for granted, especially in times of rapid, widespread and unexpected social and economic change. These changes can force workers to transition to new jobs. This may be because new technologies emerge or production is moved abroad. Perhaps it is a global crisis, such as COVID-19, which shutters industries and displace...
Article
Full-text available
Ever since the web began, the number of websites has been growing exponentially. These websites cover an ever-increasing range of online services that fill a variety of social and economic functions across a growing range of industries. Yet the networked nature of the web, combined with the economics of preferential attachment, increasing returns a...
Preprint
Full-text available
Hawkes processes are a popular means of modeling the event times of self-exciting phenomena, such as earthquake strikes or tweets on a topical subject. Classically, these models are fit to historical event time data via likelihood maximization. However, in many scenarios, the exact times of historical events are not recorded for either privacy (e.g...
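The classical likelihood fit mentioned in this abstract can be sketched as follows. This is a minimal Python version assuming a univariate Hawkes process with an exponential kernel alpha * beta * exp(-beta * t), observed on [0, T]; the function `hawkes_loglik` is hypothetical, and it covers only the fully observed case, not the setting with unrecorded event times that the paper addresses. The sum over past events is updated recursively, so one evaluation takes time linear in the number of events.

```python
import math


def hawkes_loglik(times, T, mu, alpha, beta):
    """Log-likelihood of a univariate Hawkes process on [0, T] (sketch).

    Assumed intensity: lambda(t) = mu + sum_{t_j < t} alpha * beta * exp(-beta * (t - t_j)).
    `times` must be sorted and lie in [0, T].
    """
    loglik = 0.0
    A = 0.0  # running value of sum_{j < k} exp(-beta * (t_k - t_j))
    prev = None
    for t in times:
        if prev is not None:
            # Recursion: A_k = exp(-beta * (t_k - t_{k-1})) * (1 + A_{k-1})
            A = math.exp(-beta * (t - prev)) * (1.0 + A)
        loglik += math.log(mu + alpha * beta * A)
        prev = t
    # Compensator: integral of lambda(t) over [0, T], in closed form
    compensator = mu * T + alpha * sum(1.0 - math.exp(-beta * (T - t)) for t in times)
    return loglik - compensator


# Usage: evaluate one parameter setting on a short, made-up sequence
ll = hawkes_loglik([1.0, 1.5], T=2.0, mu=0.5, alpha=0.3, beta=1.0)
```

To fit the model, one would minimize the negative of this function over (mu, alpha, beta) with a numerical optimizer such as `scipy.optimize.minimize`, keeping mu and beta positive and 0 <= alpha < 1 so that the process stays stationary (alpha is the expected number of events triggered by each event under this kernel).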
Chapter
Full-text available
This paper studies the dynamics of opinion formation and polarization in social media. We investigate whether users’ stance concerning contentious subjects is influenced by the online discussions they are exposed to and interactions with users supporting different stances. We set up a series of predictive exercises based on machine learning models....
Article
Full-text available
Developing new methods for modelling infectious disease outbreaks is important for monitoring transmission and developing policy. In this paper we propose using semi-mechanistic Hawkes Processes for modelling malaria transmission in near-elimination settings. Hawkes Processes are well-founded mathematical methods that enable us to combine the bene...
Article
Full-text available
In Australia and beyond, journalism is reportedly an industry in crisis, a crisis exacerbated by COVID-19. However, the evidence revealing the crisis is often anecdotal or limited in scope. In this unprecedented longitudinal research, we draw on data from the Australian journalism jobs market from January 2012 until March 2020. Using Data Science a...
Thesis
Full-text available
With the prosperity of the online labour market, more and more employers are willing to post recruitment advertisements online. Labour market demand changes at an unpredictable speed, and many new skills emerge and are quickly reflected in the labour market. For the purpose of extracting the existing skills and even finding new...
Preprint
Full-text available
This paper studies the dynamics of opinion formation and polarization in social media. We investigate whether the stance of users with respect to contentious subjects is influenced by the online discussions that they are exposed to, and by the interactions with users supporting different stances. We set up a series of predictive exercises, in which...
Preprint
Full-text available
The impact of online social media on societal events and institutions is profound, and with the rapid increases in user uptake, we are just starting to understand its ramifications. Social scientists and practitioners who model online discourse as a proxy for real-world behavior often curate large social media datasets. A lack of available tooling...
Preprint
Full-text available
Job security can never be taken for granted, especially in times of rapid, widespread and unexpected social and economic change. These changes can force workers to transition to new jobs. This may be because technologies emerge or production is moved abroad. Perhaps it is a global crisis, such as COVID-19, which shutters industries and displaces la...
Preprint
Full-text available
It is not news that our mobile phones contain a wealth of private information about us, and that is why we try to keep them secure. But even the traces of how we communicate can tell quite a bit about us. In this work, we start from the calling and texting history of 200 students enrolled in the Netsense study, and we link it to the type of re...
Preprint
Full-text available
In Australia and beyond, journalism is reportedly an industry in crisis, a crisis exacerbated by COVID-19. However, the evidence revealing the crisis is often anecdotal or limited in scope. In this unprecedented longitudinal research, we draw on data from the Australian journalism jobs market from January 2012 until March 2020. Using Data Science a...
Preprint
Full-text available
Developing new methods for modelling infectious disease outbreaks is important for monitoring transmission and developing policy. In this paper we propose using semi-mechanistic Hawkes Processes for modelling malaria transmission in near-elimination settings. Hawkes Processes are mathematical methods that enable us to combine the benefits of bot...
Preprint
Full-text available
Traffic flow prediction, particularly in areas that experience highly dynamic flows such as motorways, is a major issue faced in traffic management. Due to increasingly large volumes of data sets being generated every minute, deep learning methods have been used extensively in recent years for both short- and long-term prediction. However, such...
Preprint
Full-text available
Congestion prediction represents a major priority for traffic management centres around the world to ensure timely incident response handling. The increasing amounts of generated traffic data have been used to train machine learning predictors for traffic; however, this is a challenging task due to inter-dependencies of traffic flow both in time an...
Preprint
Full-text available
Modeling online discourse dynamics is a core activity in understanding the spread of information, both offline and online, and emergent online behavior. There is currently a disconnect between the practitioners of online social media analysis (usually social, political and communication scientists) and the accessibility to tools capable of handli...
Article
A comprehensive understanding of data quality is the cornerstone of measurement studies in social media research. This paper presents in-depth measurements on the effects of Twitter data sampling across different timescales and different subjects (entities, networks, and cascades). By constructing complete tweet streams, we show that Twitter rate l...
Article
Full-text available
The Hawkes process (HP) has been widely applied to modeling self-exciting events including neuron spikes, earthquakes and tweets. To avoid designing a parametric triggering kernel and to be able to quantify the prediction confidence, the non-parametric Bayesian HP has been proposed. However, the inference of such models suffers from unscalability or...
Preprint
This research develops a Machine Learning approach able to predict labor shortages for occupations. We compile a unique dataset that incorporates both Labor Demand and Labor Supply occupational data in Australia from 2012 to 2018. This includes data from 1.3 million job advertisements (ads) and 20 official labor force measures. We use these data as...
Conference Paper
Full-text available
A comprehensive understanding of data bias is the cornerstone of mitigating biases in social media research. This paper presents in-depth measurements of the effects of Twitter data sampling across different timescales and different subjects (entities, networks, and cascades). By constructing two complete tweet streams, we show that Twitter rate li...
Preprint
Full-text available
A comprehensive understanding of data bias is the cornerstone of mitigating biases in social media research. This paper presents in-depth measurements of the effects of Twitter data sampling across different timescales and different subjects (entities, networks, and cascades). By constructing two complete tweet streams, we show that Twitter rate li...
Preprint
Full-text available
Ever since the web began, the number of websites has been growing exponentially. These websites cover an ever-increasing range of online services that fill a variety of social and economic functions across a growing range of industries. Yet the networked nature of the web, combined with the economics of preferential attachment, increasing returns a...
Preprint
Hawkes processes have been successfully applied to understand online information diffusion and popularity of online items. Most prior work concentrates on individually modeling successful diffusion cascades, while discarding smaller cascades which, however, account for the majority of the available data. In this work, we propose a set of to...
Preprint
In this work, we develop a new approximation method to solve the analytically intractable Bayesian inference for Gaussian process models with factorizable Gaussian likelihoods and single-output latent functions. Our method, dubbed QP, is similar to expectation propagation (EP); however, it minimizes the $L^2$ Wasserstein distance instead of...
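For readers unfamiliar with the distance named here, a small Python sketch contrasts the 2-Wasserstein distance with the Kullback-Leibler divergence, which standard EP minimizes locally, for two univariate Gaussians, where both have closed forms. This only illustrates the two quantities; it is not the QP method, and the function names are hypothetical.

```python
import math


def w2_gaussian(m1, s1, m2, s2):
    """2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2).

    Closed form in one dimension: W2^2 = (m1 - m2)^2 + (s1 - s2)^2,
    where s1 and s2 are standard deviations.
    """
    return math.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)


def kl_gaussian(m1, s1, m2, s2):
    """KL(N(m1, s1^2) || N(m2, s2^2)); note that it is not symmetric."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5


# Usage: compare N(0, 1) and N(3, 5^2) in both directions
w_ab, w_ba = w2_gaussian(0, 1, 3, 5), w2_gaussian(3, 5, 0, 1)
kl_ab, kl_ba = kl_gaussian(0, 1, 3, 5), kl_gaussian(3, 5, 0, 1)
```

Here the Wasserstein distance is exactly 5.0 in both directions, whereas the KL divergence is about 1.31 one way and about 14.89 the other, which is one reason the choice of divergence changes the resulting approximation.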
Article
Full-text available
Work is thought to be more enjoyable and beneficial to individuals and society when there is congruence between one’s personality and one’s occupation. We provide large-scale evidence that occupations have distinctive psychological profiles, which can successfully be predicted from linguistic information unobtrusively collected through social media...
Article
Full-text available
Online videos have driven a tremendous increase in Internet traffic. Most video hosting sites implement recommender systems, which connect the videos into a directed network and conceptually act as a source of pathways for users to navigate. At present, little is known about how human attention is allocated over such large-scale networks, and about th...
Preprint
Full-text available
This research develops a data-driven method to generate sets of highly similar skills based on a set of seed skills using online job advertisements (ads) data. This provides researchers with a novel method to adaptively select occupations based on granular skills data. We apply this adaptive skills similarity technique to a dataset of over 6.7 mill...