Petra Kralj Novak

Jožef Stefan Institute | IJS · Department of Knowledge Technologies

Ph.D.

105
Publications
30,436
Reads
2,035
Citations

Publications (105)
Conference Paper
Full-text available
As social media usage increases, so does the volume of toxic content on these platforms, motivating the Machine Learning (ML) community to focus on automating hate speech detection. While modern ML algorithms are known to provide nearly human-like results for a variety of downstream Natural Language Processing (NLP) tasks, the classification of hat...
Article
Full-text available
We discuss the added value of various approaches for identifying similarities in social network communities based on the content they produce. We show the limitations of observing communities using topology-only and illustrate the benefits and complementarity of including supplementary data when analyzing social networks. As a case study, we analyz...
Preprint
Full-text available
We propose an approach for comparing communities in social networks based on used hashtags, shared news sources, and discussed topics and their sentiment, both derived through transformer-based language modeling. We apply this multifaceted comparison to study the reactions of the Ex-Yugoslavian retweet communities to the Russian invasion of Ukraine...
Conference Paper
Full-text available
The Russian invasion of Ukraine marks a dramatic change in international relations globally, as well as in specific, already unstable, regions. The geographical area of interest in this paper is a part of ex-Yugoslavia where the BCMS (Bosnian, Croatian, Montenegrin, Serbian) languages are spoken, official varieties of a pluricentric Serbo-Croatian...
Article
Full-text available
The quality of annotations in manually annotated hate speech datasets is crucial for automatic hate speech detection. This contribution focuses on the positive effects of manually annotating online comments for hate speech within the context in which the comments occur. We quantify the impact of context availability by meticulously designing an exp...
Chapter
Full-text available
Hate speech annotation for training machine learning models is an inherently ambiguous and subjective task. In this paper, we adopt a perspectivist approach to data annotation, model training and evaluation for hate speech classification. We first focus on the annotation process and argue that it drastically influences the final data quality. We th...
Article
Full-text available
We address a challenging problem of identifying main sources of hate speech on Twitter. On one hand, we carefully annotate a large set of tweets for hate speech, and deploy advanced deep learning to produce high quality hate speech classification models. On the other hand, we create retweet networks, detect communities and monitor their evolution t...
Article
Full-text available
Twitter data exhibits several dimensions worth exploring: a network dimension in the form of links between the users, textual content of the tweets posted, and a temporal dimension as the time-stamped sequence of tweets and their retweets. In the paper, we combine analyses along all three dimensions: temporal evolution of retweet networks and commu...
Article
Full-text available
Online debates are often characterised by extreme polarisation and heated discussions among users. The presence of hate speech online is becoming increasingly problematic, making necessary the development of appropriate countermeasures. In this work, we perform hate speech detection on a corpus of more than one million comments on YouTube videos th...
Article
Full-text available
Communities in social networks often reflect close social ties between their members and their evolution through time. We propose an approach that tracks two aspects of community evolution in retweet networks: flow of the members in, out and between the communities, and their influence. We start with high resolution time windows, and then select se...
Preprint
Full-text available
We address a challenging problem of identifying main sources of hate speech on Twitter. On one hand, we carefully annotate a large set of tweets for hate speech, and deploy advanced deep learning to produce high quality hate speech classification models. On the other hand, we create retweet networks, detect communities and monitor their evolution t...
Preprint
Full-text available
Online debates are often characterised by extreme polarisation and heated discussions among users. The presence of hate speech online is becoming increasingly problematic, making necessary the development of appropriate countermeasures. In this work, we perform hate speech detection on a corpus of more than one million comments on YouTube videos th...
Preprint
Full-text available
Communities in social networks often reflect close social ties between their members and their evolution through time. We propose an approach that tracks two aspects of community evolution in retweet networks: flow of the members in, out and between the communities, and their influence. We start with high resolution time windows, and then select se...
Conference Paper
Full-text available
We analyze the evolution of Twitter activities in Slovenia in recent years. We construct networks, with Twitter users as nodes, and retweet relations as edges. We detect communities and influential users in them, and track how they evolve during times of political changes and start of the Covid-19 pandemic. We observe the following: Most of the inf...
Conference Paper
Full-text available
As the popularity of social media has been growing steadily since the beginning of their era, the use of data from these platforms to analyze social phenomena is becoming more and more reliable. In this paper, we use tweets posted over a period of two years (2018-2020) to analyze the socio-political environment in Slovenia. We use network analysis...
Article
Full-text available
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Article
Full-text available
Recent studies have shown that online users tend to select information that adheres to their system of beliefs, ignore information that does not, and join groups that share a common narrative. This information environment can elicit tribalism instead of informed debate, especially when issues are controversial. Algorithmic solutions, fact-checking...
Preprint
The massive diffusion of social media fosters disintermediation and changes the way users are informed, the way they process reality, and the way they engage in public debate. The cognitive layer of users and the related social dynamics define the nature and the dimension of informational threats. Users show the tendency to interact with informatio...
Article
Full-text available
According to the World Economic Forum, the diffusion of unsubstantiated rumors on online social media is one of the main threats for our society. The disintermediated paradigm of content production and consumption on online social media might foster the formation of homogeneous communities (echo-chambers) around specific worldviews. Such a scenario...
Conference Paper
We address the question of how publicly accessible information can be used to make a map of the political actors and their leanings that would benefit both policy makers and stakeholders in the European Commission’s ‘Better regulation agenda’ and contribute to social stability. We explore this possibility by using data from the Transparency Regist...
Conference Paper
Whilst impact investing has recently exhibited exceptionally high growth rates, creating an interconnected and functioning market remains an open challenge. Social media play an increasingly important role in understanding communication and relations between different players in the market. This is the first time that network, content, and sentimen...
Article
Full-text available
The 2008 financial crisis unveiled the intrinsic failures of the financial system as we know it. As a consequence, impact investing started to receive increasing attention, as evidenced by the high market growth rates. The goal of impact investment is to generate social and environmental impact alongside a financial return. In this paper we identif...
Article
Full-text available
Currency trading (Forex) is the largest world market in terms of volume. We analyze trading and tweeting about the EUR-USD currency pair over a period of three years. First, a large number of tweets were manually labeled, and a Twitter stance classification model is constructed. The model then classifies all the tweets by the trading stance signal:...
Article
Full-text available
Social media are an important source of information about the political issues, reflecting, as well as influencing, public mood. We present an analysis of Twitter data, collected over 6 weeks before the Brexit referendum, held in the UK in June 2016. We address two questions: what is the relation between the Twitter mood and the referendum outcome,...
Article
Full-text available
According to the World Economic Forum, the diffusion of unsubstantiated rumors on online social media is one of the main threats for our society. The disintermediated paradigm of content production and consumption on online social media might foster the formation of homophile communities (echo-chambers) around specific worldviews. Such a scenario...
Article
Full-text available
There is a new generation of emoticons, called emojis, increasingly used in mobile communications and social media. In the last two years, over ten billion emojis were used on Twitter. Emojis are Unicode graphic symbols, used as a shorthand to express concepts and ideas. In contrast to a small number of well-known emoticons which carry clear emo...
Article
Unstructured data, such as news and blogs, can provide valuable insights into the financial world. We present the NewsStream portal, an intuitive and easy-to-use tool for news analytics, which supports interactive querying and visualizations of the documents at different levels of detail. It relies on a scalable architecture for real-time processin...
Article
Full-text available
According to the World Economic Forum, the diffusion of unsubstantiated rumors on online social media is one of the main threats for our society. The disintermediated paradigm of content production and consumption on online social media might foster the formation of homophile communities (echo-chambers) around specific worldviews. Such a scenario h...
Article
Full-text available
Background: With the increasing pace of new Genetically Modified Organisms (GMOs) authorized or in pipeline for commercialization worldwide, the task of the laboratories in charge to test the compliance of food, feed or seed samples with their relevant regulations became difficult and costly. Many of them have already adopted the so-called "matrix a...
Article
Full-text available
A stream of unstructured news can be a valuable source of hidden relations between different entities, such as financial institutions, countries, or persons. We present an approach to continuously collect online news, recognize relevant entities in them, and extract time-varying networks. The nodes of the network are the entities, and the links are...
Article
Full-text available
Motivated by recent financial crises, significant research efforts have been put into studying contagion effects and herding behaviour in financial markets. Much less has been said regarding the influence of financial news on financial markets. We propose a novel measure of collective behaviour based on financial news on the Web, the News Cohesiven...
Conference Paper
Subgroup discovery aims at constructing symbolic rules that describe statistically interesting subsets of instances with a chosen property of interest. Semantic subgroup discovery extends standard subgroup discovery approaches by exploiting ontological concepts in rule construction. Compared to previously developed semantic data mining systems SDM-...
Conference Paper
Full-text available
The article presents an approach to computational knowledge discovery through the mechanism of bisociation. Bisociative reasoning is at the heart of creative, accidental discovery (e.g., serendipity), and is focused on finding unexpected links by crossing contexts. Contextualization and linking between highly diverse and distributed data and knowle...
Article
Full-text available
In experimental data analysis, bioinformatics researchers increasingly rely on tools that enable the composition and reuse of scientific workflows. The utility of current bioinformatics workflow environments can be significantly increased by offering advanced data mining services as workflow components. Such services can support, for instance, know...
Conference Paper
Full-text available
With the expansion of the Semantic Web and the availability of numerous ontologies which provide domain background knowledge and semantic descriptors to the data, the amount of semantic data is rapidly growing. The data mining community is faced with a paradigm shift: instead of mining the abundance of empirical data supported by the background kno...
Article
Full-text available
The paper presents an approach to computational knowledge discovery through the mechanism of bisociation. Bisociative reasoning is at the heart of creative, accidental discovery (e.g., serendipity), and is focused on finding unexpected links by crossing contexts. Contextualization and linking between highly diverse and distributed data and knowle...
Article
Full-text available
This article presents an approach to microarray data analysis using discretised expression values in combination with a methodology of closed item set mining for class labeled data (RelSets). A statistical 2 x 2 factorial design analysis was run in parallel. The approach was validated on two independent sets of two-color microarray experiments usin...