Petra Kralj Novak, Ph.D.
Jožef Stefan Institute | IJS · Department of Knowledge Technologies
Publications: 105
Reads: 30,436
Citations: 2,035
Publications (105)
As social media usage increases, so does the volume of toxic content on these platforms, motivating the Machine Learning (ML) community to focus on automating hate speech detection. While modern ML algorithms are known to provide nearly human-like results for a variety of downstream Natural Language Processing (NLP) tasks, the classification of hat...
We discuss the added value of various approaches for identifying similarities in social network communities based on the content they produce. We show the limitations of observing communities using topology-only and illustrate the benefits and complementarity of including supplementary data when analyzing social networks. As a case study, we analyz...
We propose an approach for comparing communities in social networks based on used hashtags, shared news sources, and discussed topics and their sentiment, both derived through transformer-based language modeling. We apply this multifaceted comparison to study the reactions of the Ex-Yugoslavian retweet communities to the Russian invasion of Ukraine...
The Russian invasion of Ukraine marks a dramatic change in international relations globally, as well as in specific, already unstable, regions. The geographical area of interest in this paper is a part of ex-Yugoslavia where the BCMS (Bosnian, Croatian, Montenegrin, Serbian) languages are spoken, official varieties of a pluricentric Serbo-Croatian...
The quality of annotations in manually annotated hate speech datasets is crucial for automatic hate speech detection. This contribution focuses on the positive effects of manually annotating online comments for hate speech within the context in which the comments occur. We quantify the impact of context availability by meticulously designing an exp...
Hate speech annotation for training machine learning models is an inherently ambiguous and subjective task. In this paper, we adopt a perspectivist approach to data annotation, model training and evaluation for hate speech classification. We first focus on the annotation process and argue that it drastically influences the final data quality. We th...
We address a challenging problem of identifying main sources of hate speech on Twitter. On one hand, we carefully annotate a large set of tweets for hate speech, and deploy advanced deep learning to produce high quality hate speech classification models. On the other hand, we create retweet networks, detect communities and monitor their evolution t...
Twitter data exhibits several dimensions worth exploring: a network dimension in the form of links between the users, textual content of the tweets posted, and a temporal dimension as the time-stamped sequence of tweets and their retweets. In the paper, we combine analyses along all three dimensions: temporal evolution of retweet networks and commu...
Online debates are often characterised by extreme polarisation and heated discussions among users. The presence of hate speech online is becoming increasingly problematic, making necessary the development of appropriate countermeasures. In this work, we perform hate speech detection on a corpus of more than one million comments on YouTube videos th...
Communities in social networks often reflect close social ties between their members and their evolution through time. We propose an approach that tracks two aspects of community evolution in retweet networks: flow of the members in, out and between the communities, and their influence. We start with high resolution time windows, and then select se...
We analyze the evolution of Twitter activities in Slovenia in recent years. We construct networks, with Twitter users as nodes, and retweet relations as edges. We detect communities and influential users in them, and track how they evolve during times of political changes and start of the Covid-19 pandemic. We observe the following: Most of the inf...
As the popularity of social media has been growing steadily since the beginning of their era, the use of data from these platforms to analyze social phenomena is becoming more and more reliable. In this paper, we use tweets posted over a period of two years (2018-2020) to analyze the socio-political environment in Slovenia. We use network analysis...
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Recent studies have shown that online users tend to select information that adheres to their system of beliefs, ignore information that does not, and join groups that share a common narrative. This information environment can elicit tribalism instead of informed debate, especially when issues are controversial. Algorithmic solutions, fact-checking...
The massive diffusion of social media fosters disintermediation and changes the way users are informed, the way they process reality, and the way they engage in public debate. The cognitive layer of users and the related social dynamics define the nature and the dimension of informational threats. Users show the tendency to interact with informatio...
According to the World Economic Forum, the diffusion of unsubstantiated rumors on online social media is one of the main threats for our society. The disintermediated paradigm of content production and consumption on online social media might foster the formation of homogeneous communities (echo-chambers) around specific worldviews. Such a scenario...
We address the question of how publicly accessible information can be used to make a map of the political actors and their leanings that would benefit both policy makers and stakeholders in the European Commission’s ‘Better regulation agenda’ and contribute to social stability. We explore this possibility by using data from the Transparency Regist...
Whilst impact investing has recently exhibited exceptionally high growth rates, creating an interconnected and functioning market remains an open challenge. Social media play an increasingly important role in understanding communication and relations between different players in the market. This is the first time that network, content, and sentimen...
The 2008 financial crisis unveiled the intrinsic failures of the financial system as we know it. As a consequence, impact investing started to receive increasing attention, as evidenced by the high market growth rates. The goal of impact investment is to generate social and environmental impact alongside a financial return. In this paper we identif...
Currency trading (Forex) is the largest world market in terms of volume. We analyze trading and tweeting about the EUR-USD currency pair over a period of three years. First, a large number of tweets were manually labeled, and a Twitter stance classification model is constructed. The model then classifies all the tweets by the trading stance signal:...
Social media are an important source of information about the political issues, reflecting, as well as influencing, public mood. We present an analysis of Twitter data, collected over 6 weeks before the Brexit referendum, held in the UK in June 2016. We address two questions: what is the relation between the Twitter mood and the referendum outcome,...
There is a new generation of emoticons, called emojis, increasingly used in mobile communications and social media. In the last two years, over ten billion emojis were used on Twitter. Emojis are Unicode graphic symbols, used as a shorthand to express concepts and ideas. In contrast to a small number of well-known emoticons which carry clear emo...
Unstructured data, such as news and blogs, can provide valuable insights into the financial world. We present the NewsStream portal, an intuitive and easy-to-use tool for news analytics, which supports interactive querying and visualizations of the documents at different levels of detail. It relies on a scalable architecture for real-time processin...
According to the World Economic Forum, the diffusion of unsubstantiated rumors on online social media is one of the main threats for our society. The disintermediated paradigm of content production and consumption on online social media might foster the formation of homophile communities (echo-chambers) around specific worldviews. Such a scenario h...
Background: With the increasing pace of new Genetically Modified Organisms (GMOs) authorized or in the pipeline for commercialization worldwide, the task of the laboratories in charge of testing the compliance of food, feed or seed samples with their relevant regulations has become difficult and costly. Many of them have already adopted the so-called "matrix a...
A stream of unstructured news can be a valuable source of hidden relations between different entities, such as financial institutions, countries, or persons. We present an approach to continuously collect online news, recognize relevant entities in them, and extract time-varying networks. The nodes of the network are the entities, and the links are...
Motivated by recent financial crises, significant research efforts have been put into studying contagion effects and herding behaviour in financial markets. Much less has been said regarding the influence of financial news on financial markets. We propose a novel measure of collective behaviour based on financial news on the Web, the News Cohesiven...
Subgroup discovery aims at constructing symbolic rules that describe statistically interesting subsets of instances with a chosen property of interest. Semantic subgroup discovery extends standard subgroup discovery approaches by exploiting ontological concepts in rule construction. Compared to previously developed semantic data mining systems SDM-...
The article presents an approach to computational knowledge discovery through the mechanism of bisociation. Bisociative reasoning is at the heart of creative, accidental discovery (e.g., serendipity), and is focused on finding unexpected links by crossing contexts. Contextualization and linking between highly diverse and distributed data and knowle...
In experimental data analysis, bioinformatics researchers increasingly rely on tools that enable the composition and reuse of scientific workflows. The utility of current bioinformatics workflow environments can be significantly increased by offering advanced data mining services as workflow components. Such services can support, for instance, know...
With the expansion of the Semantic Web and the availability of numerous ontologies which provide domain background knowledge and semantic descriptors to the data, the amount of semantic data is rapidly growing. The data mining community is faced with a paradigm shift: instead of mining the abundance of empirical data supported by the background kno...
The paper presents an approach to computational knowledge discovery through the mechanism of bisociation. Bisociative reasoning is at the heart of creative, accidental discovery (e.g., serendipity), and is focused on finding unexpected links by crossing contexts. Contextualization and linking between highly diverse and distributed data and knowle...
This article presents an approach to microarray data analysis using discretised expression values in combination with a methodology of closed item set mining for class labeled data (RelSets). A statistical 2 x 2 factorial design analysis was run in parallel. The approach was validated on two independent sets of two-color microarray experiments usin...