Theodora Tsikrika
Information Technologies Institute (ITI)
Publications: 127 | Reads: 15,350 | Citations: 1,286
Publications (127)
Interpretability is a topic that has been in the spotlight for the past few years. Most existing interpretability techniques produce interpretations in the form of rules or feature importance. These interpretations, while informative, may be harder for non-expert users to understand and therefore cannot always be considered as adequate explanation...
This work proposes a framework integrating AI technologies and tools to assist micro and small hosting service providers (HSPs) in adhering to the terrorist content online (TCO) Regulation. The framework encompasses: (i) a suite of AI tools for the automated detection and removal of TCO, (ii) a federated learning infrastructure for (re-)training an...
Medical organisations are at great risk of cyberattacks. Considering the complex infrastructure, the integration of susceptible Internet of Medical Things devices and lack of appropriate cybersecurity training of the staff, both technical and medical, the healthcare domain is a common target for malicious actors. Therefore, to safeguard their infra...
In the era of ever-growing online social networking communities, reports of online crimes of various forms and targeting are growing exponentially, highlighting the imperative need for the development and enforcement of solutions and measures aimed at early detection and prevention. Specifically, in today’s landscape, child sexual abuse (CSA) and e...
Artificial intelligence (AI) technology has greatly impacted various aspects of modern society and economy. Recent technological advancements have significantly improved decision-making and support systems, as well as autonomous processes, by utilising different types of data, such as text, visual content, and video footage. To provide accurate out...
Terrorist financing (TF) poses a significant threat to the security and stability of the European Union (EU) and its member states. To combat TF in an efficient and timely manner, advanced technologies are required. The EU-funded Cut the Cord project (CTC) seeks to strengthen the EU’s capability to understand and counter TF by exploring innovative...
Many metrics exist for evaluating the maturity of products, systems and processes when assessing their deployment readiness. Although considerable effort has been made to develop new frameworks and to update and integrate existing frameworks, calculation methodologies and indicators, these have seen limited usage from the EU-funded s...
PERIVALLON is a Horizon Europe co-funded Innovation Action project, entitled ‘Protecting the European territory from organised environmental crime through intelligent threat detection tools’ that started in December 2022 and focuses on combatting organised environmental crime by: (1) delivering an environmental crime observatory aiming to provide a...
With the constant growth of social media in our daily lives, a huge amount of information is generated online by multiple social networks. However, what can we actually extract with the science of social media sensing? It is a very challenging task to mine meaningful data out of this vast crowdsourcing volume, which also rapidly changes or ends up...
Addressing bias in NLP-based solutions is crucial to promoting fairness, avoiding discrimination, building trust, upholding ethical standards, and ultimately improving their performance and reliability. On the topic of bias detection and mitigation in textual data, this work examines the effect of different bias detection models along with standard...
This is a report on the fourteenth edition of the Conference and Labs of the Evaluation Forum (CLEF 2023), held on September 18–21, 2023, in Thessaloniki, Greece. CLEF was a four-day hybrid event combining a conference and an evaluation forum. The conference featured keynotes by Barbara Plank and Claudia Hauff, and presentation of peer-reviewed re...
This work presents a framework for predicting the likelihood of terrorist incidents within a specific time period in a target country based on machine learning and deep learning models. These models are trained on localised news data sourced from the Global Event, Language, and Tone Database (GDELT) Project for the target country, while the terrori...
This paper proposes a unified framework for the detection of statistically significant changes in time series related to Bitcoin transactions. The time locations of these changes are linked to the occurrences of events which could be further investigated aiming to reveal potential illicit activity. The proposed framework includes: (a) the extractio...
Time series analysis can be an asset in the hands of the authorities, as it can enable the understanding and monitoring of trends of criminal activities. In this work, a variety of methods is exploited to detect significant points of change in crime-related time series that may indicate the occurrence of events that require attention. In particular...
Graph embedding techniques have been introduced in recent years with the aim of mapping graph data into low-dimensional vector spaces, so that conventional machine learning methods can be exploited. In particular, random walk-based approaches such as the DeepWalk model employ truncated random walks to capture structural connections betwe...
Data Augmentation approaches often use Language Models, pretrained on large quantities of unlabeled generic data, to conditionally generate examples. However, the generated data can be of subpar quality and struggle to maintain the same characteristics as the original dataset. To this end, we propose a Data Augmentation method for low-resource and...
An essential factor toward ensuring the security of individuals and critical infrastructures is the timely detection of potentially threatening situations. To this end, especially in the law enforcement context, the availability of effective and efficient threat assessment mechanisms for identifying and eventually preventing crime‐ and terrorism‐re...
This chapter analyzes the growing threat of behavioral radicalization online by giving an overview of the Internet's role in spreading radical content and the commission of terrorist offences. It identifies four radicalization‐related online activities, namely hate speech, terrorist financing, terrorist incitement, terrorist recruitment and trainin...
The graph embedding process aims to transform nodes and edges into a low dimensional vector space, while preserving the graph structure and topological properties. Random walk based methods are used to capture structural relationships between nodes, by performing truncated random walks. Afterwards, the SkipGram model with the negative sampling appr...
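As a rough illustration of the truncated random walks mentioned in this abstract, the following minimal sketch generates fixed-length uniform random walks from every node of a small graph. The function name, parameters, and toy graph are hypothetical and are not taken from the publication; the subsequent SkipGram training step, which would treat each walk as a "sentence" of node identifiers, is not shown.

```python
import random


def truncated_random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Generate uniform random walks, each truncated at walk_length nodes."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adjacency)
        rng.shuffle(nodes)  # visit start nodes in a random order on each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency[walk[-1]]
                if not neighbours:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# Hypothetical toy graph: undirected, stored as adjacency lists.
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

walks = truncated_random_walks(toy_graph, walk_length=5, walks_per_node=2)
```

Each resulting walk is a sequence of node identifiers in which every consecutive pair is an edge of the graph, so nodes that co-occur within a walk are structurally close.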
The often observed unavailability of large amounts of training data typically required by deep learning models to perform well in the context of NLP tasks has given rise to the exploration of data augmentation techniques. Originally, such techniques mainly focused on rule-based methods (e.g. random insertion/deletion of words) or synonym replacemen...
Analyzing content generated on social media has proven to be a powerful tool for early detection of crisis-related events. Such an analysis may allow for timely action, mitigating or even preventing altogether the effects of a crisis. However, the high noise levels in short texts present in microblogging platforms, combined with the limited publicl...
Given the increasing occurrence of deviant activities in online platforms, it is of paramount importance to develop methods and tools that allow in-depth analysis and understanding to then develop effective countermeasures. This work proposes a framework towards detecting statistically significant change points in terrorism-related time series, whi...
Nowadays, the visual information captured by CCTV surveillance and body worn cameras is continuously increasing. Such visual information is often used for security purposes, such as the recognition of suspicious activities, including potential crime- and terrorism-related activities and violent behaviours. To this end, specific tools have been deve...
Formulating effective queries for retrieving domain-specific content from the Web and social media is very important for practitioners in several fields, including law enforcement analysts involved in terrorism-related investigations. Query reformulation aims at transforming the original query in such a way as to increase the search effectivene...
Web bots vary in sophistication based on their purpose, ranging from simple automated scripts to advanced web bots that have a browser fingerprint, support the main browser functionalities, and exhibit a humanlike behaviour. Advanced web bots are especially appealing to malicious web bot creators, due to their browser-like fingerprint and humanlike...
Renewable energy sources and the increasing interest in green energy have been the driving force behind many innovations in the energy sector, such as how utility companies interact with their customers and vice versa. The introduction of smart grids is one of these innovations in what is basically a fusion of the traditional energy grid with t...
This is a report on the tenth edition of the Conference and Labs of the Evaluation Forum (CLEF 2020), (virtually) held on September 22–25, 2020, in Thessaloniki, Greece. CLEF was a four-day event combining a Conference and an Evaluation Forum. The Conference featured keynotes by Ellen Voorhees and Yiannis Kompatsiaris, and presentation...
This work examines violence detection in video scenes of crowds and proposes a crowd violence detection framework based on a 3D convolutional deep learning architecture, the 3D-ResNet model with 50 layers. The proposed framework is evaluated on the Violent Flows dataset against several state-of-the-art approaches and achieves higher accuracy values...
This work presents the user-centered design and development of a generic and extensible visualization framework that can be re-used in various scenarios in order to communicate large-scale heterogeneous multimedia information obtained from social media and Web sources, through user-friendly interactive visualizations in real-time. Using the particu...
The Web and social media nowadays play an increasingly significant role in spreading terrorism-related propaganda and content. In order to deploy counterterrorism measures, authorities rely on automated systems for analysing text, multimedia, and social media content on the Web. However, since each of these systems is an isolated solution, investig...
Automated programs (bots) are responsible for a large percentage of website traffic. These bots can be used either for benign purposes, such as Web indexing, Website monitoring (validation of hyperlinks and HTML code), feed fetching, and Web content and data extraction for commercial use, or for malicious ones, including, but not limited to, content scra...
This paper describes a new test collection for passage retrieval from health-related Web resources in Spanish. The test collection contains 10,037 health-related documents in Spanish, 37 topics representing complex information needs formulated in a total of 167 natural language questions, and manual relevance assessments of text passages, pooled fr...
Identifying terrorism-related key actors in social media is of vital significance for law enforcement agencies and social media organizations in their effort to counter terrorism-related online activities. This work proposes a novel framework for the identification of key actors in multidimensional social networks formed by considering several diff...
Social media are widely used by terrorist organizations and extremist groups for disseminating propaganda and recruiting new members. Given the recent pledges both by the major social media platforms and governments towards combating online terrorism, our work aims at understanding the terrorism-related content posted on social media and distinguis...
Collaborative filtering recommenders leverage past user-item ratings in order to predict ratings for new items. One of the most critical steps in such methods corresponds to the formation of the neighbourhood that contains the most similar users or items, so that the ratings associated with them can be employed for predicting new ratings. This work...
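The neighbourhood formation step described in this abstract can be sketched as follows. This is a generic user-based k-nearest-neighbour scheme with cosine similarity, shown only to make the idea concrete; the neighbourhood method actually proposed in the publication is cut off in the text above and is not reproduced here. All names and ratings are hypothetical.

```python
import math


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity between two users' rating dictionaries."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in ratings_a.values()))
    norm_b = math.sqrt(sum(v * v for v in ratings_b.values()))
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item, k=2):
    """Predict a rating as the similarity-weighted mean over the k nearest users."""
    # Candidate neighbours: other users who have rated the target item.
    candidates = [
        (cosine_similarity(ratings[user], ratings[other]), other)
        for other in ratings
        if other != user and item in ratings[other]
    ]
    # The neighbourhood is the k most similar candidates.
    neighbourhood = sorted(candidates, reverse=True)[:k]
    total_weight = sum(sim for sim, _ in neighbourhood)
    if total_weight == 0:
        return None  # no usable neighbours
    weighted = sum(sim * ratings[other][item] for sim, other in neighbourhood)
    return weighted / total_weight


# Hypothetical toy ratings on a 1-5 scale.
ratings = {
    "alice": {"item1": 5, "item2": 3},
    "bob": {"item1": 5, "item2": 3, "item3": 4},
    "carol": {"item1": 1, "item2": 5, "item3": 2},
}

prediction = predict_rating(ratings, "alice", "item3", k=2)
```

In this toy data, bob's ratings agree with alice's more closely than carol's do, so the prediction for item3 is pulled towards bob's rating of 4 rather than carol's rating of 2.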
Heterogeneous sources of information, such as images, videos, text and metadata are often used to describe different or complementary views of the same multimedia object, especially in the online news domain and in large annotated image collections. The retrieval of multimedia objects, given a multimodal query, requires the combination of several s...
Focused crawlers enable the automatic discovery of Web resources about a given topic by automatically navigating through the Web link structure and selecting the hyperlinks to follow by estimating their relevance to the topic of interest. This work proposes a generic focused crawling framework for discovering resources on any given topic that resid...
Social media are widely used among terrorists to communicate and disseminate their activities. User-to-user interaction (e.g. mentions, follows) leads to the formation of complex networks, with topology that reveals key-players and key-communities in the terrorism domain. Both the administrators of social media platforms and Law Enforcement Agencie...
The deliberate misuse of technical infrastructure (including the Web and social media) for cyber deviant and cybercriminal behaviour, ranging from the spreading of extremist and terrorism-related material to online fraud and cyber security attacks, is on the rise. This workshop aims to better understand such phenomena and develop methods for tackli...
Community detection is a valuable tool for analyzing complex networks. This work investigates the community detection problem based on the density-based algorithm DBSCAN*. This algorithm requires, though, a lower bound for the community size to be determined a priori, a challenging task. To this end, this work proposes the application of a Martinga...
The East-European Conference in Advances in Data Bases & Information Systems (ADBIS) spans 20 years of life. Here, by using simple statistical measures and bibliographic analysis, we illustrate basic characteristics and features of ADBIS, i.e. the venues, persons and countries involved. Also, its international character, its competitiveness and its s...
Monitoring terrorist groups and their suspicious activities in social media is a challenging task, given the large amounts of data involved and the need to identify the most influential users in a smart way. To this end, many efforts have focused on using centrality measures for the identification of the key players in terrorism-related social medi...
This work proposes a generic focused crawling framework for discovering resources on any given topic that reside on the Surface or the Dark Web. The proposed crawler is able to seamlessly traverse the Surface Web and several darknets present in the Dark Web (i.e. Tor, I2P and Freenet) during a single crawl by automatically adapting its crawling beh...
This work investigates the effectiveness of a novel interactive search engine in the context of discovering and retrieving Web resources containing recipes for synthesizing Home Made Explosives (HMEs). The discovery of HME Web resources both on Surface and Dark Web is addressed as a domain-specific search problem; the architecture of the search eng...
Effective multimedia retrieval requires the combination of the heterogeneous media contained within multimedia objects and the features that can be extracted from them. To this end, we extend a unifying framework that integrates all well-known weighted, graph-based, and diffusion-based fusion techniques that combine two modalities (textual and visu...
The 10th European Summer School in Information Retrieval (ESSIR 2015) was held in Thessaloniki, Greece between August 31 and September 4, 2015. The summer school offered high quality lectures on 13 topics in Information Retrieval and related areas, a new edition of the Symposium on Future Directions in Information Access (FDIA), group activities, a...
The Dark Web, a part of the Deep Web that consists of several darknets (e.g. Tor, I2P, and Freenet), provides users with the opportunity of hiding their identity when surfing or publishing information. This anonymity facilitates the communication of sensitive data for legitimate purposes, but also provides the ideal environment for transferring inf...
This work investigates the effectiveness of a state-of-the-art concept detection framework for the automatic classification of multimedia content, namely images and videos, embedded in publicly available Web resources containing recipes for the synthesis of Home Made Explosives (HMEs), to a set of predefined semantic concepts relevant to the HME do...
This work proposes a novel framework that integrates diverse state-of-the-art technologies for the discovery, analysis, retrieval, and recommendation of heterogeneous Web resources containing multimedia information about homemade explosives (HMEs), with particular focus on HME recipe information. The framework corresponds to a knowledge management...
This work proposes a framework for the discovery of environmental Web resources providing air quality measurements and forecasts. Motivated by the frequent occurrence of heatmaps in such Web resources, it exploits multimedia evidence at different stages of the discovery process. Domain-specific queries generated using empirical information and mach...
Enabling effective multimedia information processing, analysis, and access applications in online social multimedia settings requires data representation models that capture a broad range of the characteristics of such environments and ensure interoperability. We propose a flexible model for describing Socially Interconnected MultiMedia-enriched Ob...
Evaluation initiatives have been widely credited with contributing highly to the development and advancement of information access systems, by providing a sustainable platform for conducting the very demanding activity of comparable experimental evaluation on a large scale. Measuring the impact of such benchmarking activities is crucial for assessi...
In this paper we propose a novel approach to training noise-resilient concept detectors from clickthrough data collected by image search engines. We take advantage of the query logs to automatically produce concept detector training sets that, however, suffer from label noise, i.e. erroneously assigned labels. We explore two alternative approaches...
This work evaluates the combination of multiple evidence for discovering groups of users with similar interests. User groups are created by analysing the search logs recorded for a sample of 149 users of a professional image search engine in conjunction with the textual and visual features of the clicked images, and evaluated by exploiting their to...
Focussed crawlers enable the automatic discovery of Web resources about a given topic by automatically navigating the Web link structure and selecting the hyperlinks to follow by estimating their relevance to the topic based on evidence obtained from the already downloaded pages. This work proposes a classifier-guided focussed crawling approach tha...
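The general focussed crawling loop described in this abstract can be sketched with a priority queue: each discovered hyperlink is ranked by the relevance of the already downloaded page that contains it. The sketch below runs over a hypothetical in-memory "Web" rather than real HTTP requests, and it substitutes a simple keyword-overlap score for the classifier used in the publication; every name in it is an illustrative assumption.

```python
import heapq


def relevance(text, topic_terms):
    """Stand-in relevance score: fraction of words that are topic terms."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in topic_terms for w in words) / len(words)


def focused_crawl(pages, seeds, topic_terms, max_pages=10, threshold=0.1):
    """Visit pages best-first, ranking links by their parent page's relevance."""
    # heapq is a min-heap, so scores are negated to pop the best link first.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    visited, relevant = set(), []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        if url in visited or url not in pages:
            continue
        visited.add(url)
        text, links = pages[url]
        score = relevance(text, topic_terms)
        if score >= threshold:
            relevant.append(url)
        for link in links:
            if link not in visited:
                heapq.heappush(frontier, (-score, link))
    return relevant


# Hypothetical in-memory Web: url -> (page text, outgoing links).
pages = {
    "seed": ("air quality forecast portal", ["a", "b"]),
    "a": ("air pollution measurements and ozone forecast", ["c"]),
    "b": ("football results and transfers", ["d"]),
    "c": ("ozone air quality map", []),
    "d": ("celebrity gossip", []),
}
topic_terms = {"air", "quality", "ozone", "pollution", "forecast"}

found = focused_crawl(pages, ["seed"], topic_terms, max_pages=4)
```

With a budget of four pages, the link found on the off-topic page "b" is never followed, because links from on-topic pages are dequeued first.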
In this paper we explore the benefits of latent variable modelling of clickthrough data in the domain of image retrieval. Clicks in image search logs are regarded as implicit relevance judgements that express both user intent and important relations between selected documents. We posit that clickthrough data contains hidden topics and can be used t...
This paper assesses the scholarly impact of the CLEF evaluation campaign by performing a bibliometric analysis of the citations of the CLEF 2000–2009 proceedings publications collected through Scopus and Google Scholar. Our analysis indicates a significant impact of CLEF, particularly for its well-established Adhoc, ImageCLEF, and QA labs, and for...
This study aims at gaining insights into user group identification in professional image search. The user groups are built by analysing the search logs recorded by a commercial picture portal for a sample of 170 users, in conjunction with the users' occupational and topical profile information, and a topical classification of the available images. O...