About
Publications: 55
Reads: 7,976
Citations: 326
Education
January 2015 - August 2020
October 1986 - January 1993
Publications (55)
Entity matching, also known as user identity linkage, is a critical task in data integration. While established techniques primarily focus on large-scale networks, there are several applications where small networks pose challenges due to limited training data and sparsity. This study addresses entity matching in the field of criminology, where sma...
This chapter provides an in-depth account of current research activities and applications in the field of Speech Technology (ST). It discusses technical, scientific, commercial and societal aspects in various ST sub-fields and relates ST to the wider areas of Natural Language Processing and Artificial Intelligence. Furthermore, it outlines breakthr...
This chapter provides a comprehensive overview of innovation and the ELG marketplace as core elements for the generation of value and the creation of an active, attractive and vibrant community surrounding the European Language Grid. Innovation is an essential element in making ELG a credible and sustainable undertaking. However, it does not happen...
When preparing the European Language Grid EU project proposal and designing the overall concept of the platform, the need for drawing up a long-term sustainability plan was abundantly evident. Already in the phase of developing the proposal, the centrepiece of the sustainability plan was what we called the “ELG legal entity”, i.e., an independent...
Presentation on hate speech and hate crime and the technical possibilities for automatically analyzing content in social media networks.
Our perception of the situation in a country or a region is strongly influenced by the reflection of this situation in mass and social media channels. This effect is even more pronounced for geographically and culturally distant regions, for which no firsthand experience is available. To avoid information overload, news outlets typically filter the...
Tracking terrorism and organized crime activities requires multi-source information fusion at high spatiotemporal resolution. Here, we examined the potential joint use of Earth Observation data and big data from online sources for forming relationships that can help actions against terrorism and organized crime. A novel four-step methodological pi...
The emotionalization of politics, outrage, and references to utopian alternatives have always been part of democratic debate, but they can equally lead to democratic delegitimization. In the early 21st century, social media plays a special role here: groups at the political extremes are gaining in social media...
Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties including full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means to break down these barriers. While the last decade h...
With 24 official EU and many additional languages, multilingualism in Europe and an inclusive Digital Single Market can only be enabled through Language Technologies (LTs). European LT business is dominated by hundreds of SMEs and a few large players. Many are world-class, with technologies that outperform the global players. However, European LT b...
The current scientific and technological landscape is characterised by the increasing availability of data resources and processing tools and services. In this setting, metadata have emerged as a key factor facilitating management, sharing and usage of such digital assets. In this paper we present ELG-SHARE, a rich metadata schema catering for the...
Big data analysis of hate postings in the context of the Austrian legal framework.
The phenomenon of fake news is nothing new. It has been around as long as people have had a vested interest in manipulating opinions and images, dating back to historical times, for which written accounts exist and probably much beyond. Referring to it as post-truth seems futile, as there’s probably never been an era of truth when it comes to news....
This paper describes SIIP (Speaker Identification Integrated Project), a high-performance, innovative and sustainable Speaker Identification (SID) solution running over a large database of voice samples. The solution is based on the development, integration and fusion of a series of speech analytic algorithms, which include speaker model recognition, gender...
Since the summer of 2015, the refugee crisis in Europe has grown to be one of the biggest challenges Europe has faced since WW2. The developments of this humanitarian crisis are the topic of discussions throughout Europe and are covered by the media on a daily basis. Germany in particular has been the focus of migration. Over time, in Germany and the neighb...
Traditional media have a long history in covering natural disasters and crises. In many instances, these media remain major providers of information about an event. In recent years, however, information about natural disasters has increasingly been disseminated on a significant scale via Social Media platforms. These media provide new, additional a...
Crises and disasters are covered continuously and without interruption by today’s media, especially social media. There is not a single significant occurrence within the flow of events which they do not document. Consequently, the information contained in media, especially social media like Facebook and Twitter, provides an often neglected potential...
Sentiment analysis has been a major focus of researchers in recent years. Nevertheless, despite the considerable amount of literature in the field, the majority of publications target the domains of movie and product reviews in English. The current paper presents a novel sentiment analysis method, which extends the state-of-the-art by trilingual...
Usually, in intelligent tutoring systems, task sequencing is done by means of expert and domain knowledge. In earlier work we presented a new, efficient task sequencer that does not require expensive expert and domain knowledge. This task sequencer uses only former performances and decides about the next task according to Vygotsky’s Zone of Proximal...
Crises and disasters occur all over the world with the highest impact on the most vulnerable in society. Generating a thorough and trusted status of information about the situation is a priority for effective and coordinated disaster management and relief measures delivered by governmental organizations (GOs) and non-governmental organizations (NGO...
Traditional and social media are known to be of great benefit for crisis and disaster communication. In the majority of past cases, however, these media have been collected, processed and analyzed separately. Previous research focused mostly on aspects of communication within a single medium and a single channel only (typically Twitter). Little wor...
Analysis of social media and traditional media provides significant information to first responders in times of natural disasters. Sentiment analysis, particularly of social media originating from the affected population, forms an integral part of multifaceted media analysis. The current paper extends an existing methodology to the domain of natur...
Traditional and social media are known to be of great benefit for crisis- and disaster communication. In this paper we argue for the combination of these different kinds of media in a cross-media, multi-media and multi-lingual approach, claiming that a combined view will yield superior results to individual media only. We emphasize the importance o...
In this paper we describe the role of media in the context of natural disasters. Traditional media have a long history in covering disasters. They are a major provider of information in times of disaster and will remain so in the future. In recent years, however, there has been a significant change: information about natural disasters has increasing...
In this paper we describe work in progress on a cross-media content analysis approach and framework, which is currently being developed within the QuOIMA project. We describe the role of media, and how possible links between social and traditional media and terminology and communication patterns are envisioned to be connected to the different phase...
In this paper we discuss the role of Open Source Intelligence (OSINT) in Disaster Management. In particular we present the use of the Sail Labs Media Mining System in the context of disaster relief operations and use samples to point out advantages and strengths of the MM-System. Future challenges in research and further development of this field a...
Early detection of potential health threats is crucial for taking action in time. It is unclear in which information source an event will be reported first, and information from various sources can be complementary. Thus, it is important to search for information in a very broad range of sources. Furthermore, real-time processing is necessary to deal w...
Understanding users' search intent expressed through their search queries is crucial to Web search and online advertisement. Web query classification (QC) has been widely studied for this purpose. Most previous QC algorithms classify individual queries ...
We describe the Sail Labs Media Mining System which is capable of processing vast amounts of data typically gathered from open sources in unstructured form. The data are processed by a set of components and the output is produced in MPEG7 format. The origin and kind of input may be as diverse as a set of satellite receivers monitoring TV stations o...
In this notebook paper we describe the technical details of the submissions to TRECVID 2006 from the CMU Informedia team. We participated in the high-level feature extraction and the search (automatic and interactive) tasks. Our emphasis is on various techniques used for the search task, where our interactive runs won first place in the interactive...
A computerized method is provided for adding a new word to a vocabulary of a speech system, the vocabulary comprising words and corresponding acoustic patterns for a language or language domain. Within a determination step for the new word, a regularity value is determined which measures the conformity with respect to the pronunciation in the langu...
We report on a series of experiments addressing the fact that German is less suited than English for word-based n-gram language models. Several systems were trained at different vocabulary sizes using various sets of lexical units. They were evaluated against a newly created corpus of German and Austrian broadcast news.
We present experiments on automatic language identification in the broadcast news domain. Because of the inherent diversity of news broadcasts, speech is extracted from the raw audio data by means of phone-level decoding using broad classes of phonemes. Training and testing were performed on recordings of German, English, Spanish and French news sho...
Traditional audio-visual archives are nowadays being replaced by digital multimedia content management systems. These systems manage the audio-visual data itself as well as additional logging information (meta-data). To actually make these resources available and capitalize on their content, manual annotation by experts is usually required, which ma...