About
296 Publications
28,571 Reads
2,051 Citations
Introduction
Current institution
Additional affiliations
September 1998 - June 1999
Johannes-Kepler-Universität Linz
October 2002 - present
November 1999 - September 2002
Publications (296)
Determining the sustainability impact of companies is a highly complex subject which has garnered more and more attention over the past few years. Today, investors largely rely on sustainability-ratings from established rating-providers in order to analyze how responsibly a company acts. However, those ratings have recently been criticized for bein...
Fairness metrics are used to assess discrimination and bias in decision-making processes across various domains, including machine learning models and human decision-makers in real-world applications. This involves calculating the disparities between probabilistic outcomes among social groups, such as acceptance rates between male and female applic...
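The disparity described above can be illustrated with a minimal sketch. It assumes the simplest such measure, the difference in acceptance rates between two groups (often called statistical parity difference); the truncated abstract does not say which metrics the paper covers, and the function names and toy data below are hypothetical.

```python
from collections import defaultdict


def acceptance_rates(records):
    """Return the share of positive decisions per group.

    `records` is an iterable of (group, accepted) pairs, where
    `accepted` is a bool.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, accepted in records:
        totals[group] += 1
        positives[group] += int(accepted)
    return {group: positives[group] / totals[group] for group in totals}


def statistical_parity_difference(records, group_a, group_b):
    """Acceptance rate of group_a minus acceptance rate of group_b."""
    rates = acceptance_rates(records)
    return rates[group_a] - rates[group_b]


# Toy data (made up for illustration): 1 of 4 female and 2 of 4 male
# applicants are accepted.
decisions = [
    ("female", True), ("female", False), ("female", False), ("female", False),
    ("male", True), ("male", True), ("male", False), ("male", False),
]

if __name__ == "__main__":
    print(acceptance_rates(decisions))
    print(statistical_parity_difference(decisions, "male", "female"))
```

A value of zero would mean both groups are accepted at the same rate; the sign shows which group is favoured.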
The reason behind the unfair outcomes of AI is often rooted in biased datasets. Therefore, this work presents a framework for addressing fairness by debiasing datasets containing a (non-)binary protected attribute. The framework proposes a combinatorial optimization problem where heuristics such as genetic algorithms can be used to solve for the st...
We extract the sentiment from German and English news articles on companies in the DAX40 stock market index and use it to create a sentiment-powered counterpart. Comparing it to existing products, which adjust their weights at pre-defined dates once per month, we show that our index is able to react more swiftly to sentiment information mined from onlin...
Motivated by the recital (67) of the current corrigendum of the AI Act in the European Union, we propose and present measures and mitigation strategies for discrimination in tabular datasets. We specifically focus on datasets that contain multiple protected attributes, such as nationality, age, and sex. This makes measuring and mitigating bias more...
In this paper, we deal with bias mitigation techniques that remove specific data points from the training set to aim for a fair representation of the population in that set. Machine learning models are trained on these pre-processed datasets, and their predictions are expected to be fair. However, such approaches may exclude relevant data, making t...
This paper presents evidence for an effect of genre on the use of discourse connectives in argumentation. Drawing from discourse processing research on reasoning-based structures, we use fill-mask computation to measure genre-induced expectations of argument realisation, and beta regression to model the probabilities of these realisations against a...
Fairness is a critical consideration in data analytics and knowledge discovery because biased data can perpetuate inequalities through further pipelines. In this paper, we propose a novel pre-processing method to address fairness issues in classification tasks by adding synthetic data points for more representativeness. Our approach utilizes a stat...
Object Detection is one of the most fundamental and challenging areas in computer vision. A detailed analysis and evaluation is key to understanding the performance of custom Deep Learning models. In this contribution, we present an application which is able to run inference on custom data for models created in different machine learning frameworks...
In modern data analysis, time is often considered just another feature. Yet time has a special role that is regularly overlooked. Procedures are usually only designed for time-independent data and are therefore often unsuitable for the temporal aspect of the data. This is especially the case for clustering algorithms. Although there are a few evolu...
It is no longer possible to imagine our everyday life without time series data. This includes, for example, market developments, COVID-19 cases, electricity prices, and other data from a wide variety of domains. An important task in the analysis of these data is the detection of anomalies. In most cases, this is accomplished by examining individual...
In distributional semantic accounts of the meaning of noun-noun compounds (e.g. starfish, bank account, houseboat) the important role of constituent polysemy remains largely unaddressed (cf. the meaning of star in starfish vs. star cluster vs. star athlete). Instead of semantic vectors that average over the different meanings of a constituent, disam...
Abnormal torsion of the lower limbs may adversely affect joint health. This study developed and validated a deep learning-based method for automatic measurement of femoral and tibial torsion on MRI. Axial T2-weighted sequences acquired of the hips, knees, and ankles of 93 patients (mean age, 13 ± 5 years; 52 males) were included and allocated to tr...
Purpose: To develop and validate a deep learning-based method for automatic quantitative analysis of lower-extremity alignment. Materials and methods: In this retrospective study, bilateral long-leg radiographs (LLRs) from 255 patients that were obtained between January and September of 2018 were included. For training data (n = 109), a U-Net co...
The topological structure of RDF graphs inherently differs from other types of graphs, like social graphs, due to the pervasive existence of hierarchical relations (TBox), which complement transversal relations (ABox). Graph measures capture such particularities through descriptive statistics. Besides the classical set of measures established in th...
We present a multistage method for deep semantic segmentation of bone structures based on a landmark-based shape regression and subsequent local segmentation of relevant areas. Our solution covers the entire pipeline from 2D-based pre-segmentation, a method for fast deep 3D shape regression and subsequent patch-based 3D semantic segmentation for fi...
Since the amount of sequentially recorded data is constantly increasing, the analysis of time series (TS), and especially the identification of anomalous points and subsequences, is nowadays an important field of research. Many approaches consider only a single TS, but in some cases multiple sequences need to be investigated. In 2019 we presented a...
The automatic annotation of coral images is important for researching the underwater ecosystem, which is the focus of the ImageCLEFcoral task. We participated by refining our approaches from last year's challenge for the localization and classification of corals within images of the sea floor. Underwater images bear multiple difficulties which we tackl...
Clustering of time series data is a major part of data mining. In this paper, we consider multiple multivariate time series and the clustering of their data points per timestamp. One of the major problems of this approach is that the temporal connection of clusterings at different times can neither be guaranteed nor tracked. For this reason we pres...
Alpha matting aims to estimate the translucency of an object in a given image. The resulting alpha matte describes pixel-wise to what extent foreground and background colors contribute to the color of the composite image. While most methods in the literature focus on estimating the alpha matte, the process of estimating the foreground colors given the...
The discovery of knowledge by analyzing time series is an important field of research. In this paper we investigate multiple multivariate time series, because we assume a higher information value than regarding only one time series at a time. There are several approaches which make use of Granger causality or cross-correlation in order to a...
An important step of many image editing tasks is to extract specific objects from an image in order to place them in a scene of a movie or compose them onto another background. Alpha matting describes the problem of separating the objects in the foreground from the background of an image given only a rough sketch. We introduce the PyMatting package...
Time series analysis is a part of data mining and nowadays an important field of research due to the increasing amount of data that is recorded sequentially by various systems. Especially the identification of anomalous subsequences arouses great interest, since a manual search for errors or malfunctions is not possible in most cases. Often outlier...
The analysis of time series is an important field of research in data mining. This includes different sub areas like trend analysis, outlier detection, forecasting or simply the comparison of multiple time series. Clustering is also an equally important and vast field in time series analysis. Different clustering algorithms provide different analys...
The paper presents two approaches for automatic Computed Tomography (CT) report and tuberculosis (TB) severity scoring, which were two subtasks of the ImageCLEFtuberculosis 2019 challenge. While our first approach uses image processing techniques for feature extraction from CT scans, our second approach uses artificial neural networks (ANN) for predicti...
In this paper we present the approaches that achieved first place in this year's ImageCLEFcoral challenge. The task of the challenge was the localization and classification of corals within images of the sea floor. Therefore, we had to extract bounding boxes for each coral and label them with the specific type of substrate. We applied a state-of-...
As the availability and the inter-connectivity of RDF datasets grow, so does the necessity to understand the structure of the data. Understanding the topology of RDF graphs can guide and inform the development of, e.g. synthetic dataset generators, sampling methods, index structures, or query optimizers. In this work, we propose two resources: (i)...
Clustering high-dimensional data is a challenging problem for fuzzy clustering algorithms because of the so-called concentration of distance phenomenon. Most fuzzy clustering algorithms fail on high-dimensional data, producing cluster prototypes close to the center of gravity of the data set. The presence of noise and outliers in data is an...
Tuberculosis is still a widespread disease that causes more than one million deaths and ten million new infections worldwide every year. As part of ImageCLEF 2018, we investigated whether the severity of the disease can be determined from CT scans alone. We therefore extracted features from the images which we then tested with several clas...
In 2018, tuberculosis was one of the top 10 causes of death worldwide. Patients who develop multidrug resistance are especially endangered and need special medical treatment. Within the ImageCLEF 2018 challenge, the automatic distinction between drug-sensitive and multidrug-resistant tuberculosis was investigated using only the CT scan, age an...
To store Linked Data one may choose from a growing number of available database systems: from traditional relational databases to RDF triple stores, not to mention the area of NoSQL technologies. Comparisons of database systems often use benchmarks to evaluate systems with the best overall runtime performance. However, the structure of data and que...
Tuberculosis is a widespread disease and one of the top causes of death worldwide. In particular, the distinction between drug-sensitive and multidrug-resistant tuberculosis is still problematic. The ImageCLEF 2017 Tuberculosis Task 1 aims at the development of a machine learning system able to decide whether a patient suffers from multidrug-resistant t...
Abstract
In recent years, cities and municipalities have increasingly used online participation processes to involve their citizens in political decision-making. This contribution begins with a categorization of online participation processes in the political context in Germany and focuses on the participa...
The 17th Conference on Database Systems for Business, Technology, and Web (BTW2017) of the German Informatics Society (GI) took place in March 2017 at the University of Stuttgart in Germany. A Data Science Challenge was organized for the first time at a BTW conference by the University of Stuttgart and its sponsor IBM. We challenged the participants to...
Many outlier detection tasks involve a classification of outliers of different types. Most standard procedures solve this problem in two steps: First, an outlier detection algorithm is carried out, which is normally trained on outlier-free data only, since the samples of outliers are limited. Second, the outliers detected in that step are classifi...
In this work, we present the results of a systematic study to investigate the (commercial) benefits of automatic text summarization systems in a real world scenario. More specifically, we define a use case in the context of media monitoring and media response analysis and claim that even using a simple query-based extractive approach can dramatical...
Abstractive single-document summarization is considered a challenging problem in the field of artificial intelligence and natural language processing. In the last two years in particular, several deep learning summarization approaches have been proposed that once again attracted the attention of researchers to this field.
It is a well-known issu...
Given a set of sentences, a sentence orderer permutes the sentences in a way that the final text is linguistically coherent and semantically understandable. In this work, we focus on the binary and ternary tasks of ordering a pair of sentences regarding their linguistic coherence. We propose a methodology to automatically collect and annotate sente...
In this paper, we present a new possibilistic multivariate fuzzy c-means (PMFCM) clustering algorithm. PMFCM is a combination of multivariate fuzzy c-means (MFCM) and possibilistic fuzzy c-means (PFCM) that produces membership degrees of data objects to each cluster according to each feature and typicality values of data objects to each cluster. In...
This paper focuses on the automated extraction of argument components from user content in the German online participation project Tempelhofer Feld. We adapt existing argumentation models into a new model for decision-oriented online participation. Our model consists of three categories: major positions, claims, and premises. We create a new German...
Clustering is an important technique for identifying groups of similar data objects within a data set. Since problems during the data collection and data preprocessing steps often lead to missing values in the data sets, there is a need for clustering methods that can deal with such imperfect data. Approaches proposed in the literature for adapting...
https://aclweb.org/anthology/S/S16/S16-1090.pdf
Modern image sharing platforms such as Instagram or Flickr make it easy to publish photos on the internet, leading to great numbers of available photos. However, many of the images are not properly tagged, so there is no indication of what they show.
For the example of mountain recognition it is advisable to create reference silh...
The lexical similarity measure is used for calculating the similarities between strings. Existing lexical-based methods are usually based on either n-grams or Dice's approaches. These measures perform well and can be extended by adjusting the parameter. However, they do not return reasonable results in some situations where strings are quit...
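For readers unfamiliar with the baseline measures named above, the following is a minimal sketch of Dice's coefficient over character bigrams. It illustrates the existing n-gram/Dice family only, not the measure proposed in the paper; the function names are hypothetical.

```python
from collections import Counter


def bigrams(text):
    """Multiset of overlapping character bigrams of the lower-cased text."""
    text = text.lower()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def dice_similarity(a, b):
    """Dice's coefficient: 2 * |shared bigrams| / (|bigrams(a)| + |bigrams(b)|)."""
    x, y = bigrams(a), bigrams(b)
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        # Both strings are shorter than two characters, so no bigrams exist;
        # fall back to an exact (case-insensitive) comparison.
        return 1.0 if a.lower() == b.lower() else 0.0
    overlap = sum((x & y).values())  # multiset intersection
    return 2 * overlap / total


if __name__ == "__main__":
    # "night" and "nacht" share only the bigram "ht".
    print(dice_similarity("night", "nacht"))
```

Because the score depends only on shared bigrams, two strings with no bigram in common score zero however similar they otherwise look.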
http://www.aclweb.org/anthology/P/P15/P15-2068.pdf
Nowadays, there are a lot of natural language processing pipelines that are based on training data created by a few experts. This paper examines how the proliferation of the internet and its collaborative application possibilities can be practically used for NLP. For that purpose, we examine ho...
Images on the Web appear with other textual contents (referred to as Web Image Context) that provide valuable information about the image semantics. Unfortunately, HTML documents are usually cluttered with multiple contents on different topics, and therefore the right image context needs to be precisely determined in order to deliver high quality...
In this paper we show how state-of-the-art background subtraction models can be optimized for moving camera recordings. During our research we found that none of the commonly used background subtraction models is able to subtract the background accurately when the camera is moving. Camera motion leads to motion areas in the background,...
This paper presents an automatic ontology matching approach (called LSSOM - Lexical Structural Semantic-based Ontology Matching method) which brings a final alignment by combining three kinds of different similarity measures: lexical-based, structure-based, and semantic-based techniques as well as using information in ontologies including names, la...
Automatic keyphrase extraction aims at extracting a compact representation of a single document which can be used for various applications such as indexing, classification or summarization. Existing methods for keyphrase extraction usually define the set of phrases of a document as a crisp set and by scoring the phrases, they select the keyphrases...
Measurement of similarity plays an important role in data mining and information retrieval. Several techniques for calculating the similarities between objects have been proposed so far, for example, lexical-based, structure-based and instance-based measures. Existing lexical similarity measures are usually based on either n-grams or Dice's approaches to...
The recognition of landmarks in images can help to manage large image collections and thus is desirable for many image retrieval applications. A practical system has to be scalable with an increasing number of landmarks. For the domain of landmark recognition we investigate state-of-the-art CBIR methods on an image dataset of 900 landmarks. Our...
Several approaches for computing semantic similarity and relatedness measures between terms have been developed. This paper proposes a new semantic similarity measure between two nodes, concentrating on nouns as well as their hypernym/hyponym relationships based on the structure of WordNet. In particular, the similarity of two given nouns depends no...
A very valuable piece of information in newspaper articles is the tonality of extracted statements. For the analysis of tonality of newspaper articles either a big human effort is needed, when it is carried out by media analysts, or an automated approach which has to be as accurate as possible for a Media Response Analysis (MRA). To this end, we wi...
In this paper we introduce a repeating motion based video classification system. Videos from certain topical areas like sports, home improvement, or mechanical motion often show specific repeating movements. Main and side frequencies of these repetitions can be considered as motion features. We obtain these features by the Fourier transform of spa...
The extraction of statements is an essential step in a Media Response Analysis (MRA), because statements in news represent the most important information for a customer of an MRA and can be used as the underlying data for Opinion Mining in newspaper articles. We propose a machine learning approach to tackle this problem. For each sentence, our metho...
The sentiment in news articles is not created only through single words; linguistic factors, which are invoked by different contexts, also influence the opinion-bearing words. In this paper, we apply various commonly used approaches for sentiment analysis and expand research by analysing semantic features and their influence on the sentiment. We us...
The rapid development of the Semantic Web in distributed environments leads to different forms of ontologies. Therefore, ontology matching is an important task in order to share knowledge among applications more easily. In this paper, we propose an automatic ontology matching method by combining lexical and structure-based measures. A basic lexi...
Subspace clustering is an extension of traditional clustering that enables finding clusters in subspaces within a data set, which means subspace clustering is more suitable for detecting clusters in high-dimensional data sets. However, most subspace clustering methods usually require many complicated parameter settings, which are almost troublesome...
In this paper, we propose a general framework to track and collect user interactions with dynamic webpages. Using AJAX, PHP, and MySQL, we implement the client-side scripting framework to collect client paradata in a seamless manner. Being stored in a persistent storage at the server, the data were then structured and...
In this study, we address the problem of finding the optimal number of clusters on incomplete data using cluster validity functions. Experiments were performed on different data sets in order to analyze to what extent cluster validity indices adapted to incomplete data can be used for validation of clustering results. Moreover we analyze which fuzz...
This contribution introduces a new corpus of a German Media Response Analysis called the pressrelations dataset which can be used in several tasks of Opinion Mining: Sentiment Analysis, Opinion Extraction and the determination of viewpoints. Professional Media Analysts created a corpus of 617 documents which contains 1,521 statements. The statement...
Adapting opinion mining for news articles is a challenging field and at the same time it is very interesting for many analyses, applications and systems in the field of media monitoring. In this paper, we illustrate specifics in this area in comparison with sentiment analysis of product reviews. Likewise, we introduce new methods for the determinat...
Presentation modeling, which captures the layout of an HTML page, is a very important aspect of modeling Web Applications (WAs). However, presentation modeling is often neglected during forward engineering of Web Applications; therefore, most of these applications are poorly modeled or not modeled at all. This paper discusses the design, implementa...
In this article we present an application of data mining to the medical domain of sleep research: an approach for automatic sleep stage scoring and apnea-hypopnea detection. By several combined techniques (Fourier and wavelet transform, derivative dynamic time warping, and waveform recognition), our approach extracts meaningful features (frequencies a...
This paper presents an approach for time series prediction using a Hidden Markov Model, which is based on inter-time-serial correlations. These correlations between time series of a given database are automatically discovered by hierarchically clustering motif-based time series representations, which can be used for the prediction of the future develo...
Malicious JavaScript code has recently been actively used as a vehicle for Web-based security attacks. By exploiting vulnerabilities such as cross-site scripting (XSS), attackers are able to spread worms, conduct phishing attacks, and redirect Web pages, typically to porn Web sites. These attacks can be preemptively prevented if the m...
This paper presents a new approach of classification in which multiple decision trees are combined together for achieving better accuracy compared to that achieved by each of the individual constituent decision trees. A major unit of the proposed system is the combination unit for which we present two algorithms; one is based on pre-pruning and tru...
Various ontology matching solutions have been proposed so far. In this paper, we present a method to match two ontologies using a basic lexical similarity measure (edit-distance) in order to obtain initial mappings and a new structure-based similarity measure as well as to find correspondences among the concepts of the given ontologies. The structu...
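The initial lexical step described above can be sketched as follows. This is a minimal illustration that assumes Levenshtein edit distance normalised by the longer label and a similarity threshold; the threshold value, the normalisation, and all function names are hypothetical, and the paper's structure-based measure is not shown.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def lexical_similarity(a, b):
    """Edit distance normalised to [0, 1]; 1.0 means identical labels."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a.lower(), b.lower()) / longest


def initial_mappings(labels_a, labels_b, threshold=0.8):
    """Pair up concept labels whose lexical similarity reaches the threshold."""
    return [(x, y)
            for x in labels_a
            for y in labels_b
            if lexical_similarity(x, y) >= threshold]


if __name__ == "__main__":
    print(initial_mappings(["Author", "Paper"], ["Authors", "Article"]))
```

Pairs found this way would only be candidates; a later structural step could confirm or reject them.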
This paper discusses an approach that allows classifying videos by frequency spectra. Many videos contain activities with repeating movements. Sports videos, home improvement videos, or videos showing mechanical motion are some example areas. Motion of these areas usually repeats with a certain main frequency and several side frequencies. Transfo...
Clustering techniques in data mining aim to find interesting patterns in data sets. However, traditional clustering methods are not suitable for large, high-dimensional data. Subspace clustering is an extension of traditional clustering that enables finding clusters in subspaces within a data set, which means subspace clustering is more suitable fo...