Stefan Conrad
  • Prof. Dr.
  • Chair at Heinrich Heine University Düsseldorf

About

296
Publications
28,571
Reads
2,051
Citations
Current institution
Heinrich Heine University Düsseldorf
Current position
  • Chair
Additional affiliations
September 1998 - June 1999
Johannes-Kepler-Universität Linz
October 2002 - present
Heinrich Heine University Düsseldorf
November 1999 - September 2002
Ludwig-Maximilians-Universität in Munich

Publications (296)
Preprint
Full-text available
Determining the sustainability impact of companies is a highly complex subject which has garnered more and more attention over the past few years. Today, investors largely rely on sustainability-ratings from established rating-providers in order to analyze how responsibly a company acts. However, those ratings have recently been criticized for bein...
Chapter
Fairness metrics are used to assess discrimination and bias in decision-making processes across various domains, including machine learning models and human decision-makers in real-world applications. This involves calculating the disparities between probabilistic outcomes among social groups, such as acceptance rates between male and female applic...
Preprint
Full-text available
The reason behind the unfair outcomes of AI is often rooted in biased datasets. Therefore, this work presents a framework for addressing fairness by debiasing datasets containing a (non-)binary protected attribute. The framework proposes a combinatorial optimization problem where heuristics such as genetic algorithms can be used to solve for the st...
Preprint
Full-text available
We extract the sentiment from German and English news articles on companies in the DAX40 stock market index and use it to create a sentiment-powered counterpart. Comparing it to existing products which adjust their weights at pre-defined dates once per month, we show that our index is able to react more swiftly to sentiment information mined from onlin...
Preprint
Full-text available
Fairness metrics are used to assess discrimination and bias in decision-making processes across various domains, including machine learning models and human decision-makers in real-world applications. This involves calculating the disparities between probabilistic outcomes among social groups, such as acceptance rates between male and female applic...
Preprint
Full-text available
Motivated by the recital (67) of the current corrigendum of the AI Act in the European Union, we propose and present measures and mitigation strategies for discrimination in tabular datasets. We specifically focus on datasets that contain multiple protected attributes, such as nationality, age, and sex. This makes measuring and mitigating bias more...
Preprint
Full-text available
In this paper, we deal with bias mitigation techniques that remove specific data points from the training set to aim for a fair representation of the population in that set. Machine learning models are trained on these pre-processed datasets, and their predictions are expected to be fair. However, such approaches may exclude relevant data, making t...
Conference Paper
Full-text available
This paper presents evidence for an effect of genre on the use of discourse connectives in argumentation. Drawing from discourse processing research on reasoning based structures, we use fill-mask computation to measure genre-induced expectations of argument realisation, and beta regression to model the probabilities of these realisations against a...
Chapter
The reason behind the unfair outcomes of AI is often rooted in biased datasets. Therefore, this work presents a framework for addressing fairness by debiasing datasets containing a (non-)binary protected attribute. The framework proposes a combinatorial optimization problem where heuristics such as genetic algorithms can be used to solve for the st...
Chapter
Fairness is a critical consideration in data analytics and knowledge discovery because biased data can perpetuate inequalities through further pipelines. In this paper, we propose a novel pre-processing method to address fairness issues in classification tasks by adding synthetic data points for more representativeness. Our approach utilizes a stat...
Chapter
Object Detection is one of the most fundamental and challenging areas in computer vision. A detailed analysis and evaluation is key to understanding the performance of custom Deep Learning models. In this contribution, we present an application which is able to run inference on custom data for models created in different machine learning frameworks...
Article
Full-text available
In modern data analysis, time is often considered just another feature. Yet time has a special role that is regularly overlooked. Procedures are usually only designed for time-independent data and are therefore often unsuitable for the temporal aspect of the data. This is especially the case for clustering algorithms. Although there are a few evolu...
Article
Full-text available
It is no longer possible to imagine our everyday life without time series data. This includes, for example, market developments, COVID-19 cases, electricity prices, and other data from a wide variety of domains. An important task in the analysis of these data is the detection of anomalies. In most cases, this is accomplished by examining individual...
Preprint
Full-text available
In distributional semantic accounts of the meaning of noun-noun compounds (e.g. starfish, bank account, houseboat) the important role of constituent polysemy remains largely unaddressed (cf. the meaning of star in starfish vs. star cluster vs. star athlete). Instead of semantic vectors that average over the different meanings of a constituent, disam...
Article
Full-text available
Abnormal torsion of the lower limbs may adversely affect joint health. This study developed and validated a deep learning-based method for automatic measurement of femoral and tibial torsion on MRI. Axial T2-weighted sequences acquired of the hips, knees, and ankles of 93 patients (mean age, 13 ± 5 years; 52 males) were included and allocated to tr...
Article
Purpose: To develop and validate a deep learning-based method for automatic quantitative analysis of lower-extremity alignment. Materials and methods: In this retrospective study, bilateral long-leg radiographs (LLRs) from 255 patients that were obtained between January and September of 2018 were included. For training data (n = 109), a U-Net co...
Article
Full-text available
The topological structure of RDF graphs inherently differs from other types of graphs, like social graphs, due to the pervasive existence of hierarchical relations (TBox), which complement transversal relations (ABox). Graph measures capture such particularities through descriptive statistics. Besides the classical set of measures established in th...
Chapter
We present a multistage method for deep semantic segmentation of bone structures based on a landmark-based shape regression and subsequent local segmentation of relevant areas. Our solution covers the entire pipeline from 2D-based pre-segmentation, a method for fast deep 3D shape regression and subsequent patch-based 3D semantic segmentation for fi...
Chapter
Since the amount of sequentially recorded data is constantly increasing, the analysis of time series (TS), and especially the identification of anomalous points and subsequences, is nowadays an important field of research. Many approaches consider only a single TS, but in some cases multiple sequences need to be investigated. In 2019 we presented a...
Conference Paper
Full-text available
The automatic annotation of coral images is important for researching the underwater ecosystem, which is the focus of the ImageCLEFcoral task. We participated by refining our approaches from last year's challenge for localization and classification of corals within images of the sea floor. Underwater images bear multiple difficulties which we tackl...
Conference Paper
Full-text available
Clustering of time series data is a major part of data mining. In this paper, we consider multiple multivariate time series and the clustering of their data points per timestamp. One of the major problems of this approach is that the temporal connection of clusterings at different times can neither be guaranteed nor tracked. For this reason we pres...
Preprint
Alpha matting aims to estimate the translucency of an object in a given image. The resulting alpha matte describes pixel-wise to what amount foreground and background colors contribute to the color of the composite image. While most methods in literature focus on estimating the alpha matte, the process of estimating the foreground colors given the...
Chapter
The discovery of knowledge by analyzing time series is an important field of research. In this paper we investigate multiple multivariate time series, because we assume a higher information value than regarding only one time series at a time. There are several approaches which make use of the granger causality or the cross correlation in order to a...
Preprint
An important step of many image editing tasks is to extract specific objects from an image in order to place them in a scene of a movie or compose them onto another background. Alpha matting describes the problem of separating the objects in the foreground from the background of an image given only a rough sketch. We introduce the PyMatting package...
Chapter
Time series analysis is a part of data mining and nowadays an important field of research due to the increasing amount of data that is recorded sequentially by various systems. Especially the identification of anomalous subsequences arouses great interest, since a manual search for errors or malfunctions is not possible in most cases. Often outlier...
Chapter
Full-text available
The analysis of time series is an important field of research in data mining. This includes different sub areas like trend analysis, outlier detection, forecasting or simply the comparison of multiple time series. Clustering is also an equally important and vast field in time series analysis. Different clustering algorithms provide different analys...
Conference Paper
The paper presents two approaches for automatic Computed Tomography (CT) report and tuberculosis (TB) severity scoring, which were two subtasks of the ImageCLEFtuberculosis 2019 challenge. While our first approach uses image processing techniques for feature extraction from CT scans, our second approach uses artificial neural networks (ANN) for predicti...
Conference Paper
Full-text available
In this paper we present the approaches that achieved first place in this year's ImageCLEFcoral challenge. The task of the challenge was the localization and classification of corals within images of the sea floor. Therefore we had to extract bounding boxes for each coral and label them with the specific type of substrate. We applied a state-of-...
Preprint
Full-text available
As the availability and the inter-connectivity of RDF datasets grow, so does the necessity to understand the structure of the data. Understanding the topology of RDF graphs can guide and inform the development of, e.g. synthetic dataset generators, sampling methods, index structures, or query optimizers. In this work, we propose two resources: (i)...
Chapter
Full-text available
As the availability and the inter-connectivity of RDF datasets grow, so does the necessity to understand the structure of the data. Understanding the topology of RDF graphs can guide and inform the development of, e.g. synthetic dataset generators, sampling methods, index structures, or query optimizers. In this work, we propose two resources: (i)...
Chapter
Clustering high-dimensional data is a challenging problem for fuzzy clustering algorithms because of the so-called concentration of distance phenomenon. Most fuzzy clustering algorithms fail to work on high-dimensional data, producing cluster prototypes close to the center of gravity of the data set. The presence of noise and outliers in data is an...
Conference Paper
Tuberculosis is still a widespread disease that causes more than one million deaths and ten million new infections worldwide every year. As part of ImageCLEF 2018, we investigated whether the severity of the disease can be determined from CT scans alone. We therefore extracted features from the images which we then tested with several clas...
Conference Paper
Full-text available
In 2018, tuberculosis was one of the top 10 causes of death worldwide. Patients who develop multidrug resistance are especially endangered and need special medical treatment. Within the ImageCLEF 2018 challenge the automatic distinction between drug-sensitive and multidrug-resistant tuberculosis was investigated by only using the CT scan, age an...
Conference Paper
Full-text available
To store Linked Data one may choose from a growing number of available database systems: from traditional relational databases to RDF triple stores, not to mention the area of NoSQL technologies. Comparisons of database systems often use benchmarks to evaluate systems with the best overall runtime performance. However, the structure of data and que...
Conference Paper
Tuberculosis is a widespread disease and one of the top causes of death worldwide. The distinction between drug-sensitive and multidrug-resistant tuberculosis in particular is still problematic. The ImageCLEF 2017 Tuberculosis Task 1 aims at the development of a machine learning system able to decide whether a patient suffers from multidrug-resistant t...
Article
Full-text available
Summary: In recent years, cities and municipalities have increasingly used online participation processes to involve their citizens in political decision-making. This article begins with a categorization of online participation processes in the political context in Germany and focuses on the participa...
Article
The 17th Conference on Database Systems for Business, Technology, and Web (BTW2017) of the German Informatics Society (GI) took place in March 2017 at the University of Stuttgart in Germany. A Data Science Challenge was organized for the first time at a BTW conference by the University of Stuttgart and Sponsor IBM. We challenged the participants to...
Conference Paper
Many outlier detection tasks involve a classification of outliers of different types. Most standard procedures solve this problem in two steps: First, an outlier detection algorithm is carried out, which is normally trained on outlier-free data only, since the samples of outliers are limited. Second, the outliers detected in that step are classifi...
Article
In this work, we present the results of a systematic study to investigate the (commercial) benefits of automatic text summarization systems in a real world scenario. More specifically, we define a use case in the context of media monitoring and media response analysis and claim that even using a simple query-based extractive approach can dramatical...
Conference Paper
Abstractive single document summarization is considered a challenging problem in the field of artificial intelligence and natural language processing. Meanwhile, and specifically in the last two years, several deep learning summarization approaches were proposed that once again attracted the attention of researchers to this field. It is a well-known issu...
Conference Paper
Given a set of sentences, a sentence orderer permutes the sentences in a way that the final text is linguistically coherent and semantically understandable. In this work, we focus on the binary and ternary tasks of ordering a pair of sentences regarding their linguistic coherence. We propose a methodology to automatically collect and annotate sente...
Conference Paper
In this paper, we present a new possibilistic multivariate fuzzy c-means (PMFCM) clustering algorithm. PMFCM is a combination of multivariate fuzzy c-means (MFCM) and possibilistic fuzzy c-means (PFCM) that produces membership degrees of data objects to each cluster according to each feature and typicality values of data objects to each cluster. In...
Conference Paper
Full-text available
This paper focuses on the automated extraction of argument components from user content in the German online participation project Tempelhofer Feld. We adapt existing argumentation models into a new model for decision-oriented online participation. Our model consists of three categories: major positions, claims, and premises. We create a new German...
Conference Paper
Clustering is an important technique for identifying groups of similar data objects within a data set. Since problems during the data collection and data preprocessing steps often lead to missing values in the data sets, there is a need for clustering methods that can deal with such imperfect data. Approaches proposed in the literature for adapting...
Chapter
Modern image sharing platforms such as Instagram or Flickr make it easy to publish photos on the Internet, thus leading to great numbers of available photos. However, many of the images are not properly tagged, so that there is no notion of what they are showing. For the example of mountain recognition it is advisable to create reference silh...
Conference Paper
The lexical similarity measure is used for calculating the similarities between strings. Existing lexical-based methods are usually based on either n-grams or Dice's approaches. These measures perform well and can be extended by adjusting the parameter. However, they do not return reasonable results in some situations where strings are quit...
Article
http://www.aclweb.org/anthology/P/P15/P15-2068.pdf Nowadays, there are a lot of natural language processing pipelines that are based on training data created by a few experts. This paper examines how the proliferation of the internet and its collaborative application possibilities can be practically used for NLP. For that purpose, we examine ho...
Article
Images on the Web appear with other textual contents, referred to as Web Image Context, which provide valuable information about the image semantics. Unfortunately, HTML documents are usually cluttered with multiple different contents on different topics, and therefore the right image context needs to be precisely determined in order to deliver high quality...
Article
In this paper we expose how state of the art background subtraction models can be optimized for moving camera recordings. During our research work we found out that none of the commonly used background subtraction models is able to subtract the background accurately, when the camera is moving. Camera motion leads to motion areas in the background,...
Conference Paper
This paper presents an automatic ontology matching approach (called LSSOM - Lexical Structural Semantic-based Ontology Matching method) which brings a final alignment by combining three kinds of different similarity measures: lexical-based, structure-based, and semantic-based techniques as well as using information in ontologies including names, la...
Article
Automatic keyphrase extraction aims at extracting a compact representation of a single document which can be used for various applications such as indexing, classification or summarization. Existing methods for keyphrase extraction usually define the set of phrases of a document as a crisp set and by scoring the phrases, they select the keyphrases...
Article
Measurement of similarity plays an important role in data mining and information retrieval. Several techniques for calculating the similarities between objects have been proposed so far, for example, lexical-based, structure-based and instance-based measures. Existing lexical similarity measures are usually based on either n-grams or Dice's approaches to...
Article
The recognition of landmarks in images can help to manage large image collections and thus is desirable for many image retrieval applications. A practical system has to be scalable with an increasing number of landmarks. For the domain of landmark recognition we investigate state-of-the-art CBIR methods on an image dataset of 900 landmarks. Our...
Conference Paper
Several approaches for computing semantic similarity and relatedness measures between terms have been developed. This paper proposes a new semantic similarity measure between two nodes concentrating on nouns as well as their hypernym/hyponym relationships based on the structure of WordNet. In particular, the similarity of two given nouns depends no...
Conference Paper
Full-text available
A very valuable piece of information in newspaper articles is the tonality of extracted statements. For the analysis of tonality of newspaper articles either a big human effort is needed, when it is carried out by media analysts, or an automated approach which has to be as accurate as possible for a Media Response Analysis (MRA). To this end, we wi...
Conference Paper
In this paper we introduce a repeating motion based video classification system. Videos from certain topical areas like sports, home improvement, or mechanical motion often show specific repeating movements. Main and side frequencies of these repetitions can be considered as motion features. We receive these features by the Fourier transform of spa...
Conference Paper
The extraction of statements is an essential step in a Media Response Analysis (MRA), because statements in news represent the most important information for a customer of an MRA and can be used as the underlying data for Opinion Mining in newspaper articles. We propose a machine learning approach to tackle this problem. For each sentence, our metho...
Conference Paper
The sentiment in news articles is not created through single words alone; linguistic factors, which are invoked by different contexts, also influence the opinion-bearing words. In this paper, we apply various commonly used approaches for sentiment analysis and expand research by analysing semantic features and their influence on the sentiment. We us...
Conference Paper
The great development of semantic web in the distributed environment leads to the different forms of ontologies. Therefore, ontology matching is an important task in order to share knowledge among applications more easily. In this paper, we propose an automatic ontology matching method by combining lexical and structure-based measures. A basic lexi...
Article
Subspace clustering is an extension of traditional clustering that enables finding clusters in subspaces within a data set, which means subspace clustering is more suitable for detecting clusters in high-dimensional data sets. However, most subspace clustering methods usually require many complicated parameter settings, which are almost troublesome...
Article
In this paper, we propose a general framework to track and collect user interactions with dynamic webpages. Using AJAX, PHP, and MySQL technologies, we implement the client-side scripting framework to collect client paradata in a seamless manner. Being stored in a persistent storage at the server, the data were then structured and...
Conference Paper
In this study, we address the problem of finding the optimal number of clusters on incomplete data using cluster validity functions. Experiments were performed on different data sets in order to analyze to what extent cluster validity indices adapted to incomplete data can be used for validation of clustering results. Moreover we analyze which fuzz...
Conference Paper
This contribution introduces a new corpus of a German Media Response Analysis called the pressrelations dataset which can be used in several tasks of Opinion Mining: Sentiment Analysis, Opinion Extraction and the determination of viewpoints. Professional Media Analysts created a corpus of 617 documents which contains 1,521 statements. The statement...
Conference Paper
Adapting opinion mining for news articles is a challenging field and at the same time it is very interesting for many analyses, applications and systems in the field of media monitoring. In this paper, we illustrate specifics in this area in comparison with sentiment analysis of product reviews. Likewise, we introduce new methods for the determinat...
Article
Full-text available
Presentation modeling, which captures the layout of an HTML page, is a very important aspect of modeling Web Applications (WAs). However, presentation modeling is often neglected during forward engineering of Web Applications; therefore, most of these applications are poorly modeled or not modeled at all. This paper discusses the design, implementa...
Article
In this article we present an application of data mining to the medical domain sleep research, an approach for automatic sleep stage scoring and apnea-hypopnea detection. By several combined techniques (Fourier and wavelet transform, derivative dynamic time warping, and waveform recognition), our approach extracts meaningful features (frequencies a...
Conference Paper
This paper presents an approach for time series prediction using a Hidden Markov Model, which is based on inter-time-serial correlations. These correlations between time series of a given database are automatically discovered by hierarchically clustering motif-based time series representations, which can be used for the prediction of the future develo...
Article
Full-text available
Malicious JavaScript code has recently been actively utilized as a vehicle for Web-based security attacks. By exploiting vulnerabilities such as cross-site scripting (XSS), attackers are able to spread worms, conduct phishing attacks, and redirect Web pages, “typically” to porn Web sites. These attacks can be preemptively prevented if the m...
Article
This paper presents a new approach of classification in which multiple decision trees are combined together for achieving better accuracy compared to that achieved by each of the individual constituent decision trees. A major unit of the proposed system is the combination unit for which we present two algorithms; one is based on pre-pruning and tru...
Article
Various ontology matching solutions have been proposed so far. In this paper, we present a method to match two ontologies using a basic lexical similarity measure (edit-distance) in order to obtain initial mappings and a new structure-based similarity measure as well as to find correspondences among the concepts of the given ontologies. The structu...
Article
This paper discusses an approach, which allows classifying videos by frequency spectra. Many videos contain activities with repeating movements. Sports videos, home improvement videos, or videos showing mechanical motion are some example areas. Motion of these areas usually repeats with a certain main frequency and several side frequencies. Transfo...
Article
Clustering techniques in data mining aim to find interesting patterns in data sets. However, traditional clustering methods are not suitable for large, high-dimensional data. Subspace clustering is an extension of traditional clustering that enables finding clusters in subspaces within a data set, which means subspace clustering is more suitable fo...
