Steffen Staab

Universität Stuttgart · Institute of Parallel and Distributed Systems

Prof. Dr. rer. nat.

About

654
Publications
164,129
Reads
25,131
Citations
Additional affiliations
October 2004 - present
Universität Koblenz-Landau
Position
  • Professor (Full)
November 1998 - September 2004
Karlsruhe Institute of Technology
Position
  • Lecturer

Publications (654)
Preprint
Full-text available
Current advances of the internet allow rapid dissemination of information, accelerating progress across a wide spectrum of society. This has been achieved mainly through website interfaces with inherently unique human interactions. In this regard, usability analysis becomes central to improving these human interactions. However, this analysis...
Chapter
As humans, we can deduce more from the data graph of Figure 2.1 than what the edges explicitly indicate. We may deduce, for example, that the Ñam festival (EID15) will be located in Santiago, even though the graph does not contain an edge (EID15) —location→ (Santiago). We may further deduce that the cities connected by flights must have some airpo...
Chapter
While deductive knowledge is characterized by precise logical consequences, inductively acquiring knowledge involves generalizing patterns from a given set of input observations, which can then be used to generate novel but potentially imprecise predictions. For example, from a large data graph with geographical and flight information, we may obser...
Article
The notion of Knowledge Graph stems from scientific advancements in diverse research areas such as Semantic Web, databases, knowledge representation and reasoning, NLP, and machine learning, among others. The integration of ideas and techniques from such disparate disciplines presents a challenge to practitioners and researchers to know how current...
Chapter
In this chapter, we discuss some of the most prominent knowledge graphs that have emerged in the past years. We begin by discussing open knowledge graphs, most of which have been published on the Web per the guidelines and protocols described in Chapter 9. We later discuss enterprise knowledge graphs that have been created by companies from diverse...
Chapter
In this chapter we describe extensions of the data graph–relating to schema, identity, and context–that provide additional structures for accumulating knowledge. Henceforth, we refer to a data graph as a collection of data represented as nodes and edges using one of the models discussed in Chapter 2. We refer to a knowledge graph as a data graph po...
Chapter
Independent of the (kinds of) source(s) from which a knowledge graph is created, the resulting initial knowledge graph will usually be incomplete, and will often contain duplicate, contradictory or even incorrect statements, especially when taken from multiple sources. After the initial creation and enrichment of a knowledge graph from external sou...
Chapter
At the foundation of any knowledge graph is the principle of first applying a graph abstraction to data, resulting in an initial data graph. We now discuss a selection of graph-structured data models that are commonly used in practice to represent data graphs. We then discuss the primitives that form the basis of graph query languages used to inter...
Chapter
In this chapter, we discuss the principal techniques by which knowledge graphs can be created and subsequently enriched from diverse sources of legacy data that range from plain text to structured formats (and anything in between). The appropriate methodology to follow when creating a knowledge graph depends on the actors involved, the domain, the...
Chapter
Beyond assessing the quality of a knowledge graph, there exist techniques to refine the knowledge graph, in particular to (semi-)automatically complete and correct the knowledge graph [Paulheim, 2017], aka knowledge graph completion and knowledge graph correction, respectively. As distinguished from the creation and enrichment tasks outlined in Ch...
Chapter
While it may not always be desirable to publish knowledge graphs (for example, those that offer a competitive advantage to a company [Noy et al., 2019]), it may be desirable or even required to publish other knowledge graphs, such as those produced by volunteers [Lehmann et al., 2015, Mahdisoltani et al., 2015, Vrandecic and Krotzsch, 2014], by publ...
Preprint
Full-text available
Wikidata is the largest general-interest knowledge base that is openly available. It is collaboratively edited by thousands of volunteer editors and has thus evolved considerably since its inception in 2012. In this paper, we present Wikidated 1.0, a dataset of Wikidata's full revision history, which encodes changes between Wikidata revisions as se...
Chapter
Knowledge graphs such as Wikidata are created by a diversity of contributors and a range of sources leaving them prone to two types of errors. The first type of error, falsity of facts, is addressed by property graphs through the representation of provenance and validity, making triples occur as first-order objects in subject position of metadata t...
Preprint
Property graphs constitute data models for representing knowledge graphs. They allow for the convenient representation of facts, including facts about facts, represented by triples in subject or object position of other triples. Knowledge graphs such as Wikidata are created by a diversity of contributors and a range of sources leaving them prone to...
Article
Full-text available
In this article, we provide a comprehensive introduction to knowledge graphs, which have recently garnered significant attention from both industry and academia in scenarios that require exploiting diverse, dynamic, large-scale collections of data. After some opening remarks, we motivate and contrast various graph-based data models, as well as lang...
Preprint
Full-text available
Physical motion models offer interpretable predictions for the motion of vehicles. However, some model parameters, such as those related to aero- and hydrodynamics, are expensive to measure and are often only roughly approximated reducing prediction accuracy. Recurrent neural networks achieve high prediction accuracy at low cost, as they can use ch...
Conference Paper
Text entry by gaze is a useful means of hands-free interaction that is applicable in settings where dictation suffers from poor voice recognition or where spoken words and sentences jeopardize privacy or confidentiality. However, text entry by gaze still shows inferior performance and it quickly exhausts its users. We introduce text entry by gaze a...
Conference Paper
Knowledge graphs such as Wikidata are created by a diversity of contributors and a range of sources leaving them prone to two types of errors. The first type of error, falsity of facts, is addressed by property graphs through the representation of provenance and validity, making triples occur as first-order objects in subject position of metadata tr...
Preprint
Full-text available
Processing sequential multi-sensor data becomes important in many tasks due to the dramatic increase in the availability of sensors that can acquire sequential data over time. Human Activity Recognition (HAR) is one of the fields which are actively benefiting from this availability. Unlike most of the approaches addressing HAR by considering predef...
Chapter
We introduce an approach to semantically represent and query raster data in a Semantic Web graph. We extend the GeoSPARQL vocabulary and query language to support raster data as a new type of geospatial data. We define new filter functions and illustrate our approach using several use cases on real-world data sets. Finally, we describe a prototypic...
Chapter
The Shapes Constraint Language (SHACL) allows for formalizing constraints over RDF data graphs. A shape groups a set of constraints that may be fulfilled by nodes in the RDF graph. We investigate the problem of containment between SHACL shapes. One shape is contained in a second shape if every graph node meeting the constraints of the first shape a...
Article
Full-text available
Requirements are inherently prone to conflicts. Security, data-minimization, and fairness requirements are no exception. Importantly, undetected conflicts between such requirements can lead to severe effects, including privacy infringement and legal sanctions. Detecting conflicts between security, data-minimization, and fairness requirements is a c...
Preprint
The Shapes Constraint Language (SHACL) allows for formalizing constraints over RDF data graphs. A shape groups a set of constraints that may be fulfilled by nodes in the RDF graph. We investigate the problem of containment between SHACL shapes. One shape is contained in a second shape if every graph node meeting the constraints of the first shape a...
Conference Paper
In this paper we study the problem of concept contraction for the description logic EL. Concept contraction is concerned with the following question: Given two concepts C and D (with the interesting case being that D subsumes C) how can we find a generalisation of C that is not subsumed by D but is otherwise as similar as possible to C? We take an...
Conference Paper
Full-text available
Usability analysis plays a significant role in optimizing Web interaction by understanding the behavior of end users. To support such analysis, we present a tool to visualize gaze and mouse data of Web site interactions. The proposed tool provides not only the traditional visualizations with fixations, scanpath, and heatmap, but allows for more det...
Article
Full-text available
Artificial Intelligence (AI)‐based systems are widely employed nowadays to make decisions that have far‐reaching impact on individuals and society. Their decisions might affect everyone, everywhere, and anytime, entailing concerns about potential human rights issues. Therefore, it is necessary to move beyond traditional AI algorithms optimized for...
Article
Full-text available
Topic modeling is a popular technique for clustering large collections of text documents. A variety of different types of regularization is implemented in topic modeling. In this paper, we propose a novel approach for analyzing the influence of different regularization types on results of topic modeling. Based on Renyi entropy, this approach is ins...
Preprint
Full-text available
In this paper we provide a comprehensive introduction to knowledge graphs, which have recently garnered significant attention from both industry and academia in scenarios that require exploiting diverse, dynamic, large-scale collections of data. After a general introduction, we motivate and contrast various graph-based data models and query languag...
Article
Full-text available
In this paper we provide a comprehensive introduction to knowledge graphs, which have recently garnered significant attention from both industry and academia in scenarios that require exploiting diverse, dynamic, large-scale collections of data. After a general introduction, we motivate and contrast various graph-based data models and query languag...
Article
Preferential attachment drives the evolution of many complex networks. Its analytical studies mostly consider the simplest case of a network that grows uniformly in time despite the accelerating growth of many real networks. Motivated by the observation that the average degree growth of nodes is time invariant in empirical network data, we study th...
Conference Paper
Universal embeddings, such as BERT or ELMo, are useful for a broad set of natural language processing tasks like text classification or sentiment analysis. Moreover, specialized embeddings also exist for tasks like topic modeling or named entity disambiguation. We study if we can complement these universal embeddings with specialized embeddings. We...
Preprint
Full-text available
Preferential attachment drives the evolution of many complex networks. Its analytical studies mostly consider the simplest case of a network that grows uniformly in time despite the accelerating growth of many real networks. Motivated by the observation that the average degree growth of nodes is time-invariant in empirical network data, we study th...
Preprint
Full-text available
AI-based systems are widely employed nowadays to make decisions that have far-reaching impacts on individuals and society. Their decisions might affect everyone, everywhere and anytime, entailing concerns about potential human rights issues. Therefore, it is necessary to move beyond traditional AI algorithms optimized for predictive performance and...
Conference Paper
Full-text available
We present the design and evaluation of Talk-and-Gaze (TaG), a method for selecting and correcting errors with voice and gaze. TaG uses eye gaze to overcome the inability of voice-only systems to provide spatial information. The user’s point of gaze is used to select an erroneous word either by dwelling on the word for 800 ms (D-TaG) or by uttering...
Conference Paper
Full-text available
The conventional dwell-based methods for text entry by gaze are typically slow and uncomfortable. A swipe-based method that maps gaze path into words offers an alternative. However, it requires the user to explicitly indicate the beginning and ending of a word, which is typically achieved by tedious gaze-only selection. This paper introduces TAGSwip...
Article
Eye tracking systems have greatly improved in recent years, becoming a viable and affordable option as a digital communication channel, especially for people lacking fine motor skills. Using eye tracking as an input method is challenging due to accuracy and ambiguity issues, and therefore research in eye gaze interaction is mainly focused on better poin...
Chapter
It is a strength of graph-based data formats, like RDF, that they are very flexible with representing data. To avoid run-time errors, program code that processes highly-flexible data representations exhibits the difficulty that it must always include the most general case, in which attributes might be set-valued or possibly not available. The Shape...
Preprint
A detailed understanding of users contributes to the understanding of the Web's evolution, and to the development of Web applications. Although for new Web platforms such a study is especially important, it is often jeopardized by the lack of knowledge about novel phenomena due to the sparsity of data. Akin to human transfer of experiences from one...
Conference Paper
Full-text available
We present TouchGazePath, a multimodal method for entering personal identification numbers (PINs). Using a touch-sensitive display showing a virtual keypad, the user initiates input with a touch at any location, glances with their eye gaze on the keys bearing the PIN numbers, then terminates input by lifting their finger. TouchGazePath is not susce...
Conference Paper
Full-text available
Graph data models are interesting in various domains, in part because of the intuitiveness and flexibility they offer compared to relational models. Specialized query languages, such as Cypher for property graphs or SPARQL for RDF, facilitate their use. In this paper, we present an empirical study on the usage of graph-based query languages in open...
Preprint
Full-text available
It is a strength of graph-based data formats, like RDF, that they are very flexible with representing data. To avoid run-time errors, program code that processes highly-flexible data representations exhibits the difficulty that it must always include the most general case, in which attributes might be set-valued or possibly not available. The Shape...
Article
Full-text available
Vast amounts of information and knowledge are produced and stored within product design projects. Especially for reuse and adaptation, there exists no suitable method for product designers to handle this information overload. As a result, the selection of relevant information in a specific development situation is time-consuming and inefficient. To t...
Conference Paper
Text predictions play an important role in improving the performance of gaze-based text entry systems. However, visual search, scanning, and selection of text predictions require a shift in the user's attention from the keyboard layout. Hence the spatial positioning of predictions becomes an imperative aspect of the end-user experience. In this wor...
Conference Paper
Full-text available
Combating misinformation is a challenging task due to the fact that misinformation evolves in content and strategy. We describe the challenges of this task and propose a git-based framework for collaborative and open policy-making against ever-evolving misinformation. We present the setup for future test-runs where users receive tasks that conduct...
Conference Paper
Full-text available
This paper describes our submission to SemEval-2019 Task 7: RumourEval: Determining Rumor Veracity and Support for Rumors. We participated in both subtasks. The goal of subtask A is to classify the type of interaction between a rumorous social media post and a reply post as support, query, deny, or comment. The goal of subtask B is to predict the v...
Conference Paper
Full-text available
This paper addresses the problem of extracting and segmenting references from PDF documents. The novelty of the presented approach lies in its capability to discover highly varying references mainly in terms of content, length and location in the document. Unlike existing works, the proposed method does not follow the classical pipeline that consis...
Preprint
Full-text available
This paper addresses the problem of extracting and segmenting references from PDF documents. The novelty of the presented approach lies in its capability to discover highly varying references mainly in terms of content, length and location in the document. Unlike existing works, the proposed method does not follow the classical pipeline that consis...
Chapter
In 2013 property paths were introduced with the release of SPARQL 1.1. These property paths allow for describing complex queries in a more concise and comprehensive way. The W3C introduced a formal specification of the semantics of property paths, to which implementations should adhere. Most commonly used RDF stores claim to support property paths....
Chapter
Formal concept analysis (FCA) derives a hierarchy of concepts in a formal context that relates objects with attributes. This approach is very well aligned with the traditions of Frege, Saussure and Peirce, which relate a signifier (e.g. a word/an attribute) to a mental concept evoked by this word and meant to refer to a specific object in the real...
Preprint
Full-text available
Topic modelling is a popular approach for clustering text documents. A variety of different types of regularization is implemented in topic modelling. In this paper we propose a novel approach for analyzing the influence of different regularization types on results of topic modelling. Based on Renyi entropy, this approach is inspired by the concept...
Preprint
Full-text available
This paper describes our submission to SemEval-2019 Task 7: RumourEval: Determining Rumor Veracity and Support for Rumors. We participated in both subtasks. The goal of subtask A is to classify the type of interaction between a rumorous social media post and a reply post as support, query, deny, or comment. The goal of subtask B is to predict the v...
Conference Paper
Text predictions play an important role in improving the performance of gaze-based text entry systems. However, visual search, scanning, and selection of text predictions require a shift in the user's attention from the keyboard layout. Hence the spatial positioning of predictions becomes an imperative aspect of the end-user experience. In this wor...
Article
Full-text available
In this paper we apply multifractal formalism to the analysis of statistical behaviour of topic models under condition of varying number of topics. Our analysis reveals the existence of two self-similar regions and one transition region in the function of density-of-states depending on the number of topics. As earlier a function that can be express...
Preprint
Full-text available
Taxonomies are semantic hierarchies of concepts. One limitation of current taxonomy learning systems is that they define concepts as single words. This position paper argues that contextualized word representations, which recently achieved state-of-the-art results on many competitive NLP tasks, are a promising method to address this limitation. We...
Article
Full-text available
Graph-based data models allow for flexible data representation. In particular, semantic data based on RDF and OWL fuels use cases ranging from general knowledge graphs to domain specific knowledge in various technological or scientific domains. The flexibility of such approaches, however, makes programming with semantic data tedious and error-prone...
Preprint
Full-text available
Decision-Making Software (D-MS) may exhibit biases against people on grounds of protected characteristics such as gender and ethnicity. Such undesirable behavior should not only be detected but also explained. To avoid complicated explanations and expensive fixes, fairness awareness has to be proactively embedded in the design phase of t...
Article
In distributed RDF stores triples are assigned to one or several storage and compute nodes. In order to perform query planning and optimization, statistical information about the occurrences of IRIs and literals on the individual storage and compute nodes is needed. In this paper, we present our novel compressed storage format for statistical inf...
Chapter
Full-text available
Current trends, like digital transformation and ubiquitous computing, yield a massive increase in available data and information. In artificial intelligence (AI) systems, the capacity of knowledge bases is limited due to the computational complexity of many inference algorithms. Consequently, continuously sampling information and unfiltered storing in kno...