
Wolf-Tilo Balke
Technische Universität Braunschweig · Institut für Informationssysteme
Professor
About
Publications: 254
Reads: 24,013
Citations: 3,021
Additional affiliations
January 2009 - August 2020
Forschungszentrum L3S
Position
- Managing Director
April 2008 - August 2020
January 2005 - March 2008
Education
January 1998 - March 2001
November 1991 - December 1997
Publications (254)
Recently the usage of narratives as a means of fusing information from large knowledge graphs (KGs) into a coherent line of argumentation has been proposed. Narratives are especially useful in event-centric knowledge graphs in that they provide a means to categorize real-world events by well-known narrations. However, specifically for controversial...
The amount of information in digital libraries (DLs) has been experiencing rapid growth. With the intense competition for research breakthroughs, researchers often intentionally or unintentionally fail to adhere to scientific standards, leading to the retraction of scientific articles. When a paper gets retracted, all its citing articles have to be...
Digital libraries oftentimes provide access to historical newspaper archives via keyword-based search. Historical figures and their roles are particularly interesting cognitive access points in historical research. Structuring and clustering news articles would allow more sophisticated access for users to explore such information. However, real-wor...
Background: Healthcare providers have to make ethically complex clinical decisions which may be a source of stress. Researchers have recently introduced Artificial Intelligence (AI)-based applications to assist in clinical ethical decision-making. However, the use of such tools is controversial. This review aims to provide a comprehensive overview o...
Information extraction can support novel and effective access paths for digital libraries. Nevertheless, designing reliable extraction workflows can be cost-intensive in practice. On the one hand, suitable extraction methods rely on domain-specific training data. On the other hand, unsupervised and open extraction methods usually produce not-canoni...
Finding relevant publications in the scientific domain can be quite tedious: accessing large-scale document collections often means formulating an initial keyword-based query, followed by many refinements, to retrieve a sufficiently complete yet manageable set of documents to satisfy one’s information need. Since keyword-based search limits researc...
Providing effective access paths to content is a key task in digital libraries. Oftentimes, such access paths are realized through advanced query languages, which, on the one hand, users may find challenging to learn or use and, on the other, require libraries to convert their content into a high-quality structured representation. As a remedy, nar...
Designing keyword-based access paths is a common practice in digital libraries. They are easy to use and accepted by users and come with moderate costs for content providers. However, users usually have to break down the search into pieces if they search for stories of interest that are more complex than searching for a few keywords. After searchin...
Our lives are ruled by events of varying importance, ranging from simple everyday occurrences to incidents of societal dimension. A lot of effort goes into exchanging information about and discussing such events: generally speaking, stringent narratives are formed to reduce complexity. But when considering complex events like the current conflict b...
Knowledge bases allow effective access paths in digital libraries. Here users can specify their information need as graph patterns for precise searches and structured overviews (by allowing variables in queries). But especially when considering textual sources that contain narrative information, i.e., short stories of interest, harvesting statement...
Knowledge graph embeddings, which generate vector space representations of knowledge graph triples, have gained considerable popularity in recent years. Several embedding models have been proposed that achieve state-of-the-art performance for the task of triple completion in knowledge graphs. Relying on the presumed semantic capabilities of the learned...
State-of-the-art approaches in the field of neural embedding models (NEMs) enable progress in the automatic extraction and prediction of semantic relations between important entities like active substances, diseases, and genes. In particular, the prediction property is making them valuable for important research-related tasks such as hypothesis gen...
Providing a plethora of entity-centric information, knowledge graphs have become a vital building block for a variety of intelligent applications. Indeed, modern knowledge graphs like Wikidata already capture several billion RDF triples, yet they still lack good coverage for most relations. On the other hand, recent developments in NLP resear...
An important part of the scientific discourse is the exchange of knowledge in the form of stringent, well-arranged, and interconnected arguments. These ‘scientific storylines’ make it possible to put central entities, observations, experiments, etc. into perspective and thus ease the understanding of underlying mechanisms, dependencies, or theories. Moreover,...
State-of-the-art approaches in the field of neural-embedding models (NEMs) enable progress in the automatic extraction and prediction of semantic relations between important entities like active substances, diseases, and genes. In particular, the prediction property is making them valuable for important research-related tasks such as hypothesis gen...
Currently, a trend to augment document collections with entity-centric knowledge provided by knowledge graphs is clearly visible, especially in scientific digital libraries. Entity facts are either manually curated, or for higher scalability automatically harvested from large volumes of text documents. The often claimed benefit is that a collection...
Word embeddings enable state-of-the-art NLP workflows in important tasks including semantic similarity matching, NER, question answering, and document classification. Recently, the biomedical field has also started to use word embeddings to provide new access paths for a better understanding of pharmaceutical entities and their relationships, as well as...
State-of-the-art neural language models (NLMs) such as Word2Vec are becoming increasingly successful for important biomedical tasks such as the literature-based prediction of complex chemical properties or for finding novel drug-disease associations (DDAs). However, NLMs have the disadvantage of being hard to interpret. Therefore, it is notoriously...
Knowledge graphs have become an essential source of entity-centric information for modern applications. Today’s KGs have reached a size of billions of RDF triples extracted from a variety of sources, including structured sources and text. While this definitely improves completeness, the inherent variety of sources leads to severe heterogeneity, neg...
State-of-the-art approaches in the field of neural-language models (NLMs) enable progress in the automatic extraction and prediction of semantic relations between important entities like active substances, diseases, and genes. In particular, the prediction property is making them valuable for important research-related tasks such as hypothesis gene...
Research into COVID-19 is a big challenge and highly relevant at the moment. New tools are required to assist medical experts in their research with relevant and valuable information. The COVID-19 Open Research Dataset Challenge (CORD-19) is a "call to action" for computer scientists to develop these innovative tools. Many of these applications are...
Every day, large quantities of spatio-temporal data are captured, whether by Web-based companies for social data mining or by other industries for a variety of applications ranging from disaster relief to marine data analysis. Making sense of all this data dramatically increases the need for intelligent backend systems to provide real-time query resp...
This paper presents a formalization and extension of a novel approach to support high-quality content in digital libraries. Building on the concept of plausibility used in cognitive sciences, we aim at judging the plausibility of new scientific papers in light of prior knowledge. In particular, our work proposes a novel assessment of scientific pap...
Time series analysis is a technique widely employed in space science. In unpredictable environments like space, scientific analysis relies on large data sets to enable interpretation of observations. Artificial signal interferences caused by the spacecraft itself further impede this process. The most time-consuming part of these studies is the effi...
Summary: The Universitätsbibliothek Braunschweig (university library) and the Institut für Informationssysteme (IfIS) of Technische Universität Braunschweig cooperate in developing innovative access paths to information resources for the field of pharmacy. Building on fundamental research in information science, prototypes are being devel...
Entity-centric information resources in the form of huge RDF knowledge graphs have become an important part of today’s information systems. But while the integration of independent sources promises rich information, their inherent heterogeneity also poses threats to the overall usefulness. To some degree challenges of heterogeneity have been addres...
As more papers are included in digital collections, satisfying information needs is becoming harder, in particular when the user searches for information beyond bibliographic metadata. The situation is even worse when the information need requires a key aspect of a paper that first needs to be annotated for indexing purposes and thus, allow searchi...
The exponential growth of publications in medical digital libraries requires new access paths that go beyond term-based searches, as these increasingly lead to thousands of results. An effective approach for this problem is to extract important pharmaceutical entities and their relations to each other in order to reveal the embedded knowledge in di...
We consider the novel problem of learning to rank claim-evidence pairs to ease the task of scientific argumentation. Researchers face daily scientific argumentation when writing research papers or project proposals. Once confronted with a sentence that requires a citation, they struggle to find the manuscript that can support it. In this work, we c...
The exponential increase of scientific publications in the bio-medical field challenges access to scientific information, which primarily is encoded by semantic relationships between medical entities, such as active ingredients, diseases, or genes. Neural language models, such as Word2Vec, offer new ways of automatically learning semantically meani...
The exponential increase of scientific publications in the medical field urgently calls for innovative access paths beyond the limits of a term-based search. As an example, the search term “diabetes” leads to a result of over 600,000 publications in the medical digital library PubMed. In such cases, the automatic extraction of semantic relations be...
With the increasing amount of user-generated content such as scientific blogs, question-answering archives (Quora or Stack Overflow), and Wikipedia, the challenge of evaluating quality naturally arises. Previous work has shown the potential to automatically evaluate such content, focusing on syntactic and pragmatic levels such as conciseness, organ...
Graph database query languages feature expressive, yet computationally expensive pattern matching capabilities. Answering optional query clauses in SPARQL, for instance, renders the query evaluation problem immediately PSPACE-complete. Therefore, light-weight graph pattern matching relations, such as simulation, have recently been investigated as pro...
In this paper, we promote the idea of automatic semantic characterization of scientific claims to explore entity-entity relationships in digital collections. Our proposed approach aims at alleviating time-consuming analysis of query results when the information need is not just one document but an overview of a set of documents. With the semantic...
In contrast to heavy-handed ER-style data models in relational databases, knowledge graphs (or graph databases) capture entity semantics in terms of entity relationships and properties following a simple collect-as-you-go model. While this allows for a more flexible and dynamically adaptable knowledge representation, it comes at the price of more c...
The benefits of crowdsourcing for data science have furthered its widespread use over the past decade. Yet fraudulent workers undermine the emerging crowdsourcing economy: requestors face the choice of either risking low-quality results or having to pay extra money for quality safeguards such as gold questions or majority voting. Obviously, the...
Crowdsourcing's ability to forge new digital, and thus not location-bound, job opportunities spurred many visions of crowdsourcing's social impact as an answer to failing economies and recessions, especially in developing countries. Yet, did the digital solution take the business world by storm and redefine the classical business process? Did it in...
Today’s growth of linked open data (LOD) sources calls for summarization systems to help users to navigate through large volumes of data. A major task is entity summarization, where a most meaningful subset of all available information about entities has to be selected. In particular, the selected information has to characterize each entity with hi...
This paper introduces the novel problem of ‘claim-based queries’ and how digital libraries can be enabled to solve it. Claim-based queries need the identification of a key aspect of research papers: claims. Today, claims are hidden in their unstructured, free-text representation within research documents. In this work, a claim is a sentence that cons...
Understanding the possible associations between two entities from a query is a hard problem. For instance, querying “coffee” and “cancer” even in a curated Digital Library is a challenge to the retrieval system that struggles to figure out the intention of the query. Maybe the user wants a consensus of what is known? But how many different assoc...
Alternative access paths to literature beyond mere keyword or bibliographic search are a major success factor in today’s digital libraries. Especially in the sciences, users are in dire need of complex knowledge spaces and facettations where entities such as chemical substances, genes, or mathematical formulae may play a central role. However, e...
Querying graph databases often amounts to some form of graph pattern matching. Finding (sub-)graphs isomorphic to a given graph pattern is common to many graph query languages, even though graph isomorphism often is too strict, since it requires a one-to-one correspondence between the nodes of the pattern and those of a match. We investigate the inf...
Presented herein is a novel approach to support high-quality content in Digital Libraries by introducing the notion of Plausibility of new scientific papers when contrasted with prior knowledge. In particular, our work proposes a novel assessment of scientific papers to support the workload of reviewers. The proposed approach focuses on a core compon...
Crowdsourcing has been gaining increasing popularity as a highly distributed digital solution that surpasses both borders and time zones. Moreover, it extends economic opportunities to developing countries, thus answering the call of impact sourcing to improve the welfare of poor labor in need. Nevertheless, it is constantly criticized for the...
The Fachinformationsdienst (FID) Pharmazie, a specialized information service for pharmacy, aims to sustainably improve the information infrastructure and the supply of literature for academic pharmaceutical research. The project has been funded by the Deutsche Forschungsgemeinschaft (German Research Foundation) since 1 January 2015. A special feature is the cooperation between the university li...
Not only do the highly-distributed digital crowdsourcing solutions surpass both borders and time-zones, but they materialize the vision of impact sourcing, by tapping into new labor markets in developing countries. Unfortunately, crowdsourcing is associated with severe quality issues. To that end, many countermeasures have been designed to detect s...
Today, crowdsourcing has emerged as a promising paradigm for annotating, structuring, and managing Web data. Still, as long as the problem of the crowd workers’ trustworthiness in terms of result quality is not essentially solved, all these efforts remain doubtful. Therefore, in this paper we look at today’s dominant quality assurance techniques an...
The high abundance of genetic information enables researchers to gain new insights from the comparison of human genes according to their similarities. However, existing tools that allow the exploration of such gene-to-gene relationships apply each similarity independently. To make use of multidimensional scoring, we developed a new search engine n...
Today Linked Open Data is a central trend in information provisioning. Data is collected in distributed data stores, individually curated with high quality, and made available over the Web for a wide variety of Web applications providing their own business logic for data utilization. Thus, the key promise of Linked Open Data is to provide a holisti...
Efforts to make highly specialized knowledge accessible through scientific digital libraries need to go beyond mere bibliographic metadata, since here information search is mostly entity-centric. Previous work has realized this trend and developed different methods to recognize and (to some degree even automatically) annotate several important type...
The most important goal for digital libraries is to ensure a high-quality search experience for all kinds of users. To attain this goal, it is necessary to have as much relevant metadata as possible at hand to assess the quality of publications. Recently, a new group of metrics has appeared that has the potential to raise the quality of publication meta...
Entity-centric search has become a demanding problem for many domains on the Web. In particular, the suitable contextualization of result documents poses challenges in terms of selecting the most adequate indexing terms for later retrieval. This holds even more if no generally recognized ontologies for the respective domain are available. In this pape...
Narrative interfaces promise to improve the user experience of interacting with information systems by adapting a powerful communication concept which comes naturally in human interaction. In this paper, we outline how such a narrative information system, which answers queries using elaborate stories, can be realized. We show how to construct query-dep...
In recent years, crowd sourcing has emerged as a good solution for digitizing voluminous tasks. What’s more, it offers a social solution promising to extend economic opportunities to low-income countries, improving the welfare of poor, honest and yet uneducated labor. On the other hand, crowd sourcing’s virtual nature and anonymity encourage fra...
State-of-the-art faceted search graphical user interfaces for digital libraries provide a wide range of filters perfectly suitable for narrowing down results for well-defined user needs. However, they fail to deliver summarized overview information for users that need to familiarize themselves with a new scientific topic. In fact, exploratory searc...
This infographic gives statistical insights into the corpus used in the ICADL Paper "Large-Scale Experiments for Mathematical Document Classification" (2013)
Google’s Knowledge Graph offers structured summaries for entity searches. This provides a better user experience by focusing on the main aspects of the query entity only. But to do this Google relies on curated knowledge bases. In consequence, only entities included in such knowledge bases can benefit from such a feature. In this paper, we propose...
Living the economic dream of globalization in the form of a location- and time-independent world-wide employment market, today crowd sourcing companies offer affordable digital solutions to business problems. At the same time, highly accessible economic opportunities are offered to workers, who often live in low or middle income countries. Thus, cr...
In recent years, crowdsourcing has become a powerful tool to bring human intelligence into information processing. This is especially important for Web data, which, in contrast to well-maintained databases, is almost always incomplete and may be distributed over a variety of sources. Crowdsourcing makes it possible to tackle many problems which are not yet attain...
Building databases and information systems over data extracted from heterogeneous sources like the Web poses a severe challenge: most data is incomplete and thus difficult to process in structured queries. This is especially true for sophisticated query techniques like Top-k querying where rankings are aggregated over several sources. The intellige...
To extend the scope of retrieval and reasoning spanning several linked data stores, it is necessary to find out whether information in different collections actually points to the same real world object. Thus, data stores are interlinked through owl:sameAs relations. Unfortunately, this cross-linkage is not as extensive as one would hope. To remedy...
Crowdsourcing continues to gain more momentum as its potential becomes more recognized. Nevertheless, the associated quality aspect remains a valid concern, which introduces uncertainty in the results obtained from the crowd. We identify the different aspects that dynamically affect the overall quality of a crowdsourcing task. Accordingly, we propo...
Backed by major Web players, schema.org is the latest broad initiative for structuring Web information. Unfortunately, a representative analysis on a corpus of 733 million Web documents shows that, a year after its introduction, only 1.56% of documents featured any schema.org annotations. A probable reason is that providing annotations is quite t...
The Web has become the primary source of information containing both structured and unstructured information. A good example is e-commerce where products are usually described by technical specifications (structured data) and textual user reviews (unstructured data). Both sources of information complement each other, covering quantifiable as well as pe...
Companies today are expected to engage in corporate social responsibility (CSR) and they spend a lot of time, money, and other resources on these tasks. However, in relation to their investment, the gain for most companies is marginal, because their efforts are only perceived by a small number of people. In this paper, our goal is to improve on...