Wolfgang Nejdl

Forschungszentrum L3S

About

589
Publications
78,659
Reads
16,123
Citations


Publications (589)
Preprint
Full-text available
Social media deliberations make it possible to explore refugee-related issues. AI-based studies have investigated refugee issues mostly around a specific event and considered unimodal approaches. In contrast, we employ a multimodal architecture for probing the refugee journeys from their home to host nations. We draw insights from Arnold van Gennep's a...
Preprint
Full-text available
Entity Resolution suffers from quadratic time complexity. To increase its time efficiency, three kinds of filtering techniques are typically used for restricting its search space: (i) blocking workflows, which group together entity profiles with identical or similar signatures, (ii) string similarity join algorithms, which quickly detect entities m...
Chapter
In order to address the requirements of different user groups and use cases of web archives, we have identified three views to access and explore web archives: user-, data- and graph-centric. The user-centric view is the natural way to look at the archived pages in a browser, just like the live web is consumed. By zooming out from there and looking...
Preprint
Networks are ubiquitous in the real world. Link prediction, as one of the key problems for network-structured data, aims to predict whether there exists a link between two nodes. The traditional approaches are based on the explicit similarity computation between the compact node representation by embedding each node into a low-dimensional space. In...
Preprint
Full-text available
Graph Neural Networks (GNNs), which generalize traditional deep neural networks to graph data, have achieved state-of-the-art performance on several graph analytical tasks like node classification, link prediction or graph classification. We focus on how trained GNN models could leak information about the member nodes that they were trained...
Article
Full-text available
Web archiving is the process of collecting portions of the Web to ensure that the information is preserved for future exploitation. However, despite the increasing number of web archives worldwide, the absence of efficient and meaningful exploration methods still remains a major hurdle in the way of turning them into a usable and useful information...
Article
Full-text available
Artificial Intelligence (AI)‐based systems are widely employed nowadays to make decisions that have far‐reaching impact on individuals and society. Their decisions might affect everyone, everywhere, and anytime, entailing concerns about potential human rights issues. Therefore, it is necessary to move beyond traditional AI algorithms optimized for...
Chapter
We propose a novel approach for learning node representations in directed graphs, which maintains separate views or embedding spaces for the two distinct node roles induced by the directionality of the edges. We argue that the previous approaches either fail to encode the edge directionality or their encodings cannot be generalized across tasks. Wi...
Article
Multi-view clustering has received increasing attention in many applications, where different views of objects can provide complementary information to each other. Existing approaches to multi-view clustering mainly focus on extending Non-negative Matrix Factorization (NMF) by enforcing the constraint over the coefficient matrices from different...
Preprint
Full-text available
AI-based systems are widely employed nowadays to make decisions that have far-reaching impacts on individuals and society. Their decisions might affect everyone, everywhere and anytime, entailing concerns about potential human rights issues. Therefore, it is necessary to move beyond traditional AI algorithms optimized for predictive performance and...
Preprint
Curated web archive collections contain focused digital content which is collected by archiving organizations, groups and individuals to provide a representative sample covering specific topics and events to preserve them for future exploration and analysis. In this paper, we discuss how to best support collaborative construction and exploration...
Conference Paper
We present an efficient graph-based method for filtering tweets relevant to a given breaking news story from large tweet streams. Unlike existing models that require manual effort or strong supervision, or are not scalable, our method can automatically and effectively filter incoming relevant tweets starting from just a small number of past relevant...
Chapter
Twitter has been heavily used by users to report and share information about real-world events. However, understanding the multiple aspects of an event as it happens is a very challenging task due to the prevalent noise and redundancy in tweets as well as the evolution of the event. In this paper, we present a graph-based method for summarizing evo...
Conference Paper
Full-text available
Today algorithmic decision-making (ADM) is prevalent in several fields including medicine, the criminal justice system, financial markets etc. On the one hand, this is testament to the ever improving performance and capabilities of complex machine learning models. On the other hand, the increased complexity has resulted in a lack of transparency an...
Preprint
Archived collections of documents (like newspaper and web archives) serve as important information sources in a variety of disciplines, including Digital Humanities, Historical Science, and Journalism. However, the absence of efficient and meaningful exploration methods still remains a major hurdle in the way of turning them into usable sources of...
Preprint
Archived collections of documents (like newspaper archives) serve as important information sources for historians, journalists, sociologists and other interested parties. Semantic Layers over such digital archives allow describing and publishing metadata and semantic information about the archived documents in a standard format (RDF), which in turn...
Preprint
Full-text available
We propose a novel approach for learning node representations in directed graphs, which maintains separate views or embedding spaces for the two distinct node roles induced by the directionality of the edges. In order to achieve this, we propose an alternating random walk strategy to generate training samples from the directed graph while preservin...
Conference Paper
Topic detection and tracking in document streams is a critical task in many important applications, and hence has attracted research interest in recent decades. With the large size of data streams, there have been a number of works from different approaches that propose automatic methods for the task. However, there are only a few small benchmark...
Preprint
Full-text available
Measuring entity relatedness is a fundamental task for many natural language processing and information retrieval applications. Prior work often studies entity relatedness in static settings and in an unsupervised manner. However, entities in the real world are often involved in many different relationships; consequently, entity relations are very dynamic...
Conference Paper
Full-text available
Measuring entity relatedness is a fundamental task for many natural language processing and information retrieval applications. Prior work often studies entity relatedness in static settings and in an unsupervised manner. However, entities in the real world are often involved in many different relationships; consequently, entity relations are very dynamic...
Conference Paper
Full-text available
In this demo paper, we introduce LogCanvas, a platform for user search history visualization. Different from the existing visualization tools, LogCanvas focuses on helping users re-construct the semantic relationship among their search activities. LogCanvas segments a user's search history into different sessions and generates a knowledge graph to...
Conference Paper
Archived collections of documents (like newspaper and web archives) serve as important information sources in a variety of disciplines, including Digital Humanities, Historical Science, and Journalism. However, the absence of efficient and meaningful exploration methods still remains a major hurdle in the way of turning them into usable sources of...
Article
Entity aspect recommendation is an emerging task in semantic search that helps users discover serendipitous and prominent information with respect to an entity, of which salience (e.g., popularity) is the most important factor in previous work. However, entity aspects are temporally dynamic and often driven by events happening over time. For such c...
Conference Paper
Full-text available
Entity aspect recommendation is an emerging task in semantic search that helps users discover serendipitous and prominent information with respect to an entity, of which salience (e.g., popularity) is the only important factor in previous work. However, entity aspects are temporally dynamic and often driven by ongoing events. For such cases, aspe...
Article
Full-text available
Curated web archive collections contain focused digital content which is collected by archiving organizations, groups, and individuals to provide a representative sample covering specific topics and events to preserve them for future exploration and analysis. In this paper, we discuss how to best support collaborative construction and exploration o...
Chapter
The availability of new massive datasets about traffic, coming from Smart Sensor Networks composed of Vehicles, Mobile Phones, and other GPS-equipped devices, is enabling the development of novel Intelligent Applications for Mobility. Among these, a hot and recent research topic is to discover vehicular traffic patterns from these datasets, to prov...
Conference Paper
Temporally annotated corpora about historic events can be crucial to digital humanities research: they make it possible to extract and date events as well as reactions to them, and to construct timelines of events and of language use, among other applications. However, producing a precise corpus of a particular event in history is very challenging due to the...
Conference Paper
Web archives have been instrumental in the digital preservation of the Web and provide a great opportunity for the study of the societal past and evolution. These Web archives are massive collections, typically in the order of terabytes and petabytes. Due to this, search and exploration of archives has been limited as full-text indexing is both resource...
Article
Full-text available
Social networks are becoming a valuable source of information for applications in many domains. In particular, many studies have highlighted the potential of social networks for early detection of epidemic outbreaks, due to their capability to transmit information faster than traditional channels, thus leading to quicker reactions of public health...
Article
Full-text available
Social media services such as Twitter are a valuable source of information for decision support systems. Many studies have shown that this also holds for the medical domain, where Twitter is considered a viable tool for public health officials to sift through relevant information for the early detection, management, and control of epidemic outbreak...
Conference Paper
An important editing policy in Wikipedia is to provide citations for added statements in Wikipedia pages, where statements can be arbitrary pieces of text, ranging from a sentence to a paragraph. In many cases citations are either outdated or missing altogether. In this work we address the problem of finding and updating news citations for statemen...
Conference Paper
Full-text available
Significant parts of cultural heritage have been produced on the web during the last decades. While easy accessibility to the current web is a good baseline, optimal access to the past web faces several challenges. These include dealing with large-scale web archive collections and a lack of usage logs that contain implicit human feedback most relevant f...
Conference Paper
Curated web archive collections contain focused digital contents which are collected by archiving organizations to provide a representative sample covering specific topics and events to preserve them for future exploration and analysis. In this paper, we discuss how to best support collaborative construction and exploration of these collections thr...
Conference Paper
Web archives are large longitudinal collections that store webpages from the past, which might be missing on the current live Web. Consequently, temporal search over such collections is essential for finding prominent missing webpages and tasks like historical analysis. However, this has been challenging due to the lack of popularity information an...
Conference Paper
Archives are an important source of study for various scholars. Digitization and the web have made archives more accessible and led to the development of several time-aware exploratory search systems. However, these systems have been designed for more general users rather than scholars. Scholars have more complex information needs in comparison to g...
Conference Paper
The Web has been around and maturing for 25 years. The popular websites of today have undergone vast changes during this period, with a few being there almost since the beginning and many new ones becoming popular over the years. This makes it worthwhile to take a look at how these sites have evolved and what they might tell us about the future of...
Conference Paper
Recent advances in preservation technologies have led to an increasing number of Web archive systems and collections. These collections are valuable to explore the past of the Web, but their value can only be uncovered with effective access and exploration mechanisms. Ideal search and ranking methods must be robust to the high redundancy and the te...
Conference Paper
Search engines are the most utilized tools to access information on the Web. The success of large companies such as Google is owed to their capacity to guide users through the vast troves of knowledge and information online. Recently, the concept of search as research has been used to shift the research focus from workings of information-seeking too...
Article
Recounts the career and contributions of Martin Wolpers.
Conference Paper
Full-text available
Longitudinal corpora like newspaper archives are of immense value to historical research, and time as an important factor for historians strongly influences their search behaviour in these archives. While searching for articles published over time, a key preference is to retrieve documents which cover the important aspects from important points in...
Conference Paper
The recent availability of large amounts of mobility data has fostered many research efforts to improve mobility prediction. Many of these studies focus on learning the impact of influencing factors on traffic, such as rush hour or accidents. Nevertheless, only very few have investigated the impact of Planned Special Events (PSEs), such as con...
Conference Paper
Emotion is fundamental to human experience and impacts our daily activities and decision-making processes where, e.g., the affective state of a user influences whether or not she decides to consume a recommended item - movie, book, product or service. However, information retrieval and recommendation tasks have largely ignored emotion as a source o...
Article
Full-text available
The World Wide Web is well established as a global information and communication medium. New technologies regularly come along which expand the forms of use and permit even inexperienced users to publish content or take part in discussions. For this reason the Web can also be seen as a good documenter of present-day society. The dynamism of the Web...
Conference Paper
In many cases, a user turns to search engines to find information about real-world situations, namely, political elections, sport competitions, or natural disasters. Such temporal querying behavior can be observed through a significant number of event-related queries generated in web search. In this paper, we study the task of detecting event-relat...
Conference Paper
Full-text available
With nearly all types of social, cultural, societal and everyday processes of our lives reflected on the web, web archives from organizations such as the Internet Archive have the potential of becoming huge gold mines for temporal content analytics of many kinds (e.g., on politics, social issues, economics or media). First-hand evidence for...
Article
Full-text available
More than 45% of the pages that we visit on the Web are pages that we have visited before. Browsers support revisits with various tools, including bookmarks, history views and URL auto-completion. However, these tools only support revisits to a small number of frequently and recently visited pages. Several browser plugins and extensions have been...
Article
Full-text available
The World Wide Web is well established as a global information and communication medium. New technologies regularly come along which expand the forms of use and permit even inexperienced users to publish content or take part in discussions. For this reason the Web can also be seen as a good documenter of present-day society. The dynamism of the Web...
Article
Each year makes it easier to accumulate large numbers of photos and videos in the social and personal digital space. Their long-term existence is mostly driven by chance rather than by clear guidelines or rules for archiving them. Thus, unfortunately, cases of both unintended exposure and unintended disappearance of personal photos happen much too often....
Conference Paper
Multi-matrix factorization models provide a scalable and effective approach for multi-relational learning tasks such as link prediction, Linked Open Data (LOD) mining, recommender systems and social network analysis. Such models are learned by optimizing the sum of the losses on all relations in the data. Early models address the problem where ther...
Article
The recent availability of datasets on transportation networks with higher spatial and temporal resolution is enabling new research activities in the fields of Territorial Intelligence and Smart Cities. Among these, many research efforts are aimed at predicting traffic congestion to alleviate its negative effects on society, mainly by learning r...
Article
Full-text available
Heterogeneous content is an inherent problem for cross-system search, recommendation and personalization. In this paper we investigate differences in topic coverage and the impact of topics in different kinds of Web services. We use entity extraction and categorization to create fingerprints that allow for meaningful comparison. As a basis taxonomy...
Conference Paper
Full-text available
The problem of near-duplicate detection consists in finding those elements within a data set which are closest to a new input element, according to a given distance function and a given closeness threshold. Solving this problem for high-dimensional data sets is computationally expensive, since the amount of computation required to assess the simila...
Article
Full-text available
Entity Resolution is an inherently quadratic task that typically scales to large data collections through blocking. In the context of highly heterogeneous information spaces, blocking methods rely on redundancy in order to ensure high effectiveness at the cost of lower efficiency (i.e., more comparisons). This effect is partially ameliorated by co...
Chapter
In this chapter we describe a design of compensations using forward recovery within Web service transactions. We introduce an approach to model compensation capabilities and requirements using feature models, which are the basis for defining compensation rules. These rules can be executed in a Web service environment that we extend with the concept...
Article
Out of thousands of names to choose from, picking the right one for your child is a daunting task. In this work, our objective is to help parents make an informed decision while choosing a name for their baby. We follow a recommender system approach and combine, in an ensemble, the individual rankings produced by simple collaborative filtering al...
Article
Social tags are known to be a valuable source of information for image retrieval and organization. However, in contrast to conventional document retrieval, rich tag frequency information in social sharing systems, such as Flickr, is not available, thus we cannot directly use the tag frequency (analogous to the term frequency in a document) to repr...
Article
Full-text available
An analysis of the social video sharing platform YouTube and the news aggregator Yahoo! News reveals the presence of vast amounts of community feedback through comments for published videos and news stories, as well as through meta ratings for these comments. This paper presents an in-depth study of commenting and comment rating behavior on a sampl...
Article
Full-text available
Crowdsourcing has become ubiquitous in machine learning as a cost-effective method to gather training labels. In this paper we examine the challenges that appear when employing crowdsourcing for active learning, in an integrated environment where an automatic method and human labelers work together towards improving their performance at a certain t...
Conference Paper
Full-text available
The increasing adoption of Linked Data principles has led to an abundance of datasets on the Web. However, take-up and reuse is hindered by the lack of descriptive information about the nature of the data, such as their topic coverage, dynamics or evolution. To address this issue, we propose an approach for creating linked dataset profiles. A profi...