Lora Aroyo

Vrije Universiteit Amsterdam | VU · Department of Computer Science

associate professor

About

271 Publications
43,039 Reads
4,887 Citations
Additional affiliations
September 2006 - present
Vrije Universiteit Amsterdam
Position
  • Professor (Associate)
September 2001 - August 2006
Eindhoven University of Technology
Position
  • Professor (Assistant)
September 2001 - September 2003
University of Twente
Position
  • Research Associate

Publications

Article
The rapid entry of machine learning approaches in our daily activities and high-stakes domains demands transparency and scrutiny of their fairness and reliability. To help gauge machine learning models' robustness, research typically focuses on the massive datasets used for their deployment, e.g., creating and maintaining documentation for understa...
Preprint
Full-text available
The rapid entry of machine learning approaches in our daily activities and high-stakes domains demands transparency and scrutiny of their fairness and reliability. To help gauge machine learning models' robustness, research typically focuses on the massive datasets used for their deployment, e.g., creating and maintaining documentation for understa...
Preprint
Full-text available
Many questions that we ask about the world do not have a single clear answer, yet typical human annotation set-ups in machine learning assume there must be a single ground truth label for all examples in every task. The divergence between reality and practice is stark, especially in cases with inherent ambiguity and where the range of different sub...
Preprint
Full-text available
Conversational AI systems exhibit a level of human-like behavior that promises to have profound impacts on many aspects of daily life -- how people access information, create content, and seek social support. Yet these models have also shown a propensity for biases, offensive language, and conveying false information. Consequently, understanding an...
Preprint
Full-text available
Machine learning approaches often require training and evaluation datasets with a clear separation between positive and negative examples. This risks simplifying and even obscuring the inherent subjectivity present in many tasks. Preserving such variance in content and diversity in datasets is often expensive and laborious. This is especially troub...
Preprint
Full-text available
The generative AI revolution in recent years has been spurred by an expansion in compute power and data quantity, which together enable extensive pre-training of powerful text-to-image (T2I) models. With their greater capabilities to generate realistic and creative content, these T2I models like DALL-E, MidJourney, Imagen or Stable Diffusion are re...
Preprint
Full-text available
In this paper, we present findings from a semi-experimental exploration of rater diversity and its influence on safety annotations of conversations generated by humans talking to a generative AI chatbot. We find significant differences in judgments produced by raters from different geographic regions and annotation platforms, and correlate these...
Preprint
Full-text available
Machine learning (ML) research has generally focused on models, while the most prominent datasets have been employed for everyday ML tasks without regard for the breadth, difficulty, and faithfulness of these datasets to the underlying problem. Neglecting the fundamental importance of datasets has caused major problems involving data cascades in re...
Article
This forum provides a space to engage with the challenges of designing for intelligent algorithmic experiences. We invite articles that tackle the tensions between research and practice when integrating AI and UX design. We welcome interdisciplinary debate, artful critique, forward-looking research, case studies of AI in practice, and speculative d...
Article
Full-text available
Successful knowledge graphs (KGs) solved the historical knowledge acquisition bottleneck by supplanting the previous expert focus with a simple, crowd-friendly one: KG nodes represent popular people, places, organizations, etc., and the graph arcs represent common sense relations like affiliations, locations, etc. Techniques for more general, categ...
Preprint
Full-text available
The standard method of evaluating an ML model uses random (uniform or stratified) sampling to form the test sets. This approach to test set design is the basis for determining the state-of-the-art in publications and leaderboards, and has served the community well through periods of early growth and development. However, as these models are...
Preprint
Full-text available
The efficacy of machine learning (ML) models depends on both algorithms and data. Training data defines what we want our models to learn, and testing data provides the means by which their empirical progress is measured. Benchmark datasets define the entire world within which models exist and operate, yet research continues to focus on critiquing a...
Article
Successful knowledge graphs (KGs) solved the historical knowledge acquisition bottleneck by supplanting an expert focus with a simple, crowd-friendly one: KG nodes represent popular people, places, organizations, etc., and the graph arcs represent common sense relations like affiliations, locations, etc. Techniques for more general, categorical, KG...
Preprint
Full-text available
We present a new approach to interpreting IRR that is empirical and contextualized. It is based upon benchmarking IRR against baseline measures in a replication, one of which is a novel cross-replication reliability (xRR) measure based on Cohen's kappa. We call this approach the xRR framework. We open-source a replication dataset of 4 million human...
Article
The process of gathering ground truth data through human annotation is a major bottleneck in the use of information extraction methods for populating the Semantic Web. Crowdsourcing-based approaches are gaining popularity in the attempt to solve the issues related to volume of data and lack of annotators. Typically these practices use inter-annotat...
Article
Full-text available
The AI Bookie column documents highlights from AI Bets, an online forum for the creation of adjudicatable predictions and bets about the future of artificial intelligence. Although it is easy to make a prediction about the future, this forum was created to help researchers craft predictions whose accuracy can be clearly and unambiguously judged whe...
Preprint
Video summaries or highlights are a compelling alternative for exploring and contextualizing unprecedented amounts of video material. However, the summarization process is commonly automatic, non-transparent and potentially biased towards particular aspects depicted in the original video. Therefore, our aim is to help users like archivists or colle...
Preprint
In this paper we present the first steps towards hardening the science of measuring AI systems, by adopting metrology, the science of measurement and its application, and applying it to human (crowd) powered evaluations. We begin with the intuitive observation that evaluating the performance of an AI system is a form of measurement. In all other sc...
Article
With the increase of cultural heritage data published online, the usefulness of data in this open context hinges on the quality and diversity of descriptions of collection objects. In many cases, existing descriptions are not sufficient for retrieval and research tasks, resulting in the need for more specific annotations. However, eliciting such an...
Conference Paper
In this tutorial, we introduce a novel crowdsourcing methodology called CrowdTruth [1, 9]. The central characteristic of CrowdTruth is harnessing the diversity in human interpretation to capture the wide range of opinions and perspectives, and thus provide more reliable, realistic and inclusive real-world annotated data for training and evaluating...
Conference Paper
Discussing things you care about can be difficult, especially via online platforms, where sharing your opinion leaves you open to the real and immediate threats of abuse and harassment. Due to these threats, people stop expressing themselves and give up on seeking different opinions. Recent research efforts focus on examining the strengths and weak...
Preprint
Full-text available
We present a resource for the task of FrameNet semantic frame disambiguation of over 5,000 word-sentence pairs from the Wikipedia corpus. The annotations were collected using a novel crowdsourcing approach with multiple workers per sentence to capture inter-annotator disagreement. In contrast to the typical approach of attributing the best single f...
Article
Full-text available
The AI Bookie column documents highlights from AI Bets, an online forum for the creation of adjudicatable predictions, in the form of bets, about the future of AI. While it is easy to make broad, generalized, or off-the-cuff predictions about the future, it is more difficult to develop predictions that are carefully thought out, concrete, and measu...
Article
The Workshop Program of the Association for the Advancement of Artificial Intelligence’s Sixth AAAI Conference on Human Computation and Crowdsourcing was held on the campus of the University of Zurich in Zurich, Switzerland on 5 July 2018. There were three full-day workshops in the program: CrowdBias: Disentangling the Relation between Crowdsourcin...
Conference Paper
Information Retrieval systems rely on large test collections to measure their effectiveness in retrieving relevant documents. While the demand is high, the task of creating such test collections is laborious due to the large amounts of data that need to be annotated, and due to the intrinsic subjectivity of the task itself. In this paper we study t...
Conference Paper
Full-text available
In this paper, automatic homophone and homograph detection are suggested as new useful features for humor recognition systems. The system combines style-features from previous studies on humor recognition in short text with ambiguity-based features. The performance of two potentially useful homograph detection methods is evaluated using crowdsourc...
Preprint
Full-text available
The process of gathering ground truth data through human annotation is a major bottleneck in the use of information extraction methods for populating the Semantic Web. Crowdsourcing-based approaches are gaining popularity in the attempt to solve the issues related to volume of data and lack of annotators. Typically these practices use inter-annotat...
Conference Paper
Online video constitutes the largest, continuously growing portion of the Web content. Web users drive this growth by massively sharing their personal stories on social media platforms as compilations of their daily visual memories, or with animated GIFs and memes based on existing video material. Therefore, it is crucial to gain understanding of t...
Preprint
Full-text available
Distant supervision is a popular method for performing relation extraction from text that is known to produce noisy labels. Most progress in relation extraction and classification has been made with crowdsourced corrections to distant-supervised labels, and there is evidence that indicates still more would be better. In this paper, we explore the p...
Article
An increasing number of cultural heritage institutions publish data online. Ontologies can be used to structure published data, thereby increasing interoperability. To achieve widespread adoption of ontologies, institutions such as libraries, archives and museums have to be able to assess whether an ontology can adequately capture information about...
Preprint
Full-text available
Typically crowdsourcing-based approaches to gather annotated data use inter-annotator agreement as a measure of quality. However, in many domains, there is ambiguity in the data, as well as a multitude of perspectives of the information examples. In this paper, we present ongoing work into the CrowdTruth metrics, that capture and interpret inter-an...
Article
FrameNet is a computational linguistics resource composed of semantic frames, high-level concepts that represent the meanings of words. In this paper, we present an approach to gather frame disambiguation annotations in sentences using a crowdsourcing approach with multiple workers per sentence to capture inter-annotator disagreement. We perform an...
Preprint
Full-text available
FrameNet is a computational linguistics resource composed of semantic frames, high-level concepts that represent the meanings of words. In this paper, we present an approach to gather frame disambiguation annotations in sentences using a crowdsourcing approach with multiple workers per sentence to capture inter-annotator disagreement. We perform an...
Conference Paper
Full-text available
It is our great pleasure to welcome you to the WWW 2018 Augmenting Intelligence with Humans-in-the-loop workshop (http://w3id.org/huml/HumL-WWW2018/). The workshop program includes two invited talks. Praveen Paritosh (Google Research) explores the right incentives to motivate human contribution to create knowledge resources. Elena Simperl...
Conference Paper
AI and collective intelligence systems universally suffer from a deficiency of context. There are innumerable possible contexts that may possibly change the interpretation of some signal, that may change the proper response to some stimulus. For example, an image understanding system that does not recognize an arrest event in a zoomed image of a pe...
Article
Full-text available
This editorial paper introduces a special issue that solicited papers at the intersection of Semantic Web and Human Computation research. Research in that inter-disciplinary space dates back a decade, and has been acknowledged as a research line of its own by a seminal research manifesto published in 2015. But where do we stand in 2018? How did thi...
Article
Full-text available
The main challenge for cognitive computing systems, and specifically for their natural language processing, video and image analysis components, is to be provided with large amounts of training and evaluation data. The traditional process for gathering ground truth data is lengthy, costly, and time consuming: (i) expert annotators are not always av...
Conference Paper
Scholars currently have access to large heterogeneous media collections on the Web, which they use as sources for their research. Exploration of such collections is an important part in their research, where scholars make sense of these heterogeneous datasets. Knowledge graphs which relate media objects, people and places with historical events can...
Article
Full-text available
Distant supervision (DS) is a well-established method for relation extraction from text, based on the assumption that when a knowledge-base contains a relation between a term pair, then sentences that contain that pair are likely to express the relation. In this paper, we use the results of a crowdsourcing relation extraction task to identify two p...
Article
Full-text available
With more and more cultural heritage data being published online, their usefulness in this open context depends on the quality and diversity of descriptive metadata for collection objects. In many cases, existing metadata is not adequate for a variety of retrieval and research tasks and more specific annotations are necessary. However, eliciting su...
Conference Paper
Climate change, vaccination, abortion, Trump: Many topics are surrounded by fierce controversies. The nature of such heated debates and their elements have been studied extensively in the social science literature. More recently, various computational approaches to controversy analysis have appeared, using new data sources such as Wikipedia, which...
Conference Paper
In the digital era, personalisation systems are the typical way to deal with the massive amount of information on the Web. These systems decide in our place what we like, possibly hiding us away from a complete world of potentially interesting content. These systems do not challenge us to open our horizons of interest, trapping us more and more in ou...
Article
Full-text available
Climate change, vaccination, abortion, Trump: Many topics are surrounded by fierce controversies. The nature of such heated debates and their elements have been studied extensively in the social science literature. More recently, various computational approaches to controversy analysis have appeared, using new data sources such as Wikipedia, which...
Conference Paper
In recent years, information extraction tools have gained great popularity and brought significant performance improvements in extracting meaning from structured or unstructured data. For example, named entity recognition (NER) tools identify types such as people, organizations or places in text. However, despite their high F1 performance, NER...
Article
Full-text available
Games with a purpose (GWAPs) are increasingly used in audio-visual collections as a mechanism for annotating videos through tagging. One such GWAP is Waisda?, a video labeling game where players tag streaming video and win points by reaching consensus on tags with other players. The open-ended and unconstrained manner of tagging in the fast-paced s...
Article
Games with a purpose (GWAPs) are increasingly used in audio-visual collections as a mechanism for annotating videos through tagging. One such GWAP is Waisda?, a video labeling game where players tag streaming video and win points by reaching consensus on tags with other players. The open-ended and unconstrained manner of tagging in the fast-paced s...
Conference Paper
In this paper, we propose a model to operationalise serendipity in content-based recommender systems. The model, called SIRUP, is inspired by Silvia's curiosity theory, which is based on the fundamental theory of Berlyne, and aims at (1) measuring the novelty of an item with respect to the user profile, and (2) assessing whether the user is able to manage s...
Article
Full-text available
In collaborative Web-based platforms, user reputation scores are generally computed according to two orthogonal perspectives: (a) helpfulness-based reputation (HBR) scores and (b) centrality-based reputation (CBR) scores. In HBR approaches, the most reputable users are those who post the most helpful reviews according to the opinion of the members of...
Article
Many museums are currently providing online access to their collections. The state of the art research in the last decade shows that it is beneficial for institutions to provide their datasets as Linked Data in order to achieve easy cross-referencing, interlinking and integration. In this paper, we present the Rijksmuseum linked dataset (accessible...
Article
Cognitive computing systems require human labeled data for evaluation, and often for training. The standard practice used in gathering this data minimizes disagreement between annotators, and we have found this results in data that fails to account for the ambiguity inherent in language. We have proposed the CrowdTruth method for collecting ground...
Conference Paper
Automatic estimation of the quality of Web documents is a challenging task, especially because the definition of quality heavily depends on the individuals who define it, on the context where it applies, and on the nature of the tasks at hand. Our long-term goal is to allow automatic assessment of Web document quality tailored to specific user requ...
Conference Paper
A viewpoint is a triple consisting of an entity, a topic related to this entity and sentiment towards this topic. In time-aware multi-viewpoint summarization one monitors viewpoints for a running topic and selects a small set of informative documents. In this paper, we focus on time-aware multi-viewpoint summarization of multilingual social text st...
Conference Paper
We present a framework for assessing the quality of Web documents, and a baseline of three quality dimensions: trustworthiness, objectivity and basic scholarly quality. Assessing Web document quality is a "deep data" problem necessitating approaches to handle both data size and complexity.
Conference Paper
Crowdsourcing has proved to be a feasible way of harnessing human computation for solving complex problems. However, crowdsourcing frequently faces various challenges: data handling, task reusability, and platform selection. Domain scientists rely on eScientists to find solutions for these challenges. CrowdTruth is a framework that builds on existi...
Conference Paper
Gathering training and evaluation data for open domain tasks, such as general question answering, is a challenging task. Typically, ground truth data is provided by human expert annotators, however, in an open domain experts are difficult to define. Moreover, the overall process for annotating examples can be lengthy and expensive. Naturally, crowd...
Article
DIVE is a linked-data digital cultural heritage collection browser. It was developed to provide innovative access to heritage objects from heterogeneous collections, using historical events and narratives as the context for searching, browsing and presenting individual objects and groups of objects. This paper describes the DIVE Web Demonstrator. We als...
Article
Full-text available
Since 2007, the PATCH workshop series (https://patchworkshopseries.wordpress.com/) has successfully gathered researchers and professionals from various countries and institutions to discuss the topics of digital access to cultural heritage and specifically the personalization aspects in this process. Due to this rich history, the reach of th...
Article
Full-text available
Big data is having a disruptive impact across the sciences. Human annotation of semantic interpretation tasks is a critical part of big data semantics, but it is based on an antiquated ideal of a single correct truth that needs to be similarly disrupted. We expose seven myths about human annotation, most of which derive from that antiquated ideal o...
Conference Paper
In this study we consider whether, and to what extent, additional semantics in the form of Linked Data can help diversify search results. We undertake this study in the domain of cultural heritage. The data consists of collection data of the Rijksmuseum Amsterdam together with a number of relevant external vocabularies, which are all published as...
Conference Paper
Full-text available
In this paper we introduce the CrowdTruth open-source software framework for machine-human computation, that implements a novel approach to gathering human annotation data for a variety of media (e.g. text, image, video). The CrowdTruth approach embodied in the software captures human semantics through a pipeline of four processes: a) combining var...
Article
Full-text available
Crowdsourcing is often used to gather annotated data for training and evaluating computational systems that attempt to solve cognitive problems, such as understanding Natural Language sentences. Crowd workers are asked to perform semantic interpretation of sentences to establish a ground truth. This has always been done under the assumption that ea...
Article
Large datasets such as Cultural Heritage collections require detailed annotations when digitised and made available online. Annotating different aspects of such collections requires a variety of knowledge and expertise which is not always possessed by the collection curators. Artwork annotation is an example of a knowledge intensive image annotatio...
Article
Recent years witnessed an explosion in the number and variety of data crowdsourcing initiatives. From OpenStreetMap to Amazon Mechanical Turk, developers and practitioners have been striving to create user interfaces able to effectively and efficiently support the creation, exploration, and analysis of crowdsourced information. The extensive usage...
Conference Paper
Full-text available
This paper presents a novel approach for Linked Data-based recommender systems by means of semantic patterns. We associate to each pattern the rating of the arrival book (0 or 1) and compute user profiles by aggregating, for each book in the user training set, the ratings of all the patterns pointing to that book. Ratings are aggregated by estimatin...
Conference Paper
Full-text available
The results of our exploratory study provide new insights to crowdsourcing knowledge intensive tasks. We designed and performed an annotation task on a print collection of the Rijksmuseum Amsterdam, involving experts and crowd workers in the domain-specific description of depicted flowers. We created a testbed to collect annotations from flower exp...
Conference Paper
Full-text available
Since 2007, the PATCH workshop series has successfully gathered researchers and professionals from various countries and institutions to discuss the topics of digital access to cultural heritage and specifically the personalization aspects in this process. Due to this rich history, the reach of the PATCH workshop in various research communit...
Article
In this work we present an in-depth analysis of the user behaviors on different Social Sharing systems. We consider three popular platforms, Flickr, Delicious and StumbleUpon, and, by combining techniques from social network analysis with techniques from semantic analysis, we characterize the tagging behavior as well as the tendency to create frien...
Conference Paper
Full-text available
In this paper we explore the use of semantics to improve diversity in recommendations. We use semantic patterns extracted from Linked Data sources to surface new connections between items to provide diverse recommendations to the end users. We evaluate this methodology by adopting a bottom-up approach, i.e. we ask users of a crowdsourcing platform...