About
392 Publications
74,303 Reads
10,824 Citations
Publications (392)
We have built TIREx, the information retrieval experiment platform, to promote standardized, reproducible, scalable, and blinded retrieval experiments. Standardization is achieved through integration with PyTerrier's interfaces and compatibility with ir_datasets and ir_measures. Reproducibility and scalability are based on the underlying TIRA frame...
Evaluating the quality of arguments is a crucial aspect of any system leveraging argument mining. However, it is a challenge to obtain reliable and consistent annotations regarding argument quality, as this usually requires domain-specific expertise of the annotators. Even among experts, the assessment of argument quality is often inconsistent due...
Imagining future scenarios arising from events and (in)actions is crucial for democratic participation, but is often left to experts who have in-depth knowledge of, for example, social, political, environmental or technological trends. A widely accepted method for non-experts to think about future scenarios is to write fictional short stories set i...
Conversational search engines such as YouChat and Microsoft Copilot use large language models (LLMs) to generate responses to queries. It is only a small step to also let the same technology insert ads within the generated responses, instead of separately placing ads next to a response. Inserted ads would be reminiscent of native advertising and p...
Only a few search engines index the Web at scale. Third parties who want to develop downstream applications based on web search fully depend on the terms and conditions of the few vendors. The public availability of the large-scale Common Crawl does not alleviate the situation, as it is often cheaper to crawl and index only a smaller collection focus...
Decision-making and opinion-forming are everyday tasks that involve weighing pro and con arguments. The goal of Touché is to foster the development of support-technologies for decision-making and opinion-forming and to improve our understanding of these processes. This fifth edition of the lab features three shared tasks: (1) Human value detection...
Many users of web search engines have been complaining in recent years about the supposedly decreasing quality of search results. This is often attributed to an increasing amount of search-engine-optimized but low-quality content. Evidence for this has always been anecdotal, yet it’s not unreasonable to think that popular online marketing strategie...
The paper gives a brief overview of the four shared tasks organized at the PAN 2024 lab on digital text forensics and stylometry to be hosted at CLEF 2024. The goal of the PAN lab is to advance the state-of-the-art in text forensics and stylometry through an objective evaluation of new and established methods on new benchmark datasets. Our four tas...
The ImageCLEF evaluation campaign was integrated with CLEF (Conference and Labs of the Evaluation Forum) for more than 20 years and represents a Multimedia Retrieval challenge aimed at evaluating the technologies for annotation, indexing, and retrieval of multimodal data. Thus, it provides information access to large data collections in usage scena...
Evaluating conversational search systems based on simulated user interactions is a potential approach to overcome one of the main problems of static conversational search test collections: the collections contain only very few of all the plausible conversations on a topic. Still, one of the challenges of user simulation is generating realistic foll...
A spectrum of human-artificial intelligence collaboration in assessing relevance.
Wikipedia’s influence in shaping public perceptions of science underscores the significance of scientists being recognized on the platform, as it can impact their careers. Although Wikipedia offers guidelines for determining when a scientist qualifies for their own article, it currently lacks guidance regarding whether a scientist should be acknowl...
The article discusses, on the one hand, the use of AI in higher education teaching and, on the other hand, research-based learning as a teaching method in higher education. The two are brought together using the example of the SKILL project (Sozialwissenschaftliches KI-Labor für Forschendes Lernen, a social science AI lab for research-based learning).
This report documents the program and the outcomes of Dagstuhl Seminar 23031 "Frontiers of Information Access Experimentation for Research and Education", which brought together 38 participants from 12 countries. The seminar addressed technology-enhanced information access (information retrieval, recommender systems, natural language processing) an...
The release of ChatGPT in autumn 2022 fueled the controversy surrounding artificial intelligence and has led to an unceasing urge to ask questions ever since, reinforced by the fact that the same prompts generate different outputs only a short time later. In an experimental format, the editors present the first annotated conversations with...
The paper gives a brief overview of four shared tasks which have been organized at the PAN 2023 lab on digital text forensics and stylometry hosted at the CLEF 2023 conference. The tasks include authorship verification across discourse types, multi-author writing style analysis, profiling cryptocurrency influencers with few-shot learning, and trig...
This paper is a condensed overview of Touché: the fourth edition of the lab on argument and causal retrieval that was held at CLEF 2023. With the goal to create a collaborative platform for research on computational argumentation and causality, we organized four shared tasks: (a) argument retrieval for controversial topics, where participants retri...
Priority conflicts and the attribution of contributions to important scientific breakthroughs to individuals and groups play an important role in science, its governance, and evaluation. Debates and dynamics around these processes are analyzed by science studies. Our objective is to transform Wikipedia into an accessible, traceable primary source f...
In this paper, we discuss the benefits and challenges of shared tasks as a teaching method. A shared task is a scientific event and a friendly competition to solve a research problem, the task. In terms of linking research and teaching, shared-task-based tutorials fulfill several faculty desires: they leverage students' interdisciplinary and hetero...
We integrate ir_datasets, ir_measures, and PyTerrier with TIRA in the Information Retrieval Experiment Platform (TIREx) to promote more standardized, reproducible, scalable, and even blinded retrieval experiments. Standardization is achieved when a retrieval approach implements PyTerrier's interfaces and the input and output of an experiment are co...
This paper analyzes Wikipedia’s representation of the Nobel Prize winning CRISPR/Cas9 technology, a method for gene editing. We propose and evaluate different heuristics to match publications from several publication corpora against Wikipedia’s central article on CRISPR and against the complete Wikipedia revision history in order to retrieve furthe...
When asked, current large language models (LLMs) like ChatGPT claim that they can assist us with relevance judgments. Many researchers think this would not lead to credible IR research. In this perspective paper, we discuss possible ways for LLMs to assist human experts along with concerns and issues that arise. We devise a human-machine collaborat...
The Archive Query Log (AQL) is a previously unused, comprehensive query log collected at the Internet Archive over the last 25 years. Its first version includes 356 million queries, 166 million search result pages, and 1.7 billion search results across 550 search providers. Although many query logs have been studied in the literature, the search pr...
A major obstacle to the long-term impact of most shared tasks is their lack of reproducibility. Often only the test collections and the papers of the organizers and participants are published. Third parties who want to independently evaluate the state of the art for a task on other data must re-implement the participants’ software. The tools develo...
The goal of Touché is to foster and support the development of technologies for argument and causal retrieval and analysis. For the fourth time, we organize the Touché lab featuring four shared tasks: (a) argument retrieval for controversial topics, where participants retrieve web documents that contain high-quality argumentation and detect the arg...
This paper presents dynamic exploratory search technology for the analysis of scientific corpora. The unique dynamic features of the system allow users to analyze quantitative corpus statistics beyond document counts, and to switch between corpus exploration and corpus filtering. To demonstrate the innovation of our approach, we apply our technolog...
The paper gives a brief overview of the four shared tasks organized at the PAN 2023 lab on digital text forensics and stylometry to be hosted at the CLEF 2023 conference. The general goal of the PAN lab is to advance the state-of-the-art in text forensics and stylometry while ensuring objective evaluation of new and established methods on newly dev...
We present the Touché23-ValueEval Dataset for Identifying Human Values behind Arguments. To investigate approaches for the automated detection of human values behind arguments, we collected 9324 arguments from 6 diverse sources, covering religious texts, political discussions, free-text arguments, newspaper editorials, and online democracy platfo...
We propose to use captions from the Web as a previously underutilized resource for paraphrases (i.e., texts with the same "message") and to create and analyze a corresponding dataset. When an image is reused on the Web, an original caption is often assigned. We hypothesize that different captions for the same image naturally form a set of mutual pa...
We present the Webis-STEREO-21 dataset, a massive collection of Scientific Text Reuse in Open-access publications. It contains 91 million cases of reused text passages found in 4.2 million unique open-access publications. Cases range from overlap of as few as eight words to near-duplicate publications and include a variety of reuse types, rangi...
Many computational argumentation tasks, like stance classification, are topic-dependent: the effectiveness of approaches to these tasks significantly depends on whether the approaches were trained on arguments from the same topics as those they are tested on. So, which are these topics that researchers train approaches on? This paper contributes th...
At least 5% of questions submitted to search engines ask about cause-effect relationships in some way. To support the development of tailored approaches that can answer such questions, we construct Webis-CausalQA-22, a benchmark corpus of 1.1 million causal questions with answers. We distinguish different types of causal questions using a novel typ...
The text-to-image model Stable Diffusion has recently become very popular. Only weeks after its open source release, millions are experimenting with image generation. This is due to its ease of use, since all it takes is a brief description of the desired image to "prompt" the generative model. Rarely do the images generated for a new prompt immedi...
With an ever-growing number of new publications each day, scientific writing poses an interesting domain for authorship analysis of both single-author and multi-author documents. Unfortunately, most existing corpora lack either material from the science domain or the required metadata. Hence, we present SMAuC, a new metadata-rich corpus designed sp...
Most research on natural language processing treats bias as an absolute concept: Based on a (probably complex) algorithmic analysis, a sentence, an article, or a text is classified as biased or not. Given that, for humans, the question of whether a text is biased can be difficult to answer or is answered contradictorily, we ask whether an "abs...
The large size of today’s web archives makes it impossible to manually assess the quality of each archived web page, i.e., to check whether a page can be reproduced faithfully from an archive. For automated web archive quality assessment, previous work proposed to measure the pixel difference between a screenshot of the original page and a screensh...
We present the first dataset and evaluation results on a newly defined computational task of trigger warning assignment. Labeled corpus data has been compiled from narrative works hosted on Archive of Our Own (AO3), a well-known fanfiction site. In this paper, we focus on the most frequently assigned trigger type--violence--and define a document-le...
This paper is a condensed report on the third year of the Touché lab on argument retrieval held at CLEF 2022. With the goal to foster and support the development of technologies for argument mining and argument analysis, we organized three shared tasks in the third edition of Touché: (a) argument retrieval for controversial topics, where participan...
The paper gives a brief overview of three shared tasks which have been organized at the PAN 2022 lab on digital text forensics and stylometry hosted at the CLEF 2022 conference. The tasks include authorship verification across discourse types, multi-author writing style analysis and author profiling. Some of the tasks continue and advance past edit...
We present the Webis-STEREO-21 dataset, a massive collection of Scientific Text Reuse in Open-access publications. It contains more than 91 million cases of reused text passages found in 4.2 million unique open-access publications. Featuring a high coverage of scientific disciplines and varieties of reuse, as well as comprehensive metadata to conte...
Web search and other large-scale web data analytics rely on processing archives of web pages stored in a standardized and efficient format. Since its introduction in 2008, the IIPC's Web ARCive (WARC) format has become the standard format for this purpose. As a list of individually compressed records of HTTP requests and responses, it allows for co...
Commercial web search engines employ near-duplicate detection to ensure that users see each relevant result only once, although the underlying web crawls typically include (near-)duplicates of many web pages. We revisit the risks and potential of near-duplicates with an information retrieval focus, motivating that current efforts toward an open and i...
Idiosyncrasies in human writing styles make it difficult to develop systems for authorship identification that scale well across individuals. In this year's edition of PAN, the authorship identification track focused on open-set authorship verification, so that systems are applied to unknown documents by previously unseen authors in a new domain. A...
A tailored model of a system is the prerequisite for various analysis tasks, such as anomaly detection, fault identification, or quality assurance. This paper deals with the algorithmic learning of a system’s behavior model given a sample of observations. In particular, we consider real-world production plants where the learned model must capture t...
The paper gives a brief overview of the three shared tasks organized at the PAN 2021 lab on digital text forensics and stylometry hosted at the CLEF conference. The tasks include authorship verification across domains, author profiling for hate speech spreaders, and style change detection for multi-author documents. In part the tasks are new and in...
Framing a news article means to portray the reported event from a specific perspective, e.g., from an economic or a health perspective. Reframing means to change this perspective. Depending on the audience or the submessage, reframing can become necessary to achieve the desired effect on the readers. Reframing is related to adapting style and senti...
This paper is a condensed report on the second year of the Touché shared task on argument retrieval held at CLEF 2021. With the goal to provide a collaborative platform for researchers, we organized two tasks: (1) supporting individuals in finding arguments on controversial topics of social importance and (2) supporting individuals with arguments i...
The exchange of meta-information has always formed part of information behavior. In this article, we show that this rule also extends to conversational search. Information about the user’s information need, their preferences, and the quality of search results are only some of the most salient examples of meta-information that are exchanged as a mat...
Many questions of public interest do not have a single answer but come with a set of choices, each with its own pros and cons. An "objective" information system can help explore the underlying argument space, and, if equipped with a conversational interface, it can create the experience of lively discussions resembling those from our daily liv...
This research-in-progress paper analyzes Wikipedia’s representation of the Nobel Prize winning CRISPR/Cas9 technology to explore to what extent and with what temporal dynamics Wikipedia cites the most relevant and visible scientific literature on this topic. We use both verbatim and fuzzy matching heuristics to match publications cited from a se...