Benno Stein

Bauhaus-Universität Weimar · Faculty of Media

About

366
Publications
66,234
Reads
9,300
Citations

Publications (366)
Chapter
Full-text available
The release of ChatGPT in fall 2022 fueled the controversy over artificial intelligence and has led to an unceasing urge to ask questions ever since, reinforced by the fact that the same prompts generate different outputs only a short time later. In an experimental format, the editors present first annotated conversations with...
Chapter
The paper gives a brief overview of the four shared tasks which have been organized at the PAN 2023 lab on digital text forensics and stylometry hosted at the CLEF 2023 conference. The tasks include authorship verification across discourse types, multi-author writing style analysis, profiling cryptocurrency influencers with few-shot learning, and trig...
Chapter
Full-text available
This paper is a condensed overview of Touché: the fourth edition of the lab on argument and causal retrieval that was held at CLEF 2023. With the goal to create a collaborative platform for research on computational argumentation and causality, we organized four shared tasks: (a) argument retrieval for controversial topics, where participants retri...
Conference Paper
Full-text available
Priority conflicts and the attribution of contributions to important scientific breakthroughs to individuals and groups play an important role in science, its governance, and evaluation. Debates and dynamics around these processes are analyzed by science studies. Our objective is to transform Wikipedia into an accessible, traceable primary source f...
Article
Full-text available
In this paper, we discuss the benefits and challenges of shared tasks as a teaching method. A shared task is a scientific event and a friendly competition to solve a research problem, the task. In terms of linking research and teaching, shared-task-based tutorials fulfill several faculty desires: they leverage students' interdisciplinary and hetero...
Preprint
Full-text available
We integrate ir_datasets, ir_measures, and PyTerrier with TIRA in the Information Retrieval Experiment Platform (TIREx) to promote more standardized, reproducible, scalable, and even blinded retrieval experiments. Standardization is achieved when a retrieval approach implements PyTerrier's interfaces and the input and output of an experiment are co...
Article
Full-text available
This paper analyzes Wikipedia’s representation of the Nobel Prize winning CRISPR/Cas9 technology, a method for gene editing. We propose and evaluate different heuristics to match publications from several publication corpora against Wikipedia’s central article on CRISPR and against the complete Wikipedia revision history in order to retrieve furthe...
Preprint
Full-text available
When asked, current large language models (LLMs) like ChatGPT claim that they can assist us with relevance judgments. Many researchers think this would not lead to credible IR research. In this perspective paper, we discuss possible ways for LLMs to assist human experts along with concerns and issues that arise. We devise a human-machine collaborat...
Preprint
Full-text available
The Archive Query Log (AQL) is a previously unused, comprehensive query log collected at the Internet Archive over the last 25 years. Its first version includes 356 million queries, 166 million search result pages, and 1.7 billion search results across 550 search providers. Although many query logs have been studied in the literature, the search pr...
Chapter
A major obstacle to the long-term impact of most shared tasks is their lack of reproducibility. Often only the test collections and the papers of the organizers and participants are published. Third parties who want to independently evaluate the state of the art for a task on other data must re-implement the participants’ software. The tools develo...
Chapter
Full-text available
The goal of Touché is to foster and support the development of technologies for argument and causal retrieval and analysis. For the fourth time, we organize the Touché lab featuring four shared tasks: (a) argument retrieval for controversial topics, where participants retrieve web documents that contain high-quality argumentation and detect the arg...
Chapter
This paper presents dynamic exploratory search technology for the analysis of scientific corpora. The unique dynamic features of the system allow users to analyze quantitative corpus statistics beyond document counts, and to switch between corpus exploration and corpus filtering. To demonstrate the innovation of our approach, we apply our technolog...
Chapter
The paper gives a brief overview of the four shared tasks organized at the PAN 2023 lab on digital text forensics and stylometry to be hosted at the CLEF 2023 conference. The general goal of the PAN lab is to advance the state-of-the-art in text forensics and stylometry while ensuring objective evaluation of new and established methods on newly dev...
Preprint
Full-text available
We present the Touché23-ValueEval Dataset for Identifying Human Values behind Arguments. To investigate approaches for the automated detection of human values behind arguments, we collected 9324 arguments from 6 diverse sources, covering religious texts, political discussions, free-text arguments, newspaper editorials, and online democracy platfo...
Preprint
Full-text available
We propose to use captions from the Web as a previously underutilized resource for paraphrases (i.e., texts with the same "message") and to create and analyze a corresponding dataset. When an image is reused on the Web, an original caption is often assigned. We hypothesize that different captions for the same image naturally form a set of mutual pa...
Article
Full-text available
We present the Webis-STEREO-21 dataset, a massive collection of Scientific Text Reuse in Open-access publications. It contains 91 million cases of reused text passages found in 4.2 million unique open-access publications. Cases range from overlap of as few as eight words to near-duplicate publications and include a variety of reuse types, rangi...
Preprint
Full-text available
Many computational argumentation tasks, like stance classification, are topic-dependent: the effectiveness of approaches to these tasks significantly depends on whether the approaches were trained on arguments from the same topics as those they are tested on. So, which are these topics that researchers train approaches on? This paper contributes th...
Conference Paper
Full-text available
At least 5% of questions submitted to search engines ask about cause-effect relationships in some way. To support the development of tailored approaches that can answer such questions, we construct Webis-CausalQA-22, a benchmark corpus of 1.1 million causal questions with answers. We distinguish different types of causal questions using a novel typ...
Preprint
The text-to-image model Stable Diffusion has recently become very popular. Only weeks after its open source release, millions are experimenting with image generation. This is due to its ease of use, since all it takes is a brief description of the desired image to "prompt" the generative model. Rarely do the images generated for a new prompt immedi...
Preprint
Full-text available
With an ever-growing number of new publications each day, scientific writing poses an interesting domain for authorship analysis of both single-author and multi-author documents. Unfortunately, most existing corpora lack either material from the science domain or the required metadata. Hence, we present SMAuC, a new metadata-rich corpus designed sp...
Preprint
Most research on natural language processing treats bias as an absolute concept: Based on a (probably complex) algorithmic analysis, a sentence, an article, or a text is classified as biased or not. Given the fact that for humans the question of whether a text is biased can be difficult to answer or is answered contradictorily, we ask whether an "abs...
Chapter
The large size of today’s web archives makes it impossible to manually assess the quality of each archived web page, i.e., to check whether a page can be reproduced faithfully from an archive. For automated web archive quality assessment, previous work proposed to measure the pixel difference between a screenshot of the original page and a screensh...
Preprint
We present the first dataset and evaluation results on a newly defined computational task of trigger warning assignment. Labeled corpus data has been compiled from narrative works hosted on Archive of Our Own (AO3), a well-known fanfiction site. In this paper, we focus on the most frequently assigned trigger type--violence--and define a document-le...
Chapter
Full-text available
This paper is a condensed report on the third year of the Touché lab on argument retrieval held at CLEF 2022. With the goal to foster and support the development of technologies for argument mining and argument analysis, we organized three shared tasks in the third edition of Touché: (a) argument retrieval for controversial topics, where participan...
Chapter
Full-text available
The paper gives a brief overview of three shared tasks which have been organized at the PAN 2022 lab on digital text forensics and stylometry hosted at the CLEF 2022 conference. The tasks include authorship verification across discourse types, multi-author writing style analysis and author profiling. Some of the tasks continue and advance past edit...
Preprint
Full-text available
We present the Webis-STEREO-21 dataset, a massive collection of Scientific Text Reuse in Open-access publications. It contains more than 91 million cases of reused text passages found in 4.2 million unique open-access publications. Featuring a high coverage of scientific disciplines and varieties of reuse, as well as comprehensive metadata to conte...
Preprint
Full-text available
Web search and other large-scale web data analytics rely on processing archives of web pages stored in a standardized and efficient format. Since its introduction in 2008, the IIPC's Web ARCive (WARC) format has become the standard format for this purpose. As a list of individually compressed records of HTTP requests and responses, it allows for co...
Preprint
Full-text available
Commercial web search engines employ near-duplicate detection to ensure that users see each relevant result only once, although the underlying web crawls typically include (near-)duplicates of many web pages. We revisit the risks and potential of near-duplicates with an information retrieval focus, motivating that current efforts toward an open and i...
Conference Paper
Full-text available
Idiosyncrasies in human writing styles make it difficult to develop systems for authorship identification that scale well across individuals. In this year's edition of PAN, the authorship identification track focused on open-set authorship verification, so that systems are applied to unknown documents by previously unseen authors in a new domain. A...
Article
A tailored model of a system is the prerequisite for various analysis tasks, such as anomaly detection, fault identification, or quality assurance. This paper deals with the algorithmic learning of a system’s behavior model given a sample of observations. In particular, we consider real-world production plants where the learned model must capture t...
Chapter
The paper gives a brief overview of the three shared tasks organized at the PAN 2021 lab on digital text forensics and stylometry hosted at the CLEF conference. The tasks include authorship verification across domains, author profiling for hate speech spreaders, and style change detection for multi-author documents. In part the tasks are new and in...
Preprint
Full-text available
Framing a news article means to portray the reported event from a specific perspective, e.g., from an economic or a health perspective. Reframing means to change this perspective. Depending on the audience or the submessage, reframing can become necessary to achieve the desired effect on the readers. Reframing is related to adapting style and senti...
Chapter
Full-text available
This paper is a condensed report on the second year of the Touché shared task on argument retrieval held at CLEF 2021. With the goal to provide a collaborative platform for researchers, we organized two tasks: (1) supporting individuals in finding arguments on controversial topics of social importance and (2) supporting individuals with arguments i...
Article
The exchange of meta-information has always formed part of information behavior. In this article, we show that this rule also extends to conversational search. Information about the user’s information need, their preferences, and the quality of search results are only some of the most salient examples of meta-information that are exchanged as a mat...
Conference Paper
Full-text available
Many questions of public interest do not have a single answer but come with a set of choices, each with its pros and cons. An "objective" information system can help explore the underlying argument space, and, if equipped with a conversational interface, it can create the experience of lively discussions resembling those from our daily liv...
Conference Paper
Full-text available
This research-in-progress paper analyzes Wikipedia’s representation of the Nobel Prize winning CRISPR/Cas9 technology to explore to what extent and with what temporal dynamics Wikipedia cites the most relevant and visible scientific literature on this topic. We use both verbatim and fuzzy matching heuristics to match publications cited from a se...
Preprint
Full-text available
Web archive analytics is the exploitation of publicly accessible web pages and their evolution for research purposes -- to the extent organizationally possible for researchers. In order to better understand the complexity of this task, the first part of this paper puts the entirety of the world's captured, created, and replicated data (the "Global...
Preprint
Full-text available
Recently, neural networks have been successfully employed to improve upon state-of-the-art performance in ad-hoc retrieval tasks via machine-learned ranking functions. While neural retrieval models grow in complexity and impact, little is understood about their correspondence with well-studied IR principles. Recent work on interpretability in machi...
Article
The Information Retrieval Anthology, IR Anthology for short, is an endeavor to create a comprehensive collection of metadata and full texts of IR-related publications. We report on its first release, the use cases it can serve, as well as the challenges lying ahead to develop it towards a resource that serves the IR community for years to come. The...
Article
Full-text available
Compiling and disseminating information about incidents and disasters are key to disaster management and relief. But due to inherent limitations of the acquisition process, the required information is often incomplete or missing altogether. To fill these gaps, citizen observations spread through social media are widely considered to be a promising...
Chapter
Full-text available
The paper gives a brief overview of the three shared tasks to be organized at the PAN 2021 lab on digital text forensics and stylometry hosted at the CLEF conference. The tasks include authorship verification across domains, author profiling for hate speech spreaders, and style change detection for multi-author documents. In part the tasks are new...
Chapter
Full-text available
Technologies for argument mining and argumentation analysis are maturing rapidly, so that, as a result, the retrieval of arguments in search scenarios becomes a feasible objective. For the second time, we organize the Touché lab on argument retrieval with two shared tasks: (1) argument retrieval for controversial questions, where arguments are to b...
Chapter
Over the past two decades, several algorithms have been developed to segment a web page into semantically coherent units, a task with several applications in web content analysis. However, these algorithms have hardly been compared empirically and it thus remains unclear which of them—or rather, which of their underlying paradigms—performs best. To...
Article
Full-text available
Few studies have investigated how search behavior affects complex writing tasks. We analyze a dataset of 150 long essays whose authors searched the ClueWeb09 corpus for source material, while all querying, clicking, and writing activity was meticulously recorded. We model the effect of search and writing behavior on essay quality using path analysi...
Conference Paper
Full-text available
The automatic summarization of argumentative texts has hardly been explored. This paper takes a further step in this direction, targeting news editorials, i.e., opinionated articles with a well-defined argumentation structure. With Webis-EditorialSum-2020, we present a corpus of 1330 carefully curated summaries for 266 news editorials. We evaluate...
Conference Paper
Full-text available
News editorials aim to shape the opinions of their readership and the general public on timely controversial issues. The impact of an editorial on the reader’s opinion does not only depend on its content and style, but also on the reader’s profile. Previous work has studied the effect of editorial style depending on general political ideologies (li...
Preprint
Media organizations bear great responsibility because of their considerable influence on shaping beliefs and positions of our society. Any form of media can contain overly biased content, e.g., by reporting on political events in a selective or incomplete manner. A relevant question hence is whether and how such a form of imbalanced news coverage can...
Preprint
Media plays an important role in shaping public opinion. Biased media can influence people in undesirable directions and hence should be unmasked as such. We observe that feature-based and neural text classification approaches which rely only on the distribution of low-level lexical information fail to detect media bias. This weakness becomes most n...
Article
This paper presents a visual analytics system for exploring, analyzing and comparing argument structures in essay corpora. We provide an overview of the corpus by a list of ArguLines which represent the argument units of each essay by a sequence of glyphs. Each glyph encodes the stance, the depth and the relative position of an argument unit. The o...
Conference Paper
Full-text available
Authorship identification remains a highly topical research problem in computational text analysis with many relevant applications in contemporary society and industry. For this edition of PAN, we focused on authorship verification, where the task is to assess whether a pair of documents has been authored by the same individual. Like in previous e...