Student at Martin-Luther-Universität Halle-Wittenberg, Research Assistant at the Webis Group
Interested in information retrieval using natural language processing. Likes interdisciplinary work, e.g., in argumentative IR.
Learning to rank (LTR) is the de facto standard for web search, improving upon classical retrieval models by exploiting (in)direct relevance feedback from user judgments, interaction logs, etc. We investigate for the first time the effect of a sampling bias on LTR models due to the potential presence of near-duplicate web pages in the training data...
Near-duplicate documents are abundant in web corpora. Bernstein and Zobel have shown earlier that this redundancy reduces search effectiveness under the novelty principle, i.e., if subsequent duplicates in rankings are marked irrelevant or removed. We examine the impact of near duplicates on learning to rank, nowadays the standard approach for ranki...