Charles Clarke

University of Waterloo | UWaterloo · David R. Cheriton School of Computer Science

About

206
Publications
16,056
Reads
6,561
Citations

Publications (206)
Preprint
Despite the advantages of their low-resource settings, traditional sparse retrievers depend on exact matching approaches between high-dimensional bag-of-words (BoW) representations of both the queries and the collection. As a result, retrieval performance is restricted by semantic discrepancies and vocabulary gaps. On the other hand, transformer-ba...
Preprint
Despite recent progress on conversational systems, they still do not perform smoothly and coherently when faced with ambiguous requests. When questions are unclear, conversational systems should have the ability to ask clarifying questions, rather than assuming a particular interpretation or simply responding that they do not understand. Previous s...
Article
Recent years have seen enormous gains in core information retrieval tasks, including document and passage ranking. Datasets and leaderboards, and in particular the MS MARCO datasets, illustrate the dramatic improvements achieved by modern neural rankers. When compared with traditional information retrieval test collections, such as those developed...
Preprint
The dramatic improvements in core information retrieval tasks engendered by neural rankers create a need for novel evaluation methods. If every ranker returns highly relevant items in the top ranks, it becomes difficult to recognize meaningful differences between them and to build reusable test collections. Several recent papers explore pairwise pr...
Article
This is a report on the NTCIR-15 conference held online in December 2020. NTCIR is a sesquiannual research project designed to evaluate various information access technologies, including information retrieval, information recommendation, question answering, natural language processing, etc. 55 active research groups from 22 countries/regions have p...
Preprint
Over the last few years, contextualized pre-trained transformer models such as BERT have provided substantial improvements on information retrieval tasks. Recent approaches based on pre-trained transformer models such as BERT fine-tune dense low-dimensional contextualized representations of queries and documents in embedding space. While these den...
Preprint
Recent years have seen enormous gains in core IR tasks, including document and passage ranking. Datasets and leaderboards, and in particular the MS MARCO datasets, illustrate the dramatic improvements achieved by modern neural rankers. When compared with traditional test collections, the MS MARCO datasets employ substantially more queries with subs...
Article
Assessors make preference judgments faster and more consistently than graded judgments. Preference judgments can also recognize distinctions between items that appear equivalent under graded judgments. Unfortunately, preference judgments can require more than linear effort to fully order a pool of items, and evaluation measures for preference judgm...
Conference Paper
Voice-based assistants have become a popular tool for conducting web search, particularly for factoid question answering. However, for more complex web searches, their functionality remains limited, as does our understanding of the ways in which users can best interact with audio-based search results. In this paper, we compare and contrast user beh...
Preprint
Assessors make preference judgments faster and more consistently than graded relevance judgments. Preference judgments can also recognize distinctions between items that appear equivalent under graded judgments. Unfortunately, preference judgments can require more than linear effort to fully order a pool of items, and evaluation measures for prefer...
Book
This book constitutes the refereed proceedings of the 14th International Conference on NII Testbeds and Community for Information Access Research, NTCIR 2019, held in Tokyo, Japan, in June 2019. The 15 full papers presented in this book were carefully reviewed and selected from 55 submissions. This NTCIR 2019 proceedings was structured in the foll...
Article
This is a report on the NTCIR-13 conference held in December 2017, in Tokyo, Japan. NTCIR is a series of parallel and collective evaluation efforts designed to enhance research on diverse information access technologies, including, but not limited to, cross-language and multimedia information access, question-answering, text mining and summarizatio...
Article
The purpose of the Strategic Workshop in Information Retrieval in Lorne is to explore the long-range issues of the Information Retrieval field, to recognize challenges that are on, or even over, the horizon, to build consensus on some of the key challenges, and to disseminate the resulting information to the research community. The intent is that thi...
Conference Paper
Large-scale retrieval systems often employ cascaded ranking architectures, in which an initial set of candidate documents is iteratively refined and re-ranked by increasingly sophisticated and expensive ranking models. In this paper, we propose a unified framework for predicting a range of performance-sensitive parameters based on minimizing end-t...
Conference Paper
People regularly use web search engines to investigate the efficacy of medical treatments. Search results can contain documents that present incorrect information that contradicts current established medical understanding on whether a treatment is helpful or not for a health issue. If people are influenced by the incorrect information found in sear...
Conference Paper
Chatbots and conversational assistants are becoming increasingly popular. However, for information seeking scenarios, these systems still have very limited conversational abilities, and primarily serve as proxies to existing web search engines. In this work, we ask: what would conversational search look like with a truly intelligent assistant? To b...
Article
Scalable web search systems typically employ multi-stage retrieval architectures, where an initial stage generates a set of candidate documents that are then pruned and re-ranked. Since subsequent stages typically exploit a multitude of features of varying costs using machine-learned models, reducing the number of documents that are considered at e...
Conference Paper
This paper explores a simple question: How would we provide a high-quality search experience on Mars, where the fundamental physical limit is speed-of-light propagation delays on the order of tens of minutes? On Earth, users are accustomed to nearly instantaneous responses from web services. Is it possible to overcome orders-of-magnitude longer lat...
Article
We introduce a new representation of the inverted index that performs faster ranked unions and intersections while using similar space. Our index is based on the treap data structure, which allows us to intersect/merge the document identifiers while simultaneously thresholding by frequency, instead of the costlier two-step classical processing meth...
Conference Paper
Modern multi-stage retrieval systems are comprised of a candidate generation stage followed by one or more reranking stages. In such an architecture, the quality of the final ranked list may not be sensitive to the quality of the initial candidate pool, especially in terms of early precision. This provides several opportunities to increase retrieva...
Article
This paper explores a simple question: How would we provide a high-quality search experience on Mars, where the fundamental physical limit is speed-of-light propagation delays on the order of tens of minutes? On Earth, users are accustomed to nearly instantaneous response times from search engines. Is it possible to overcome orders-of-magnitude lon...
Article
Modern multi-stage retrieval systems are comprised of a candidate generation stage followed by one or more reranking stages. In such an architecture, the quality of the final ranked list may not be sensitive to the quality of the initial candidate pool, especially in terms of early precision. This provides several opportunities to increase retrieval ef...
Conference Paper
There are presently plans to create permanent colonies on Mars so that humanity will have a second home. These colonists will need search, email, entertainment, and indeed most services provided on the modern web. The primary challenge is network latencies, since the two planets are anywhere from 4 to 24 light minutes apart. A recent article sketch...
Chapter
Personalized (mobile) devices are radically changing information access tools, with rich context allowing for far more powerful, personalized search. Rather than retrieving a "document" on the topic of a "query," the rich contextual information allows for tailored search and recommendation, and helps solve users' complex tasks by taking into account...
Article
We examine the effects of different latency penalties in the evaluation of push notification systems, as operationalized in the TREC 2015 Microblog track evaluation. The purpose of this study is to inform the design of metrics for the TREC 2016 Real-Time Summarization track, which is largely modeled after the TREC 2015 evaluation design.
Conference Paper
Recently developed retrieval effectiveness measures have incorporated models of user behavior, but have limited themselves to predicting user performance over a single query and response. Accurate prediction of user performance with search systems must incorporate a means to model how users switch between different information sources. For example,...
Conference Paper
Adult content is pervasive on the web, has been a driving factor in the adoption of the Internet medium, and is responsible for a significant fraction of traffic and revenues, yet rarely attracts attention in research. The research questions surrounding adult content access behaviors are unique, and interesting and valuable research in this area ca...
Conference Paper
This half-day tutorial on IR evaluation combines an introduction to classical IR evaluation methods with material on more recent user-oriented approaches. We primarily focus on off-line evaluation, but some material on on-line evaluation is also covered. The broad goal of the tutorial is to equip researchers with an understanding of modern approach...
Conference Paper
We are concerned with the effect of using a surrogate assessor to train a passive (i.e., batch) supervised-learning method to rank documents for subsequent review, where the effectiveness of the ranking will be evaluated using a different assessor deemed to be authoritative. Previous studies suggest that surrogate assessments may be a reasonable pr...
Conference Paper
Creating test collections for modern search tasks is increasingly challenging due to the growing scale and dynamic nature of content, and the need for richer contextualization of the statements of request. To address these issues, the TREC Contextual Suggestion Track explored an open test collection, where participants were allowed to submit any w...
Conference Paper
People track news events according to their interests and available time. For a major event of great personal interest, they might check for updates several times an hour, taking time to keep abreast of all aspects of the evolving event. For minor events of more marginal interest, they might check back once or twice a day for a few minutes to learn...
Article
Modern cities are increasingly becoming smart, where a digital knowledge infrastructure is deployed by local authorities (e.g., city councils and municipalities) to better serve the information needs of their citizens, and to ensure the sustainability and efficient use of power and resources. This knowledge infrastructure consists of a wide range of...
Article
Time-biased gain provides a general framework for predicting user performance on information retrieval systems, capturing the impact of the user's interaction with the system's interface. Our prior work investigated an instantiation of time-biased gain aimed at traditional search interfaces utilizing clickable result summaries, with gain realized f...
Conference Paper
Modern cities are becoming smart, where a digital knowledge infrastructure is deployed by local authorities (e.g., city councils and municipalities) to better serve the information needs of their citizens, and to ensure sustainability and efficient use of power and resources. This knowledge infrastructure consists of a wide range of systems from low-...
Article
An online advertisement’s clickthrough rate provides a fundamental measure of its quality, which is widely used in ad selection strategies. Unfortunately, ads placed in contexts where they are rarely viewed—or where users are unlikely to be interested in commercial results—may receive few clicks regardless of their quality. In this article, we mode...
Conference Paper
The evaluation of clustering quality has proven to be a difficult task. While it is generally agreed that application specific human assessment can provide a reasonable gold standard for clustering evaluation, the use of human assessors is not practical in many real situations. As a result, machine computable internal clustering quality measures (C...
Conference Paper
While supervised learning-to-rank algorithms have largely supplanted unsupervised query-document similarity measures for search, the exploration of query-document measures by many researchers over many years produced insights that might be exploited in other domains. For example, the BM25 measure substantially and consistently outperforms cosine ac...
Article
Many queries have multiple interpretations; they are ambiguous or underspecified. This is especially true in the context of Web search. To account for this, much recent research has focused on creating systems that produce diverse ranked lists. In order to validate these systems, several new evaluation measures have been created to quantify diversi...
Conference Paper
The SIGIR 2013 Workshop on Modeling User Behavior for Information Retrieval Evaluation (MUBE 2013) brings together people to discuss existing and new approaches, ways to collaborate, and other ideas and issues involved in improving information retrieval evaluation through the modeling of user behavior.
Conference Paper
To construct a diversified search test collection, a set of possible subtopics (or intents) needs to be determined for each topic, in one way or another, and per-intent relevance assessments need to be obtained. In the TREC Web Track Diversity Task, subtopics are manually developed at NIST, based on results of automatic click log analysis; in the NT...
Conference Paper
We introduce a new representation of the inverted index that performs faster ranked unions and intersections while using less space. Our index is based on the treap data structure, which allows us to intersect/merge the document identifiers while simultaneously thresholding by frequency, instead of the costlier two-step classical processing methods...
Article
The Workshop on Search and Exploration of X-Rated Information (SEXI) was presented for the first time at the Conference on Web Search and Data Mining (WSDM) 2013 in Rome, Italy. It represents a first attempt to study adult content from the perspective of the research communities in Web Search and Data Mining. To this end, five short papers were pre...
Article
Inspired by requirements traceability problems, we present a method for implementing fast and effective hypertext links to specific locations within documents. These soft links do not depend on tags, markup, or closed tool sets, yet they can generally survive extensive edits to a document collection, allowing the targets of these links to be locate...
Conference Paper
Adult content is pervasive on the Web and has been a driving factor in the adoption of the Internet medium. It is responsible for a significant fraction of traffic and revenues, yet rarely attracts attention in research. We propose that the research questions surrounding adult content access behaviors are unique, and we believe interesting and valuabl...
Article
The SIGIR 2013 Workshop on Modeling User Behavior for Information Retrieval Evaluation brought together researchers interested in improving Cranfield-style evaluation of information retrieval through the modeling of user behavior. The workshop included two invited talks, ten short paper presentations, and breakout groups. Workshop participants brain...
Article
On August 16, 2012, the SIGIR 2012 Workshop on Open Source Information Retrieval was held as part of the SIGIR 2012 conference in Portland, Oregon, USA. There were 2 invited talks, one from industry and one from academia. There were 6 full papers and 6 short papers presented as well as demonstrations of 4 open source tools. Finally there was a livel...
Conference Paper
Clickthrough rate provides a fundamental measure of advertising quality, which is widely used in ad selection strategies. However, ads placed in contexts where they are rarely viewed, or where users are unlikely to be interested in commercial results, may receive few clicks regardless of their quality. In this paper, we gain insight into user brows...
Conference Paper
Time-biased gain provides a unifying framework for information retrieval evaluation, generalizing many traditional effectiveness measures while accommodating aspects of user behavior not captured by these measures. By using time as a basis for calibration against actual user data, time-biased gain can reflect aspects of the search process that dire...
Conference Paper
Cranfield-style information retrieval evaluation considers variance in user information needs by evaluating retrieval systems over a set of search topics. For each search topic, traditional metrics model all users searching ranked lists in exactly the same manner and thus have zero variance in their per-topic estimate of effectiveness. Metrics that...
Article
Many current effectiveness measures incorporate simplifying assumptions about user behavior. These assumptions prevent the measures from reflecting aspects of the search process that directly impact the quality of retrieval results as experienced by the user. In particular, these measures implicitly model users as working down a list of retrieval r...
Article
We develop and discuss a news comment miner that presents distinct viewpoints on a given theme or event. Given a query, the system uses metasearch techniques to find relevant news articles. Relevant articles are then scraped for both article content and comments. Snippets from the comments are sampled and presented to the user, based on theme popul...
Conference Paper
When an ambiguous query is received, a sensible approach is for the information retrieval (IR) system to diversify the results retrieved for this query, in the hope that at least one of the interpretations of the query intent will satisfy the user. Diversity is an increasingly important topic, of interest to both academic researchers (such as parti...
Article
Implicit feedback techniques may be used for query intent detection, taking advantage of user behavior to understand their interests and preferences. In sponsored search, a primary concern is the user’s interest in purchasing or utilizing a commercial service, or what is called online commercial intent. In this paper, we develop a methodology for e...
Article
We investigate the effect of feature weighting on document clustering, including a novel investigation of Okapi BM25 feature weighting. Using eight document datasets and 17 well-established clustering algorithms we show that the benefit of tf-idf weighting over tf weighting is heavily dependent on both the dataset being clustered and the algorithm...
Conference Paper
Current measures of novelty and diversity in information retrieval evaluation require explicit subtopic judgments, adding complexity to the manual assessment process. In some sense, these subtopic judgments may be viewed as providing a crude indication of document similarity, since we might expect documents relevant to common subtopics to be more s...
Conference Paper
We present a novel investigation of email clustering, demonstrating that clustering can be a powerful tool for email spam filtering. We first extend the well-known notion that ham and spam emails can be divided into clusters, showing the striking result that almost any reasonable clustering algorithm will naturally partition an email dataset into a...
Conference Paper
We explore statistical properties of links within Wikipedia. We demonstrate that a simple algorithm can predict many of the links that would normally be added to a new article, without considering the topic of the article itself. We then explore a variant of topic-oriented PageRank, which can effectively identify topical links within existing artic...
Article
Searchers with a complex information need typically slice-and-dice their problem into several queries and subqueries, and laboriously combine the answers post hoc to solve their tasks. Consider planning a social event on the last day of SIGIR, in the unknown city of Beijing, factoring in distances, timing, and preferences on budget, cuisine, and en...
Article
When an ambiguous query is received, a sensible approach is for the information retrieval (IR) system to diversify the results retrieved for this query, in the hope that at least one of the interpretations of the query intent will satisfy the user. Diversity is an increasingly important topic, of interest to both academic researchers (such as parti...
Conference Paper
The Maximum Entropy Method provides one technique for validating search engine effectiveness measures. Under this method, the value of an effectiveness measure is used as a constraint to estimate the most likely distribution of relevant documents under a maximum entropy assumption. This inferred distribution may then be compared to the actual distr...
Conference Paper
Traditional editorial effectiveness measures, such as nDCG, remain standard for Web search evaluation. Unfortunately, these traditional measures can inappropriately reward redundant information and can fail to reflect the broad range of user needs that can underlie a Web query. To address these deficiencies, several researchers have recently propos...
Article
The TREC 2009 web ad hoc and relevance feedback tasks used a new document collection, the ClueWeb09 dataset, which was crawled from the general Web in early 2009. This dataset contains 1 billion web pages, a substantial fraction of which are spam: pages designed to deceive search engines so as to deliver an unwanted payload. We examine the effec...
Conference Paper
We evaluate a framework for BM25F-based XML element retrieval. The framework gathers contextual information associated with each XML element into an associated field, which we call a characteristic field. The contents of the element and the contents of the characteristic field are then treated as distinct fields for BM25F weighting purposes. Eviden...
Conference Paper
Examining large-scale, long-term application use is critical to understanding the degree to which an application meets the needs of its user community. However, there has been limited published analysis of this type of data, none of which pertains to applications that support creating and modifying content using direct manipulation. In this paper,...
Conference Paper
We present a method that introduces diversity into document retrieval using clusters of top-m terms obtained from the top-k retrieved documents through pseudo-relevance feedback. Terms from each cluster are used to automatically expand the original query. We evaluate the effectiveness of our method using a non-traditional effectiveness evaluation m...
Article
Clickthroughs on ads and search results have been successfully used to infer user interest and preferences, but these indicators are typically most effective for modeling the "dominant" or most popular intent for a query. In this paper we begin to explore rich client-side instrumentation for inferring personalized commercial intent of users. In pa...
Conference Paper
This year, the University of Waterloo participated in four tracks: the Ad Hoc, Book, Entity Ranking, and Link-the-Wiki tracks. In the Ad Hoc and Book tracks, we implemented a variation of Okapi BM25F [20, 5, 18, 15] that gave substantial improvements over the baseline BM25 that ranked first in the previous year [12, 13], during the training and in the official...
Article
Experiments were conducted to explore the impact of combining various components of eight leading information retrieval systems. Each system demonstrated improved effectiveness through the use of blind feedback, also known as pseudo-relevance feedback, a form of query expansion. Blind feedback uses the results of a preliminary retrieval step to aug...
Conference Paper
Building upon simple models of user needs and behavior, we propose a new measure of novelty and diversity for information retrieval evaluation. We combine ideas from three recently proposed effectiveness measures in an attempt to achieve a balance between the complexity of genuine user needs and the simplicity required for feasible evaluation.
Conference Paper
Understanding the intent underlying users’ queries may help personalize search results and improve user satisfaction. In this paper, we develop a methodology for using ad clickthrough logs, query specific information, and the content of search engine result pages to study characteristics of query intents, especially commercial intent. The findings o...
Conference Paper
In this paper, we report on our TREC experiments with the ClueWeb09 document collection. We participated in the relevance feedback and web tracks. While our phase 1 relevance feedback run's performance was good, our other relevance feedback and web track submissions' performances were lacking. We suspect this performance difference is caused by...
Conference Paper
Clickthrough rate and cost-per-click are known to be among the factors that impact the rank of an ad shown on a search result page. Hence, search engines can benefit from estimating ad clickthrough in order to determine the quality of ads and maximize their revenue. In this paper, a methodology is developed to estimate ad clickthrough rate by exp...