Paul Clough

The University of Sheffield | Sheffield · Department of Information

About

264
Publications
36,713
Reads
5,966
Citations
Citations since 2017
12 Research Items
1873 Citations

Publications (264)
Chapter
Full-text available
Museum websites have been designed to provide access for different types of users, such as museum staff, teachers and the general public. Therefore, understanding user needs and demographics is paramount to the provision of user-centred features, services and design. Various approaches exist for studying and grouping users, with a more recent empha...
Article
Full-text available
Museums are increasing access to their collections and providing richer user experiences via web-based interfaces. However, they are seeing high numbers of users looking at only one or two pages within 10 s and then leaving. To reduce this rate, a better understanding of the type of user who visits a museum website is required. Existing models for...
Chapter
The websites of Cultural Heritage institutions attract the full range of users, from professionals to novices, for a variety of tasks. However, many institutions are reporting high bounce rates and therefore seeking ways to better engage users. The analysis of transaction logs can provide insights into users’ searching and navigational behaviours a...
Chapter
The concept of comparability, or linguistic relatedness, or closeness between textual units or corpora has many possible applications in computational linguistics. Consequently, the task of measuring comparability has increasingly become a core technological challenge in the field, and needs to be developed and evaluated systematically. Many practi...
Chapter
The tools that were developed through the ACCURAT project and are presented in this book are packed into the ACCURAT toolkit (Pinnis et al. 2012a)—a collection of tools that are capable of collecting comparable corpora, analysing and extracting parallel data. The ACCURAT toolkit produces
Chapter
The availability of parallel corpora is limited, especially for under-resourced languages and narrow domains. On the other hand, the number of comparable documents in these areas that are freely available on the Web is continuously increasing. Algorithmic approaches to identify these documents from the Web are needed for the purpose of automaticall...
Article
This article proposes that the value of information is a topic worth revisiting in the contemporary era. Although the topic has been of perennial interest to information professionals and others since at least the early 1980s, we believe that it is timely to revisit this question in the context of a more connected and networked environment of...
Conference Paper
Full-text available
Museums are increasing access to their collections via web-based interfaces, but are seeing high numbers of users looking at only one or two pages within 10 s and then leaving. To decrease this rate, a better understanding of the type of user who visits a museum web-site is required. Existing models for museum web-site users tend to focus on a smal...
Conference Paper
People use digital cultural heritage sites in different ways and for various purposes. In this paper we explore what information people search for and why when using Europeana, one of the world’s largest aggregators of cultural heritage. We gathered a probability sample of 240 search requests from users via an online survey and used qualitative con...
Article
Full-text available
This paper presents the first large-scale investigation of the users and uses of WorldCat.org, the world's largest bibliographic database and global union catalog. Using a mixed-methods approach involving focus group interviews with 120 participants, an online survey with 2,918 responses, and an analysis of transaction logs of approximately 15 mill...
Conference Paper
Measuring the similarity of interlanguage-linked Wikipedia articles often requires the use of suitable language resources (e.g., dictionaries and MT systems) which can be problematic for languages with limited or poor translation resources. The size of Wikipedia can also present computational demands when computing similarity. This paper presents a...
Conference Paper
Data-driven approaches have become increasingly popular as a means for analyzing transaction logs from web search engines and digital libraries, for example using cluster analysis to identify common patterns of search and navigation behavior. However, steps must be taken to ensure that results are reliable and repeatable. Although clusteri...
Conference Paper
Information Retrieval (IR) research has traditionally focused on serving the best results for a single query - so-called ad hoc retrieval. However, users typically search iteratively, refining and reformulating their queries during a session. A key challenge in the study of this interaction is the creation of suitable evaluation resources to assess...
Article
Full-text available
The 38th European Conference on Information Retrieval took place from the 20th to the 23rd of March 2016 in Padua, Italy. This report summarizes the conference in terms of the presented keynotes, scientific and social programme, industry day, tutorials, workshops and student support.
Conference Paper
Full-text available
Increasingly information systems and services are being tailored to the needs of individuals and groups through the use of user-centred design techniques. In this paper we consider the ways in which the users of digital cultural heritage have been previously characterised and grouped. Despite recognising the importance of adopting user-centred tech...
Conference Paper
The workshop aims to bring together researchers and practitioners to review and discuss ways of providing effective access to large-scale collections of cultural heritage content. The scale, variety and availability of cultural heritage content, combined with the variety of user groups with respect to background knowledge, specialist experience and...
Article
Purpose – The purpose of this paper is to describe a new supervised machine learning study on the prediction of meeting participants’ personal note-taking from spoken dialogue acts uttered shortly before writing. Design/methodology/approach – This novel approach of providing cues for finding important meeting events that would be worth recording in...
Conference Paper
Transaction log analysis at the level of a session is commonly used as a means of understanding user-system interactions. A key practical issue in the process of conducting session level analysis is the segmentation of the logs into appropriate user sessions (i.e., sessionisation). Methods based on time intervals are frequently used as a simple and...
Article
The identification of duplicated and plagiarised passages of text has become an increasingly active area of research. In this paper we investigate methods for plagiarism detection that aim to identify potential sources of plagiarism from MEDLINE, particularly when the original text has been modified through the replacement of words or phrases. A sc...
Conference Paper
Evaluation of IR systems has typically focused on the system and specifically assessing the quality of a ranked list of results with respect to a query. However, IR functionality is typically just one component amongst many that are used to help support users' wider information seeking activities. Many systems that include a search box also provide...
Conference Paper
The evaluation of information access systems is increasingly making use of multiple evaluation methods. While such studies represent forms of mixed-methods research, they are rarely acknowledged as such. This means that researchers are potentially failing to recognise the challenges and opportunities offered by multi-phase research, particularly in...
Conference Paper
Full-text available
The study of plagiarism and its detection is a highly popular field of research that has witnessed increased attention over recent years. In this paper we describe the range of problems that exist within academe in the area of ‘unfair means’, which encompasses a wider range of issues of attribution, ownership and originality. Unfair means offers a...
Article
On July 11th 2014 the First Workshop on Gathering Efficient Assessments of Relevance (GEAR 2014) was held as part of the SIGIR 2014 conference at the Gold Coast, Australia. An invited talk was given by Dr Nicola Ferro. Three full papers were presented, in addition to a design activity which led to a lively discussion on gathering relevance ass...
Conference Paper
This paper investigates techniques used by children in year 4 (age eight to nine) of a UK primary school to reformulate their queries, and how they use information retrieval systems to support query reformulation. Method. An in-depth study analysing the interactions of twelve children carrying out search tasks in a primary school lesson; including...
Chapter
Introduction. "Cultural heritage involves rich and highly heterogeneous collections that are challenging to archive and convey to the general public" (Hardman et al., 2009, 23). This statement describes two aspects that make access to cultural heritage information challenging: the heterogeneous nature of many cultural heritage collections and the growin...
Conference Paper
We examine the accuracy of first story detection on traditional news collections and on a re-purposed source of academic material. The impact on accuracy of detecting an early rather than the first story is examined, showing that accuracy increases under a broader time window; however, the increases on some collections are small. Even on collection...
Article
Recent work on searching the Semantic Web has yielded a wide range of approaches with respect to the underlying search mechanisms, results management and presentation, and style of input. Each approach impacts upon the quality of the information retrieved and the user’s experience of the search process. However, despite the wealth of experience acc...
Article
Full-text available
Purpose – The purpose of this paper is to investigate the effects of cognitive style on navigating a large digital library of cultural heritage information; specifically, the paper focuses on the wholist/analytic dimension as experienced in the field of educational informatics. The hypothesis is that wholist and analytic users have characteristically...
Conference Paper
Full-text available
Recent research into the functionality of Online Public Access Catalogues (OPACs) has led to a call for such systems to incorporate functionality to facilitate resource discovery, and replicate the information search experience users encounter elsewhere on the Web. Recommendations represent one such feature. Developments so far in this area indicat...
Article
Full-text available
The session is a common unit of interaction that is used in search log analysis. By analysing sessions, it is possible to identify distinct classes of searcher behaviour that can be used to design search applications that better support groups of users based on their expected behaviours. This paper describes an online card sort experiment to invest...
Article
Full-text available
Search boxes providing simple keyword-based search are insufficient when users have complex information needs or are unfamiliar with a collection, for example in large digital libraries. Browsing hierarchies can support these richer interactions, but many collections do not have a suitable hierarchy available. In this paper we present a number of a...
Article
Full-text available
Evaluation is a fundamental part of Information Retrieval, and in the conventional Cranfield evaluation paradigm, sets of relevance assessments are a fundamental part of test collections. This workshop revisits how relevance assessments can be efficiently created, seeking to provide a forum for discussion and exploration of the topic.
Conference Paper
Wikipedia has been used as a source of comparable texts for a range of tasks, such as Statistical Machine Translation and Cross-Language Information Retrieval. Articles written in different languages on the same topic are often connected through inter-language-links. However, the extent to which these articles are similar is highly variable and thi...
Article
We describe a novel user evaluation study on the value of people's personal meeting notes for a question-answering task involving meeting data taken from the Augmented Multiparty Interaction (AMI) corpus. A survey on task perceptions and note-taking strategies in meetings was also conducted. The results suggest that written notes taken by multiple...
Conference Paper
There is ample evidence of the influence of individual differences on information-seeking behaviours. Trailways and paths are increasingly important objects to support internet navigation. The EU-funded PATHS (Personalised Access to Cultural Heritage Spaces) project is investigating ways of assisting users with exploring a large collection of cultu...
Article
Full-text available
Considerable attention is being paid to methods for gathering and evaluating comparable corpora, not only to improve Statistical Machine Translation (SMT) but for other applications as well, e.g. the extraction of paraphrases. The potential value of such corpora requires efficient and effective methods for gathering and evaluating them. Most of thes...
Conference Paper
In this paper we compare the results of the user-centred evaluation of two iterations of the PATHS system, which aims at supporting exploration, navigation and use of information in cultural heritage online collections. We focus on two path creation exercises, and examine the format and content of the paths according to available functionality and...
Article
In this paper we present results from an investigation of religious information searching based on analyzing log files from a large general-purpose search engine. From approximately 15 million queries, we identified 124,422 that were part of 60,759 user sessions. We present a method for categorizing queries based on related terms and show differenc...
Conference Paper
This paper explores the relationship between Information Literacy (IL) and the features of Social Tagging Systems (STS). We identify which of the underlying functionalities of STS can assist the IL skills of users with respect to retrieving, managing and sharing information in academic libraries. The study develops a conceptual framework that combi...
Conference Paper
Full-text available
User-centered design and evaluation of a system to improve information access and assist the wider information activities of users in cultural heritage digital collections is described. Extending beyond simple, standalone information seeking and retrieval tasks, the system aims to enhance content ‘findability’ and to support users’ cognitive proces...
Conference Paper
Multiple methods exist for evaluating search systems, ranging from more user-oriented approaches to those more focused on evaluating system performance. When preparing an evaluation, key questions include: (i) why conduct the evaluation, (ii) what should be evaluated, and (iii) how the evaluation should be conducted. Over recent years there has bee...
Conference Paper
In this paper we describe the design and implementation of non-personalized recommendations in the PATHS system. This system allows users to explore items from Europeana in new ways. Recommendations of the type "people who viewed this item also viewed this item" are powered by pairs of viewed items mined from Europeana. However, due to limited usag...
Conference Paper
This paper describes an in-depth study of the effects of geographic region on search patterns, particularly query reformulations, in a large query log from the UK National Archives (TNA). A total of 1,700 sessions involving 9,447 queries from 17 countries were manually analyzed for their semantic composition and pairs of queries for their reformula...
Conference Paper
Full-text available
In this paper we describe a novel approach for exploring large document collections using a map-based visualisation. We use hierarchically structured semantic concepts that are attached to the documents to create a visualisation of the semantic space that resembles a Google Map. The approach is novel in that we exploit the hierarchical structure to...
Conference Paper
This paper describes a system for navigating large collections of information about cultural heritage which is applied to Europeana, the European Library. Europeana contains over 20 million artefacts with metadata in a wide range of European languages. The system currently provides access to Europeana content with metadata in English and Spanish. T...
Conference Paper
Current Information Retrieval systems for digital cultural heritage support only the actual search aspect of the information seeking process. This demonstration presents the second PATHS system which provides the exploration, analysis, and sense-making features to support the full information seeking process.
Conference Paper
Full-text available
The Exploration, Navigation and Retrieval of Information in Cultural Heritage Workshop (ENRICH 2013) offers a forum to 1) discuss the challenges and opportunities in Information Retrieval research in the area of Cultural Heritage; 2) encourage collaboration between researchers engaged in work in this specialist area of Information Retrieval, and to...
Article
Full-text available
Evaluation is instrumental to developing and managing effective information retrieval systems. For this process, enlisting crowdsourcing has proven viable. However, less understood are crowdsourcing's limits for evaluation, particularly for domain-specific search. The authors compare relevance assessments gathered using crowdsourcing with those fro...
Article
Full-text available
Evaluation is highly important for designing, developing and maintaining effective information retrieval or search systems as it allows the measurement of how successfully an information retrieval system meets its goal of helping users fulfil their information needs. But what does it mean to be successful? It might refer to whether an information...
Article
Objective – We aim to identify duplicate pairs of Medline citations, particularly when the documents are not identical but contain similar information. Materials and methods – Duplicate pairs of citations are identified by comparing word n-grams in pairs of documents. N-grams are modified using two approaches which take account of the fact that the doc...
Article
Paper has been the format of choice for disseminating geographic information for millennia; however the arrival of the internet and mobile technologies has created new modes of map consumption. This paper investigates the future role of paper mapping in a society where access to online digital mapping is freely available. The research consists of a...
Article
Full-text available
People are increasingly investigating their family history (or genealogy) as part of their everyday information-seeking activities. This paper provides insight into this behaviour and presents a new conceptual model that captures the stages of activity carried out during people’s lifelong family history research. The model offers a multi-phase view...
Article
Full-text available
On August 1st, 2013 the First Workshop on the Exploration, Navigation and Retrieval of Information in Cultural Heritage (ENRICH 2013) was held as part of the SIGIR 2013 conference in Dublin, Ireland. An invited talk was given by Prof. Jaap Kamps. There were 3 full papers and 3 short papers presented in addition to a poster and demonstration session...
Article
Full-text available
Erratum: Due to an error in the editorial process, an earlier version of the above paper was inadvertently published in a previous issue of Aslib Proceedings (Vol. 64 No. 4, 2012, pp. 437‐56). We would like to express our sincere apologies to the authors for this oversight and for any inconvenience caused. This project is funded by the Iranian Mini...
Chapter
The development of models for automatic detection of text re-use and plagiarism across languages has received increasing attention in recent years. However, the lack of an evaluation framework composed of annotated datasets has caused these efforts to be isolated. In this paper we present the CL!TR 2011 corpus, the first manually created corpus for...
Article
Public access to cultural heritage collections is a challenging and ongoing research issue, not least due to the range of different reasons a user may want to access materials. For example, for a virtual museum website users may vary from professionals or experts, to interested members of the public visiting on a whim. In this paper, we are interes...
Article
Large amounts of cultural heritage content have now been digitized and are available in digital libraries. However, these are often unstructured and difficult to navigate. Automatic techniques for identifying similar items in these collections could be used to improve navigation since it would allow items that are implicitly connected to be linked...
Conference Paper
There is a demand for taxonomies to organise large collections of documents into categories for browsing and exploration. This paper examines four existing taxonomies that have been manually created, along with two methods for deriving taxonomies automatically from data items. We use these taxonomies to organise items from a large online cultural h...
Conference Paper
Large amounts of digital cultural heritage (CH) information have become available over the past years, requiring more powerful exploration systems than just a search box. The PATHS system aims to provide an environment in which users can successfully explore a large, unknown collection through two modalities: following existing paths to learn about...
Conference Paper
Full-text available
Large digital libraries have become available over the past years through digitisation and aggregation projects. These large collections present a challenge to the new user who wishes to discover what is available in the collections. Subject classification can help in this task; however, in large collections it is frequently incomplete or inconsiste...
Conference Paper
Digitisation of the cultural heritage means that a significant amount of material is now available through online digital library portals. However, the vast quantity of cultural heritage material can also be overwhelming for many users who lack knowledge of the collections, subject knowledge and the specialist language used to describe this content...
Chapter
This chapter discusses IR system evaluation with particular reference to the multilingual context, and presents the most commonly used measures and models. The main focus is on system performance from the viewpoint of retrieval effectiveness. However, we also discuss evaluation from a user-oriented perspective and address questions such as how to a...
Chapter
Increasingly people are required to interact or communicate with Information Retrieval (IR) applications in order to find useful information. This interaction commonly takes place through the user interface, which should help users to formulate their queries, refine their searches and understand and examine search results. In multilingual informati...
Chapter
The information retrieval system stands at the core of many information acquisition cycles. Its task is the retrieval of relevant information from document collections in response to a coded query based on an information need. In its general form, when searching unstructured, natural language text produced by a large range of authors, this is a dif...
Article
Full-text available
Purpose – This paper aims to report the results of a study investigating the relevance criteria used by health care professionals when seeking medical images. Design/methodology/approach – Data were collected from 29 participants using a think‐aloud protocol and face‐to‐face interviews and analysed using the Straussian version of grounded theory (...
Conference Paper
Text reuse is common in many scenarios and documents are often based, at least in part, on existing documents. This paper reports an approach to detecting text reuse which identifies not only documents which have been reused verbatim but is also designed to identify cases of reuse when the original has been rewritten. The approach identifies reuse...
Article
Full-text available
Lifelogging is a technically inspired approach that attempts to address the problem of human forgetting by developing systems that “record everything.” Uptake of lifelogging systems has generally been disappointing, however. One reason for this lack of uptake is the absence of design principles for developing digital systems to support memory. Synt...
Article
Purpose – This paper aims to describe a study of the queries generated from a user experiment for cross‐language information retrieval (CLIR) from a historic image archive. Design/methodology/approach – A controlled lab‐based user study was carried out using a prototype Italian‐English image retrieval system. Participants were asked to carry out searc...
Conference Paper
External plagiarism detection systems compare suspicious texts against a reference collection to identify the original one(s). The suspicious text may not contain a verbatim copy of the reference collection since plagiarists often try to disguise their behaviour by altering the text. For large reference collections, such as those accessible via the...
Article
Full-text available
Purpose – Moves towards more interactive services on the web have led libraries to add an increasing range of functionality to their OPACs. Given the prevalence of recommender systems on the wider web, especially in e-commerce environments, this paper aims to review current research in this area that is of particular relevance to the library commun...
Article
This paper reports the results of novel quantitative research on multiple people’s personal note-taking in meetings with the long-term aim of aiding the creation of innovative meeting understanding applications. We present three experiments using a large number of group meetings taken from the Augmented Multi-party Interaction meeting corpus. Stati...
Conference Paper
Wikipedia articles in different languages have been mined to support various tasks, such as Cross-Language Information Retrieval (CLIR) and Statistical Machine Translation (SMT). Articles on the same topic in different languages are often connected by inter-language links, which can be used to identify similar or comparable content. In this work, w...
Book
We are living in a multilingual world and the diversity in languages which are used to interact with information access systems has generated a wide variety of challenges to be addressed by computer and information scientists. The growing amount of non-English information accessible globally and the increased worldwide exposure of enterprises also...
Conference Paper
The Cultural Heritage in CLEF 2012 (CHiC) pilot evaluation included these tasks: ad-hoc retrieval, semantic enrichment and variability tasks. At CHiC 2012, the University of Sheffield and the University of the Basque Country submitted a joint entry, attempting the three English monolingual tasks. For the ad-hoc task, the baseline approach used the In...
Article
In this paper, we present the results of the user requirements and interface design phase for a prototype system, designed to enhance interaction with cultural heritage collections online through means of a pathway metaphor. We present a single user interaction model that supports various work and information seeking tasks undertaken by both expert...