About
Publications: 80
Reads: 9,395
Citations: 1,299
Publications (80)
A new layer of complexity, constituted of networks of information token recurrence, has been identified in socio-technical systems such as the Wikipedia online community and the Zooniverse citizen science platform. The identification of this complexity reveals that our current understanding of the actual structure of those systems, and consequently...
The use of game elements within virtual citizen science is increasingly common, promising to bring increased user activity, motivation, and engagement to large-scale scientific projects. However, there is an ongoing debate about whether or not gamifying systems such as these is actually an effective means by which to increase motivation and engagem...
Social media data have provoked a mixed response from researchers. While there is great enthusiasm for this new source of social data – Twitter data in particular – concerns are also expressed about their biases and unknown provenance and, consequently, their credibility for social research. This article seeks a middle path, arguing that we must de...
As the speed, volume, and heterogeneity of data produced on the Web increases, we are faced with developing more intelligent and efficient strategies for storing and archiving data. The archiving of Web data involves many technical, governance, and policy-related challenges; however, one of the most prominent and timely challenges that archivists fa...
... skipped tweets, average time spent to complete the tasks, and interaction with the user interface) and analyse their impact on correct and incorrect human annotations. We then carried out further studies on the impact of extended annotation instructions and disambiguation guidelines on the factors listed above. This was all done using CrowdFlower a...
In this paper we investigate the implications of providing a real-time messaging interface in a Web-based citizen science game. Our study draws on data from two weeks of chat messages and survey responses collected from Eyewire, a highly successful citizen science game which enables players to take part in scientific enquiries, within a semi-gamifi...
In this position paper we wish to propose and discuss several open research questions associated with the IoT. In particular, we wish to consider how crowdsourcing can be used as a scalable, reliable, and sustainable approach to support various computationally difficult and ambiguous tasks recognised in IoT research. We illustrate our work by exami...
The ongoing growth in research data publication supports global intra-disciplinary and inter-disciplinary research collaboration, but the current generation of archive-centric research data repositories does not address some of the key practical obstacles to research data sharing and re-use, specifically: discovering relevant data on a global scale is...
Motivated by the increasing number of voices calling for careful consideration of what context-rich data analysis methods can tell us about the activities of human collectives, we contribute an argumentation that employs a dialectic of literature on the philosophy of truth and science as well as analytical methods for the study of information diffu...
Online citizen science projects have been increasingly used in a variety of disciplines and contexts to enable large-scale scientific research. The successes of such projects have encouraged the development of customisable platforms to enable anyone to run their own citizen science project. However, the process of designing and building a citizen s...
Wikipedia represents a successful peer-produced knowledge resource constructed via the endeavours of millions of volunteers. We examine the activity of Wikipedia by analysing WikiProjects, a community-driven feature which allows communities of Wikipedians to coordinate their efforts in order to improve or produce Wikipedia articles. We harvested...
Sustained engagement of participants is essential for the success of a citizen science project. However, the motivations of why people engage with such activities can be idiosyncratic, varied, and evolving. In this article we examine player participation in Eyewire, a citizen science game. We undertake an investigation of why Eyewire players take p...
“Filter bubble”, “echo chambers”, “information diet” – the metaphors to describe today’s information dynamics on social media platforms are fairly diverse (Tufekci, 2016). People use them to describe the impact of the viral spread of fake, biased or purposeless content online, as witnessed during the recent race for the US presidency or the latest...
As a result of the development of Big Data and cloud databases, a huge amount of data are available on the Web, not only as dump files but also in databases. Due to the volume and heterogeneity of these data, it is a challenging task to find and consume them. To reduce the barrier of data sharing and reuse on the Web, we propose a data cataloguing...
This study explains how bots interact with human users and influence conversational networks on Twitter. We analyze a high-stakes political environment, the UK general election of May 2015, asking human volunteers to tweet from purpose-made Twitter accounts—half of which had bots attached—during three events: the last Prime Minister’s Question Time...
This is an attempt to develop a universal socio-technical computing machine that captures and coordinates human input to let collective problem solving activities emerge on the Web without the need for an a priori composition of a dedicated task or human collective.
In this paper, we examine the motivations for participation in Eye-Wire, a Web-based gamified citizen science platform. Our study is based on a large-scale survey, whose responses we analysed qualitatively in order to understand what drives individuals to participate. Based on our analysis, we derive 18 motivations related to...
Web science relies on an interdisciplinary approach that seeks to go beyond what any one subject can say about the World Wide Web. By incorporating numerous disciplinary perspectives and relying heavily on domain knowledge and expertise, data science has emerged as an important new area that integrates statistics with computational knowledge, data...
This paper documents a study of the real-time Wikipedia edit stream containing over 6 million edits on 1.5 million English Wikipedia articles, during 2015. We focus on answering questions related to identification and use of information cascades between Wikipedia articles, based on author editing activity. Our findings show that by constructing inf...
The Web is barely 25 years old but in that time it has changed every aspect of our lives. Because of its sociotechnical rather than purely engineered nature, not only is the Web changing society but also we shape the way the technology evolves. The whole process is inherently co-constituted and as such its evolution is unlike any other system. In o...
In recent years there has been a growing interest in the application of Web-based citizen science platforms. Such platforms use crowdsourcing techniques to support scientific advancements, and in several cases, have led to new scientific discoveries which were not originally considered. Our work explores the highly successful Web-based citizen...
Real-time streams, personal devices, and sensor networks have the potential to unveil rich insights for researchers, commerce, and governments. With a vested interest in unlocking these potential benefits, extensive work has been conducted on developing technologies to process, integrate, and extract value from the data. However, exposing the value...
In this paper, we investigate a method for constructing cascades of information co-occurrence, which is suitable to trace emergent structures in information in scenarios where rich contextual features are unavailable. Our method relies only on the temporal order of content-sharing activities, and intrinsic properties of the shared content itself. W...
Citizen science is changing the process of scientific knowledge discovery. Successful projects rely on an active and able collection of volunteers. In order to attract and sustain citizen scientists, designers are faced with the task of transforming complex scientific tasks into something accessible, interesting, and hopefully, engaging. In this p...
This paper explores the factors that influence the human component in hybrid approaches to named entity recognition (NER) in microblogs, which combine state-of-the-art automatic techniques with human and crowd computing. We identify a set of content and crowdsourcing-related features (number of entities in a post, types of entities, skipped true-po...
Over the past years, streaming Web services have become popular, with many of the top Web platforms now offering near real-time streams of user and machine activity. In light of this, Web Observatories are now faced with the challenge of being able to process and republish real-time, big data, Web streams, whilst maintaining access control and data...
In this paper we examine WikiProjects, an emergent, community-driven feature of Wikipedia. We analysed 3.2 million Wikipedia articles associated with 618 active Wikipedia projects. The dataset contained the logs of over 115 million article revisions and 15 million talk entries, both representing the activity of 15 million unique Wikipedians altogeth...
This paper is an attempt to lay out foundations for a general theory of coincidence in information spaces such as the World Wide Web, expanding on existing work on bursty structures in document streams and information cascades. We elaborate on the hypothesis that every resource that is published in an information space enters a temporary interacti...
Designing an effective and sustainable citizen science (CS) project requires consideration of a great number of factors. This makes the overall process unpredictable, even when a sound, user-centred design approach is followed by an experienced team of UX designers. Moreover, when such systems are deployed, the complexity of the resulting interactio...
Motivated by the significant amount of successful collaborative problem solving activity on the Web, we ask: Can the accumulated information propagation behavior on the Web be conceived as a giant machine, and reasoned about accordingly? In this paper we elaborate a thesis about the computational capability embodied in information sharing activitie...
‘Crowdsourcing’ describes the diverse practices of online distributed knowledge production whereby the Web is used to host collaborative platforms that focus large volumes of information, from multiple users, towards specific tasks. Sometimes described as the ‘wisdom of the crowd’ or ‘collective intelligence’, crowdsourcing was originally conceived...
In this paper, we address the problem of finding Named Entities in very large micropost datasets. We propose methods to generate a sample of representative microposts by discovering tweets that are likely to refer to new entities. Our approach is able to significantly speed up the semantic analysis process by discarding retweets, tweets without pre...
The emergence of Big Data is both promising and challenging for social research. This article suggests that realising this promise has been restricted by the methods applied in social science research, which undermine our potential to apprehend the qualities that make Big Data so appealing, not least in relation to the sociology of networks and flo...
The recent emergence of online citizen science is illustrative of an efficient and effective means to harness the crowd in order to achieve a range of scientific discoveries. Fundamentally, citizen science projects draw upon crowds of non-expert volunteers to complete short Tasks, which can vary in domain and complexity. However, unlike most human-...
In this paper we outline some of the challenges for social media analytics and - at the same time - challenge existing approaches to social media analysis. Specifically, we suggest that there is an unhelpful gulf between social scientific approaches, which offer rich theoretical and methodological understandings of the social; and computational app...
We conducted a quantitative analysis of ten citizen science projects hosted on the Zooniverse platform, using a data set of over 50 million activity records and more than 250,000 users, collected between December 2010 and July 2013. We examined the level of participation of users in Zooniverse discussion forums in relation to their contributions t...
Web Observatories aim to develop techniques and methods to allow researchers to interrogate and answer questions about society through the multitudes of digital traces people now create. In this paper, we propose that a possible path towards surmounting the inevitable obstacle that personal privacy poses to such a goal is to keep data with individual...
In this paper we present a socio-technical framework for understanding the Web, which attempts to re-integrate the micro perspective of engineered activity with the macro perspective of emergent global phenomena. Our conceptualization of the Web’s growth is grounded in a social theoretical approach to the interactions between humans and technologie...
The Web has grown to be an integral part of modern society offering novel ways for humans to communicate, interact, and share information. New collaborative platforms are forming which are providing individuals with new communities and knowledge bases and, at the same time, offering insights into human activity for researchers, policy-makers and en...
‘Big data’ is the latest buzz phrase in health services research and health care policy circles. Dramatic transformations and advances are promised but is the hyperbole justified?
Linked data technologies provide advantages in terms of interoperability and integration, which, in certain cases, come at the cost of performance. The Web Observatory, a global Web Science research project, is providing a benchmark infrastructure to understand and address the challenges of analytics on distributed Linked Data infrastructures.
Wikipedia has grown to become the most successful online encyclopedia on the Web, containing over 24 million articles, offered in over 240 languages. In just over 10 years Wikipedia has transformed from being just an encyclopedia of knowledge, to a wealth of facts and information, from articles discussing trivia, political issues, geographies and d...
The Web represents a collection of socio-technical activities inter-operating using a set of common protocols and standards. Online banking, web TV, internet shopping, e-government and social networking are all different kinds of human interaction that have recently leveraged the capabilities of the Web architecture. Activities that have human and...
A new kind of activity – fuelled not only by the capabilities that modern Web technologies offer, but also by a change in social practices and expectations – has recently become the centre of much attention and discussion; it involves the curation and publication of Government data in a free, open format. Open Government is set to become a major aspe...
The framework introduced in this paper aims to reflect the characteristics that social machines have been described to have. The framework uses a mixed methods approach underpinned by social theory to provide a detailed and rich understanding of the socio-technical nature of a social machine. The strength of this lies in the diversity of the data...
Web Science is now well recognized as an interdisciplinary field, drawing on research from the computational, natural and social sciences. These disciplines bring diverse theoretical and methodological approaches, providing alternative perspectives and insight into Web activity. Consequently, Web Science faces the challenge of developing research m...
Studies have identified scale-free networks – a real-world and man-made phenomenon – in networks such as the human brain, protein networks, market investment networks, journal co-citation networks and the World Wide Web. Common properties such as preferential attachment and growth enable these networks to be classified as scale-free, which belong...
Social networks provide a new and exciting way for individuals, businesses, organizations and governments to create and share information. Specific social networks such as the popular micro-blogging social network site, Twitter, provide individuals with an opportunity to disseminate information to a potentially global audience. In this paper we des...
In recent years, there has been a rising number of Open Government Data (OGD) initiatives; a political, social and technical movement armed with a common goal of publishing government data in open, re-usable formats in order to improve citizen-to-government transparency, efficiency, and democracy. As a sign of commitment, the Open Government Partn...
Twitter has redefined the way social activities can be coordinated; used for mobilizing people during natural disasters, studying health epidemics, and recently, as a communication platform during social and political change. As a large scale system, the volume of data transmitted per day presents Twitter users with a problem: how can valuable cont...
From a technical perspective, the Web is a distributed information architecture that is based on the concepts of interaction (HTTP), format (HTML/RDF) and identification (URI) [5]. "Browsing", "navigating" and "information discovery" are the kinds of generic activities that web developers and information scientists concern themselves with, but the...
Whilst it is widely understood that the Web is a socio-technical phenomenon – produced by both human and non-human actors – existing research tends to emphasize either the social or the technical rather than offering an integrative analytical framework. In contrast, this paper examines the affordances of Actor Network Theory (ANT) – derived from So...
This paper examines the process of adoption of ‘Linked Open Data’ within the UK Open Public Sector Community. We use a social science approach – Actor Network Theory (ANT) – as an analytical framework which enables us to explore the formation and stabilisation of networks of actors. The analysis details the actors involved within the PSI community,...
The role of ‘the user’ is critical to the development of Web Science, a discipline that seeks to promote a multi-disciplinary understanding of the Web with regard to its evolution and its future. In this paper, we address the formulation of ‘the user’ in computer science and social science. Our aim is to explore how we might bring these differ...