Publications: 20
Reads: 2,874
Citations: 196
Publications (20)
The communities that we live in affect our health in ways that are complex and hard to define. Moreover, our understanding of the place-based processes affecting health and inequalities is limited. This undermines the development of robust policy interventions to improve local health and well-being. News media provides social and community informat...
We present a demonstrator that visualizes the Twittersphere debate on whether the UK should remain in or leave the European Union. Data is collected using three strategies: hashtag search terms, extraction from the full stream and following specific users. The demonstrator can be used to show the different discussion topics identified by the differ...
Twitter has identified 2,752 accounts that it believes are linked to the Internet Research Agency (IRA), a Russian company that creates online propaganda. These accounts are known to have tweeted about the US 2016 Elections and the list was submitted as evidence by Twitter to the United States Senate Judiciary Subcommittee on Crime and Terrorism. T...
In a review of automated and malicious activity, Twitter released a list of accounts that it believed were connected to state-sponsored manipulation of the 2016 American election. This list details 2,752 accounts Twitter believed to be controlled by Russian operatives. In the absence of a similar list of operatives active within the debate on th...
This work is produced by researchers at the Neuropolitics Research Lab, School of Social and Political Science and the School of Informatics at the University of Edinburgh. In this report we provide an analysis of the social media posts on the British general election 2017 over the month running up to the vote. We find that pro-Labour sentiment dom...
Various methods can be used for searching or streaming Twitter data to gather a sample on a specific topic. All of these methods introduce a bias into the resulting datasets. Here we examine, and try to define, the bias that the different strategies introduce. Understanding the bias means that we can extrapolate wider meaning from the data in a mor...
We investigate methods for collecting data to form an archive of the Twitter debate surrounding the UK's membership of the EU. We use three strategies: gathering data using hashtags, extracting data from the random stream, and collecting from users known to be discussing the debate. We explore the various biases in the resulting datasets.
Datasets extracted from the microblogging service Twitter are often generated using specific query terms or hashtags. We describe how a dataset produced using the query term ‘syria’ can be increased in size to include tweets on the topic of Syria that do not contain that query term. We compare three methods for this task, using the top hashtags fro...
An increasing amount of news content is produced in audio-video form every day. To effectively analyse and monitor this multilingual data stream, we require methods to extract and present audio content in accessible ways. In this paper, we describe an end-to-end system for processing and browsing audio news data. This fully automated system brin...
Using text analysis tools to study large data sets is currently an area of popular interest. Prompted by the success of several big data research initiatives, researchers from a variety of disciplines wish to gather and analyse textual data [7]. Communication between members of diverse teams can present a problem and developing a shared language an...
This work investigates summarizing the conversations that occur in the comments section of the UK newspaper the Guardian. In the comment summarization task comments are clustered and ranked within the cluster. The top comments from each cluster are used to give an overview of that cluster. It was found that topic model clustering gave the most agre...
Automatic text analysis tools have significant potential to improve the productivity of those who organise large collections of data. However, to be effective, they have to be both technically efficient and provide a productive interaction with the user. Geographic referencing of historical botanical data is difficult, time-consuming and relies hea...
JSTOR is a not-for-profit organization dedicated to helping the scholarly community discover, use and build upon a large range of intellectual content in a trusted digital archive. JSTOR has created a new tool called “Data for Research” (DfR) that allows users to interact with the corpus in new ways. Using DfR, researchers can now explore the content visu...
This poster evaluates the OAI-ORE specifications through experiments providing access to the JSTOR digital archive and the Flickr website. A browser-based dynamic graph visualization tool was designed and tested to determine if making the topology of the information available would provide end-user benefits in terms of navigation and discovery.
The strengths within six library collections were automatically determined through automated enrichment and analysis of bibliographic-level metadata records, with a view towards efficient resource sharing and collaborative collection management. This involved very large-scale deduplication, enrichment and automatic reclassification of records usin...
We describe a curated harvesting approach to creating and maintaining a subject portal, comprising selected records harvested from remote services via information retrieval standards such as SRU, Z39.50 and OAI-PMH. The result was a web-based data curation interface where administrative users can configure access to remote resources, queries to be...