Ben Blamey

Cardiff University | CU · School of Computer Science and Informatics

Doctor of Philosophy

About

17
Publications
2,433
Reads
64
Citations

Publications (17)
Conference Paper
Full-text available
Binary sentiment classification, or sentiment analysis, is the task of computing the sentiment of a document, i.e. whether it contains broadly positive or negative opinions. The topic is well-studied, and the intuitive approach of using words as classification features is the basis of most techniques documented in the literature. The alternative ch...
Article
Full-text available
Background: Large streamed datasets, characteristic of life science applications, are often resource-intensive to process, transport and store. We propose a pipeline model, a design pattern for scientific pipelines, where an incoming stream of scientific data is organized into a tiered or ordered “data hierarchy”. We introduce the HASTE Toolkit, a p...
Preprint
Full-text available
This paper introduces the HASTE Toolkit, a cloud-native software toolkit capable of partitioning data streams in order to prioritize usage of limited resources. This in turn enables more efficient data-intensive experiments. We propose a model that introduces automated, autonomous decision making in data pipelines, such that a stream of data can b...
Chapter
Full-text available
Many scientific computing applications generate streams where message sizes exceed one megabyte, in contrast with smaller message sizes in enterprise contexts (on the order of kilobytes, often XML or JSON). Furthermore, the processing cost of messages in scientific computing applications is usually an order of magnitude higher than in typical enterprise app...
Preprint
Full-text available
Data stream processing frameworks provide reliable and efficient mechanisms for executing complex workflows over large datasets. A common challenge for the majority of currently available streaming frameworks is efficient utilization of resources. Most frameworks use static or semi-static settings for resource utilization that work well for establi...
Preprint
Full-text available
Whilst computational resources at the cloud edge can be leveraged to improve latency and reduce the costs of cloud services for a wide variety of mobile, web, and IoT applications, such resources are naturally constrained. For distributed stream processing applications, there are clear advantages to offloading some processing work to the cloud edge. M...
Preprint
Full-text available
Top-K queries are an established heuristic in information retrieval. This paper presents an approach for optimal tiered storage allocation under stream processing workloads using this heuristic: those requiring the analysis of only the top-K ranked most relevant, or most interesting, documents from a fixed-length stream, stream window, or batch j...
Preprint
Full-text available
Studies have demonstrated that Apache Spark, Flink and related frameworks can perform stream processing at very high frequencies, whilst tending to focus on small messages with a computationally light 'map' stage for each message; a common enterprise use case. We add to these benchmarks by broadening the domain to include loads with larger messages...
Article
Full-text available
Detecting and understanding temporal expressions are key tasks in natural language processing (NLP), and are important for event detection and information retrieval. In the existing approaches, temporal semantics are typically represented as discrete ranges or specific dates, and the task is restricted to text that conforms to this representation....
Chapter
Full-text available
Binary sentiment classification, or sentiment analysis, is the task of computing the sentiment of a document, i.e. whether it contains broadly positive or negative opinions. The topic is well-studied, and the intuitive approach of using words as classification features is the basis of most techniques documented in the literature. The alternative ch...
