About
370 Publications
46,122 Reads
11,177 Citations
Current institution: STFC Rutherford Appleton Laboratory, Harwell, United Kingdom
Additional affiliations
November 2015 - present: STFC Rutherford Appleton Laboratory, Harwell, United Kingdom (Position: Researcher)
March 2001 - June 2005
January 1986 - July 2001
Publications (370)
With machine learning (ML) becoming a transformative tool for science, the scientific community needs a clear catalogue of ML techniques and their relative benefits on various scientific problems if it is to make significant advances in science using AI. Although this comes under the purview of benchmarking, conventional benchmarking initiati...
With the Square Kilometre Array (SKA) project and the new Multi-Purpose Reactor (MPR) soon coming on-line, South Africa and other collaborating countries in Africa will need to make the management, analysis, publication, and curation of “Big Scientific Data” a priority. In addition, the recent draft Open Science policy from the South African Depart...
Deep learning has transformed the use of machine learning technologies for the analysis of large experimental datasets. In science, such datasets are typically generated by large-scale experimental facilities, and machine learning focuses on the identification of patterns, trends and anomalies to extract meaningful scientific insights from the data...
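As a concrete illustration of the anomaly-identification task mentioned above, here is a minimal sketch using an isolation forest on synthetic data. The method and all numbers are illustrative assumptions, since the abstract does not name a specific algorithm.

```python
# Sketch of anomaly identification in experimental data, one of the
# tasks named in the abstract above. The isolation forest is purely
# an illustrative choice; the data is synthetic.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(1000, 4))   # bulk of the measurements
outliers = rng.normal(6.0, 1.0, size=(10, 4))   # injected anomalies
data = np.vstack([normal, outliers])

detector = IsolationForest(contamination=0.01, random_state=0).fit(data)
flags = detector.predict(data)                  # -1 marks anomalies
print(f"flagged {np.sum(flags == -1)} of {len(data)} samples")
```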
Knowing the redshift of galaxies is one of the first requirements of many cosmological experiments, and as it’s impossible to perform spectroscopy for every galaxy being observed, photometric redshift (photo-z) estimations are still of particular interest. Here, we investigate different deep learning methods for obtaining photo-z estimates directly...
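A minimal sketch of the kind of image-based photo-z regression described above, assuming five-band galaxy cutouts. The architecture, shapes, and training data here are illustrative placeholders, not the networks studied in the paper.

```python
# Minimal sketch of a CNN photo-z regressor. All names, shapes, and
# hyperparameters are illustrative assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def build_photoz_cnn(cutout_size=64, n_bands=5):
    """CNN mapping multi-band galaxy cutouts to a redshift estimate."""
    return models.Sequential([
        layers.Input(shape=(cutout_size, cutout_size, n_bands)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(1, activation="relu"),  # photo-z is non-negative
    ])

model = build_photoz_cnn()
model.compile(optimizer="adam", loss="mse")

# Dummy data standing in for (image, spectroscopic redshift) pairs.
x = np.random.rand(256, 64, 64, 5).astype("float32")
z_spec = np.random.uniform(0.0, 2.0, size=(256, 1)).astype("float32")
model.fit(x, z_spec, epochs=1, batch_size=32, verbose=0)
z_phot = model.predict(x, verbose=0)
```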
The breakthrough in Deep Learning neural networks has transformed the use of AI and machine learning technologies for the analysis of very large experimental datasets. These datasets are typically generated by large-scale experimental facilities at national laboratories. In the context of science, scientific machine learning focuses on training mac...
Obtaining accurate photometric redshift (photo-z) estimations is an important aspect of cosmology, remaining a prerequisite of many analyses. In creating novel methods to produce photo-z estimations, there has been a shift towards using machine learning techniques. However, there has not been as much of a focus on how well different machine learnin...
Obtaining accurate photometric redshift estimations is an important aspect of cosmology, remaining a prerequisite of many analyses. In creating novel methods to produce redshift estimations, there has been a shift towards using machine learning techniques. However, there has not been as much of a focus on how well different machine learning methods...
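For readers comparing photo-z methods as the two abstracts above describe, the conventional point-estimate metrics (bias, NMAD scatter, and outlier fraction) can be computed as follows. The abstracts do not name their exact metrics, so these standard definitions are an assumption.

```python
# Standard point-estimate metrics commonly used to compare photo-z
# methods, computed on the normalised residuals (z_phot - z_spec)/(1 + z_spec).
import numpy as np

def photoz_metrics(z_phot, z_spec, outlier_cut=0.15):
    """Bias, NMAD scatter, and outlier fraction of photo-z estimates."""
    dz = (z_phot - z_spec) / (1.0 + z_spec)
    bias = np.mean(dz)
    # Normalised median absolute deviation: a robust scatter estimate.
    nmad = 1.4826 * np.median(np.abs(dz - np.median(dz)))
    outlier_frac = np.mean(np.abs(dz) > outlier_cut)
    return {"bias": bias, "nmad": nmad, "outlier_fraction": outlier_frac}

z_spec = np.array([0.10, 0.50, 1.00, 1.50])
z_phot = np.array([0.12, 0.48, 1.30, 1.52])
print(photoz_metrics(z_phot, z_spec))
```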
Digital technology is having a major impact on many areas of society, and there is equal opportunity for impact on science. This is particularly true in the environmental sciences as we seek to understand the complexities of the natural environment under climate change. This perspective presents the outcomes of a summit in this area, a unique cross...
There is now broad recognition within the scientific community that the ongoing deluge of scientific data is fundamentally transforming academic research. Turing Award winner Jim Gray referred to this revolution as “The Fourth Paradigm: Data-Intensive Scientific Discovery”. Researchers now need tools and technologies to manipulate, analyze, visuali...
This paper reviews some of the challenges posed by the huge growth of experimental data generated by the new generation of large-scale experiments at UK national facilities at the Rutherford Appleton Laboratory (RAL) site at Harwell near Oxford. Such ‘Big Scientific Data’ comes from the Diamond Light Source and Electron Microscopy Facilities, the I...
This paper reviews some of the challenges posed by the huge growth of experimental data generated by the new generation of large-scale experiments at UK national facilities at the Rutherford Appleton Laboratory site at Harwell near Oxford. Such "Big Scientific Data" comes from the Diamond Light Source and Electron Microscopy Facilities, the ISIS Ne...
Computers now impact almost every aspect of our lives, from our social interactions to the safety and performance of our cars. How did this happen in such a short time? And this is just the beginning … In this book, Tony Hey and Gyuri Pápay lead us on a journey from the early days of computers in the 1930s to the cutting-edge research of the presen...
Volume 1 of this revised and updated edition provides an accessible and practical introduction to the first gauge theory included in the Standard Model of particle physics: quantum electrodynamics (QED). The book includes self-contained presentations of electromagnetism as a gauge theory as well as relativistic quantum mechanics. It provides a uniq...
This presentation will set out the eScience agenda by explaining the current scientific data deluge and the case for a “Fourth Paradigm” for scientific exploration. Examples of data intensive science will be used to illustrate the explosion of data and the associated new challenges for data capture, curation, analysis, and sharing. The role of clou...
Data-intensive science is now taking a place alongside theoretical science, experimental science, and computational science as a fundamental research paradigm.
Underpinning all the other branches of science, physics affects the way we live our lives, and ultimately how life itself functions. Recent scientific advances have led to dramatic reassessment of our understanding of the world around us, and made a significant impact on our lifestyle. In this book, leading international experts, including Nobel pr...
We live in an era in which scientific discovery is increasingly driven by data exploration of massive datasets. Scientists today are envisioning diverse data analyses and computations that scale from the desktop to supercomputers, yet often have difficulty designing and constructing software architectures to accommodate the heterogeneous and often...
We are now seeing governments and funding agencies looking at ways to increase the value and pace of scientific research through increased or open access to both data and publications. In this point of view article, we wish to look at another aspect of these twin revolutions, namely, how to enable developers, designers and researchers to build intu...
Reducing scientists' time to discovery through computing is an idea that has been around for a long time. Often, however, domain scientists and computer scientists work in parallel rather than in collaboration with one another. It has been a goal of this workshop (the fifth of its kind) to showcase examples of such collab...
For decades, computer scientists have tried to teach computers to think like human experts. Until recently, most of those efforts have failed to come close to generating the creative insights and solutions that seem to come naturally to the best researchers, doctors, and engineers. But now, Tony Hey, a VP of Microsoft Research, says we're witnessin...
This talk will review the revolutionary opportunities that access to cloud computing resources will bring to businesses, governments, universities and consumers. It will include a survey of the different types of cloud services and give some specific examples of Microsoft Cloud Services, including Azure. The talk will conclude by looking at some...
A number of researchers associated with the field of computing share their views on smart semantic computing and its role in research. These researchers believe that there is an opportunity to involve users, who have emerged as producers and consumers of information on the Web. Bioinformatics research demonstrates the potential benefits of semanti...
The vision of a network of seamlessly integrated distributed services that provide access to computation, data, and other resources is known simply as the 'Grid'. This paper proposes an evolutionary process for standardizing the services and protocols necessary to support applications on the Grid. An architecture with a minimal set of services to suppor...
The demands of data-intensive science represent a challenge for diverse scientific communities.
Data-intensive computing presents a significant challenge for traditional supercomputing architectures that maximize FLOPS, since CPU speed has surpassed the I/O capabilities of HPC systems and Beowulf clusters. We present the architecture for a three-tier commodity component cluster designed for a range of data-intensive computations operating on petasc...
This chapter finds that computational science has evolved into a more scientific methodology. Computer science professionals are developing software tools and applications to develop a cyberinfrastructure necessary for carrying out scientific research in collaboration with the global research network. The efforts of these professionals have resulte...
Will the possibilities for mass creativity on the Internet be realized or squandered, asks Tony Hey.
This talk will trace the origins of MPI from the early message-passing, distributed-memory parallel computers of the 1980s to today’s parallel supercomputers. In those early days, parallel computing companies implemented proprietary message-passing libraries to support distributed memory parallel programs. At the time, there was also great insta...
The 20th century brought about an "information revolution" which has forever altered the way we work, communicate, and live. In the 21st century, it is hard to imagine working without an increasingly broad array of supporting technologies and the digital data they provide. The care, management, and preservation of this tidal wave of data has become...
Performance visualization is the use of graphical display techniques for the analysis of performance data in order to improve understanding of complex performance phenomena. Performance visualization systems for parallel programs have been helpful in the past and they are commonly used in order to improve parallel program performance. However, desp...
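A minimal sketch of the most common performance-visualization display, a per-process timeline ("Gantt chart") of trace events; the trace data below is invented for illustration.

```python
# Per-process timeline of trace events, the staple display for
# parallel program performance analysis. The trace is made up.
import matplotlib.pyplot as plt

# (process, start_time_s, duration_s, activity)
trace = [
    (0, 0.00, 0.40, "compute"), (0, 0.40, 0.10, "comm"),
    (1, 0.00, 0.30, "compute"), (1, 0.30, 0.20, "comm"),
    (0, 0.50, 0.30, "compute"), (1, 0.50, 0.35, "compute"),
]
colors = {"compute": "tab:blue", "comm": "tab:red"}

fig, ax = plt.subplots()
for proc, start, dur, kind in trace:
    ax.barh(proc, dur, left=start, color=colors[kind], edgecolor="black")
ax.set_xlabel("time (s)")
ax.set_ylabel("process")
ax.set_yticks([0, 1])
plt.savefig("timeline.png")
```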
Purpose: The purpose of this article is to explain the nature of the “e-Science” revolution in twenty-first century scientific research and its consequences for the library community.
Design/methodology/approach: The concepts of e-Science are illustrated by a discussion of the CombeChem, eBank and SmartTea projects. The issue of open access is then...
The panel will discuss various aspects of an invitational meeting held at the Mellon Foundation on April 20th and 21st, 2006, aimed at identifying concrete steps that can be taken to reach new levels of interoperability across scholarly repositories. The focus of the meeting was specifically on repository interfaces that support locating, ide...
In this paper we investigate some of the important factors which affect the message-passing performance on parallel computers. Variations of low-level communication benchmarks that approximate realistic cases are tested and compared with the available Parkbench codes and benchmarking techniques. The results demonstrate a substantial divergence betw...
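The canonical low-level communication benchmark of the kind investigated above is the ping-pong test. The sketch below uses mpi4py as an assumed stand-in for the Parkbench codes; message sizes and repetition counts are illustrative.

```python
# Ping-pong latency/bandwidth benchmark sketch.
# Run with: mpiexec -n 2 python pingpong.py
import time
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

for nbytes in [8, 1024, 1024 * 1024]:
    buf = np.zeros(nbytes, dtype="uint8")
    reps = 100
    comm.Barrier()
    t0 = time.perf_counter()
    for _ in range(reps):
        if rank == 0:
            comm.Send(buf, dest=1)
            comm.Recv(buf, source=1)
        elif rank == 1:
            comm.Recv(buf, source=0)
            comm.Send(buf, dest=0)
    t1 = time.perf_counter()
    if rank == 0:
        # One ping-pong is two messages; report half the round-trip time.
        latency = (t1 - t0) / (2 * reps)
        print(f"{nbytes:>8} B  {latency * 1e6:8.1f} us  "
              f"{nbytes / latency / 1e6:8.1f} MB/s")
```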
The paper discusses the parallel programming lessons learnt from the ESPRIT SuperNode project that developed the T800 Transputer. After a brief review of some purportedly portable parallel programming environments, the Genesis parallel benchmarking project is described. The next generation of Transputer components are being developed in the ESPRIT-...
Structural performance analysis of the NAS parallel benchmarks is used to time code sections and specific classes of activity, such as communication or data movements. This technique is called white-box benchmarking because, similarly to white-box methodologies used in program testing, the programs are not treated as black boxes. The timing methodol...
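A minimal sketch of the white-box idea, timing named code sections and classes of activity rather than the whole program; the section names and workload below are illustrative, not the NAS benchmark instrumentation itself.

```python
# Accumulate wall-clock time per named code section, instead of
# treating the program as a black box. Illustrative only.
import time
from collections import defaultdict
from contextlib import contextmanager

section_totals = defaultdict(float)

@contextmanager
def section(name):
    """Accumulate wall-clock time for a named section of code."""
    t0 = time.perf_counter()
    try:
        yield
    finally:
        section_totals[name] += time.perf_counter() - t0

# Example instrumentation of a toy solver loop.
for step in range(3):
    with section("compute"):
        sum(i * i for i in range(100_000))
    with section("communication"):
        time.sleep(0.01)  # stand-in for a message exchange

for name, total in section_totals.items():
    print(f"{name:>15}: {total:.4f} s")
```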
The Internet was the inspiration of J. C. R. Licklider when he was at the Advanced Research Projects Agency in the 1960s. In those pre-Moore's Law days, Licklider imagined a future in which researchers could access and use computers and data from anywhere in the world. Today, as everyone knows, the killer applications for the Internet were email in t...
Here we describe the requirements of an e-Infrastructure to enable faster, better, and different scientific research capabilities. We use two application exemplars taken from the United Kingdom’s e-Science Programme to illustrate these requirements and make the case for a service-oriented infrastructure. We provide a brief overview of the UK “pl...
This editorial describes four papers that summarize key Grid technology capabilities to support distributed e-Science applications. These papers discuss the Condor system supporting computing communities, the OGSA-DAI service interfaces for databases, the WS-I+ Grid Service profile and finally WS-GAF (the Web Service Grid Application Framework). We...
The UK e-Science Programme is a £250M, 5-year initiative which has funded over 100 projects. These application-led projects are underpinned by an emerging set of core middleware services that allow the coordinated, collaborative use of distributed resources. This set of middleware services runs on top of the research network and beneath the applic...
Performance engineering can be described as a collection of techniques and methodologies whose aim is to provide reliable prediction, measurement and validation of the performance of applications on a variety of computing platforms. This paper reviews techniques for performance estimation and performance engineering developed at the University of S...
The U.K. e-Science Programme is a £250 million, five-year initiative which has funded over 100 projects. These application-led projects are underpinned by an emerging set of core middleware services that allow the coordinated, collaborative use of distributed resources. This set of middleware services runs on top of the research network and beneath...
Applications on Grids require scalable and online performance analysis tools. The execution environment of such applications includes a large number of processors. In addition, some of the resources such as the network will be shared with other applications. ...
The Berlin Declaration (2003) defines open access contributions as including: 'original scientific research results, raw data and metadata, source materials, digital representations of pictorial and graphical materials and scholarly multimedia material'. This talk is mainly concerned with open access to data. The purpose of the UK e-Science initiative...
This chapter discusses the revolutionary changes in technology and methodology driving scientific and engineering communities to embrace Grid technologies. Today, the scientific community still leads the way as early attempts in Grid computing evolve to the more sophisticated and ubiquitous “virtual organization.” The UK e-Science concept, the NSF...
This panel will provide a forum for a discussion of important and timely issues surrounding the global deployment of cyberinfrastructure to support science and engineering research activities. Representatives of funding agencies, existing cyberinfrastructure projects, specific technologies and social scientists involved in the evaluation of these t...
The talk will introduce the concept of e-Science and briefly describe some of the main features of the £250M, 5-year UK e-Science Programme. This review will include examples of e-Science applications not only for science and engineering but also for e-Health and the e-Enterprise. The importance of data curation will be emphasized and the move towar...
In November 2000, the government announced a new e-Science initiative in the UK with some £120m funding. In this paper we describe a range of the applications that are now on-going within the e-Science Programme and we detail one of the focuses of the e-Science Core Programme, that of building a UK e-Science Grid. We briefly discuss some of the mid...
The 2004 UK Parliamentary Select Committee on Science and Technology essentially adopted the Southampton recommendations in full, recommending that open-access provision through institutional self-archiving should be made mandatory for all journal articles resulting from UK-funded research.
The UK should maximise the benefits to the British taxpayer from the research it funds by mandating not only (as it does now) that all findings should be published, but also that open access to them should be provided, for all potential users, through either of the two available means: (1) publishing them in open-access journals (whenever suitable...
After a definition of e-science and the Grid, the paper begins with an overview of the technological context of Grid developments. NASA's Information Power Grid is described as an early example of a 'prototype production Grid'. The discussion of e-science and the Grid is then set in the context of the UK e-Science Programme and is illustrated with...
Following the success of The Quantum Universe, first published in 1987, a host of exciting new discoveries have been made in the field of quantum mechanics. The New Quantum Universe provides an up-to-date and accessible introduction to the essential ideas of quantum physics, and demonstrates how it affects our everyday life. Quantum mechanics gives...
This paper previews the imminent flood of scientific data expected from the next generation of experiments, simulations, sensors and satellites. In order to be exploited by search engines and data mining software tools, such experimental data needs to be annotated with relevant metadata giving information as to provenance, content, conditions and s...
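A minimal sketch of the kind of metadata record the paper argues for, carrying provenance, content, and condition information alongside a dataset so that search engines and data-mining tools can exploit it. The schema and values below are illustrative assumptions, not a published standard.

```python
# Illustrative dataset metadata record; field names and values are
# hypothetical, not a real schema or identifier.
import json
from dataclasses import dataclass, field, asdict

@dataclass
class DatasetMetadata:
    identifier: str
    title: str
    provenance: dict      # who/what produced the data, and when
    conditions: dict      # experimental conditions at capture time
    keywords: list = field(default_factory=list)

record = DatasetMetadata(
    identifier="doi:10.0000/example",   # hypothetical identifier
    title="Neutron scattering run 42",
    provenance={"facility": "ISIS", "instrument": "example-beamline",
                "captured": "2002-06-01T12:00:00Z"},
    conditions={"temperature_K": 4.2, "sample": "example-crystal"},
    keywords=["neutron", "scattering"],
)
print(json.dumps(asdict(record), indent=2))
```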
Summary of the book: Part A: Overview; Part B: Architecture and Technologies of the Grid; Part C: Grid Computing Environments; Part D: Grid Applications; References.
Introduction; Grid Applications; References.
This paper describes the £120M UK ‘e-Science’ (http://www.research-councils.ac.uk/ and http://www.escience-grid.org.uk) initiative and begins by defining what is meant by the term e-Science. The majority of the £120M, some £75M, is funding large-scale e-Science pilot projects in many areas of science and engineering. The infrastructure needed to su...