Tony Hey

Science and Technology Facilities Council | STFC · Scientific Computing Department

About

347
Publications
34,038
Reads
9,819
Citations
Additional affiliations
November 2015 - present
STFC Rutherford Appleton Laboratory, Harwell, United Kingdom
Position
  • Researcher
July 2005 - November 2014
Microsoft
Position
  • Vice President
March 2001 - June 2005
Engineering and Physical Sciences Research Council
Position
  • Managing Director

Publications

Publications (347)
Article
Full-text available
Deep learning has transformed the use of machine learning technologies for the analysis of large experimental datasets. In science, such datasets are typically generated by large-scale experimental facilities, and machine learning focuses on the identification of patterns, trends and anomalies to extract meaningful scientific insights from the data...
Article
Knowing the redshift of galaxies is one of the first requirements of many cosmological experiments, and as it’s impossible to perform spectroscopy for every galaxy being observed, photometric redshift (photo-z) estimations are still of particular interest. Here, we investigate different deep learning methods for obtaining photo-z estimates directly...
Preprint
Full-text available
The breakthrough in Deep Learning neural networks has transformed the use of AI and machine learning technologies for the analysis of very large experimental datasets. These datasets are typically generated by large-scale experimental facilities at national laboratories. In the context of science, scientific machine learning focuses on training mac...
Preprint
Full-text available
Knowing the redshift of galaxies is one of the first requirements of many cosmological experiments, and as it's impossible to perform spectroscopy for every galaxy being observed, photometric redshift (photo-z) estimations are still of particular interest. Here, we investigate different deep learning methods for obtaining photo-z estimates directly...
Article
Full-text available
Obtaining accurate photometric redshift (photo-z) estimations is an important aspect of cosmology, remaining a prerequisite of many analyses. In creating novel methods to produce photo-z estimations, there has been a shift towards using machine learning techniques. However, there has not been as much of a focus on how well different machine learnin...
Preprint
Full-text available
Obtaining accurate photometric redshift estimations is an important aspect of cosmology, remaining a prerequisite of many analyses. In creating novel methods to produce redshift estimations, there has been a shift towards using machine learning techniques. However, there has not been as much of a focus on how well different machine learning methods...
Article
Full-text available
Digital technology is having a major impact on many areas of society, and there is equal opportunity for impact on science. This is particularly true in the environmental sciences as we seek to understand the complexities of the natural environment under climate change. This perspective presents the outcomes of a summit in this area, a unique cross...
Conference Paper
There is now broad recognition within the scientific community that the ongoing deluge of scientific data is fundamentally transforming academic research. Turing Award winner Jim Gray referred to this revolution as “The Fourth Paradigm: Data-Intensive Scientific Discovery”. Researchers now need tools and technologies to manipulate, analyze, visuali...
Article
Full-text available
This paper reviews some of the challenges posed by the huge growth of experimental data generated by the new generation of large-scale experiments at UK national facilities at the Rutherford Appleton Laboratory (RAL) site at Harwell near Oxford. Such ‘Big Scientific Data’ comes from the Diamond Light Source and Electron Microscopy Facilities, the I...
Preprint
Full-text available
This paper reviews some of the challenges posed by the huge growth of experimental data generated by the new generation of large-scale experiments at UK national facilities at the Rutherford Appleton Laboratory site at Harwell near Oxford. Such "Big Scientific Data" comes from the Diamond Light Source and Electron Microscopy Facilities, the ISIS Ne...
Article
Full-text available
Book
Computers now impact almost every aspect of our lives, from our social interactions to the safety and performance of our cars. How did this happen in such a short time? And this is just the beginning … In this book, Tony Hey and Gyuri Pápay lead us on a journey from the early days of computers in the 1930s to the cutting-edge research of the presen...
Book
Volume 1 of this revised and updated edition provides an accessible and practical introduction to the first gauge theory included in the Standard Model of particle physics: quantum electrodynamics (QED). The book includes self-contained presentations of electromagnetism as a gauge theory as well as relativistic quantum mechanics. It provides a uniq...
Conference Paper
This presentation will set out the eScience agenda by explaining the current scientific data deluge and the case for a “Fourth Paradigm” for scientific exploration. Examples of data intensive science will be used to illustrate the explosion of data and the associated new challenges for data capture, curation, analysis, and sharing. The role of clou...
Article
Data-intensive science is now taking a place alongside theoretical science, experimental science, and computational science as a fundamental research paradigm.
Chapter
e-Science and Licklider: It is no coincidence that it was at CERN, the particle-physics accelerator laboratory in Geneva, that Tim Berners-Lee invented the World Wide Web. Given the distributed nature of the multi-institute collaborations required for modern particle-physics experiments, the particle-physics community desperately needed a tool for e...
Article
We live in an era in which scientific discovery is increasingly driven by data exploration of massive datasets. Scientists today are envisioning diverse data analyses and computations that scale from the desktop to supercomputers, yet often have difficulty designing and constructing software architectures to accommodate the heterogeneous and often...
Article
Full-text available
We are now seeing governments and funding agencies looking at ways to increase the value and pace of scientific research through increased or open access to both data and publications. In this point of view article, we wish to look at another aspect of these twin revolutions, namely, how to enable developers, designers and researchers to build intu...
Article
Facilitating time to discovery by scientists through computing is an idea that has been around for a significant amount of time. Often, however, domain scientists and computer scientists work in parallel rather than in collaboration with one another. It has been a goal of this workshop (the fifth of its kind) to showcase the examples of such collab...
Article
For decades, computer scientists have tried to teach computers to think like human experts. Until recently, most of those efforts have failed to come close to generating the creative insights and solutions that seem to come naturally to the best researchers, doctors, and engineers. But now, Tony Hey, a VP of Microsoft Research, says we're witnessin...
Conference Paper
This talk will review the revolutionary opportunities that access to Cloud Computing Resources will bring to Businesses and Governments, Universities and Consumers. It will include a survey of the different types of Cloud Services and give some specific examples of Microsoft Cloud Services including Azure. The talk will conclude by looking at some...
Article
Full-text available
A number of researchers associated with the field of computing share their views on smart semantic computing and its role in research. These researchers believe that there is an opportunity to involve users, who have emerged as producers and consumers of information on the Web. Bioinformatics research demonstrates the potential benefits of semanti...
Article
Full-text available
The vision of a network of seamlessly integrated distributed services that provide access to computation, data, and other resources is known simply as the 'Grid'. This paper proposes an evolutionary process to standardizing services and protocols necessary to support applications on the Grid. An architecture with a minimal set of services to suppor...
Article
Full-text available
The demands of data-intensive science represent a challenge for diverse scientific communities.
Conference Paper
Full-text available
Data intensive computing presents a significant challenge for traditional supercomputing architectures that maximize FLOPS, since CPU speed has surpassed the I/O capabilities of HPC systems and Beowulf clusters. We present the architecture for a three-tier commodity component cluster designed for a range of data intensive computations operating on petasc...
Article
This chapter finds that computational science has evolved into a more scientific methodology. Computer science professionals are developing software tools and applications to develop a cyberinfrastructure necessary for carrying out scientific research in collaboration with the global research network. The efforts of these professionals have resulte...
Article
Full-text available
Will the possibilities for mass creativity on the Internet be realized or squandered, asks Tony Hey.
Conference Paper
This talk will trace the origins of MPI from the early message-passing, distributed memory, parallel computers in the 1980s, to today’s parallel supercomputers. In these early days, parallel computing companies implemented proprietary message-passing libraries to support distributed memory parallel programs. At the time, there was also great insta...
Conference Paper
The 20th century brought about an "information revolution" which has forever altered the way we work, communicate, and live. In the 21st century, it is hard to imagine working without an increasingly broad array of supporting technologies and the digital data they provide. The care, management, and preservation of this tidal wave of data have become...
Chapter
Performance visualization is the use of graphical display techniques for the analysis of performance data in order to improve understanding of complex performance phenomena. Performance visualization systems for parallel programs have been helpful in the past and they are commonly used in order to improve parallel program performance. However, desp...
Article
Full-text available
Explains the nature of the ‘e-Science’ revolution in 21st century scientific research and its consequences for the library community. The concepts of e-Science are illustrated by a discussion of the CombeChem, eBank and SmartTea projects. The issue of open access is then discussed with reference to arXiv, PubMed Central and EPrints. The challenges...
Chapter
In this paper we investigate some of the important factors which affect the message-passing performance on parallel computers. Variations of low-level communication benchmarks that approximate realistic cases are tested and compared with the available Parkbench codes and benchmarking techniques. The results demonstrate a substantial divergence betw...
Chapter
The paper discusses the parallel programming lessons learnt from the ESPRIT SuperNode project that developed the T800 Transputer. After a brief review of some purportedly portable parallel programming environments, the Genesis parallel benchmarking project is described. The next generation of Transputer components are being developed in the ESPRIT-...
Chapter
Structural performance analysis of the NAS parallel benchmarks is used to time code sections and specific classes of activity, such as communication or data movements. This technique is called white-box benchmarking because, similarly to the white-box methodologies used in program testing, the programs are not treated as black boxes. The timing methodol...
Conference Paper
Full-text available
The panel will discuss various aspects related to an invitational meeting held at the Mellon Foundation On April 20th and 21st 2006 aimed at identifying concrete steps that can be taken to reach new levels of interoperability across scholarly repositories. The focus of the meeting was specifically on repository interfaces that support locating, ide...
Conference Paper
The Internet was the inspiration of J. C. R. Licklider when he was at the Advanced Research Projects Agency in the 1960s. In those pre-Moore's Law days, Licklider imagined a future in which researchers could access and use computers and data from anywhere in the world. Today, as everyone knows, the killer applications for the Internet were email in t...
Article
Full-text available
Here we describe the requirements of an e-Infrastructure to enable faster, better, and different scientific research capabilities. We use two application exemplars taken from the United Kingdom’s e-Science Programme to illustrate these requirements and make the case for a service-oriented infrastructure. We provide a brief overview of the UK ‘‘pl...
Article
This editorial describes four papers that summarize key Grid technology capabilities to support distributed e-Science applications. These papers discuss the Condor system supporting computing communities, the OGSA-DAI service interfaces for databases, the WS-I+ Grid Service profile and finally WS-GAF (the Web Service Grid Application Framework). We...
Article
Full-text available
The UK e-Science Programme is a £250M, 5-year initiative which has funded over 100 projects. These application-led projects are under-pinned by an emerging set of core middleware services that allow the coordinated, collaborative use of distributed resources. This set of middleware services runs on top of the research network and beneath the applic...
Article
Performance engineering can be described as a collection of techniques and methodologies whose aim is to provide reliable prediction, measurement and validation of the performance of applications on a variety of computing platforms. This paper reviews techniques for performance estimation and performance engineering developed at the University of S...
Article
The U.K. e-Science Programme is a £250 million, five-year initiative which has funded over 100 projects. These application-led projects are underpinned by an emerging set of core middleware services that allow the coordinated, collaborative use of distributed resources. This set of middleware services runs on top of the research network and beneath...
Article
Applications on Grids require scalable and online performance analysis tools. The execution environment of such applications includes a large number of processors. In addition, some of the resources such as the network will be shared with other applications. ...
Conference Paper
Performance engineering can be described as a collection of techniques and methodologies whose aim is to provide reliable prediction, measurement and validation of the performance of applications on a variety of computing platforms. This paper reviews techniques for performance estimation and performance engineering developed at the University of S...
Article
Full-text available
The U.K. e-Science Programme is a £250 million, five-year initiative which has funded over 100 projects. These application-led projects are underpinned by an emerging set of core middleware services that allow the coordinated, collaborative use of distributed resources. This set of middleware services runs on top of the research network and beneath...
Article
Full-text available
The Berlin Declaration (2003) defines open access contributions as including: 'original scientific research results, raw data and metadata, source materials, digital representations of pictorial and graphical materials and scholarly multimedia material'. This talk is mainly concerned with open access to data. The purpose of the UK e-Science initiative...
Article
This chapter discusses the revolutionary changes in technology and methodology driving scientific and engineering communities to embrace Grid technologies. Today, the scientific community still leads the way as early attempts in Grid computing evolve to the more sophisticated and ubiquitous “virtual organization.” The UK e-Science concept, the NSF...
Conference Paper
The talk will introduce the concept of e-Science and briefly describe some of the main features of the £250M, 5-year UK e-Science Programme. This review will include examples of e-Science applications not only for science and engineering but also for e-Health and the e-Enterprise. The importance of data curation will be emphasized and the move towar...
Article
Full-text available
In November 2000, the government announced a new e-Science initiative in the UK with some £120m funding. In this paper we describe a range of the applications that are now on-going within the e-Science Programme and we detail one of the focuses of the e-Science Core Programme, that of building a UK e-Science Grid. We briefly discuss some of the mid...
Article
The 2004 UK Parliamentary Select Committee on Science and Technology essentially adopted the Southampton recommendations in full, recommending that open-access provision through institutional self-archiving should be made mandatory for all journal articles resulting from UK-funded research.
Article
The UK should maximise the benefits to the British tax-payer from the research it funds by mandating not only (as it does now) that all findings should be published, but also that open access to them should be provided, for all potential users, through either of the two available means: (1) publishing them in open-access journals (whenever suitable...
Conference Paper
This panel will provide a forum for a discussion of important and timely issues surrounding the global deployment of cyberinfrastructure to support science and engineering research activities. Representatives of funding agencies, existing cyberinfrastructure projects, specific technologies and social scientists involved in the evaluation of these t...
Article
Full-text available
After a definition of e-science and the Grid, the paper begins with an overview of the technological context of Grid developments. NASA's Information Power Grid is described as an early example of a 'prototype production Grid'. The discussion of e-science and the Grid is then set in the context of the UK e-Science Programme and is illustrated with...
Chapter
This paper previews the imminent flood of scientific data expected from the next generation of experiments, simulations, sensors and satellites. In order to be exploited by search engines and data mining software tools, such experimental data needs to be annotated with relevant metadata giving information as to provenance, content, conditions and s...
Chapter
Summary of the book: Part A, Overview; Part B, Architecture and Technologies of the Grid; Part C, Grid Computing Environments; Part D, Grid Applications; References.
Article
This paper describes the £120M UK ‘e-Science’ (http://www.research-councils.ac.uk/ and http://www.escience-grid.org.uk) initiative and begins by defining what is meant by the term e-Science. The majority of the £120M, some £75M, is funding large-scale e-Science pilot projects in many areas of science and engineering. The infrastructure needed to su...
Article
This paper describes the £120M UK `e-Science' (http://www.research-councils.ac.uk/ and http://www.escience-grid.org.uk) initiative and begins by defining what is meant by the term e-Science. The majority of the £120M, some £75M, is funding large-scale e-Science pilot projects in many areas of science and engineering. The infrastructure needed to su...
Conference Paper
Full-text available
The aim of the project described in this paper was to use modern software component technologies such as CORBA, Java and XML for the development of key modules which can be used for the rapid prototyping of application specific Problem Solving Environments (PSE). The software components developed in this project were a user interface, scheduler, mo...
Article
Full-text available
This paper describes a Web based document management system developed as a Lotus Domino application and the continuing research work of providing users with a variety of link services and agents that enhance the basic content of the system. The system is designed for use by administration personnel in an academic environment taking into account the...
Article
After defining what is meant by the term 'e-Science', this talk will survey the activity on e-Science and Grids in Europe. The two largest initiatives in Europe are the European Commission's portfolio of Grid projects and the UK e-Science program. The EU, under its R&D Framework Programme, is funding nearly twenty Grid projects in a wide variety of appl...
Article
The article discusses a UK programme that aims to dramatically change the way scientific research is done by using grid computing to make collaboration easier and provide access to big computing resources. There are two major components to the programme: the science and the infrastructure to support that science. The infrastructure is generall...
Article
Full-text available
Conference Paper
Full-text available
The talk describes the £120M UK ‘e-Science’ initiative and begins by defining what is meant by the term e-Science. The majority of the £120M, some £85M, is for the support of large-scale e-Science projects in many areas of science and engineering. The infrastructure needed to support such projects needs to allow sharing of distributed and hetero...
Article
This paper describes the £120M UK ‘e-Science’ initiative and begins by defining what is meant by the term e-Science. The majority of the £120M, some £85M, is funding large-scale e-Science pilot projects in many areas of science and engineering. The infrastructure needed to support such projects must permit routine sharing of distributed and heterog...
Article
Full-text available
The paper examines the roles played by superposition and entanglement in quantum computing. The analysis is illustrated by discussion of a “classical” electronic implementation of Grover’s quantum search algorithm. It is shown explicitly that the absence of multi-particle entanglement leads to exponentially rising resources for implementing such qu...
Article
Full-text available
Performance Engineering is concerned with the reliable prediction and estimation of the performance of scientific and engineering applications on a variety of parallel and distributed hardware. This paper reviews the present state of the art in ‘Performance Engineering’ for both parallel computing and meta-computing environments and attempts to loo...