Dennis Gannon

Microsoft · Microsoft Research

Ph.D., University of Illinois

About

384
Publications
88,388
Reads
11,495
Citations
Introduction
Dennis Gannon recently retired from Microsoft Research, Microsoft. He is a Professor Emeritus at the School of Informatics, Computing and Engineering, Indiana University. Dennis does research in distributed and cloud computing, algorithms, machine learning and software systems. He recently co-authored the book Cloud Computing for Science and Engineering (https://Cloud4SciEng.org), and he publishes cloud- and ML-related projects at https://eScienceGroup.com.
Additional affiliations
January 2015 - August 2016
Microsoft
Position
  • Managing Director
August 2008 - present
August 1985 - August 2008
Indiana University Bloomington
Position
  • Professor
Education
August 1976 - May 1980
University of Illinois, Urbana-Champaign
Field of study
  • Computer Science
August 1972 - June 1976
University of California, Davis
Field of study
  • Mathematics

Publications (384)
Preprint
Full-text available
We wrote about generative neural networks in two previous blog posts where we promised to return to the topic in a future update. This is it. This article is a review of some of the advances in autoencoders over the last 10 years. We present examples of denoising autoencoders, variational autoencoders and three different adversarial neural networks. The presen...
Technical Report
Full-text available
Research related to deep learning and its applications is now a substantial part of recent computer science. Much of this work involves building new, advanced models that outperform all others on well-regarded benchmarks. This is an extremely exciting period of basic research. However, for those data scientists and engineers involved in deploying d...
Technical Report
Full-text available
The world of computing is in rapid transition, now dominated by a world of smartphones and cloud services, with profound implications for the future of advanced scientific computing. Simply put, high-performance computing (HPC) is at an important inflection point. For the last 60 years, the world's fastest supercomputers were almost exclusively pro...
Preprint
Full-text available
The world of computing is in rapid transition, now dominated by a world of smartphones and cloud services, with profound implications for the future of advanced scientific computing. Simply put, high-performance computing (HPC) is at an important inflection point. For the last 60 years, the world's fastest supercomputers were almost exclusively pro...
Technical Report
Full-text available
This report describes some recent work on using AI to provide inspiration for mathematical intuition for solving hard problems. The paper describes the use of salience and integrated gradients in discovering the features of data that are related to the way a classifier makes decisions.
Technical Report
Full-text available
AI is the hottest topic in the tech industry. While it is unclear if this is a passing infatuation or a fundamental shift in the IT industry, it certainly has captured the attention of many. Much of the writing in the popular press about AI involves wild predictions or dire warnings. However, for enterprises and researchers the immediate impact of...
Technical Report
Full-text available
In Part 1 of this series, we looked at Microsoft Azure Batch and how it can be used to run MPI parallel programs. In this second part we describe the Amazon Web Services Batch service and how to use it to schedule MPI jobs using the AWS ParallelCluster command pcluster create. We conclude with a brief summary comparing AWS Batch to Azure Batch.
Technical Report
Full-text available
In Chapter 7.2 of our book, we described how to deploy a cluster and run MPI-based parallel programs in AWS and Azure. About the time we completed the book, Microsoft and Amazon introduced a suite of new technologies that greatly improve the capabilities for MPI-based computation in the public cloud. Google Cloud has provided excellent support for Slu...
Technical Report
Full-text available
This short report describes the core features of the Ray Distributed Object framework. To illustrate Ray’s actor model, we construct a simple microservice-like application with a web server frontend. We compare Ray to similar features in Parsl and provide a small performance analysis. We also provide an illustration of how Ray Tune can optimize the...
Technical Report
Full-text available
In this note we provide a simple illustration of how the 'classic' deep learning tool LSTM can be used to model the solution to time dependent differential equations. We begin with an amazingly simple case: computing the trajectory of a projectile launched into the air, encountering wind resistance as it flies and returns to earth. Given a single s...
Technical Report
Full-text available
This short report is an addendum to the report "A Look at Parsl and FuncX: Two Excellent Parallel Scripting Tools for Clouds and Supercomputers." At the conclusion of that report it was stated that one of the missing pieces of our analysis was a description of how distributed FuncX function instances can communicate with other objects to handle rem...
Technical Report
Full-text available
This is a review of two new parallel and distributed computing tools: Parsl and FuncX.
Technical Report
Full-text available
Knowledge graphs (KGs) have become an important tool for representing knowledge and accelerating search tasks. Formally, a knowledge graph is a graph database formed from entity triples of the form (subject, relation, object) where the subject and object are entity nodes in the graph and the relation defines the edges. When combined with natural la...
Technical Report
Full-text available
This tutorial gives an overview of some of the basic work that has been done over the last five years on the application of deep learning techniques to data represented as graphs. Convolutional neural networks and transformers have been instrumental in the progress on computer vision and natural language understanding. Here we look at the generaliz...
Technical Report
Full-text available
This note provides a gentle introduction to streaming data regression and prediction using Recurrent Neural Networks and Gaussian processes. We look at two examples. In the first we look at daily high temperature for two years at three different, but nearby NOAA weather stations. In the second example we look at daily confirmed infections of the Co...
Technical Report
Full-text available
Over the last two years some very interesting research has emerged that illustrates a fascinating connection between Deep Neural Nets and differential equations. There are two aspects of these discoveries that will be described here. They are 1. Many differential equations (linear, elliptical, non-linear and even stochastic PDEs) can be solved with...
Technical Report
Full-text available
A persistent problem when using deep neural networks in production is the speed of evaluating the network (known as inference) on a single input. While neural network inference has ample opportunities for using parallelism to gain speedup, these techniques are not as easy to exploit as when training the network. In this short report we will look ho...
Research Proposal
Full-text available
In this short note we look at two cloud services that provide anomaly detection. One is the Azure Cognitive Service anomaly detector and the other is from the Amazon SageMaker AI services. In both cases these services can be (mostly) installed as Docker containers which can be deployed on a modestly endowed edge computer. We will illustrate them ea...
Technical Report
Full-text available
Chapter 10.4 of "Cloud Computing for Science and Engineering" described the theory and construction of Recurrent Neural Networks for natural language processing. In the three years since the book's publication the field of language modeling has undergone a substantial revolution. Forget RNNs. Transformers are now in charge, so this report is an upd...
Technical Report
Full-text available
This is a short tutorial about deep learning in the cloud. It is added to our book as a supplemental chapter.
Technical Report
Full-text available
In 2018 I published a blog about building a cloud-resident "Research Assistant" (RA) chatbot that would be the companion of each scientist. The RA would be responsible for managing scientific data, notes and publication drafts. It could create intelligent summaries and search for important related scientific articles. That post demonstrated a simpl...
Technical Report
Full-text available
Probabilistic programming languages (PPLs) allow us to model the observed behavior of probabilistic systems in terms of its underlying latent variables. Using these models, the PPL provides tools to make inferences concerning the latent variables that give rise to specific observed behaviors. In this short report, we look at two such programming langu...
Technical Report
Full-text available
This was originally written as an invited "vision" presentation for the eScience 2019 conference, but I decided to present something more focused for that event. Never wanting to let words go to waste I decided to put a revised version here. It was written as a look back at the period of eScience from 2019 to 2050. Of course, as it is being publish...
Technical Report
Full-text available
This is a brief exploration of how to manage scientific workflows using Serverless computing in the cloud.
Technical Report
Full-text available
In Part 1 of this article we looked at how quantum computers are now beginning to appear as cloud-hosted attached processors and we also looked at how one can go about programming them. The most important question that Part 1 failed to address is why quantum computers are interesting. The most well-known uses of a quantum computer are for building...
Technical Report
Full-text available
This report provides a very basic overview of the programming tools for Quantum Computing from IBM and Microsoft.
Technical Report
Full-text available
This brief note is intended to illustrate why the programming language Julia is so interesting to a growing number of computational and data scientists. Julia is designed to deliver high performance on modern hardware while retaining the interactive capabilities that make it well suited for Jupyter-style scientific exploration. This paper illustrat...
Technical Report
Full-text available
Machine learning is a common tool used in all areas of science. Applications range from simple regression models used to explain the behavior of experimental data to novel applications of deep learning. One area that has emerged in the last few years is the use of generative neural networks to produce synthetic samples of data that fit the statisti...
Technical Report
Full-text available
We can use Alexa, Cortana and Siri to check the weather, stream our favorite music and look up facts like "who directed Casablanca?" But if I want to find all the research on quantum entanglement in the design of topological quantum computers, these services will fall short. If, in addition, I want these articles cataloged in my personal archive and...
Technical Report
Full-text available
A look at the state of the cloud for science in 2018. This paper is based on a talk from the HPDC 2018 Science Cloud workshop.
Technical Report
Full-text available
I am always looking for better ways to write parallel programs. In chapter 7 of our book "Cloud Computing for Science and Engineering" we looked at various scalable parallel programming models that are used in the cloud. We broke these down into five models: (1) HPC-style "Single Program Multiple Data" (SPMD) in which a single program communicates...
Technical Report
Full-text available
Transfer learning is a method to adapt a machine learning model that has been trained for one task to a different task. It has a long history and it is now so well understood that it has been incorporated into cloud services that make it almost trivial to train a new computer vision network for a specialized application. We look at four such servic...
Technical Report
Full-text available
The commercial clouds are in a race to see who can provide the most interesting and useful AI services on their cloud platform. This work began in the research laboratories in universities and companies over the past 25 years, but the big breakthroughs came when deep learning models trained on massive data collections began to reach levels of human...
Working Paper
Full-text available
Technical Report
Full-text available
One area of great frustration encountered by application developers involves the challenge of integrating new algorithms into a code base. There are many reasons for this. For example, the algorithm may be described in a journal article where many details of the implementation are omitted or it is available only in a programming language different...
Technical Report
Full-text available
Cloud computing is going through an interesting evolution. It has gone from a platform for deploying virtual machines to planet-scale systems with extensive collections of data storage, analysis and machine learning services. Most recently we have seen the emergence of "cloud native" computing, which in its most basic form involves a design pattern...
Article
Full-text available
The term “cloud-native” refers to a set of technologies and design patterns that have become the standard for building large-scale cloud applications. In this editorial we describe basic properties of successful cloud applications including dynamic scalability, extreme fault tolerance, seamless upgradeability and maintenance and security. To make i...
Article
During the spring and summer of 2017, the guest editors of this special issue conducted an asynchronous panel discussion among several experts concerning the nature of cloud-native applications.
Technical Report
Full-text available
The Microsoft Azure team has recently released CosmosDB, a new entry to the cloud data storage management marketplace. Cosmos is actually the name of a data storage system that has been used internally at Microsoft for many years. That original Cosmos has now morphed into Azure Data Lake.
Technical Report
Full-text available
One of the most useful ways to uncover structure in high dimensional data is to project it down to a subspace, such as a 2-D plane, where hidden features may become visible. Of course, the challenge is to find the plane that best illuminates the structure. Manifold learning is based on the assumption that the system you are trying to model lies on o...
Technical Report
Full-text available
Google has released a beta version of their Cloud Datalab which integrates the IPython Jupyter notebook with the Google BigQuery SQL data warehouse. This report introduces Datalab and its capabilities using an example from the Centers for Disease Control and Prevention dataset and another from the National Oceanographic and Atmosph...
Technical Report
Full-text available
(Adapted from the blog www.esciencegroup.com) I recently had the pleasure of attending two excellent workshops on the topic of streaming data analytics and science. A goal of the workshops was to understand the state of the art of "big data" streaming applications in scientific research and, if possible, identify common themes and challenges. Call...
Chapter
The National Flood Interoperability Experiment (NFIE) is a research initiative among government, academia, and industry to help demonstrate the next generation of national flood hydrology modeling to enable early warning systems and emergency response. The goal of NFIE is to answer the questions: What if it were easier to predict more accurately where...
Working Paper
Full-text available
Working Paper
Full-text available
Technical Report
Full-text available
(This TR is derived from a blog post at www.esciencegroup.com. It has been revised to improve the k-means example.)
Technical Report
Full-text available
(This TR is derived from a blog post at www.esciencegroup.com. It has been revised to improve the k-means example.)
Research
Full-text available
A study of machine learning applied to scientific document classification based on RSS feeds of science results. From the blog http://www.esciencegroup.com
Patent
Deployment and execution of a service in a multiple datacenter environment may be facilitated using datacenter execution templates. Developers, business managers, and other interested parties may select and/or modify a declarative execution template embodying multiple factors. The execution template may then be used to generate an execution plan, w...
Conference Paper
Full-text available
Microsoft Research is now in its fourth year of awarding Windows Azure cloud resources to the academic community. As of April 2014, over 200 research projects have started. In this paper we review the results of this effort to date. We also characterize the computational paradigms that work well in public cloud environments and those that are usual...
Conference Paper
We generalize MapReduce, Iterative MapReduce and data intensive MPI runtime as a layered Map-Collective architecture with Map-All Gather, Map-All Reduce, MapReduce Merge Broadcast and Map-Reduce Scatter patterns as the initial focus. Map-collectives improve the performance and efficiency of the computations while at the same time facilitating ease...
Data
Full-text available
Microsoft Research is now in its fourth year of awarding Windows Azure cloud resources to the academic community. As of April 2014, over 200 research projects have started. In this paper we review the results of this effort to date. We also characterize the computational paradigms that work well in public cloud environments and those that are usual...
Article
Full-text available
Microsoft Research is now in its fourth year of awarding Windows Azure cloud resources to the academic community. As of April 2014, over 200 research projects have started. In this paper we review the results of this effort to date. We also characterize the computational paradigms that work well in public cloud environments and those that are usual...
Book
Full-text available
This book is the first collection of papers and essays to present a comprehensive view of how parallel computing will transform the experience of using computers. It lays the groundwork for a new generation of systems and applications that will not only change the industry; it will usher in the next revolution in computing.
Article
Full-text available
The availability of workflow management systems and public cloud computing infrastructures has become a major breakthrough in the usage of computing resources for scientists. However, the combination of both approaches has shortcomings, such as the need to reduce administration effort for the user, or the need for simple programming models for the tran...
Article
Full-text available
We discuss the use of cloud computing in technical (scientific) applications and identify characteristics such as loosely-coupled and data-intensive that lead to good performance. We give both general principles and several examples with an emphasis on use of the Azure cloud.
Article
Full-text available
Java RMI provides an elegant and powerful model for invoking member functions on objects that exist in remote address spaces. Unfortunately, it is a Java-to-Java communication model and many of the objects we would like Java to interact with may be scientific applications written in C++ or Fortran. This paper explores the design of RMI and extracts a...
Article
Full-text available
The Special Issue of Distributed Parallel Databases journal, 2012, discusses novel data processing techniques for this new data-driven world. This data intensive eScience special issue encouraged researchers to submit and present original work related to the latest trends in preservation, movement, access and analysis of massive datasets that requi...
Article
Data-intensive science is now taking a place alongside theoretical science, experimental science, and computational science as a fundamental research paradigm.
Article
Full-text available
New and compelling ideas are transforming the future of computing, bringing about a plethora of changes that have significant implications for our profession and our society and raising some profound technical questions. This Web extra video interview features Dan Reed of Microsoft giving us a sense of how new cloud architectures and cloud capabili...
Article
Full-text available
New and compelling ideas are transforming the future of computing, bringing about a plethora of changes that have significant implications for our profession and our society and raising some profound technical questions. This Web extra video interview features Dan Reed of Microsoft giving us a sense of how new cloud architectures and cloud capabili...
Article
Deadline-sensitive workflows require careful coordination of user constraints with resource availability. Current distributed resource access models provide varying degrees of resource control: from limited or none in grid batch systems to explicit in cloud systems. Additionally, applications experience variability due to competing user loads, perfo...
Article
Extending the capabilities of PC, Web, and mobile applications through on-demand cloud services will significantly broaden the research community's capabilities, accelerating the pace of engineering and scientific discovery in this age of data-driven research. The net effect will be the democratization of research capabilities that are now availabl...
Article
In this work we address the management of very large data sets, which need to be stored and processed across many computing sites. The motivation for our work is the ATLAS experiment for the Large Hadron Collider (LHC), where the authors have been involved ...
Conference Paper
Full-text available
Scientific workflows have become an integral part of cyber infrastructure as their computational complexity and data sizes have grown. However, the complexity of the distributed infrastructure makes design of new workflows, determining the right management policies, debugging, testing or reproduction of errors challenging. Today, workflow engines m...
Chapter
The increasing ability for the sciences to sense the world around us is resulting in a growing need for data-driven e-Science applications that are under the control of workflows composed of services on the Grid. The focus of our work is on provenance collection for these workflows that is necessary to validate the workflow and to determine qualit...
Conference Paper
Full-text available
Today's scientific workflows use distributed heterogeneous resources through diverse grid and cloud interfaces that are often hard to program. In addition, especially for time-sensitive critical applications, predictable quality of service is necessary across these distributed resources. VGrADS' virtual grid execution system (vgES) provides an...
Conference Paper
Full-text available
The ubiquity of information technology, technological advances, and utility computing trends have motivated large-scale systems, but managing and sustaining these systems is far from trivial. Automatic or semi-automatic monitoring and control are a potential solution to this problem. However, since management scenarios differ from system to system,...