Geoffrey Charles Fox

University of Virginia | UVa · Department of Computer Science

Cambridge University PhD Physics
Applications of Deep Learning (for time series) and software systems to support integrated AI and data engineering.

About

Publications: 1,433
Reads: 242,036
Citations: 25,981
Introduction
Fox holds a Ph.D. in Theoretical Physics from the University of Cambridge, where he was Senior Wrangler. He is with the Biocomplexity Institute and Initiative and the Computer Science Department at the University of Virginia. He has supervised 75 Ph.D. students. He received the HPDC Achievement Award and the ACM-IEEE CS Ken Kennedy Award for foundational contributions to parallel computing in 2019. He works at the interdisciplinary interface between computing and applications.
Additional affiliations
July 2021 - August 2021
University of Virginia
Position
  • Professor
Description
  • I am part of the Biocomplexity Institute and Initiative, in the Division of Network Systems Science and Advanced Computing (NSSAC).
July 2001 - July 2021
Indiana University Bloomington
Position
  • Professor (Full)
July 2000 - June 2001
Florida State University
Position
  • Professor (Full)
Education
July 1964 - June 1967
University of Cambridge
Field of study
  • Physics
September 1961 - June 1964
University of Cambridge
Field of study
  • Mathematics

Publications (1,433)
Preprint
Full-text available
Yes. Interval statistics have been used to conclude that major earthquakes are random events in time and cannot be anticipated or predicted. Machine learning is a powerful new technique that enhances our ability to understand the information content of earthquake catalogs. We show that catalogs contain significant information on current hazard and...
Preprint
Full-text available
Go to the ESSOAR preprint server https://www.essoar.org/doi/abs/10.1002/essoar.10510940.5
Article
Full-text available
Classical molecular dynamics simulations are based on solving Newton’s equations of motion. Using a small timestep, numerical integrators such as Verlet generate trajectories of particles as solutions to Newton’s equations. We introduce operators derived using recurrent neural networks that accurately solve Newton’s equations utilizing sequences of...
Article
We review previous approaches to nowcasting earthquakes and introduce new approaches based on deep learning using three distinct models based on recurrent neural networks and transformers. We discuss different choices for observables and measures presenting promising initial results for a region of Southern California from 1950–2020. Earthquake act...
Article
Full-text available
Deep learning has transformed the use of machine learning technologies for the analysis of large experimental datasets. In science, such datasets are typically generated by large-scale experimental facilities, and machine learning focuses on the identification of patterns, trends and anomalies to extract meaningful scientific insights from the data...
Article
Full-text available
The earthquake cycle of stress accumulation and release is associated with the elastic rebound hypothesis proposed by H.F. Reid following the M7.9 San Francisco earthquake of 1906. However, observing details of the actual values of time- and space-dependent tectonic stress is not possible at the present time. In two previous papers, we have propose...
Article
Full-text available
Data-intensive applications are becoming commonplace in all science disciplines. They comprise a rich set of sub-domains such as data engineering, deep learning, and machine learning. These applications are built around efficient data abstractions and operators that suit the applications of different domains. Often lack of a clear definitio...
Preprint
Full-text available
Spatiotemporal time series nowcasting should preserve temporal and spatial dynamics in the sense that new sequences generated by models respect the covariance relationship from history. Conventional feature extractors are built with deep convolutional neural networks (CNN). However, CNN models are limited to image-like applications where data can...
Preprint
Full-text available
We review previous approaches to nowcasting earthquakes and introduce new approaches based on deep learning using three distinct models based on recurrent neural networks and transformers. We discuss different choices for observables and measures presenting promising initial results for a region of Southern California from 1950-2020. Earthquake act...
Article
Full-text available
We propose a new machine learning-based method for nowcasting earthquakes to image the time-dependent earthquake cycle. The result is a timeseries which may correspond to the process of stress accumulation and release. The timeseries is constructed by using Principal Component Analysis of regional seismicity. The patterns are found as eigenvectors...
Preprint
Full-text available
The breakthrough in Deep Learning neural networks has transformed the use of AI and machine learning technologies for the analysis of very large experimental datasets. These datasets are typically generated by large-scale experimental facilities at national laboratories. In the context of science, scientific machine learning focuses on training mac...
Preprint
Full-text available
Scientific communities are increasingly adopting machine learning and deep learning models in their applications to accelerate scientific insights. High performance computing systems are pushing the frontiers of performance with a rich diversity of hardware resources and massive scale-out capabilities. There is a critical need to understand fair an...
Conference Paper
Full-text available
Data-intensive applications impact many domains, and their steadily increasing size and complexity demand high-performance, highly usable environments. We integrate a set of ideas developed in various data science and data engineering frameworks. They employ a set of operators on specific data abstractions that include vectors, matrices, tensors, g...
Preprint
Full-text available
Data-intensive applications are becoming commonplace in all science disciplines. They comprise a rich set of sub-domains such as data engineering, deep learning, and machine learning. These applications are built around efficient data abstractions and operators that suit the applications of different domains. Often lack of a clear definitio...
Article
Full-text available
Free Unified Rendering in pYthon (FURY) is a community-driven, open-source, and high-performance scientific visualization library that harnesses the graphics processing unit (GPU) for improved speed, precise interactivity, and visual clarity. FURY provides an integrated API in Python that allows UI elements and 3D graphics to be programmed together...
Preprint
Full-text available
Data-intensive applications impact many domains, and their steadily increasing size and complexity demand high-performance, highly usable environments. We integrate a set of ideas developed in various data science and data engineering frameworks. They employ a set of operators on specific data abstractions that include vectors, matrices, tensors,...
Article
Full-text available
Twister2 is an open‐source big data hosting environment designed to process both batch and streaming data at scale. Twister2 runs jobs in both high‐performance computing (HPC) and big data clusters. It provides a cross‐platform resource scheduler to run jobs in diverse environments. Twister2 is designed with a layered architecture to support variou...
Article
Full-text available
In many mechanistic medical, biological, physical, and engineered spatiotemporal dynamic models the numerical solution of partial differential equations (PDEs), especially for diffusion, fluid flow and mechanical relaxation, can make simulations impractically slow. Biological models of tissues and organs often require the simultaneous calculation o...
Preprint
Full-text available
Multidimensional scaling of gene sequence data has long played a vital role in analysing gene sequence data to identify clusters and patterns. However, the computational complexity and memory requirements of state-of-the-art dimensional scaling algorithms make it infeasible to scale them to large datasets. In this paper we present an autoencoder-based di...
Article
The support vector machine (SVM) is a widely used machine learning algorithm. With the growing volume of research data, efficient training is more important than ever. This article discusses the performance optimizations and benchmarks related to providing high‐performance support for SVM training. In this research,...
Article
Full-text available
Background: In this work, we aimed to demonstrate how to utilize the lab test results and other clinical information to support precision medicine research and clinical decisions on complex diseases, with the support of electronic medical record facilities. We defined “clinotypes” as clinical information that could be observed and measured objective...
Preprint
Full-text available
In many mechanistic medical, biological, physical and engineered spatiotemporal dynamic models the numerical solution of partial differential equations (PDEs) can make simulations impractically slow. Biological models require the simultaneous calculation of the spatial variation of concentration of dozens of diffusing chemical species. Machine lear...
Preprint
Over the past few decades, seismology has utilized the most advanced technologies and equipment to monitor seismic events globally. However, forecasting disasters such as earthquakes remains an underdeveloped topic. Recent research in spatiotemporal forecasting has revealed some possibilities of successful predictions, which become...
Conference Paper
Full-text available
In the current era of Big Data, data engineering has transformed into an essential field of study across many branches of science. Advancements in Artificial Intelligence (AI) have broadened the scope of data engineering and opened up new applications in both enterprise and research communities. Reductions/aggregations are an integral functionalit...
Preprint
Full-text available
We show that one can study several sets of sequences or time-series in terms of an underlying evolution operator which can be learned with a deep learning network. We use the language of geospatial time series as this is a common application type, but the series can be any sequence and the sequences can be in any collection (bag), not just Euclidean...
Preprint
Full-text available
In the current era of Big Data, data engineering has transformed into an essential field of study across many branches of science. Advancements in Artificial Intelligence (AI) have broadened the scope of data engineering and opened up new applications in both enterprise and research communities. Reductions/aggregations are an integral functionalit...
Preprint
Full-text available
The SPIDAL (Scalable Parallel Interoperable Data Analytics Library) project was begun in Fall 2014 and has reached a technical completion in Fall 2020 with outreach activities continuing in 2021. The February Poster summarizes the 2020 status and activity very well with previous work through September 2018 summarized in a book chapter with extensiv...
Preprint
Full-text available
Billions of text analysis requests containing private emails, personal text messages, and sensitive online reviews are processed by recurrent neural networks (RNNs) deployed on public clouds every day. Although prior secure networks combine homomorphic encryption (HE) and garbled circuits (GC) to preserve users' privacy, naively adopting the HE and...
Preprint
Full-text available
Understanding the structure of the ice at the Earth's poles is important for modeling how global warming will impact polar ice and, in turn, the Earth's climate. Ground-penetrating radar is able to collect observations of the internal structure of snow and ice, but the process of manually labeling these observations with layer boundaries is slow an...
Article
The dataflow model is gradually becoming the de facto standard for big data applications. While many popular frameworks are built around this model, very little research has been done on understanding its inner workings, which in turn has led to inefficiencies in existing frameworks. It is important to note that understanding the relationship betwe...
Preprint
Full-text available
Data engineering is becoming an increasingly important part of scientific discoveries with the adoption of deep learning and machine learning. Data engineering deals with a variety of data formats, storage, data extraction, transformation, and data movements. One goal of data engineering is to transform data from original data to vector/matrix/tens...
Preprint
Full-text available
Molecular dynamics (MD) simulations accelerated by high-performance computing (HPC) methods are powerful tools for investigating and extracting the microscopic mechanisms characterizing the properties of soft materials such as self-assembled nanoparticles, virus capsids, confined electrolytes, and polymeric fluids. However, despite the employment o...
Preprint
Full-text available
The amazing advances being made in the fields of machine and deep learning are a highlight of the Big Data era for both enterprise and research communities. Modern applications require resources beyond a single node's ability to provide. However, this is just a small part of the issues facing the overall data processing environment, which must also...
Preprint
Full-text available
The COVID-19 pandemic has profound global consequences on health, economic, social, political, and almost every major aspect of human life. Therefore, it is of great importance to model COVID-19 and other pandemics in terms of the broader social contexts in which they take place. We present the architecture of AICov, which provides an integrative...
Preprint
Full-text available
Twister2 is an open source big data hosting environment designed to process both batch and streaming data at scale. Twister2 runs jobs in both high performance computing (HPC) and big data clusters. It provides a cross-platform resource scheduler to run jobs in diverse environments. Twister2 is designed with a layered architecture to support variou...
Article
Full-text available
The direction of computing is affected and led by several trends. First, we have the Data Overwhelm from Commercial sources (e.g. Amazon), Community sources (e.g. Twitter), and Scientific applications (e.g. Genomics). Next, we have many lightweight clients running on devices ranging from smartphones and tablets to sensors. Then, clouds are g...
Preprint
Full-text available
We show that one can study several time-series in terms of an underlying time evolution operator which can be learned with a recurrent deep learning network. This has been shown for Newton’s laws for particles and Covid case and death data from observation and models while other work has studied this successfully in transportation systems. We propo...
Preprint
Full-text available
Classical molecular dynamics simulations are based on Newton’s equations of motion and rely on numerical integrators to solve them. Using a small timestep to avoid discretization errors, Verlet integrators generate a trajectory of particle positions as solutions to Newton’s equations. We introduce an integrator based on deep neural networks that is...
Chapter
This chapter outlines a vision for how best to harness the computing continuum of interconnected sensors, actuators, instruments, and computing systems, from small numbers of very large devices to large numbers of very small devices. The hypothesis is that only via a continuum perspective can one intentionally specify desired continuum actions and...