Geoffrey Charles Fox
University of Virginia · Department of Computer Science

Ph.D. in Physics, University of Cambridge
Applications of deep learning (for time series) and software systems that support integrated AI and data engineering.

About

1,477 Publications · 300,174 Reads
27,514 Citations
Introduction
Fox has a Ph.D. in Theoretical Physics from the University of Cambridge, where he was Senior Wrangler. He is a member of the Biocomplexity Institute and Initiative and the Department of Computer Science at the University of Virginia. He has supervised the Ph.D. theses of 75 students. He received the HPDC Achievement Award and, in 2019, the ACM/IEEE CS Ken Kennedy Award for foundational contributions to parallel computing. He works at the interdisciplinary interface between computing and applications.
Additional affiliations
  • July 2001 - July 2023: Indiana University Bloomington, Professor (Full)
  • July 2000 - June 2001: Florida State University, Professor (Full)
  • July 1990 - June 2000: Syracuse University, Professor of Computer Science and Professor of Physics
Education
  • July 1964 - June 1967: University of Cambridge, Physics
  • September 1961 - June 1964: University of Cambridge, Mathematics

Publications (1,477)
Article
Full-text available
MLCommons is an effort to develop and improve the artificial intelligence (AI) ecosystem through benchmarks, public data sets, and research. It consists of members from start-ups, leading companies, academics, and non-profits from around the world. The goal is to make machine learning better for everyone. In order to increase participation by other...
Article
Full-text available
A foundational set of findable, accessible, interoperable, and reusable (FAIR) principles were proposed in 2016 as prerequisites for proper data management and stewardship, with the goal of enabling the reusability of scholarly data. The principles were also meant to apply to other digital assets, at a high level, and over time, the FAIR guiding pr...
Preprint
Full-text available
The Data Science domain has expanded monumentally in both research and industry communities during the past decade, predominantly owing to the Big Data revolution. Artificial Intelligence (AI) and Machine Learning (ML) are bringing more complexities to data engineering applications, which are now integrated into data processing pipelines to process...
Conference Paper
Full-text available
Currently, AI and, in particular, deep learning play a major role in science, from data analytics and simulation surrogates to policy and system decisions. This role is likely to increase as ideas from early adopters spread across all academic fields. One can group the structure of "AI for Science" into a few patterns, where one needs to explore ex...
Conference Paper
With the significant development of the Internet of Things and low-cost cloud services, the sensory and data processing requirements of IoT systems are continually going up. TrustZone is a hardware-protected Trusted Execution Environment (TEE) for ARM processors specifically designed for IoT handheld systems. It provides memory isolation techniques...
Chapter
Full-text available
Data analytics has become a critical component in business, where organizations aim to harness their data for better insights and informed decision-making. This has led to the rise of data engineering, which focuses on constructing scalable and efficient data pipelines to handle data's increasing volume and complexity. High-performance data enginee...
Chapter
The data science community today has embraced the concept of Dataframes as the de facto standard for data representation and manipulation. Ease of use, massive operator coverage, and popularization of R and Python languages have heavily influenced this transformation. However, most widely used serial Dataframes today (R, pandas) experience performa...
Article
Full-text available
Plain Language Summary: The question of whether earthquake occurrence is random in time, or perhaps chaotic with order hidden in the chaos, is of major importance to the determination of risk from these events. It was shown many years ago that if aftershocks are removed from the earthquake catalogs, what remains are apparently events that occur at r...
Preprint
Full-text available
Neural networks (NNs) have proven to be a viable alternative to traditional direct numerical algorithms, with the potential to accelerate computational time by several orders of magnitude. In the present paper we study the use of encoder-decoder convolutional neural network (CNN) as surrogates for steady-state diffusion solvers. The construction of...
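
To give a concrete sense of the surrogate approach sketched in the abstract above, the following is a minimal, hypothetical PyTorch encoder-decoder CNN that maps a 2D input field (e.g., sources and boundary conditions on a grid) to an approximate steady-state diffusion solution. The architecture, layer sizes, and synthetic data are illustrative assumptions, not the network described in the paper.

# Minimal sketch (assumed architecture): an encoder-decoder CNN that maps a
# 2D input field to an approximate steady-state diffusion solution.
import torch
import torch.nn as nn

class DiffusionSurrogate(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                        # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                        # 32x32 -> 16x16
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=2, stride=2),  # back to 64x64
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DiffusionSurrogate()
x = torch.randn(8, 1, 64, 64)           # batch of synthetic input fields
y_true = torch.randn(8, 1, 64, 64)      # placeholder targets from a direct solver
loss = nn.MSELoss()(model(x), y_true)   # train to imitate the numerical solver
loss.backward()

Once trained against outputs of a direct numerical solver, a model of this shape is evaluated with a single forward pass, which is where the potential speed-up over iterative solvers comes from.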
Preprint
Full-text available
The data engineering and data science community has embraced the idea of using Python & R dataframes for regular applications. Driven by the big data revolution and artificial intelligence, these applications are now essential in order to process terabytes of data. They can easily exceed the capabilities of a single machine, but also demand signifi...
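
For context, the join and group-by operators that distributed dataframe systems aim to scale look like the serial pandas sketch below; the data and column names are made up for illustration, and this is not the API of the system described above.

# Serial pandas sketch of the join/group-by operators that distributed
# dataframe runtimes parallelize; data and column names are hypothetical.
import numpy as np
import pandas as pd

n = 1_000_000
events = pd.DataFrame({
    "user_id": np.random.randint(0, 50_000, n),
    "value": np.random.rand(n),
})
users = pd.DataFrame({
    "user_id": np.arange(50_000),
    "region": np.random.choice(["na", "eu", "apac"], 50_000),
})

# Join then aggregate: on one machine this is memory- and CPU-bound, which is
# what motivates partitioning the dataframe across a cluster.
joined = events.merge(users, on="user_id")
summary = joined.groupby("region")["value"].agg(["mean", "count"])
print(summary)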
Chapter
Full-text available
With machine learning (ML) becoming a transformative tool for science, the scientific community needs a clear catalogue of ML techniques, and their relative benefits on various scientific problems, if they were to make significant advances in science using AI. Although this comes under the purview of benchmarking, conventional benchmarking initiati...
Preprint
Full-text available
Data pre-processing is a fundamental component in any data-driven application. With the increasing complexity of data processing operations and volume of data, Cylon, a distributed dataframe system, is developed to facilitate data processing both as a standalone application and as a library, especially for Python applications. While Cylon shows pro...
Preprint
Full-text available
In this paper, we summarize our effort to create and utilize a simple framework to coordinate computational analytics tasks with the help of a workflow system. Our design is based on a minimalistic approach while at the same time allowing to access computational resources offered through the owner's computer, HPC computing centers, cloud resources,...
Article
Full-text available
Nowcasting is a term originating from economics, finance, and meteorology. It refers to the process of determining the uncertain state of the economy, markets or the weather at the current time by indirect means. In this paper, we describe a simple two‐parameter data analysis that reveals hidden order in otherwise seemingly chaotic earthquake seism...
Conference Paper
What could a research team accomplish if given access to the latest GPU hardware? A molecular dynamics research team could process protein-ligand tests using 100 GPUs on Delta in about 3.6 days, a task that would take their local lab a full year using their own GPU hardware. A team exploring the wonders of cosmic rays is now part of the multi-messe...
Preprint
Full-text available
A foundational set of findable, accessible, interoperable, and reusable (FAIR) principles were proposed in 2016 as prerequisites for proper data management and stewardship, with the goal of enabling the reusability of scholarly data. The principles were also meant to apply to other digital assets, at a high level, and over time, the FAIR guiding pr...
Preprint
Full-text available
The data science community today has embraced the concept of Dataframes as the de facto standard for data representation and manipulation. Ease of use, massive operator coverage, and popularization of R and Python languages have heavily influenced this transformation. However, most widely used serial Dataframes today (R, pandas) experience performa...
Preprint
Full-text available
Yes. Interval statistics have been used to conclude that major earthquakes are random events in time and cannot be anticipated or predicted. Machine learning is a powerful new technique that enhances our ability to understand the information content of earthquake catalogs. We show that catalogs contain significant information on current hazard and...
Preprint
Full-text available
Go to the ESSOAR preprint server https://www.essoar.org/doi/abs/10.1002/essoar.10510940.5
Article
Full-text available
We review previous approaches to nowcasting earthquakes and introduce new approaches based on deep learning using three distinct models based on recurrent neural networks and transformers. We discuss different choices for observables and measures presenting promising initial results for a region of Southern California from 1950–2020. Earthquake act...
Article
Full-text available
Deep learning has transformed the use of machine learning technologies for the analysis of large experimental datasets. In science, such datasets are typically generated by large-scale experimental facilities, and machine learning focuses on the identification of patterns, trends and anomalies to extract meaningful scientific insights from the data...
Article
Full-text available
Classical molecular dynamics simulations are based on solving Newton’s equations of motion. Using a small timestep, numerical integrators such as Verlet generate trajectories of particles as solutions to Newton’s equations. We introduce operators derived using recurrent neural networks that accurately solve Newton’s equations utilizing sequences of...
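
As orientation for this line of work, the sketch below contrasts a standard velocity Verlet step for a 1D harmonic oscillator with a hypothetical recurrent surrogate that predicts the next position from a window of previous positions. The network, window length, and training setup are illustrative assumptions, not the operators introduced in the paper.

# Baseline: velocity Verlet for a 1D harmonic oscillator, plus a hypothetical
# LSTM surrogate that maps a window of past positions to the next position.
import numpy as np
import torch
import torch.nn as nn

def verlet(x0, v0, force, dt, steps):
    """Standard velocity Verlet integration of Newton's equations."""
    xs = [x0]
    x, v = x0, v0
    a = force(x)
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt**2
        a_new = force(x)
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
        xs.append(x)
    return np.array(xs)

traj = verlet(x0=1.0, v0=0.0, force=lambda x: -x, dt=0.01, steps=1000)

# Hypothetical surrogate: an LSTM that, given the last 16 positions, predicts
# the next one, allowing larger effective timesteps once trained.
class RNNSurrogate(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, window):            # window: (batch, 16, 1)
        out, _ = self.lstm(window)
        return self.head(out[:, -1])      # predicted next position

surrogate = RNNSurrogate()
window = torch.tensor(traj[:16], dtype=torch.float32).reshape(1, 16, 1)
next_x = surrogate(window)                # untrained prediction, shape (1, 1)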
Article
Full-text available
The earthquake cycle of stress accumulation and release is associated with the elastic rebound hypothesis proposed by H.F. Reid following the M7.9 San Francisco earthquake of 1906. However, observing details of the actual values of time- and space-dependent tectonic stress is not possible at the present time. In two previous papers, we have propose...
Article
Full-text available
Data-intensive applications are becoming commonplace in all science disciplines. They are comprised of a rich set of sub-domains such as data engineering, deep learning, and machine learning. These applications are built around efficient data abstractions and operators that suit the applications of different domains. Often lack of a clear definitio...
Preprint
Full-text available
Spatiotemporal time series nowcasting should preserve temporal and spatial dynamics in the sense that generated new sequences from models respect the covariance relationship from history. Conventional feature extractors are built with deep convolutional neural networks (CNN). However, CNN models have limits to image-like applications where data can...
Preprint
Full-text available
We review previous approaches to nowcasting earthquakes and introduce new approaches based on deep learning using three distinct models based on recurrent neural networks and transformers. We discuss different choices for observables and measures presenting promising initial results for a region of Southern California from 1950-2020. Earthquake act...
Article
Full-text available
We propose a new machine learning-based method for nowcasting earthquakes to image the time-dependent earthquake cycle. The result is a timeseries which may correspond to the process of stress accumulation and release. The timeseries is constructed by using Principal Component Analysis of regional seismicity. The patterns are found as eigenvectors...
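
A minimal, hypothetical illustration of the PCA step described above: build a (time bin x spatial cell) matrix of earthquake counts and project it onto its leading eigenvectors to obtain a regional timeseries. The synthetic data and parameter choices are assumptions, not the paper's analysis.

# Sketch: PCA of a synthetic (time bins x spatial cells) seismicity matrix.
# The leading principal component gives one candidate nowcasting timeseries.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_time_bins, n_cells = 840, 400           # e.g., monthly bins, 20x20 grid (made up)
counts = rng.poisson(lam=2.0, size=(n_time_bins, n_cells)).astype(float)

# Center each cell's count history, then extract the leading patterns.
counts -= counts.mean(axis=0)
pca = PCA(n_components=4)
scores = pca.fit_transform(counts)        # (n_time_bins, 4) timeseries
patterns = pca.components_                # (4, n_cells) spatial eigenvectors

timeseries = scores[:, 0]                 # candidate stress-proxy timeseries
print(pca.explained_variance_ratio_)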
Preprint
Full-text available
The breakthrough in Deep Learning neural networks has transformed the use of AI and machine learning technologies for the analysis of very large experimental datasets. These datasets are typically generated by large-scale experimental facilities at national laboratories. In the context of science, scientific machine learning focuses on training mac...
Preprint
Full-text available
Scientific communities are increasingly adopting machine learning and deep learning models in their applications to accelerate scientific insights. High performance computing systems are pushing the frontiers of performance with a rich diversity of hardware resources and massive scale-out capabilities. There is a critical need to understand fair an...
Conference Paper
Full-text available
Data-intensive applications impact many domains, and their steadily increasing size and complexity demand high-performance, highly usable environments. We integrate a set of ideas developed in various data science and data engineering frameworks. They employ a set of operators on specific data abstractions that include vectors, matrices, tensors, g...
Preprint
Full-text available
Data-intensive applications are becoming commonplace in all science disciplines. They are comprised of a rich set of sub-domains such as data engineering, deep learning, and machine learning. These applications are built around efficient data abstractions and operators that suit the applications of different domains. Often lack of a clear definitio...
Article
Full-text available
Free Unified Rendering in pYthon (FURY) is a community-driven, open-source, and high-performance scientific visualization library that harnesses the graphics processing unit (GPU) for improved speed, precise interactivity, and visual clarity. FURY provides an integrated API in Python that allows UI elements and 3D graphics to be programmed together...
Preprint
Full-text available
Data-intensive applications impact many domains, and their steadily increasing size and complexity demand high-performance, highly usable environments. We integrate a set of ideas developed in various data science and data engineering frameworks. They employ a set of operators on specific data abstractions that include vectors, matrices, tensors,...
Article
Full-text available
Twister2 is an open‐source big data hosting environment designed to process both batch and streaming data at scale. Twister2 runs jobs in both high‐performance computing (HPC) and big data clusters. It provides a cross‐platform resource scheduler to run jobs in diverse environments. Twister2 is designed with a layered architecture to support variou...
Article
Full-text available
In many mechanistic medical, biological, physical, and engineered spatiotemporal dynamic models the numerical solution of partial differential equations (PDEs), especially for diffusion, fluid flow and mechanical relaxation, can make simulations impractically slow. Biological models of tissues and organs often require the simultaneous calculation o...
Preprint
Full-text available
Multidimensional scaling of gene sequence data has long played a vital role in analysing gene sequence data to identify clusters and patterns. However the computation complexities and memory requirements of state-of-the-art dimensional scaling algorithms make it infeasible to scale to large datasets. In this paper we present an autoencoder-based di...
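
To illustrate the general idea (not the paper's architecture), the sketch below trains a small fully connected autoencoder that compresses high-dimensional, sequence-derived feature vectors into a 3D embedding suitable for clustering or visualization; all sizes and data are hypothetical.

# Hypothetical autoencoder sketch: compress high-dimensional sequence-derived
# features into a 3D embedding, as a scalable alternative to classical MDS.
import torch
import torch.nn as nn

class AE(nn.Module):
    def __init__(self, in_dim=256, latent=3):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                 nn.Linear(64, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 64), nn.ReLU(),
                                 nn.Linear(64, in_dim))

    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z

model = AE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(1024, 256)                # stand-in for sequence feature vectors

for _ in range(100):                      # tiny illustrative training loop
    recon, z = model(x)
    loss = nn.functional.mse_loss(recon, x)
    opt.zero_grad(); loss.backward(); opt.step()

embedding = model.enc(x).detach()         # (1024, 3) low-dimensional points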
Article
Support vector machines (SVMs) are widely used machine learning algorithms. With the increasing amount of research data nowadays, understanding how to do efficient training is more important than ever. This article discusses the performance optimizations and benchmarks related to providing high-performance support for SVM training. In this research,...
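
As a point of reference for the training cost such work optimizes, a baseline serial SVM fit with scikit-learn looks like the sketch below (synthetic data, illustrative parameters); high-performance SVM implementations target this kind of workload at much larger scale.

# Baseline serial SVM training with scikit-learn on synthetic data; the cost
# of this step is what high-performance SVM implementations aim to reduce.
import time
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=20_000, n_features=50, random_state=0)

t0 = time.time()
clf = SVC(kernel="rbf", C=1.0)            # kernel SVM: cost grows rapidly with n_samples
clf.fit(X, y)
print(f"trained in {time.time() - t0:.1f}s, accuracy {clf.score(X, y):.3f}")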