Geoffrey Charles Fox
University of Virginia | UVa · Department of Computer Science

PhD in Physics, University of Cambridge
Applications of Deep Learning (for time series) and software systems to support integrated AI and data engineering.

About

1,501
Publications
339,755
Reads
28,559
Citations
Introduction
Fox has a PhD in Theoretical Physics from the University of Cambridge, where he was Senior Wrangler. He is in the Biocomplexity Institute and Initiative and the Computer Science Department at the University of Virginia. He has supervised 75 PhD students. He received the HPDC Achievement Award and, in 2019, the ACM-IEEE CS Ken Kennedy Award for foundational contributions to parallel computing. He works at the interdisciplinary interface between computing and applications.
Additional affiliations
July 2001 - July 2023
Indiana University Bloomington
Position
  • Professor (Full)
July 1970 - August 1970
Argonne National Laboratory
Position
  • Visiting Researcher
July 2000 - June 2001
Florida State University
Position
  • Professor (Full)
Education
July 1964 - June 1967
University of Cambridge
Field of study
  • Physics
September 1961 - June 1964
University of Cambridge
Field of study
  • Mathematics

Publications (1,501)
Preprint
Full-text available
With the High Luminosity Large Hadron Collider (HL-LHC) era set to begin particle collisions by the end of this decade, it is evident that the computational demands of traditional collision simulation methods are becoming increasingly unsustainable. Existing approaches, which rely heavily on first-principles Monte Carlo simulations...
Article
Full-text available
Advancing the capabilities of earthquake nowcasting, the real-time forecasting of seismic activities, remains crucial for reducing casualties. This multifaceted challenge has recently gained attention within the deep learning domain, facilitated by the availability of extensive earthquake datasets. Despite significant advancements, the existing lit...
Preprint
Full-text available
Deep learning has proven very promising for interpreting MRI in brain tumor diagnosis. However, deep learning models suffer from a scarcity of brain MRI datasets for effective training. Self-supervised learning (SSL) models provide data-efficient and remarkable solutions to limited dataset problems. Therefore, this paper introduces a generative SSL...
Preprint
Full-text available
Particle collisions at accelerators such as the Large Hadron Collider, recorded and analyzed by experiments such as ATLAS and CMS, enable exquisite measurements of the Standard Model and searches for new phenomena. Simulations of collision events at these detectors have played a pivotal role in shaping the design of future experiments and analyzing...
Preprint
Full-text available
This research is part of a systematic study of scientific time series. In the last three years, hundreds of papers describing over fifty new deep-learning models for time series have appeared. These mainly focus on the key aspect of time dependence, whereas in some scientific time series, the situation is more complex with multiple locations,...
Preprint
Full-text available
Redshift prediction is a fundamental task in astronomy, essential for understanding the expansion of the universe and determining the distances of astronomical objects. Accurate redshift prediction plays a crucial role in advancing our knowledge of the cosmos. Machine learning (ML) methods, renowned for their precision and speed, offer promising so...
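As a rough illustration of the ML-for-redshift setting this abstract describes, here is a minimal sketch of photometric redshift regression; the synthetic magnitudes, toy redshift relation, and random-forest choice are illustrative assumptions, not the paper's data or method.

```python
# Hypothetical sketch: photometric redshift regression with a random forest.
# Features and target are synthetic stand-ins, not the paper's data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
mags = rng.uniform(18, 25, size=(n, 5))  # stand-in u, g, r, i, z magnitudes
z = 0.5 + 0.1 * (mags[:, 1] - mags[:, 3]) + 0.02 * rng.standard_normal(n)  # toy redshift

X_train, X_test, z_train, z_test = train_test_split(mags, z, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, z_train)
print("test R^2:", model.score(X_test, z_test))
```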
Preprint
Full-text available
3D object detection is critical for autonomous driving, leveraging deep learning techniques to interpret LiDAR data. The PointPillars architecture is a prominent model in this field, distinguished by its efficient use of LiDAR data. This study provides an analysis of enhancing the performance of PointPillars model under various dropout rates to add...
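A minimal sketch of the mechanism under study, varying the dropout rate in a small detection head; the layer sizes and rates are assumptions for illustration, far smaller than the actual PointPillars network.

```python
# Sweep dropout rates in a toy detection head; the real PointPillars model
# is far larger, this only shows the mechanism being varied.
import torch
import torch.nn as nn

def make_head(p: float) -> nn.Module:
    return nn.Sequential(
        nn.Linear(256, 128),
        nn.ReLU(),
        nn.Dropout(p),       # the rate being swept in the study
        nn.Linear(128, 7),   # e.g. a 3D box regression output
    )

features = torch.randn(32, 256)
for p in (0.1, 0.3, 0.5):
    out = make_head(p)(features)
    print(f"dropout={p}: output shape {tuple(out.shape)}")
```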
Preprint
Full-text available
Advancing the capabilities of earthquake nowcasting, the real-time forecasting of seismic activities remains a crucial and enduring objective aimed at reducing casualties. This multifaceted challenge has recently gained attention within the deep learning domain, facilitated by the availability of extensive, long-term earthquake datasets. Despite si...
Article
Full-text available
The data engineering and data science community has embraced the idea of using Python and R dataframes for regular applications. Driven by the big data revolution and artificial intelligence, these frameworks are now ever more important in order to process terabytes of data. They can easily exceed the capabilities of a single machine but also deman...
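For context, the class of dataframe operators these frameworks scale out looks like the following; this is plain single-machine pandas, while a distributed engine such as Cylon would hash-partition rows by key across workers and run the same logic.

```python
# The kind of dataframe pipeline distributed engines scale out: a join plus
# a group-by aggregation. Data here is a tiny illustrative stand-in.
import pandas as pd

orders = pd.DataFrame({"user": [1, 2, 1, 3], "amount": [10.0, 5.0, 7.5, 3.0]})
users = pd.DataFrame({"user": [1, 2, 3], "region": ["US", "EU", "US"]})

joined = orders.merge(users, on="user")            # a shuffle/hash join when distributed
totals = joined.groupby("region")["amount"].sum()  # partial then final aggregation
print(totals)
```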
Preprint
Full-text available
High-performance scientific simulations, important for comprehension of complex systems, encounter computational challenges especially when exploring extensive parameter spaces. There has been an increasing interest in developing deep neural networks (DNNs) as surrogate models capable of accelerating the simulations. However, existing approaches fo...
Preprint
Full-text available
Earthquake nowcasting has been proposed as a means of tracking the change in large earthquake potential in a seismically active area. The method was developed using observable seismic data, in which probabilities of future large earthquakes can be computed using Receiver Operating Characteristic (ROC) methods. Furthermore, analysis of the Shannon i...
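A minimal sketch of the ROC evaluation the method relies on, using scikit-learn; the labels and nowcast scores below are synthetic stand-ins, not catalog data.

```python
# Sketch of ROC skill measurement for a nowcast: score each time window for
# "large earthquake follows" and compare against what actually occurred.
import numpy as np
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=500)          # 1 = a large event followed the window
scores = 0.3 * y_true + 0.7 * rng.random(500)  # toy nowcast scores with some skill

fpr, tpr, thresholds = roc_curve(y_true, scores)
print("ROC AUC:", auc(fpr, tpr))
```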
Preprint
Full-text available
The pursuit of understanding fundamental particle interactions has reached unparalleled precision levels. Particle physics detectors play a crucial role in generating low-level object signatures that encode collision physics. However, simulating these particle collisions is a demanding task in terms of memory and computation which will be exacerbat...
Preprint
Full-text available
Deep learning datasets are expanding at an unprecedented pace, creating new challenges for data processing in model training pipelines. A crucial aspect of these pipelines is dataset shuffling, which significantly improves unbiased learning and convergence accuracy by adhering to the principles of random sampling. However, loading shuffled data for...
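The core of the shuffling problem this abstract describes can be sketched in a few lines: drawing a fresh permutation each epoch gives unbiased sampling, but it turns sequential storage reads into scattered ones. The sizes here are illustrative.

```python
# Why shuffling stresses the I/O path: a fresh random permutation per epoch
# means batches fetch scattered indices rather than contiguous ranges.
import numpy as np

num_samples, batch_size = 10_000, 32
rng = np.random.default_rng()

for epoch in range(2):
    order = rng.permutation(num_samples)  # random sampling without replacement
    for start in range(0, num_samples, batch_size):
        batch_indices = order[start:start + batch_size]
        # a data loader would now fetch these scattered indices from storage
    print(f"epoch {epoch}: first indices {order[:5]}")
```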
Preprint
Full-text available
In the evolving landscape of neural network models, one prominent challenge stands out: the significant memory overheads associated with training expansive models. Addressing this challenge, this study delves into Rotated Tensor Parallelism (RTP), an innovative approach that strategically focuses on memory deduplication in distribute...
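For orientation, the baseline tensor parallelism that RTP refines can be sketched as a column-sharded matrix multiply; RTP's rotation and memory-deduplication scheme is more involved, so this shows only the standard starting point.

```python
# Baseline tensor parallelism: shard a weight matrix column-wise across
# "devices" so no device holds the full W; gather the column blocks at the end.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 512))     # activations, replicated on each device
W = rng.standard_normal((512, 1024))  # full weight, never stored whole per-device

shards = np.split(W, 2, axis=1)       # device 0 and device 1 each hold one half
partials = [x @ Wi for Wi in shards]  # local matmuls on each shard
y = np.concatenate(partials, axis=1)  # all-gather of the column blocks

assert np.allclose(y, x @ W)          # matches the unsharded computation
```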
Preprint
Full-text available
Over the last several years, the computation landscape for conducting data analytics has completely changed. While in the past much of this activity was undertaken in isolation by companies and research institutions, today's infrastructure constitutes a wealth of services from a variety of providers that offer opportunities for reus...
Article
Full-text available
MLCommons is an effort to develop and improve the artificial intelligence (AI) ecosystem through benchmarks, public data sets, and research. It consists of members from start-ups, leading companies, academics, and non-profits from around the world. The goal is to make machine learning better for everyone. In order to increase participation by other...
Article
Full-text available
A foundational set of findable, accessible, interoperable, and reusable (FAIR) principles were proposed in 2016 as prerequisites for proper data management and stewardship, with the goal of enabling the reusability of scholarly data. The principles were also meant to apply to other digital assets, at a high level, and over time, the FAIR guiding pr...
Preprint
Full-text available
The Data Science domain has expanded monumentally in both research and industry communities during the past decade, predominantly owing to the Big Data revolution. Artificial Intelligence (AI) and Machine Learning (ML) are bringing more complexities to data engineering applications, which are now integrated into data processing pipelines to process...
Conference Paper
Full-text available
Currently, AI and, in particular, deep learning play a major role in science, from data analytics and simulation surrogates to policy and system decisions. This role is likely to increase as ideas from early adopters spread across all academic fields. One can group the structure of "AI for Science" into a few patterns, where one needs to explore ex...
Conference Paper
Full-text available
With the significant development of the Internet of Things and low-cost cloud services, the sensory and data processing requirements of IoT systems are continually increasing. TrustZone is a hardware-protected Trusted Execution Environment (TEE) for ARM processors specifically designed for IoT handheld systems. It provides memory isolation techniques...
Chapter
Full-text available
Data analytics has become a critical component in business, where organizations aim to harness their data for better insights and informed decision-making. This has led to the rise of data engineering, which focuses on constructing scalable and efficient data pipelines to handle data's increasing volume and complexity. High-performance data enginee...
Chapter
Full-text available
The data science community today has embraced the concept of Dataframes as the de facto standard for data representation and manipulation. Ease of use, massive operator coverage, and popularization of R and Python languages have heavily influenced this transformation. However, most widely used serial Dataframes today (R, pandas) experience performa...
Article
Full-text available
Plain Language Summary: The question of whether earthquake occurrence is random in time, or perhaps chaotic with order hidden in the chaos, is of major importance to the determination of risk from these events. It was shown many years ago that if aftershocks are removed from the earthquake catalogs, what remains are apparently events that occur at r...
Preprint
Full-text available
Neural networks (NNs) have proven to be a viable alternative to traditional direct numerical algorithms, with the potential to accelerate computational time by several orders of magnitude. In the present paper we study the use of encoder-decoder convolutional neural network (CNN) as surrogates for steady-state diffusion solvers. The construction of...
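A minimal sketch of an encoder-decoder CNN surrogate of the kind described, mapping an input field to a predicted steady-state field; the architecture and sizes are illustrative assumptions, not the paper's model.

```python
# Toy encoder-decoder CNN surrogate: coefficient/boundary field in,
# approximate steady-state solution field out.
import torch
import torch.nn as nn

class DiffusionSurrogate(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 64x64 -> 32x32
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 32x32 -> 16x16
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 32
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),              # 32 -> 64
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

field = torch.randn(4, 1, 64, 64)       # a batch of input fields
solution = DiffusionSurrogate()(field)  # predicted steady-state fields
print(solution.shape)                   # torch.Size([4, 1, 64, 64])
```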
Preprint
Full-text available
The data engineering and data science community has embraced the idea of using Python & R dataframes for regular applications. Driven by the big data revolution and artificial intelligence, these applications are now essential in order to process terabytes of data. They can easily exceed the capabilities of a single machine, but also demand signifi...
Chapter
Full-text available
With machine learning (ML) becoming a transformative tool for science, the scientific community needs a clear catalogue of ML techniques, and their relative benefits on various scientific problems, if they were to make significant advances in science using AI. Although this comes under the purview of benchmarking, conventional benchmarking initiati...
Preprint
Full-text available
Data pre-processing is a fundamental component in any data-driven application. With the increasing complexity of data processing operations and volume of data, Cylon, a distributed dataframe system, is developed to facilitate data processing both as a standalone application and as a library, especially for Python applications. While Cylon shows pro...
Preprint
Full-text available
In this paper, we summarize our effort to create and utilize a simple framework to coordinate computational analytics tasks with the help of a workflow system. Our design is based on a minimalistic approach while at the same time allowing access to computational resources offered through the owner's computer, HPC computing centers, cloud resources,...
Article
Full-text available
Nowcasting is a term originating from economics, finance, and meteorology. It refers to the process of determining the uncertain state of the economy, markets or the weather at the current time by indirect means. In this paper, we describe a simple two‐parameter data analysis that reveals hidden order in otherwise seemingly chaotic earthquake seism...
Conference Paper
What could a research team accomplish if given access to the latest GPU hardware? A molecular dynamics research team could process protein-ligand tests using 100 GPUs on Delta in about 3.6 days, a task that would take their local lab a full year using their own GPU hardware. A team exploring the wonders of cosmic rays is now part of the multi-messe...
Preprint
Full-text available
A foundational set of findable, accessible, interoperable, and reusable (FAIR) principles were proposed in 2016 as prerequisites for proper data management and stewardship, with the goal of enabling the reusability of scholarly data. The principles were also meant to apply to other digital assets, at a high level, and over time, the FAIR guiding pr...
Preprint
Full-text available
The data science community today has embraced the concept of Dataframes as the de facto standard for data representation and manipulation. Ease of use, massive operator coverage, and popularization of R and Python languages have heavily influenced this transformation. However, most widely used serial Dataframes today (R, pandas) experience performa...
Preprint
Full-text available
Yes. Interval statistics have been used to conclude that major earthquakes are random events in time and cannot be anticipated or predicted. Machine learning is a powerful new technique that enhances our ability to understand the information content of earthquake catalogs. We show that catalogs contain significant information on current hazard and...
Preprint
Full-text available
Go to the ESSOAR preprint server https://www.essoar.org/doi/abs/10.1002/essoar.10510940.5
Article
Full-text available
We review previous approaches to nowcasting earthquakes and introduce new approaches based on deep learning using three distinct models based on recurrent neural networks and transformers. We discuss different choices for observables and measures presenting promising initial results for a region of Southern California from 1950–2020. Earthquake act...
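A minimal sketch of the recurrent-model setup such approaches use: encode a window of past activity measures and predict the next value. The feature count, window length, and LSTM choice are illustrative assumptions, not the paper's models.

```python
# Toy recurrent nowcaster: a sequence of past seismic observables in,
# a prediction for the next time step out.
import torch
import torch.nn as nn

class Nowcaster(nn.Module):
    def __init__(self, n_features=4, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):             # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # predict from the last hidden state

history = torch.randn(8, 52, 4)       # e.g. 52 time steps of 4 observables
print(Nowcaster()(history).shape)     # torch.Size([8, 1])
```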
Article
Full-text available
Deep learning has transformed the use of machine learning technologies for the analysis of large experimental datasets. In science, such datasets are typically generated by large-scale experimental facilities, and machine learning focuses on the identification of patterns, trends and anomalies to extract meaningful scientific insights from the data...
Article
Full-text available
Classical molecular dynamics simulations are based on solving Newton’s equations of motion. Using a small timestep, numerical integrators such as Verlet generate trajectories of particles as solutions to Newton’s equations. We introduce operators derived using recurrent neural networks that accurately solve Newton’s equations utilizing sequences of...
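The Verlet baseline this abstract mentions is easy to write down; a sketch for a 1D harmonic oscillator (acceleration a = -x, exact solution cos t) shows the small-timestep integration that the learned RNN operators aim to outpace.

```python
# Position Verlet for Newton's equations, applied to a 1D harmonic
# oscillator: x'' = -x, x(0) = 1, v(0) = 0, so the exact answer is cos(t).
import numpy as np

dt, steps = 0.01, 1000
x_prev, x = 1.0, 1.0 - 0.5 * dt**2       # x(dt) from a Taylor expansion at rest

for _ in range(steps):
    a = -x                               # force/mass for the oscillator
    x_next = 2 * x - x_prev + a * dt**2  # the Verlet update
    x_prev, x = x, x_next

print(f"x(t={steps * dt:.1f}) = {x:.4f}, exact = {np.cos(steps * dt):.4f}")
```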
Article
Full-text available
The earthquake cycle of stress accumulation and release is associated with the elastic rebound hypothesis proposed by H.F. Reid following the M7.9 San Francisco earthquake of 1906. However, observing details of the actual values of time- and space-dependent tectonic stress is not possible at the present time. In two previous papers, we have propose...