Geoffrey Charles Fox
University of Virginia | UVa · Department of Computer Science
PhD in Physics, Cambridge University
Applications of Deep Learning (for time series) and software systems to support integrated AI and data engineering.
About
1,501 Publications
339,755 Reads
28,559 Citations
Introduction
Fox has a Ph.D. in Theoretical Physics from Cambridge University, where he was Senior Wrangler. He is a member of the Biocomplexity Institute and Initiative and the Department of Computer Science at the University of Virginia. He has supervised the Ph.D. theses of 75 students. He received the HPDC Achievement Award and the ACM-IEEE CS Ken Kennedy Award for foundational contributions to parallel computing in 2019. He works on the interdisciplinary interface between computing and applications.
Additional affiliations
July 2001 - July 2023
July 1970 - August 1970
July 2000 - June 2001
Education
July 1964 - June 1967: Ph.D. in Theoretical Physics, University of Cambridge
September 1961 - June 1964: Undergraduate (Mathematical Tripos; Senior Wrangler), University of Cambridge
Publications (1,501)
With the approach of the High Luminosity Large Hadron Collider (HL-LHC) era set to begin particle collisions by the end of this decade, it is evident that the computational demands of traditional collision simulation methods are becoming increasingly unsustainable. Existing approaches, which rely heavily on first-principles Monte Carlo simulations...
Advancing the capabilities of earthquake nowcasting, the real-time forecasting of seismic activities, remains crucial for reducing casualties. This multifaceted challenge has recently gained attention within the deep learning domain, facilitated by the availability of extensive earthquake datasets. Despite significant advancements, the existing lit...
Deep learning has proven very promising for interpreting MRI in brain tumor diagnosis. However, deep learning models suffer from a scarcity of brain MRI datasets for effective training. Self-supervised learning (SSL) models provide data-efficient and remarkable solutions to limited dataset problems. Therefore, this paper introduces a generative SSL...
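The generative SSL idea in this entry can be pictured with a masked-reconstruction pretext task, a standard form of generative self-supervised pretraining. The sketch below is purely illustrative: the toy convolutional autoencoder, patch masking scheme, and synthetic 2D slices are my own assumptions, not the architecture or dataset of the cited paper.

```python
# Illustrative sketch only: a masked-reconstruction pretext task, one common form of
# generative self-supervised pretraining. Model and variable names are assumptions,
# not the method of the cited paper.
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    """Small conv encoder-decoder used as a stand-in for an SSL backbone."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def mask_patches(x, patch=16, drop_frac=0.5):
    """Zero out a random subset of non-overlapping patches (the pretext corruption)."""
    b, _, h, w = x.shape
    mask = torch.ones_like(x)
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            keep = torch.rand(b) > drop_frac          # per-sample decision per patch
            mask[:, :, i:i + patch, j:j + patch] *= keep.view(b, 1, 1, 1).float()
    return x * mask, mask

model = TinyAutoencoder()
slices = torch.randn(8, 1, 128, 128)                  # stand-in for unlabeled MRI slices
corrupted, mask = mask_patches(slices)
recon = model(corrupted)
loss = ((recon - slices) ** 2 * (1 - mask)).mean()    # reconstruct only the masked regions
loss.backward()
```

After pretraining on unlabeled slices in this fashion, the encoder would typically be fine-tuned on the small labeled dataset, which is how SSL helps when annotated brain MRI is scarce.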
Particle collisions at accelerators such as the Large Hadron Collider, recorded and analyzed by experiments such as ATLAS and CMS, enable exquisite measurements of the Standard Model and searches for new phenomena. Simulations of collision events at these detectors have played a pivotal role in shaping the design of future experiments and analyzing...
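As a rough picture of what a learned surrogate for detector simulation does, the sketch below uses a small conditional generator that maps latent noise plus an incident-particle energy to a vector of calorimeter cell energies. All shapes, layer sizes, and names are assumptions made for illustration; the generative models and training objectives of the cited work (e.g. adversarial, flow, or diffusion based) are not reproduced here.

```python
# Minimal sketch of a generative surrogate for detector simulation: a conditional
# generator maps latent noise plus an incident-particle energy to calorimeter cell
# energies. Shapes and names are illustrative assumptions.
import torch
import torch.nn as nn

N_CELLS = 64 * 24          # hypothetical flattened calorimeter readout
LATENT = 100

class ShowerGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT + 1, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, N_CELLS), nn.Softplus(),   # cell energies are non-negative
        )

    def forward(self, z, energy):
        # Condition on the log incident energy by concatenating it to the noise.
        return self.net(torch.cat([z, energy.log().unsqueeze(1)], dim=1))

gen = ShowerGenerator()
z = torch.randn(32, LATENT)                  # a batch of latent samples
e = torch.rand(32) * 99.0 + 1.0              # incident energies in, say, 1-100 GeV
fake_showers = gen(z, e)                     # (32, N_CELLS) sampled showers
print(fake_showers.shape)
```

Training such a generator against full first-principles Monte Carlo showers is what allows inference-time sampling to be orders of magnitude cheaper than the original simulation.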
This research is part of a systematic study of scientific time series. In the last three years, hundreds of papers and over fifty new deep-learning models have been described for time series models. These mainly focus on the key aspect of time dependence, whereas in some scientific time series, the situation is more complex with multiple locations,...
Redshift prediction is a fundamental task in astronomy, essential for understanding the expansion of the universe and determining the distances of astronomical objects. Accurate redshift prediction plays a crucial role in advancing our knowledge of the cosmos. Machine learning (ML) methods, renowned for their precision and speed, offer promising so...
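A minimal sketch of ML-based photometric redshift regression follows. The synthetic "magnitudes", the linear toy relation, and the choice of a random forest regressor are placeholders for illustration, not the method or data of the cited work.

```python
# Illustrative sketch of photometric redshift regression with a standard ML regressor.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
n = 5000
mags = rng.normal(loc=20.0, scale=1.5, size=(n, 5))      # stand-in ugriz magnitudes
z_true = np.clip(0.1 * (mags[:, 0] - mags[:, 4])          # toy color-redshift relation
                 + rng.normal(scale=0.02, size=n), 0.0, None)

X_train, X_test, y_train, y_test = train_test_split(mags, z_true, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```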
3D object detection is critical for autonomous driving, leveraging deep learning techniques to interpret LiDAR data. The PointPillars architecture is a prominent model in this field, distinguished by its efficient use of LiDAR data. This study provides an analysis of enhancing the performance of PointPillars model under various dropout rates to add...
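The kind of dropout-rate study this entry describes can be set up as a simple sweep over the regularization parameter. The head below is a toy stand-in, not the PointPillars network, and the rates swept are arbitrary illustrative values.

```python
# Simplified sketch of sweeping dropout rates in a detection head.
import torch
import torch.nn as nn

class ToyDetectionHead(nn.Module):
    def __init__(self, p_drop):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Dropout2d(p_drop),                      # regularization under study
            nn.Conv2d(64, 7, 1),                       # e.g. box parameters per cell
        )

    def forward(self, pseudo_image):
        return self.block(pseudo_image)

features = torch.randn(2, 64, 100, 100)                # stand-in pillar pseudo-image
for p in (0.0, 0.1, 0.3, 0.5):                         # the sweep over dropout rates
    head = ToyDetectionHead(p_drop=p)
    out = head(features)
    print(f"p={p}: output {tuple(out.shape)}")
```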
Advancing the capabilities of earthquake nowcasting, the real-time forecasting of seismic activities remains a crucial and enduring objective aimed at reducing casualties. This multifaceted challenge has recently gained attention within the deep learning domain, facilitated by the availability of extensive, long-term earthquake datasets. Despite si...
The data engineering and data science community has embraced the idea of using Python and R dataframes for regular applications. Driven by the big data revolution and artificial intelligence, these frameworks are now ever more important in order to process terabytes of data. They can easily exceed the capabilities of a single machine but also deman...
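To make the workload concrete, the snippet below shows a typical dataframe operation (a join followed by a group-by aggregation) in plain single-machine pandas. Distributed dataframe systems of the kind described here execute the same logical operators partitioned across many nodes; this is not their API, only the single-machine equivalent, and the tables are invented examples.

```python
# A typical dataframe workload: join two tables, then aggregate by group.
import pandas as pd

orders = pd.DataFrame({"user_id": [1, 2, 1, 3], "amount": [10.0, 5.0, 7.5, 3.0]})
users = pd.DataFrame({"user_id": [1, 2, 3], "region": ["east", "west", "east"]})

joined = orders.merge(users, on="user_id")              # a shuffle/hash join when distributed
per_region = joined.groupby("region")["amount"].sum()   # partial + final aggregation
print(per_region)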
High-performance scientific simulations, important for comprehension of complex systems, encounter computational challenges especially when exploring extensive parameter spaces. There has been an increasing interest in developing deep neural networks (DNNs) as surrogate models capable of accelerating the simulations. However, existing approaches fo...
Earthquake nowcasting has been proposed as a means of tracking the change in large earthquake potential in a seismically active area. The method was developed using observable seismic data, in which probabilities of future large earthquakes can be computed using Receiver Operating Characteristic (ROC) methods. Furthermore, analysis of the Shannon i...
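The ROC machinery mentioned in this entry can be sketched as follows: each time window receives a nowcast score, the label records whether a large earthquake followed, and the ROC curve and AUC summarize the skill. The data here are synthetic and only the evaluation step is illustrated; this is not the nowcasting method itself.

```python
# Sketch of scoring a nowcast with ROC analysis on synthetic data.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(1)
n_windows = 1000
large_quake = rng.random(n_windows) < 0.05              # 1 if a large event followed the window
# A toy nowcast score that is mildly informative about the label.
nowcast = 0.3 * large_quake + rng.random(n_windows)

fpr, tpr, thresholds = roc_curve(large_quake, nowcast)
print("skill (AUC):", roc_auc_score(large_quake, nowcast))  # 0.5 = no skill, 1.0 = perfect
```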
The pursuit of understanding fundamental particle interactions has reached unparalleled precision levels. Particle physics detectors play a crucial role in generating low-level object signatures that encode collision physics. However, simulating these particle collisions is a demanding task in terms of memory and computation which will be exacerbat...
Deep learning datasets are expanding at an unprecedented pace, creating new challenges for data processing in model training pipelines. A crucial aspect of these pipelines is dataset shuffling, which significantly improves unbiased learning and convergence accuracy by adhering to the principles of random sampling. However, loading shuffled data for...
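A common way to approximate full random sampling when a dataset is too large to permute in memory is a streaming "shuffle buffer": keep a bounded buffer and emit a random element as each new record arrives. The sketch below illustrates that generic idea in plain Python; it is not the mechanism of the system described in this entry, and the buffer size is an arbitrary example.

```python
# Sketch of a streaming shuffle buffer as an approximation to full-dataset shuffling.
import random

def buffered_shuffle(records, buffer_size=4, seed=0):
    rng = random.Random(seed)
    buffer = []
    for r in records:
        buffer.append(r)
        if len(buffer) > buffer_size:
            idx = rng.randrange(len(buffer))
            yield buffer.pop(idx)           # emit a random element, keep the rest
    rng.shuffle(buffer)
    yield from buffer                        # drain what remains at the end

print(list(buffered_shuffle(range(12))))
```

Larger buffers give shuffles closer to a true random permutation at the cost of memory, which is exactly the trade-off such pipelines must manage.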
In the evolving landscape of neural network models, one prominent challenge stands out: the significant memory overheads associated with training expansive models. Addressing this challenge, this study delves deep into the Rotated Tensor Parallelism (RTP). RTP is an innovative approach that strategically focuses on memory deduplication in distribute...
Over the last several years, the computation landscape for conducting data analytics has completely changed. While in the past, a lot of the activities have been undertaken in isolation by companies and research institutions, today's infrastructure constitutes a wealth of services offered by a variety of providers that offer opportunities for reus...
MLCommons is an effort to develop and improve the artificial intelligence (AI) ecosystem through benchmarks, public data sets, and research. It consists of members from start-ups, leading companies, academics, and non-profits from around the world. The goal is to make machine learning better for everyone. In order to increase participation by other...
A foundational set of findable, accessible, interoperable, and reusable (FAIR) principles were proposed in 2016 as prerequisites for proper data management and stewardship, with the goal of enabling the reusability of scholarly data. The principles were also meant to apply to other digital assets, at a high level, and over time, the FAIR guiding pr...
The Data Science domain has expanded monumentally in both research and industry communities during the past decade, predominantly owing to the Big Data revolution. Artificial Intelligence (AI) and Machine Learning (ML) are bringing more complexities to data engineering applications, which are now integrated into data processing pipelines to process...
Currently, AI and, in particular, deep learning play a major role in science, from data analytics and simulation surrogates to policy and system decisions. This role is likely to increase as ideas from early adopters spread across all academic fields. One can group the structure of "AI for Science" into a few patterns, where one needs to explore ex...
With the significant development of the Internet of Things and low-cost cloud services, the sensory and data processing requirements of IoT systems are continually going up. TrustZone is a hardware-protected Trusted Execution Environment (TEE) for ARM processors specifically designed for IoT handheld systems. It provides memory isolation techniques...
Data analytics has become a critical component in business, where organizations aim to harness their data for better insights and informed decision-making. This has led to the rise of data engineering, which focuses on constructing scalable and efficient data pipelines to handle data's increasing volume and complexity. High-performance data enginee...
The data science community today has embraced the concept of Dataframes as the de facto standard for data representation and manipulation. Ease of use, massive operator coverage, and popularization of R and Python languages have heavily influenced this transformation. However, most widely used serial Dataframes today (R, pandas) experience performa...
Plain Language Summary
The question of whether earthquake occurrence is random in time, or perhaps chaotic with order hidden in the chaos, is of major importance to the determination of risk from these events. It was shown many years ago that if aftershocks are removed from the earthquake catalogs, what remains are apparently events that occur at r...
Neural networks (NNs) have proven to be a viable alternative to traditional direct numerical algorithms, with the potential to accelerate computational time by several orders of magnitude. In the present paper we study the use of encoder-decoder convolutional neural network (CNN) as surrogates for steady-state diffusion solvers. The construction of...
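The surrogate idea can be sketched as an encoder-decoder CNN whose input channel holds the source/boundary description on a grid and whose output channel is the predicted steady-state field. The layer sizes, grid size, and names below are toy choices made for illustration, not the architecture of the cited paper.

```python
# Sketch of an encoder-decoder CNN surrogate for a steady-state diffusion solver.
import torch
import torch.nn as nn

class DiffusionSurrogate(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, source):
        return self.decoder(self.encoder(source))

model = DiffusionSurrogate()
source = torch.randn(4, 1, 64, 64)          # stand-in source terms on a 64x64 grid
field = model(source)                        # predicted steady-state solution field
# Training would minimize e.g. MSE against fields computed by a numerical solver;
# a zero target is used here only so the example runs end to end.
loss = nn.functional.mse_loss(field, torch.zeros_like(field))
loss.backward()
```

Once trained on solver-generated pairs, a single forward pass replaces an iterative numerical solve, which is the source of the reported speed-ups.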
The data engineering and data science community has embraced the idea of using Python & R dataframes for regular applications. Driven by the big data revolution and artificial intelligence, these applications are now essential in order to process terabytes of data. They can easily exceed the capabilities of a single machine, but also demand signifi...
With machine learning (ML) becoming a transformative tool for science, the scientific community needs a clear catalogue of ML techniques, and their relative benefits on various scientific problems, if they were to make significant advances in science using AI. Although this comes under the purview of benchmarking, conventional benchmarking initiati...
Data pre-processing is a fundamental component in any data-driven application. With the increasing complexity of data processing operations and volume of data, Cylon, a distributed dataframe system, is developed to facilitate data processing both as a standalone application and as a library, especially for Python applications. While Cylon shows pro...
In this paper, we summarize our effort to create and utilize a simple framework to coordinate computational analytics tasks with the help of a workflow system. Our design is based on a minimalistic approach while at the same time allowing to access computational resources offered through the owner's computer, HPC computing centers, cloud resources,...
Nowcasting is a term originating from economics, finance, and meteorology. It refers to the process of determining the uncertain state of the economy, markets or the weather at the current time by indirect means. In this paper, we describe a simple two‐parameter data analysis that reveals hidden order in otherwise seemingly chaotic earthquake seism...
What could a research team accomplish if given access to the latest GPU hardware? A molecular dynamics research team could process protein-ligand tests using 100 GPUs on Delta in about 3.6 days, a task that would take their local lab a full year using their own GPU hardware. A team exploring the wonders of cosmic rays is now part of the multi-messe...
https://www.essoar.org/doi/abs/10.1002/essoar.10512008.5
Yes. Interval statistics have been used to conclude that major earthquakes are random events in time and cannot be anticipated or predicted. Machine learning is a powerful new technique that enhances our ability to understand the information content of earthquake catalogs. We show that catalogs contain significant information on current hazard and...
Go to the ESSOAR preprint server: https://www.essoar.org/doi/abs/10.1002/essoar.10510940.5
We review previous approaches to nowcasting earthquakes and introduce new approaches based on deep learning using three distinct models based on recurrent neural networks and transformers. We discuss different choices for observables and measures presenting promising initial results for a region of Southern California from 1950–2020. Earthquake act...
Deep learning has transformed the use of machine learning technologies for the analysis of large experimental datasets. In science, such datasets are typically generated by large-scale experimental facilities, and machine learning focuses on the identification of patterns, trends and anomalies to extract meaningful scientific insights from the data...
Classical molecular dynamics simulations are based on solving Newton’s equations of motion. Using a small timestep, numerical integrators such as Verlet generate trajectories of particles as solutions to Newton’s equations. We introduce operators derived using recurrent neural networks that accurately solve Newton’s equations utilizing sequences of...
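The idea of learning an integrator can be sketched on a toy system: generate a trajectory with velocity Verlet for a 1D harmonic oscillator, then train a recurrent model to map a short window of past positions to the next position. The oscillator, window length, and LSTM sizes below are illustrative assumptions, not the operators of the cited paper.

```python
# Sketch: velocity-Verlet data generation plus an LSTM trained to predict the next position.
import torch
import torch.nn as nn

def verlet_trajectory(x0=1.0, v0=0.0, k=1.0, dt=0.01, steps=1000):
    xs, x, v = [x0], x0, v0
    a = -k * x
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt
        a_new = -k * x
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
        xs.append(x)
    return torch.tensor(xs)

traj = verlet_trajectory()
window = 16
X = torch.stack([traj[i:i + window] for i in range(len(traj) - window)]).unsqueeze(-1)
y = traj[window:].unsqueeze(-1)

class SeqIntegrator(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, seq):
        out, _ = self.lstm(seq)
        return self.head(out[:, -1])         # next position from the window of past positions

model = SeqIntegrator()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):                          # brief training loop for illustration
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), y)
    loss.backward()
    opt.step()
print("final MSE:", loss.item())
```

A learned operator of this kind can, in principle, take much larger effective timesteps than the underlying numerical integrator, which is the speed-up the entry refers to.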
The earthquake cycle of stress accumulation and release is associated with the elastic rebound hypothesis proposed by H.F. Reid following the M7.9 San Francisco earthquake of 1906. However, observing details of the actual values of time- and space-dependent tectonic stress is not possible at the present time. In two previous papers, we have propose...