Moshe Eliasof’s research while affiliated with University of Cambridge and other places


Publications (55)


Learning Regularization for Graph Inverse Problems
  • Article

April 2025 · 3 Reads · Proceedings of the AAAI Conference on Artificial Intelligence

Moshe Eliasof · Md Shahriar Rahim Siddiqui · [...] · Eldad Haber

In recent years, Graph Neural Networks (GNNs) have been utilized for applications ranging from drug discovery to network design and social networks. In many applications, it is impossible to observe some properties of the graph directly; instead, only noisy and indirect measurements of these properties are available. Such scenarios are known as Graph Inverse Problems (GRIPs). In this work, we introduce a framework that leverages GNNs to solve GRIPs. The framework combines likelihood and prior terms, which are used to find a solution that fits the data while adhering to learned prior information. Specifically, we combine recent deep learning techniques developed for inverse problems with GNN architectures to formulate and solve GRIPs. We study our approach on a number of representative problems and demonstrate the effectiveness of the framework.
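The split into a likelihood (data-fit) term and a learned prior can be illustrated with a minimal sketch. The forward operator `forward_op`, the placeholder `PriorGNN` regularizer, and the weight `lam` below are illustrative assumptions, not the components used in the paper.

```python
# Minimal sketch of a likelihood + learned-prior objective for a graph
# inverse problem. `forward_op`, `PriorGNN`, and `lam` are illustrative
# placeholders, not the paper's actual components.
import torch
import torch.nn as nn

class PriorGNN(nn.Module):
    """Toy graph 'regularizer': scores how well node features fit a prior."""
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x, adj):
        # one propagation step followed by a smoothness-style penalty
        h = torch.relu(adj @ self.lin(x))
        return (h - x).pow(2).mean()

def recover(y, adj, forward_op, prior, x0, lam=0.1, steps=200, lr=1e-2):
    """Gradient descent on ||forward_op(x) - y||^2 + lam * prior(x, adj)."""
    x = x0.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = (forward_op(x) - y).pow(2).mean() + lam * prior(x, adj)
        loss.backward()
        opt.step()
    return x.detach()
```

In the paper's setting the regularizer itself is learned; here it is only a stand-in showing where a GNN-based prior enters the objective.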


On Oversquashing in Graph Neural Networks Through the Lens of Dynamical Systems

April 2025 · 5 Reads · Proceedings of the AAAI Conference on Artificial Intelligence

Alessio Gravina · Moshe Eliasof · [...]
A common problem in Message-Passing Neural Networks is oversquashing -- the limited ability to facilitate effective information flow between distant nodes. Oversquashing is attributed to the exponential decay in information transmission as node distances increase. This paper introduces a novel perspective to address oversquashing, leveraging dynamical systems properties of global and local non-dissipativity, that enable the maintenance of a constant information flow rate. We present SWAN, a uniquely parameterized GNN model with antisymmetry both in space and weight domains, as a means to obtain non-dissipativity. Our theoretical analysis asserts that by implementing these properties, SWAN offers an enhanced ability to transmit information over extended distances. Empirical evaluations on synthetic and real-world benchmarks that emphasize long-range interactions validate the theoretical understanding of SWAN, and its ability to mitigate oversquashing.
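As a rough illustration of weight antisymmetry, one of the two antisymmetries mentioned above, the sketch below parameterizes the effective weight as W - W^T, whose purely imaginary eigenvalues make the linear part non-dissipative. This is a simplified assumption-level sketch, not the SWAN architecture itself.

```python
# Sketch of an antisymmetric-weight message-passing layer (illustrative;
# SWAN's actual space- and weight-antisymmetric parameterization is richer).
import torch
import torch.nn as nn

class AntisymmetricGNNLayer(nn.Module):
    def __init__(self, dim, step=0.1):
        super().__init__()
        self.W = nn.Parameter(torch.randn(dim, dim) / dim ** 0.5)
        self.step = step

    def forward(self, x, adj):
        # W - W^T is antisymmetric: its eigenvalues are purely imaginary,
        # so the linear part neither amplifies nor dissipates the signal.
        A = self.W - self.W.t()
        agg = adj @ x                      # neighborhood aggregation
        return x + self.step * torch.tanh(x @ A + agg)
```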


FLASH: Flexible Learning of Adaptive Sampling from History in Temporal Graph Neural Networks

April 2025 · 3 Reads

Aggregating temporal signals from historic interactions is a key step in future link prediction on dynamic graphs. However, incorporating long histories is resource-intensive. Hence, temporal graph neural networks (TGNNs) often rely on historical-neighbor sampling heuristics such as uniform sampling or selecting the most recent neighbors. These heuristics are static and fail to adapt to the underlying graph structure. We introduce FLASH, a learnable and graph-adaptive neighborhood selection mechanism that generalizes existing heuristics. FLASH integrates seamlessly into TGNNs and is trained end-to-end using a self-supervised ranking loss. We provide theoretical evidence that commonly used heuristics hinder TGNN performance, motivating our design. Extensive experiments across multiple benchmarks demonstrate consistent and significant performance improvements for TGNNs equipped with FLASH.
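A minimal sketch of the general idea of a learnable, graph-adaptive neighbor selector: score each historical neighbor with a small network and keep the top-k instead of applying a fixed heuristic. The input features, the scorer, and the top-k rule here are illustrative assumptions; FLASH's actual mechanism and its self-supervised ranking loss are defined in the paper.

```python
# Illustrative learnable neighbor selection: score candidate historical
# neighbors and keep the top-k. The features, scorer, and top-k rule are
# assumptions, not FLASH's mechanism or ranking loss.
import torch
import torch.nn as nn

class NeighborScorer(nn.Module):
    def __init__(self, feat_dim):
        super().__init__()
        # input: [target features, neighbor features, time gap]
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim + 1, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, target, neighbors, time_gaps, k):
        # target: (d,), neighbors: (n, d), time_gaps: (n, 1)
        inp = torch.cat(
            [target.expand(neighbors.size(0), -1), neighbors, time_gaps], dim=-1)
        scores = self.mlp(inp).squeeze(-1)               # one score per neighbor
        keep = torch.topk(scores, k=min(k, scores.numel()))
        return keep.indices, scores                      # indices of kept neighbors
```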


Towards Efficient Training of Graph Neural Networks: A Multiscale Approach

March 2025 · 10 Reads

Graph Neural Networks (GNNs) have emerged as a powerful tool for learning and inferring from graph-structured data, and are widely used in a variety of applications, often involving large amounts of data and large graphs. However, training on such data requires large memory and extensive computations. In this paper, we introduce a novel framework for efficient multiscale training of GNNs, designed to integrate information across multiscale representations of a graph. Our approach leverages a hierarchical graph representation, taking advantage of coarse graph scales in the training process, where each coarse-scale graph has fewer nodes and edges. Based on this approach, we propose a suite of GNN training methods, including coarse-to-fine, sub-to-full, and multiscale gradient computation. We demonstrate the effectiveness of our methods on various datasets and learning tasks.
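A minimal sketch of one of the listed modes, coarse-to-fine training: run cheap epochs on a coarsened graph before refining on the full graph. The random-cluster coarsening and pooled targets below are placeholders for whichever hierarchy and task the method actually uses.

```python
# Coarse-to-fine training sketch. The random-cluster coarsening and pooled
# targets are placeholders for the paper's hierarchical graph representation.
import torch

def coarsen(x, adj, num_clusters):
    """Assign nodes to random clusters and pool features and adjacency."""
    n = x.size(0)
    assign = torch.zeros(n, num_clusters)
    assign[torch.arange(n), torch.randint(0, num_clusters, (n,))] = 1.0
    x_c = assign.t() @ x                                  # pooled node features
    adj_c = (assign.t() @ adj @ assign).clamp(max=1.0)    # pooled adjacency
    return x_c, adj_c, assign

def coarse_to_fine(model, x, adj, y, loss_fn, coarse_epochs, fine_epochs, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    x_c, adj_c, assign = coarsen(x, adj, num_clusters=max(1, x.size(0) // 4))
    y_c = assign.t() @ y               # assumes node-level regression targets
    for _ in range(coarse_epochs):     # cheap epochs on the small graph
        opt.zero_grad()
        loss_fn(model(x_c, adj_c), y_c).backward()
        opt.step()
    for _ in range(fine_epochs):       # then refine on the full graph
        opt.zero_grad()
        loss_fn(model(x, adj), y).backward()
        opt.step()
    return model
```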


Iterative Flow Matching -- Path Correction and Gradual Refinement for Enhanced Generative Modeling
  • Preprint
  • File available

February 2025 · 12 Reads

Generative models for image generation are now commonly used for a wide variety of applications, ranging from guided image generation for entertainment to solving inverse problems. Nonetheless, training a generator is a non-trivial feat that requires fine-tuning and can lead to so-called hallucinations, that is, the generation of images that are unrealistic. In this work, we explore image generation using flow matching. We explain and demonstrate why flow matching can generate hallucinations, and propose an iterative process to improve the generation process. Our iterative process can be integrated into virtually any generative modeling technique, thereby enhancing the performance and robustness of image synthesis systems.
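For context, the sketch below shows a standard (non-iterative) flow-matching training step: sample a point on the straight path between noise and data and regress the velocity network onto the path direction. The `velocity_net(x, t)` signature is an assumption, and the paper's iterative path-correction scheme is not reproduced here.

```python
# Generic (non-iterative) flow-matching training step: regress a velocity
# network onto the direction of the straight path x_t = (1 - t) x0 + t x1
# between noise x0 and data x1. `velocity_net(x, t)` is an assumed signature.
import torch

def flow_matching_step(velocity_net, x1, optimizer):
    x0 = torch.randn_like(x1)                            # noise sample
    t = torch.rand(x1.size(0), *([1] * (x1.dim() - 1)))  # per-sample time in [0, 1)
    xt = (1 - t) * x0 + t * x1                           # point on the straight path
    target = x1 - x0                                     # velocity of that path
    loss = (velocity_net(xt, t) - target).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```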


Towards Invariance to Node Identifiers in Graph Neural Networks

February 2025 · 8 Reads

Message-Passing Graph Neural Networks (GNNs) are known to have limited expressive power, due to their message-passing structure. One mechanism for circumventing this limitation is to add unique node identifiers (IDs), which break the symmetries that underlie the expressivity limitation. In this work, we highlight a key limitation of the ID framework and propose an approach for addressing it. We begin by observing that the final output of the GNN should clearly not depend on the specific IDs used. We then show that in practice this does not hold, and thus the learned network does not possess this desired structural property. Invariance to node IDs may be enforced in several ways, and we discuss their theoretical properties. We then propose a novel regularization method that effectively enforces ID invariance on the network. Extensive evaluations on both real-world and synthetic tasks demonstrate that our approach significantly improves ID invariance and, in turn, often boosts generalization performance.
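One simple way to enforce such invariance, shown below as an assumption-level sketch rather than the paper's regularizer, is to penalize the discrepancy between outputs computed under two independent random ID assignments.

```python
# Illustrative ID-invariance penalty: the network output should not change
# when node identifiers are re-drawn. The paper's regularizer may differ.
import torch

def id_invariance_penalty(model, x, adj, id_dim):
    n = x.size(0)
    ids_a = torch.randn(n, id_dim)        # two independent random ID assignments
    ids_b = torch.randn(n, id_dim)
    out_a = model(torch.cat([x, ids_a], dim=-1), adj)   # assumes IDs are appended
    out_b = model(torch.cat([x, ids_b], dim=-1), adj)   # to the node features
    return (out_a - out_b).pow(2).mean()

# total loss (beta is a hypothetical weighting):
# loss = task_loss + beta * id_invariance_penalty(model, x, adj, id_dim)
```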


DiTASK: Multi-Task Fine-Tuning with Diffeomorphic Transformations

February 2025 · 25 Reads

Pre-trained Vision Transformers now serve as powerful tools for computer vision. Yet, efficiently adapting them to multiple tasks remains a challenge, arising from the need to modify the rich hidden representations encoded by the learned weight matrices without inducing interference between tasks. Current parameter-efficient methods like LoRA, which apply low-rank updates, force tasks to compete within constrained subspaces, ultimately degrading performance. We introduce DiTASK, a novel Diffeomorphic Multi-Task Fine-Tuning approach that maintains pre-trained representations by preserving weight matrix singular vectors, while enabling task-specific adaptations through neural diffeomorphic transformations of the singular values. By following this approach, DiTASK enables both shared and task-specific feature modulations with minimal added parameters. Our theoretical analysis shows that DiTASK achieves full-rank updates during optimization, preserving the geometric structure of pre-trained features, and establishing a new paradigm for efficient multi-task learning (MTL). Our experiments on PASCAL MTL and NYUD show that DiTASK achieves state-of-the-art performance across four dense prediction tasks, using 75% fewer parameters than existing methods.
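A minimal sketch of the core idea as stated above: freeze the singular vectors of a pre-trained weight matrix and adapt only its singular values through a monotone (hence invertible) map. The single positive scaling used here is a trivial stand-in for the paper's learned neural diffeomorphism.

```python
# Sketch of spectrum-only adaptation: freeze the singular vectors of a
# pre-trained weight and learn a monotone map of its singular values.
# The single positive scaling is a trivial stand-in for the paper's
# learned neural diffeomorphism.
import torch
import torch.nn as nn

class SpectralAdapter(nn.Module):
    def __init__(self, pretrained_weight):               # weight of shape (out, in)
        super().__init__()
        U, S, Vh = torch.linalg.svd(pretrained_weight, full_matrices=False)
        self.register_buffer("U", U)
        self.register_buffer("S", S)
        self.register_buffer("Vh", Vh)
        self.log_scale = nn.Parameter(torch.zeros(1))     # starts at the identity map

    def forward(self, x):
        # A strictly increasing map of the singular values (here a global
        # positive rescaling) leaves U and Vh, i.e. the feature geometry, intact.
        new_s = torch.exp(self.log_scale) * self.S
        W = self.U @ torch.diag(new_s) @ self.Vh          # adapted (out, in) weight
        return x @ W.t()
```

Because the update only rescales the spectrum, it stays full-rank, which is the property the abstract contrasts with low-rank adapters such as LoRA.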


On the Effectiveness of Random Weights in Graph Neural Networks

January 2025 · 7 Reads

Graph Neural Networks (GNNs) have achieved remarkable success across diverse tasks on graph-structured data, primarily through the use of learned weights in message-passing layers. In this paper, we demonstrate that random weights can be surprisingly effective, achieving performance comparable to end-to-end trained counterparts across various tasks and datasets. Specifically, we show that by replacing learnable weights with random weights, GNNs can retain strong predictive power while significantly reducing training time by up to 6× and memory usage by up to 3×. Moreover, the random weights combined with our construction yield random graph propagation operators, which we show to reduce the problem of feature rank collapse in GNNs. These insights and empirical results highlight random weights as a lightweight and efficient alternative, offering a compelling perspective on the design and training of GNN architectures.
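A minimal sketch of the basic recipe: a message-passing layer whose weight is drawn once at random and registered as a frozen buffer, so it is never trained. The paper's specific construction of random graph propagation operators is not reproduced here.

```python
# Sketch of a GCN-style layer with a fixed random weight, registered as a
# buffer so it is never trained and stores no gradients.
import torch
import torch.nn as nn

class RandomWeightGNNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        W = torch.randn(in_dim, out_dim) / in_dim ** 0.5
        self.register_buffer("W", W)                  # random, frozen weight

    def forward(self, x, adj_norm):
        # adj_norm: a normalized adjacency, e.g. D^{-1/2} (A + I) D^{-1/2}
        return torch.relu(adj_norm @ x @ self.W)
```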


Figure 1: Illustration of our Multiscale-SGD, introduced in Section 2.
Figure 2: Illustration of the Full Multiscale Training described in Section 4.
Figure 5: Convergence of different methods as a function of work-units.
Figure 7: A comparison of different network predictions for Single Scale, Multiscale, and Full Multiscale for an image from the STL10 dataset. The first two columns display the original image and data (same for all rows), followed by results from UNet, ResNet, MFC-UNet, and MFC-ResNet.

Multiscale Training of Convolutional Neural Networks

January 2025 · 10 Reads

Convolutional Neural Networks (CNNs) are the backbone of many deep learning methods, but optimizing them remains computationally expensive. To address this, we explore multiscale training frameworks and mathematically identify key challenges, particularly when dealing with noisy inputs. Our analysis reveals that, in the presence of noise, the gradient of standard CNNs in multiscale training may fail to converge as the mesh size approaches zero, undermining the optimization process. This insight drives the development of Mesh-Free Convolutions (MFCs), which are independent of input scale and avoid the pitfalls of traditional convolution kernels. We demonstrate that MFCs, with their robust gradient behavior, ensure convergence even with noisy inputs, enabling more efficient neural network optimization in multiscale settings. To validate the generality and effectiveness of our multiscale training approach, we show that (i) MFCs can theoretically deliver substantial computational speedups without sacrificing performance in practice, and (ii) standard convolutions benefit from our multiscale training framework in practice.
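A generic coarse-to-fine schedule, sketched under the assumption of an image-classification data loader, gives the flavor of multiscale training; the paper's Mesh-Free Convolutions and its gradient analysis are not reproduced here.

```python
# Generic coarse-to-fine schedule for CNN training (illustrative only; the
# paper's Mesh-Free Convolutions and gradient analysis are not reproduced).
import torch
import torch.nn.functional as F

def multiscale_epochs(model, loader, loss_fn, optimizer, scales=(0.25, 0.5, 1.0)):
    for scale in scales:                               # coarse grids first, fine last
        for images, targets in loader:
            if scale != 1.0:
                images = F.interpolate(images, scale_factor=scale,
                                       mode="bilinear", align_corners=False)
                # for dense-prediction tasks the targets would need resizing too
            optimizer.zero_grad()
            loss = loss_fn(model(images), targets)
            loss.backward()
            optimizer.step()
```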


GRAMA: Adaptive Graph Autoregressive Moving Average Models

January 2025 · 18 Reads

Graph State Space Models (SSMs) have recently been introduced to enhance Graph Neural Networks (GNNs) in modeling long-range interactions. Despite their success, existing methods either compromise on permutation equivariance or limit their focus to pairwise interactions rather than sequences. Building on the connection between Autoregressive Moving Average (ARMA) models and SSMs, in this paper we introduce GRAMA, a graph-adaptive method based on a learnable ARMA framework that addresses these limitations. By transforming static graph data into sequences, GRAMA leverages the strengths of the ARMA framework while preserving permutation equivariance. Moreover, GRAMA incorporates a selective attention mechanism for dynamic learning of ARMA coefficients, enabling efficient and flexible long-range information propagation. We also establish theoretical connections between GRAMA and Selective SSMs, providing insights into its ability to capture long-range dependencies. Extensive experiments on 14 synthetic and real-world datasets demonstrate that GRAMA consistently outperforms backbone models and performs competitively with state-of-the-art methods.
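For intuition, a minimal ARMA(p, q)-style recurrence over a sequence of node-feature states, with graph propagation inside each step so every update stays permutation equivariant; how GRAMA builds sequences from static graphs and learns its coefficients via selective attention is not shown.

```python
# Illustrative ARMA(p, q)-style recurrence on node features; the graph
# propagation (adj @ .) keeps each step permutation equivariant. GRAMA's
# sequence construction and attention-selected coefficients are not shown.
import torch
import torch.nn as nn

class GraphARMA(nn.Module):
    def __init__(self, p=2, q=2):
        super().__init__()
        self.a = nn.Parameter(torch.zeros(p))       # autoregressive coefficients
        self.b = nn.Parameter(torch.ones(q) / q)    # moving-average coefficients

    def forward(self, x_seq, adj):
        # x_seq: list of (num_nodes, dim) feature tensors forming a sequence.
        # y_t mixes past outputs (AR part) and propagated inputs (MA part).
        ys = []
        for t in range(len(x_seq)):
            ar = sum(self.a[i] * ys[t - 1 - i]
                     for i in range(len(self.a)) if t - 1 - i >= 0)
            ma = sum(self.b[j] * (adj @ x_seq[t - j])
                     for j in range(len(self.b)) if t - j >= 0)
            ys.append(ar + ma)
        return ys[-1]
```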


Citations (18)


... noisy, incomplete, or compressed data) by incorporating them into the denoising process. Nonetheless, it has been shown in Eliasof et al. (2024) that pre-trained diffusion models used as regularizers for ill-posed inverse problems tend to underperform compared to models trained specifically on a particular inverse problem. In particular, such models tend to break when the noise level is not very low. ...

Reference:

DAWN-SI: Data-Aware and Noise-Informed Stochastic Interpolation for Solving Inverse Problems
An over complete deep learning method for inverse problems

Foundations of Data Science

... This approach improves both classification accuracy and model interpretability. • GLGCNII [28], proposed by Eliasof et al. in 2024, introduces global information such as class-wise features to improve classification performance. The method learns class prototypes by maximizing similarity between node features and their corresponding class features while increasing dissimilarity across different classes. ...

Global-local graph neural networks for node-classification
  • Citing Article
  • June 2024

Pattern Recognition Letters

... Numerical methods for the multidimensional CDR equation also constitute a major research area given the equation's utility spanning such a vast range of transport phenomena critical to climate modeling, energy systems, biomedical systems, materials synthesis, and related domains central to technology innovation [15][16][17]. Another approach to solving CDR equations is to use graph neural networks and deep neural networks [18][19][20]. ...

Feature Transportation Improves Graph Neural Networks

Proceedings of the AAAI Conference on Artificial Intelligence

... Let us emphasize that DNNs are typically nonregularizing methods. See, for example, [26,34,58] and references therein. ...

DRIP: deep regularizers for inverse problems

... The idea of replacing a BVP with an IVP is not new and is the backbone of many so-called "shooting methods" [4]. Here we follow an approach similar to [20] to solve the shooting problem. We learn the (nonlinear) mapping that maps the terminal condition to the initial condition. ...

Estimating a Potential Without the Agony of the Partition Function
  • Citing Article
  • November 2023

SIAM Journal on Mathematics of Data Science

... Graph pooling can be performed in various ways, as reviewed in Liu et al. (2022). Popular pooling techniques include DiffPool (Ying et al., 2018), MaxCutPool (Abate & Bianchi, 2024), graph wavelet compression (Eliasof et al., 2023), as well as Independent-Set Pooling (Stanovic et al., 2024), all aiming to produce a coarsened graph that bears similar properties to the original graph. Wang et al. (2020) and Wang et al. (2024) introduce the concept of subgraph pooling to GNNs. ...

Haar Wavelet Feature Compression for Quantized Graph Convolutional Networks
  • Citing Article
  • June 2023

IEEE Transactions on Neural Networks and Learning Systems

... Some of the tasks addressed by ML are related to geologic/geophysical interpretation of seismic volumes (Li, 2018; Zhao, 2018; Pham et al., 2019; Wu et al., 2019a; Alfarhan et al., 2020; Liu et al., 2020), fault mapping (Wu et al., 2019b), and well-log analysis (Chen and Zhang, 2020; Pham et al., 2020; Feng, 2021). Various applications support seismic processing such as first-break picking (Tsai et al., 2018; Zwartjes and Yoo, 2022), ground roll subtraction (Kaur et al., 2020; Pham and Li, 2022), deblending (Sun et al., 2022), denoising (Richardson and Feller, 2019), deblurring (Eliasof et al., 2023), acoustic impedance inversion, and seismic event detection and localization. Electromagnetic (EM) and potential field techniques also use artificial neural networks (ANN) for data denoising and inversion (Puzyrev, 2019; Moghadas, 2020; Wu et al., 2020) and for reservoir monitoring applications (Colombo et al., 2020a; Yang et al., 2022). ...

DRIP: Deep Regularizers for Inverse Problems

... An alternating stack of MgNet blocks and poolings can be viewed as the left leg of an MG V-cycle. Another hierarchical structure, this time in the channel dimension, is proposed in (Eliasof et al., 2020). Their building block, termed multigrid-in-channels (MGIC), is built on grouped convolutions and coarsening via channel pooling. ...

MGIC: Multigrid-in-Channels Neural Network Architectures
  • Citing Article
  • February 2023

SIAM Journal on Scientific Computing

... Therefore, Zhang et al. [36] introduced the differentiable Superpixel Sampling Networks (SSN) [37], which successfully integrates with a mixhop GCN for hyperspectral image classification. Meanwhile, Eliasof et al. [38] utilized an unsupervised CNN [39] for superpixel segmentation, which was jointly trained with GNNs for unsupervised natural image semantic segmentation. While the above two methods overcome the end-to-end integration issue for hyperspectral and natural images, they do not address the HRS images we focus on. ...

Unsupervised Image Semantic Segmentation Through Superpixels and Graph Neural Networks
  • Citing Article
  • January 2022

SSRN Electronic Journal

... As many deep learning-based supervised methods rely on labeled data with limited generalization, unsupervised superpixel extraction is more scalable. Eliasof [24] suggested increasing similarity between soft superpixels and input images, emphasizing edges, and using atrous convolution for multi-scale processing. MaxCov-merge [25] employs Bayesian decision principles and heuristic strategies for fast, globally optimal superpixel region merging. ...

Rethinking Unsupervised Neural Superpixel Segmentation
  • Citing Conference Paper
  • October 2022