Jose M F Moura

Carnegie Mellon University | CMU · Department of Electrical and Computer Engineering

DSc


751
Publications
97,611
Reads
29,374
Citations


Publications (751)
Preprint
Full-text available
Foundation models are now a major focus of leading technology organizations due to their ability to generalize across diverse tasks. Existing approaches for adapting foundation models to new applications often rely on Federated Learning (FL) and disclose the foundation model weights to clients when using it to initialize the global model. While the...
Preprint
Calculus of Variations is the mathematics of functional optimization, i.e., of problems whose solutions are functions over a time interval. This is particularly important when the time interval is unknown, as in minimum-time control problems, where forward-in-time solutions are not possible. Calculus of Variations offers a robust framework for learning o...
Preprint
Peer-to-peer learning is an increasingly popular framework that enables beyond-5G distributed edge devices to collaboratively train deep neural networks in a privacy-preserving manner without the aid of a central server. Neural network training algorithms for emerging environments, e.g., smart cities, have many design considerations that are diffic...
Article
The year 2023 marked the 75th anniversary of the IEEE Signal Processing Society (SPS), which was founded in 1948 as the “Professional Group on Audio” of the Institute of Radio Engineers (IRE), becoming the first IEEE Society. (The IRE, founded in 1912 with a focus on radio and then electronics, together with the American Institute of Electrical Eng...
Article
This paper considers learning the hidden causal network of a linear networked dynamical system (NDS) from the time series data at some of its nodes, i.e., under partial observability. The dynamics of the NDS are driven by colored noise that generates spurious associations across pairs of nodes, rendering the problem much harder. To address the challenge of n...
Article
Directly parameterizing and learning gradients of functions has widespread significance, with specific applications in inverse problems, generative modeling, and optimal transport. This paper introduces gradient networks (GradNets): novel neural network architectures that parameterize gradients of various function classes. GradNets exhibit spe...
Article
This article reviews significant advances in networked signal and information processing (SIP), which over the last 25 years have extended decision making and inference, optimization, control, and learning to the increasingly ubiquitous environments of distributed agents. As these interacting agents cooperate, new collective behaviors emerge...
Article
We study the problem of graph structure identification, i.e., of recovering the graph of dependencies among time series. We model these time series data as components of the state of linear stochastic networked dynamical systems. We assume partial observability, where the state evolution of only a subset of nodes comprising the network is observed....
Article
Signal processing (SP) excels at analyzing, processing, and inferring information defined over regular (first continuous, later discrete) domains such as time or space. Indeed, the last 75 years have shown how SP has made an impact in areas such as communications, acoustics, sensing, image processing, and control, to name a few. With the digitaliza...
Article
Signal processing (SP) is a “hidden” technology that has transformed the digital world and changed our lives in so many ways. The field of digital SP (DSP) took off in the mid-1960s, aided by the integrated circuit and increasing availability of digital computers. Since then, the field of DSP has grown tremendously and fueled groundbreaking advance...
Preprint
Full-text available
Graph signal processing (GSP) generalizes signal processing (SP) tasks to signals living on non-Euclidean domains whose structure can be captured by a weighted graph. Graphs are versatile, able to model irregular interactions, easy to interpret, and endowed with a corpus of mathematical results, rendering them natural candidates to serve as the bas...
Preprint
The paper presents the graph signal processing (GSP) companion model that naturally replicates the basic tenets of classical signal processing (DSP) for GSP. The companion model shows that GSP can be made equivalent to DSP 'plus' appropriate boundary conditions (bc) - this is shown under broad conditions and holds for arbitrary undirected or direct...
Preprint
Full-text available
While much effort has been devoted to deriving and studying effective convex formulations of signal processing problems, the gradients of convex functions also have critical applications ranging from gradient-based optimization to optimal transport. Recent works have explored data-driven methods for learning convex objectives, but learning their mo...
Preprint
Full-text available
The article reviews significant advances in networked signal and information processing, which over the last 25 years have extended decision making and inference, optimization, control, and learning to the increasingly ubiquitous environments of distributed agents. As these interacting agents cooperate, new collective behaviors emerge from l...
Preprint
We study the problem of graph structure identification, i.e., of recovering the graph of dependencies among time series. We model these time series data as components of the state of linear stochastic networked dynamical systems. We assume partial observability, where the state evolution of only a subset of nodes comprising the network is observed....
Preprint
This paper introduces a $\textit{canonical}$ graph signal model defined by a $\textit{canonical}$ graph and a $\textit{canonical}$ shift, the $\textit{companion}$ graph and the $\textit{companion}$ shift. These are canonical because, under standard conditions, we show that any graph signal processing (GSP) model can be transformed into the canonica...
Article
Vertex-based and spectral-based GSP sampling have been studied recently. The literature recognizes that methods in one domain do not have a counterpart in the other domain. This paper shows that, in fact, one can develop a unified graph signal sampling theory with analogous interpretations in both domains, just like sampling in traditional DSP. To achi...
Preprint
Few-shot classification aims at classifying categories of a novel task by learning from just a few (typically, 1 to 5) labelled examples. An effective approach to few-shot classification involves a prior model trained on a large-sample base domain, which is then fine-tuned on the novel few-shot task to yield generalizable representations. However,...
Article
Recounts the career and contributions of Peter Schultheiss.
Preprint
Forecasting graph-based time-dependent data has many practical applications. This task is challenging as models need not only to capture spatial dependency and temporal dependency within the data, but also to leverage useful auxiliary information for accurate predictions. In this paper, we analyze limitations of state-of-the-art models on dealing w...
Preprint
Full-text available
Datasets in the computer vision academic research community are primarily static. Once a dataset is accepted as a benchmark for a computer vision task, researchers working on this task will not alter it in order to make their results reproducible. At the same time, when exploring new tasks and new applications, datasets tend to be an ever changing...
Preprint
This paper focuses on finite-time in-network computation of linear transforms of distributed graph data. Finite-time transform computation problems are of interest in graph-based computing and signal processing applications in which the objective is to compute, by means of distributed iterative methods, various (linear) transforms of the data distr...
Preprint
The paper presents sampling in GSP as 1) linear operations (change of bases) between signal representations and 2) downsampling as linear shift invariant filtering and reconstruction (interpolation) as filtering, both in the spectral domain. To achieve this, it considers a spectral shift $M$ that leads to a spectral graph signal processing theory,...
Preprint
Unsupervised time series clustering is a challenging problem with diverse industrial applications such as anomaly detection, bio-wearables, etc. These applications typically involve small, low-power devices on the edge that collect and process real-time sensory signals. State-of-the-art time-series clustering methods perform some form of loss minim...
Preprint
Full-text available
Graph neural networks (GNNs) extend convolutional neural networks (CNNs) to graph-based data. A question that arises is how much performance improvement the underlying graph structure in the GNN provides over the CNN (which ignores this graph structure). To address this question, we introduce edge entropy and evaluate how good an indicator it is...
Preprint
Full-text available
Deep learning has achieved great success in recognizing video actions, but the collection and annotation of training data are still laborious, mainly for two reasons: (1) the amount of required annotated data is large; (2) temporally annotating the location of each action is time-consuming. Works such as few-shot learning or untrimmed vid...
Article
The augmented Lagrangian method (ALM) is a classical optimization tool that solves a given “difficult” (constrained) problem via finding solutions of a sequence of “easier” (often unconstrained) subproblems with respect to the original (primal) variable, wherein constraints satisfaction is controlled via the so-called dual variables. ALM is highly...
Article
The articles in this special section focus on graph signal processing. Generically, the networks that sustain our societies can be understood as complex systems formed by multiple nodes, where global network behavior arises from local interactions between connected nodes. More succinctly, a network or a graph can be defined as a structure that enco...
Article
Deep learning, particularly convolutional neural networks (CNNs), has yielded rapid, significant improvements in computer vision and related domains. But conventional deep learning architectures perform poorly when data have an underlying graph structure, as in social, biological, and many other domains. This article explores 1) how graph signal pr...
Preprint
Cross-domain few-shot learning (FSL) was recently proposed to transfer knowledge from general-domain known classes (e.g., ImageNet) to novel classes in other domains, and to recognize novel classes with only a few training samples. In this paper, we go further to define a more challenging scenario that transfers knowledge from general-domain known classe...
Preprint
Full-text available
Deep learning, particularly convolutional neural networks (CNNs), has yielded rapid, significant improvements in computer vision and related domains. But conventional deep learning architectures perform poorly when data have an underlying graph structure, as in social, biological, and many other domains. This paper explores 1) how graph signal pro...
Article
In graph signal processing (GSP), data dependencies are represented by a graph whose nodes label the data and the edges capture dependencies among nodes. The graph is represented by a weighted adjacency matrix $A$ that, in GSP, generalizes the Discrete Signal Processing (DSP) shift operator $z^{-1}$. The (right) eigenvectors of the shift $A$...
Conference Paper
Full-text available
A feature-based model explanation denotes how much each input feature contributes to a model's output for a given data point. As the number of proposed explanation functions grows, we lack quantitative evaluation criteria to help practitioners know when to use which explanation function. This paper proposes quantitative evaluation criteria for feat...
Preprint
Few-shot learning (FSL) aims at recognizing novel classes given only a few training samples, which remains a great challenge for deep learning. However, humans can easily recognize novel classes with only a few samples. A key component of such ability is the compositional recognition that humans can perform, which has been well studied in cognitiv...
Preprint
Full-text available
A feature-based model explanation denotes how much each input feature contributes to a model's output for a given data point. As the number of proposed explanation functions grows, we lack quantitative evaluation criteria to help practitioners know when to use which explanation function. This paper proposes quantitative evaluation criteria for feat...
Preprint
Full-text available
Graph convolutional neural networks (GCNNs) are a powerful extension of deep learning techniques to graph-structured data problems. We empirically evaluate several pooling methods for GCNNs, and combinations of those graph pooling methods with three different architectures: GCN, TAGCN, and GraphSAGE. We confirm that graph pooling, especially DiffPo...
Preprint
The article discusses distributed gradient-descent algorithms for computing local and global minima in nonconvex optimization. For local optimization, we focus on distributed stochastic gradient descent (D-SGD), a simple network-based variant of classical SGD. We discuss local minima convergence guarantees and explore the simple but critical role...
Conference Paper
Full-text available
Spatial and time-dependent data is of interest in many applications. This task is difficult due to its complex spatial dependency, long-range temporal dependency, data non-stationarity, and data heterogeneity. To address these challenges, we propose Forecaster, a graph Transformer architecture. Specifically, we start by learning the structure of t...
Preprint
In Graph Signal Processing (GSP), data dependencies are represented by a graph whose nodes label the data and the edges capture dependencies among nodes. The graph is represented by a weighted adjacency matrix $A$ that, in GSP, generalizes the Discrete Signal Processing (DSP) shift operator $z^{-1}$. The (right) eigenvectors of the shift $A$ (graph...
Preprint
The augmented Lagrangian method (ALM) is a classical optimization tool that solves a given "difficult" (constrained) problem via finding solutions of a sequence of "easier" (often unconstrained) sub-problems with respect to the original (primal) variable, wherein constraints satisfaction is controlled via the so-called dual variables. ALM is highly...
Preprint
To analyze data supported by arbitrary graphs G, DSP has been extended to Graph Signal Processing (GSP) by redefining traditional DSP concepts like shift, filtering, and Fourier transform among others. This paper revisits modulation, convolution, and sampling of graph signals as appropriate natural extensions of the corresponding DSP concepts. To d...
Preprint
This paper studies the resilient distributed recovery of large fields under measurement attacks, by a team of agents, where each measures a small subset of the components of a large spatially distributed field. An adversary corrupts some of the measurements. The agents collaborate to process their measurements, and each is interested in recovering...
Preprint
Full-text available
Explainable machine learning seeks to provide various stakeholders with insights into model behavior via feature importance scores, counterfactual explanations, and influential samples, among other techniques. Recent advances in this line of work, however, have gone without surveys of how organizations are using these techniques in practice. This s...
Preprint
Spatial and time-dependent data is of interest in many applications. This task is difficult due to its complex spatial dependency, long-range temporal dependency, data non-stationarity, and data heterogeneity. To address these challenges, we propose Forecaster, a graph Transformer architecture. Specifically, we start by learning the structure of th...
Article
This paper studies resilient distributed estimation under measurement attacks. A set of agents each makes successive local, linear, noisy measurements of an unknown vector field collected in a vector parameter. The local measurement models are heterogeneous across agents and may be locally unobservable for the unknown parameter. An adversary compro...
Conference Paper
Datasets in the computer vision academic research community are primarily static. Once a dataset is accepted as a benchmark for a computer vision task, researchers working on this task will not alter it in order to make their results reproducible. At the same time, when exploring new tasks and new applications, datasets tend to be an ever changing...
Preprint
Full-text available
The paper considers a distributed algorithm for global minimization of a nonconvex function. The algorithm is a first-order consensus + innovations type algorithm that incorporates decaying additive Gaussian noise for annealing, converging to the set of global minima under certain technical assumptions. The paper presents simple methods for verifyi...
Article
Full-text available
Developing human-machine trust is a prerequisite for adoption of machine learning systems in decision-critical settings (e.g., healthcare and governance). Users develop appropriate trust in these systems when they understand how the systems make their decisions. Interpretability not only helps users understand what a system learns but also helps user...
Preprint
We study resilient distributed field estimation under measurement attacks. A network of agents or devices measures a large, spatially distributed physical field parameter. An adversary arbitrarily manipulates the measurements of some of the agents. Each agent's goal is to process its measurements and information received from its neighbors to estim...
Preprint
Full-text available
The paper proves convergence to global optima for a class of distributed algorithms for nonconvex optimization in network-based multi-agent settings. Agents are permitted to communicate over a time-varying undirected graph. Each agent is assumed to possess a local objective function (assumed to be smooth, but possibly nonconvex). The paper consider...
Preprint
Visual Dialog is a multimodal task of answering a sequence of questions grounded in an image, using the conversation history as context. It entails challenges in vision, language, reasoning, and grounding. However, studying these subtasks in isolation on large, real datasets is infeasible as it requires prohibitively expensive complete annotation o...
Preprint
Full-text available
In this paper, we present a new approach to interpreting deep learning models. More precisely, by coupling mutual information with network science, we explore how information flows through feed forward networks. We show that efficiently approximating mutual information via the dual representation of Kullback-Leibler divergence allows us to create a...
Preprint
Full-text available
Current approaches for explaining machine learning models fall into two distinct classes: antecedent event influence and value attribution. The former leverages training instances to describe how much influence a training point exerts on a test point, while the latter attempts to attribute value to the features most pertinent to a given prediction....
Preprint
This paper studies resilient distributed estimation under measurement attacks. A set of agents each makes successive local, linear, noisy measurements of an unknown vector field collected in a vector parameter. The local measurement models are heterogeneous across agents and may be locally unobservable for the unknown parameter. An adversary compro...
Article
This paper studies multi-agent distributed estimation under sensor attacks. Individual agents make sensor measurements of an unknown parameter belonging to a compact set, and, at every time step, a fraction of the agents' sensor measurements may fall under attack and take arbitrary values. We present the Saturated Innovation Update (SIU) algorithm...
Article
Full-text available
In this paper, we address the question of how to automatically map computational kernels to highly efficient code for a wide range of computing platforms and establish the correctness of the synthesized code. More specifically, we focus on two fundamental problems that software developers are faced with: performance portability across the ever-chan...
Article
Computer architectures and systems are becoming ever more powerful but increasingly complex. With the end of frequency scaling (about 2004) and the era of multicores/manycores/accelerators, it is exceedingly hard to extract the promised performance, in particular, at a reasonable energy budget. Only highly trained and educated experts can hope...
Chapter
Visual dialog entails answering a series of questions grounded in an image, using dialog history as context. In addition to the challenges found in visual question answering (VQA), which can be seen as one-round dialog, visual dialog encompasses several more. We focus on one such problem called visual coreference resolution that involves determinin...
Chapter
Human motion prediction, i.e., forecasting human motion over a few milliseconds conditioned on a historical 3D skeleton sequence, is a long-standing problem in computer vision and robotic vision. Existing forecasting algorithms rely on extensive annotated motion capture data and are brittle to novel actions. This paper addresses the problem of few-shot hu...
