
Raja Giryes
- PhD
- PostDoc Position at Duke University
About
252 Publications
51,288 Reads
8,398 Citations
Introduction
Raja Giryes is a senior lecturer in the School of Electrical Engineering at Tel Aviv University. He received the B.Sc. (2007), M.Sc. (supervised by Prof. M. Elad and Prof. Y. C. Eldar, 2009), and PhD (supervised by Prof. M. Elad, 2014) degrees from the Department of Computer Science, The Technion - Israel Institute of Technology, Haifa. Raja was a postdoc at the computer science department at the Technion (Nov. 2013 to July 2014) and at the lab of Prof. G. Sapiro at Duke University, Durham, USA (July 2014 to Aug. 2015). His research interests lie at the intersection of signal and image processing and machine learning, in particular deep learning, inverse problems, sparse representations, and signal and image modeling.
Publications (252)
Long-range sequence processing poses a significant challenge for Transformers due to their quadratic complexity in input length. A promising alternative is Mamba, which demonstrates high performance and achieves Transformer-level capabilities while requiring substantially fewer computational resources. In this paper we explore the length-generaliza...
In recent years, applications of machine learning and deep neural networks have experienced a remarkable surge in the field of physics, with optics being no exception. This tutorial aims to offer a fundamental introduction to the utilization of deep learning in optics, catering specifically to newcomers. Within this tutorial, we cover essential concep...
Mammography is commonly used as an imaging technique in breast cancer screening but comes with the disadvantage of a high overdiagnosis rate and low sensitivity in dense tissue. Dynamic contrast enhanced (DCE)-magnetic resonance imaging (MRI) features higher sensitivity but requires time-consuming dynamic imaging and injection of contrast media, li...
The article focuses on the problem of detecting transmitted data over an additive Gaussian noise channel. We propose an integration of various deep-learning architectures with the classical Viterbi algorithm for channel estimation and equalization, to improve the estimation performance. The goal is to design communication schemes that are oblivio...
Video reconstruction from a single motion-blurred image is a challenging problem, which can enhance the capabilities of existing cameras. Recently, several works addressed this task using conventional imaging and deep learning. Yet, such purely digital methods are inherently limited, due to direction ambiguity and noise sensitivity. Some works atte...
As neural networks become deeper, the redundancy within their parameters increases. This phenomenon has led to several methods that attempt to reduce the correlation between convolutional filters. We propose a computationally efficient regularization technique that encourages orthonormality between groups of filters within the same layer. Our exper...
This paper introduces novel supervised learning techniques for real-time selection (triggering) of hadronically decaying tau leptons in proton-proton colliders. By implementing classic machine learning decision trees and advanced deep learning models, such as Multi-Layer Perceptron or residual NN, visible improvements in performance compared to sta...
Transfer learning is a valuable tool in deep learning as it allows propagating information from one "source dataset" to another "target dataset", especially in the case of a small number of training examples in the latter. Yet, discrepancies between the underlying distributions of the source and target data are commonplace and are known to have a s...
The recently introduced Segment Anything Model (SAM) combines a clever architecture and large quantities of training data to obtain remarkable image segmentation capabilities. However, it fails to reproduce such results for Out-Of-Distribution (OOD) domains such as medical images. Moreover, while SAM is conditioned on either a mask or a set of poin...
We present SENS, a novel method for generating and editing 3D models from hand-drawn sketches, including those of an abstract nature. Our method allows users to quickly and easily sketch a shape, and then maps the sketch into the latent space of a part-aware neural implicit shape architecture. SENS analyzes the sketch and encodes its parts into ViT...
Vision and Language (VL) models offer an effective method for aligning representation spaces of images and text, leading to numerous applications such as cross-modal retrieval, visual question answering, captioning, and more. However, the aligned image-text spaces learned by all the popular VL models are still suffering from the so-called `object b...
The lottery ticket hypothesis (LTH) has increased attention to pruning neural networks at initialization. We study this problem in the linear setting. We show that finding a sparse mask at initialization is equivalent to the sketching problem introduced for efficient matrix multiplication. This gives us tools to analyze the LTH problem and gain ins...
In recent years, Denoising Diffusion Probabilistic Models (DDPM) have attracted significant attention. By composing a Markovian process that starts in the data domain and then gradually adds noise until reaching pure white noise, they achieve superior performance in learning data distributions. Yet, these models require a large number of diffusion ste...
This work bridges two important concepts: the Neural Tangent Kernel (NTK), which captures the evolution of deep neural networks (DNNs) during training, and the Neural Collapse (NC) phenomenon, which refers to the emergence of symmetry and structure in the last-layer features of well-trained classification DNNs. We adopt the natural assumption that...
Label-efficient and reliable semantic segmentation is essential for many real-life applications, especially for industrial settings with high visual diversity, such as waste sorting. In industrial waste sorting, one of the biggest challenges is the extreme diversity of the input stream depending on factors like the location of the sorting facility,...
Recent breakthroughs in text-guided image generation have led to remarkable progress in the field of 3D synthesis from text. By optimizing neural radiance fields (NeRF) directly from text, recent methods are able to produce remarkable results. Yet, these methods are limited in their control of each object's placement or appearance, as they represen...
Thanks to the tremendous interest from the research community, the focus of the March issue of the IEEE Signal Processing Magazine is on the second volume of the special issue on physics-driven machine learning for computational imaging, which brings together nine articles of the 19 accepted papers from the original 47 submissions.
An autoencoder is a specific type of neural network, which is mainly designed to encode the input into a compressed and meaningful representation and then decode it back such that the reconstructed input is as similar as possible to the original one. This chapter surveys the different types of autoencoders that are mainly used today. It also describ...
Deep learning (DL) has made a major impact on data science in the last decade. This chapter introduces the basic concepts of this field. It includes both the basic structures used to design deep neural networks and a brief survey of some of its popular use-cases.
Generative adversarial networks (GANs) are very popular frameworks for generating high-quality data and are widely used in both academia and industry in many domains. Arguably, their most substantial impact has been in the area of computer vision, where they achieve state-of-the-art image generation. This chapter gives an introduction to GAN...
In this paper, we present TEXTure, a novel method for text-guided generation, editing, and transfer of textures for 3D shapes. Leveraging a pretrained depth-to-image diffusion model, TEXTure applies an iterative scheme that paints a 3D model from different viewpoints. Yet, while depth-to-image models can create plausible textures from a single view...
Recent years have witnessed a rapidly growing interest in next-generation imaging systems and their combination with machine learning. While model-based imaging schemes that incorporate physics-based forward models, noise models, and image priors laid the foundation in the emerging field of computational sensing and imaging, recent advances in mach...
Nowadays, many of the images captured are ‘observed’ by machines only and not by humans, e.g., in autonomous systems. High-level machine vision models, such as object recognition or semantic segmentation, assume images are transformed into some canonical image space by the camera Image Signal Processor (ISP). However, the camera ISP is optimized fo...
The prominent success of neural networks, mainly in computer vision tasks, is increasingly shadowed by their sensitivity to small, barely perceivable adversarial perturbations in image input. In this work, we aim at explaining this vulnerability through the framework of sparsity. We show the connection between adversarial attacks and sparse represe...
In recent years, denoising diffusion models have demonstrated outstanding image generation performance. The information on natural images captured by these models is useful for many image reconstruction applications, where the task is to restore a clean image from its degraded observations. In this work, we propose a conditional sampling scheme tha...
Animals navigate using various sensory information to guide their movement. Miniature tracking devices now allow documenting animals’ routes with high accuracy. Despite this detailed description of animal movement, how animals translate sensory information to movement is poorly understood. Recent machine learning advances now allow addressing this...
Despite recent advances in geometric modelling, 3D mesh modelling still involves a considerable amount of manual labour by experts. In this paper, we introduce Mesh Draping: a neural method for transferring existing mesh structure from one shape to another. The method drapes the source mesh over the target geometry and at the same time seeks to pre...
In Deep Image Prior (DIP), a Convolutional Neural Network (CNN) is fitted to map a latent space to a degraded (e.g. noisy) image but in the process learns to reconstruct the clean image. This phenomenon is attributed to CNN's internal image-prior. We revisit the DIP framework, examining it from the perspective of a neural implicit representation. M...
The goal of Anomaly-Detection (AD) is to identify outliers, or outlying regions, from some unknown distribution given only a set of positive (good) examples. Few-Shot AD (FSAD) aims to solve the same task with a minimal amount of normal examples. Recent embedding-based methods, that compare the embedding vectors of queries to a set of reference emb...
Vision and Language (VL) models have demonstrated remarkable zero-shot performance in a variety of tasks. However, some aspects of complex language understanding still remain a challenge. We introduce the collective notion of Structured Vision&Language Concepts (SVLC) which includes object attributes, relations, and states which are present in the...
Text-guided image generation has progressed rapidly in recent years, inspiring major breakthroughs in text-guided shape generation. Recently, it has been shown that using score distillation, one can successfully text-guide a NeRF model to generate a 3D object. We adapt the score distillation to the publicly available, and computationally efficient,...
The increasing availability of video recordings made by multiple cameras has offered new means for mitigating occlusion and depth ambiguities in pose and motion reconstruction methods. Yet, multi-view algorithms strongly depend on camera parameters, particularly on relative transformations between the cameras. Such a dependency becomes a hurdle onc...
Generative adversarial networks (GANs) and clustering algorithms are both very popular unsupervised methodologies in machine learning. In this work, we propose a novel strategy that uses GANs to improve clustering and vice versa. We start by providing a simple yet powerful scheme for improving clustering methods that rely on the latent space of...
Overparameterization in deep learning typically refers to settings where a trained Neural Network (NN) has representational capacity to fit the training data in many ways, some of which generalize well, while others do not. In the case of Recurrent Neural Networks (RNNs), there exists an additional layer of overparameterization, in the sense that a...
Diffusion models are a class of generative models, showing superior performance as compared to other generative models in creating realistic images when trained on natural image datasets. We introduce DISPR, a diffusion-based model for solving the inverse problem of three-dimensional (3D) cell shape prediction from two-dimensional (2D) single cell...
Purpose: To construct an automatic machine-learning derived algorithm discriminating between normal corneas and suspect irregular or keratoconic corneas. Methods: A total of 8526 corneal tomography images of 4904 eyes obtained between November 2010 and July 2017 using a combined Scheimpflug/Placido tomographer were retrospectively evaluated. Each im...
In this work, we suggest Kernel Filtering Linear Overparameterization (KFLO), where a linear cascade of filtering layers is used during training to improve network performance in test time. We implement this cascade in a kernel filtering fashion, which prevents the trained architecture from becoming unnecessarily deeper. This also allows using our...
Neural implicit fields are quickly emerging as an attractive representation for learning based techniques. However, adopting them for 3D shape modeling and editing is challenging. We introduce a method for Editing Implicit Shapes Through Part Aware GeneraTion, permuted in short as SPAGHETTI. Our architecture allows for manipulation of impl...
The recent success of learning-based algorithms can be greatly attributed to the immense amount of annotated data used for training. Yet, many datasets lack annotations due to the high costs associated with labeling, resulting in degraded performances of deep learning methods. Self-supervised learning is frequently adopted to mitigate the reliance...
Learning from one or few visual examples is one of the key capabilities of humans since early infancy, but is still a significant challenge for modern AI systems. While considerable progress has been achieved in few-shot learning from a few image examples, much less attention has been given to the verbal descriptions that are usually provided to in...
Membership inference (MI) attacks aim to determine if a specific data sample was used to train a machine learning model. Thus, MI is a major privacy threat to models trained on private sensitive data, such as medical records. In MI attacks one may consider the black-box settings, where the model's parameters and activations are hidden from the adversar...
We present a technique for visualizing point clouds using a neural network. Our technique allows for an instant preview of any point cloud, and bypasses the notoriously difficult surface reconstruction problem or the need to estimate oriented normals for splat‐based rendering. We cast the preview problem as a conditional image‐to‐image translation...
In a previous paper, we introduced a deep learning neural network that should be able to detect the existence of very shallow periodic planetary transits in the presence of red noise. The network in that feasibility study would not provide any further details about the detected transits. The current paper completes this missing part. We present a n...
Using synthetic data for training neural networks that achieve good performance on real-world data is an important task as it has the potential to reduce the need for costly data annotation. Yet, a network that is trained on synthetic data alone does not perform well on real data due to the domain gap between the two. Reducing this gap, also known...
Point cloud registration (PCR) is an important task in many fields including autonomous driving with LiDAR sensors. PCR algorithms have improved significantly in recent years, by combining deep-learned features with robust estimation methods. These algorithms succeed in scenarios such as indoor scenes and object models registration. However, testin...
We study the 2-D super-resolution multi-reference alignment (SR-MRA) problem: estimating an image from its down-sampled, circularly-translated, and noisy copies. The SR-MRA problem serves as a mathematical abstraction of the structure determination problem for biological molecules. Since the SR-MRA problem is ill-posed without prior knowledge, accu...
In a previous paper, we have introduced a deep learning neural network that should be able to detect the existence of very shallow periodic planetary transits in the presence of red noise. The network in that feasibility study would not provide any further details about the detected transits. The current paper completes this missing part. We presen...
The ability to generalize learned representations across significantly different visual domains, such as between real photos, clipart, paintings, and sketches, is a fundamental capacity of the human visual system. In this paper, different from most cross-domain works that utilize some (or full) source domain supervision, we approach a relatively ne...
Generative Adversarial Networks (GANs) are very popular frameworks for generating high-quality data, and are widely used in both academia and industry in many domains. Arguably, their most substantial impact has been in the area of computer vision, where they achieve state-of-the-art image generation. This chapter gives an introduction to GA...
This work suggests using sampling theory to analyze the function space represented by interpolating mappings. While the analysis in this paper is general, we focus it on neural networks with bounded weights that are known for their ability to interpolate (fit) the training data. First, we show, under the assumption of a finite input domain, which...
In this work, we present our strategy for camera control in dynamic scenes with multiple people (sports teams). We learn a generic model of the player dynamics offline in simulation. We use only a few sparse demonstrations of a user's camera control policy to learn a reward function to drive camera motion in an ongoing dynamic scene. Key to our app...
Neural implicit fields are quickly emerging as an attractive representation for learning based techniques. However, adopting them for 3D shape modeling and editing is challenging. We introduce a method for $\mathbf{E}$diting $\mathbf{I}$mplicit $\mathbf{S}$hapes $\mathbf{T}$hrough $\mathbf{P}$art $\mathbf{A}$ware $\mathbf{G}$enera$\mathbf{T}$ion, p...
Fictional languages have become increasingly popular over the recent years appearing in novels, movies, TV shows, comics, and video games. While some of these fictional languages have a complete vocabulary, most do not. We propose a deep learning solution to the problem. Using style transfer and machine translation tools, we generate new words for...
We introduce DeepMLS, a space-based deformation technique, guided by a set of displaced control points. We leverage the power of neural networks to inject the underlying shape geometry into the deformation parameters. The goal of our technique is to enable a realistic and intuitive shape deformation. Our method is built upon moving least-squares (M...
We present a method for video reconstruction of the scene dynamics from a single image using coded motion blur. Our approach addresses the limitations of the ill-posed task and utilizes a learned optical coding approach.
Video reconstruction from a single motion-blurred image is a challenging problem, which can enhance existing cameras' capabilities. Recently, several works addressed this task using conventional imaging and deep learning. Yet, such purely-digital methods are inherently limited, due to direction ambiguity and noise sensitivity. Some works proposed t...
Network architecture search achieves state-of-the-art results in various tasks such as classification and semantic segmentation. Recently, a reinforcement learning-based approach has been proposed for generative adversarial networks (GANs) search. In this work, we propose an alternative strategy for GAN search by using a proxy task instead of commo...
Deep learning is a powerful tool for exploring large datasets and discovering new patterns. This work presents an account of a metric learning-based deep convolutional neural network (CNN) applied to an archaeological dataset. The proposed account speaks of three stages: training, testing/validating, and community detection. Several thousand artefa...
We introduce a novel technique for neural point cloud consolidation which learns from only the input point cloud. Unlike other point up-sampling methods which analyze shapes via local patches, in this work, we learn from global subsets. We repeatedly self-sample the input point cloud with global subsets that are used to train a deep neural network....
Despite recent advances in geometric modeling, 3D mesh modeling still involves a considerable amount of manual labor by experts. In this paper, we introduce Mesh Draping: a neural method for transferring existing mesh structure from one shape to another. The method drapes the source mesh over the target geometry and at the same time seeks to preser...
Recently, several deep learning approaches have been proposed for point cloud registration. These methods train a network to generate a representation that helps finding matching points in two 3D point clouds. Finding good matches allows them to calculate the transformation between the point clouds accurately. Two challenges of these techniques are...
Although Deep Neural Networks (DNNs) achieve excellent performance on many real-world tasks, they are highly vulnerable to adversarial attacks. A leading defense against such attacks is adversarial training, a technique in which a DNN is trained to be robust to adversarial attacks by introducing adversarial noise to its input. This procedure is eff...
Convolutional Neural Networks (CNNs) are very popular in many fields including computer vision, speech recognition, natural language processing, etc. Though deep learning leads to groundbreaking performance in those domains, the networks used are very computationally demanding and are far from being able to perform in real-time applications even on...
Unsupervised style transfer that supports diverse input styles using only one trained generator is a challenging and interesting task in computer vision. This paper proposes a Multi-IlluStrator Style Generative Adversarial Network (MISS GAN) that is a multi-style framework for unsupervised image-to-illustration translation, which can generate style...
Establishing a consistent normal orientation for point clouds is a notoriously difficult problem in geometry processing, requiring attention to both local and global shape characteristics. The normal direction of a point is a function of the local surface neighborhood; yet, point clouds do not disclose the full underlying surface structure. Even as...
Stereo imaging is the most common passive method for producing reliable depth maps. Calibration is a crucial step for every stereo-based system, and despite all the advancements in the field, most calibrations are still done by the same tedious method using a checkerboard target. Monocular-based depth estimation methods do not require extrinsic cal...
Recently, great progress has been made in the field of Few-Shot Learning (FSL). While many different methods have been proposed, one of the key factors leading to higher FSL performance is surprisingly simple. It is the backbone network architecture used to embed the images of the few-shot tasks. While first works on FSL resorted to small architect...
We present a technique for rendering point clouds using a neural network. Existing point rendering techniques either use splatting, or first reconstruct a surface mesh that can then be rendered. Both of these techniques require solving for global point normal orientation, which is a challenging problem on its own. Furthermore, splatting techniques...
Few-shot detection and classification have advanced significantly in recent years. Yet, detection approaches require strong annotation (bounding boxes) both for pre-training and for adaptation to novel classes, and classification approaches rarely provide localization of objects in the scene. In this paper, we introduce StarNet - a few-shot model f...
The increasing availability of video recordings made by multiple cameras has offered new means for mitigating occlusion and depth ambiguities in pose and motion reconstruction methods. Yet, multi-view algorithms strongly depend on camera parameters, in particular, the relative positions among the cameras. Such dependency becomes a hurdle once shift...
Nowadays, there is an abundance of data involving images and surrounding free-form text weakly corresponding to those images. Weakly Supervised phrase-Grounding (WSG) deals with the task of using this data to learn to localize (or to ground) arbitrary text phrases in images without any additional annotations. However, most recent SotA methods for W...
We introduce a Progressive Positional Encoding (PPE) layer, which gradually exposes signals with increasing frequencies throughout the neural optimization. In this paper, we show the competence of the PPE layer for mesh transfer and its advantages compared to contemporary surface mapping techniques. Our approach is simple and requires little user g...
This paper proposes a new methodology for solving the well-known rank aggregation problem from pairwise comparisons using low-rank matrix completion. Partial and noisy data of pairwise comparisons is first transformed into a matrix form. We then use tools from matrix completion, which has served as a major component in the low-rank based completion...
We present a novel method for neural network quantization. Our method, named UNIQ, emulates a non-uniform k-quantile quantizer and adapts the model to perform well with quantized weights by injecting noise to the weights at training time. As a by-product of injecting noise to weights, we find that activations can also be quantized to as low as 8-...
We propose new, and robust, loss functions for the point cloud registration problem. Our loss functions are inspired by the Best Buddies Similarity (BBS) measure that counts the number of mutual nearest neighbors between two point sets. This measure has been shown to be robust to outliers and missing data in the case of template matching for images...
Deep neural networks are widespread due to their powerful performance. Yet, they suffer from degraded performance in the presence of noisy labels at train time or adversarial examples during inference. Inspired by the setting of learning with expert advice, where multiplicative weights (MW) updates were recently shown to be robust to moderate adver...
Ill-posed inverse problems appear in many image processing applications, such as deblurring and super-resolution. In recent years, solutions that are based on deep Convolutional Neural Networks (CNNs) have shown great promise. Yet, most of these techniques, which train CNNs using external data, are restricted to the observation models that have bee...
Blind deconvolution and demixing is the problem of reconstructing convolved signals and kernels from the sum of their convolutions. This problem arises in many applications, such as blind MIMO. This work presents a separable approach to blind deconvolution and demixing via convex optimization. Unlike previous works, our formulation allows separatio...
The vast majority of image recovery tasks are ill-posed problems. As such, methods that are based on optimization use cost functions that consist of both fidelity and prior (regularization) terms. A recent line of works imposes the prior by the Regularization by Denoising (RED) approach, which exploits the good performance of existing image denoisi...
Nowadays, many of the images captured are "observed" by machines only and not by humans, for example, robots' or autonomous cars' cameras. High-level machine vision models, such as object recognition, assume images are transformed to some canonical image space by the camera ISP. However, the camera ISP is optimized for producing visually pleasing i...