October 2023
·
31 Reads
·
98 Citations
June 2023
·
571 Reads
·
149 Citations
Nature Biomedical Engineering
Machine-learning models for medical tasks can match or surpass the performance of clinical experts. However, in settings differing from those of the training dataset, the performance of a model can deteriorate substantially. Here we report a representation-learning strategy for machine-learning models applied to medical-imaging tasks that mitigates such ‘out-of-distribution’ performance problems and that improves model robustness and training efficiency. The strategy, which we named REMEDIS (for ‘Robust and Efficient Medical Imaging with Self-supervision’), combines large-scale supervised transfer learning on natural images and intermediate contrastive self-supervised learning on medical images and requires minimal task-specific customization. We show the utility of REMEDIS in a range of diagnostic-imaging tasks covering six imaging domains and 15 test datasets, and by simulating three realistic out-of-distribution scenarios. REMEDIS improved in-distribution diagnostic accuracies by up to 11.5% with respect to strong supervised baseline models, and in out-of-distribution settings required only 1–33% of the data for retraining to match the performance of supervised models retrained using all available data. REMEDIS may accelerate the development lifecycle of machine-learning models for medical imaging.
February 2023
·
82 Reads
·
202 Citations
This article does not describe a working system. Instead, it presents a single idea about representation that allows advances made by several different groups to be combined into an imaginary system called GLOM. The advances include transformers, neural fields, contrastive representation learning, distillation, and capsules. GLOM answers the question: How can a neural network with a fixed architecture parse an image into a part-whole hierarchy that has a different structure for each image? The idea is simply to use islands of identical vectors to represent the nodes in the parse tree. If GLOM can be made to work, it should significantly improve the interpretability of the representations produced by transformer-like systems when applied to vision or language.
December 2022
·
510 Reads
·
7 Citations
The aim of this paper is to introduce a new learning procedure for neural networks and to demonstrate that it works well enough on a few small problems to be worth further investigation. The Forward-Forward algorithm replaces the forward and backward passes of backpropagation by two forward passes, one with positive (i.e. real) data and the other with negative data which could be generated by the network itself. Each layer has its own objective function which is simply to have high goodness for positive data and low goodness for negative data. The sum of the squared activities in a layer can be used as the goodness but there are many other possibilities, including minus the sum of the squared activities. If the positive and negative passes could be separated in time, the negative passes could be done offline, which would make the learning much simpler in the positive pass and allow video to be pipelined through the network without ever storing activities or stopping to propagate derivatives.
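The layer-local objective described above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the single ReLU layer, the logistic loss on goodness minus a threshold, and the names `goodness`, `ff_layer_step`, `theta`, and `lr` are all assumptions chosen for illustration. Only the use of the sum of squared activities as goodness, and the goal of high goodness for positive data and low goodness for negative data, come from the abstract.

```python
import numpy as np

def goodness(h):
    """Sum of squared activities per example (one goodness option named in the abstract)."""
    return np.sum(h ** 2, axis=1)

def ff_layer_step(W, x_pos, x_neg, theta=2.0, lr=0.03):
    """One local update of a single ReLU layer (hypothetical helper).

    Assumed loss: -log sigmoid(sign * (goodness - theta)), with sign = +1 for
    positive data and -1 for negative data. Nothing is propagated to other layers.
    """
    grad = np.zeros_like(W)
    for x, sign in ((x_pos, 1.0), (x_neg, -1.0)):
        h = np.maximum(x @ W, 0.0)                # forward pass through this layer only
        z = np.clip(sign * (goodness(h) - theta), -50.0, 50.0)
        p = 1.0 / (1.0 + np.exp(-z))              # probability of the correct pos/neg call
        # d loss / d h = -sign * (1 - p) * 2h; h is already 0 where the ReLU is inactive
        dh = (-sign * (1.0 - p))[:, None] * 2.0 * h
        grad += x.T @ dh / len(x)
    return W - lr * grad
```

In a multi-layer version each layer would run the same kind of update on its own input, which is what lets the two forward passes stand in for a backward pass.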
December 2022
·
37 Reads
Dynamic evaluation of language models (LMs) adapts model parameters at test time using gradient information from previous tokens and substantially improves LM performance. However, it requires over 3x more compute than standard inference. We present Fast Weight Layers (FWLs), a neural component that provides the benefits of dynamic evaluation much more efficiently by expressing gradient updates as linear attention. A key improvement over dynamic evaluation is that FWLs can also be applied at training time so the model learns to make good use of gradient updates. FWLs can easily be added on top of existing transformer models, require relatively little extra compute or memory to run, and significantly improve language modeling perplexity.
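The abstract does not spell out how gradient updates become linear attention, so the sketch below shows only the general identity that makes such a formulation possible. It is not the FWL architecture. A weight matrix built as a sum of outer products (the same form as the gradient of a linear layer's weights) gives the same readout as unnormalized linear attention over the stored pairs. The function names and shapes are assumptions for illustration.

```python
import numpy as np

def fast_weight_readout(keys, values, query):
    """Accumulate outer products into a weight matrix, then apply it to the query."""
    d_v, d_k = values.shape[1], keys.shape[1]
    W_fast = np.zeros((d_v, d_k))
    for k, v in zip(keys, values):
        W_fast += np.outer(v, k)      # one rank-1 "update" per past token
    return W_fast @ query

def linear_attention_readout(keys, values, query):
    """Unnormalized linear attention: weight each value by the dot product key . query."""
    scores = keys @ query             # shape (n_tokens,)
    return scores @ values            # shape (d_v,)
```

The two functions compute the same vector; the attention form avoids materializing the weight matrix.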
November 2022
·
180 Reads
The GLOM architecture proposed by Hinton [2021] is a recurrent neural network for parsing an image into a hierarchy of wholes and parts. When a part is ambiguous, GLOM assumes that the ambiguity can be resolved by allowing the part to make multi-modal predictions for the pose and identity of the whole to which it belongs and then using attention to similar predictions coming from other possibly ambiguous parts to settle on a common mode that is predicted by several different parts. In this study, we describe a highly simplified version of GLOM that allows us to assess the effectiveness of this way of dealing with ambiguity. Our results show that, with supervised training, GLOM is able to successfully form islands of very similar embedding vectors for all of the locations occupied by the same object and it is also robust to strong noise injections in the input and to out-of-distribution input transformations.
October 2022
·
195 Reads
We revisit the challenging problem of training Gaussian-Bernoulli restricted Boltzmann machines (GRBMs), introducing two innovations. We propose a novel Gibbs-Langevin sampling algorithm that outperforms existing methods like Gibbs sampling. We propose a modified contrastive divergence (CD) algorithm so that one can generate images with GRBMs starting from noise. This enables direct comparison of GRBMs with deep generative models, improving evaluation protocols in the RBM literature. Moreover, we show that modified CD and gradient clipping are enough to robustly train GRBMs with large learning rates, thus removing the necessity of various tricks in the literature. Experiments on Gaussian Mixtures, MNIST, FashionMNIST, and CelebA show GRBMs can generate good samples, despite their single-hidden-layer architecture. Our code is released at: https://github.com/lrjconan/GRBM.
October 2022
·
28 Reads
Panoptic segmentation assigns semantic and instance ID labels to every pixel of an image. As permutations of instance IDs are also valid solutions, the task requires learning a high-dimensional one-to-many mapping. As a result, state-of-the-art approaches use customized architectures and task-specific loss functions. We formulate panoptic segmentation as a discrete data generation problem, without relying on inductive bias of the task. A diffusion model based on analog bits is used to model panoptic masks, with a simple, generic architecture and loss function. By simply adding past predictions as a conditioning signal, our method is capable of modeling video (in a streaming setting) and thereby learns to track object instances automatically. With extensive experiments, we demonstrate that our generalist approach can perform competitively with state-of-the-art specialist methods in similar settings.
October 2022
·
199 Reads
Forward gradient learning computes a noisy directional gradient and is a biologically plausible alternative to backprop for learning deep neural networks. However, the standard forward gradient algorithm, when applied naively, suffers from high variance when the number of parameters to be learned is large. In this paper, we propose a series of architectural and algorithmic modifications that together make forward gradient learning practical for standard deep learning benchmark tasks. We show that it is possible to substantially reduce the variance of the forward gradient estimator by applying perturbations to activations rather than weights. We further improve the scalability of forward gradient by introducing a large number of local greedy loss functions, each of which involves only a small number of learnable parameters, and a new MLPMixer-inspired architecture, LocalMixer, that is more suitable for local learning. Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
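As a rough illustration of the noisy directional gradient mentioned above, the sketch below implements the standard weight-perturbation estimator on a least-squares loss. It does not implement the paper's activation perturbation, local losses, or LocalMixer. The loss, the function name `forward_gradient`, and its parameters are assumptions chosen so that the directional derivative has a closed form and no autodiff library is needed.

```python
import numpy as np

def forward_gradient(A, b, w, rng, n_samples=1):
    """Estimate the gradient of f(w) = 0.5 * ||A w - b||^2 without a backward pass.

    For each random direction v ~ N(0, I), compute the directional derivative
    (A w - b) . (A v) in forward mode, then scale v by it. Averaging over
    directions gives an unbiased but noisy estimate of the true gradient.
    """
    r = A @ w - b                                  # residual at the current weights
    V = rng.normal(size=(n_samples, w.size))       # one random direction per row
    dir_derivs = (V @ A.T) @ r                     # directional derivative for each direction
    return (dir_derivs[:, None] * V).mean(axis=0)
```

With a single sample the estimate is very noisy, and the noise grows with the number of parameters, which is the variance problem the abstract describes.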
August 2022
·
85 Reads
·
11 Citations
We present Bit Diffusion: a simple and generic approach for generating discrete data with continuous diffusion models. The main idea behind our approach is to first represent the discrete data as binary bits, and then train a continuous diffusion model to model these bits as real numbers which we call analog bits. To generate samples, the model first generates the analog bits, which are then thresholded to obtain the bits that represent the discrete variables. We further propose two simple techniques, namely Self-Conditioning and Asymmetric Time Intervals, which lead to a significant improvement in sample quality. Despite its simplicity, the proposed approach can achieve strong performance in both discrete image generation and image captioning tasks. For discrete image generation, we significantly improve on the previous state of the art on both CIFAR-10 (which has 3K discrete 8-bit tokens) and ImageNet-64x64 (which has 12K discrete 8-bit tokens), outperforming the best autoregressive model in both sample quality (measured by FID) and efficiency. For image captioning on the MS-COCO dataset, our approach achieves competitive results compared to autoregressive models.
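The encode and threshold steps described above can be sketched as follows. The diffusion model itself is omitted. The mapping of bits to the real values -1 and +1, the threshold at 0, and the names `int_to_analog_bits`, `analog_bits_to_int`, `n_bits`, and `scale` are assumptions for illustration, not details taken from the abstract.

```python
import numpy as np

def int_to_analog_bits(x, n_bits=8, scale=1.0):
    """Encode non-negative integers as real-valued bits in {-scale, +scale}."""
    x = np.asarray(x, dtype=np.int64)
    shifts = np.arange(n_bits - 1, -1, -1)         # most significant bit first
    bits = (x[..., None] >> shifts) & 1
    return (bits.astype(np.float64) * 2.0 - 1.0) * scale

def analog_bits_to_int(a):
    """Threshold analog bits at 0 and decode them back to integers."""
    bits = (np.asarray(a) > 0.0).astype(np.int64)
    n_bits = bits.shape[-1]
    weights = 1 << np.arange(n_bits - 1, -1, -1)   # powers of two, MSB first
    return np.sum(bits * weights, axis=-1)
```

Because decoding only looks at the sign of each analog bit, moderate errors in the generated real values leave the recovered token unchanged.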
... This structure contrasts with Recurrent Neural Networks (RNNs), where nodes in the hidden layer are not only interconnected but also receive inputs from both the current state and the preceding state. This design enables RNNs to maintain a memory of past information, which is then integrated with new inputs to update the network's state (Graves et al. 2013). However, RNNs often suffer from vanishing or exploding gradients when backpropagating through long-term dependencies, which can impede effective learning (Bengio et al. 1994). ...
March 2013
... Beyond image generation, DMs inherently perform implicit discriminative reasoning while generating data, which proves highly effective in visual tasks that require complex relationship modeling and spatiotemporal reasoning. Therefore, a surge of work has successfully adapted generative diffusion models for tasks including image segmentation [1,4,5,13,30,41,95], object detection [11,75,106], object tracking [64,65,105], and monocular depth estimation [24,66,77,121]. Recent research has also utilized DMs for more complex tasks that require high-level visual understanding abilities such as visual-linguistic understanding [45], scene generation [36], and human-object interaction detection [38,51]. ...
October 2023
... Schlag et al. (2021) point out that self-attention without softmax and other linear Transformer variants (Tsai et al., 2019; Katharopoulos et al., 2020; Choromanski et al., 2021; Peng et al., 2021) can be viewed as FWPs. Clark et al. (2022) propose fast weight layers which are added on top of the Transformer model after the last attention layer for language modeling. Different from previous work mainly focusing on specific tasks, our goal is to enhance frozen pretrained LMs with fast associative memory for general language processing. ...
January 2022
... Azizi et al. created REMEDIS [5], a multi-supervision level approach for transfer learning on medical tasks. REMEDIS combines large-scale supervised learning on natural images with self-supervised learning on medical images, and achieves significant performance gains on 15 medical tasks, compared to supervised baselines. ...
June 2023
Nature Biomedical Engineering
... Recent approaches have also explored using two forward passes to facilitate communication between upstream and downstream neurons [12][13][14][15]. The "Forward-Forward" (FF) learning algorithm [12] is an approach in which data and label hypotheses are combined as inputs, with optimisation seeking to upregulate neural response to correctly labelled inputs and subdue responses to spuriously labelled inputs. ...
December 2022
... Indeed, while often effective under independent and identically distributed (i.i.d.) conditions, CNNs' reliance on local spatial correlations leads to a tendency to prioritize superficial features, such as texture, over more intrinsic object characteristics, as shown by Geirhos et al. [15]. As further elaborated by Hinton [17], this limitation stems from architectural choices like pooling layers, which, while providing translation invariance, inadvertently sacrifice precise spatial relationships crucial for encoding exact pose. Hinton further points out that this architecture prioritizes activity invariance over activity equivariance and weight invariance, leading to a reliance on data augmentation for viewpoint generalization. ...
February 2023
... However, diffusion models cannot directly be applied to discrete data such as DNA. Possible solutions include modifying the input or modelling space [134] or mapping the discrete input into a continuous latent space [135][136][137]. ...
August 2022
... Methods for receiving signals are described in the works [8,9]. With the advent of digital recording and processing techniques, more modern spatial filters have evolved, as have true two-dimensional filters. ...
June 2022
... For the microsatellite instability status prediction, we used SimCLR with a ResNet-18 as a backbone. SimCLR is a contrastive learning method that maximizes the agreement between two different augmented versions of the same image, thereby learning a relevant feature representation of the image [54]. It was trained on 50,000 synthetic tiles (10,000 per cancer type) for 50 epochs. ...
May 2022
... However, their use of a mesh template does not allow for photorealistic renderings. Implicit functions [51,59] have also been utilized to reconstruct detailed 3D clothed humans [4,10,13,25,29,64]. However, they are also unable to generate photorealistic renderings and are often not reposable. ...
March 2020