
Accepted to 2020 Cold Spring Harbor Laboratory meeting: From Neuroscience to Artificially Intelligent Systems (NAISys)
Alexander Lavin1
1Augustus Intelligence, R&D, NYC, NY
Attention in the human visual system is a mechanism of efficiency, focusing limited computational resources on the relevant parts of a scene to save "bandwidth" and minimize complexity. There are distinct and complementary attentional mechanisms: covert attention in the periphery, overt attention guiding fixation, feature-based attention (FBA) identifying specific aspects such as color, and object-based attention (OBA).
Attention in artificial vision systems, on the other hand, aims to isolate "interesting" or salient regions of an image for further processing by a convolutional neural network (e.g., R-CNN). These methods have succeeded in image classification tasks while significantly reducing the computational burden of CNNs. Yet we see room for improvement by incorporating aspects of human visual attention: efficiency via complementary attention processes, and use of valuable task-dependent information.
Our information-theoretic, sequential-processing notion of saliency more closely resembles human fixation patterns than other methods. We define an (unsupervised) partially observable Markov decision process (POMDP) atop a retinotopically organized self-information map. For task-dependent attention, we can incorporate a supervisory signal as feedback at each fixation step, yielding a mutual-information map.
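To make the self-information map concrete, here is a minimal sketch of the idea: score each image patch by -log p(patch), where p is estimated from the empirical distribution of a simple patch descriptor. The descriptor (mean intensity) and histogram estimator are illustrative stand-ins; the paper does not specify them at this level of detail.

```python
import numpy as np

def self_information_map(image, patch=8, bins=32):
    """Toy saliency map: score each patch by its self-information,
    -log p(patch), where p is estimated from a histogram of mean
    patch intensities over the whole image. Descriptor and estimator
    are illustrative, not the paper's exact construction."""
    H, W = image.shape
    ph, pw = H // patch, W // patch
    # Descriptor per patch: mean intensity (a deliberately crude stand-in).
    desc = image[:ph * patch, :pw * patch].reshape(ph, patch, pw, patch).mean(axis=(1, 3))
    hist, edges = np.histogram(desc, bins=bins)
    p = hist / hist.sum()
    idx = np.clip(np.digitize(desc, edges[1:-1]), 0, bins - 1)
    return -np.log(p[idx] + 1e-12)  # rare patches -> high self-information
```

Rare patches fall in low-probability histogram bins and thus receive high saliency, which is the sense in which the map is "information-theoretic."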
Our "retina-like" sensor does not see the environment in full, but rather extracts information only in a local region (or narrow frequency band), similar to the Recurrent Attention Model (RAM). RAM defines a reinforcement learning agent that receives a scalar reward at each fixation to learn a high-order policy, whereas we use a first-order POMDP with optional supervision to guide the sensor; studies show humans do not integrate high-order sequence information across fixations.
We model a dual covert and overt attentional process: the covert mechanism analyzes the periphery for maximally informative data, leading to overt fixation on that next unseen location. To prevent fixations from oscillating between regions of maximal interest, we bookkeep with a fixation history map: a 2D representation, larger than the visual field, containing the sequence of recent fixations, not unlike the human frontal eye fields.
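The covert-to-overt step with history bookkeeping can be sketched as a greedy selection over the information map, penalized by an inhibition-of-return term from the fixation history map. The decay and penalty constants here are hypothetical, chosen only to illustrate the mechanism.

```python
import numpy as np

def next_fixation(info_map, history, decay=0.9, penalty=5.0):
    """Pick the next overt fixation: the location maximizing the
    information map minus an inhibition-of-return penalty from the
    fixation history map. Constants are illustrative."""
    score = info_map - penalty * history
    y, x = np.unravel_index(np.argmax(score), score.shape)
    history *= decay        # older fixations fade from the history map
    history[y, x] = 1.0     # record the new fixation
    return (y, x), history
```

Because recently visited locations carry a penalty that decays over time, the policy avoids oscillating between the two most informative regions, which is the role the fixation history map plays above.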
For visual processing of each attended subset of the visual space, we implement deep kernel learning, combining the non-parametric flexibility of Gaussian processes with the inductive biases and feature extraction of a CNN.
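A minimal sketch of deep kernel learning: pass inputs through a feature extractor, then apply a standard RBF kernel in feature space and use it for GP regression. A one-layer random projection stands in for the trained CNN; the extractor, kernel hyperparameters, and noise level are all illustrative assumptions.

```python
import numpy as np

def features(x, W):
    """Stand-in 'deep' feature extractor: one random-projection + ReLU
    layer in place of a trained CNN (illustrative only)."""
    return np.maximum(0.0, x @ W)

def deep_rbf_kernel(X1, X2, W, lengthscale=1.0):
    """Deep kernel: an RBF kernel applied to learned features,
    k(x, x') = exp(-||phi(x) - phi(x')||^2 / (2 l^2))."""
    F1, F2 = features(X1, W), features(X2, W)
    d2 = ((F1[:, None, :] - F2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * lengthscale ** 2))

def gp_predict(X_train, y_train, X_test, W, noise=1e-2):
    """GP regression mean under the deep kernel (a sketch; training
    the feature extractor jointly with the GP is elided)."""
    K = deep_rbf_kernel(X_train, X_train, W) + noise * np.eye(len(X_train))
    Ks = deep_rbf_kernel(X_test, X_train, W)
    return Ks @ np.linalg.solve(K, y_train)
```

The non-parametric GP supplies calibrated uncertainty over the attended region, while the feature extractor supplies the CNN-style inductive biases; in practice the two are trained jointly by maximizing the GP marginal likelihood.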
We experiment on Multi-MNIST to contrast OBA and FBA: the former seeks full digits, the latter specific features. We use two challenging human eye-fixation datasets, MIT300 and CAT2000, to validate the task-based attention paths.
Visual anomaly detection and localization is a compelling application of our approach: object-based algorithms such as R-CNN do not suffice, and end-to-end learning generally fares poorly due to class imbalance and the scarcity of labeled anomaly data. We demonstrate this on the benchmark Cement Crack dataset, yielding results competitive with state-of-the-art visual anomaly detection methods while being computationally more efficient.
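Under the information-theoretic view, anomalies are rare, hence high-self-information, regions, so a simple localization rule is to threshold the self-information map. The z-score threshold below is a hypothetical illustration of that idea, not the paper's evaluated method.

```python
import numpy as np

def localize_anomalies(info_map, z=2.0):
    """Toy anomaly localization: flag regions whose self-information
    exceeds the map mean by `z` standard deviations. Threshold rule
    is illustrative; anomalies are rare, hence high-information."""
    thresh = info_map.mean() + z * info_map.std()
    return info_map > thresh
```

Because no labeled anomaly data enters this rule, it sidesteps the class-imbalance and label-scarcity problems noted above.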