Article

A Graph-Based Accelerator of Retinex Model With Bit-Serial Computing for Image Enhancements


Abstract

This work proposes a Poisson equation formulation of the Retinex model for image enhancement, realized on a low-power graph hardware accelerator that performs finite difference updates on a lattice array of processing elements (PEs). By encapsulating the underlying algorithm in a graph hardware structure, a highly localized dataflow that exploits the physical placement of the PEs minimizes data movement and maximizes data reuse. The on-chip dataflow that achieves data sharing and reuse among neighboring PEs during massively parallel updates is generated in each PE, driven by two external control signals. Using a custom accumulator designed for bit-serial computing, this work enables precision on demand and extensive on-chip data reuse with minimal area overhead, and accommodates a non-overlapping image mapping scheme in which one 20×20 image tile at a time is processed without external memory access. As the user-configurable update count increases, image noise and shadows are progressively removed, with an unavoidable loss of image detail. Fabricated in a 65-nm technology, the test chip occupies a 0.2955 mm² core area and consumes 2.191 mW operating at 1 V, 25.6 MHz, and a reconfigurable 10- or 14-bit precision.
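The abstract does not spell out the update rule, but the kind of finite-difference relaxation such a lattice accelerator performs can be illustrated in software. Below is a minimal NumPy sketch, assuming a Jacobi-style update for a screened Poisson equation on one tile; the 20×20 tile size follows the abstract, while the update form, boundary handling, and the parameter lam are illustrative assumptions, not the chip's actual arithmetic.

```python
import numpy as np

def jacobi_retinex_tile(f, lam=0.05, n_updates=100):
    """Jacobi relaxation of a screened Poisson equation
    (lam*u - laplacian(u) = lam*f) on one image tile.
    Each pixel plays the role of one PE in the lattice array:
    it only ever reads its four nearest neighbors."""
    u = f.copy()
    for _ in range(n_updates):
        # 4-neighbor sum with replicated (Neumann-like) borders
        p = np.pad(u, 1, mode="edge")
        nbr = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
        # Jacobi update: u = (lam*f + sum of neighbors) / (lam + 4)
        u = (lam * f + nbr) / (lam + 4.0)
    return u

tile = np.random.rand(20, 20)          # one 20x20 tile, as in the mapping scheme
enhanced = jacobi_retinex_tile(tile)
```

Raising n_updates smooths more aggressively, which matches the abstract's observation that a higher update count removes more noise and shadow at the cost of detail.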


References
Article
Full-text available
Fifty years of Moore’s law scaling in microelectronics have brought remarkable opportunities for the rapidly evolving field of microscopic robotics [1–5]. Electronic, magnetic and optical systems now offer an unprecedented combination of complexity, small size and low cost [6,7], and could be readily appropriated for robots that are smaller than the resolution limit of human vision (less than a hundred micrometres) [8–11]. However, a major roadblock exists: there is no micrometre-scale actuator system that seamlessly integrates with semiconductor processing and responds to standard electronic control signals. Here we overcome this barrier by developing a new class of voltage-controllable electrochemical actuators that operate at low voltages (200 microvolts), low power (10 nanowatts) and are completely compatible with silicon processing. To demonstrate their potential, we develop lithographic fabrication-and-release protocols to prototype sub-hundred-micrometre walking robots. Every step in this process is performed in parallel, allowing us to produce over one million robots per four-inch wafer. These results are an important advance towards mass-manufactured, silicon-based, functional robots that are too small to be resolved by the naked eye.
Article
Full-text available
Microscale systems that can combine multiple functionalities, such as untethered motion, actuation and communication, could be of use in a variety of applications from robotics to drug delivery. However, these systems require both rigid and flexible components—including microelectronic circuits, engines, actuators, sensors, controllers and power supplies—to be integrated on a single platform. Here, we report a flexible microsystem that is capable of controlled locomotion and actuation, and is driven by wireless power transfer. The microsystem uses two tube-shaped catalytic micro-engines that are connected via a flat polymeric structure. A square coil is integrated into the platform, which enables wireless energy transfer via inductive coupling. As a result, the catalytic engines can be locally heated and the direction of motion controlled. Our platform can also integrate light-emitting diodes and a thermoresponsive micro-arm that can be used to perform grasp and release tasks.
Conference Paper
Full-text available
This paper focuses on a few of the many Retinex-based methods for image enhancement. Retinex is, at its core, a model of how a human being perceives a scene through the retina (eye) and cortex (brain). On the basis of Retinex theory, an image can be regarded as the product of illumination and reflectance from the object. Retinex focuses on the dynamic range and color constancy of an image. Many methods that use Retinex for image contrast enhancement have been proposed to date. In this paper, we discuss Single Scale Retinex (SSR), Multi-Scale Retinex (MSR), Improved Retinex Image Enhancement (IRIE), MSR Improvement for Night-Time Enhancement (MSRINTE), and Retinex-Based Perceptual Contrast Enhancement using Luminance Adaptation (RBPCELA).
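As a point of reference for the surveyed family, single-scale Retinex has a particularly compact form: the log of the image minus the log of a Gaussian-blurred estimate of the illumination. A minimal sketch (the scale sigma is an illustrative choice, not a value from the paper):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def single_scale_retinex(img, sigma=80.0):
    """SSR: R(x, y) = log I(x, y) - log (G_sigma * I)(x, y),
    where G_sigma * I is a Gaussian estimate of the illumination."""
    img = img.astype(np.float64) + 1.0        # avoid log(0)
    return np.log(img) - np.log(gaussian_filter(img, sigma))
```

Multi-scale Retinex is then essentially a weighted sum of SSR outputs computed at several values of sigma.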
Article
Full-text available
Understanding shadows from a single image naturally divides into two tasks in previous studies: shadow detection and shadow removal. In this paper, we present a multi-task perspective, not embraced by any existing work, that jointly learns both detection and removal in an end-to-end fashion, aiming at mutual benefit between the two. Our framework is based on a novel STacked Conditional Generative Adversarial Network (ST-CGAN), composed of two stacked CGANs, each with a generator and a discriminator. Specifically, a shadow image is fed into the first generator, which produces a shadow detection mask. That shadow image, concatenated with its predicted mask, goes through the second generator to recover the corresponding shadow-free image. In addition, the two discriminators are well placed to model higher-level relationships and global scene characteristics for the detected shadow region and for the shadow-removed reconstruction, respectively. More importantly, for multi-task learning, our stacked design provides a view notably different from the commonly used multi-branch version. To fully evaluate the proposed framework, we construct the first large-scale benchmark with 1870 image triplets (shadow image, shadow mask image, and shadow-free image) under 135 scenes. Extensive experimental results consistently show the advantages of ST-CGAN over several representative state-of-the-art methods on two large-scale publicly available datasets and on our newly released one.
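The stacked conditioning pattern can be summarized in a few lines of PyTorch; the single convolutions below are placeholders for the real generators, and only the wiring (the second stage sees the shadow image concatenated with the predicted mask) follows the abstract.

```python
import torch
import torch.nn as nn

g1 = nn.Conv2d(3, 1, 3, padding=1)   # stand-in for the detection generator
g2 = nn.Conv2d(4, 3, 3, padding=1)   # stand-in for the removal generator

shadow_img = torch.rand(1, 3, 256, 256)
mask = torch.sigmoid(g1(shadow_img))          # stage 1: shadow detection mask
x2 = torch.cat([shadow_img, mask], dim=1)     # condition stage 2 on image + mask
shadow_free = torch.tanh(g2(x2))              # stage 2: shadow-free estimate
```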
Article
Full-text available
The retina is the most accessible element of the central nervous system for linking behavior to the activity of isolated neurons. We unraveled behavior at the elementary level of single input units: the visual sensation generated by stimulating individual long- (L), middle- (M), and short-wavelength-sensitive (S) cones with light. Spectrally identified cones near the fovea of human observers were targeted with small spots of light, and the type, proportion, and repeatability of the elicited sensations were recorded. Two distinct populations of cones were observed: a smaller group predominantly associated with signaling chromatic sensations and a second, more numerous population linked to achromatic percepts. Red and green sensations were mainly driven by L- and M-cones, respectively, although both cone types elicited achromatic percepts. Sensations generated by cones were rarely stochastic; rather, they were consistent over many months and were dominated by one specific perceptual category. Cones lying in the midst of a pure spectrally opponent neighborhood, an arrangement purported to be most efficient in producing chromatic signals in downstream neurons, were no more likely to signal chromatic percepts. Overall, the results are consistent with the idea that the nervous system encodes high-resolution achromatic information and lower-resolution color signals in separate pathways that emerge as early as the first synapse. The lower proportion of cones eliciting color sensations may reflect a lack of evolutionary pressure for the chromatic system to be as fine-grained as the high-acuity achromatic system.
Article
Full-text available
In 1964 Edwin H. Land formulated the Retinex theory, the first attempt to simulate and explain how the human visual system perceives color. Unfortunately, the original Land-McCann Retinex algorithm is both complex and not fully specified: it computes at each pixel an average over a very large set of paths on the image. For this reason, Retinex has received several interpretations and implementations which, among other aims, attempt to tame its excessive complexity. However, Morel et al. have shown that the original Retinex algorithm can be formalized as a (discrete) partial differential equation. This article describes PDE-Retinex, a fast implementation of the original Land-McCann theory using only two DFTs.
Article
Full-text available
In 1964 Edwin H. Land formulated the Retinex theory, the first attempt to simulate and explain how the human visual system perceives color. His theory, and an extension known as the "reset Retinex," were further formalized by Land and McCann [1]. Several Retinex algorithms have been developed since. These color constancy algorithms modify the RGB values at each pixel to give an estimate of the color sensation without a priori information on the illumination. Unfortunately, the original Land-McCann Retinex algorithm is both complex and not fully specified: it computes at each pixel an average over a very large set of paths on the image. For this reason, Retinex has received several interpretations and implementations which, among other aims, attempt to tame its excessive complexity. In this paper, it is proved that if the paths are assumed to be symmetric random walks, the Retinex solutions satisfy a discrete screened Poisson equation. This formalization yields an exact and fast implementation using only two FFTs. Several experiments on color images illustrate the effectiveness of the original Retinex theory.
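The computational payoff of the screened Poisson formulation is that the equation diagonalizes under the discrete Fourier transform, giving the two-FFT solve mentioned above. A minimal NumPy sketch, assuming periodic boundaries and omitting the gradient thresholding of the full Retinex method (lam is a user parameter):

```python
import numpy as np

def screened_poisson_fft(f, lam=0.1):
    """Solve lam*u - laplacian(u) = lam*f with two FFTs.
    Under the DFT, the 5-point Laplacian becomes the diagonal
    symbol 2*cos(2*pi*k/M) + 2*cos(2*pi*l/N) - 4."""
    M, N = f.shape
    k = np.arange(M).reshape(-1, 1)
    l = np.arange(N).reshape(1, -1)
    denom = lam + 4 - 2 * np.cos(2 * np.pi * k / M) - 2 * np.cos(2 * np.pi * l / N)
    return np.real(np.fft.ifft2(lam * np.fft.fft2(f) / denom))
```

One forward and one inverse FFT suffice because the screening term lam keeps the denominator strictly positive, so no special handling of the zero frequency is needed.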
Article
Full-text available
Sensations of color show a strong correlation with reflectance, even though the amount of visible light reaching the eye depends on the product of reflectance and illumination. The visual system must achieve this remarkable result by a scheme that does not measure flux. Such a scheme is described as the basis of retinex theory. This theory assumes that there are three independent cone systems, each starting with a set of receptors peaking, respectively, in the long-, middle-, and short-wavelength regions of the visible spectrum. Each system forms a separate image of the world in terms of lightness that shows a strong correlation with reflectance within its particular band of wavelengths. These images are not mixed, but rather are compared to generate color sensations. The problem then becomes how the lightness of areas in these separate images can be independent of flux. This article describes the mathematics of a lightness scheme that generates lightness numbers, the biologic correlate of reflectance, independent of the flux from objects.
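The path-based lightness computation underlying this scheme can be sketched directly. Assuming the classic ratio-product-reset rule (the threshold value and the reset-to-one convention are the usual textbook choices, not taken from this abstract):

```python
import numpy as np

def path_lightness(values, threshold=0.1):
    """Sequential product of ratios along one path, with reset.
    Ratios near 1 (gradual illumination changes) are ignored;
    whenever the running product exceeds 1 it is reset to 1,
    so the brightest area seen so far acts as the white reference."""
    log_l = 0.0
    for prev, cur in zip(values[:-1], values[1:]):
        step = np.log(cur) - np.log(prev)
        if abs(step) > threshold:        # keep only sharp (reflectance) edges
            log_l += step
        log_l = min(log_l, 0.0)          # reset: lightness never exceeds 1
    return np.exp(log_l)

print(path_lightness([0.2, 0.21, 0.8, 0.5]))  # lightness relative to path maximum
```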
Article
Full-text available
We propose a novel image denoising strategy based on an enhanced sparse representation in transform domain. The enhancement of the sparsity is achieved by grouping similar 2-D image fragments (e.g., blocks) into 3-D data arrays which we call "groups." Collaborative filtering is a special procedure developed to deal with these 3-D groups. We realize it using the three successive steps: 3-D transformation of a group, shrinkage of the transform spectrum, and inverse 3-D transformation. The result is a 3-D estimate that consists of the jointly filtered grouped image blocks. By attenuating the noise, the collaborative filtering reveals even the finest details shared by grouped blocks and, at the same time, it preserves the essential unique features of each individual block. The filtered blocks are then returned to their original positions. Because these blocks are overlapping, for each pixel, we obtain many different estimates which need to be combined. Aggregation is a particular averaging procedure which is exploited to take advantage of this redundancy. A significant improvement is obtained by a specially developed collaborative Wiener filtering. An algorithm based on this novel denoising strategy and its efficient implementation are presented in full detail; an extension to color-image denoising is also developed. The experimental results demonstrate that this computationally scalable algorithm achieves state-of-the-art denoising performance in terms of both peak signal-to-noise ratio and subjective visual quality.
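The core of the first collaborative-filtering step can be sketched for a single group of similar blocks; block matching, the Wiener refinement, and aggregation weights are omitted, and a 3-D DCT stands in for the separable transform (an illustrative choice):

```python
import numpy as np
from scipy.fft import dctn, idctn

def collaborative_hard_threshold(group, sigma, k=2.7):
    """One collaborative-filtering step in the style described above:
    3-D transform of a stack of similar blocks, hard thresholding of
    the spectrum, inverse transform. `group` has shape (n_blocks, B, B)."""
    spec = dctn(group, norm="ortho")           # 3-D transform of the group
    spec[np.abs(spec) < k * sigma] = 0.0       # shrinkage: hard threshold
    return idctn(spec, norm="ortho")           # jointly filtered blocks

group = np.random.rand(8, 8, 8)                # 8 similar 8x8 blocks
filtered = collaborative_hard_threshold(group, sigma=0.1)
```

Because blocks overlap in the full method, each pixel ends up with many such estimates, which the aggregation step then averages.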
Article
Microscale robots offer great promise for many medical applications such as drug delivery, minimally invasive surgery, and localized biometric diagnostics. Fully automatic real-time detection and tracking of microrobots using medical imagers is actively investigated for future clinical translation. Ultrasound (US) B-mode imaging has been employed to monitor single agents and collective swarms of microrobots in vitro and ex vivo under controlled experimental conditions. However, low contrast and spatial resolution still limit the effective use of this method in a medical microrobotic scenario, owing to uncertainty in the position of the microrobots. The positioning error arises from inaccuracy of the US-based visual feedback provided by the detection and tracking algorithms. Deep learning networks are a promising solution for detecting and tracking microrobots in noisy ultrasonic images in real time. What is most striking, however, is the performance gap across state-of-the-art research on deep learning detection and tracking of microrobots; a key factor is the unavailability of large-scale datasets and benchmarks. In this paper, we present the first publicly available B-mode ultrasound dataset for microrobots (USmicroMagSet), with accurate annotations, containing more than 40000 samples of magnetic microrobots. In addition, to analyze performance on the proposed benchmark dataset, 4 deep learning detectors and 4 deep learning trackers are used.
Article
This article presents a 50.1-Mpixel 14-bit 250-frames/s back-illuminated stacked CMOS image sensor on a 35-mm optical format exhibiting 1.18 e⁻ rms random noise at 0 dB. The sensor employs a load reduction technique that splits half of each pixel signal line using Cu-Cu connection technology underneath the pixel area, pipelined operation with a gain-adaptive column-parallel kT/C noise-canceling sample and hold, and a delta-sigma analog-to-digital converter (ADC) circuit with a 250-frames/s scanning rate and 14-bit resolution. Moreover, on-chip online calibration of column mismatch keeps the non-linearity of the output image within −0.42%. As a result, a figure of merit (e⁻·pJ/step) of 0.09 is obtained, representing state-of-the-art performance.
Article
This article presents the generative adversarial network processing unit (GANPU), an energy-efficient multiple deep neural network (DNN) training processor for GANs. It enables on-device training of GANs on performance- and battery-limited mobile devices, without sending user-specific data to servers, fully evading privacy concerns. Training GANs requires a massive amount of computation and is therefore difficult to accelerate on a resource-constrained platform. Besides, networks and layers in GANs show dramatically changing operational characteristics, making it difficult to optimize the processor’s core and bandwidth allocation. For higher throughput and energy efficiency, this article proposes three key features. An adaptive spatiotemporal workload multiplexing is proposed to maintain high utilization when accelerating the multiple DNNs in a single GAN model. To take advantage of ReLU sparsity during both inference and training, a dual-sparsity exploitation architecture is proposed to skip redundant computations due to input and output feature zeros. Moreover, an exponent-only ReLU speculation (EORS) algorithm is proposed along with its lightweight processing element (PE) architecture, to estimate the location of output feature zeros during inference with minimal hardware overhead. Fabricated in a 65-nm process, the GANPU achieved an energy efficiency of 75.68 TFLOPS/W for 16-bit floating-point computation, which is 4.85× higher than the state of the art. As a result, GANPU enables on-device training of GANs with high energy efficiency.
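The EORS idea can be caricatured in a few lines: use only the exponents of the floating-point operands to form a cheap estimate of a pre-activation value, and skip the full multiply-accumulate when the estimate predicts a ReLU zero. The estimator below is a loose software analogy, not the paper's hardware algorithm:

```python
import numpy as np

def eors_like_skip(w, x):
    """Speculate the sign of dot(w, x) from exponents only:
    approximate each product by sign * 2**(exp_w + exp_x),
    then run the exact computation only if the speculation
    says the ReLU output may be positive."""
    ew = np.frexp(w)[1]                      # exponent of each weight
    ex = np.frexp(x)[1]                      # exponent of each activation
    approx = np.sum(np.sign(w) * np.sign(x) * 2.0 ** (ew + ex))
    if approx <= 0:                          # speculated ReLU zero: skip the MAC
        return 0.0
    return max(np.dot(w, x), 0.0)            # exact ReLU(dot) otherwise
```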
Article
Neuromorphic vision sensors (NVSs) can enable energy savings due to their event-driven operation, which exploits the temporal redundancy in video streams from a stationary camera. However, noise-driven events lead to false triggering of the object recognition processor. Image denoising requires memory-intensive processing, creating a bottleneck in energy and latency. In this article, we present in-memory filtering (IMF), a 6T-SRAM in-memory computing (IMC)-based image denoising scheme for event-based binary image (EBBI) frames from an NVS. We propose a non-overlap median filter (NOMF) for image denoising. An IMC framework enables hardware implementation of NOMF leveraging the inherent read disturb phenomenon of 6T-SRAM. To demonstrate the energy savings and effectiveness of the algorithm, we fabricated the proposed architecture in a 65-nm CMOS process. Compared to a fully digital implementation, IMF enables >70× energy savings and a >3× improvement in processing time when tested with video recordings from a DAVIS sensor, and achieves a peak throughput of 134.4 GOPS. Furthermore, the peak energy efficiency of the NOMF is 51.3 TOPS/W, comparable with state-of-the-art in-memory processors. We also show that images obtained by NOMF provide accuracy in tracking and classification applications comparable to that of images obtained by conventional median filtering.
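For a binary image, a median over a block reduces to a majority vote, which is what makes a non-overlap variant hardware-friendly: each block of pixels is read once and produces one decision. A minimal sketch, assuming 3×3 non-overlapping blocks, image sides that are multiples of 3, and an output upsampled back to the input size (details the abstract does not specify):

```python
import numpy as np

def nomf_binary(img, b=3):
    """Non-overlap median filter on a binary image: partition into
    b x b tiles and replace each tile by its majority (median) bit."""
    H, W = img.shape
    counts = img.reshape(H // b, b, W // b, b).sum(axis=(1, 3))
    majority = (counts > (b * b) // 2).astype(img.dtype)
    return np.repeat(np.repeat(majority, b, axis=0), b, axis=1)

noisy = (np.random.rand(18, 18) > 0.5).astype(np.uint8)
denoised = nomf_binary(noisy)
```

Isolated noise events rarely win a tile's majority vote, so they are suppressed without the overlapping-window reads a conventional median filter needs.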
Conference Paper
The goal of this study is to estimate the thermal impact of a titanium skull unit (SU) implanted on the exterior aspect of the human skull. We envision this unit to house the front-end of a fully implantable electrocorticogram (ECoG)-based bi-directional (BD) brain-computer interface (BCI). Starting from the bio-heat transfer equation with physiologically and anatomically constrained tissue parameters, we used the finite element method (FEM) implemented in COMSOL to build a computational model of the SU’s thermal impact. Based on our simulations, we predicted that the SU could consume up to 75 mW of power without raising the temperature of the surrounding tissues above the safe limit (a 1 °C increase). This power budget far exceeds the power consumption of our front-end prototypes, suggesting that this design can sustain the SU’s ability to record ECoG signals and deliver cortical stimulation. These predictions will be used to further refine the existing SU design and inform the design of future SU prototypes.
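The bio-heat transfer equation referenced here is presumably the standard Pennes form; for completeness, that form reads (symbols per the usual convention, not taken from this abstract):

```latex
\rho c \frac{\partial T}{\partial t}
  = \nabla \cdot \left( k \nabla T \right)
  + \rho_b c_b \omega_b \left( T_a - T \right)
  + Q_m + Q_{\mathrm{ext}}
```

Here ρ, c, and k are the tissue density, specific heat, and thermal conductivity; the ω_b term models blood perfusion pulling tissue toward arterial temperature T_a; Q_m is metabolic heating; and Q_ext would represent the implant's dissipated power density.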
Article
Shadow removal is an essential task for scene understanding. Many studies consider only matching the image contents, which often causes two types of ghosts: color inconsistencies in shadow regions or artifacts on shadow boundaries (as shown in Figure 1). In this paper, we tackle these issues in two ways. First, to carefully learn a border-artifact-free image, we propose a novel network structure named the dual hierarchically aggregation network (DHAN). It contains a series of growth dilated convolutions as the backbone without any down-sampling, and we hierarchically aggregate multi-context features for attention and prediction, respectively. Second, we argue that training on a limited dataset restricts the textural understanding of the network, which leads to the shadow-region color inconsistencies. Currently, the largest dataset contains 2k+ shadow/shadow-free image pairs. However, it has only 0.1k+ unique scenes, since many samples share exactly the same background with different shadow positions. Thus, we design a shadow matting generative adversarial network (SMGAN) to synthesize realistic shadow mattings from a given shadow mask and shadow-free image. With the help of novel masks or scenes, we enhance the current datasets using synthesized shadow images. Experiments show that our DHAN can erase the shadows and produce high-quality ghost-free images. After training on the synthesized and real datasets, our network outperforms other state-of-the-art methods by a large margin. The code is available: http://github.com/vinthony/ghost-free-shadow-removal/
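The shadow-matting synthesis can be pictured as a per-pixel attenuation composite. A minimal sketch, assuming the matte is a per-pixel darkening factor in [0, 1] applied over the image (the exact compositing model of SMGAN is not specified in this abstract):

```python
import numpy as np

def composite_shadow(shadow_free, matte):
    """Synthesize a shadow image from a shadow-free image and a matte:
    matte = 1 leaves a pixel unchanged, matte < 1 darkens it."""
    return shadow_free * matte[..., None]     # broadcast matte over RGB

img = np.random.rand(256, 256, 3)             # shadow-free image
matte = np.ones((256, 256))
matte[64:128, 64:192] = 0.4                   # hypothetical soft shadow region
shadowed = composite_shadow(img, matte)
```

Generating the matte with a GAN, rather than pasting binary masks, is what lets the synthesized shadows have soft, realistic boundaries.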
Article
Retinex theory was developed mainly to decompose an image into illumination and reflectance components by analyzing local image derivatives. In this theory, larger derivatives are attributed to changes in reflectance, while smaller derivatives arise in the smooth illumination. In this paper, we utilize exponentiated local derivatives (with an exponent γ) of an observed image to generate its structure map and texture map. The structure map is produced by amplifying the derivatives with γ > 1, while the texture map is generated by shrinking them with γ < 1. To this end, we design exponential filters for the local derivatives and present their capability for extracting accurate structure and texture maps, as influenced by the choice of exponent γ. The extracted structure and texture maps are employed to regularize the illumination and reflectance components in Retinex decomposition. A novel Structure and Texture Aware Retinex (STAR) model is further proposed for illumination and reflectance decomposition of a single image. We solve the STAR model by an alternating optimization algorithm, in which each sub-problem is transformed into a vectorized least squares regression with a closed-form solution. Comprehensive experiments on commonly tested datasets demonstrate that the proposed STAR model produces better quantitative and qualitative performance than previous competing methods on illumination and reflectance decomposition, low-light image enhancement, and color correction. The code is publicly available at https://github.com/csjunxu/STAR .
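The exponentiated-derivative idea can be sketched directly: raise the magnitude of the local derivatives to a power γ, amplifying strong (structural) edges for γ > 1 and shrinking weak (textural) variation for γ < 1. The gradient operator and normalization below are illustrative choices, not the paper's exponential filters:

```python
import numpy as np

def exponentiated_derivative_map(img, gamma):
    """Map of local derivative magnitudes raised to the power gamma.
    gamma > 1 emphasizes structure; gamma < 1 retains fine texture."""
    gy, gx = np.gradient(img.astype(np.float64))
    mag = np.hypot(gx, gy)
    mag /= mag.max() + 1e-12                  # normalize to [0, 1]
    return mag ** gamma

img = np.random.rand(128, 128)
structure_map = exponentiated_derivative_map(img, gamma=2.0)
texture_map = exponentiated_derivative_map(img, gamma=0.5)
```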
Article
This article presents a 640 × 640 fully dynamic CMOS image sensor for always-on operation. It consists of a dynamic pixel source follower (SF), whose output signal is sampled onto a parasitic column capacitor and then read out by a dynamic single-slope (SS) analog-to-digital converter (ADC) based on a dynamic bias comparator and an energy-efficient two-step counter. The prototype sensor was implemented in a 110-nm CMOS process, achieving 0.3% peak non-linearity, 6.1 e⁻ rms random noise (RN), and 67-dB dynamic range. The power consumption is only 2.1 mW at 44 frames per second (fps) and is further reduced to 140 μW at 5 fps in the sub-sampled 320 × 320 mode. This sensor achieves a state-of-the-art energy-efficiency figure of merit of 0.71 e⁻·nJ.
Article
In this paper, we report on a back-illuminated, global shutter, CMOS image sensor (CIS) with a pixel-parallel, single-slope analog-to-digital converter (ADC). We adopted a digital bucket relay transfer with multistage flip-flop connection, a pixel unit Cu-Cu connection, and positive-feedback circuitry, to realize a 6.9-μm pixel-pitch, 1.46-Mpixel pixel-parallel ADC. By operating the comparator with a bias current in the subthreshold region of 7.74-111 nA, we succeeded in reducing the peak current during simultaneous ADC. In combination with an ADC standby operation, we succeeded in further reducing the pixel-parallel ADC power consumption. With these techniques, we realized a normalized figure of merit of 0.24 nJ·e-rms/step calculated by dividing the entire sensor power by the effective ADC resolution at a subthreshold current of 111 nA during 660 frames/s operation.
Article
This paper presents a CMOS image sensor (CIS) that extracts a multi-level edge image as well as a human-friendly normal image in real time from conventional pixels for machine-vision applications, utilizing a proposed speed/power-efficient dual-mode successive-approximation register analog-to-digital converter (SAR ADC). The proposed readout scheme operates in two modes, fine-step SAR (FS-SAR) mode and coarse-step single-slope (CS-SS) mode, depending on the difference (Δ) between a chosen pixel and the previous pixel. If a chosen pixel is at the boundary of an object, with a large Δ from the previous pixel, the readout ADC works in the CS-SS mode to read out the edge strength (ES), while the FS-SAR mode is applied for other pixels. By displaying the ES, a multi-level edge image can be obtained in real time along with a normal image, with no hardware/time overhead. By saving the MSB conversion cycles regardless of Δ, the proposed dual-mode readout scheme enhances readout speed and reduces power consumption. A prototype QQVGA CIS with 10-bit SAR ADCs was fabricated in a 0.18-μm 1P4M CMOS image sensor process with a 4.9-μm pixel pitch. With a maximum pixel rate of 61.4 Mp/s, the prototype demonstrated figures of merit of 70 pJ/pixel/frame, 0.35 e⁻·nJ, and 0.34 e⁻·pJ/step.
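The per-pixel mode decision is simple to model in software. A sketch of the readout policy along one pixel row, with an illustrative threshold and coarse/fine step sizes standing in for the actual ADC parameters:

```python
import numpy as np

def dual_mode_readout(row, thresh=32, fine=1, coarse=16):
    """Per-pixel mode choice along one row: a large difference from
    the previous pixel triggers the coarse edge-strength mode (CS-SS);
    otherwise the pixel is converted in fine-step mode (FS-SAR)."""
    normal, edge = [], []
    prev = int(row[0])
    for v in row:
        delta = int(v) - prev
        if abs(delta) > thresh:                 # boundary pixel: coarse mode
            edge.append(coarse * round(delta / coarse))
            normal.append(coarse * round(int(v) / coarse))
        else:                                   # interior pixel: fine mode
            edge.append(0)
            normal.append(fine * round(int(v) / fine))
        prev = int(v)
    return np.array(normal), np.array(edge)

row = np.concatenate([np.full(10, 40), np.full(10, 200)])  # a step edge
img_row, edge_row = dual_mode_readout(row)                 # edge_row marks the boundary
```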
Article
Lacking realistic ground truth data, image denoising techniques are traditionally evaluated on images corrupted by synthesized i.i.d. Gaussian noise. We aim to obviate this unrealistic setting by developing a methodology for benchmarking denoising techniques on real photographs. We capture pairs of images with different ISO values and appropriately adjusted exposure times, where the nearly noise-free low-ISO image serves as reference. To derive the ground truth, careful post-processing is needed. We correct spatial misalignment, cope with inaccuracies in the exposure parameters through a linear intensity transform based on a novel heteroscedastic Tobit regression model, and remove residual low-frequency bias that stems, e.g., from minor illumination changes. We then capture a novel benchmark dataset, the Darmstadt Noise Dataset (DND), with consumer cameras of differing sensor sizes. One interesting finding is that various recent techniques that perform well on synthetic noise are clearly outperformed by BM3D on photographs with real noise. Our benchmark delineates realistic evaluation scenarios that deviate strongly from those commonly used in the scientific literature.
Article
This paper presents an on-chip implementation of a scalable, reconfigurable bilateral filtering processor for computational photography applications such as HDR imaging, low-light enhancement, and glare reduction. Careful pipelining and scheduling have minimized the local storage requirement to tens of kB. The 40-nm CMOS test chip operates from 98 MHz at 0.9 V to 25 MHz at 0.5 V. The test chip processes 13 megapixels/s while consuming 17.8 mW at 98 MHz and 0.9 V, achieving significant energy reduction compared with software implementations on recent mobile processors.
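For reference, the filter being accelerated combines a spatial Gaussian with a range (intensity) Gaussian, so edges are preserved while flat regions are smoothed. A brute-force grayscale NumPy sketch (the hardware's pipelined, reduced-storage formulation is not reproduced here):

```python
import numpy as np

def bilateral_filter(img, radius=3, sigma_s=2.0, sigma_r=0.1):
    """Brute-force bilateral filter: each output pixel is a weighted
    mean whose weights fall off with spatial and intensity distance."""
    H, W = img.shape
    out = np.zeros((H, W), dtype=np.float64)
    pad = np.pad(img, radius, mode="edge").astype(np.float64)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(ys**2 + xs**2) / (2 * sigma_s**2))     # fixed spatial kernel
    for i in range(H):
        for j in range(W):
            win = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            rng = np.exp(-(win - img[i, j])**2 / (2 * sigma_r**2))  # range kernel
            w = spatial * rng
            out[i, j] = np.sum(w * win) / np.sum(w)
    return out

img = np.random.rand(64, 64)
smoothed = bilateral_filter(img)
```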
Article
We rely on our visual system to cope with the vast barrage of incoming light patterns and to extract features from the scene that are relevant to our well-being. The necessary reduction of visual information already begins in the eye. In this review, we summarize recent progress in understanding the computations performed in the vertebrate retina and how they are implemented by the neural circuitry. A new picture emerges from these findings that helps resolve a vexing paradox between the retina's structure and function. Whereas the conventional wisdom treats the eye as a simple prefilter for visual images, it now appears that the retina solves a diverse set of specific tasks and provides the results explicitly to downstream brain areas.