Article

Efficient Panorama Mosaicing Based on Enhanced-FAST and Graph Cuts


Abstract

This paper presents an efficient and accurate method for creating full-view panoramas. A new feature point detection algorithm called Enhanced-FAST is proposed to accurately align images, and a graph cuts algorithm is used to merge two adjacent images seamlessly. Building on the FAST algorithm, Enhanced-FAST smooths and extends the sampling area, making feature point detection less sensitive to noise. Our graph cuts algorithm uses the image Laplacian to compute the edge weights, which allows it to find an optimized seam even under different lighting. Experiments and comparisons show that our method is efficient and robust to image noise and lighting changes.
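The abstract does not give the exact edge-weight formula, so the following sketch only illustrates the general idea of a Laplacian-weighted graph-cut seam: build a grid graph over the overlap of two aligned images, weight neighbouring-pixel edges by intensity difference plus Laplacian response (an assumed combination, not necessarily the authors' formula), and cut it with an off-the-shelf max-flow solver. The helper name laplacian_seam and the weighting constants are illustrative assumptions.

```python
import cv2
import networkx as nx
import numpy as np

def laplacian_seam(overlap_a, overlap_b):
    """Label each overlap pixel as coming from image A (1) or image B (0)."""
    h, w = overlap_a.shape
    a = overlap_a.astype(np.float64)
    b = overlap_b.astype(np.float64)
    lap_a = np.abs(cv2.Laplacian(a, cv2.CV_64F))
    lap_b = np.abs(cv2.Laplacian(b, cv2.CV_64F))
    # Assumed per-pixel cost: intensity difference plus Laplacian response,
    # so the cut prefers low-difference, low-edge-activity pixels.
    cost = np.abs(a - b) + lap_a + lap_b + 1e-3

    G = nx.DiGraph()
    src, sink = "A", "B"

    def node(y, x):
        return y * w + x

    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):
                ny, nx_ = y + dy, x + dx
                if ny < h and nx_ < w:
                    wgt = cost[y, x] + cost[ny, nx_]
                    G.add_edge(node(y, x), node(ny, nx_), capacity=wgt)
                    G.add_edge(node(ny, nx_), node(y, x), capacity=wgt)
        # Terminal links: leftmost column is forced to image A, rightmost to B.
        G.add_edge(src, node(y, 0), capacity=1e9)
        G.add_edge(node(y, w - 1), sink, capacity=1e9)

    _, (side_a, _) = nx.minimum_cut(G, src, sink)
    mask = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            if node(y, x) in side_a:
                mask[y, x] = 1
    return mask
```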


... Another major issue of the FAST-based algorithms is that they are not particularly robust to an increased degree of variation. For that reason, extending the sampling area beyond the sixteen pixels around each candidate point [53] could be considered a promising approach, since it gives the FAST corner points more distinctiveness and, in turn, makes them invariant to larger variations. ...
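How exactly the sampling area is extended in [53] is not detailed in this excerpt; the sketch below only shows one hypothetical way to build a wider sampling ring around a candidate pixel in addition to the classic 16-pixel circle. The ring_offsets helper and the radius and point-count choices are assumptions for illustration.

```python
import numpy as np

def ring_offsets(radius, n_points):
    """Integer pixel offsets lying approximately on a circle of the given radius."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    offsets = np.round(np.stack([radius * np.sin(angles),
                                 radius * np.cos(angles)], axis=1)).astype(int)
    unique, seen = [], set()
    for dy, dx in offsets:                 # drop duplicates caused by rounding
        if (dy, dx) not in seen:
            seen.add((dy, dx))
            unique.append((int(dy), int(dx)))
    return unique

inner_ring = ring_offsets(3, 16)   # reproduces the classic 16-pixel FAST circle
outer_ring = ring_offsets(5, 24)   # a hypothetical wider ring for extra context
```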
... Reported processing times for the main groups of mosaicing approaches:
- SURF detector-based: fast computation, good for real-time applications [27], [44], [45], [46]; no timing information available.
- MI-based [29], [47], [48]: 1.5 min to stitch a pair of images (3.2 GHz Pentium IV, 2 GB RAM) [29]; 1 s for video mosaicing of 40 frames of 160×100 pixels (2.4 GHz Pentium) [47].
- Harris corner detector-based [30], [51], [23]: 2 ms to stitch a pair of 512×512 images (XC3S5000 FPGA board) [23].
- FAST corner detector-based [31], [52], [53]: 20 ms to register a pair of multimodal real images (2.5 GHz Pentium Dual-Core, 3 GB RAM) [52]; 437 ms to stitch a pair of 512×512 images [53].
- SIFT feature detector-based [54], [11], [55], [10], [32], [56]: 2 s to stitch 10 images of 100×67 pixels (2.4 GHz Pentium Core 2 Duo, 4 GB RAM) [11]; 31 min to stitch 10 images (3.3 GHz Pentium dual core, 6 GB RAM) [10].
- SURF detector-based [58], [50], [59], [60]: 400 ms for feature detection and description (3 GHz Pentium IV) [58]; 23.6 s to stitch a set of 9 images of 1280×720 pixels [59]; 11.865 s to match a pair of 1600×1200 images [60].
- Contour-based [49], [65], [66], [67]: 3 s to register a pair of 1024×768 images (2.4 GHz Pentium Dual-Core, 4 GB RAM) [49].
- Frequency domain-based [34], [68], [69]: 3.9 s for feature detection (3 GHz Pentium IV) [34]. ...
Article
Image mosaicing, the process of obtaining a wider field-of-view of a scene from a sequence of partial views, has been an attractive research area because of its wide range of applications, including motion detection, resolution enhancement, monitoring global land usage, and medical imaging. A number of image mosaicing algorithms have been proposed over the last two decades. This paper provides an in-depth survey of the existing image mosaicing algorithms by classifying them into several groups. For each group, the fundamental concepts are first explained, followed by the modifications that different researchers have made to them. Furthermore, this paper also discusses the advantages and disadvantages of all the mosaicing groups.
... The most classic approach is to detect and extract image point features corresponding to unique landmarks in the scene and then match them across different views. This feature-based mosaicking approach (Milgram, 1975) has been investigated extensively in recent decades, using different well-known hand-crafted feature approaches such as Harris (Okumura et al., 2013), SIFT (Li et al., 2008), SURF (Rong et al., 2009), ORB (Chaudhari et al., 2017), and FAST (Wang et al., 2012). More recently, data-driven features learned by deep neural networks have been utilised for image mosaicking (Bano et al., 2020; Zhang et al., 2019). ...
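As a concrete illustration of the feature-based pipeline described in this excerpt (detect, describe, match, estimate motion), here is a minimal sketch using ORB, one of the detectors listed above; any of the others could be substituted. The parameter values and the RANSAC threshold are illustrative choices, not taken from the cited works.

```python
import cv2
import numpy as np

def estimate_homography(img_a, img_b, max_features=2000):
    """Align two grayscale images with ORB features and a RANSAC homography."""
    orb = cv2.ORB_create(nfeatures=max_features)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)

    # Hamming distance is the natural metric for ORB's binary descriptors.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)

    pts_a = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # RANSAC rejects outlier correspondences before the final fit.
    H, inlier_mask = cv2.findHomography(pts_a, pts_b, cv2.RANSAC, 5.0)
    return H, inlier_mask
```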
Article
We propose an endoscopic image mosaicking algorithm that is robust to lighting condition changes, specular reflections, and feature-less scenes. These conditions are especially common in minimally invasive surgery, where the light source moves with the camera to dynamically illuminate close-range scenes. This makes it difficult for a single image registration method to robustly track camera motion and then generate consistent mosaics of the expanded surgical scene across different and heterogeneous environments. Instead of relying on one specialised feature extractor or image registration method, we propose to fuse different image registration algorithms according to their uncertainties, formulating the problem as affine pose graph optimisation. This allows landmarks, dense intensity registration, and learning-based approaches to be combined in a single framework. To demonstrate our application we consider deep learning-based optical flow, hand-crafted features, and intensity-based registration; however, the framework is general and could take as input other sources of motion estimation, including other sensor modalities. We validate the performance of our approach on three datasets with very different characteristics to highlight its generalisability, demonstrating the advantages of our proposed fusion framework. While each individual registration algorithm eventually fails drastically on certain surgical scenes, the fusion approach flexibly determines which algorithms to use and in which proportion, obtaining consistent mosaics more robustly.
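The paper formulates the fusion as affine pose graph optimisation; the snippet below illustrates only its simplest ingredient, inverse-covariance (information) weighting of competing motion estimates. The 2-D translations and covariances are made-up placeholders, not values from the paper.

```python
import numpy as np

def fuse_estimates(estimates, covariances):
    """Information-weighted (inverse-covariance) fusion of motion estimates."""
    dim = len(estimates[0])
    info_total = np.zeros((dim, dim))
    weighted_sum = np.zeros(dim)
    for x, P in zip(estimates, covariances):
        info = np.linalg.inv(P)            # higher confidence = more weight
        info_total += info
        weighted_sum += info @ np.asarray(x, dtype=float)
    fused_cov = np.linalg.inv(info_total)
    return fused_cov @ weighted_sum, fused_cov

# Hypothetical example: a precise optical-flow estimate and a noisier
# feature-based estimate of the same 2-D translation.
fused_t, fused_cov = fuse_estimates(
    [np.array([4.8, -1.2]), np.array([5.5, -0.6])],
    [np.diag([0.2, 0.2]), np.diag([1.5, 1.5])])
```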
... As stated in [63], since pose and acquisition systems vary, the set of possible observations of a scene is immense. Therefore, the challenge of determining the correspondences between observed images becomes highly complex. ...
Article
Image mosaicing is currently one of the most important research topics in computer vision. It requires the integration of direct techniques and feature-based techniques: direct techniques are well suited to mosaicing large overlapping regions with small translations and rotations, while feature-based techniques are useful for small overlapping regions. Feature-based image mosaicing combines corner detection, corner matching, motion parameter estimation and image stitching. Image mosaicing can also be viewed as the process of obtaining a wider field-of-view of a scene from a sequence of partial views, and it has been an attractive research area because of its wide range of applications, including motion detection, resolution enhancement, monitoring global land usage, and medical imaging. Numerous algorithms for image mosaicing have been proposed over the last two decades. In this paper the authors review the different approaches to image mosaicing and the literature of the past few years on mosaicing methodologies, providing an in-depth survey of the existing algorithms by classifying them into several groups. For each group, the fundamental concepts are first clearly explained. Finally, the paper reviews and discusses the strengths and weaknesses of all the mosaicing groups.
Article
The repeatability and efficiency of a corner detector determines how likely it is to be useful in a real-world application. The repeatability is important because the same scene viewed from different positions should yield features which correspond to the same real-world 3D locations. The efficiency is important because this determines whether the detector combined with further processing can operate at frame rate. Three advances are described in this paper. First, we present a new heuristic for feature detection and, using machine learning, we derive a feature detector from this which can fully process live PAL video using less than 5 percent of the available processing time. By comparison, most other detectors cannot even operate at frame rate (Harris detector 115 percent, SIFT 195 percent). Second, we generalize the detector, allowing it to be optimized for repeatability, with little loss of efficiency. Third, we carry out a rigorous comparison of corner detectors based on the above repeatability criterion applied to 3D scenes. We show that, despite being principally constructed for speed, on these stringent tests, our heuristic detector significantly outperforms existing feature detectors. Finally, the comparison demonstrates that using machine learning produces significant improvements in repeatability, yielding a detector that is both very fast and of very high quality.
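For reference, the segment test at the heart of FAST can be sketched as follows: a pixel is accepted if n contiguous pixels on the radius-3 Bresenham circle are all brighter than the centre plus a threshold t, or all darker than the centre minus t. The brute-force loop below only illustrates the criterion; the detector described above is derived with machine learning and is far faster.

```python
import numpy as np

# The 16 offsets of the radius-3 Bresenham circle used by FAST.
CIRCLE = [(0, 3), (1, 3), (2, 2), (3, 1), (3, 0), (3, -1), (2, -2), (1, -3),
          (0, -3), (-1, -3), (-2, -2), (-3, -1), (-3, 0), (-3, 1), (-2, 2), (-1, 3)]

def is_fast_corner(img, y, x, t=20, n=12):
    """Brute-force segment test at pixel (y, x) of a grayscale image."""
    centre = int(img[y, x])
    ring = np.array([int(img[y + dy, x + dx]) for dy, dx in CIRCLE])
    for flags in (ring > centre + t, ring < centre - t):
        # Duplicate the ring so contiguous runs may wrap around the circle.
        run = 0
        for f in np.concatenate([flags, flags]):
            run = run + 1 if f else 0
            if run >= n:
                return True
    return False
```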
Conference Paper
Image stitching is used to combine several individual images having some overlap into a composite image. The quality of image stitching is measured by the similarity of the stitched image to each of the input images, and by the visibility of the seam between the stitched images. In order to define and get the best possible stitching, we introduce several formal cost functions for the evaluation of the quality of stitching. In these cost functions, the similarity to the input images and the visibility of the seam are defined in the gradient domain, minimizing the disturbing edges along the seam. A good image stitching will optimize these cost functions, overcoming both photometric inconsistencies and geometric misalignments between the stitched images. This approach is demonstrated in the generation of panoramic images and in object blending. Comparisons with existing methods show the benefits of optimizing the measures in the gradient domain.
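A rough sketch of evaluating a stitch in the gradient domain, in the spirit of the cost functions described above: compare the gradients of the stitched result against the gradients of whichever input image each pixel was taken from. The exact cost functions are defined in the paper; the L1 formulation and the mask convention below are simplifying assumptions.

```python
import numpy as np

def gradient_cost(stitched, img_a, img_b, mask_a):
    """L1 gradient-domain dissimilarity; mask_a is True where pixels come from img_a."""
    def grads(im):
        gy, gx = np.gradient(im.astype(np.float64))
        return gx, gy

    sx, sy = grads(stitched)
    ax, ay = grads(img_a)
    bx, by = grads(img_b)
    # Target gradients: image A's inside its region, image B's elsewhere.
    tx = np.where(mask_a, ax, bx)
    ty = np.where(mask_a, ay, by)
    return np.abs(sx - tx).sum() + np.abs(sy - ty).sum()
```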
Article
We describe the construction of accurate panoramic mosaics from multiple images taken with a rotating camera, or alternatively of a planar scene. The novelty of the approach lies in (i) the transfer of photogrammetric bundle adjustment techniques to mosaicing; (ii) a new representation of image line measurements enabling the use of lines in camera self-calibration, including computation of the radial and other non-linear distortion; and (iii) the application of the variable state dimension filter to obtain efficient sequential updates of the mosaic as each image is added. We demonstrate that our method achieves better results than the alternative approach of optimising over pairs of images.
Conference Paper
The problem considered in this paper is the fully automatic construction of panoramas. Fundamentally, this problem requires recognition, as we need to know which parts of the panorama join up. Previous approaches have used human input or restrictions on the image sequence for the matching step. In this work we use object recognition techniques based on invariant local features to select matching images, and a probabilistic model for verification. Because of this our method is insensitive to the ordering, orientation, scale and illumination of the images. It is also insensitive to 'noise' images which are not part of the panorama at all, that is, it recognises panoramas. This suggests a useful application for photographers: the system takes as input the images on an entire flash card or film, recognises images that form part of a panorama, and stitches them with no user input whatsoever.
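The recognition step can be pictured as building a match graph: images are nodes, geometrically verified pairwise matches are edges, and each connected component is one panorama. The sketch below assumes a pairwise matcher (for example the ORB/RANSAC sketch earlier) that returns a homography and an inlier mask; the inlier threshold is an arbitrary illustrative value.

```python
import itertools
import networkx as nx

def group_into_panoramas(images, estimate_homography, min_inliers=30):
    """Return lists of image indices, one list per recognised panorama."""
    G = nx.Graph()
    G.add_nodes_from(range(len(images)))
    for i, j in itertools.combinations(range(len(images)), 2):
        H, inliers = estimate_homography(images[i], images[j])
        if H is not None and inliers is not None and int(inliers.sum()) >= min_inliers:
            G.add_edge(i, j)
    # Single-image components are 'noise' images that belong to no panorama.
    return [sorted(c) for c in nx.connected_components(G) if len(c) > 1]
```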
Article
This paper presents a complete system for constructing panoramic image mosaics from sequences of images. Our mosaic representation associates a transformation matrix with each input image, rather than explicitly projecting all of the images onto a common surface (e.g., a cylinder). In particular, to construct a full view panorama, we introduce a rotational mosaic representation that associates a rotation matrix (and optionally a focal length) with each input image. A patch-based alignment algorithm is developed to quickly align two images given motion models. Techniques for estimating and refining camera focal lengths are also presented. In order to reduce accumulated registration errors, we apply global alignment (block adjustment) to the whole sequence of images, which results in an optimally registered image mosaic. To compensate for small amounts of motion parallax introduced by translations of the camera and other unmodeled distortions, we use a local alignment (deghosting) technique which warps each image based on the results of pairwise local image registrations. By combining both global and local alignment, we significantly improve the quality of our image mosaics, thereby enabling the creation of full view panoramic mosaics with hand-held cameras. We also present an inverse texture mapping algorithm for efficiently extracting environment maps from our panoramic image mosaics. By mapping the mosaic onto an arbitrary texture-mapped polyhedron surrounding the origin, we can explore the virtual environment using standard 3D graphics viewers and hardware without requiring special-purpose players.
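The rotational representation described above relates two views taken from the same centre by the homography H = K2 R K1^{-1}, where K holds the focal length and R is the relative rotation. A minimal numerical sketch, with placeholder focal length and rotation values:

```python
import numpy as np

def rotation_homography(f1, f2, R, cx=0.0, cy=0.0):
    """Homography induced by a pure rotation R between views with focal lengths f1, f2."""
    K1 = np.array([[f1, 0.0, cx], [0.0, f1, cy], [0.0, 0.0, 1.0]])
    K2 = np.array([[f2, 0.0, cx], [0.0, f2, cy], [0.0, 0.0, 1.0]])
    return K2 @ R @ np.linalg.inv(K1)

# Example: a 10-degree pan about the vertical axis, same focal length in both views.
theta = np.deg2rad(10.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
H = rotation_homography(700.0, 700.0, R)
```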
Article
This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
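A minimal sketch of the nearest-neighbour matching stage, including Lowe's distance-ratio check for discarding ambiguous matches; the 0.75 ratio is the commonly used value rather than one mandated by any particular system:

```python
import cv2

def match_sift(img_a, img_b, ratio=0.75):
    """Return SIFT keypoints of both images and the ratio-test-filtered matches."""
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(img_a, None)
    kp_b, des_b = sift.detectAndCompute(img_b, None)

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = []
    for pair in matcher.knnMatch(des_a, des_b, k=2):
        if len(pair) < 2:
            continue
        m, n = pair
        if m.distance < ratio * n.distance:   # keep only clearly-best matches
            good.append(m)
    return kp_a, kp_b, good
```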
Article
After [15], [31], [19], [8], [25], [5], minimum cut/maximum flow algorithms on graphs emerged as an increasingly useful tool for exact or approximate energy minimization in low-level vision. The combinatorial optimization literature provides many min-cut/max-flow algorithms with different polynomial time complexity. Their practical efficiency, however, has to date been studied mainly outside the scope of computer vision. The goal of this paper is to provide an experimental comparison of the efficiency of min-cut/max flow algorithms for applications in vision. We compare the running times of several standard algorithms, as well as a new algorithm that we have recently developed. The algorithms we study include both Goldberg-Tarjan style "push-relabel" methods and algorithms based on Ford-Fulkerson style "augmenting paths." We benchmark these algorithms on a number of typical graphs in the contexts of image restoration, stereo, and segmentation. In many cases, our new algorithm works several times faster than any of the other methods, making near real-time performance possible. An implementation of our max-flow/min-cut algorithm is available upon request for research purposes.
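networkx ships augmenting-path, push-relabel, and Boykov-Kolmogorov style max-flow solvers, so the kind of comparison described above can be sketched on a toy graph; the graph below is a placeholder, not one of the paper's vision benchmarks.

```python
import networkx as nx
from networkx.algorithms.flow import boykov_kolmogorov, edmonds_karp, preflow_push

G = nx.DiGraph()
G.add_edge("s", "a", capacity=3.0)
G.add_edge("s", "b", capacity=2.0)
G.add_edge("a", "b", capacity=1.0)
G.add_edge("a", "t", capacity=2.0)
G.add_edge("b", "t", capacity=3.0)

for solver in (edmonds_karp, preflow_push, boykov_kolmogorov):
    residual = solver(G, "s", "t")
    print(solver.__name__, residual.graph["flow_value"])   # 5.0 for each solver
```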
Article
Many tasks in computer vision involve assigning a label (such as disparity) to every pixel. A common constraint is that the labels should vary smoothly almost everywhere while preserving sharp discontinuities that may exist, e.g., at object boundaries. These tasks are naturally stated in terms of energy minimization. The authors consider a wide class of energies with various smoothness constraints. Global minimization of these energy functions is NP-hard even in the simplest discontinuity-preserving case. Therefore, our focus is on efficient approximation algorithms. We present two algorithms based on graph cuts that efficiently find a local minimum with respect to two types of large moves, namely expansion moves and swap moves. These moves can simultaneously change the labels of arbitrarily large sets of pixels. In contrast, many standard algorithms (including simulated annealing) use small moves where only one pixel changes its label at a time. Our expansion algorithm finds a labeling within a known factor of the global minimum, while our swap algorithm handles more general energy functions. Both of these algorithms allow important cases of discontinuity preserving energies. We experimentally demonstrate the effectiveness of our approach for image restoration, stereo and motion. On real data with ground truth, we achieve 98 percent accuracy.
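The energies in question have the form E(f) = sum_p D_p(f_p) + sum_{(p,q) neighbours} V(f_p, f_q) over pixels p and neighbouring pairs (p,q). The sketch below only evaluates such an energy with a simple Potts smoothness term; minimising it is what the expansion and swap moves described above do.

```python
import numpy as np

def labeling_energy(labels, data_cost, smooth_weight=1.0):
    """labels: HxW int array; data_cost: HxWxL array of per-label costs."""
    h, w = labels.shape
    yy, xx = np.mgrid[0:h, 0:w]
    data_term = data_cost[yy, xx, labels].sum()

    # Potts smoothness: a fixed penalty wherever neighbouring labels disagree.
    horiz = (labels[:, 1:] != labels[:, :-1]).sum()
    vert = (labels[1:, :] != labels[:-1, :]).sum()
    return data_term + smooth_weight * (horiz + vert)
```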
Article
We describe mosaicing for a sequence of images acquired by a camera rotating about its centre. The novel contributions are in two areas. First, in the automation and estimation of image registration: images (60+) are registered under a full (8 degrees of freedom) homography; the registration is automatic and robust, and a maximum likelihood estimator is used. In particular the registration is consistent so that there are no accumulated errors over a sequence. This means that it is not a problem if the sequence loops back on itself. The second novel area is in enhanced resolution. A region of the mosaic can be viewed at a resolution higher than any of the original frames. It is shown that the degree of resolution enhancement is determined by a measure based on a matrix norm. A maximum likelihood solution is given, which also takes account of the errors in the estimated homographies. An improved MAP estimator is also developed. Results of both MLE and MAP estimation are included for se...