Article

Abstract

Navigation during Minimally Invasive Surgery (MIS) faces recognized difficulties due to the limited field-of-view, off-axis visualization and loss of direct 3D vision. This can cause visual-spatial disorientation when exploring complex in vivo structures. In this paper, we present an approach to dynamic view expansion that builds a 3D textured model of the MIS environment to facilitate in vivo navigation. With the proposed technique, no prior knowledge of the environment is required, and the model is built sequentially as the laparoscope moves. The method is validated on simulated data with known ground truth, and its potential clinical value is demonstrated with in vivo experiments.
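Building a textured 3D model while the laparoscope moves rests on one geometric step: lifting each pixel with a known depth into a common world frame using the current camera pose. The sketch below assumes an ideal pinhole camera with intrinsics fx, fy, cx, cy and a known camera-to-world pose (R_wc, t_wc). The function name and parameters are illustrative only; in practice the pose has to be estimated (for example by SLAM) rather than given.

```python
import numpy as np

def backproject(u, v, depth, fx, fy, cx, cy, R_wc, t_wc):
    """Map pixel (u, v) with depth along the optical axis into world coordinates.

    Assumes a pinhole camera with intrinsics (fx, fy, cx, cy) and a
    camera-to-world pose given by rotation R_wc (3x3) and translation t_wc (3,).
    """
    # Ray through the pixel, scaled so that its z-component equals the depth.
    p_cam = np.array([(u - cx) / fx * depth,
                      (v - cy) / fy * depth,
                      depth])
    # Transform the camera-frame point into the shared world frame.
    return R_wc @ p_cam + t_wc
```

Repeating this for every pixel of every frame, and attaching each pixel's colour, yields a growing textured point set in one frame of reference.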
... However, surgeons suffer from a small field of view because the procedures are performed in a narrow space with elongated tools and without direct 3D vision. Hence, MIS poses more difficulties to surgeons than open surgery [1]. To overcome this challenge, stereoscopes are integrated into the operative imaging system to provide stereo 3D images instead of single 2D images. ...
... Based on the probability within the window, a MAP-based PDF estimation is proposed to quantify the pixel-wise variance. Finally, extensive in vivo/ex vivo real-world experiments with reference depth and several ablation studies are conducted to validate the performance of the proposed method. ...
Preprint
In stereoscope-based Minimally Invasive Surgery (MIS), dense stereo matching plays an indispensable role in 3D shape recovery, AR, VR, and navigation tasks. Although numerous Deep Neural Network (DNN) approaches have been proposed, conventional prior-free approaches remain popular in industry because of the lack of open-source annotated datasets and the limitations of task-specific pre-trained DNNs. Among prior-free stereo matching algorithms, none achieves real-time performance for MIS in a non-GPU environment. This paper proposes the first CPU-level real-time prior-free stereo matching algorithm for general MIS tasks. It achieves an average of 17 Hz on 640×480 surgical images with a single CPU core (i5-9400), with slightly better accuracy than the popular ELAS. A patch-based fast disparity searching algorithm is adopted for the rectified stereo images. A coarse-to-fine Bayesian probability and a spatial Gaussian mixture model are proposed to evaluate the patch probability at different scales. An optional probability density function estimation algorithm is adopted to quantify the prediction variance. Extensive experiments demonstrate the proposed method's capability to handle the ambiguities introduced by textureless surfaces and the photometric inconsistency caused by non-Lambertian reflectance and dark illumination. The estimated probability balances the confidences of patches at different scales of the stereo images. The method achieves similar or higher accuracy and fewer outliers than the baseline ELAS in MIS, while being 4 to 5 times faster. The code and the synthetic data sets are available at https://github.com/JingweiSong/BDIS-v2.
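For rectified stereo images, a patch's match lies on the same row of the other image, so disparity search reduces to a 1D scan. The sketch below is a plain exhaustive sum-of-absolute-differences (SAD) search that turns costs into a normalized probability over candidate disparities with a softmax. It is not the BDIS algorithm (which uses a fast DIS-style search and Bayesian, multi-scale probabilities); the function name and the temperature `tau` are assumptions made for illustration.

```python
import numpy as np

def block_match_row(left, right, row, col, half=2, max_disp=16, tau=1.0):
    """Winner-take-all disparity and a probability over disparities for one patch."""
    # Reference patch centred on (row, col) in the left image.
    patch = left[row - half:row + half + 1, col - half:col + half + 1].astype(float)
    disps, costs = [], []
    for d in range(max_disp + 1):
        c = col - d  # rectified: the match lies on the same row, shifted left
        if c - half < 0:
            break  # candidate window would leave the image
        cand = right[row - half:row + half + 1, c - half:c + half + 1].astype(float)
        disps.append(d)
        costs.append(np.abs(patch - cand).sum())  # sum of absolute differences
    disps = np.array(disps)
    costs = np.array(costs)
    # Softmax over negative costs; subtracting the minimum avoids overflow.
    weights = np.exp(-(costs - costs.min()) / tau)
    probs = weights / weights.sum()
    best = int(disps[np.argmin(costs)])
    return best, disps, probs
```

A flat probability distribution signals an ambiguous patch (for example on textureless tissue), which is the kind of uncertainty the abstract describes quantifying.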
... Researchers abroad have carried out a number of studies on stereoscopic vision. In 2009, Mountney et al. [27] integrated stereoscopic vision into a SLAM framework to expand the endoscopic view. Later, Stoyanov et al. [1] sparsely reconstructed the heart surface through Shi-Tomasi corner matching and obtained a semi-dense 3D reconstruction through structure propagation. ...
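Shi-Tomasi corners are pixels where the smaller eigenvalue of the local structure tensor is large, meaning the intensity changes strongly in every direction. The sketch below computes that score with NumPy only, using a simple box window; the function name and window size are illustrative, and it covers only the corner score, not the matching or propagation steps of the cited pipeline.

```python
import numpy as np

def shi_tomasi_response(img, win=3):
    """Minimum eigenvalue of the windowed structure tensor at every pixel."""
    img = img.astype(float)
    Iy, Ix = np.gradient(img)  # derivatives along rows (y) and columns (x)
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy

    def box_sum(a):
        # Sum over a (2*win+1) x (2*win+1) window, zero-padded at the borders.
        k = 2 * win + 1
        p = np.pad(a, win)
        out = np.zeros_like(a)
        for dy in range(k):
            for dx in range(k):
                out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
        return out

    a, b, c = box_sum(Ixx), box_sum(Ixy), box_sum(Iyy)
    # Closed-form smaller eigenvalue of the 2x2 matrix [[a, b], [b, c]].
    return (a + c) / 2 - np.sqrt(((a - c) / 2) ** 2 + b ** 2)
```

On a straight edge one eigenvalue is zero, so the score vanishes; only genuine corners score highly, which is why these points match reliably between stereo views.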
Article
With the development of medical endoscopic equipment, minimally invasive surgery (MIS) has gradually become an essential technique in daily medical practice. In recent years, MIS has been widely used because of its small incisions and quick recovery. At the same time, however, MIS places higher demands on the operator. A 3D reconstruction framework combining stereo vision and Shape from Shading (SFS) is proposed to improve endoscopic imaging accuracy and reduce the difficulty of minimally invasive surgery. This paper constructs a joint objective function based on an improved SFS and the classical stereo matching method, and gives an algorithm for optimizing the depth map under this joint objective function. Finally, the joint reconstruction algorithm is verified experimentally. The effectiveness of the joint reconstruction framework is verified by qualitative and quantitative comparison and analysis on a silica-gel heart model and real heart image datasets. The experimental results show that the joint reconstruction framework can restore the overall heart surface shape while retaining local details. Compared with classical stereo vision and SFS methods, the proposed joint reconstruction method has better robustness and reconstruction accuracy.
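The abstract does not give its joint objective, so the following is only a generic illustration of the idea of fusing two cues: stereo supplies absolute depth d, SFS supplies surface gradients g, and a depth profile z is found by least squares on E(z) = λ Σ (z_i − d_i)² + Σ (z_{i+1} − z_i − g_i)². The 1D setting, the weight `lam`, and the function name are all assumptions, not the paper's formulation.

```python
import numpy as np

def fuse_depth_and_gradients(d_stereo, g_sfs, lam=0.1):
    """Least-squares 1D depth that fits stereo depths and SFS-style gradients."""
    d = np.asarray(d_stereo, float)
    g = np.asarray(g_sfs, float)
    n = len(d)
    # Forward-difference operator: (D z)_i = z_{i+1} - z_i.
    D = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0
    # Stack the weighted data term and the gradient term into one system.
    A = np.vstack([np.sqrt(lam) * np.eye(n), D])
    b = np.concatenate([np.sqrt(lam) * d, g])
    z, *_ = np.linalg.lstsq(A, b, rcond=None)
    return z
```

A small `lam` lets the gradient cue shape local detail while stereo anchors the absolute depth, which mirrors the abstract's claim of recovering global shape while retaining local detail.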
... The authors of [7,8] used simultaneous localization and mapping to provide an expanded surgical view, building a 3D model of the surgical site as the laparoscope was navigated. ...
Article
Minimally invasive surgery is widely used because of its tremendous benefits to the patient. However, surgeons face several challenges in this type of surgery, the most important of which is the narrow field of view. Therefore, we propose an approach to expand the field of view in minimally invasive surgery to enhance surgeons' experience. It combines multiple views in real time to produce a dynamic expanded view. The proposed approach extends monocular ORB-SLAM (Simultaneous Localization And Mapping with Oriented FAST, i.e. Features from Accelerated Segment Test, and Rotated BRIEF, i.e. Binary Robust Independent Elementary Features) to work with a multi-camera setup. ORB-SLAM's three parallel threads, namely tracking, mapping and loop closing, run for each camera, and new threads are added to calculate the relative camera poses and to construct the expanded view. A new algorithm for estimating the optimal inter-camera correspondence matrix from a set of corresponding 3D map points is presented. This optimal transformation is then used to produce the final view. The proposed approach was evaluated using both human models and in vivo data. The evaluation results show that the proposed correspondence matrix estimation algorithm reduces the error and produces an accurate transformation. The results also show that the proposed approach can produce an expanded view when other approaches fail. In this work, a real-time dynamic field-of-view expansion approach is proposed that works in all situations regardless of the overlap between images. It outperforms previous approaches and runs at 21 fps.
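A common baseline for relating two cameras from corresponding 3D map points is the SVD-based Kabsch solution for the least-squares rigid transform. The sketch below shows that standard method, not the paper's own correspondence matrix algorithm. It assumes both point sets share the same scale; monocular ORB-SLAM maps have arbitrary scale, so a similarity (Umeyama) variant with a scale factor may be needed in practice.

```python
import numpy as np

def rigid_transform_3d(P, Q):
    """Least-squares R, t with Q[i] ≈ R @ P[i] + t (Kabsch algorithm)."""
    P = np.asarray(P, float)
    Q = np.asarray(Q, float)
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    # Cross-covariance of the centred point sets.
    H = (P - p_mean).T @ (Q - q_mean)
    U, S, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    t = q_mean - R @ p_mean
    return R, t
```

With noisy or mismatched map points, this estimate is typically wrapped in an outlier-rejection loop such as RANSAC.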
... For an active or frequently moving laparoscopic camera, structure-from-motion algorithms are suitable for generating 3D surface models. Methods that rely only on one camera and its motion must incorporate dynamic view expansion (DVE) to construct and expand dense 3D models because of the limited field-of-view (FOV) of laparoscopic cameras [143,222]. While promising, this approach strongly constrains scene motion to ensure accurate frame matching, which is not amenable to the highly dynamic nature of real-world, online surgical scenes. ...
... Initially, researchers treated mapping and localization independently; later, it was accepted that the two are closely interdependent: precise localization in an environment requires an accurate map, and, conversely, precise mapping requires accurate localization. This property of SLAM has made it suitable for state estimation problems in domains ranging from virtual and augmented reality [1] to autonomous ground vehicles [2], medical appliances [3], and flying robots [4]. The field has also reached a certain level of maturity in many commercial products due to different ... (1545-5955 © 2020 IEEE) ...
Article
Simultaneous localization and mapping (SLAM) concurrently estimates the motion of an agent and a map of its environment. SLAM has recently made rapid and exciting progress and is used in fields such as unmanned aerial vehicles (UAVs), medical surgery, and endoscopic procedures. The aim of this article is to devise a more accurate physiotherapy exercise monitoring device based on an analysis of eight different SLAM algorithms, with criteria including power and memory consumption, CPU heat, and CPU utilization. This article provides a comprehensive evaluation on an embedded platform and is the first of its kind, since the ego-motion estimation of SLAM systems has not previously been evaluated this explicitly. Based on the results of this analysis, we propose a stereo visual-inertial tracking (S-VIT) method for lower-limb tracking in physiotherapy applications. Our proposed algorithm significantly improves results compared with state-of-the-art algorithms. Datasets of various leg rehabilitation exercises were also collected for detailed validation, with ground truth acquired by a state-of-the-art motion tracking system, Vicon.
... Structure-from-motion (SfM) [27] or simultaneous localization and mapping (SLAM) [28] [29] [30] methods are able to align video frames at different time steps and generate a much larger synthetic field of view, and have been employed for 3D reconstruction of tissues. For example, Mountney et al. [31] proposed to expand the field of view based on SLAM. Most SfM and SLAM methods only reconstruct sparse feature points, which poorly describe the surgical scene. ...
Preprint
We propose an approach to reconstruct a dense three-dimensional (3D) model of the tissue surface from stereo optical videos in real time. The basic idea is to first extract 3D information from video frames using stereo matching, and then to mosaic the reconstructed 3D models. To handle the low-texture regions common on tissue surfaces, we propose effective post-processing steps for the local stereo matching method that enlarge the radius of constraint, including outlier removal, hole filling and smoothing. Since the tissue models obtained by stereo matching are limited to the field of view of the imaging modality, we propose a model mosaicking method that uses a novel feature-based simultaneous localization and mapping (SLAM) method to align the models. Low-texture regions and varying illumination conditions may lead to a large percentage of feature matching outliers. To solve this problem, we propose several algorithms to improve the robustness of SLAM, mainly (1) a histogram voting-based method to roughly select possible inliers from the feature matching results, (2) a novel 1-point RANSAC-based PnP algorithm, called DynamicR1PPnP, to track the camera motion, and (3) a GPU-based iterative closest point (ICP) and bundle adjustment (BA) method to refine the camera motion estimates. Experimental results on ex vivo and in vivo data show that the reconstructed 3D models have high-resolution texture with an accuracy error of less than 2 mm. Most algorithms are highly parallelized for GPU computation, and the average runtime for processing one keyframe is 76.3 ms on stereo images with 960×540 resolution.
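The abstract names a histogram voting step for rough inlier selection without detailing it. One simple way to realize the idea, assumed here purely for illustration, is to bin the 2D displacement of every feature match and keep only matches in the most populated bin, since correct matches under modest camera motion tend to move coherently. The function name and `bin_size` are hypothetical, and the authors' method may differ.

```python
import numpy as np

def histogram_vote_inliers(pts_a, pts_b, bin_size=10.0):
    """Boolean mask of matches whose displacement falls in the dominant 2D bin."""
    disp = np.asarray(pts_b, float) - np.asarray(pts_a, float)
    # Quantize each (dx, dy) displacement into an integer 2D bin index.
    bins = np.floor(disp / bin_size).astype(int)
    # Count matches per bin; `inverse` maps each match to its bin.
    _, inverse, counts = np.unique(bins, axis=0, return_inverse=True, return_counts=True)
    best = np.argmax(counts)
    return inverse.ravel() == best
```

Because a tight cluster can straddle a bin boundary, a practical version might also accept neighbouring bins; the surviving matches would then go to a robust estimator such as RANSAC.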
... For an active or frequently moving laparoscopic camera, structure-from-motion algorithms are suitable for generating 3D surface models. Methods that rely only on one camera and its motion must incorporate dynamic view expansion (DVE) to construct and expand dense 3D models because of the limited field-of-view (FOV) of laparoscopic cameras [7], [8]. While promising, this approach strongly constrains scene motion to ensure accurate frame matching, which is not amenable to the highly dynamic nature of real-world, online surgical scenes. ...
... A SLAM algorithm explores an unknown environment to build or update its map and to determine the position of the robot on the map based on a series of actions and observations [3]. SLAM solutions have many applications, such as autonomous vehicles [4,5], minimally invasive surgery [6,7] and harvesting [8,9]. SLAM is a complex problem because of the strict requirements on mobile robots, especially related to robustness, computational efficiency and algorithm accuracy [10]. ...
Article
The purpose of this paper is to present an approach to 2D localization and real-time mapping for robot applications that combines the Particle Filter algorithm, the Extended Kalman Filter (EKF), and Iterative Closest Point (ICP). A loop-closing method is added, and the approach yields satisfactory 2D mapping and localization results. We tested our approach in large multi-floor buildings, using a two-wheeled differential-drive robot equipped with an optical encoder, a laser scanner and a gyroscope. Test results show that an accurate map of large high-rise buildings can be produced, and real-time mapping can reach a resolution of 5 cm. Automatic localization of mobile robots in unknown environments is one of the most fundamental problems in robot navigation. It is a complex problem due to the stringent requirements on mobile robots, especially those related to accuracy, robustness, and computational efficiency. The conclusions from this study can help in developing real-time 2D mapping for robot applications that process 2D point clouds directly.
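The particle filter component can be illustrated by its two core steps: reweighting particles by how well they explain a measurement, and resampling in proportion to those weights. The sketch below assumes a single range measurement to a known 2D landmark with Gaussian noise, which is a deliberate simplification of the laser-scan model in the paper; the function names are hypothetical. The EKF and ICP parts are not shown.

```python
import numpy as np

def update_weights(particles, weights, z, landmark, sigma):
    """Reweight 2D particle positions by a Gaussian range likelihood."""
    expected = np.linalg.norm(particles - landmark, axis=1)  # predicted ranges
    w = weights * np.exp(-0.5 * ((z - expected) / sigma) ** 2)
    return w / w.sum()

def systematic_resample(weights, rng):
    """Particle indices drawn by systematic (low-variance) resampling."""
    n = len(weights)
    # One random offset, then n evenly spaced positions in [0, 1).
    positions = (rng.random() + np.arange(n)) / n
    cumsum = np.cumsum(weights)
    cumsum[-1] = 1.0  # guard against floating-point round-off
    return np.searchsorted(cumsum, positions, side="right")
```

Systematic resampling uses a single random number per step, which keeps its variance low and makes it a common default choice.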
Article
Objective: 3D reconstruction of the shape and texture of hollow organs captured by endoscopy is important for the diagnosis and surveillance of early and recurrent cancers. Better evaluation of the 3D reconstruction pipelines developed for such applications requires easy access to extensive datasets with associated ground truth, cost-efficient and scalable simulation of a range of possible clinical scenarios, and more reliable and insightful performance metrics. Methods: We present a computer-aided simulation platform for the cost-effective synthesis of monocular endoscope videos and corresponding ground truth that mimic a range of settings and situations one might encounter when acquiring clinical endoscopy videos. Using bladder cystoscopy as a model case, we generated an extensive dataset comprising several synthesized videos of a bladder phantom. We then introduce a novel evaluation procedure to reliably assess an individual 3D reconstruction pipeline or to compare different pipelines. Results: To illustrate the use of the proposed platform and evaluation procedure, we used this dataset and ground truth to evaluate a proprietary 3D reconstruction pipeline for bladder cystoscopy videos (CYSTO3D) and compared it with a general-purpose 3D reconstruction pipeline (COLMAP). The evaluation results provide insight into the suggested clinical acquisition protocol and into several areas where the pipeline could be refined to improve future performance. Conclusion: Our work proposes a toolset for endoscope video synthesis and reconstruction evaluation, and presents experimental results that illustrate how the toolset can be used to efficiently assess the performance and reveal possible problems of any given 3D reconstruction pipeline, to compare different pipelines, and to provide technically or clinically actionable insights.
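Comparing a reconstruction against synthetic ground truth usually starts from nearest-neighbour distances in both directions. The sketch below computes widely used accuracy and completeness measures, together with threshold-based precision and recall, by brute force. These are generic metrics, not the novel evaluation procedure of the paper. They assume both point clouds are already in the same frame and scale; monocular reconstructions must first be aligned.

```python
import numpy as np

def nearest_distances(src, dst):
    """Distance from each point in src to its nearest neighbour in dst (brute force)."""
    diff = src[:, None, :] - dst[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)

def accuracy_completeness(recon, gt, threshold):
    recon = np.asarray(recon, float)
    gt = np.asarray(gt, float)
    acc = nearest_distances(recon, gt)   # reconstruction -> ground truth
    comp = nearest_distances(gt, recon)  # ground truth -> reconstruction
    return {
        "mean_accuracy": acc.mean(),
        "mean_completeness": comp.mean(),
        "precision": (acc < threshold).mean(),  # share of accurate recon points
        "recall": (comp < threshold).mean(),    # share of ground truth covered
    }
```

Reporting both directions matters: a reconstruction covering only half of the bladder wall can still be perfectly accurate where it exists.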
Article
The ability to extend the field of view of laparoscopy images can help surgeons obtain a better understanding of the anatomical context. However, due to tissue deformation, complex camera motion and the pronounced three-dimensional (3D) structure of the anatomical surface, image pixels may undergo non-rigid deformation, and traditional mosaicking methods cannot work robustly on laparoscopy images in real time. To solve this problem, this paper proposes a novel two-dimensional (2D) non-rigid simultaneous localization and mapping (SLAM) system that can compensate for pixel deformation and perform image mosaicking in real time. The key algorithm of this 2D non-rigid SLAM system is the expectation maximization and dual quaternion (EMDQ) algorithm, which generates a smooth and dense deformation field from sparse and noisy image feature matches in real time. An uncertainty-based loop closing method is proposed to reduce accumulated errors. To achieve real-time performance, both CPU and GPU parallel computation are used for dense mosaicking of all pixels. Experimental results on in vivo and synthetic data demonstrate the feasibility and accuracy of our non-rigid mosaicking method.
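The central idea of turning sparse feature matches into a dense deformation field can be illustrated with Gaussian-weighted interpolation of match displacements. This is a deliberately simpler stand-in for EMDQ: it performs no expectation-maximization outlier rejection and no dual-quaternion blending of local rigid motions. The function name and `sigma` are assumptions.

```python
import numpy as np

def dense_displacement(src_pts, dst_pts, grid_h, grid_w, sigma=20.0):
    """Interpolate sparse (x, y) match displacements to every pixel of a grid."""
    src = np.asarray(src_pts, float)
    disp = np.asarray(dst_pts, float) - src
    ys, xs = np.mgrid[0:grid_h, 0:grid_w]
    pix = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)  # (x, y) order
    # Gaussian weight of every match for every pixel, normalized per pixel.
    d2 = ((pix[:, None, :] - src[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True) + 1e-12
    field = w @ disp  # weighted average displacement per pixel
    return field.reshape(grid_h, grid_w, 2)
```

Because every pixel averages nearby matches, a single wrong match bends the whole neighbourhood, which is exactly why a robust scheme such as EMDQ is needed on noisy laparoscopic features.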
Article
We present an algorithm capable of building image mosaics with an enlarged field-of-view from the endoscopic video data stream in real time. The method is applied for the first time to neuroendoscopic color video. It is shown that correcting radial lens distortion leads to improved registration of the mosaic frames.
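Radial lens distortion is often modelled on normalized image coordinates (pixel coordinates minus the principal point, divided by the focal length) as x_d = x · (1 + k1·r² + k2·r⁴). The sketch below applies this two-coefficient model and inverts it by fixed-point iteration. The abstract does not state which distortion model the authors used, so the model, coefficients and function names here are assumptions.

```python
import numpy as np

def distort(xn, k1, k2):
    """Apply the two-coefficient radial model to normalized points of shape (N, 2)."""
    xn = np.asarray(xn, float)
    r2 = (xn ** 2).sum(axis=1, keepdims=True)
    return xn * (1 + k1 * r2 + k2 * r2 ** 2)

def undistort(xd, k1, k2, iters=20):
    """Invert the radial model by fixed-point iteration (fine for mild distortion)."""
    xd = np.asarray(xd, float)
    xn = xd.copy()
    for _ in range(iters):
        r2 = (xn ** 2).sum(axis=1, keepdims=True)
        xn = xd / (1 + k1 * r2 + k2 * r2 ** 2)
    return xn
```

Undistorting frames before registration makes straight structures stay straight across the field of view, so an affine or projective model between frames fits better.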
Article
Endoscope-based diagnosis and therapy of urinary bladder cancer are state of the art in urological medicine. Due to the limited field of view of endoscopes, the physician can examine only a small part of the whole operating field at once. This constraint makes visual control and navigation difficult, especially in hollow organs. A panoramic image covering a larger field of view can overcome this difficulty. Motivated directly by a physician, we developed an image mosaicing algorithm for endoscopic bladder fluorescence video sequences. In this paper, we present an approach capable of stitching single endoscopic video images into a combined panoramic image. Based on SIFT features, we estimate a 2-D homography for each image pair, using an affine model and an iterative model-fitting algorithm. We then stitch the images and blend them by mutual linear interpolation. Our panoramic image results show correct stitching and provide a better overview and understanding of the operation field.
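An affine model is the special case of a 2-D homography whose last row is [0, 0, 1], so six parameters can be fitted linearly from matched points. The sketch below shows the least-squares fit from already-matched coordinates; the SIFT matching and the iterative model fitting that rejects bad matches are not reproduced, and the function names are illustrative.

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares 2x3 affine A with dst ≈ A @ [x, y, 1] for matched points."""
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    X = np.hstack([src, np.ones((len(src), 1))])  # homogeneous coordinates (N, 3)
    A_T, *_ = np.linalg.lstsq(X, dst, rcond=None)  # solves X @ A_T ≈ dst, A_T is (3, 2)
    return A_T.T

def apply_affine(A, pts):
    """Map (N, 2) points with a 2x3 affine matrix."""
    pts = np.asarray(pts, float)
    return pts @ A[:, :2].T + A[:, 2]
```

At least three non-collinear matches are needed; with more, the least-squares solution averages out localization noise in the features.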
Conference Paper
We present an algorithm capable of building image mosaics with an enlarged field-of-view from the endoscopic video data stream in real time. The algorithm is based on the method of Kourogi et al. (1999), which we extend to the case of endoscopic masks. The algorithm automatically finds the optimal affine transform between video frames and builds the enlarged field-of-view as an intervention-free side task. We apply our algorithm to endoscopic video sequences and compare it to the well-known image-mosaicing algorithm of Szeliski (1994). Our method turns out to be more robust and more than 3 times faster, while having a 4 times smaller average motion estimation error: 0.19 pixel instead of 0.72 pixel between successive frames.
Conference Paper
This paper introduces a stereoscopic fibroscope imaging system for minimally invasive surgery (MIS) and examines the feasibility of utilizing images transmitted from the distal fibroscope tip to a proximally mounted CCD camera to recover both camera motion and 3D scene information. Fibre image guides facilitate instrument miniaturization and have the advantage of being more easily integrated with articulated robotic instruments. In this paper, twin 10,000-pixel coherent fibre bundles (590 μm diameter) have been integrated into a bespoke laparoscopic imaging instrument. Images captured by the system have been used to build a 3D map of the environment and reconstruct the laparoscope's 3D pose and motion using a SLAM algorithm. Detailed phantom validation of the system demonstrates its practical value and potential for flexible MIS instrument integration due to the small footprint and flexible nature of the fibre image guides.
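For a rectified stereo pair, depth follows from disparity as Z = f·B / d, where f is the focal length in pixels, B is the baseline and d is the disparity in pixels. The sketch below applies this relation. The focal length and baseline in the usage are made-up values, not parameters of the fibroscope system, and a disparity of zero is mapped to infinite depth.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_mm):
    """Depth (same unit as the baseline) from rectified-stereo disparity in pixels."""
    d = np.asarray(disparity, float)
    depth = np.full_like(d, np.inf)  # zero disparity means a point at infinity
    valid = d > 0
    depth[valid] = focal_px * baseline_mm / d[valid]
    return depth
```

Since |dZ/dd| = f·B / d², a one-pixel disparity error costs more depth accuracy when the image carries few pixels, which is relevant for images relayed through 10,000-pixel fibre bundles.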
Conference Paper
With the advancement of minimally invasive techniques for surgical and diagnostic procedures, there is a growing need for the development of methods for improved visualization of internal body structures. Video mosaicking is one method for doing this. This approach provides a broader field of view of the scene by stitching together images in a video sequence. Of particular importance is the need for online processing to provide real-time feedback and visualization for image-guided surgery and diagnosis. We propose a method for online video mosaicking applied to endoscopic imagery, with examples in microscopic retinal imaging and catadioptric endometrial imaging.
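Online mosaicking typically estimates a transform only between consecutive frames and chains these transforms to place each new frame in the coordinate system of a reference frame. The sketch below composes 3x3 homogeneous transforms incrementally. The convention (pairwise[k] maps frame k+1 into frame k) and the helper names are assumptions for illustration, not details from the paper.

```python
import numpy as np

def translation(tx, ty):
    """3x3 homogeneous matrix for a pure 2D translation (illustrative helper)."""
    T = np.eye(3)
    T[0, 2], T[1, 2] = tx, ty
    return T

def chain_to_reference(pairwise):
    """Compose frame-to-previous transforms into frame-to-reference transforms.

    pairwise[k] maps coordinates of frame k+1 into frame k.
    Entry k of the result maps frame k into frame 0.
    """
    global_T = [np.eye(3)]
    for H in pairwise:
        # T(0 <- k+1) = T(0 <- k) @ T(k <- k+1)
        global_T.append(global_T[-1] @ H)
    return global_T
```

Each product also multiplies in the estimation error of every earlier pair, so drift grows with sequence length; this is why loop closing or global adjustment appears in the other mosaicking and SLAM work on this page.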
Conference Paper
Minimally Invasive Surgery (MIS) has recognized benefits of reduced patient trauma and recovery time. In practice, MIS procedures present a number of challenges due to the loss of 3D vision and the narrow field-of-view provided by the camera. The restricted vision can make navigation and localization within the human body challenging. This paper presents a robust technique, based on Simultaneous Localization and Mapping (SLAM), for building a repeatable long-term 3D map of the scene while recovering the camera movement. A sequential, vision-only approach is adopted that recovers 6-DOF camera movement, exploits the available textured surfaces and reduces reliance on the strong planar structures required by range finders. The method has been validated with a simulated data set using real MIS textures, as well as with in vivo MIS video sequences. The results indicate the strength of the proposed algorithm under the complex reflectance properties of the scene, and its potential for real-time application integrated with existing MIS hardware.
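Sequential vision-only SLAM of this kind is frequently formulated as an extended Kalman filter, as in MonoSLAM; the abstract does not state the filter used, so that formulation is an assumption here. The sketch below is the generic EKF measurement update, which corrects the joint camera-and-map state from observed feature positions. The caller supplies the predicted measurement h(x) and its Jacobian H.

```python
import numpy as np

def ekf_update(x, P, z, h, H, R):
    """One extended Kalman filter measurement update.

    x, P : state mean and covariance (P assumed symmetric)
    z    : measurement vector
    h    : predicted measurement h(x)
    H    : Jacobian of the measurement function evaluated at x
    R    : measurement noise covariance
    """
    y = z - h                           # innovation
    S = H @ P @ H.T + R                 # innovation covariance
    K = np.linalg.solve(S, H @ P).T     # Kalman gain P H^T S^-1 (uses symmetry of P and S)
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new
```

In visual SLAM the state stacks the camera pose with the 3D positions of mapped landmarks, so each observation refines both the camera estimate and the map.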
Conference Paper
Laser photocoagulation is a proven procedure to treat various pathologies of the retina. Challenges such as motion compensation, correct energy dosage, and avoiding incidental damage are responsible for the still-low success rate. They can be overcome with improved instrumentation, such as a fully automatic laser photocoagulation system. In this paper, we present a core image processing element of such a system, namely a novel approach to retina mosaicing. Our method relies on recent developments in region detection and feature description to automatically fuse retina images. In contrast to the state of the art, the proposed approach works even for retina images with no discernible vascularity. Moreover, an efficient scheme to determine the blending masks of arbitrarily overlapping images for multi-band blending is presented.
Article
We present a real-time algorithm which can recover the 3D trajectory of a monocular camera, moving rapidly through a previously unknown scene. Our system, which we dub MonoSLAM, is the first successful application of the SLAM methodology from mobile robotics to the "pure vision" domain of a single uncontrolled camera, achieving real-time but drift-free performance inaccessible to Structure from Motion approaches. The core of the approach is the online creation of a sparse but persistent map of natural landmarks within a probabilistic framework. Our key novel contributions include an active approach to mapping and measurement, the use of a general motion model for smooth camera movement, and solutions for monocular feature initialization and feature orientation estimation. Together, these add up to an extremely efficient and robust algorithm which runs at 30 Hz with standard PC and camera hardware. This work extends the range of robotic systems in which SLAM can be usefully applied, and also opens up new areas. We present applications of MonoSLAM to real-time 3D localization and mapping for a high-performance full-size humanoid robot and live augmented reality with a hand-held camera.
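A motion model for smooth camera movement can be illustrated by its translational part: predict position from velocity and let unknown accelerations inflate the covariance. The sketch below uses a standard white-noise-acceleration constant-velocity model. MonoSLAM's own formulation, which also tracks orientation and angular velocity and models the noise as velocity impulses, is not reproduced here; `accel_std` and the function name are assumptions.

```python
import numpy as np

def predict_constant_velocity(x, P, dt, accel_std):
    """Predict state [px, py, pz, vx, vy, vz] and covariance over a time step dt."""
    F = np.eye(6)
    F[:3, 3:] = dt * np.eye(3)  # position += velocity * dt
    # Effect of an unknown constant acceleration over dt on position and velocity.
    G = np.vstack([0.5 * dt ** 2 * np.eye(3), dt * np.eye(3)])
    Q = (accel_std ** 2) * G @ G.T  # process noise covariance
    return F @ x, F @ P @ F.T + Q
```

Larger `accel_std` tolerates jerkier camera motion but widens the search regions for features in the next frame, trading robustness against matching cost.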
Article
Using generic interpolation machinery based on solving Poisson equations, a variety of novel tools are introduced for seamless editing of image regions. The first set of tools permits the seamless importation of both opaque and transparent source image regions into a destination region. The second set is based on similar mathematical ideas and allows the user to modify the appearance of the image seamlessly, within a selected region. These changes can be arranged to affect the texture, the illumination, and the color of objects lying in the region, or to make tileable a rectangular selection.
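In the discrete form of seamless cloning, each unknown pixel p inside the region satisfies |N_p| f_p − Σ f_q (over interior neighbours) = Σ f*_q (over boundary neighbours) + Σ (g_p − g_q), where g is the source image and f* is the destination. The sketch below builds and solves this linear system directly for a small region. It assumes a 4-neighbourhood and a mask that does not touch the image border, and uses a dense solver that only suits tiny regions; practical implementations use sparse solvers.

```python
import numpy as np

def poisson_blend(source, target, mask):
    """Seamless cloning of `source` into `target` inside the boolean `mask`."""
    src = source.astype(float)
    tgt = target.astype(float)
    ys, xs = np.nonzero(mask)
    index = -np.ones(mask.shape, dtype=int)
    index[ys, xs] = np.arange(len(ys))  # unknown number for each masked pixel
    n = len(ys)
    A = np.zeros((n, n))
    b = np.zeros(n)
    for k, (y, x) in enumerate(zip(ys, xs)):
        A[k, k] = 4.0  # |N_p| for an interior 4-neighbourhood
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            qy, qx = y + dy, x + dx
            b[k] += src[y, x] - src[qy, qx]  # guidance field v_pq = g_p - g_q
            if mask[qy, qx]:
                A[k, index[qy, qx]] = -1.0   # neighbour is also unknown
            else:
                b[k] += tgt[qy, qx]          # Dirichlet boundary value f*_q
    out = tgt.copy()
    out[ys, xs] = np.linalg.solve(A, b)
    return out
```

Because only source gradients are imported, a constant brightness offset between source and destination disappears entirely, which is what makes the seam invisible.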
Conference Paper
Natural Orifice Transluminal Endoscopic Surgery (NOTES) is an emerging surgical technique with increasing global interest. It has recently moved beyond the boundaries of clinical experiments towards initial clinical evaluation. Although profound benefits to the patient have been demonstrated, NOTES requires highly skilled endoscopists to be performed safely and successfully. This predominantly reflects the skill required to navigate a flexible endoscope through a spatially complex environment. This paper presents a method to extend the visual field of the surgeon without compromising patient safety. The proposed dynamic view expansion uses a novel parallax correction scheme to provide enhanced visual cues in the periphery that aid navigation and orientation during NOTES, while leaving the focal view undisturbed. The method was validated using a natural orifice simulated surgical environment and demonstrated on in vivo porcine data.