N. Ayache, S. Ourselin, A. Maeder (Eds.): MICCAI 2007, Part II, LNCS 4792, pp. 34–41, 2007.
© Springer-Verlag Berlin Heidelberg 2007
A Probabilistic Framework for Tracking Deformable
Soft Tissue in Minimally Invasive Surgery

Peter Mountney, Benny Lo, Surapa Thiemjarus, Danail Stoyanov and Guang-Zhong Yang

Department of Computing, Institute of Biomedical Engineering,
Imperial College, London SW7 2BZ, UK
Abstract. The use of vision based algorithms in minimally invasive surgery has
attracted significant attention in recent years due to its potential in providing in
situ 3D tissue deformation recovery for intra-operative surgical guidance and
robotic navigation. Thus far, a large number of feature descriptors have been
proposed in computer vision but direct application of these techniques to
minimally invasive surgery has shown significant problems due to free-form
tissue deformation and varying visual appearances of surgical scenes. This
paper evaluates the current state-of-the-art feature descriptors in computer
vision and outlines their respective performance issues when used for
deformation tracking. A novel probabilistic framework for selecting the most
discriminative descriptors is presented and a Bayesian fusion method is used to
boost the accuracy and temporal persistency of soft-tissue deformation tracking.
The performance of the proposed method is evaluated with both simulated data
with known ground truth, as well as in vivo video sequences recorded from
robotic assisted MIS procedures.
Keywords: feature selection, descriptors, features, Minimally Invasive Surgery.
1 Introduction
Minimally Invasive Surgery (MIS) represents one of the major advances in modern
healthcare. This approach has a number of well known advantages for the patients
including shorter hospitalization, reduced post-surgical trauma and morbidity.
However, MIS procedures also have a number of limitations such as reduced
instrument control, difficult hand-eye coordination and poor operating field
localization. These impose significant demand on the surgeon and require extensive
skills in manual dexterity and 3D visuomotor control. With the recent introduction of
MIS surgical robots, dexterity is enhanced by microprocessor controlled mechanical
wrists, allowing motion scaling for reducing gross hand movements and the
performance of micro-scale tasks that are otherwise not possible. In order to perform
MIS with improved precision and repeatability, intra-operative surgical guidance is
essential for complex surgical tasks. In prostatectomy, for example, 3D visualization
of the surrounding anatomy can result in improved neurovascular bundle preservation
and enhanced continence and potency rates. The effectiveness and clinical benefit of
intra-operative guidance have been well recognized in neuro and orthopedic surgeries.
Its application to cardiothoracic or gastrointestinal surgery, however, remains
problematic as the complexity of tissue deformation imposes a significant challenge.
The major difficulty involved is in the accurate reconstruction of dynamic
deformation of the soft-tissue in vivo so that patient-specific preoperative/intra-
operative data can be registered to the changing surgical field-of-views. This is also
the prerequisite of providing augmented reality or advanced robotic control with
dynamic active constraints and motion stabilization.
Existing imaging modalities, such as intra-operative ultrasound, potentially offer
detailed morphological information of the soft-tissue. However, there are recognised
difficulties in integrating these imaging techniques for complex MIS procedures.
Recent research has shown that it is more practical to rely on optical based techniques
by using the existing laparoscopic camera to avoid further complication of the current
MIS setup. It has been demonstrated that by introducing fiducial markers onto the
exposed tissue surface, it is possible to obtain dynamic characteristics of the tissue in
real-time [1]. Less invasive methods using optical flow and image derived features
have also been attempted to infer tissue deformation [2]. These methods, however,
impose strong geometrical constraints on the underlying tissue surface. They are
generally not able to cater for large tissue deformation as experienced in
cardiothoracic and gastrointestinal procedures. Existing research has shown that the
major difficulty of using vision based techniques for inferring tissue deformation is in
the accurate identification and tracking of surface features. They need to be robust to
tissue deformation, specular highlights, and inter-reflecting lighting conditions.
In computer vision, the issue of reliable feature tracking is a well researched topic
for disparity analysis and depth reconstruction. Existing techniques, however, are
mainly tailored for rigid man-made environments. Thus far, a large number of feature
descriptors have been proposed and many of them are only invariant to perspective
transformation due to camera motion [3]. Direct application of these techniques to
MIS has shown significant problems due to free-form tissue deformation and
contrastingly different visual appearances of changing surgical scenes. The purpose of
this paper is to evaluate existing feature descriptors in computer vision and outline
their respective performance issues when applied to MIS deformation tracking. A
novel probabilistic framework for selecting the most discriminative descriptors is
presented and a Bayesian fusion method is used to boost the accuracy and temporal
persistency of soft-tissue deformation tracking. The performance of the proposed
method is evaluated with both simulated data with known ground truth, as well as in
vivo video sequences recorded from robotic assisted MIS procedures.
2 Methods
2.1 Feature Descriptors and Matching
In computer vision, feature descriptors are successfully used in many applications in
rigid man-made environments for robotic navigation, object recognition, video data
mining and tracking. For tissue deformation tracking, however, the effectiveness of
existing techniques has not been studied in detail. To determine their respective
quality for MIS, we evaluated a total of 21 descriptors, including seven different
descriptors extended to work with color invariant space using techniques outlined in
[4]. Color invariant descriptors are identified by a ‘C’ prefix. Subsequently, a machine
learning method for inferring the most informative descriptors is proposed for
Bayesian fusion. Table 1 provides a summary of all the descriptors used in this study.
For clarity of terminology, we define a feature as a visual cue in an image. A detector
is a low level feature extractor applied to all image pixels (such as edges and corners),
whereas a descriptor provides a high level signature that describes the visual
characteristics around a detected feature.
Table 1. A summary of the feature descriptors evaluated in this study

SIFT, CSIFT [4]: Scale Invariant Feature Transform, robust to scale and rotation changes.
GLOH, CGLOH: Gradient Location Orientation Histogram, SIFT with a log-polar location grid.
SURF [5], CSURF: Speeded Up Robust Features, robust to scale and rotation changes.
Spin, CSpin: Spin images, a 2D histogram of pixel intensity measured by the distance from the centre of the feature.
MOM, CMOM: Moment invariants computed up to the 2nd order and 2nd degree.
CC, CCC: Cross correlation, a 9×9 uniform sample template of the smoothed feature.
SF, CSF: Steerable Filters, Gaussian derivatives computed up to the 4th order.
DI, CDI: Differential Invariants, Gaussian derivatives computed up to the 4th order.
GIH [6]: Geodesic-Intensity Histogram, a 2D surface embedded in 3D space is used to create a descriptor which is robust to deformation.
CCCI [7]: Color Constant Color Indexing, a color based descriptor invariant to illumination which uses a histogram of color angles.
BR-CCCI: Sensitivity of CCCI to blur is reduced using the approach in [8].
CBOR [9]: Color Based Object Recognition, a similar approach to CCCI using an alternative color angle.
BR-CBOR: Sensitivity of CBOR to blur is reduced using the approach in [8].
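To make the terminology concrete, the simplest entry in Table 1, the CC descriptor, can be sketched in a few lines: a 9×9 sample of the smoothed image around a feature, normalised so that comparing two descriptors amounts to normalised cross-correlation. The box-filter smoothing and the mean/norm normalisation below are assumptions, since the paper does not specify these details:

```python
import numpy as np

def box_blur(img, k=3):
    """Smooth an image with a k x k box filter (edge padding)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def cc_descriptor(img, y, x, size=9):
    """A 9x9 sample of the smoothed image around (y, x), normalised so
    that a dot product of two descriptors is a normalised correlation."""
    half = size // 2
    smoothed = box_blur(img)
    patch = smoothed[y - half:y + half + 1, x - half:x + half + 1]
    vec = patch.flatten()
    vec = vec - vec.mean()              # remove brightness offset
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

rng = np.random.default_rng(0)
img = rng.random((32, 32))
d1 = cc_descriptor(img, 16, 16)
d2 = cc_descriptor(img + 0.2, 16, 16)   # same scene under a brightness shift
print(round(float(d1 @ d2), 6))         # 1.0: descriptor unchanged
```

The mean subtraction makes the sketch insensitive to additive illumination changes, which is why the two descriptors above correlate perfectly despite the brightness shift.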
For tissue deformation tracking and surface reconstruction, it is important to
identify which features detected in an image sequence represent material
correspondence. This process is known as matching and depending on the feature
descriptor used, matching can be performed in different ways, e.g., using normalized
cross-correlation over image regions or by measuring the Euclidean or Mahalanobis
distance between descriptors.
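A minimal sketch of the matching step described above, using thresholded Euclidean distance between descriptor vectors; the greedy nearest-neighbour strategy and the threshold value are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def match_features(desc_a, desc_b, threshold=0.5):
    """Nearest-neighbour matching with a Euclidean distance threshold.

    desc_a, desc_b: (n, d) and (m, d) arrays of descriptors from two images.
    Returns (i, j) index pairs accepted as material correspondences.
    """
    # Pairwise Euclidean distances between every descriptor pair.
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    matches = []
    for i in range(len(desc_a)):
        j = int(np.argmin(dists[i]))
        if dists[i, j] < threshold:     # manually defined threshold
            matches.append((i, j))
    return matches

# Toy data: descriptors in the second image are noisy copies of the first.
rng = np.random.default_rng(1)
a = rng.random((5, 8))
b = a + rng.normal(scale=0.01, size=a.shape)
print(match_features(a, b))             # each feature finds its counterpart
```

For descriptors such as CC the Euclidean metric would be replaced by normalized cross-correlation, and the Mahalanobis distance can be substituted when a covariance of descriptor noise is available.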
2.2 Descriptor Selection and Descriptor Fusion
With the availability of a set of possible descriptors, it is important to establish their
respective discriminative power in representing salient visual features that are suitable
for subsequent feature tracking. To this end, a Bayesian Framework for Feature Selection (BFFS) algorithm is used. It is a machine
learning approach formulated as a filter algorithm for reducing the complexity of
multiple descriptors while maintaining the overall inferencing accuracy. The
advantage of this method is that the selection of descriptors is purely based on the
data distribution, and thus is unbiased towards a specific model. The criteria for
descriptor selection are based on the expected Area Under the Receiver Operating
Characteristic (ROC) Curve (AUC), and therefore the selected descriptors yield the
best classification performance in terms of the ROC curve or sensitivity and
specificity for an ideal classifier. Under this framework, the expected AUC is
interpreted as a metric which describes the intrinsic discriminability of the descriptors
in classification. The basic principle of the algorithm is described in [13].
There are three major challenges related to the selection of the optimal set of
descriptors: 1) the presence of irrelevant descriptors, 2) the presence of correlated or
redundant descriptors and 3) the presence of descriptor interaction. Thus far, BFFS
has been implemented using both forward and backward search strategies and it has
been observed that the backward elimination suffers less from interaction [10,11,13].
In each step of the backward selection approach, a descriptor d which minimizes the
objective function D(d) will be eliminated from the descriptor set F_k, resulting in a
new set F_{k+1} = F_k − {d}. To maximize the performance of the model, the standard BFFS
prefers the descriptor set that maximizes the expected AUC. This is equivalent to
discarding, at each step, the descriptor that contributes the smallest change in the
expected AUC:

D(d) = E(F_k) − E(F_k − {d})    (1)

where F_k denotes the descriptor set at the beginning of iteration k, and E is a
function which returns the expected AUC of the set given as its parameter. Since the
discriminability E(F_k) of the descriptor set before elimination is constant regardless
of d, omitting this term in general does not affect the ranking of the features.
While irrelevant descriptors are uninformative, redundant descriptors are often
useful despite the fact that their presence may not necessarily increase the expected
AUC. With the evaluation function described in Eq. (1), irrelevant and redundant
descriptors are treated in the same manner since both contribute little to the overall
model performance. In order to discard irrelevant descriptors before removing
redundant descriptors, the following objective function has been proposed:
D_ω(d) = ω · E({d}) − (1 − ω) · E(F_k − {d})    (2)

where ω is a weighting factor ranging between 0 and 1. This function attempts
to maximize the discriminability of the selected descriptor set while minimizing the
discriminability of the eliminated descriptors.
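The backward-elimination loop can be sketched as follows. Two simplifications are assumptions not taken from the paper: the expected AUC E(·) is approximated by an empirical rank-based AUC over labeled match/non-match scores, and a descriptor set is fused by simple averaging rather than the naive Bayes fusion used later:

```python
import numpy as np

def auc(scores, labels):
    """Rank-based estimate of the area under the ROC curve."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return float((ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))

def backward_select(scores_by_desc, labels, keep=1):
    """Backward elimination: at each step drop the descriptor d whose
    removal causes the smallest change D(d) = E(F_k) - E(F_k - {d})."""
    selected = list(scores_by_desc)

    def E(names):                        # empirical AUC of a descriptor set
        fused = np.mean([scores_by_desc[n] for n in names], axis=0)
        return auc(fused, labels)

    while len(selected) > keep:
        d = min(selected,
                key=lambda n: E(selected) - E([m for m in selected if m != n]))
        selected.remove(d)
    return selected

# Toy data: one informative match score, two uninformative ones.
rng = np.random.default_rng(2)
labels = np.array([1] * 20 + [0] * 20)
scores = {"good": labels + rng.normal(0.0, 0.3, 40),
          "noise1": rng.random(40),
          "noise2": rng.random(40)}
print(backward_select(scores, labels))   # the informative descriptor survives
```

The ω-weighted variant of Eq. (2) would replace the `key` function with `ω·E([n]) − (1 − ω)·E(F_k − {n})`, so that irrelevant descriptors are discarded before redundant ones.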
Once the relevant descriptors are derived by using BFFS, a Naïve Bayesian
Network (NBN) is used in this study to provide a probabilistic fusing of the selected
descriptors. The result can subsequently be used for feature matching, where two
features are classified as either matching or not matching by fusing the similarity
measurements between descriptors to estimate the posterior probabilities. The NBN
was trained on a subset of data with ground truth.
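One possible realisation of the NBN fusion step is sketched below. The paper does not specify the likelihood model, so Gaussian class-conditional densities over the per-descriptor similarity scores are an assumption; the two classes are matching (1) and non-matching (0) feature pairs:

```python
import numpy as np

class NaiveBayesFusion:
    """Fuse per-descriptor similarity scores with a naive Bayes model.

    Class-conditional densities are assumed Gaussian per descriptor,
    with the classes 1 = matching and 0 = non-matching feature pair.
    """

    def fit(self, sims, labels):
        sims = np.asarray(sims, dtype=float)
        labels = np.asarray(labels)
        self.classes = np.unique(labels)
        self.prior, self.mu, self.var = {}, {}, {}
        for c in self.classes:
            x = sims[labels == c]
            self.prior[c] = len(x) / len(sims)
            self.mu[c] = x.mean(axis=0)
            self.var[c] = x.var(axis=0) + 1e-6   # guard against zero variance
        return self

    def posterior_match(self, sim):
        """P(match | similarity vector) via Bayes' rule in log space."""
        logp = {}
        for c in self.classes:
            ll = -0.5 * np.sum(np.log(2.0 * np.pi * self.var[c])
                               + (sim - self.mu[c]) ** 2 / self.var[c])
            logp[c] = np.log(self.prior[c]) + ll
        m = max(logp.values())
        z = sum(np.exp(v - m) for v in logp.values())
        return float(np.exp(logp[1] - m) / z)

# Toy training data: three descriptors, matches score high, non-matches low.
rng = np.random.default_rng(3)
sims = np.vstack([rng.normal(0.9, 0.05, (50, 3)),    # true matches
                  rng.normal(0.3, 0.10, (50, 3))])   # non-matches
labels = np.array([1] * 50 + [0] * 50)
nb = NaiveBayesFusion().fit(sims, labels)
print(nb.posterior_match(np.array([0.85, 0.90, 0.88])) > 0.5)   # True
```

Because the posterior is a probability, matching no longer needs a hand-tuned distance threshold per descriptor, which is the property exploited in the experiments.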
3 Experiments and Results
To evaluate the proposed framework for feature descriptor selection, two MIS image
sequences with large tissue deformation were used. The first shown in Fig. 1a-e is a
simulated dataset with known ground truth, where tissue deformation is modeled by
sequentially warping a textured 3D mesh using a Gaussian mixture model. The
second sequence shown in Fig. 2a-d is an in vivo sequence from a laparoscopic
cholecystectomy procedure, where the ground truth data is defined manually. Both
sequences involve significant tissue deformation due to instrument-tissue interaction
near the cystic duct. Low level features for these images were detected using the
Difference of Gaussian (DoG) and Maximally Stable Extremal Regions (MSER) detectors.
Descriptor performance is quantitatively evaluated with respect to deformation
using two metrics: sensitivity, the ratio of correctly matched features to the total
number of corresponding features between two images, and 1-specificity, the ratio of
incorrectly matched features to the total number of non-corresponding features.
Results are presented in the form of ROC curves in Fig. 1 and Fig. 2. A good
descriptor should be able to correctly identify matching features whilst having a
minimum number of mismatches. Individual descriptors use a manually defined
threshold on the Euclidean distance between descriptors to determine matching
features. This threshold is varied to obtain the curves on the graphs. Our fusion
approach has no manually defined threshold and is shown as a point on the graph.
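The threshold sweep that produces these ROC curves can be sketched as follows; the toy distances and thresholds are illustrative, not the paper's data:

```python
import numpy as np

def roc_points(distances, is_match, thresholds):
    """Sweep a distance threshold; report (1 - specificity, sensitivity).

    distances: descriptor distances for candidate feature pairs
    is_match:  ground truth, True where the pair truly corresponds
    """
    distances = np.asarray(distances)
    is_match = np.asarray(is_match)
    pts = []
    for t in thresholds:
        predicted = distances < t                 # declare a match if close
        tp = np.sum(predicted & is_match)         # correctly matched
        fp = np.sum(predicted & ~is_match)        # incorrectly matched
        sensitivity = float(tp / max(is_match.sum(), 1))
        one_minus_spec = float(fp / max((~is_match).sum(), 1))
        pts.append((one_minus_spec, sensitivity))
    return pts

dists = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
truth = np.array([True, True, True, False, False, False])
print(roc_points(dists, truth, [0.5, 1.0]))   # [(0.0, 1.0), (1.0, 1.0)]
```

Each threshold yields one (1-specificity, sensitivity) point; varying it traces the per-descriptor curves, whereas the fusion method contributes the single thresholdless point shown on the graphs.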
Ground truth data is acquired for quantitative analysis. On the simulated data,
feature detection was performed on the first frame to provide an initial set of feature
positions. These positions were identified on the 3D mesh enabling ground truth to be
generated for subsequent images by projecting the deformed mesh positions back into
the image plane. To acquire ground truth for in vivo data, feature detection was
performed on each frame and corresponding features were matched manually.
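The simulated ground-truth generation, projecting deformed mesh positions back into the image plane, reduces to a pinhole projection. The intrinsic matrix K below is a hypothetical example, not the calibration used in the paper:

```python
import numpy as np

def project(points_3d, K):
    """Project 3D points (camera coordinates) onto the image plane."""
    uvw = (K @ points_3d.T).T            # homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]      # perspective divide

# Hypothetical pinhole intrinsics (focal length 800 px, centre 320, 240);
# assumed values, not the paper's camera calibration.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

vertex = np.array([[0.0, 0.0, 100.0]])             # mesh vertex at frame t
warped = vertex + np.array([[2.0, -1.0, 0.0]])     # after simulated warping

print(project(vertex, K))    # [[320. 240.]]
print(project(warped, K))    # ground truth position in the next frame
```

Applying the warp to the mesh vertices and reprojecting gives the expected 2D feature position in every subsequent frame, against which tracked positions are scored.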
The AUC graph shown in Fig. 1 illustrates that by effective fusion of descriptor
responses, the overall discriminability of the system is improved, which allows better
matching of feature landmarks under large tissue deformation. The derived AUC
curve (bottom left) indicates the ID of the top performing descriptors in a descending
order. It is evident that after CGLOH, the addition of further feature descriptors does
not provide additional performance enhancement to the combined feature descriptors.
The ROC graph (bottom right) shows the performance of the fused descriptor when
the top n descriptors are used. Ideal descriptors will have high
sensitivity and low 1-specificity. It is evident from these graphs that descriptor fusion
can obtain a higher level of sensitivity than that of individual descriptors for an
acceptable specificity. This enables the fusion technique to match more features and
remain robust. The best performing descriptor is Spin and its sensitivity is 11.96% less
than the fusion method for the specificity achieved with fusion. To obtain the same
level of sensitivity using only the Spin descriptor, specificity has to be compromised,
resulting in a 19.16% increase in 1-specificity and a drop in the robustness of feature matching.
Fig. 1. (a-e) Example images showing the simulated data for evaluating the performance of
different feature descriptors. The two graphs represent the AUC and the ROC (sensitivity vs. 1-
specificity) curves of the descriptors used. For clarity, only the six best performing descriptors
are shown for the ROC graph.
Fig. 2. (a-d) Images from an in vivo laparoscopic cholecystectomy procedure showing
instrument tissue interaction. The two graphs illustrate the AUC and the ROC (sensitivity vs. 1-
specificity) curves of the descriptors used. As in Fig. 1, only the six best performing descriptors
are shown for the ROC graph for clarity.
For in vivo validation, a total of 40 matched ground truth features were used.
Detailed analysis results are shown in Fig. 2. It is evident that by descriptor fusion,
the discriminative power of feature description is enhanced. The fused method obtains
a specificity of 0.235 which gives a 30.63% improvement in sensitivity over the best
performing descriptor GIH at the given specificity. This demonstrates the fused
descriptor is capable of matching considerably more features than any individual
descriptor for deforming tissue. Detailed performance analysis has shown that for
MIS images, the best performing individual descriptors are Spin, SIFT, SURF, GIH
and GLOH. Computing the descriptors in color invariant space has no apparent effect
on discriminability but the process is more computationally intensive. By using the
proposed Bayesian fusion method, however, we are able to reliably match
significantly more features than by using individual descriptors.
Fig. 3. 3D deformation tracking and depth reconstruction based on computational stereo by
using the proposed descriptor fusion and SIFT methods for a robotic assisted lung lobectomy
procedure. SIFT was identified by the BFFS as the most discriminative descriptor for this
image sequence. Improved feature persistence is achieved by using the proposed fusion
method, leading to improved 3D deformation recovery.
To further illustrate the practical value of the proposed framework, the fused
descriptor was applied to 3D stereo deformation recovery for an in vivo stereoscopic
sequence from a lung lobectomy procedure performed by using a daVinci® robot.
The representative 3D reconstruction results by using the proposed matching scheme
are shown in Fig. 3. Visual features as detected in the first video frame were matched
across the entire image sequence for temporal deformation recovery. Features that
were successfully tracked both in time and space were used for 3D depth
reconstruction. The overlay of dense and sparse reconstructions with the proposed
method indicates the persistence of features by using the descriptor fusion scheme.
The robustness of the derived features in persistently matching through time is an
important prerequisite of all vision-based 3D tissue deformation techniques. The
results obtained in this study indicate the practical value of the proposed method in
underpinning the development of accurate in vivo 3D deformation reconstruction techniques.
4 Discussion and Conclusions
In conclusion, we have presented a method for systematic descriptor selection for
MIS feature tracking and deformation recovery. Experimental results have shown that
the proposed framework performed favorably as compared to the existing techniques
and the method is capable of matching a greater number of features in the presence of
large tissue deformation. To our knowledge, this paper represents the first
comprehensive study of feature descriptors in MIS images. It represents an important
step towards more effective use of visual cues in developing vision based deformation
recovery techniques. This work has also highlighted the importance of adaptively
selecting viable image characteristics that can cater for surgical scene variations.
Acknowledgments. The authors would like to thank Adam James for acquiring in vivo data
and Andrew Davison for constructive discussions.
References

1. Ginhoux, R., Gangloff, J.A., Mathelin, M.F.: Beating heart tracking in robotic surgery
using 500 Hz visual servoing, model predictive control and an adaptive observer. In: Proc.
ICRA, pp. 274–279 (2004)
2. Stoyanov, D., Mylonas, G.P., Deligianni, F., Darzi, A., Yang, G.Z.: Soft-tissue motion
tracking and structure estimation for robotic assisted MIS procedures. In: Duncan, J.S.,
Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3749, pp. 139–146. Springer, Heidelberg (2005)
3. Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. IEEE
Transactions on Pattern Analysis and Machine Intelligence 27(10), 1615–1630 (2005)
4. Abdel-Hakim, A.E., Farag, A.A.: CSIFT: A SIFT Descriptor with Color Invariant
Characteristics. In: Proc CVPR, pp. 1978–1983 (2006)
5. Bay, H., Tuytelaars, T., Van Gool, L.: SURF: Speeded Up Robust Features. In: Leonardis,
A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3951, Springer, Heidelberg (2006)
6. Ling, H., Jacobs, D.W.: Deformation invariant image matching. In: Proc. ICCV, pp. 1466–
1473 (2005)
7. Funt, B.V., Finlayson, G.D.: Color constant color indexing. IEEE Transactions on Pattern
Analysis and Machine Intelligence 17(5), 522–529 (1995)
8. van de Weijer, J., Schmid, C.: Blur Robust and Color Constant Image Description. In:
Proc. ICIP, pp. 993–996 (2006)
9. Gevers, T., Smeulders, A.W.M.: Color Based Object Recognition. Pattern Recognition 32,
453–464 (1999)
10. Koller, D., Sahami, M.: Towards optimal feature selection. In: Proc. ICML, pp. 284–292 (1996)
11. Kohavi, R., John, G.H.: Wrappers for feature subset selection. Artificial Intelligence 97,
273–324 (1997)
12. Hu, X.P.: Feature selection and extraction of visual search strategies with eye tracking
13. Yang, G.Z., Hu, X.P.: Multi-Sensor Fusion. Body Sensor Networks, 239–286 (2006)
14. Thiemjarus, S., Lo, B.P.L., Laerhoven, K.V., Yang, G.Z.: Feature Selection for Wireless
Sensor Networks. In: Proceedings of the 1st International Workshop on Wearable and
Implantable Body Sensor Networks (2004)
... Tracking the tissue becomes more challenging when we take into consideration that the tissue may have a homogeneous texture, and consequently, only a few distinct features are available for tracking the tissue's motion. Previous work has shown that in MIS we can track soft tissue using salient features tailored for MIS applications [25], [26] or increase the number of features being matched correctly, between different images, by fusing different feature detectors [27]. However, the fusion of descriptors is more computationally expensive, which restricts its direct use in real-time applications such as tissue scanning. ...
Full-text available
In Minimally Invasive Surgery (MIS), tissue scanning with imaging probes is required for subsurface visualisation to characterise the state of the tissue. However, scanning of large tissue surfaces in the presence of deformation is a challenging task for the surgeon. Recently, robot-assisted local tissue scanning has been investigated for motion stabilisation of imaging probes to facilitate the capturing of good quality images and reduce the surgeon's cognitive load. Nonetheless, these approaches require the tissue surface to be static or deform with periodic motion. To eliminate these assumptions, we propose a visual servoing framework for autonomous tissue scanning, able to deal with free-form tissue deformation. The 3D structure of the surgical scene is recovered and a feature-based method is proposed to estimate the motion of the tissue in real-time. A desired scanning trajectory is manually defined on a reference frame and continuously updated using projective geometry to follow the tissue motion and control the movement of the robotic arm. The advantage of the proposed method is that it does not require the learning of the tissue motion prior to scanning and can deal with free-form deformation. We deployed this framework on the da Vinci surgical robot using the da Vinci Research Kit (dVRK) for Ultrasound tissue scanning. Since the framework does not rely on information from the Ultrasound data, it can be easily extended to other probe-based imaging modalities.
... Deformable object manipulation is of significant importance to the robotics community, from the surgical robots [28,10] to industrial manipulation [37]. Early works focus on identifying the object configuration with handcrafted visual features. ...
Full-text available
Manipulating deformable objects, such as cloth and ropes, is a long-standing challenge in robotics: their large number of degrees of freedom (DoFs) and complex non-linear dynamics make motion planning extremely difficult. This work aims to learn latent Graph dynamics for DefOrmable Object Manipulation (G-DOOM). To tackle the challenge of many DoFs and complex dynamics, G-DOOM approximates a deformable object as a sparse set of interacting keypoints and learns a graph neural network that captures abstractly the geometry and interaction dynamics of the keypoints. Further, to tackle the perceptual challenge, specifically, object self-occlusion, G-DOOM adds a recurrent neural network to track the keypoints over time and condition their interactions on the history. We then train the resulting recurrent graph dynamics model through contrastive learning in a high-fidelity simulator. For manipulation planning, G-DOOM explicitly reasons about the learned dynamics model through model-predictive control applied at each of the keypoints. We evaluate G-DOOM on a set of challenging cloth and rope manipulation tasks and show that G-DOOM outperforms a state-of-the-art method. Further, although trained entirely on simulation data, G-DOOM transfers directly to a real robot for both cloth and rope manipulation in our experiments.
... In [232] a framework for characterizing and propagation of the uncertainty in the localization of the biopsy points was presented. Mountney et al. [233] performed a review of various feature descriptors applied to deformable tissue tracking and in [234] proposed an Extended Kalman filter (EKF) framework for simultaneous localization and mapping (SLAM) based method for feature tracking in deformable scene, such as in laparoscopic surgery. This EKF framework was then extended in [235] for maintaining a global map of biopsy sites for endoluminal procedures, intra-operatively. ...
This paper attempts to provide the reader a place to begin studying the application of computer vision and machine learning to gastrointestinal (GI) endoscopy. They have been classified into 18 categories. It should be be noted by the reader that this is a review from pre-deep learning era. A lot of deep learning based applications have not been covered in this thesis.
Purpose: We are attempting to develop a navigation system for safe and effective peripancreatic lymphadenectomy in gastric cancer surgery. As a preliminary study, we examined whether or not the peripancreatic dissection line could be learned by a machine learning model (MLM). Methods: Among the 41 patients with gastric cancer who underwent radical gastrectomy between April 2019 and January 2020, we selected 6 in whom the pancreatic contour was relatively easy to trace. The pancreatic contour was annotated by a trainer surgeon in 1242 images captured from the video recordings. The MLM was trained using the annotated images from five of the six patients. The pancreatic contour was then segmented by the trained MLM using images from the remaining patient. The same procedure was repeated for all six combinations. Results: The median maximum intersection over union of each image was 0.708, which was higher than the threshold (0.5). However, the pancreatic contour was misidentified in parts where fatty tissue or thin vessels overlaid the pancreas in some cases. Conclusion: The contour of the pancreas could be traced relatively well using the trained MLM. Further investigations and training of the system are needed to develop a practical navigation system.
Conventional neuro-navigation can be challenged in targeting deep brain structures via transventricular neuroendos-copy due to unresolved geometric error following soft-tissue defor-mation. Current robot-assisted endoscopy techniques are fairly limited, primarily serving to planned trajectories and provide a stable scope holder. We report the implementation of a robot-as-sisted ventriculoscopy (RAV) system for 3D reconstruction, regis-tration, and augmentation of the neuroendoscopic scene with in-traoperative imaging, enabling guidance even in the presence of tissue deformation and providing visualization of structures be-yond the endoscopic field-of-view. Phantom studies were per-formed to quantitatively evaluate image sampling requirements, registration accuracy, and computational runtime for two recon-struction methods and a variety of clinically relevant ventriculo-scope trajectories. A median target registration error of 1.2 mm was achieved with an update rate of 2.34 frames per second, vali-dating the RAV concept and motivating translation to future clin-ical studies.
Artificial intelligence (AI) has made increasing inroads in clinical medicine. In surgery, machine learning–based algorithms are being studied for use as decision aids in risk prediction and even for intraoperative applications, including image recognition and video analysis. While AI has great promise in surgery, these algorithms come with a series of potential pitfalls that cannot be ignored as hospital systems and surgeons consider implementing these technologies. The aim of this review is to discuss the progress, promise, and pitfalls of AI in surgery.
Full-text available
Computer vision is an important cornerstone for the foundation of many modern technologies. The development of modern computer‐aided‐surgery, especially in the context of surgical navigation for minimally invasive surgery, is one example. Surgical navigation provides the necessary spatial information in computer‐aided‐surgery. Amongst the various forms of perception, vision‐based sensing has been proposed as a promising candidate for tracking and localisation application largely due to its ability to provide timely intra‐operative feedback and contactless sensing. The motivation for vision‐based sensing in surgical navigation stems from many factors, including the challenges faced by other forms of navigation systems. A common surgical navigation system performs tracking of surgical tools with external tracking systems, which may suffer from both technical and usability issues. Vision‐based tracking offers a relatively streamlined framework compared to those approaches implemented with external tracking systems. This review study aims to discuss contemporary research and development in vision‐based sensing for surgical navigation. The selected review materials are expected to provide a comprehensive appreciation of state‐of‐the‐art technology and technical issues enabling holistic discussions of the challenges and knowledge gaps in contemporary development. Original views on the significance and development prospect of vision‐based sensing in surgical navigation are presented.
In this paper, we present a novel scale- and rotation-invariant interest point detector and descriptor, coined SURF (Speeded Up Robust Features). It approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster. This is achieved by relying on integral images for image convolutions; by building on the strengths of the leading existing detectors and descriptors (in casu, using a Hessian matrix-based measure for the detector, and a distribution-based descriptor); and by simplifying these methods to the essential. This leads to a combination of novel detection, description, and matching steps. The paper presents experimental results on a standard evaluation set, as well as on imagery obtained in the context of a real-life object recognition application. Both show SURF’s strong performance.
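A key reason SURF is fast is its reliance on integral images, which allow the sum over any axis-aligned box to be computed with four array lookups regardless of box size. The following is a minimal sketch of that idea in NumPy, not the SURF implementation itself; the function names are illustrative.

```python
import numpy as np

def integral_image(img):
    """ii[y, x] holds the sum of img[:y+1, :x+1] (cumulative sum over both axes)."""
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, top, left, bottom, right):
    """Sum of the original image over the inclusive rectangle, via four lookups."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

img = np.arange(16, dtype=float).reshape(4, 4)
ii = integral_image(img)
```

Because each box sum costs constant time, the box filters approximating the Hessian can be evaluated at any scale without rescaling the image.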
An important class of color constant image descriptors is based on image derivatives. These derivative-based image descriptors have a major drawback: they are sensitive to changes of image blur. Image blur has various causes, such as being out of focus, motion of the camera or the object, and inaccurate acquisition settings. Since image blur is a frequently occurring image degradation, it is desirable for object description to be robust to its variations. We propose a set of descriptors which are both robust with respect to blurring effects and invariant to illuminant color changes. Experiments on retrieval tasks show that the newly proposed object descriptors outperform existing descriptors in the presence of blurring effects.
In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. Our wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We study the strengths and weaknesses of the wrapper approach and show a series of improved designs. We compare the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection. Significant improvement in accuracy is achieved for some datasets for the two families of induction algorithms used: decision trees and Naive-Bayes.
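The wrapper approach scores candidate feature subsets by actually running the learning algorithm on them. Below is a minimal greedy forward-selection sketch under simplified assumptions: a nearest-centroid classifier stands in for the induction algorithm and leave-one-out accuracy is the wrapper's evaluation function. All names are illustrative.

```python
import numpy as np

def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier on feature matrix X."""
    correct = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        # Classify the held-out sample by distance to each class centroid.
        candidates = []
        for c in np.unique(y[mask]):
            centroid = X[mask][y[mask] == c].mean(axis=0)
            candidates.append((np.linalg.norm(X[i] - centroid), c))
        correct += min(candidates)[1] == y[i]
    return correct / len(y)

def forward_select(X, y):
    """Greedy wrapper: repeatedly add the feature that most improves LOO accuracy."""
    remaining, chosen, best = set(range(X.shape[1])), [], 0.0
    while remaining:
        scored = [(loo_accuracy(X[:, chosen + [f]], y), f) for f in remaining]
        acc, f = max(scored)
        if acc <= best:          # stop when no feature improves the score
            break
        chosen.append(f)
        remaining.remove(f)
        best = acc
    return chosen, best
```

Because the subset is evaluated with the same learner that will be deployed, the search captures the algorithm/training-set interaction that filter methods such as Relief ignore, at a higher computational cost.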
In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context, steerable filters, PCA-SIFT, differential invariants, spin images, SIFT, complex filters, moment invariants, and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
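The evaluation criterion described here, recall with respect to precision, can be computed directly from a set of proposed matches and the ground-truth correspondences. The sketch below shows one plausible formulation (recall against 1-precision, i.e. the false-match rate); the function name and argument layout are illustrative.

```python
def recall_precision(matches, ground_truth, num_correspondences):
    """matches: proposed (query_id, match_id) pairs from descriptor matching.
    ground_truth: set of (query_id, match_id) pairs known to correspond.
    num_correspondences: total number of ground-truth correspondences."""
    correct = sum(1 for m in matches if m in ground_truth)
    recall = correct / num_correspondences
    # Fraction of proposed matches that are wrong (1 - precision).
    false_rate = (len(matches) - correct) / len(matches) if matches else 0.0
    return recall, false_rate
```

Sweeping the match-acceptance threshold and plotting recall against the false-match rate yields the curves used to rank descriptors such as SIFT, shape context, and steerable filters.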
In robotically assisted laparoscopic surgery, soft-tissue motion tracking and structure recovery are important for intraoperative surgical guidance, motion compensation and delivering active constraints. In this paper, we present a novel method for feature based motion tracking of deformable soft-tissue surfaces in totally endoscopic coronary artery bypass graft (TECAB) surgery. We combine two feature detectors to recover distinct regions on the epicardial surface for which the sparse 3D surface geometry may be computed using a pre-calibrated stereo laparoscope. The movement of the 3D points is then tracked in the stereo images with stereo-temporal constraints by using an iterative registration algorithm. The practical value of the technique is demonstrated on both a deformable phantom model with tomographically derived surface geometry and in vivo robotic assisted minimally invasive surgery (MIS) image sequences.
In the previous chapters, we have discussed issues concerning hardware, communication and network topologies for the practical deployment of Body Sensor Networks (BSNs). The pursuit of low power miniaturised distributed sensing under a patient’s natural physiological conditions has also imposed significant technical challenges on integrating information from what is often heterogeneous, incomplete and error-prone sensor data. For BSNs, the nature of errors can be attributed to a number of sources; but motion artefacts, inherent limitations and possible malfunctions of the sensors along with communication errors are the main causes of concern. In practice, it is desirable to rely on sensors with redundant or complementary data to maximise the information content and reduce both systematic errors and random artefacts. This, in essence, is the main drive for multi-sensor fusion, which is concerned with the synergistic use of multiple sources of information.
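In the simplest case, fusing redundant sensor readings can be done in closed form: for two independent Gaussian estimates of the same quantity, the Bayesian posterior is Gaussian with a precision-weighted mean. The sketch below illustrates that textbook special case only, not the BSN fusion framework discussed in the chapter.

```python
def fuse_gaussian(mu1, var1, mu2, var2):
    """Fuse two independent Gaussian estimates of one quantity.

    The posterior mean is the precision-weighted average of the two
    measurements, and the posterior variance is smaller than either input,
    reflecting the information gained by combining redundant sensors.
    """
    w1, w2 = 1.0 / var1, 1.0 / var2          # precisions (inverse variances)
    mu = (w1 * mu1 + w2 * mu2) / (w1 + w2)
    var = 1.0 / (w1 + w2)
    return mu, var
```

Note how a noisy sensor (large variance) automatically receives a small weight, which is the basic mechanism by which fusion suppresses error-prone readings.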
Assuming white illumination and dichromatic reflectance, we propose new color models c1c2c3 and l1l2l3 invariant to the viewing direction, object geometry and shading. Further, it is shown that l1l2l3 is also invariant to highlights. Further, a change in spectral power distribution of the illumination is considered to propose a new photometric color invariant m1m2m3 for matte objects. To evaluate photometric color invariant object recognition in practice, experiments have been carried out on a database consisting of 500 images taken from 3-D multicolored man-made objects. On the basis of the reported theory and experimental results, it is shown that high object recognition accuracy is achieved by l1l2l3 and hue H followed by c1c2c3 and normalized colors rgb under the constraint of white illumination. Finally, it is shown that solely m1m2m3 is invariant to a change in illumination color.
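For a concrete feel for two of the photometric invariants mentioned above, the sketch below computes c1c2c3 (to my understanding, ci = arctan of a channel over the maximum of the other two) and the normalized colors rgb; treat the exact formulas as assumptions to be checked against the original paper, and the small epsilon as an implementation convenience against division by zero.

```python
import numpy as np

EPS = 1e-8  # guards against zero denominators on black pixels

def c1c2c3(rgb):
    """Per-pixel c1c2c3 invariants; rgb has shape (..., 3) with nonnegative values."""
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return np.stack([np.arctan2(R, np.maximum(G, B) + EPS),
                     np.arctan2(G, np.maximum(R, B) + EPS),
                     np.arctan2(B, np.maximum(R, G) + EPS)], axis=-1)

def normalized_rgb(rgb):
    """Normalized colors rgb: each channel divided by the intensity sum."""
    return rgb / (rgb.sum(axis=-1, keepdims=True) + EPS)
```

Both quantities depend only on channel ratios, so uniformly scaling a pixel's intensity, as shading under white illumination does, leaves them (essentially) unchanged.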