D. Metaxas et al. (Eds.): MICCAI 2008, Part II, LNCS 5242, pp. 364–372, 2008.
© Springer-Verlag Berlin Heidelberg 2008
Soft Tissue Tracking for Minimally Invasive Surgery:
Learning Local Deformation Online
Peter Mountney1,2 and Guang-Zhong Yang1,2
1 Department of Computing, 2 Institute of Biomedical Engineering
Imperial College, London SW7 2BZ, UK
Abstract. Accurate estimation and tracking of dynamic tissue deformation is important to motion compensation, intra-operative surgical guidance and navigation in minimally invasive surgery. Current approaches to tissue deformation tracking are generally based on machine vision techniques for natural scenes, which are not well suited to MIS because tissue deformation cannot be easily modeled by using ad hoc representations. Such techniques do not deal well with inter-reflection changes and may be susceptible to instrument occlusion. The purpose of this paper is to present an online learning based feature tracking method suitable for in vivo applications. It makes no assumptions about the type of image transformations and visual characteristics, and is updated continuously as the tracking progresses. The performance of the algorithm is compared with existing tracking algorithms and validated on simulated, as well as in vivo cardiovascular and abdominal MIS data. The strength of the algorithm in dealing with drift and occlusion is validated and the practical value of the method is demonstrated by decoupling cardiac and respiratory motion in robotic assisted surgery.
Keywords: Feature tracking, matching, tissue deformation.
1 Introduction
With the maturity of Minimally Invasive Surgery (MIS), clinical uptake is steadily increasing because of its recognized benefits to patients and healthcare providers, particularly in terms of reduced patient trauma and hospital recovery times. The performance of MIS, however, is complicated by a number of visuomotor and ergonomic challenges, including misaligned visuomotor axes, the fulcrum effect during instrument manipulation, limited field of view, and loss of 3D vision and tactile feedback. The introduction of robotic assisted MIS has provided surgeons with improved visualization and enhanced dexterity. It also offers the possibility of integrating patient-specific preoperative/intraoperative data to allow image-guided surgical navigation and intervention. For these techniques to be successful, particularly for cardiovascular and gastrointestinal surgeries where large scale tissue deformation is common, an important prerequisite is the accurate estimation and tracking of dynamic tissue deformation.
Recent work has shown that it is possible to perform 3D tissue tracking by using both monocular and stereo depth cues [1-3]. Researchers have also relied on gaze vergence through binocular eye tracking to facilitate real-time 3D tissue deformation recovery [4]. Two major issues identified in current vision based techniques are tracked feature density and persistency. The former dictates the level of detail of the deforming surface that can be reconstructed, whereas the latter is affected by mutual and self-occlusion of the tissue and instrument during the surgical procedure. Feature persistency is also heavily influenced by changes in the operating field-of-view and lighting conditions. Current research has made significant inroads into improving the density of the tracked features by using multiple depth cues to cater for the complex tissue geometry in vivo. However, these approaches generally do not explicitly model nonlinear tissue deformation and may be susceptible to drift and occlusion.
The purpose of this paper is to present an online learning based feature tracking method suitable for in vivo applications. Feature tracking is formalized as a classification problem, for which we propose solutions to training the classifier with unlabeled data and adaptively updating it during the tracking process. The approach makes no assumptions about the type of image transformations or visual characteristics, enabling it to deal with nonlinear tissue deformation. It is demonstrated in this paper that with the proposed technique for general MIS scenes, as little as 0.5 seconds may be required to start building up a complete feature representation. The performance of the algorithm is compared with other conventional trackers, and validated on simulated as well as in vivo cardiovascular and abdominal MIS data. The strength of the algorithm in dealing with drift and occlusion as well as tissue deformation is demonstrated.
2 Methods
2.1 Learning Based Feature Tracking
The effectiveness of a feature tracking algorithm is largely determined by how the appearance of the feature is represented. This consists of two elements: firstly, which information to encode (e.g. color, edges, intensity, texture, gradient) and secondly, how to represent the encoded information (e.g. the use of probability density histograms, histograms of gradients, templates, points, contours, active appearance models). The choice of what information to encode and how to represent the encoded information is context specific. For example, in [5] mean-shift is used to track deformable objects by making the assumption that color is the most salient information to encode. Approaches such as SIFT [6] represent scale information as gradient orientation histograms.
It should be noted that these methods make ad hoc assumptions about which information will be most discriminative and how to encode it. They work well if the underlying assumptions hold. In MIS, changes in lighting and specular highlights can significantly alter the 2D appearance of the tissue. These environmental factors are exacerbated by 3D nonlinear tissue deformation. This makes ad hoc modeling of tissue appearance for consistent tracking difficult. Alternatively, it is possible to learn which information is most discriminative and how best to encode it. This concept has been adopted in handwriting recognition [8], object detection [9] and corner detection
[10] by learning offline the most discriminative representation of the data. Offline learning requires prior knowledge of the data, which is not available during tissue tracking. In this paper, an online learning scheme is developed to extract discriminative information adaptively as the tracking process progresses, such that it can cater for tissue deformation and environment changes while remaining robust to drift and occlusion. The main steps of the algorithm are outlined in Fig. 1 and detailed in the following sections.
Fig. 1. A diagrammatic overview of the proposed learning based online tracking system
The algorithm comprises five main steps: 1-2) feature tracking is initially performed using the Lucas Kanade (LK) [7] algorithm and then with our online approach; 3) the online tracker is built and a representation for each feature is learned; 4) the feature representation is adapted and updated; and 5) the tracker is evaluated.
Building the Online Tracker
The feature tracking problem can be formalized as a classification problem in which the goal is to classify the feature in a new image as a true match and all other features as false matches. In order to train a classifier we require a set of training data with true and false labels. Such training data can be obtained synthetically, if the appearance of the feature can be well modeled, or through manual labeling for offline learning. Neither of these approaches is suitable for tracking tissue; the classifier therefore needs to be trained from unlabeled data.
To solve this problem, we propose to extract the training data online while the features are tracked. In this paper, features are extracted using Difference of Gaussian [6] and Shi and Tomasi [11] detectors and initially tracked using an LK tracker. The key to our proposed method is to learn what information will be most appropriate for tracking; the training data therefore consist of a number of image patches extracted directly from the image. The LK tracker enables the generation of a labeled set of true matches for the classifier. The set of false matches is then taken from the local area around the tracked feature. The labeled data enable a set of patches $S$ to be partitioned into two sets $S_t$ and $S_f$ representing 'true' and 'false' matches. An ID3 [12] decision tree is then used to iteratively partition $S$. For each patch in set $S$, a test compares two pixel values to identify whether the first pixel is greater than, similar to, or less in value than the second. The entropy of each subset is measured to identify the test that provides the maximum information and therefore the best partition, i.e.,
$H(S) = |S|\log_2 |S| - |S_t|\log_2 |S_t| - |S_f|\log_2 |S_f|$   (1)
Exhaustive search using Eq. (1) is computationally prohibitive; instead, this is solved by computing the log likelihood ratios [13] between the distributions of $S_t$ and $S_f$ and applying a variance ratio to find the optimum solution. At each pixel location, we create histograms $t(x,y)$ and $f(x,y)$ and calculate the log likelihood

$L(x,y) = \log \dfrac{\max(t(x,y), \delta)}{\max(f(x,y), \delta)}$   (2)
where $\delta$ is set to 0.001 to avoid division by zero. The variance ratio of the log likelihood is used to quantify the separation of the two classes, i.e.,

$V(L; t, f) = \dfrac{\mathrm{var}\big(L; (t+f)/2\big)}{\mathrm{var}(L; t) + \mathrm{var}(L; f)}$   (3)
where

$\mathrm{var}(L; a) = \sum_i a(i)L(i)^2 - \Big(\sum_i a(i)L(i)\Big)^2$   (4)

given the discrete probability density function $a$. This provides a measure of intra- and inter-class variance, and is capable of handling multimodal distributions, unlike linear discriminant analysis.
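Eqs. (2)-(4) translate directly into a short numpy sketch (illustrative only; the histogram binning and normalization details are assumptions):

```python
import numpy as np

def log_likelihood_map(t_hist, f_hist, delta=0.001):
    """Eq. (2): L = log(max(t, delta) / max(f, delta)), computed from the
    'true' and 'false' value histograms t and f."""
    return np.log(np.maximum(t_hist, delta) / np.maximum(f_hist, delta))

def weighted_var(L, a):
    """Eq. (4): var(L; a) = sum_i a(i) L(i)^2 - (sum_i a(i) L(i))^2 for a
    discrete probability density a."""
    return float(np.sum(a * L ** 2) - np.sum(a * L) ** 2)

def variance_ratio(t_hist, f_hist, delta=0.001):
    """Eq. (3): inter-class over total intra-class variance of the log
    likelihood; higher values mean the two classes are better separated."""
    t = t_hist / t_hist.sum()
    f = f_hist / f_hist.sum()
    L = log_likelihood_map(t, f, delta)
    return weighted_var(L, 0.5 * (t + f)) / (weighted_var(L, t) + weighted_var(L, f))
```

Well-separated true/false histograms produce a larger variance ratio than heavily overlapping ones, which is the criterion exploited here and in the color-space selection described later.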
Tracking Features in New Frames
A search area is defined based on the position of the feature in the last frame, and at each point $p(x,y)$ in the search region, the patch around that point is classified using the decision tree. This classification can be performed quickly, as the tests are simple and the false matches can be readily identified with only a few tests. This classification step results in a number of candidate points in the search region which represent the potential location of the feature. The feature is localized by examining the probability distribution $P(N_j) = |S_{tj}| / |S_t|$ at tree node $N_j$ to determine if it is a correct match, where $|S_{tj}|$ is the number of true matches classified by node $N_j$ and $|S_t|$ is the number of true matches classified by the entire tree. The best candidate point $p_{x,y}$ in the search area is then selected using the node distribution and a Gaussian kernel centered on the last known position.
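The localization step can be sketched as follows; the candidate format, the multiplicative combination of node score and kernel weight, and the value of sigma are illustrative assumptions rather than details from the paper:

```python
import math

def localize(candidates, last_pos, sigma=8.0):
    """Pick the best candidate in the search region. Each candidate carries
    the node score P(N_j) = |S_tj| / |S_t| from the decision-tree node it
    reached; the score is weighted by a Gaussian kernel centered on the last
    known feature position. `candidates` is a list of ((x, y), node_score)."""
    def weight(pos, score):
        dx, dy = pos[0] - last_pos[0], pos[1] - last_pos[1]
        return score * math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
    return max(candidates, key=lambda c: weight(c[0], c[1]))[0]
```

The kernel penalizes candidates far from the previous position, so a slightly lower-scoring but nearby candidate is preferred over a distant false positive.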
Evaluating and Improving the Online Tracker Performance
Building the decision tree can be computationally intensive if the data set is large; testing the performance of the classifier, however, is relatively fast. This is exploited in the proposed algorithm to adaptively build classification trees that best fit the observed data. The tree is built initially with a small set of data; its classification performance is then evaluated and the classifier further improved. The
online tracker's performance is evaluated by measuring the classification accuracy on the current data set. The metric used here is the false negative rate of the classifier, for which a high value indicates the classification tree is not suited to the data and its inherent information needs to be further exploited. False negatives indicate mismatches by the tracker where the test and distribution $P(N_j)$ at node $N_j$ are not ideal representations of the data. Instead of rebuilding the entire tree, we can simply reclassify all the patches at node $N_j$, adding the incorrectly classified patch. This has the effect of shifting the distribution to better represent all the observed data and may lead to new nodes being added to the tree. The final adaptive step in the update is to select the most discriminative color space for tracking. This follows the criterion set out in [13], where 49 color spaces are searched to identify the most discriminative using the variance ratio of Eq. (3).
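The evaluation metric driving this update can be sketched as below; the update threshold is an illustrative assumption, not a value from the paper:

```python
def false_negative_rate(predictions, labels):
    """Fraction of true patches the classifier rejected. A high rate signals
    that the current tree no longer fits the observed data."""
    fn = sum(1 for p, l in zip(predictions, labels) if l and not p)
    pos = sum(labels)
    return fn / pos if pos else 0.0

def needs_update(predictions, labels, threshold=0.2):
    """Trigger node-level reclassification (rather than a full tree rebuild)
    when the false negative rate exceeds an assumed threshold."""
    return false_negative_rate(predictions, labels) > threshold
```

When `needs_update` fires, only the patches at the offending node are reclassified together with the misclassified patch, keeping the update cheap.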
2.2 Extracting Intrinsic Tissue Motion
To demonstrate the practical application of this technique, the cardiac and respiratory motions of the tissue are extracted by performing Independent Component Analysis (ICA) of the tracked features. ICA is a statistical technique for separating signals into additive subcomponents assuming mutual statistical independence. ICA can be formulated by considering the recovered 3D motion (computed using stereo geometry) of the tissue surface to be the latent variables $m = (x, y, z)$ and the components of intrinsic motion to be $s = (h, r)$. It attempts to find the transformation $W$ such that $s = Wm + n$, where $n$ is zero-mean Gaussian noise. The components of $m$ can be written as the weighted sum of the independent components, i.e., $m = \sum_k a_k s_k$, where $a_k$ is a vector of mixing weights which make up the mixing matrix $A = (a_1 \ldots a_n)$, with $W = A^{-1}$. The source $s$ and the mixing matrix $A$ are estimated adaptively, with each component $s_k = w_k^T m$ chosen to maximize non-Gaussianity.
3 Experiments and Results
To evaluate the performance of the proposed online learnt tracker, results from simulated, porcine and in vivo data are compared to those from four conventional tracking techniques (Lucas Kanade [7] with template update, SIFT [6], and two mean-shift algorithms [13]).
3.1 Simulated Experiment with Known Ground Truth
For synthetic data, an image from an MIS procedure was taken and textured onto a 3D mesh, which was then warped with a mixture of Gaussians model to simulate the cardiac and respiratory induced tissue deformation. The mesh was projected onto a virtual camera for subsequent feature tracking. To better represent real-life data, Gaussian noise was added to the images. Fig. 2 (b) illustrates the tracking result for
the synthetic data. It is evident that the LK tracker performed relatively well at the beginning of the experiment, but its performance rapidly declines due to error propagation resulting from tissue deformation. The detect/match approach of SIFT in this case also performed poorly: the number of points does not decline over time, but oscillates as the tissue deforms, making it less attractive for continuous in vivo tracking. The performance of the two mean-shift algorithms is similar. This is not surprising, as mean-shift only works well on self-contained blobs of distinct color, a condition that rarely holds for in vivo applications. Large movements can also cause the feature to fall outside the tracker's basin of attraction, which further contributes to the relatively poor performance achieved. For the proposed tracker with online learning, the overall performance is maintained, and the derived sensitivity outperforms all of the alternative techniques compared.

Fig. 2. Relative tracking performance on synthetic data for the different tracking algorithms considered. (a) The simulated data obtained by warping an image taken from an MIS procedure with known ground truth deformation characteristics. (b) Relative performance values for the five tracking techniques compared: green - our online learnt tracker, red - SIFT, dark blue - Lucas Kanade, black - mean-shift 1, light blue - mean-shift 2.

Fig. 3. Relative tracking performance for in vivo sequences. (a,c,e,g,i) Example frames taken from in vivo data available at [14]; (b,d,f,h,j) the associated quantitative analysis results of the tracking algorithms. Five trackers are compared: green - our online learnt tracker, red - SIFT, dark blue - Lucas Kanade, black - mean-shift 1, light blue - mean-shift 2.
3.2 In Vivo Experiments
The performance of the proposed learning based online tracker was quantitatively evaluated on five in vivo sequences. The ground truth for the tracked features was obtained manually at 50 frame intervals. Fig. 3 demonstrates the five sequences used and the corresponding tracking results as compared to the four conventional tracking algorithms. Figs. 3 (a-d) show two beating heart sequences where artifacts due to bleeding, specular reflections and instrument occlusion have introduced significant problems to the LK, SIFT and mean-shift trackers. Similar to the synthetic experiment, the LK tracker exhibits drift and its performance degrades as the tracking process progresses. The SIFT and mean-shift trackers perform worse in the sequence shown in Fig. 3 (a) than in Fig. 3 (b), as deformation in this sequence is more pronounced. The graphs in Figs. 3 (b) and (d) show more features can be tracked using the learning based method.
The effect of introducing instrument occlusion into the surgical field-of-view is shown in Figs. 3 (e-j). Deformation in these sequences arises mainly from respiration. In Fig. 3 (e), only a small number of features are occluded, whereas in Fig. 3 (g) the number is increased; in Fig. 3 (i), almost all features are occluded at some point. Tracking in the last sequence is made more difficult as a suction device is used to remove blood, significantly changing the appearance of features.
Fig. 4. (a) A single feature tracked over time, showing drift with LK tracking in blue and the robustness of our approach in green. (b) The problem of occlusion by a tool: green - our online learnt tracker, red - SIFT; SIFT tracking is not continuous. (c) The first and (d) second components recovered using ICA from online tracking for motion compensation.
It is evident that once the feature is lost in the LK and mean-shift trackers, it can no
longer be recovered. Detect and match tracking approaches such as SIFT are naturally
suited to dealing with occlusion, but are not well suited to continuous tracking of
deforming tissue. In contrast, the proposed learning based online tracker holds well
for the experiment performed.
Figs. 4 (a) and (b) illustrate the problems of drift and occlusion in 3D spatio-temporal plots for the different techniques considered in this study, and demonstrate how feature representation with online learning can successfully overcome these problems. In this example, the online tracker was used to track deformation of the epicardial surface, and the resulting ICA motion extraction shown in Figs. 4 (c) and (d) clearly depicts cardiac and respiratory induced deformation.
4 Conclusion
In this paper, we have proposed a novel approach for feature tracking with online learning. The approach has been validated on simulated, porcine and in vivo data and compared to four conventional tracking techniques. We have demonstrated that the technique is robust to drift and capable of recovering from occlusion. The proposed technique is well suited to dealing with deforming tissue and unknown image transformations. Robust feature tracking is important for a range of applications in robotic assisted MIS, including real-time depth recovery, pre- and intra-operative image registration, as well as prescribing dynamic active constraints.
References
1. Wengert, C., Bossard, L., Häberling, A., Baur, C., Székely, G., Cattin, P.C.: Endoscopic Navigation for Minimally Invasive Suturing. In: Ayache, N., Ourselin, S., Maeder, A. (eds.) MICCAI 2007, Part II. LNCS, vol. 4792, pp. 620–627. Springer, Heidelberg (2007)
2. Ortmaier, T., Groger, M., Boehm, D.H., Falk, V., Hirzinger, G.: Motion Estimation in Beating Heart Surgery. IEEE Trans. on Biomedical Engineering 52, 1729–1740 (2005)
3. Mountney, P., Lo, B.P.L., Thiemjarus, S., Stoyanov, D., Yang, G.-Z.: A Probabilistic Framework for Tracking Deformable Soft Tissue in Minimally Invasive Surgery. In: Ayache, N., Ourselin, S., Maeder, A. (eds.) MICCAI 2007, Part II. LNCS, vol. 4792, pp. 34–41. Springer, Heidelberg (2007)
4. Mylonas, G., Stoyanov, D., Deligianni, F., Darzi, A., Yang, G.-Z.: Gaze-contingent soft tissue deformation tracking for minimally invasive robotic surgery. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3749, pp. 843–850. Springer, Heidelberg (2005)
5. Comaniciu, D., Ramesh, V., Meer, P.: Kernel-Based Object Tracking. IEEE Trans. on Pattern Analysis and Machine Intelligence 25, 564–577 (2003)
6. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60, 91–110 (2004)
7. Lucas, B.D., Kanade, T.: An Iterative Image Registration Technique with an Application to Stereo Vision. In: Proc. IJCAI, pp. 674–679 (1981)
8. Amit, Y., Geman, D.: Shape Quantization and Recognition with Randomized Trees. Neural Computation 9(7), 1545–1588 (1997)
9. Lepetit, V., Lagger, P., Fua, P.: Randomized trees for real-time keypoint recognition. In: Proc. CVPR, vol. 2, pp. 775–781 (2005)
10. Rosten, E., Drummond, T.: Machine learning for high-speed corner detection. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3951, pp. 430–443. Springer, Heidelberg (2006)
11. Shi, J., Tomasi, C.: Good Features to Track. In: Proc. CVPR, pp. 593–600 (1994)
12. Quinlan, J.R.: Induction of decision trees. Machine Learning 1(1), 81–106 (1986)
13. Collins, R., Liu, Y., Leordeanu, M.: On-Line Selection of Discriminative Tracking Features. IEEE Trans. Pattern Analysis and Machine Intelligence 27(10), 1631–1643 (2005)
14. Imperial College Visual Information Processing in vivo database, http://vip.doc.ic.ac.uk/
... Despite recent efforts in adapted well-known feature detecting and tracking algorithms such as Kanade-Lucas-Tomasi (KLT) Tracker, most of the proposed prototypes [6], [20], [14] [25], [7] fail to provide a reliable and accurate long-term tracking under a surgical environment [15]. This is mostly due to the challenges posed by endoscopic imagery such as dynamic nature of the surgical environment, occlusions, sudden tissue deformations, specular highlights, image clutter caused by blood or smoke, and large texture-less areas [21]. ...
... As a result, off-the-shelf computer vision approaches simply fail when applied to the endoscopic images and usually require major revisions in order to make them applicable to such scenarios. Different approaches taken by scientist in order to address poor performance of of KLT includes exploiting Extended Kalman Filter(EKF) to utilize temporal information [7], on-line appearance learning and treating tracking as a classification problem [20], Thin Plate Spline (TPS) to track deforming surface [14], fusing intensity from stereo pair images for intensity matching [29], hierarchical feature matching [28]. ...
Preprint
Full-text available
Feature tracking is the building block of many applications such as visual odometry, augmented reality, and target tracking. Unfortunately, the state-of-the-art vision-based tracking algorithms fail in surgical images due to the challenges imposed by the nature of such environments. In this paper, we proposed a novel and unified deep learning-based approach that can learn how to track features reliably as well as learn how to detect such reliable features for tracking purposes. The proposed network dubbed as Deep-PT, consists of a tracker network which is a convolutional neural network simulating cross-correlation in terms of deep learning and two fully connected networks that operate on the output of intermediate layers of the tracker to detect features and predict trackability of the detected points. The ability to detect features based on the capabilities of the tracker distinguishes the proposed method from previous algorithms used in this area and improves the robustness of the algorithms against dynamics of the scene. The network is trained using multiple datasets due to the lack of specialized dataset for feature tracking datasets and extensive comparisons are conducted to compare the accuracy of Deep-PT against recent pixel tracking algorithms. As the experiments suggest, the proposed deep architecture deliberately learns what to track and how to track and outperforms the state-of-the-art methods.
... With the advancement of machine learning methods, contextual information will play an increasingly important role in future visual tracking studies. Under this circumstance, only (Mountney, 2008) proposed a new technique for accurate tracking of deformed tissue organs based on context-specific feature descriptors that can adapt to the tissue environment and identify the most discriminative tracking information. A tracking method that uses contextual information to merge general constraints on object shape and motion typically performs better than tracking methods that do not utilize this information. ...
Article
Full-text available
The cross fusion of rehabilitation medicine and computer graphics is becoming a research hotspot. Due to the problems of low definition and unobvious features of the initial video data of medical images, the initial data is filtered and enhanced by adding image preprocessing, including image rotation and contrast enhancement, in order to improve the performance of the tracking algorithm. For the moving barium meal, the discrete point tracking and improved inter frame difference method are proposed; for the position calibration of tissues and organs, the Kernel Correlation Filter (KCF) and Discriminative Scale Space Tracker (DSST) correlation filtering method and the corresponding multi-target tracking method are proposed, and the experimental results show that the tracking effect is better. The two algorithms modify each other to further improve the accuracy of calibration and tracking barium meal flow and soft tissue organ motion, and optimize the whole swallowing process of moving target tracking model.
... location, scale, orientation, etc.) of the object 1 in the subsequent (target candidate 2 ) frames with the help of prior knowledge on the state of the object of a single (target 3 ) frame. It is widely applicable in computer vision such as surveillance [95,63], traffic transportation (including traffic flow monitoring [184], traffic accident [110,211], pedestrian counting [58,97]), vehicle navigation [128,206], human computer interaction (like hand gesture recognition [61,48,80] and mobile video conferencing [180]), medical imaging [45,117,164], biological research [82,223], video indexing [193,262], etc. ...
Article
Full-text available
Object tracking is a very interesting problem in computer vision. Numerous algorithms have been developed to solve object tracking problems for several decades. Among various techniques, in this article, we review most of the existing traditional supervised machine learning-based moving object tracking approaches before the year 2017. We also discuss the several evaluation measures and various datasets considered in the literature. We hope that this survey helps the readers to acquire valuable knowledge about the literature of traditional supervised learning-based tracking algorithms and to choose the most suitable algorithm for their particular tracking tasks.
... location, scale, orientation, etc.) of the object 1 in the subsequent (target candidate 2 ) frames with the help of prior knowledge on the state of the object of a single (target 3 ) frame. It is widely applicable in computer vision such as surveillance [95,63], traffic transportation (including traffic flow monitoring [184], traffic accident [110,211], pedestrian counting [58,97]), vehicle navigation [128,206], human computer interaction (like hand gesture recognition [61,48,80] and mobile video conferencing [180]), medical imaging [45,117,164], biological research [82,223], video indexing [193,262], etc. ...
Preprint
Full-text available
Object tracking is a very interesting problem in computer vision. Numerous algorithms have been developed to solve object tracking problems for several decades. Among various techniques, in this article, we review most of the existing traditional supervised machine learning-based moving object tracking approaches before the year 2017. We also discuss the several evaluation measures and various datasets considered in the literature. We hope that this survey helps the readers to acquire valuable knowledge about the literature of traditional supervised learning-based tracking algorithms and to choose the most suitable algorithm for their particular tracking tasks.
... Learning strategies have also been applied to soft tissue tracking in MIS. Mountney and Yang [61] introduced an online learning framework that updates the feature tracker over time by selecting correct features using decision tree classification. Ye et al. [62] proposed a detection approach that incorporates a structured support vector machine (SVM) and online random forest for re-targeting a preselected optical biopsy region on soft tissue surfaces of the GI tract. ...
Preprint
Artificial Intelligence (AI) is gradually changing the practice of surgery with the advanced technological development of imaging, navigation and robotic intervention. In this article, the recent successful and influential applications of AI in surgery are reviewed from pre-operative planning and intra-operative guidance to the integration of surgical robots. We end with summarizing the current state, emerging trends and major challenges in the future development of AI in surgery.
... Work on the imaging part of motion compensation involves tracking of features on the heart [17,18] and image stabilization. For the latter, interesting work is presented in Ref. [19], where the image was rectified using the CUDA framework. ...
Chapter
Full-text available
Motion compensation in coronary artery bypass graft surgery refers to the virtual stabilization of the beating heart, along with the mechanical synchronization of the robotic arms with the pulsating heart surface. The stabilized image of the heart is presented to the surgeon to operate on, while the heart motion is compensated by the robot, and the surgeon essentially operates on a virtual still heart. In this chapter, we present an introduction to the concept of motion compensation and a brief history of research efforts. We analyze a unifying framework which naturally binds together the image stabilization, mechanical synchronization, and shared control tasks. This framework serves as a baseline upon which more complicated assistive modes are built, for example, active and haptic assistance. These modes are discussed more thoroughly, and their efficacy is assessed via laboratory experimental trials in a simulation setup, which are presented in detail.
Article
Visual detection and tracking of minimally invasive surgical instruments is one of the core algorithms of minimally invasive surgical robots. With the development of machine vision and robotics, related technologies such as virtual reality, three-dimensional reconstruction, path planning, and human-machine collaboration can be applied to surgical operations to assist clinicians or to complete clinical procedures with surgical robots. The detection and tracking algorithm analyzes the images transmitted by the surgical robot's endoscope and extracts the position of the instrument tip in the image, thereby supporting surgical navigation. This technology can greatly improve the accuracy and success rate of surgical operations. The purpose of this paper is to further study visual detection and tracking of minimally invasive surgical instruments, summarize the existing research results, and apply them to a surgical robot project. The authors survey the theoretical basis and related algorithms of this technology in recent years, then compare the accuracy, speed, and application scenarios of each algorithm and analyze their respective advantages and disadvantages. The papers included in the review were selected through Web of Science, Google Scholar, PubMed and CNKI searches using the keywords "object detection", "object tracking", "surgical tool detection", "surgical tool tracking", "surgical instrument detection" and "surgical instrument tracking", limiting results to the years 1985-2021. The study shows that this technology has strong prospects for future improvements in accuracy and real-time performance.
Article
Full-text available
Computer vision is an important cornerstone for the foundation of many modern technologies. The development of modern computer‐aided‐surgery, especially in the context of surgical navigation for minimally invasive surgery, is one example. Surgical navigation provides the necessary spatial information in computer‐aided‐surgery. Amongst the various forms of perception, vision‐based sensing has been proposed as a promising candidate for tracking and localisation application largely due to its ability to provide timely intra‐operative feedback and contactless sensing. The motivation for vision‐based sensing in surgical navigation stems from many factors, including the challenges faced by other forms of navigation systems. A common surgical navigation system performs tracking of surgical tools with external tracking systems, which may suffer from both technical and usability issues. Vision‐based tracking offers a relatively streamlined framework compared to those approaches implemented with external tracking systems. This review study aims to discuss contemporary research and development in vision‐based sensing for surgical navigation. The selected review materials are expected to provide a comprehensive appreciation of state‐of‐the‐art technology and technical issues enabling holistic discussions of the challenges and knowledge gaps in contemporary development. Original views on the significance and development prospect of vision‐based sensing in surgical navigation are presented.
Article
Artificial intelligence (AI) is gradually changing the practice of surgery with technological advancements in imaging, navigation, and robotic intervention. In this article, we review the recent successful and influential applications of AI in surgery from preoperative planning and intraoperative guidance to its integration into surgical robots. We conclude this review by summarizing the current state, emerging trends, and major challenges in the future development of AI in surgery.
Article
Full-text available
We explore a new approach to shape recognition based on a virtually infinite family of binary features (queries) of the image data, designed to accommodate prior information about shape invariance and regularity. Each query corresponds to a spatial arrangement of several local topographic codes (or tags), which are in themselves too primitive and common to be informative about shape. All the discriminating power derives from relative angles and distances among the tags. The important attributes of the queries are a natural partial ordering corresponding to increasing structure and complexity; semi-invariance, meaning that most shapes of a given class will answer the same way to two queries that are successive in the ordering; and stability, since the queries are not based on distinguished points and substructures. No classifier based on the full feature set can be evaluated, and it is impossible to determine a priori which arrangements are informative. Our approach is to select informative features and build tree classifiers at the same time by inductive learning. In effect, each tree provides an approximation to the full posterior where the features chosen depend on the branch that is traversed. Due to the number and nature of the queries, standard decision tree construction based on a fixed-length feature vector is not feasible. Instead we entertain only a small random sample of queries at each node, constrain their complexity to increase with tree depth, and grow multiple trees. The terminal nodes are labeled by estimates of the corresponding posterior distribution over shape classes. An image is classified by sending it down every tree and aggregating the resulting distributions. The method is applied to classifying handwritten digits and synthetic linear and nonlinear deformations of three hundred LaTeX symbols. State-of-the-art error rates are achieved on the National Institute of Standards and Technology database of digits. The principal goal of the experiments on LaTeX symbols is to analyze invariance, generalization error and related issues, and a comparison with artificial neural network methods is presented in this context.
Conference Paper
Full-text available
Where feature points are used in real-time frame-rate applications, a high-speed feature detector is necessary. Feature detectors such as SIFT (DoG), Harris and SUSAN are good methods which yield high quality features, however they are too computationally intensive for use in real-time applications of any complexity. Here we show that machine learning can be used to derive a feature detector which can fully process live PAL video using less than 7% of the available processing time. By comparison neither the Harris detector (120%) nor the detection stage of SIFT (300%) can operate at full frame rate. Clearly a high-speed detector is of limited use if the features produced are unsuitable for downstream processing. In particular, the same scene viewed from two different positions should yield features which correspond to the same real-world 3D locations [1]. Hence the second contribution of this paper is a comparison of corner detectors based on this criterion applied to 3D scenes. This comparison supports a number of claims made elsewhere concerning existing corner detectors. Further, contrary to our initial expectations, we show that despite being principally constructed for speed, our detector significantly outperforms existing feature detectors according to this criterion.
Conference Paper
Full-text available
Image registration finds a variety of applications in computer vision. Unfortunately, traditional image registration techniques tend to be costly. We present a new image registration technique that makes use of the spatial intensity gradient of the images to find a good match using a type of Newton-Raphson iteration. Our technique is faster because it examines far fewer potential matches between the images than existing techniques. Furthermore, this registration technique can be generalized to handle rotation, scaling and shearing. We show how our technique can be adapted for use in a stereo vision system.
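The gradient-based matching described in the abstract above can be illustrated with a minimal 1-D sketch: a translation-only version of the Newton-Raphson style update, in which the spatial intensity gradient linearizes the residual and each iteration refines the estimated shift. The function name and the 1-D, translation-only setting are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def register_translation(template, image, iters=20):
    """Estimate a sub-sample shift d so that image(x + d) ~ template(x),
    via a gradient-linearized (Newton-Raphson style) update."""
    x = np.arange(len(template), dtype=float)
    d = 0.0
    for _ in range(iters):
        warped = np.interp(x + d, x, image)   # image resampled at shifted coordinates
        grad = np.gradient(warped)            # spatial intensity gradient
        err = template - warped               # residual to be driven to zero
        denom = np.sum(grad * grad)
        if denom < 1e-12:
            break
        d += np.sum(grad * err) / denom       # normal-equation step for the shift
    return d
```

For a sub-sample shift between two smooth signals a handful of iterations typically suffices; the same normal-equation step generalizes to 2-D warps with rotation, scaling and shearing by adding parameters to the warp model.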
Article
Full-text available
This paper presents an online feature selection mechanism for evaluating multiple features while tracking and adjusting the set of features used to improve tracking performance. Our hypothesis is that the features that best discriminate between object and background are also best for tracking the object. Given a set of seed features, we compute log likelihood ratios of class conditional sample densities from object and background to form a new set of candidate features tailored to the local object/background discrimination task. The two-class variance ratio is used to rank these new features according to how well they separate sample distributions of object and background pixels. This feature evaluation mechanism is embedded in a mean-shift tracking system that adaptively selects the top-ranked discriminative features for tracking. Examples are presented that demonstrate how this method adapts to changing appearances of both tracked object and scene background. We note susceptibility of the variance ratio feature selection method to distraction by spatially correlated background clutter and develop an additional approach that seeks to minimize the likelihood of distraction.
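As a rough illustration of the feature-ranking step described in the abstract above, the sketch below scores candidate features with a two-class variance ratio: total spread divided by the within-class spread of object and background samples, so a feature scores highly when it separates the two classes tightly. It operates on raw feature values rather than the paper's binned log-likelihood-ratio features, and all names here are illustrative assumptions.

```python
import numpy as np

def variance_ratio(obj_vals, bg_vals):
    """Two-class variance ratio: between-class spread over within-class spread."""
    all_vals = np.concatenate([obj_vals, bg_vals])
    within = np.var(obj_vals) + np.var(bg_vals)
    return np.var(all_vals) / max(within, 1e-12)

def rank_features(obj_samples, bg_samples):
    """Rank candidate features (columns) by descending variance ratio."""
    scores = [variance_ratio(obj_samples[:, i], bg_samples[:, i])
              for i in range(obj_samples.shape[1])]
    return np.argsort(scores)[::-1]
```

In an adaptive tracker this ranking would be recomputed as appearance changes, with the top-ranked features fed to the tracking stage each frame.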
Article
This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbour algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through a least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Article
The technology for building knowledge-based systems by inductive inference from examples has been demonstrated successfully in several practical applications. This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, and it describes one such system, ID3, in detail. Results from recent studies show ways in which the methodology can be modified to deal with information that is noisy and/or incomplete. A reported shortcoming of the basic algorithm is discussed and two means of overcoming it are compared. The paper concludes with illustrations of current research directions.
Article
Minimally invasive beating-heart surgery offers substantial benefits for the patient compared to conventional open surgery. Nevertheless, the motion of the heart places increased demands on the surgeon. To support the surgeon, algorithms for an advanced robotic surgery system are proposed, which offer motion compensation of the beating heart. This implies the measurement of heart motion, which can be achieved by tracking natural landmarks. In most cases, the investigated affine tracking scheme can be reduced to an efficient block matching algorithm allowing for real-time tracking of multiple landmarks. Fourier analysis of the motion parameters shows two dominant peaks, which correspond to the heart and respiration rates of the patient. The robustness in case of disturbance or occlusion can be improved by specially developed prediction schemes. Local prediction is well suited for the detection of single tracking outliers. A global prediction scheme takes several landmarks into account simultaneously and is able to bridge longer disturbances. As the heart motion is strongly correlated with the patient's electrocardiogram and respiration pressure signal, this information is included in a novel robust multisensor prediction scheme. Prediction results are compared to those of an artificial neural network and of a linear prediction approach, which shows the superior performance of the proposed algorithms.
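The Fourier-analysis step mentioned in the abstract above, recovering heart and respiration rates as the two dominant spectral peaks of a tracked landmark's displacement trace, can be sketched as follows; the synthetic trace, sampling rate and function name are assumptions for illustration only.

```python
import numpy as np

def dominant_rates(signal, fs):
    """Return the frequencies (Hz) of the two largest spectral peaks in a
    landmark displacement trace, e.g. respiration (~0.25 Hz) and heartbeat (~1.25 Hz)."""
    signal = np.asarray(signal, dtype=float) - np.mean(signal)  # drop DC offset
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    order = np.argsort(spectrum)[::-1]      # bins sorted by descending magnitude
    return sorted(freqs[order[:2]])         # lower rate first (respiration, heart)
```

With the two rates identified, the corresponding spectral components can be separated by band-pass filtering, which is one way to decouple cardiac from respiratory motion.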
Conference Paper
The introduction of surgical robots in Minimally Invasive Surgery (MIS) has allowed enhanced manual dexterity through the use of microprocessor controlled mechanical wrists. Although fully autonomous robots are attractive, both ethical and legal barriers can prohibit their practical use in surgery. The purpose of this paper is to demonstrate that it is possible to use real-time binocular eye tracking for empowering robots with human vision by using knowledge acquired in situ. By utilizing the close relationship between the horizontal disparity and the depth perception varying with the viewing distance, it is possible to use ocular vergence for recovering 3D motion and deformation of the soft tissue during MIS procedures. Both phantom and in vivo experiments were carried out to assess the potential frequency limit of the system and its intrinsic depth recovery accuracy. The potential applications of the technique include motion stabilization and intra-operative planning in the presence of large tissue deformation.