
Instrument State Recognition and Tracking for Effective Control of Robotized Laparoscopic Systems



Surgical robots are an important component for delivering advanced, paradigm-shifting technology such as image-guided surgery and navigation. However, for robotic systems to be readily adopted into the operating room, they must be easy and convenient to control and must facilitate a smooth surgical workflow. In minimally invasive surgery, the laparoscope may be held by a robot, but controlling and moving the laparoscope remains challenging. It is disruptive to the workflow for the surgeon to put down the tools to move the robot, in particular for solo surgery approaches. This paper proposes a novel approach for naturally controlling the position of the robot-mounted laparoscope by detecting a surgical grasping tool and recognizing whether its state is open or closed. This approach does not require markers or fiducials and uses a machine learning framework for tool and state recognition which exploits naturally occurring visual cues. Furthermore, a virtual user interface on the laparoscopic image is proposed that uses the surgical tool as a pointing device to overcome common problems in depth perception. Instrument detection and state recognition are evaluated on in-vivo and ex-vivo porcine datasets. To demonstrate the practical surgical application and real-time performance, the system is validated in a simulated surgical environment.
Manish Sahu, Daniil Moerman, and Philip Mewes
Siemens AG, Healthcare Sector, Forchheim, Germany
Peter Mountney
Siemens Corporation, Corporate Technology, Princeton, NJ, USA
Georg Rose
Otto-von-Guericke University Magdeburg
Index Terms: instrument tracking, laparoscopic surgery,
machine learning, surgical robotics, visual servoing
Surgical robots have greatly changed the way many
procedures are performed. However, there are still a large
number which could benefit from robotic platforms and
the advanced imaging they can facilitate. One of the
barriers for integrating robotics into the operating room
(OR) is robotic control. Fully autonomous control has
regulatory challenges and therefore current research
focuses on developing intuitive control interfaces which
enhance surgical workflow in the challenging OR.
Manuscript received May 11, 2015; revised October 21, 2015.
For minimally invasive abdominal procedures, having
a robot with a small footprint which can control the
laparoscope has been a goal for a long time [1], [2]. The
key benefit is to facilitate solo surgery. To control the
laparoscope’s motion a number of solutions have been
proposed. A joystick [3] can be used, but this requires the
surgeon to put down their tools eventually. The AESOP
system [1] uses pre-defined voice commands and the
EndoAssist [2] system uses head gestures captured from a
tracker mounted on the surgeon’s head. Reference [4] introduces the
concept of gaze-contingent control, and [5] proposes a
fully automated motion compensation system.
Translating these approaches to the OR can be
challenging because they are not well suited either to the
OR environment (noisy, dynamic, space constrained) or
to the surgical workflow. Robotic control should be
instinctive and fit seamlessly into the workflow without
introducing additional time-consuming tasks such as
manual interaction.
A promising area of research is the application of
visual servoing, where surgical instruments are detected
in the laparoscopic image and used to guide the robot’s
movements. This is attractive because the surgeon
already uses the tools and is comfortable controlling them,
it does not require additional hardware, and there is little
disruption to the surgical workflow. Such systems are
comprised of two components: instrument
detection/tracking and robot control.
Instrument detection can be simplified with markers or
fiducials [6], but as this requires modifying hardware, it is
preferable to use natural image features. Color space
features such as HSV with saturation enhancement [7]
can be used to segment tools, but they may be sensitive to
changes in lighting. In [8], HSV is combined with a Bayes
classifier to detect tool parts, and the type of instrument
is detected by comparing against 3D models. 3D models
can be used to improve instrument detection [9], and
Reiter et al. [10] used a Random Forest classifier to detect
specific parts of articulated instruments and fuse these in
3D using stereo. Such approaches require a 3D model or
are focused on detecting the pose of the instrument but
not the state (open or closed grasper).
Journal of Mechanical Engineering and Robotics Research Vol. 5, No. 1, January 2016
doi: 10.18178/ijmerr.5.1.33-38
© 2016 Int. J. Mech. Eng. Rob. Res.
Current vision-based, robot-controlled laparoscopic
systems [5]-[13] work by localizing the instrument
position in 2D, planning a path and moving the robot. For
controlling the depth, the geometrical relations between
the instrument [13] or the relation between the visible
tool/tools and the size of the whole scene [11], [12] are
utilized. Although the target point may be defined by a
tool, this can cause problems: first, the depth can be hard
to estimate accurately; second, the end position of the
laparoscope may not have the desired field of view, so this
approach to navigation is less intuitive.
This paper proposes an intuitive robotic navigation
system. It enables the surgeon to move a laparoscopic
camera by detecting and tracking the instruments in the
laparoscopic video. It does not require additional
hardware, fiducials or markers. Machine learning is used
to robustly detect surgical instruments and a novel
intuitive navigation system is proposed. Additionally, we
explore the feasibility of using surgical instrument state
recognition to improve surgical workflow. Instrument
detection and state recognition are evaluated on in-vivo
and ex-vivo porcine datasets and the robotic navigation
system is validated in a simulated surgical environment.
The system is comprised of a 7-axis Kuka LWR 5
robot holding a monocular HD laparoscope. The
laparoscope is inserted into the abdomen through a trocar
port and held by the laparoscopic robot. The operator
introduces a standard grasping or cutting instrument into
the abdomen through a second port and into the view of
the laparoscope. The robot control interface is overlaid on
the live laparoscopic video stream to facilitate navigation.
An overview of the system is provided in Fig. 1.
Figure 1. Replica of system setup with robot with plastic porcine liver
(up) and virtual interface (down)
A novel robot control interface is proposed which
provides natural and intuitive navigation of the
laparoscope with four degrees of freedom. A simple and
effective solution for navigating the laparoscope in/out
along the optical axis is presented which does not rely on
estimating the pose or depth of the instrument or defining
a point in 3D.
Figure 2. Control design for virtual laparoscopic interface (left to right)
The user interface (shown in Fig. 1) is displayed on the
laparoscopic video monitor and is directly overlaid on the
live video stream. The interface is only overlaid on the
laparoscopic image when the surgeon wants to adjust the
laparoscope’s position; this could, for instance, be
triggered by an input device such as a foot pedal. The
robot can only move when the interface is shown. The
user interface has two components which control two
separate types of movement (see Fig. 2):
a) Pseudo in-plane movement: triggered when the
instrument state is recognized as closed.
b) Movement in direction of the optical axis:
triggered when the instrument state is recognized
as open and the instrument position is inside
predefined regions: move in and move out.
To prevent the robot from moving as soon as the user
interface is switched on, the tracking process starts only if
the instrument is detected inside the rectangular start-up
region (red box of 640x640 pixels, see Fig. 1) and the
instrument state is open.
Pseudo in-plane movement corresponds to the natural
user navigation of moving the laparoscope up, down, left
and right. To the end user this appears to be in-plane
motion; however, because the laparoscope is inserted
through a trocar port, it has a remote center of motion and
therefore the motion is not truly in-plane. Pseudo in-plane
movement is triggered only when the instrument state is
detected as closed. If the tool is in the open state, the
in-plane robot movement is disabled. Once the instrument is
detected as closed, the deviations from the center of the
central region are computed (see Fig. 1).
Then these pixel deviations are transformed into the
robot rotational commands and transferred to
the robot.
Controller gains are applied for smooth
displacement of the robot. The robot continues to move
while the detected tool state is closed and stops when the
instrument reaches the center of the image, i.e. the pixel
deviation is zero.
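The mapping from pixel deviation to rotational command can be illustrated with a minimal sketch. It assumes a purely proportional law and takes the image center as the reference point; the function names, the gain values and the state strings are hypothetical and are not taken from the original implementation.

```python
def in_plane_command(tip_xy, image_size=(1920, 1080), gains=(0.001, 0.001)):
    """Map the tool-tip pixel deviation from the image center to two
    rotational commands (proportional control; gain values are placeholders)."""
    center_x = image_size[0] / 2.0
    center_y = image_size[1] / 2.0
    dx = tip_xy[0] - center_x   # horizontal pixel deviation
    dy = tip_xy[1] - center_y   # vertical pixel deviation
    k_x, k_y = gains            # controller gains smooth the displacement
    return k_x * dx, k_y * dy


def step(state, tip_xy):
    """Return the rotational command for one frame, or (0.0, 0.0) when
    in-plane motion is disabled because the tool is not closed."""
    if state != "closed":
        return 0.0, 0.0
    return in_plane_command(tip_xy)
```

With this form the command shrinks as the tip approaches the center and vanishes when the deviation is zero, which matches the stopping condition described above.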
Movement in direction of the optical axis of the
laparoscope corresponds to moving the laparoscope in
and out of the trocar port. The user interface defines two
regions, shown in Fig. 1 and labelled “Move in” and
“Move out”. If the tool is detected in one of these regions
in the open state, the laparoscope is advanced or
retracted along the optical axis of the laparoscopic
camera by a predefined constant value dz (see Fig. 2-2b).
This constant value is then transformed into the robot
rotational commands for movement along the optical
axis, using a controller gain.
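The two movement types can be summarized as a small decision rule. The sketch below is only illustrative: the region coordinates, the sign convention (positive meaning "move in"), the state strings and the names `Region` and `select_motion` are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Axis-aligned rectangle in image coordinates (hypothetical layout)."""
    x0: int
    y0: int
    x1: int
    y1: int

    def contains(self, xy):
        return self.x0 <= xy[0] < self.x1 and self.y0 <= xy[1] < self.y1


def select_motion(state, tip_xy, move_in, move_out, dz=1.0, k_z=1.0):
    """Pick the motion type for one frame from tool state and tip position."""
    if state == "closed":
        return ("in_plane", 0.0)      # in-plane command is computed separately
    if state == "open" and move_in.contains(tip_xy):
        return ("axial", k_z * dz)    # advance along the optical axis
    if state == "open" and move_out.contains(tip_xy):
        return ("axial", -k_z * dz)   # retract along the optical axis
    return ("none", 0.0)              # open tool outside both regions: no motion
```

Because the axial step is a constant scaled by a gain, no depth or pose estimate of the instrument is needed for this rule.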
Figure 3. Tool tracking loop
Our proposed algorithm uses a machine learning
framework for tool detection which exploits naturally
occurring visual cues. The overall instrument tracking
approach (see Fig. 3) can be broken down into three main
steps:
a) Instrument tip recognition, which includes feature
extraction and instrument detection.
b) Instrument state recognition, which determines if
the state of the instrument is ‘Open’ or ‘Close’.
c) Instrument tip tracking to increase tracking
The appearance of an instrument can change with the
factors: lighting conditions, pose variation,
scale/resolution and occlusion. Our proposed virtual
interface design helps to reduce the effect of some of
these factors by introducing some simple constraints to
the operator when s/he expects to adjust the laparoscopic
a) The operator must keep the state of the grasper
either fully open or fully closed.
b) The operator must keep the tool within the visible
range, i.e. avoid occlusion and conditions such as
extreme deformation around the instrument tip
or sudden movements causing blurring effects.
Scale is handled by using a multi-scale
object detection scheme, and the features acquired from
the grasper tool are part-based structural features which
are robust to illumination and small deformations in pose.
The remaining factors, lighting variation and pose, are
handled by training on grasper samples acquired under
different laparoscopic lighting conditions and instrument poses.
A. Feature Extraction and Learning
As mentioned in Section 1, color space features are
sensitive to lighting; we therefore focused on exploiting
structural features of the instrument grasper for the
instrument tip detection and state recognition procedures.
Local Binary Patterns (LBP) were initially presented as a
compact, discriminative texture descriptor with
tolerance against monotonic gray-scale changes caused by
illumination, at low computational cost. Uniform LBP [14]
were later introduced to reduce the negative effects
caused by noise. Uniform LBP can be viewed as an
operator which encodes information about different types
of gradients such as corners, edges, spots, flat areas, etc. The
spatial histogram of a Uniform LBP image can be used to
capture part-based structure information of the object.
Since part-based model schemes provide an expressive
description of an object’s structure by considering the
relationships between parts, they are robust to partial
occlusion and small deformation.
Adaptive Boosting is a learning technique which is
used to boost the classification performance by
combining the results of multiple “weak” classifiers into
a single “strong” classifier. In our approach, we expect a
noisy image due to specular reflections and therefore we
use Gentle AdaBoost [15] because it uses Newton
stepping instead of exact optimization at each step and
thus provides better performance when the training data is
noisy and has outliers [16]. Decision trees are fast to learn
and non-linear in nature and are thus often used as weak
learners for boosting.
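As an illustration of the boosting scheme, the following is a minimal sketch of Gentle AdaBoost in the formulation of [15]. It uses regression stumps (depth-one trees) fitted by weighted least squares as weak learners, which is a simplification of the decision trees mentioned above. All function names are hypothetical, and labels are assumed to be in {-1, +1}.

```python
import math


def fit_stump(X, y, w):
    """Weighted least-squares regression stump on a single feature.
    Returns (feature index, threshold, value if x <= t, value if x > t)."""
    best, best_err = None, math.inf
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            sums = {False: [0.0, 0.0], True: [0.0, 0.0]}  # side -> [sum w, sum w*y]
            for row, yi, wi in zip(X, y, w):
                side = row[j] > t
                sums[side][0] += wi
                sums[side][1] += wi * yi
            left = sums[False][1] / sums[False][0] if sums[False][0] > 0 else 0.0
            right = sums[True][1] / sums[True][0] if sums[True][0] > 0 else 0.0
            err = sum(wi * (yi - (right if row[j] > t else left)) ** 2
                      for row, yi, wi in zip(X, y, w))
            if err < best_err:
                best, best_err = (j, t, left, right), err
    return best


def stump_value(stump, row):
    j, t, left, right = stump
    return right if row[j] > t else left


def gentle_adaboost(X, y, rounds=10):
    """Fit `rounds` stumps; y holds labels in {-1, +1}."""
    w = [1.0 / len(X)] * len(X)
    stumps = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        stumps.append(stump)
        # Gentle AdaBoost re-weighting: w_i <- w_i * exp(-y_i * f_m(x_i)), then normalise.
        w = [wi * math.exp(-yi * stump_value(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return stumps


def decision_value(stumps, row):
    """Sum of weak-learner outputs; its sign is the predicted class."""
    return sum(stump_value(s, row) for s in stumps)
```

Each weak learner outputs a bounded real value rather than a hard vote, so a single outlier cannot receive an unbounded update. This is the usual argument for preferring the gentle variant on noisy data.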
For computation of the structural features, the image is
first converted to gray scale, and then the contrast of the
image is enhanced by histogram equalization, followed by
labelling the image with the Uniform Local Binary Pattern
(ULBP) operator. Once the image is labeled, it is divided
into 2x2 sub-windows and the histograms of the
sub-windows are concatenated into a single 1-D histogram (see
Fig. 4). These part-based structure feature descriptors are
then used to train boosted decision trees.
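The feature pipeline described above can be sketched in plain Python as follows. This is an illustrative version only: it assumes an 8-neighbour, radius-1 LBP with the common 59-label uniform mapping and an 8-bit gray-scale patch given as a list of rows. The neighbourhood, the label mapping and all function names are assumptions, since the text does not specify them.

```python
def equalize(gray):
    """Histogram equalisation of an 8-bit image given as a list of rows."""
    flat = [p for row in gray for p in row]
    hist = [0] * 256
    for p in flat:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    denom = max(len(flat) - cdf_min, 1)
    return [[round(255 * (cdf[p] - cdf_min) / denom) for p in row] for row in gray]


def _transitions(code):
    """Number of 0/1 changes in the circular 8-bit pattern."""
    bits = [(code >> i) & 1 for i in range(8)]
    return sum(bits[i] != bits[(i + 1) % 8] for i in range(8))


# The 58 uniform codes (at most two transitions) get their own label;
# all non-uniform codes share the last label.
_UNIFORM_CODES = [code for code in range(256) if _transitions(code) <= 2]
UNIFORM_LABEL = {code: label for label, code in enumerate(_UNIFORM_CODES)}
N_LABELS = len(_UNIFORM_CODES) + 1  # 59

# 8 neighbours at radius 1, listed in circular order.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def ulbp_image(gray):
    """Label every interior pixel with its uniform LBP label."""
    h, w = len(gray), len(gray[0])
    out = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                if gray[r + dr][c + dc] >= gray[r][c]:
                    code |= 1 << bit
            row.append(UNIFORM_LABEL.get(code, N_LABELS - 1))
        out.append(row)
    return out


def spatial_histogram(labels, grid=2):
    """Concatenate per-cell label histograms over a grid x grid partition."""
    h, w = len(labels), len(labels[0])
    feature = []
    for gr in range(grid):
        for gc in range(grid):
            hist = [0] * N_LABELS
            for r in range(gr * h // grid, (gr + 1) * h // grid):
                for c in range(gc * w // grid, (gc + 1) * w // grid):
                    hist[labels[r][c]] += 1
            feature.extend(hist)
    return feature


def extract_features(gray):
    """Equalise, label with ULBP, and build the concatenated 2x2 histogram."""
    return spatial_histogram(ulbp_image(equalize(gray)))
```

Under these assumptions a 64x64 patch yields a 4 x 59 = 236-dimensional descriptor. Keeping the four cell histograms separate is what preserves the coarse part layout of the grasper.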
Figure 4. Feature extraction pipeline
B. Instrument Detection
The instrument detection step consists of scanning
the laparoscopic image at multiple scales and locations
using a sliding-window object detection scheme. The features
described above are extracted from each window patch
and classified as “tool” or “no tool”. Since our
detection algorithm searches over different scales and
locations, multiple detections occur around the
instrument tip. To reduce multiple detections to a
single detection, we adopted the scheme for integrating
multiple detections from [17] and assigned the regression
value of the AdaBoost classifier as a weight to each
corresponding detected window.
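A minimal sketch of the scanning and merging steps is given below. Overlapping detections are grouped as in [17] and each group is replaced by a score-weighted mean window. The scale set, the stride, the overlap test and the function names are hypothetical, and the scores are assumed to be positive classifier outputs.

```python
def sliding_windows(width, height, base=64, scales=(1.0, 1.5, 2.0), stride_frac=0.25):
    """Yield square windows (x, y, size) over several scales (placeholder values)."""
    for s in scales:
        size = int(round(base * s))
        step = max(1, int(size * stride_frac))
        for y in range(0, height - size + 1, step):
            for x in range(0, width - size + 1, step):
                yield x, y, size


def overlaps(a, b):
    """True if axis-aligned boxes (x, y, w, h) share some area."""
    return (a[0] < b[0] + b[2] and b[0] < a[0] + a[2]
            and a[1] < b[1] + b[3] and b[1] < a[1] + a[3])


def merge_detections(boxes, scores):
    """Group overlapping boxes and return one score-weighted mean box per group."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if overlaps(boxes[i], boxes[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)

    merged = []
    for members in groups.values():
        total = sum(scores[i] for i in members)
        box = tuple(sum(scores[i] * boxes[i][k] for i in members) / total
                    for k in range(4))
        merged.append((box, total))
    return merged
```

Weighting by the classifier output pulls the merged window towards the detections the classifier is most confident about, instead of treating all overlapping windows equally.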
C. Instrument State Recognition
Instrument state recognition is a critical part of the
proposed novel approach to robotic control. Once the tool
is detected, an additional classification is performed on
the detected tool window to determine the state of the
tool, i.e. open or closed grasper. The state classification is
based on the same set of part-based structure features
mentioned above and uses a second Gentle AdaBoost
classifier.
D. Instrument Tracking
After the instrument tip is detected, a window
(320x320 pixels at the native resolution of
1920x1080 pixels) is created around the instrument tip
location, and instrument detection is performed inside this
constrained window for the next frame.
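The constrained search window can be computed as in the following sketch. Centering the window on the tip and clamping it to the frame borders are assumptions, as the text does not say how the window behaves near the image edge; the function name is hypothetical.

```python
def search_window(tip_xy, size=320, frame=(1920, 1080)):
    """Return (x, y, w, h) of a size x size window centered on the tip,
    shifted where necessary so that it stays inside the frame."""
    half = size // 2
    x0 = min(max(tip_xy[0] - half, 0), frame[0] - size)
    y0 = min(max(tip_xy[1] - half, 0), frame[1] - size)
    return x0, y0, size, size
```

Restricting detection to this window reduces the number of patches to classify per frame compared with scanning the full 1920x1080 image.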
In order to demonstrate the practical application of the
proposed robotic navigation system, a number of
validation experiments were performed to evaluate 1)
instrument detection and state recognition and 2) the
feasibility of virtual-interface-based robot navigation.
A. Datasets
To create the training samples, we acquired four ex-vivo
and two in-vivo video datasets. Each video dataset
contains multiple subsets of video data corresponding to
different lighting conditions and pose variations. From
these video datasets, we cropped the tool tip
and resized it to the base scale of 64x64 pixels to
create positive samples for the train/test data set.
Thus there are four ex-vivo and two in-vivo image
datasets, each containing images of the instrument grasper
under different lighting conditions and poses. To create the
negative sample datasets, six ex-vivo and in-vivo video
datasets from the Hamlyn video dataset [18] were used,
together with samples of regions other than the
instrument tip obtained from our own datasets. Twenty
training samples are shown in Fig. 5.
Figure 5. Example training image patches cropped to a size of 64x64
B. Classification Results on Image Patches
For the evaluation of our algorithm, we split the four
ex-vivo and two in-vivo datasets in two ways:
a) Training set: three ex-vivo and two in-vivo image
datasets; Testing set: one ex-vivo image dataset
b) Training set: four ex-vivo and one in-vivo image
datasets; Testing set: one in-vivo image dataset
Each training image dataset contains a total of 640 tool
grasper samples, with 320 samples each for open and
closed grasper, and the testing image dataset contains a
total of 128 tool grasper samples, with 64 samples each
for open and closed. To keep a balance between the
positive and negative samples and avoid over-fitting to
the negative samples, we used 3000 randomly selected
samples from the acquired negative datasets, with 2100
for the training set and 900 for the testing set.
Our testing results yield an accuracy of 98.47% and
96.63% for the detection of the grasper tool and 96.67%
and 94.32% for the state recognition of the tool (see
Table I and Table II) for the ex-vivo and in-vivo image
datasets respectively. These classification results are based
on AdaBoost classifiers with decision trees as weak
learners (discussed in section IV.A). Other classifiers,
Random Forest and linear Support Vector Machine, were
considered, but their results are not reported as they were
outperformed by AdaBoost.
Classifier | ...ty | Accuracy
Tool - No Tool | 98.61% | 96.63%
Open - Close | 95.83% | 94.32%
C. Real-Time Ex-Vivo Experiment
The robotic navigation system was evaluated in a
replica surgical environment. In this experiment, a
laparoscope is mounted on the Kuka LWR 5 and a freshly
resected pig liver is placed in the field of view of the
laparoscope. The laparoscope acquires images at a
resolution of 1920x1080 pixels at 25 frames per second. A
remote center of motion was simulated to replicate the
effect of the port on the laparoscope, and a surgical
grasper was used as the instrument. A non-expert user was
given the task of controlling the robot using the surgical
instrument. The user was able to naturally control the robot’s
motions in all degrees of freedom with a shallow learning curve.
To further validate the strength of our approach in this
experimental setup, we analyzed a total of 692 frames.
After running our proposed algorithm for tool detection, a
total of 589 frames were recognized, with 58 false detections, as
shown in Table III. Some instances from the live
experiment are shown in Fig. 6.
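The detection rate implied by these counts can be checked with a short calculation. The helper name below is hypothetical; the two counts are the ones quoted above.

```python
def detection_rate(detected_frames, total_frames):
    """Fraction of analyzed frames in which the tool was detected."""
    return detected_frames / total_frames


rate = detection_rate(589, 692)
print(f"{rate:.1%}")  # prints 85.1%
```

This agrees with the figure of roughly 85% reported for the real-time ex-vivo experiment.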
Total Frames | Detected Frames | False Detections
692 | 589 | 58
The system has been implemented in C++ on a CPU.
On a 2.70 GHz Intel Core i7, instrument detection runs
at five fps and instrument tracking at seven fps.
In this paper, a method is proposed for the challenging
problem of intuitive laparoscopic robot navigation in the
OR. The approach is motivated by consideration of
available technology of the OR and with the objective of
minimizing the disruption to the current clinical
workflow. The proposed system controls the movement
of a robotic laparoscope by detecting instruments in
laparoscope video. Machine learning is used to detect and
track the instruments and recognize instrument states
which are used to trigger robotic movement. The system
is validated on in vivo and ex vivo porcine image datasets
and the practical application of robotic control is
demonstrated on a replica surgical setup. The tool was
successfully detected in 85% of
the frames in a real-time ex-vivo experiment.
Figure 6. Example image showing detection of the instrument during
ex-vivo experiments
The authors wish to thank Vincent Agnus, Stéphane
Nicolau and Luc Soler (IHU/IRCAD Strasbourg, France)
for making the in-vivo images available within the
framework of the LASAR (Laparoscopic Assisted
Surgery with Augmented Reality) project.
[1] B. Kraft, “The AESOP robot system in laparoscopic surgery:
Increased risk or advantage for surgeon and patient?” Surgical
Endoscopy, vol. 18, pp. 1216-1223, 2004.
[2] S. Kommu, “Initial experience with the EndoAssist camera
holding robot in laparoscopic urological surgery,” Journal of
Robotic Surgery, vol. 1, pp. 133-137, 2007.
[3] P. Hourlay, “How to maintain the quality of laparoscopic surgery
in the era of lack of hands?” Acta Chirurgica Belgica, vol. 106, no.
1, pp. 22-26, 2006.
[4] D. P. Noonan and P. David, “Gaze contingent control for an
articulated mechatronic laparoscope,” in Proc. 3rd IEEE
International Conference on Biomedical Robotics and
Biomechatronics, 2010.
[5] R. Ginhoux, “Active filtering of physiological motion in robotized
surgery using predictive control,” IEEE Trans. Robot., vol. 21, no.
1, pp. 67-79, 2005.
[6] L. Bouarfa, “In-vivo real-time tracking of surgical instruments in
endoscopic video,” Minimally Invasive Therapy & Allied
Technologies, vol. 21, no. 3, pp. 129-134, 2012.
[7] C. Doignon, “Real-time segmentation of surgical instruments
inside the abdominal cavity using a joint hue saturation color
feature,” Real-Time Imaging, vol. 11, no. 5, pp. 429-442, 2005.
[8] S. Speidel, “Automatic classification of minimally invasive
instruments based on endoscopic image sequences,” SPIE Medical
Imaging. International Society for Optics and Photonics, 2009.
[9] Z. Pezzementi, “Articulated object tracking by rendering
consistent appearance parts,” in Proc. International Conference on
Robotics and Automation, Kobe, Japan, May 12-17, 2009, pp.
[10] A. Reiter, “Feature classification for tracking articulated surgical
tools,” in Medical Image Computing and Computer-Assisted
Intervention - MICCAI, Springer Berlin Heidelberg, 2012, pp. 592-
[11] S. Voros, “Automatic detection of instruments in laparoscopic
images: A first step towards high-level command of robotic
endoscopic holders,” The International Journal of Robotics
Research, vol. 26, no. 11-12, pp. 1173-1190, 2007.
[12] A. Casals and J. Amat, “Automatic guidance of an assistant robot
in laparoscopic surgery,” 1996.
[13] K. T. Song and C. J. Chen, “Autonomous and stable tracking of
endoscope instrument tools with monocular camera,” in Proc.
IEEE/ASME International Conference on Advanced Intelligent
Mechatronics, 2012.
[14] M. Pietikäinen, “Local binary patterns for still images,” in
Computer Vision Using Local Binary Patterns, Springer London,
2011, pp. 13-47.
[15] J. Friedman, et al., “Additive logistic regression: A statistical view
of boosting (with discussion and a rejoinder by the authors),” The
Annals of Statistics, vol. 28, no. 2, pp. 337-407, 2000.
[16] R. Lienhart, A. Kuranov, and V. Pisarevsky, “Empirical analysis
of detection cascades of boosted classifiers for rapid object
detection,” in Pattern Recognition, Springer Berlin Heidelberg,
2003, pp. 297-304.
[17] P. Viola and M. Jones, “Rapid object detection using a boosted
cascade of simple features,” in Proc. IEEE Computer Society
Conference on Computer Vision and Pattern Recognition, 2001,
vol. 1.
[18] P. Mountney, “Three-dimensional tissue deformation recovery and
tracking: Introducing techniques based on laparoscopic or
endoscopic images,” IEEE Signal Processing Magazine, vol. 27,
no. 4, pp. 14-24, July 2010.
Journal of Mechanical Engineering and Robotics Research Vol. 5, No. 1, January 2016
© 2016 Int. J. Mech. Eng. Rob. Res.
... In the final selection, 19 studies, were included in the analysis [8][9][10][11][12][13][14][15][16][17][18][19][20][21][22][23][24][25][26] . Fig. 1 shows the PRISMA flowchart for selection ( Fig. 1). ...
... Later, we splitted our table, isolating all the papers strictly connected to administration (5/19), that provide a proposal of organizational or predictive models of duration and cancellation of surgical cases [8][9][10][11][12]. The remaining studies (14/19) analyze, instead, outcomes that could be used as indirect parameters in the OR management [13][14][15][16][17][18][19][20][21][22][23][24][25][26]. ...
... Analyzing the typology of ML, only one study employed an unsupervised technique [21], with the most used represented by the supervised. The most used algorithms were decision t r e e s a n d r a n d o m f o r e s t ( m u l t i p l e d e c i s i o n trees) [10,12,13,14,19,20,26]. ...
Full-text available
We conducted a systematic review of literature to better understand the role of new technologies in the perioperative period; in particular we focus on the administrative and managerial Operating Room (OR) perspective. Studies conducted on adult (≥ 18 years) patients between 2015 and February 2019 were deemed eligible. A total of 19 papers were included. Our review suggests that the use of Machine Learning (ML) in the field of OR organization has many potentials. Predictions of the surgical case duration were obtain with a good performance; their use could therefore allow a more precise scheduling, limiting waste of resources. ML is able to support even more complex models, which can coordinate multiple spaces simultaneously, as in the case of the post-anesthesia care unit and operating rooms. Types of Artificial Intelligence could also be used to limit another organizational problem, which has important economic repercussions: cancellation. Random Forest has proven effective in identifing surgeries with high risks of cancellation, allowing to plan preventive measures to reduce the cancellation rate accordingly. In conclusion, although data in literature are still limited, we believe that ML has great potential in the field of OR organization; however, further studies are needed to assess the effective role of these new technologies in the perioperative medicine.
... Tool detection generally is an intermediate step for tool tracking, the process of monitoring tool location over time (Du et al., 2016;Rieke et al., 2016a;Lee et al., 2017b;Zhao et al., 2017;Czajkowska et al., 2018;Ryu et al., 2018;Keller et al., 2018), and pose estimation, the process of inferring a 2-D pose (Rieke et al., 2016b;Kurmann et al., 2017;Alsheakhali et al., 2016b;Du et al., 2018;Wesierski and Jezierska, 2018) or a 3-D pose Gessert et al., 2018) based on the location of tool elements. Tasks associated with tool detection also include velocity estimation (Marban et al., 2017) and instrument state recognition (Sahu et al., 2016a). All the above tasks are directly useful to the surgeon: they can be used for improved visualization, through augmented or mixed reality (Frikha et al., 2016 Nov-Dec;Bodenstedt et al., 2016 Feb-Mar;Lee et al., 2017b,a). ...
... Various computer vision algorithms have been proposed to address these tasks. Until early 2017, tool detection relied heavily on handcrafted features, including Gabor filters (Czajkowska et al., 2018), Frangi filters (Agustinos and Voros, 2015;Chang et al., 2016), color-based features (Primus et al., 2016;Rieke et al., 2016a), histograms of oriented gradients (Rieke et al., 2016a;Czajkowska et al., 2018), SIFT features (Du et al., 2016), ORB features (Primus et al., 2016) and local binary patterns (Sahu et al., 2016a). For tool segmentation, similar features have been extracted within superpixels (Bodenstedt et al., 2016 Feb-Mar). ...
... For tool segmentation, similar features have been extracted within superpixels (Bodenstedt et al., 2016 Feb-Mar). These features were processed either by a machine learning algorithm, such as a support vector machine (Primus et al., 2016;Wesierski and Jezierska, 2018), a random forest (Bodenstedt et al., 2016 Feb-Mar;Rieke et al., 2016a,b) or AdaBoost (Sahu et al., 2016a), or by a parametric model, such as a generalized Hough transform (Du et al., 2016;Frikha et al., 2016 Nov-Dec;Czajkowska et al., 2018) or a B-spline model (Chang et al., 2016). Note that template matching techniques have also been used to deal with articulated instruments (Ye et al., 2016;Wesierski and Jezierska, 2018). ...
Surgical tool detection is attracting increasing attention from the medical image analysis community. The goal generally is not to precisely locate tools in images, but rather to indicate which tools are being used by the surgeon at each instant. The main motivation for annotating tool usage is to design efficient solutions for surgical workflow analysis, with potential applications in report generation, surgical training and even real-time decision support. Most existing tool annotation algorithms focus on laparoscopic surgeries. However, with 19 million interventions per year, the most common surgical procedure in the world is cataract surgery. The CATARACTS challenge was organized in 2017 to evaluate tool annotation algorithms in the specific context of cataract surgery. It relies on more than nine hours of videos, from 50 cataract surgeries, in which the presence of 21 surgical tools was manually annotated by two experts. With 14 participating teams, this challenge can be considered a success. As might be expected, the submitted solutions are based on deep learning. This paper thoroughly evaluates these solutions: in particular, the quality of their annotations are compared to that of human interpretations. Next, lessons learnt from the differential analysis of these solutions are discussed. We expect that they will guide the design of efficient surgery monitoring tools in the near future.
... To improve surgical performance, robotic-assisted surgery systems have attracted increasing attention [7]. For example, Shan robots that handle the endoscope or laparoscope have been explored [8], [9], the applications of soft robotic devices in MIS have been studied to reduce patients' pain and damage [5]. Moreover, overlaying pre-and intra-operative imaging with surgical videos could improve surgeons' capabilities [10]. ...
The intelligent perception of endoscopic vision is appealing in many computer-assisted and robotic surgeries. Achieving good vision-based analysis with deep learning techniques requires large labeled datasets, but manual data labeling is expensive and time-consuming in medical problems. When applying a trained model to a different but relevant dataset, a new labeled dataset may be required for training to avoid performance degradation. In this work, we investigate a novel cross-domain strategy to reduce the need for manual data labeling by proposing an image-to-image translation model called live-cadaver GAN (LC-GAN) based on generative adversarial networks (GANs). More specifically, we consider a situation when a labeled cadaveric surgery dataset is available while the task is instrument segmentation on a live surgery dataset. We train LC-GAN to learn the mappings between the cadaveric and live datasets. To achieve instrument segmentation on live images, we can first translate the live images to fake-cadaveric images with LC-GAN, and then perform segmentation on the fake-cadaveric images with models trained on the real cadaveric dataset. With this cross-domain strategy, we fully leverage the labeled cadaveric dataset for segmentation on live images without the need to label the live dataset again. Two generators with different architectures are designed for LC-GAN to make use of the deep feature representation learned from the cadaveric image based instrument segmentation task. Moreover, we propose structural similarity loss and segmentation consistency loss to improve the semantic consistency during translation. The results demonstrate that LC-GAN achieves better image-to-image translation results, and leads to improved segmentation performance in the proposed cross-domain segmentation task.
... With the widespread use of devices to record surgical procedures in minimally invasive surgery, automated analysis of surgical tools in videos has become a popular research area, mainly involving classification, segmentation, tracking, detection, and other directions. Unlike earlier methods [6]-[11], which rely on various handcrafted features, existing approaches mainly use deep learning to extract higher-level features for surgical workflow recognition and tool detection. The traditional analysis of surgical phases is based on a number of statistical models, including Conditional Random Fields [12]-[15], Hidden Markov Models [7], [16], [17], Hidden semi-Markov Models [18], [19], and Linear Dynamical Systems [20]. ...
Minimally invasive surgery, such as laparoscopic surgery, is an active research area in clinical practice because it offers less pain and faster recovery. Detecting surgical tools with more accurate spatial locations in surgical videos not only helps ensure patient safety by reducing the incidence of complications but also helps assess surgeon performance. In this paper, we propose a novel Modulated Anchoring Network for detecting laparoscopic surgery tools based on Faster R-CNN, which inherits the merits of two-stage approaches while maintaining speed comparable to state-of-the-art one-stage methods. Since objects such as surgical instruments with a wide aspect ratio are difficult to recognize, we develop a novel training scheme, named modulated anchoring, to explicitly predict arbitrary anchor shapes for objects of interest. To take the relationships between different tools into consideration, we embed a relation module in our network. We evaluate our method on an existing dataset (m2cai16-tool-locations) and a new private dataset (AJU-Set), both collected from cholecystectomy surgical videos in hospital and covering seven surgical tools with spatial bounds. Our detector yields detection accuracies of 69.6% and 76.5% on the two datasets, which is superior to other recently used architectures. We further verify the efficiency of our method by analyzing the usage patterns of tools, the economy of movement, and the dexterity of operations to assess surgical quality.
... On the other hand, the performance could be improved by enhancing the noise reduction or employing additional stereo vision cameras in order to have multiple viewpoints. Moreover, it could be of great interest to exploit a human-tracking algorithm to predict future movement and collision or instrument-tracking in order to check the actual state of the tool, as it has been investigated by Sahu et al. [27]. The use of an industrial robot arm allows for a large-scale ...
This paper presents a preliminary robotic solution for constrained teleoperation tasks in an uncertain and dynamic environment. The robotic system is supported by a reasoning agent which makes the control action reactive and context-sensitive. The investigation is motivated by future human-robot collaboration and therefore focuses on minimizing or avoiding collisions between the robot and surrounding objects. The report describes the developed control architecture, which, in its modular and hierarchical structure, combines knowledge from different areas such as control theory, path and trajectory planning, computer vision, collision avoidance, and decision-making theory. The software is implemented in a ROS framework in order to support a clear and modular design, suitable for future extensions and integration on different hardware components. The experiments are run on both real and simulated systems. The results show an autonomous robot capable of continuously adapting its movements despite interruptions by external agents, with a 99% success rate. We conclude that an adaptive robotic system capable of performing constrained tasks and simultaneously reacting to external stimuli in an uncertain and dynamic environment is potentially obtainable.
Visual detection and tracking of minimally invasive surgical instruments is one of the core algorithms of minimally invasive surgical robots. With the development of machine vision and robotics, related technologies such as virtual reality, three-dimensional reconstruction, path planning, and human-machine collaboration can be applied to surgical operations to assist clinicians or to let surgical robots complete clinical operations. The detection and tracking algorithm analyzes the image transmitted by the surgical robot endoscope and extracts the position of the surgical instrument tip in the image, so as to provide surgical navigation. This technology can greatly improve the accuracy and success rate of surgical operations. The purpose of this paper is to further study the visual detection and tracking technology of minimally invasive surgical instruments, summarize the existing research results, and apply it to the surgical robot project. By reading the literature, the author summarized the theoretical basis and related algorithms of this technology in recent years. Finally, the author compares the accuracy, speed and application scenario of each algorithm, and analyzes the advantages and disadvantages of each algorithm. The papers included in the review were selected through Web of Science, Google Scholar, PubMed and CNKI searches using the keywords “object detection”, “object tracking”, “surgical tool detection”, “surgical tool tracking”, “surgical instrument detection” and “surgical instrument tracking”, limiting results to the year range 1985 - 2021. Our study shows that this technology has strong prospects for future improvement in accuracy and real-time performance.
Robot-assisted surgery (RAS) is a type of minimally invasive surgery which is completely different from traditional surgery. RAS reduces the surgeon’s fatigue and the number of doctors participating in surgery. At the same time, it causes less pain and has a faster recovery rate. Real-time surgical tool detection is important for computer-assisted surgery because the prerequisite for controlling surgical tools is to know their location. In order to achieve comparable performance, most Convolutional Neural Networks (CNNs) employed for detecting surgical tools generate a huge number of feature maps using expensive operations, which results in redundant computation and long inference time. In this paper, we propose an efficient and novel CNN architecture which generates ghost feature maps cheaply based on intrinsic feature maps. The proposed detector is more efficient and simpler than the state-of-the-art detectors. We believe the proposed method is the first to generate ghost feature maps for detecting surgical tools. Experimental results show that the proposed method achieves 91.6% mAP on the Cholec80-locations dataset and 100% mAP on the Endovis Challenge dataset with a detection speed of 38.5 fps, and realizes real-time and accurate surgical tool detection in laparoscopic surgery video.
Robot-assisted surgery (RAS), a type of minimally invasive surgery, is used in a variety of clinical surgeries because it has a faster recovery rate and causes less pain. Automatic video analysis of RAS is an active research area, where precise surgical tool detection in real time is an important step. However, most deep learning methods currently employed for surgical tool detection are based on anchor boxes, which results in low detection speeds. In this paper, we propose an anchor-free convolutional neural network (CNN) architecture, a novel frame-by-frame method using a compact stacked hourglass network, which models the surgical tool as a single point: the center point of its bounding box. Our detector eliminates the need to design a set of anchor boxes, and is end-to-end differentiable, simpler, more accurate, and more efficient than anchor-box-based detectors. We believe our method is the first to incorporate the anchor-free idea for surgical tool detection in RAS videos. Experimental results show that our method achieves 98.5% mAP and 100% mAP at 37.0 fps on the ATLAS Dione and Endovis Challenge datasets, respectively, and truly realizes real-time surgical tool detection in RAS videos.
Minimally invasive surgery is nowadays a frequently applied technique and can be regarded as a major breakthrough in surgery. The surgeon has to adopt special operation-techniques and deal with difficulties like the complex hand-eye coordination and restricted mobility. To alleviate these constraints we propose to enhance the surgeon's capabilities by providing a context-aware assistance using augmented reality techniques. To analyze the current situation for context-aware assistance, we need intraoperatively gained sensor data and a model of the intervention. A situation consists of information about the performed activity, the used instruments, the surgical objects, the anatomical structures and defines the state of an intervention for a given moment in time. The endoscopic images provide a rich source of information which can be used for an image-based analysis. Different visual cues are observed in order to perform an image-based analysis with the objective to gain as much information as possible about the current situation. An important visual cue is the automatic recognition of the instruments which appear in the scene. In this paper we present the classification of minimally invasive instruments using the endoscopic images. The instruments are not modified by markers. The system segments the instruments in the current image and recognizes the instrument type based on three-dimensional instrument models.
Although the advantages of laparoscopic surgery are well documented, one disadvantage is that, for optimum performance, an experienced camera driver is required who can provide the necessary views for the operating surgeon. In this paper we describe our experience with urological laparoscopic techniques using the novel EndoAssist robotic camera holder and review the current status of alternative devices. A total of 51 urological procedures (25 using the EndoAssist device and 26 using a conventional human camera driver) conducted by three experienced surgeons were studied prospectively, including nephrectomy (simple and radical), pyeloplasty, radical prostatectomy, and radical cystoprostatectomy. The surgeon noted the extent of body comfort and muscle fatigue in each case. Other aspects documented were ease of scope movement, i.e. usability, need to clean the telescope, time of set-up, surgical performance, and whether it was necessary to change the position of the arm during the surgery. All three surgeons involved in the evaluation felt comfortable throughout all procedures, with no loss of autonomy. It was, however, obvious that the large arc generated whilst doing a nephrectomy led to more episodes of lens cleaning, and the arm had to be relocated on some occasions. Clearer benefits were seen while performing pelvic surgery or pyeloplasty, perhaps because the arc of movement was smaller. The EndoAssist is an effective, easy to use device for robotic camera driving which reduces the constraint of having to have an experienced camera driver for optimum visualisation during laparoscopic urological procedures.
Conference Paper
With an autonomous tool tracking system, surgeons do not need to handle the endoscope during surgery. This paper aims to develop an image tracking system with a stable view for minimally invasive surgery (MIS) using a monocular endoscope. In order to provide a stable view for the surgery, we propose to set a buffer zone in the center of the image frame for the robotic camera holder to track the endoscopic instruments. If the instruments are inside the buffer zone, the endoscope robot will keep still. However, if one or both of them move away from the buffer zone, the robot will start to track the instrument. Furthermore, by calculating the distance between two instruments in the image plane, the robotic camera holder can track the depth to the instruments. The robot will therefore zoom in if the distance is too short. A safe distance from the camera lens to the instruments can therefore be maintained during the image tracking. Preliminary experimental results show that the proposed design achieves satisfactory tracking accuracy in real time and provides a stable view simultaneously.
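The buffer-zone rule described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `BufferZone` and `camera_command`, the pixel thresholds `min_sep` and `max_sep`, the choice to pan toward the midpoint of the tips, and the zoom-out branch are all assumptions made for illustration. Only the "stay still inside the zone" and "zoom in when the tips are too close" behaviours come from the text.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class BufferZone:
    """Axis-aligned rectangle centred in the image (hypothetical helper)."""
    cx: float
    cy: float
    half_w: float
    half_h: float

    def contains(self, x, y):
        return abs(x - self.cx) <= self.half_w and abs(y - self.cy) <= self.half_h


def camera_command(tips, zone, min_sep, max_sep):
    """Return (pan_x, pan_y, zoom) for a list of (x, y) instrument tips.

    pan is the pixel offset the view should move by; zoom is +1 (in),
    -1 (out) or 0 (hold). Thresholds are in pixels and are assumptions.
    """
    if not tips:
        return (0.0, 0.0, 0)

    pan_x = pan_y = 0.0
    # Keep still while every tip stays inside the buffer zone.
    if any(not zone.contains(x, y) for x, y in tips):
        mid_x = sum(x for x, _ in tips) / len(tips)
        mid_y = sum(y for _, y in tips) / len(tips)
        pan_x, pan_y = mid_x - zone.cx, mid_y - zone.cy

    zoom = 0
    if len(tips) == 2:
        (x1, y1), (x2, y2) = tips
        separation = hypot(x2 - x1, y2 - y1)
        if separation < min_sep:
            zoom = 1    # tips appear too close together: zoom in (from the text)
        elif separation > max_sep:
            zoom = -1   # assumed symmetric rule, not stated in the abstract
    return (pan_x, pan_y, zoom)
```

The dead band is what gives the stable view: small tool motions near the image centre produce no camera command at all, so the robot moves only when an instrument actually leaves the zone.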
This chapter provides an in-depth description of the LBP operator in spatial image domain. The generic LBP operator, and its rotation-invariant and multiscale versions are introduced. The use of complementary contrast information is also discussed. The success of LBP methods in various computer vision problems and applications has inspired much new research on different variants. The basic LBP has also some problems that need to be addressed. Therefore, several extensions and modifications of LBP have been proposed to increase its robustness and discriminative power.
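For readers unfamiliar with the operator, the following is a minimal sketch of the basic 3x3 LBP and one common way to make a code rotation invariant. The function names, the clockwise neighbour ordering, and the list-of-lists image representation are illustrative assumptions, not taken from the chapter.

```python
def lbp_code(img, r, c):
    """Basic 8-neighbour LBP code of pixel (r, c) in a 2-D list of grey values.

    Each neighbour contributes one bit: 1 if it is >= the centre value.
    The neighbour order (clockwise from top-left) is an assumed convention.
    """
    center = img[r][c]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def rotation_invariant(code, bits=8):
    """Map a code to the minimum over all circular bit rotations."""
    mask = (1 << bits) - 1
    return min(((code >> k) | (code << (bits - k))) & mask for k in range(bits))


def lbp_histogram(img):
    """256-bin histogram of basic LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```

The histogram of codes, rather than the codes themselves, is what is typically used as a texture descriptor for a region.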
Boosting is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versions of the training data and then taking a weighted majority vote of the sequence of classifiers thus produced. For many classification algorithms, this simple strategy results in dramatic improvements in performance. We show that this seemingly mysterious phenomenon can be understood in terms of well-known statistical principles, namely additive modeling and maximum likelihood. For the two-class problem, boosting can be viewed as an approximation to additive modeling on the logistic scale using maximum Bernoulli likelihood as a criterion. We develop more direct approximations and show that they exhibit nearly identical results to boosting. Direct multiclass generalizations based on multinomial likelihood are derived that exhibit performance comparable to other recently proposed multiclass generalizations of boosting in most situations, and far superior in some. We suggest a minor modification to boosting that can reduce computation, often by factors of 10 to 50. Finally, we apply these insights to produce an alternative formulation of boosting decision trees. This approach, based on best-first truncated tree induction, often leads to better performance, and can provide interpretable descriptions of the aggregate decision rule. It is also much faster computationally, making it more suitable to large-scale data mining applications.
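The reweight-then-vote procedure described in the opening sentences can be sketched as classical Discrete AdaBoost. This is an illustration only: it does not implement the paper's alternative formulations, and the one-dimensional decision stumps, the function names, and the toy data in the usage note are assumptions chosen to keep the example short.

```python
import math


def stump_predict(x, thr, polarity):
    """Decision stump on a scalar: `polarity` if x >= thr, else -polarity."""
    return polarity if x >= thr else -polarity


def train_stump(xs, ys, w):
    """Return (weighted_error, thr, polarity) of the best stump."""
    best = None
    for thr in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if stump_predict(xi, thr, polarity) != yi)
            if best is None or err < best[0]:
                best = (err, thr, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Discrete AdaBoost; labels in ys must be +1 or -1."""
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, thr, pol = train_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)      # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)    # vote weight of this stump
        ensemble.append((alpha, thr, pol))
        # Increase the weight of misclassified samples, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, thr, pol))
             for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Weighted majority vote of all stumps."""
    score = sum(a * stump_predict(x, thr, pol) for a, thr, pol in ensemble)
    return 1 if score >= 0 else -1
```

On the toy data `xs = [1, 2, 3, 4, 5, 6]` with labels `ys = [1, 1, -1, -1, 1, 1]`, no single stump can separate the classes, but three boosting rounds classify all six points correctly, which is the kind of improvement over a weak base classifier that the abstract refers to.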
Conference Paper
Tool tracking is an accepted capability for computer-aided surgical intervention which has numerous applications, both in robotic and manual minimally-invasive procedures. In this paper, we describe a tracking system which learns visual feature descriptors as class-specific landmarks on an articulated tool. The features are localized in 3D using stereo vision and are fused with the robot kinematics to track all of the joints of the dexterous manipulator. Experiments are performed using previously-collected porcine data from a surgical robot.
Conference Paper
This paper introduces two techniques for controlling an articulated mechatronic laparoscope through the eyes of the surgeon during minimally invasive surgery. The system consists of a 2D eye tracking unit interfaced with a mechatronic laparoscope that has five controllable degrees-of-freedom (DoF) located at the distal end of a rigid shaft. Through the use of image feedback from a tip mounted camera, a closed-loop gaze contingent framework featuring two separate control techniques (“Individual Joint Selection” and “Automatic Joint Selection”) was developed. Under this framework, the location of a surgeon's 2D fixation point is converted into commands that servo the laparoscope. Experimental results illustrate the ability of both techniques to perform real-time gaze contingent laparoscope control. A key advantage of the proposed system is the ability to provide the operator with sufficient distal dexterity to achieve stable off-axis visualisation in an intuitive, hands-free manner, thus allowing other handheld instruments to be controlled simultaneously. Potential applications include Single Incision Laparoscopic Surgery (SILS) or Natural Orifice Trans-Endoluminal Surgery (NOTES), where the use of multiple instruments passing through a single incision presents both visualization and ergonomic challenges.
Conference Paper
Recently Viola et al. have introduced a rapid object detection scheme based on a boosted cascade of simple feature classifiers. In this paper we introduce and empirically analyze two extensions to their approach. Firstly, a novel set of rotated Haar-like features is introduced. These novel features significantly enrich the simple features of (6) and can also be calculated efficiently. With these new rotated features our sample face detector shows on average a 10% lower false alarm rate at a given hit rate. Secondly, we present a thorough analysis of the effect of different boosting algorithms (namely Discrete, Real and Gentle Adaboost) and weak classifiers on detection performance and computational complexity. We will see that Gentle Adaboost with small CART trees as base classifiers outperforms Discrete Adaboost and stumps. The complete object detection training and detection system as well as a trained face detector are available in the Open Computer Vision Library at (8).