A Stereoscopic Fibroscope for Camera Motion and 3D Depth Recovery during Minimally Invasive Surgery

David P. Noonan, Peter Mountney, Daniel S. Elson, Ara Darzi and Guang-Zhong Yang
Institute of Biomedical Engineering, Dept. of Biosurgery & Surgical Technology, and Dept. of Computing, Imperial College London, London, SW7 2AZ, UK (e-mail: g.z.yang@imperial.ac.uk)

2009 IEEE International Conference on Robotics and Automation, Kobe International Conference Center, Kobe, Japan, May 12-17, 2009. Manuscript received September 15, 2008.
Abstract— This paper introduces a stereoscopic fibroscope
imaging system for Minimally Invasive Surgery (MIS) and
examines the feasibility of utilizing images transmitted from the
distal fibroscope tip to a proximally mounted CCD camera to
recover both camera motion and 3D scene information. Fibre
image guides facilitate instrument miniaturization and have the
advantage of being more easily integrated with articulated
robotic instruments. In this paper, twin 10,000 pixel coherent
fibre bundles (590µm diameter) have been integrated into a
bespoke laparoscopic imaging instrument. Images captured by
the system have been used to build a 3D map of the
environment and reconstruct the laparoscope’s 3D pose and
motion using a SLAM algorithm. Detailed phantom validation
of the system demonstrates its practical value and potential for
flexible MIS instrument integration due to the small footprint
and flexible nature of the fibre image guides.
I. INTRODUCTION
As the number of Minimally Invasive Surgical (MIS)
procedures performed with robotic assistance
multiplies, there is an increasing demand to improve the
functionality and usability of such systems to allow for more
complex procedures to be performed. Existing robotic
assisted MIS platforms, such as the daVinci surgical robot
(Intuitive Surgical, Sunnyvale, CA), allow a surgeon to
interact with the operative environment through a master-
slave architecture while viewing a magnified 3D
representation of the surgical scene. The provision of
immersive stereo vision has proved to be one of the major
strengths of the system when manipulating complex
anatomical structures.
Currently, one of the main focuses of MIS robot research
is in the design of flexible instruments that can follow
curved anatomical pathways with stereo vision, allowing
regional and global integration of the 3D surgical
environment. While traditional stereo-laparoscope systems, similar to that utilised by the daVinci, are not compatible with such an approach (due to their rigid rod-lens optical systems), miniaturised coherent fibre-optic bundles offer the flexibility and miniaturization required for integration with articulated instruments, albeit at the cost of reduced image resolution.
The purpose of this paper is to present a stereo imaging
instrument to evaluate the feasibility of using fibre bundles
for instrument localisation and soft tissue mapping within a sequential, vision-only SLAM (Simultaneous Localisation
and Mapping) system. Key technical issues associated with
developing the stereo fibroscope imaging system and its 3D
vision algorithms are presented. Such a system has the
potential to provide in situ 3D reconstruction required for
implementing advanced safety techniques, such as active
constraints and motion stabilisation. Results obtained from a
silicone tissue phantom and ex-vivo porcine tissue were
validated using optical tracking and a registered CT scan.
A. Robotic Assisted Minimally Invasive Surgery
In MIS, a miniaturised CCD or fibre-optic camera is commonly passed through a natural orifice or a small incision in the body to provide remote vision. Specialised
instruments are also inserted through additional incisions to
perform the actual surgical tasks. The use of small incisions
results in reduced patient trauma, blood loss and
hospitalisation costs [1], thus making it an attractive
alternative to open surgery. However, while procedures
completed in this manner offer several advantages, the
inherent technical difficulty is significantly higher as the
distal dexterity is severely impaired by the long, rigid
instruments and gross movements are subject to a fulcrum
effect at the trocar port [2], as illustrated in Fig. 1.
Ergonomically, the visualisation provided is misaligned with
the motor axis and is often through a monoscopic display,
which lacks depth perception. This often leads to fatigue,
poor hand-eye coordination and increased surgical errors [3].
Clinically, several robotic platforms have been developed
to overcome these difficulties. The daVinci surgical robot,
for example, operates as a tele-manipulator, where the
surgeon controls miniaturized slave instruments on three or
four robotic arms via a master console [4]. The system
successfully tackles some of the traditional difficulties
associated with MIS by providing stereoscopic visualisation,
an ergonomic seating position, improved distal dexterity,
motion scaling and tremor filtering at 6Hz.
In order to operate along curved anatomical pathways and
access regions which are not in a direct line of sight from the
incision point, there is currently increasing research interest
into the development of flexible or articulated robotic
systems. Example systems include the Highly Articulated
Robotic Probe (HARP) for epicardial atrial ablation [5], a
“snake” like robotic system designed to provide additional
dexterity at the instrument tip for Ear, Nose and Throat
(ENT) surgery [6], and a high-dexterity, modular instrument
for coronary artery bypass grafting [7]. Systems with
alternative white light and fluorescence imaging [8] have
also been proposed. A natural extension of such systems is
the provision of camera position with simultaneous 3D scene
reconstruction through stereo vision so that advanced
functions such as adaptive motion stabilisation, augmented
reality, active constraints and dynamic view expansion can
be deployed [9] [10] [11]. The stereo fibre image guide
based system described in this paper is ideally placed for
integration with such flexible systems where miniaturization
is a key requirement.
B. Simultaneous Localization and Mapping (SLAM)
Estimating the position of a camera relative to its
environment and a 3D model of that environment is an
important and challenging problem in robotic vision. The
ability of SLAM to build long term maps and remain robust
to drift has led to the development of many systems using a
variety of hardware from ultrasound to laser range finders
and cameras. The majority of these systems have been
developed for mobile robots navigating in urban
environments, and the size of the hardware is not compatible
with robotic assisted surgery. It has been shown that optical
approaches can be used to recover 3D structure in MIS [10,
11]. Such approaches are non-invasive and make use of hardware which is already available during surgery. However, these methods face a number of challenges due to the complexity of the environment: 1) features on the tissue surface may be sparse and change in appearance, as the anatomical feature may lie below the surface; 2) specular highlights need to be detected and ignored, and may also occlude features; 3) the lighting conditions can vary significantly, changing the appearance of features; and 4) tissue is not rigid and can deform as a result of respiration, cardiac motion and tissue-tool interaction.
The use of miniaturised fibre bundles introduces an additional challenge: image resolution. Pixel count is traded against the bend radius of the fibre bundles, leading to low-quality images. Additionally,
SLAM is made more challenging due to the small baseline
between the stereo pair and the short working distance and
limited field-of-view of the GRIN lens, which is used to
focus the light into the bundles.
In [12], we demonstrated the principle that SLAM could be used in MIS with high-quality stereo cameras in a rigid laparoscope. In [13], a monocular SLAM system is presented for ENT surgery; however, the mapped environment is small, features remain in the scene throughout, and no loops are closed. The system developed in [11] is used to map larger areas; however, the approach relies on an Optotrak to track the laparoscope, under the assumption that the scope is rigid.
While the previous work utilized the high quality images
captured using the stereo laparoscope of the daVinci system,
an equivalent image resolution and field-of-view is not
currently available with flexible fibre image guides. As such,
the system described in this paper was developed to identify
and overcome technical difficulties from mechanical,
calibration and software algorithm perspectives, in order to
evaluate the feasibility of accurate camera localisation and
tissue mapping.
Figure 1: Schematic illustration of a typical endoscope motion in-vivo.
II. EXPERIMENTAL SETUP
A. Mechanical & Optical System Design
The stereo video sequences used in this paper were
recorded using free-hand data acquisition with a custom
stereo fibroscope test-rig as shown in Fig. 2. The system was
designed to allow for the acquisition of stereo images using
fibre image guides and to facilitate the validation of the
algorithms which were then experimentally tested on the
resulting images.
Figure 2: Schematic illustration of the stereo fibroscope indicating the location of 1) the 3-axis joint allowing free-hand camera motion; 2) the rigid body carrying optical tracking markers to provide ground-truth data for camera motion validation; 3) the protective tubing for the fibre bundles; 4) the two 10,000 pixel coherent fibre image guides; 5) the two grub screws used to adjust camera vergence; and 6) the tubing path to the image acquisition system. The camera baseline, b, of 3.8mm is also marked.
The system features twin flexible, coherent fibre image
guides (Sumitomo IGN-05/10, 10,000 fibres, length 1.5 m,
diameter 0.59mm, min. bend radius 25mm) running down a
rigid shaft of diameter 10mm in a configuration similar to a
laparoscope. The fibres, (4) in Fig. 2, are housed in twin
protective polyurethane sheaths (3) which are clamped both
within the shaft and just prior to an optical mounting stage.
The fibres exit the sheaths and are clamped into place before
passing through twin adjustable distal tip mounting arms.
The separation of the arms (and thus the distance between
the fibres) can be adjusted using grub screws threaded
through the aluminium outer casing of the shaft (5). This
allows the baseline, b, and vergence of the stereo pair to be
adjusted as required. The baseline used during the
experiments described in this paper was 3.8mm. A graded
index (GRIN) lens (Grintech GmbH) is cemented onto the
end of each image guide (diameter 0.5mm, working distance
10mm, NA 0.5) to image an area of 35×35 mm² at a working
distance of 20mm onto the distal end of each image guide.
The fibres are both clamped into a single fibre mount and
imaged onto a CCD camera (UEye, UI-2250-C/CM) using
an achromatic ×10 microscope objective and 100mm focal
length lens, as shown in Fig. 3.
Figure 3: Schematic illustration of the optical setup. The flexible image
guides are housed in a custom clamp attached to an XY positioning stage.
This allows for fine focussing of the images onto the objective lens and thus
the CCD. Both left and right images are captured on one CCD and
segmented offline.
Focussing of the fibres is performed by adjusting the
position of the fibre mount. This is achieved with
micrometre precision using an XY positioning stage. The
fibre bundles are pivoted around a point 315mm from their
distal tips (close to the first image plane) to allow free-hand
rotations in a manner similar to a laparoscope passing
through a trocar port.
For validation, a removable rigid body with four optical
tracking markers was attached 117mm from the distal tip of
the fibroscope. This aspect of the system will be further
discussed in Section III.
The following calibration steps were then required to allow for data acquisition (a sketch of the corresponding calibration chain is given after this list):

- The orientation of the camera co-ordinate system in the left camera image was defined manually, to account for the arbitrary rotation about the camera co-ordinate system's z-axis which occurs due to the rotation of the fibre bundle between its two clamping points.
- To account for this same arbitrary rotation about the z-axis in the right image, its camera co-ordinate system was calibrated to co-align with that of the left image.
- Stereo camera calibration was performed to calculate the intrinsic and extrinsic parameters and to correct for non-linear radial lens distortion [14]. This step was performed manually because the low resolution caused the automatic corner detection to fail.
- A hand-eye calibration to compute the relative rotation and translation from the rigid body to the left camera centre was then performed, using the technique proposed by Tsai and Lenz [15].
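A minimal sketch of the last two steps using OpenCV is given below. The chessboard geometry, file names and image size are placeholders, and the corner coordinates are assumed to be clicked by hand, as in the paper; this illustrates the calibration chain rather than the authors' exact implementation.

```python
import glob
import cv2
import numpy as np

# Chessboard geometry is an assumption; corner coordinates are assumed to be
# clicked manually and saved to .npy files, since automatic detection fails
# at fibroscope resolution. All file names below are placeholders.
board_w, board_h, square_mm = 9, 6, 2.0
objp = np.zeros((board_w * board_h, 3), np.float32)
objp[:, :2] = np.mgrid[0:board_w, 0:board_h].T.reshape(-1, 2) * square_mm

left_pts = [np.load(f) for f in sorted(glob.glob("corners_left_*.npy"))]
right_pts = [np.load(f) for f in sorted(glob.glob("corners_right_*.npy"))]
obj_pts = [objp] * len(left_pts)
img_size = (120, 120)  # approximate usable image size per fibre bundle

# Per-camera intrinsics and radial distortion (Zhang [14]), then the
# stereo extrinsics (R, T) between the two fibre bundles.
_, K_l, d_l, _, _ = cv2.calibrateCamera(obj_pts, left_pts, img_size, None, None)
_, K_r, d_r, _, _ = cv2.calibrateCamera(obj_pts, right_pts, img_size, None, None)
_, _, _, _, _, R, T, _, _ = cv2.stereoCalibrate(
    obj_pts, left_pts, right_pts, K_l, d_l, K_r, d_r, img_size,
    flags=cv2.CALIB_FIX_INTRINSIC)

# Hand-eye calibration: rigid-body poses from the tracker and board poses
# from the left camera yield the rigid-body-to-camera offset (placeholder
# pose logs).
R_rb2w = list(np.load("tracker_R.npy"))   # rigid-body-to-world rotations
t_rb2w = list(np.load("tracker_t.npy"))   # rigid-body-to-world translations
R_b2c = list(np.load("board_R.npy"))      # board-to-camera rotations
t_b2c = list(np.load("board_t.npy"))      # board-to-camera translations
R_c2rb, t_c2rb = cv2.calibrateHandEye(
    R_rb2w, t_rb2w, R_b2c, t_b2c, method=cv2.CALIB_HAND_EYE_TSAI)
```

OpenCV's cv2.CALIB_HAND_EYE_TSAI implements the Tsai-Lenz formulation cited in the paper.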
Example stereo images taken with the system on both an ex-
vivo porcine tissue sample and a silicone soft tissue phantom
are shown in Fig. 4. The completed system, showing the
fibroscope, rigid body and the optical system, is depicted in
Fig. 5.
Figure 4: Sample images captured with the stereo fibroscope of ex-vivo
porcine tissue (left) and a silicone soft tissue phantom (right).
Figure 5: Image showing the complete system. The optical setup including
fibre mount, objective lens and camera is shown on the lower left. The rigid
body used for validation purposes is shown in the top right.
B. SLAM Algorithm Design
A SLAM approach similar to [12] was adopted, using an Extended Kalman Filter with stereo images to give full 6-DOF localisation. A "constant velocity, constant angular velocity" motion model is used, with a deterministic and a stochastic element to model unknown user motion.
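As an illustration, the prediction step of such a filter can be sketched as follows. This is a generic constant-velocity, constant-angular-velocity model in the style of MonoSLAM-type systems; the state ordering and the omission of the covariance propagation are simplifications, not the authors' exact implementation.

```python
import numpy as np

# State: position r (3), orientation quaternion q (4), linear velocity v (3),
# angular velocity w (3). Only the deterministic part of the motion model is
# shown; the stochastic part enters the EKF as zero-mean velocity impulses in
# the process noise, and the Jacobian-based covariance update is omitted.

def quat_mul(a, b):
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def quat_from_rotvec(rv):
    angle = np.linalg.norm(rv)
    if angle < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate([[np.cos(angle / 2.0)],
                           np.sin(angle / 2.0) * rv / angle])

def predict(x, dt):
    r, q, v, w = x[:3], x[3:7], x[7:10], x[10:13]
    r_new = r + v * dt                             # constant linear velocity
    q_new = quat_mul(q, quat_from_rotvec(w * dt))  # constant angular velocity
    q_new /= np.linalg.norm(q_new)                 # keep the quaternion unit
    return np.concatenate([r_new, q_new, v, w])
```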
1) Map management
The type of tissue or organ, distance between camera and
tissue and the illumination all affect the visual appearance of
the tissue. This problem is exacerbated by the limited
resolution of fibroscopes. To cope with this challenging environment and to improve runtime performance, a sparse feature map is used, tracking up to 20 features at a time. Features are detected using a Difference of Gaussians detector and matched between the left and right images by searching along the epipolar line with normalised cross-correlation; outliers are removed using RANSAC. The features are triangulated to estimate their 3D positions relative to the camera; these positions are then reprojected into the image plane, and features with a large reprojection error are rejected. In initial experiments we found that, due to the visual appearance of tissue, the features clustered around one or two regions in the image, leading to poor-quality maps and making accurate localization difficult. It has been shown [16] that using a fisheye lens to increase the field of view can improve SLAM; however, here we are limited to a small field-of-view and short working distance, making feature selection and map management more important. Ideally, we want to observe the same features for as long as possible in order to reduce the uncertainty in their 3D positions. We found the best approach to this problem was to use features close to the edge of the image. Although this makes map building and localisation more robust, changes in illumination alter the appearance of features, making tracking more challenging. Specular highlights can cause significant problems during tracking in
an MIS environment. Specular highlights in the images were
detected using a manually defined threshold in the HSV
colour space.
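A minimal sketch of such a specular mask, assuming a BGR input image; the threshold values are placeholders standing in for the manually tuned limits described above.

```python
import cv2
import numpy as np

def specular_mask(bgr, sat_max=40, val_min=220):
    """Flag near-white, low-saturation, high-value pixels as specular.
    The thresholds are manually tuned placeholders, as in the paper."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    mask = (hsv[..., 1] < sat_max) & (hsv[..., 2] > val_min)
    # Dilate so pixels bordering a highlight are also excluded from
    # feature detection and matching.
    return cv2.dilate(mask.astype(np.uint8) * 255, np.ones((5, 5), np.uint8))
```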
2) 3D Surface reconstruction
The solid surface representation is generated by
performing Delaunay triangulation on the SLAM map. This
meshing approach provides an estimate for every 3D point
within the observed and mapped environment. The mesh is
textured with images taken from the left fibroscope to build
up a realistic representation of the environment. Image
rectification is performed before the textures are applied to
the mesh in order to remove distortion. To improve the visual appearance of the 3D reconstruction, we search for images which cover the largest number of points in the map, so as to generate models which are more consistent.
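One plausible realisation of this meshing step, shown below, triangulates the map points' projections in the left image plane and lifts the connectivity back to 3D. The paper does not specify the exact construction, so the camera-frame convention and the intrinsic matrix K are assumptions.

```python
import numpy as np
from scipy.spatial import Delaunay

def mesh_from_map(points_3d, K):
    """points_3d: Nx3 SLAM map points in the left-camera frame;
    K: 3x3 intrinsic matrix. Returns vertices and triangle indices."""
    proj = (K @ points_3d.T).T
    uv = proj[:, :2] / proj[:, 2:3]   # project map points into the image
    tri = Delaunay(uv)                # 2D Delaunay on the projections
    return points_3d, tri.simplices   # connectivity lifted to the 3D points
```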
3) Honeycomb artifact removal
The light directed down the two image guides of the fibroscope is captured by the proximally mounted CCD camera. As a result, the structure of the individual fibres is visible in the image as a honeycomb pattern (see Fig. 6), which can adversely affect feature detection and tracking. Several approaches have been proposed for removing the honeycomb effect, including defocusing the proximal imaging optics, estimation based on Bayer CCD patterns, and shaped Fourier filters [17] aimed at estimating the honeycomb structure.
Figure 6: a) Original test image captured by the fibre bundle; b) test image after honeycomb removal; c) original test image; d) Fourier transform of the original image; e) band-pass filter applied in the Fourier domain; f, top) close-up of (b); f, bottom) close-up of (a).
During the experiments, we found that sub-millimetre
movements of the optics relative to the CCD could lead to
changes in the image and movement of the honeycomb
structure on the CCD chip. As we could not rely on the
honeycomb structure being static relative to the camera, we
needed a reliable and robust approach which did not need re-
calibrating each time the system is used. We found that
sufficient image restoration for tracking could be achieved
using a band pass filter [18] in the Fourier frequency space
followed by Gaussian smoothing. This successfully removed
the structure in the image without affecting the performance
of the tracker.
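A minimal sketch of this restoration step is given below, assuming a grayscale input. The lattice frequency, annulus width (both expressed as fractions of the Nyquist frequency) and smoothing sigma are per-system tuning values, and the exact filter shape of [18] may differ.

```python
import cv2
import numpy as np

def remove_honeycomb(gray, f0=0.35, width=0.12, sigma=1.2):
    """Suppress the fibre lattice: reject an annular frequency band around
    the lattice frequency f0, then Gaussian-smooth the residual."""
    img = gray.astype(np.float32)
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    r = np.sqrt((yy / (h / 2.0)) ** 2 + (xx / (w / 2.0)) ** 2)
    F[np.abs(r - f0) < width] = 0      # annular band-reject in Fourier space
    restored = np.real(np.fft.ifft2(np.fft.ifftshift(F)))
    return cv2.GaussianBlur(restored, (0, 0), sigma)
```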
4) Feature tracking
Tracking tissue features is challenging as they may be
sparse, varying with different lighting conditions and
affected by specular highlights. Furthermore, the images
acquired by the proposed system are low in resolution due to
the limited number of fibres used. The intensity of the light
transmitted by the fibre bundles can vary leading to changes
in the visual appearance of features. To cope with this environment, a feature tracking system similar to [19] was used in an active-search context. This approach adapts to the image content, learning features online directly from the
image space. The method is particularly suitable for MIS
images where features appear similar and may not be
globally distinctive. This approach learns the most
discriminative information for feature tracking, allowing it
to robustly track locally unique features. The approach has
been extended in this paper to include synthetically
generated data. Synthetic data is generated by warping the
image patch around the detected feature with an affine
transformation in order to make the feature tracking more
robust and able to track reliably from a single learning
frame. This is important for fibroscopic images because the
field of view is small and features may only appear for a
short period of time.
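The synthetic-data idea can be sketched as follows: a single learning frame is augmented by affine-warping the patch around each detected feature, so the tracker sees plausible appearance changes before they occur. The parameter ranges below are illustrative assumptions, not the values used in the paper.

```python
import cv2
import numpy as np

def synthetic_views(patch, n=20, max_rot=15, max_scale=0.15, max_shear=0.1):
    """Generate n affine-warped copies of a feature patch for training."""
    h, w = patch.shape[:2]
    rng = np.random.default_rng(0)
    views = []
    for _ in range(n):
        ang = np.deg2rad(rng.uniform(-max_rot, max_rot))
        s = 1.0 + rng.uniform(-max_scale, max_scale)
        sh = rng.uniform(-max_shear, max_shear)
        A = np.array([[s * np.cos(ang), -s * np.sin(ang) + sh, 0.0],
                      [s * np.sin(ang),  s * np.cos(ang),      0.0]])
        c = np.array([w / 2.0, h / 2.0])
        A[:, 2] = c - A[:, :2] @ c     # keep the patch centred under the warp
        views.append(cv2.warpAffine(patch, A, (w, h)))
    return views
```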
III. VALIDATION SETUP
A. Camera Motion Ground Truth Acquisition
In order to validate the accuracy of the camera motion as
reconstructed by the SLAM algorithm, a rigid body carrying four optical tracking markers (Northern Digital Inc., Ontario, Canada) was attached approximately 117mm from the distal tip of the stereo fibroscope. A rigid-body co-ordinate system ($O_{RB}$) was defined at the origin of the four markers. The position and orientation of this system with respect to the world co-ordinate system ($O_{W}$) is known at all instants in time. The measured rotations and translations of this rigid body with respect to the world co-ordinate system were transformed to the camera co-ordinate system ($O_{C}$) using the following transformation:

$$T^{W}_{C} = T^{W}_{RB}\, T^{RB}_{C}$$

where $T^{W}_{RB}$ is provided by the optical tracker, and $T^{RB}_{C}$ is obtained using a hand-eye transformation from the origin of $O_{RB}$ to the camera centre of the left fibre bundle. This was performed using techniques similar to [15].
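In code, this chain amounts to composing two 4x4 homogeneous transforms; a minimal sketch:

```python
import numpy as np

def to_h(R, t):
    """Pack a 3x3 rotation and 3-vector translation into a 4x4 transform."""
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

def camera_pose_in_world(R_w_rb, t_w_rb, R_rb_c, t_rb_c):
    # T^W_C = T^W_RB @ T^RB_C : per-frame tracker measurement composed
    # with the fixed hand-eye result from calibration.
    return to_h(R_w_rb, t_w_rb) @ to_h(R_rb_c, t_rb_c)
```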
B. 3D Model Validation
To validate the SLAM algorithm, a silicone soft tissue
phantom was constructed and latex paints were used to
simulate specular reflections. A Computed Tomography
(CT) scan of the phantom was performed in order to provide
ground truth. Prior to scanning, the model was embedded
with CT visible markers which were easily identifiable in the
resulting scan. During the data acquisition phase of the
experiment the location of each of the markers was
identified using a stylus which contained a second rigid body of four optical tracking markers with its own co-ordinate system. This allowed each of the markers to be identified
with respect to the world co-ordinate system and thus the
camera co-ordinate system.
A comparison between the surface of the CT reconstruction and the point map generated by the SLAM algorithm was performed. This required a process to find points on the CT surface which corresponded to the 3D features in the SLAM map. Features detected in the image were projected into the registered CT model from the camera's position given by the Optotrak. Each projected ray was traced through the 3D CT model to detect the first plane it intersects; this point is taken to be the corresponding point on the CT surface.
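A minimal sketch of this correspondence search, assuming the registered CT surface is available as a triangle mesh; the Moller-Trumbore intersection test used here is a standard choice for the ray-tracing step, not necessarily the authors'.

```python
import numpy as np

def ray_triangle(orig, d, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore: distance along ray (orig, d) to triangle, or None."""
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(d, e2)
    det = e1 @ p
    if abs(det) < eps:
        return None                    # ray parallel to the triangle plane
    inv = 1.0 / det
    s = orig - v0
    u = (s @ p) * inv
    if u < 0 or u > 1:
        return None
    q = np.cross(s, e1)
    v = (d @ q) * inv
    if v < 0 or u + v > 1:
        return None
    t = (e2 @ q) * inv
    return t if t > eps else None      # keep only hits in front of the ray

def first_hit(orig, d, tris):
    """tris: iterable of (v0, v1, v2); returns the nearest intersection."""
    hits = [t for t in (ray_triangle(orig, d, *tri) for tri in tris) if t]
    return orig + min(hits) * d if hits else None
```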
IV. RESULTS
A. Camera Motion
Fig. 7 shows four reconstructed surfaces from the video
sequence. The blue line represents the ground truth camera
trajectory and the green line represents the trajectory
reconstructed by the SLAM algorithm. The stereo fibroscope
was moved by hand to explore unknown regions of the
phantom and to close a loop. It can be seen that the loop was
successfully closed. As the fibroscope moves into unknown
regions towards the end of the trajectory, error propagation introduces a small amount of drift into the position estimate. This can in part be attributed to the low resolution of the camera, which limits the 3D reconstruction and tracking accuracy.
Figure 7: Ground truth camera trajectory (blue) and SLAM-reconstructed camera trajectory (green) at four different frame intervals (f = 50, 300, 700 and 1400).
Fig. 8 illustrates the trajectories when decomposed into
motions along the x, y and z axes for 1400 frames. The
absolute error in the three different axes was 1.94mm,
0.7mm and 1.7mm respectively. There was no rotation
around the z axis and only a minimal amount of rotation
around the x and y axes.
An additional ex-vivo experiment was carried out using
excised porcine tissue. The camera was moved so as to close a loop and then continue exploring, in a similar manner to the phantom experiment. As shown in Fig. 9, the
loop was successfully closed by the SLAM algorithm.
Figure 8: Trajectories decomposed into individual X, Y and Z components.
The ground truth from optical tracking markers is shown in blue and the
motion as reconstructed by the SLAM algorithm is shown in green.
Figure 9: Reconstructed 3D surface and camera motion as generated by the SLAM algorithm on an ex-vivo porcine tissue sample (f = 50, 150, 500 and 900).
B. Surface Reconstruction
Fig. 10 illustrates the 3D surface generated by the SLAM
algorithm (right) alongside the ground truth 3D surface
extracted from the CT scan of the phantom (left) from three
different views. It can be seen that the scale, orientation and
geometry of the surfaces are visually similar. Local
differences in geometry can be attributed to the sparseness of
the SLAM map. Due to the meshing of the sparse map,
which fits planes between points, the recovered surface is
likely to be less accurate at representing local changes in
geometry. The overall reconstruction errors for the surface
for x, y and z are 2mm, 1.3mm and 2.9mm respectively. The
surface was approximately 35mm from the camera position
during the data capture. The reconstruction error is larger in
the z axis, as expected, since the resolution of the images and the small baseline between the fibre image guides make stereo triangulation less accurate.
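The expected depth sensitivity can be checked against the standard stereo error model. The focal length in pixels below is a rough assumption, derived from the roughly 100 resolvable pixels across the 10,000-fibre bundle and the stated field of view, so the figure is indicative only:

```latex
% z: working distance (~35 mm); b: baseline (3.8 mm);
% f: focal length in pixels (assumed ~100 px); \Delta d: disparity error (1 px)
\Delta z \;\approx\; \frac{z^{2}}{f\,b}\,\Delta d
\;=\; \frac{(35\,\mathrm{mm})^{2}}{(100\,\mathrm{px})(3.8\,\mathrm{mm})}\times 1\,\mathrm{px}
\;\approx\; 3.2\,\mathrm{mm}.
```

This back-of-the-envelope estimate is of the same order as the measured 2.9mm error in z, which is consistent with the explanation above.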
Figure 10: Reconstructed 3D surface as generated by the SLAM (right) and
co-registered CT ground truth data (left)
V. CONCLUSION
The paper demonstrates the feasibility of integrating twin
flexible fibre image guides in a stereo configuration to
capture images in an MIS environment. The challenges
overcome in constructing and calibrating this bespoke imaging system, along with the image enhancement and robust feature tracking techniques, are presented. The resulting images were
successfully employed by a SLAM algorithm to both track
camera pose and motion and generate a 3D model of the
environment. It was anticipated that the limited resolution
offered by coherent fibre bundles might make this approach
infeasible. However, although the image resolution does
affect the final results, these clearly demonstrate that such an
approach is possible.
One of the limitations of a feature-based optical approach is that it can be affected by the paucity of tissue surface features; one potential solution is to use structured light. The sparse map also limits the 3D model reconstruction accuracy. This could be improved by including dense reconstruction information or by combining it with other approaches such as shape-from-shading. The next major challenge to address for this system is that of tissue deformation. Deformation occurs due to tool interaction, respiration and cardiac-induced tissue motion, and can violate the static-world assumption made by SLAM. Although the current system can cope with a very small amount of deformation, as deformation increases the 3D map will become inaccurate, because it does not represent the deformation, and the fibroscope position estimate will be less accurate. One potential application of the proposed framework is within a catheter which utilizes the stereo vision for targeting and the depth information for accurate focused energy delivery.
ACKNOWLEDGMENT
The authors would like to thank the Hamlyn Centre for
Robotic Surgery for funding this proof-of-concept study and
Drs Andrew Davison, Danail Stoyanov and Phillip Edwards
for their support and advice.
REFERENCES
[1] K. H. Fuchs, "Minimally Invasive Surgery," Endoscopy, vol. 34, pp.
154-159, 2002.
[2] I. Crothers, A. Gallagher, N. McClure, D.T.D. James, and J.
McGuigan, "Experienced laparoscopic surgeons are automated to the
"fulcrum effect": an ergonomic demonstration," Endoscopy, vol. 318,
pp. 365-369, 1999.
[3] O. Elhage, D. Murphy, B. Challacombe, A. Shortland, and P.
Dasgupta, "Ergonomics in Minimally Invasive Surgery "
International Journal of Clinical Practice, vol. 61, pp. 181-188,
2007.
[4] P. Dario, B. Hannaford, and A. Menciassi, "Smart Surgical Tools and
Augmenting Devices," IEEE Transactions on Robotics and
Automation, vol. 19, pp. 782-792, 2003.
[5] T. Ota, A. Degani, B. Zubiate, A. Wolf, H. Choset, D. Schwartzman,
and M. Zenati, "Epicardial Atrial Ablation Using a Novel Articulated
Robotic Medical Probe Via a Percutaneous Subxiphoid Approach,"
Innovations: Technology & Techniques in Cardiothoracic &
Vascular Surgery, vol. 1, pp. 335-340, 2006.
[6] N. Simaan, R. Taylor, and P. Flint, "A Dexterous System for
Laryngeal Surgery," in International Conference on Robotics and
Automation, 2004, pp. 351-357.
[7] D. Salle, P. Bidaud, and G. Morel, "Optimal Design of High
Dexterity Modular MIS Instrument for Coronary Artery Bypass
Grafting," in IEEE International Conferene on Robotics and
Automation, 2004, pp. 1276-1281.
[8] D. P. Noonan, D. Elson, G. Mylonas, A. Darzi, and G.-Z. Yang,
"Laser Induced Fluorescence and Reflected White Light Imaging for
Robot-Assisted Minimally Invasive Surgery," IEEE Transactions on
Biomedical Engineering, 2008. In Press.
[9] G. Mylonas, K.-W. Kwok, A. Darzi, and G.-Z. Yang, "Gaze-
Contingent Motor Channelling and Haptic Constraints for Minimally
Invasive Robotic Surgery," in MICCAI, 2008, pp. 347-355.
[10] D. Stoyanov, A. Darzi, and G.-Z. Yang, "Dense 3D Depth Recovery
for Soft Tissue Deformation During Robotically Assisted
Laparoscopic Surgery," in MICCAI, 2004, pp. 41-48.
[11] C. Wengert, L. Bossard, A. Haberling, C. Baur, G. Szekely, and P. C.
Cattin, "Endoscopic Navigation for Minimally Invasive Suturing," in
MICCAI, 2007, pp. 620-627.
[12] P. Mountney, D. Stoyanov, A. Davison, and G.-Z. Yang,
"Simultaneous Stereoscope Localization and Soft-Tissue Mapping
for Minimal Invasive Surgery," in MICCAI, 2006, pp. 347-354.
[13] D. Burschka, M. Li, M. Ishii, R. H. Taylor, and G. D. Hager, "Scale-
invariant registration of monocular endoscopic images to CT-scans
for sinus surgery," in MICCAI, 2004, pp. 413-426.
[14] Z. Zhang, "A Flexible New Technique for Camera Calibration,"
IEEE Transactions on Pattern Analysis and Machine Intelligence,
vol. 22, pp. 1330-1334, 2000.
[15] R. Tsai and R. Lenz, "Real Time Versatile Robotic Hand/Eye
Calibration using 3D Machine Vision," in IEEE International
Conference on Robotics and Automation, 1988, pp. 554-561.
A. J. Davison, Y. G. Cid, and N. Kita, "Real-Time 3D SLAM with Wide-Angle Vision," in IFAC Symposium on Intelligent Autonomous
Vehicles, 2004.
[17] C. Winter, S. Rupp, M. Elter, C. Munzenmayer, H. Gerhauser, and T.
Wittenberg, "Automatic adaptive enhancemnt for images obtained
with fiberscopic endoscopes," IEEE Transactions on Biomedical
Engineering, vol. 53, pp. 2035-2046, 2006.
[18] M. M. Dickens, D. J. Bornhop, and S. Mitra, "Removal of Optical
Fiber Interference in Color Micro-Endoscopic Images," in 11th IEEE
Symposium on Computer Based Medical Systems, 1998, p. 246.
[19] P. Mountney and G.-Z. Yang, "Soft Tissue Tracking for Minimally
Invasive Surgery: Learning Local Deformation Online," in MICCAI,
2008, pp. 364-372.