Interactive Endoscopy:
A Next-Generation, Streamlined User Interface
for Lung Surgery Navigation
Paul Thienphrapa1, Torre Bydlon1, Alvin Chen1, Prasad Vagdargi2,
Nicole Varble1, Douglas Stanton1, and Aleksandra Popovic1
1Philips Research North America, Cambridge, MA, USA
2I-STAR Lab, Johns Hopkins University, Baltimore, MD, USA
Abstract. Computer generated graphics are superimposed onto live
video emanating from an endoscope, offering the surgeon visual infor-
mation that is hiding in the native scene—this describes the classical
scenario of augmented reality in minimally invasive surgery. Research
efforts have, over the past few decades, pressed considerably against the
challenges of infusing a priori knowledge into endoscopic streams. As
framed, these contributions emulate perception at the level of the sur-
geon expert, perpetuating debates on the technical, clinical, and societal
viability of the proposition.
We herein introduce interactive endoscopy, transforming passive visu-
alization into an interface that allows the surgeon to label noteworthy
anatomical features found in the endoscopic video, and have the virtual
annotations remember their tissue locations during surgical manipula-
tion. The streamlined interface combines vision-based tool tracking and
speech recognition to enable interactive selection and labeling, followed
by tissue tracking and optical flow for label persistence. These discrete
capabilities have matured rapidly in recent years, promising technical vi-
ability of the system; it can help clinicians offload the cognitive demands
of visually deciphering soft tissues; and supports societal viability by
engaging, rather than emulating, surgeon expertise. Through a video-
assisted thoracoscopic surgery use case, we develop a proof-of-concept to improve
workflow by tracking surgical tools and visualizing tissue, while serving
as a bridge to the classical promise of augmented reality in surgery.
Keywords: Interactive Endoscopy · Lung surgery · VATS · Augmented Reality · Human-Computer Interaction
1 Introduction and Motivation
Lung cancer is the deadliest form of cancer worldwide, with 1.6 million new
diagnoses and 1.4 million deaths each year, more than cancers of the breast,
prostate, and colon—the three next most prevalent cancers—combined. In re-
sponse to this epidemic, major screening trials have been enacted including the
Dutch/Belgian NELSON trial, the US NLST, and Danish trials. These studies
found that proactive screening using low-dose computed tomography (CT) can
detect lung cancer at an earlier, treatable stage at a rate of 71%, leading to
a 20% reduction in mortality [2]. This prompted Medicare to reimburse lung
cancer screening in 2015, and with that, the number of patients presenting with
smaller, treatable tumors was expected to rise dramatically. The projected in-
crease was indeed observed within the Veterans Health Administration [17]; while
this population bears a heightened incidence of lung cancer due to occupational
hazards, it foreshadows a broader need to optimize patient care.
Surgical resection is the preferred curative therapy due to the ability to re-
move units of anatomy that sustain the tumor, as well as lymph nodes for stag-
ing. Most of the 100,000 surgical resections performed in the US annually are
minimally invasive, with 57% as video-assisted thoracoscopic surgery (VATS)
and 7% as robotic surgery [1]. Anatomically, the lung follows a tree structure
with airways that root at the trachea and narrow as they branch towards the
ribcage; blood vessels hug the airways and join them at the alveoli, or air sacs,
where oxygen and carbon dioxide interchange. Removing a tumor naturally de-
taches downstream airways, vessels, and connective lung tissue, so tumor location
and size prescribe the type of resection performed. Large or central tumors are
removed via pneumonectomy (full lung) or lobectomy (full lobe), while small
or peripheral tumors may be “wedged” out. Segmentectomy, or removal of a
sub-lobar segment, is gaining currency both because the procedure balances
disease removal with tissue preservation and because the trend towards smaller,
peripheral tumors supports it.
2 Background
In an archetypal VATS procedure, the surgeon examines a preoperative CT; here
the lung is inflated. They note the location of the tumor, relative to adjacent
structures. Now under the thoracoscope, the lung is collapsed, the tumor invis-
ible. The surgeon roughly estimates the tumor location. They look for known
structures; move the scope; manipulate the tissue; reveal a structure; remem-
ber it. They carefully dissect around critical structures [14]; discover another;
remember it. A few iterations and their familiarity grows. They mentally align
new visuals with their knowledge, experience, and the CT. Thus, they converge
on an inference of the true tumor location.
The foregoing exercise is cognitively strenuous and time consuming, yet merely a
precursor to the primary task of tumor resection. It is emblematic of endoscopic
surgery in general and of segmentectomy in particular, as the surgeon continues
to mind critical structures under a limited visual field [18]. Consequently, endo-
scopic scenes are difficult to contextualize in isolation, thereby turning the lung
into a jigsaw puzzle in which the pieces may deform, and must be memorized. In-
deed, surgeons routinely retract the scope or zoom out to construct associations
and context. Moreover, the lung appearance may not be visually distinctive nor
instantly informative, further intensifying the challenges, and thus the inefficien-
cies, of minimally invasive lung surgery.
The research community has responded vigorously, imbuing greater context
into endoscopy using augmented reality [5, 24] by registering coherent anatomical
models onto disjoint endoscopic snapshots. For example, Puerto-Souza et al. [25]
maintain registration of preoperative images to endoscopy by managing tissue
anchors amidst surgical activity. Du et al. [11] combine features with lighting, and
Collins et al. [9], texture with boundaries, to track surfaces in 3D. Lin et al. [19]
achieve surface reconstruction using hyperspectral imaging and structured light.
Simultaneous localization and mapping (SLAM) approaches have been extended
to handle tissue motion [23] and imaging conditions [21] found in endoscopy.
Stereo endoscopes are used to reconstruct tissue surfaces with high fidelity, and
have the potential to render overlays in 3D or guide surgical robots [29, 32].
Ref. [22] reviews the optical techniques that have been developed for tissue
surface reconstruction.
In recent clinical experiments, Chauvet et al. [8] project tumor margins onto
the surfaces of ex vivo porcine kidneys for partial nephrectomy. Liu et al. [20]
develop augmented reality for robotic VATS and robotic transoral surgery, per-
forming preclinical evaluations on ovine and porcine models, respectively, thereby
elevating the state of the art. These studies uncovered insights on registering models
to endoscopy and portraying these models faithfully. However, whether pursu-
ing a clinical application or technical specialty, researchers have faced a timeless
obstacle: tissue deformation. Modeling deformation is an ill-posed problem, and
this coinciding domain has likewise undergone extensive investigation [28]. In
the next section, we introduce an alternative technology for endoscopy that cir-
cumvents the challenges of deformable modeling.
3 Interactive Endoscopy
3.1 Contributions
The surgeon begins a VATS procedure as usual, examining the CT, placing sur-
gical ports, and estimating the tumor location under thoracoscopy. They adjust
both scope and lung in search of known landmarks, the pulmonary artery for
instance. Upon discovery, now under the proposed interactive endoscopy system
(Fig. 1, from a full ex vivo demo), they point their forceps at the target and ver-
bally instruct, “Mark the pulmonary artery.” An audible chime acknowledges,
and a miniature yet distinctive icon appears in the live video at the forceps tip,
accompanied by the semantic label. The surgeon continues to label the anatomy
and as they move the scope or tissue, the virtual annotations follow.
The combination of a limited visual field and amorphous free-form tissue
induces the surgeon to perform motions that are, in technical parlance, compu-
tationally intractable—it is the practice of surgery itself that postpones surgical
augmented reality ever farther into the future. In that future, critical debates on
validation and psychophysical effects await, yet the ongoing challenges of endo-
scopic surgery have persisted for decades. The present contribution transforms
endoscopy from a passive visualization tool to an interactive interface for labeling
live video. We recast static preoperative interactivity from Kim et al. [16] into
Fig. 1. Interactive endoscopy annotation system on an ex vivo porcine lung; the red
simulated tumor is part of the live demonstration. (Left) The surgeon points a tool at
a feature of interest, then speaks the desired label, “tumor margin”. (Right) Multiple
labels are tracked as the surgeon manipulates the tissue. Note that the system is non-
disruptive to existing workflows and requires no setup.
an intraoperative scheme; and repurpose OCT review interactivity from Balicki
et al. [4] to provide live spatial storage for the expert’s knowledge. We show how
the system can help surgeons through a cognitively strenuous and irreproducible
exploration routine, examine circumstances that would enable clinical viability,
and discuss how the approach both complements and enables augmented reality
in surgery, as envisioned a generation ago (Fuchs et al., 1998 [15]).
3.2 Key Components
For the proposed interactive endoscopy system, the experimental setup and usage
scenario are pictured in Fig. 2. Its key components include (1) vision-based
tool tracking, (2) a speech interface, and (3) persistent tissue tracking. While
these discrete capabilities have been historically available, they have undergone
marked improvement in recent years due to the emergence of graphics processing
units (GPUs), online storage infrastructure, and machine learning. A system
capitalizing on these developments has the potential to reach clinical reliability in
the near future. While these technologies continue to evolve rapidly, we construct
a proof-of-concept integration of agnostic building blocks as a means of assessing
baseline performance.
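As an illustrative sketch only (not a description of our implementation), the Python snippet below shows one way the three components could be wired together in a per-frame loop; the names Annotation, track_tool, get_label_request, and track_tissue are assumptions introduced for this sketch.

```python
# Illustrative per-frame integration of the three components; the component
# interfaces and names are assumptions, not the modules of our system.
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

import numpy as np

Point = Tuple[int, int]


@dataclass
class Annotation:
    label: str        # semantic label spoken by the surgeon
    position: Point   # current pixel location of the labeled tissue
    visible: bool     # False while the tissue is occluded or out of view


def update(frame: np.ndarray,
           annotations: Dict[str, Annotation],
           track_tool: Callable[[np.ndarray], Optional[Point]],
           get_label_request: Callable[[], Optional[str]],
           track_tissue: Callable[[np.ndarray, Dict[str, Annotation]], None]
           ) -> None:
    """One iteration of the interactive-endoscopy loop for a video frame."""
    # 1. Tool tracking: where is the forceps tip, if visible?
    tip = track_tool(frame)

    # 2. Speech interface: did the surgeon just ask to mark a landmark?
    label = get_label_request()
    if label is not None and tip is not None:
        annotations[label] = Annotation(label=label, position=tip, visible=True)

    # 3. Persistent tissue tracking: move existing labels with the tissue.
    track_tissue(frame, annotations)


if __name__ == "__main__":
    frame = np.zeros((480, 640, 3), dtype=np.uint8)      # stand-in video frame
    anns: Dict[str, Annotation] = {}
    update(frame, anns,
           track_tool=lambda f: (320, 240),               # stub: tip at center
           get_label_request=lambda: "pulmonary artery",  # stub: spoken label
           track_tissue=lambda f, a: None)                # stub: no motion
    print(anns)
```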
Tool Tracking. Upon discovering each landmark, the surgeon points out its
location to the system. A workflow-compatible pointer can be repurposed from
a tool already in use, such as a forceps, by tracking it in the endoscope. We use
a hierarchical heuristic method (Fig. 3) with similar assumptions as in [10] that
thoracic tools are rigid, straight, and of a certain hue. The low-risk demands of
the pointing task motivate our simple approach: 2D tool tracking can reliably
[Fig. 2 diagram: experimental setup with an ex vivo porcine lung, endoscope and surgical tools, and an endoscope view with virtual annotations; workflow loop in which the surgeon looks for anatomical landmarks, points a tool at each landmark found, and says “Mark the (landmark name)”, after which a labeled marker appears in the video and moves with the associated tissue.]
Fig. 2. (Left) Experimental setup. Minimal instrumentation beyond that of standard
VATS is required, primarily a microphone and a computer (the C-arm is presently
unused). (Right) Workflow for applying a label. Pointing and verbal annotation (yellow)
are likewise minimal steps.
Fig. 3. Tool tracking pipeline: (a) Original (b) HSV threshold (c) Foreground learning
using MoG (d) Foreground mask (e) Contour detection (f) Tip detection.
map 3D surface anatomy due to the projective nature of endoscopic views. Our
ex vivo tests indicate 2D tip localization to within 1.0 mm 92% of the time that
the tool points to a new location, and more advanced methods [6, 27] suggest
that clinical-grade tool tracking is well within reach.
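As a concrete, hedged illustration of the Fig. 3 pipeline, the OpenCV sketch below chains an HSV threshold, mixture-of-Gaussians foreground learning, contour detection, and tip extraction; the HSV bounds and the farthest-from-border tip heuristic are illustrative assumptions rather than the tuned parameters of our system.

```python
# Sketch of a heuristic tool-tip detector in the spirit of Fig. 3, assuming a
# low-saturation (gray/metallic) instrument against reddish tissue. The HSV
# bounds and the tip heuristic are illustrative, untuned values.
import cv2
import numpy as np

# Mixture-of-Gaussians background model, learned over the video stream.
mog = cv2.createBackgroundSubtractorMOG2(history=200, detectShadows=False)


def detect_tool_tip(frame_bgr):
    """Return the (x, y) pixel of the presumed tool tip, or None."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)

    # (b) HSV threshold: keep low-saturation pixels (the tool), rejecting
    # the strongly colored tissue background.
    color_mask = cv2.inRange(hsv, (0, 0, 0), (180, 80, 255))

    # (c, d) Foreground learning with MoG, intersected with the color mask.
    fg_mask = mog.apply(frame_bgr)
    mask = cv2.bitwise_and(color_mask, fg_mask)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))

    # (e) Contour detection: take the largest blob as the tool.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    tool = max(contours, key=cv2.contourArea)

    # (f) Tip detection: since the tool enters from the image border, take
    # the contour point farthest from all borders as the tip.
    h, w = mask.shape
    pts = tool.reshape(-1, 2)
    border_dist = np.minimum.reduce([pts[:, 0], w - pts[:, 0],
                                     pts[:, 1], h - pts[:, 1]])
    x, y = pts[int(np.argmax(border_dist))]
    return int(x), int(y)
```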
Speech Interface. Pointing the forceps at a landmark, the surgeon uses speech
to generate a corresponding label. This natural, hands-free interface is conducive
to workflow and sterility, as previously acknowledged in the voice-controlled
AESOP endoscope robot [3]. Recognition latency and accuracy were at the time
prohibitive, but modern advances have driven widespread use. The proliferation
of voice-controlled virtual assistants (e.g., Alexa) obliges us to revisit speech as
a surgical interface.
We use Google Cloud Speech-to-Text in our experiments. The online service
allows the surgeon to apply arbitrary semantic labels; offline tools or preset
Fig. 4. Persistent tissue tracking through folding and unfolding. SURF features (A and
B) are labeled so that annotations can be maintained through surgical manipulation.
vocabularies may be preferred in resource-constrained settings. Qualitatively,
we observed satisfactory recognition during demos using a commodity wireless
speaker-microphone, and this performance was retained in a noisy exhibition
room by switching to a wireless headset, suggesting that the clinical benefits of
speech interfaces may soon be realizable.
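For illustration, the sketch below parses a recognized transcript of the form “mark the (landmark name)” into a labeling command; the grammar and the MarkCommand structure are assumptions for this sketch, and the transcript itself would be supplied by a speech-to-text service or an offline recognizer with a preset vocabulary.

```python
# Minimal sketch of turning a recognized transcript into a labeling command.
# The "mark the <landmark>" grammar and MarkCommand are illustrative
# assumptions; the transcript would come from a speech-to-text backend.
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class MarkCommand:
    label: str  # semantic label to attach at the current tool-tip location


_MARK_PATTERN = re.compile(r"^\s*mark\s+(?:the\s+)?(?P<label>.+?)\s*$",
                           re.IGNORECASE)


def parse_utterance(transcript: str) -> Optional[MarkCommand]:
    """Return a MarkCommand if the transcript is a labeling request."""
    match = _MARK_PATTERN.match(transcript)
    if match is None:
        return None  # not a labeling phrase; ignore
    return MarkCommand(label=match.group("label").lower())


if __name__ == "__main__":
    print(parse_utterance("Mark the pulmonary artery"))
    # MarkCommand(label='pulmonary artery')
    print(parse_utterance("zoom in please"))
    # None
```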
Persistent Tissue Tracking. After the surgeon creates a label, the system
maintains its adherence by encoding the underlying tissue and tracking it as the
lung moves, deforms, or reappears in view following an exit or occlusion. This
provides an intuition of the lung state, with similar issues faced in liver surgery.
The labeling task asks that arbitrary patches of tissue be persistently identified
whenever they appear—a combination of online detection and tracking which for
endoscopy is well served by SURF and optical flow [12]. SURF can identify tissue
through motion and stretching 83% and 99% of the time respectively [30]. In
ex vivo experiments, uniquely identified features could be recovered successfully
upon returning into view so long as imaging conditions remain reasonably stable,
as illustrated in Fig. 4.
Labels should be displayed, at minimum, when the tissue is at rest, and mod-
ern techniques in matching sub-image elements [13, 33] show promise in overcom-
ing the challenges of biomedical images [26]. Approaches such as FlowNet can
then be used to track moving tissue and enhance the realism of virtual label
adherence. In short, there is a new set of tools to address traditional computer
vision problems in endoscopy.
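The following sketch illustrates the detect-and-track idea under simplifying assumptions: sparse features are stored around a label when it is created, re-matched when the tissue reappears, and propagated frame to frame with pyramidal Lucas-Kanade optical flow. ORB stands in for SURF (which requires the opencv-contrib xfeatures2d module), and the parameters are illustrative, not those of our experiments.

```python
# Sketch of label persistence via feature matching plus optical flow.
# ORB stands in for SURF (which needs opencv-contrib); parameters are
# illustrative and untuned.
import cv2
import numpy as np

orb = cv2.ORB_create(nfeatures=500)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)


def describe_patch(gray, point, radius=40):
    """Store ORB keypoints/descriptors in a patch around a new label."""
    x, y = point
    mask = np.zeros(gray.shape, dtype=np.uint8)
    cv2.circle(mask, (int(x), int(y)), radius, 255, -1)
    keypoints, descriptors = orb.detectAndCompute(gray, mask)
    return keypoints, descriptors


def redetect(gray, descriptors, min_matches=10):
    """Recover the label location after the tissue re-enters the view."""
    keypoints, frame_desc = orb.detectAndCompute(gray, None)
    if frame_desc is None or descriptors is None:
        return None
    matches = matcher.match(descriptors, frame_desc)
    if len(matches) < min_matches:
        return None
    pts = np.float32([keypoints[m.trainIdx].pt for m in matches])
    return tuple(np.median(pts, axis=0))  # rough estimate of label position


def track_step(prev_gray, gray, point):
    """Propagate a visible label to the next frame with LK optical flow."""
    prev_pt = np.float32([[point]])
    next_pt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, prev_pt, None,
                                                  winSize=(21, 21), maxLevel=3)
    if status[0][0] == 0:
        return None  # lost: fall back to redetect() on later frames
    return tuple(next_pt[0][0])
```

In this sketch, optical flow carries a label while its tissue remains in view, and feature re-matching serves as the fallback when flow fails or the tissue returns after an exit or occlusion, mirroring the detection-plus-tracking combination described above.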
3.3 Capturing Surgeon Expertise
Interactive endoscopy ties maturing technologies together into a novel applica-
tion with forgiving performance requirements, paving the way to the level of
robustness needed for clinical use. The simplicity of the concept belies its po-
tential to alleviate cognitive load, which can impact both judgment and motor
skills [7]. When the surgeon exerts mental energy in parsing what they see, the
system lets them translate that expertise directly onto the virtual surgical field.
This mitigates redundant exploration, and the visibility of labels can help them
infer context more readily.
In fact, many surgeons already use radiopaque, tethered (iVATS), dye [31],
and ad hoc electrocautery markers to aid localization prior to or during surgery.
These varied practices introduce risk and overhead, whereas virtual markers
are easy to use and give surgeons a reason to use them, potentially bridging gaps between
clinical practice and supporting technology. Moreover, surgeon engagement with
technology has a broader implication: digitization of the innards of surgery, which
has been a black box. Digital labels offer a chance to capture semantic, positional,
temporal, visual, and procedural elements of surgery, forming a statistical basis
for understanding—and anticipating—surgical acts at multiple scales. This, in
turn, can help make augmented reality a clinical reality.
4 Conclusions
The promise of augmented reality in surgery has been tempered by challenges
such as soft tissue deformation, and efforts to overcome this timeless adversary
have inadvertently suspended critical debates on the role of augmented percep-
tion in medicine altogether. We present, as a technological bridge, a streamlined
user interface that allows surgeons to tag the disjoint views that comprise en-
doscopic surgery. These virtual labels persist as the organ moves, so surgeons
can potentially manage unfamiliar tissue more deterministically. This approach
embraces the finiteness of human cognition and alleviates reliance on cognitive
state, capturing expert perception and judgment without attempting to emulate
it. We design a minimal feature set and a choice architecture with symmetric
freedom to use or not, respecting differences between surgeons. Our baseline
system demonstrates promising performance in a lab setting, while rapid ongo-
ing developments in the constituent technologies offer a path towards clinical
robustness. These circumstances present the opportunity for surgeons to change
surgery, without being compelled to change.
References
1. Healthcare Cost and Utilization Project, https://hcupnet.ahrq.gov/#setup
2. Reduced lung-cancer mortality with low-dose computed tomographic screening.
New England Journal of Medicine 365(5), 395–409 (2011)
3. Allaf, M.E., Jackman, S.V., Schulam, P.G., Cadeddu, J.A., Lee, B.R., Moore, R.G.,
Kavoussi, L.R.: Laparoscopic visual field. Surg. Endosc. 12(12), 1415–1418 (1998)
4. Balicki, M., Richa, R., Vagvolgyi, B., Kazanzides, P., Gehlbach, P., Handa, J.,
Kang, J., Taylor, R.: Interactive OCT annotation and visualization for vitreoretinal
surgery. In: Augmented Environments for Computer-Assisted Interventions (2013)
5. Bernhardt, S., Nicolau, S.A., Soler, L., Doignon, C.: The status of augmented
reality in laparoscopic surgery as of 2016. Med. Image Anal. 37, 66–90 (2017)
6. Bodenstedt, S., et al.: Comparative evaluation of instrument segmentation and
tracking methods in minimally invasive surgery (2018)
7. Carswell, C.M., Clarke, D., Seales, W.B.: Assessing mental workload during la-
paroscopic surgery. Surg. Innov. 12(1), 80–90 (2005)
8. Chauvet, P., Collins, T., Debize, C., Novais-Gameiro, L., Pereira, B., Bartoli, A.,
Canis, M., Bourdel, N.: Augmented reality in a tumor resection model. Surg. En-
dosc. 32(3), 1192–1201 (2018)
9. Collins, T., Bartoli, A., Bourdel, N., Canis, M.: Robust, real-time, dense and de-
formable 3D organ tracking in laparoscopic videos. In: Medical Image Computing
and Computer-Assisted Intervention. pp. 404–412 (2016)
10. Doignon, C., Nageotte, F., de Mathelin, M.: Segmentation and guidance of mul-
tiple rigid objects for intra-operative endoscopic vision. In: Vidal, R., et al. (eds.)
Dynamical Vision. pp. 314–327. Springer Berlin Heidelberg (2007)
11. Du, X., Clancy, N., Arya, S., Hanna, G.B., Kelly, J., Elson, D.S., Stoyanov, D.: Ro-
bust surface tracking combining features, intensity and illumination compensation.
Int. J. Comput. Assist. Radiol. Surg. 10(12), 1915–1926 (2015)
12. Elhawary, H., Popovic, A.: Robust feature tracking on the beating heart for a
robotic-guided endoscope. Int. J. Med. Robot. Comput. Assist. Surg. 7(4) (2011)
13. Fischer, P., Dosovitskiy, A., Brox, T.: Descriptor matching with convolutional neu-
ral networks: A comparison to SIFT (2014)
14. Flores, R.M., et al.: Video-assisted thoracoscopic surgery (VATS) lobectomy:
Catastrophic intraoperative complications. J. Thorac. Cardiovasc. Surg. 142(6),
1412–1417 (2011)
15. Fuchs, H., et al.: Augmented reality visualization for laparoscopic surgery. In: Med-
ical Image Computing and Computer-Assisted Intervention. pp. 934–943. Springer
Berlin Heidelberg (1998)
16. Kim, J.H., Bartoli, A., Collins, T., Hartley, R.: Tracking by detection for interactive
image augmentation in laparoscopy. In: Dawant, B.M., et al. (eds.) Biomedical
Image Registration. pp. 246–255. Springer Berlin Heidelberg (2012)
17. Kinsinger, L.S., et al.: Implementation of lung cancer screening in the Veterans
Health Administration. JAMA Internal Medicine 177(3), 399–406 (2017)
18. Lee, C.Y., Chan, H., Ujiie, H., Fujino, K., Kinoshita, T., Irish, J.C., Yasufuku, K.:
Novel thoracoscopic navigation system with augmented real-time image guidance
for chest wall tumors. Ann. Thorac. Surg. 106(5), 1468–1475 (2018)
19. Lin, J., Clancy, N.T., Qi, J., Hu, Y., Tatla, T., Stoyanov, D., Maier-Hein, L., Elson,
D.S.: Dual-modality endoscopic probe for tissue surface shape reconstruction and
hyperspectral imaging enabled by deep neural networks. Med. Image Anal. 48,
162–176 (2018)
20. Liu, W.P., Richmon, J.D., Sorger, J.M., Azizian, M., Taylor, R.H.: Augmented
reality and CBCT guidance for transoral robotic surgery. J. Robot. Surg. 9(3),
223–233 (2015)
21. Mahmoud, N., Collins, T., Hostettler, A., Soler, L., Doignon, C., Montiel, J.M.M.:
Live tracking and dense reconstruction for handheld monocular endoscopy. IEEE
Trans. Med. Imaging 38(1), 79–89 (2019)
22. Maier-Hein, L., Mountney, P., Bartoli, A., Elhawary, H., Elson, D., Groch, A., Kolb,
A., Rodrigues, M., Sorger, J., Speidel, S., Stoyanov, D.: Optical techniques for 3D
surface reconstruction in computer-assisted laparoscopic surgery. Med. Image Anal.
17(8), 974–996 (2013)
23. Mountney, P., Yang, G.Z.: Motion compensated SLAM for image guided surgery.
In: Medical Image Computing and Computer-Assisted Intervention. pp. 496–504.
Springer Berlin Heidelberg (2010)
24. Nicolau, S., Soler, L., Mutter, D., Marescaux, J.: Augmented reality in laparoscopic
surgical oncology. Surg. Oncol. 20(3), 189–201 (2011)
25. Puerto-Souza, G.A., Cadeddu, J.A., Mariottini, G.L.: Toward long-term and ac-
curate augmented-reality for monocular endoscopic videos. IEEE Trans. Biomed.
Eng. 61(10), 2609–2620 (2014)
26. Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomed-
ical image segmentation. In: Medical Image Computing and Computer-Assisted
Intervention. pp. 234–241. Springer International Publishing (2015)
27. Shvets, A.A., Rakhlin, A., Kalinin, A.A., Iglovikov, V.I.: Automatic instrument
segmentation in robot-assisted surgery using deep learning. In: IEEE Int. Conf. on
Machine Learning and Applications (ICMLA). pp. 624–628 (2018)
28. Sotiras, A., Davatzikos, C., Paragios, N.: Deformable medical image registration:
A survey. IEEE Trans. Med. Imaging 32(7), 1153–1190 (2013)
29. Stoyanov, D., Scarzanella, M.V., Pratt, P., Yang, G.Z.: Real-time stereo recon-
struction in robotically assisted minimally invasive surgery. In: Medical Image
Computing and Computer-Assisted Intervention. pp. 275–282 (2010)
30. Thienphrapa, P., Bydlon, T., Chen, A., Popovic, A.: Evaluation of surface feature
persistence during lung surgery. In: BMES Annual Meeting. Atlanta, GA (2018)
31. Willekes, L., Boutros, C., Goldfarb, M.A.: VATS intraoperative tattooing to facil-
itate solitary pulmonary nodule resection. J. Cardiothorac. Surg. 3(1), 13 (2008)
32. Yip, M.C., Lowe, D.G., Salcudean, S.E., Rohling, R.N., Nguan, C.Y.: Tissue track-
ing and registration for image-guided surgery. IEEE Trans. Med. Imaging 31(11),
2169–2182 (2012)
33. Zagoruyko, S., Komodakis, N.: Learning to compare image patches via CNNs. In:
IEEE Conf. on Computer Vision and Pattern Recognition. pp. 4353–4361 (2015)