Article

Real-time ultrasound transducer localization in fluoroscopy images by transfer learning from synthetic training data


Abstract

The fusion of image data from trans-esophageal echography (TEE) and X-ray fluoroscopy is attracting increasing interest in minimally-invasive treatment of structural heart disease. In order to calculate the needed transformation between both imaging systems, we employ a discriminative learning (DL) based approach to localize the TEE transducer in X-ray images. The successful application of DL methods is strongly dependent on the available training data, which entails three challenges: (1) the transducer can move with six degrees of freedom, meaning it requires a large number of images to represent its appearance, (2) manual labeling is time-consuming, and (3) manual labeling has inherent errors. This paper proposes to generate the required training data automatically from a single volumetric image of the transducer. In order to adapt this system to real X-ray data, we use unlabeled fluoroscopy images to estimate differences in feature space density and correct covariate shift by instance weighting. Two approaches for instance weighting, probabilistic classification and Kullback-Leibler importance estimation (KLIEP), are evaluated for different stages of the proposed DL pipeline. An analysis on more than 1900 images reveals that our approach reduces detection failures from 7.3% in cross-validation to zero on the test set and improves the localization error from 1.5 to 0.8 mm. Due to the automatic generation of training data, the proposed system is highly flexible and can be adapted to any medical device with minimal effort.
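To make the instance-weighting idea concrete, the sketch below shows the probabilistic-classification variant named in the abstract: a classifier is trained to separate synthetic (source) from real (target) feature vectors, and its odds ratio estimates the density ratio used as a per-sample training weight. This is a minimal illustration under assumed interfaces, not the paper's implementation; feature extraction and the detector itself are omitted.

```python
# Hedged sketch: importance weighting via probabilistic classification,
# one of the two schemes named in the abstract (the other being KLIEP).
import numpy as np
from sklearn.linear_model import LogisticRegression

def covariate_shift_weights(X_source, X_target):
    """Estimate weights w(x) ~ p_target(x) / p_source(x)."""
    X = np.vstack([X_source, X_target])
    y = np.concatenate([np.zeros(len(X_source)), np.ones(len(X_target))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    p = clf.predict_proba(X_source)[:, 1]          # P(target | x)
    prior = len(X_target) / len(X_source)          # corrects class imbalance
    w = p / np.clip(1.0 - p, 1e-6, None) / prior   # odds ratio -> density ratio
    return w / w.mean()                            # normalize to mean 1

# The weights would then be passed to the detector's training routine,
# e.g. fit(X_source, labels, sample_weight=w).
```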


... Simulations stand as a powerful support to understand complex physiological phenomena, and as a potential way to enrich real databases with large amounts of realistic data [1,2]. However, strong differences may exist between the simulations obtained from distinct models, and between simulated and real data. ...
... Simple examples in medical imaging include the global adaptation of data distributions, illustrated on real cardiac meshes from different magnetic resonance protocols [8] or real vs. simulated fluoroscopy images to train a transducer localization algorithm [1]. The merging of databases can be made more sample-specific after reducing the dimensionality of the data. ...
... Recent works investigated the estimation of joint spaces that merge heterogeneous data features [9,10]. We are more interested in adapting the data from one database to another one, as pursued globally in [8,1]. ...
Chapter
We tackle the determination of a relevant data space to quantify differences between two databases coming from different sources. In the present paper, we propose to quantify differences between cardiac simulations from two different biomechanical models, assessed through myocardial deformation patterns. At stake is the evaluation of a given model with respect to another one, and the potential correction of bias necessary to merge two databases. We address this from a domain adaptation perspective. We first represent the data using non-linear dimensionality reduction on each database. Then, we formulate the mapping between databases using cases that are shared between the two databases: either as a linear change of basis derived from the learnt eigenvectors, or as a non-linear regression based on the low-dimensional coordinates from each database. We demonstrate these concepts by examining the principal variations in deformation patterns obtained from two cardiac biomechanical models personalized to 20 and 15 real healthy cases, respectively, from which 11 cases were simulated with both models.
... Average detection time was 0.53 seconds. In [4], the work from [3] was extended by focusing on a framework for adapting a classifier generated with in silico training data to perform better on in vivo test data. Impressive results for detection of in-plane TEE pose parameters were obtained in terms of localization accuracy, low false positive rate, and detection speed. ...
... Training Datasets For the TEE probe, the classifier was trained on simulated XRF images. Similar to the method from [4], hybrid images were created by blending anatomical background images from TAVR cases with digitally reconstructed radiographs (DRRs) of the TEE probe. For the PV, 389 clinical images from TAVR cases were manually annotated and used for training. ...
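The hybrid-image idea in the excerpt above can be sketched as follows: a digitally reconstructed radiograph (DRR) of the device is composited into a real anatomical background. Since X-ray attenuation is multiplicative in the intensity domain (additive in the log domain), the DRR is applied as a transmission map rather than alpha-blended. Array names and value ranges are assumptions for illustration, not taken from the cited paper.

```python
import numpy as np

def blend_drr(background, drr_line_integral, position):
    """background: 2-D float array in (0, 1];
    drr_line_integral: accumulated device attenuation (integral of mu along
    each ray), i.e. the same units as -log(I / I0)."""
    out = background.copy()
    r, c = position
    h, w = drr_line_integral.shape
    # multiply the covered region by the device transmission exp(-integral mu dl)
    out[r:r + h, c:c + w] *= np.exp(-drr_line_integral)
    return out
```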
... The rate of successful detections was 95.8% for the TEE probe and 90.1% for the PV. This was competitive with previously reported results for the TEE probe [3,4], especially when considering that the HF was trained on simulated images. For successful detections, both devices resulted in localization errors less than 1.5 mm on average, and orientation errors less than 3.0°. ...
Conference Paper
Full-text available
A method for real-time localization of devices in fluoroscopic images is presented. Device pose is estimated using a Hough forest based detection framework. The method was applied to two types of devices used for transcatheter aortic valve replacement: a transesophageal echo (TEE) probe and prosthetic valve (PV). Validation was performed on clinical datasets, where both the TEE probe and PV were successfully detected in 95.8% and 90.1% of images, respectively. TEE probe position and orientation errors were 1.42 ± 0.79 mm and 2.59° ± 1.87°, while PV position and orientation errors were 1.04 ± 0.77 mm and 2.90° ± 2.37°. The Hough forest was implemented in CUDA C, and was able to generate device location hypotheses in less than 50 ms for all experiments.
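The core of a Hough forest detector is the voting step: each image patch reaching a leaf casts votes for the device centre at the offsets stored in that leaf, and the accumulator's peak gives the location hypothesis. A minimal sketch of that step is shown below; tree training and patch features are omitted, and the leaf_offsets/leaf_weights interface is an assumption for illustration.

```python
import numpy as np

def hough_vote(image_shape, patch_centers, leaf_offsets, leaf_weights):
    """Accumulate centre votes from patches routed through the forest."""
    acc = np.zeros(image_shape, dtype=np.float64)
    for (py, px), offsets, w in zip(patch_centers, leaf_offsets, leaf_weights):
        for dy, dx in offsets:                     # offsets stored at the leaf
            y, x = py + dy, px + dx
            if 0 <= y < image_shape[0] and 0 <= x < image_shape[1]:
                acc[y, x] += w
    peak = np.unravel_index(np.argmax(acc), acc.shape)  # best centre hypothesis
    return peak, acc
```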
... For the evaluation of the overall framework, the patch-based implementation and a sampling step of k_s = 2.0 voxels are used. An average runtime of 86.1 ms and a best runtime of 73.1 ms are reached in the evaluation setup, which is considered real-time capable for dynamic 2-D/3-D registration [8]. In comparison with a single-core CPU implementation using an OpenGL DLI-rendering, a speed-up of ×41.2 is achieved, see Tab ...
... account. Real-time capability [8] and a runtime performance of up to 13 frames per second are achieved with the GPU framework and an adapted volume sampling step. Only one approach for 2-D/3-D registration reaches a similar framerate, and to the best of the author's knowledge no approach reported in the literature [9] reaches a higher runtime performance. ...
Chapter
2D/3D image fusion is used for a variety of interventional procedures. Overlays of 2D images with perspective-correctly rendered 3D images provide physicians with additional information during interventions. In this work, a real-time capable 2D/3D registration framework is presented. An adapted parallelization using the GPU is investigated for the depth-aware registration algorithm. The GPU hardware architecture is specifically taken into account by optimizing memory access patterns and exploiting CUDA texture memory. Real-time capability is achieved with a median runtime of 86.1 ms per 2D/3D registration iteration and a median accuracy of up to 1.15 mm.
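Schematically, one iteration of intensity-based 2D/3D registration renders a DRR for the current pose, scores it against the fluoroscopy frame, and updates the pose. The sketch below uses a placeholder render_drr function and a simple finite-difference ascent on normalized cross-correlation; the cited work runs rendering and similarity evaluation on the GPU and its optimizer may differ.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two images."""
    a = (a - a.mean()) / (a.std() + 1e-9)
    b = (b - b.mean()) / (b.std() + 1e-9)
    return float((a * b).mean())

def register(volume, fluoro, pose0, render_drr, n_iter=50, step=0.5):
    pose = np.asarray(pose0, dtype=float)   # e.g. 3 rotations + 3 translations
    for _ in range(n_iter):
        grad = np.zeros_like(pose)
        for i in range(pose.size):          # finite-difference gradient
            d = np.zeros_like(pose)
            d[i] = 1e-2
            grad[i] = (ncc(render_drr(volume, pose + d), fluoro)
                       - ncc(render_drr(volume, pose - d), fluoro)) / 2e-2
        pose += step * grad                 # gradient ascent on similarity
    return pose
```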
... To further relieve the possible overfitting issues, we can also use the idea of transfer learning [52][53][54][55]. Specifically, in the unsupervised pre-training step, we can borrow MR images from other body parts (e.g., heart) to initialize our deep network, thus capturing more general MR image appearance. ...
... We believe that this initialization could benefit the fine-tuning step and thus overcome the small sample problem. Note that similar strategies have been widely used in the field of computer vision and machine learning [31,[51][52][53][54][55][56]. ...
Article
Automatic and reliable segmentation of the prostate is an important but difficult task for various clinical applications such as prostate cancer radiotherapy. The main challenges for accurate MR prostate localization lie in two aspects: (1) inhomogeneous and inconsistent appearance around the prostate boundary, and (2) the large shape variation across different patients. To tackle these two problems, we propose a new deformable MR prostate segmentation method by unifying deep feature learning with sparse patch matching. First, instead of directly using handcrafted features, we propose to learn the latent feature representation from prostate MR images by the stacked sparse auto-encoder (SSAE). Since the deep learning algorithm learns the feature hierarchy from the data, the learned features are often more concise and effective than handcrafted features in describing the underlying data. To improve the discriminability of learned features, we further refine the feature representation in a supervised fashion. Second, based on the learned features, a sparse patch matching method is proposed to infer a prostate likelihood map by transferring the prostate labels from multiple atlases to the new prostate MR image. Finally, a deformable segmentation is used to integrate a sparse shape model with the prostate likelihood map for achieving the final segmentation. The proposed method has been extensively evaluated on a dataset that contains 66 T2-weighted prostate MR images. Experimental results show that the deep-learned features are more effective than the handcrafted features in guiding MR prostate segmentation. Moreover, our method shows superior performance compared with other state-of-the-art segmentation methods.
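A minimal sketch of one layer of the stacked sparse auto-encoder (SSAE) idea follows: an auto-encoder trained to reconstruct image patches with an L1 sparsity penalty on the hidden code. In the cited work several such layers are stacked and then refined with supervision; layer sizes and the penalty weight here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SparseAE(nn.Module):
    def __init__(self, n_in=1024, n_hidden=256):
        super().__init__()
        self.enc = nn.Linear(n_in, n_hidden)
        self.dec = nn.Linear(n_hidden, n_in)

    def forward(self, x):
        h = torch.sigmoid(self.enc(x))      # hidden code
        return self.dec(h), h

def train_step(model, opt, patches, l1_weight=1e-3):
    recon, h = model(patches)
    # reconstruction loss + sparsity penalty on the activations
    loss = nn.functional.mse_loss(recon, patches) + l1_weight * h.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```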
... Similarly to how synthetic depth data made real-time pose estimation from depth cameras possible, robust detectors were trained on synthesised fluoroscopic images (Heimann et al., 2014). Geremia et al. (2013) learnt to estimate tumour cell density on clinical images by training models on simulated multi-modal MR images, using biophysical models growing synthetic cerebral tumours at various positions in the brain. ...
Thesis
Full-text available
The recent explosion of cardiac imaging data has been phenomenal. The intelligent use of large annotated databases could provide valuable support for diagnosis and therapy planning. Beyond the challenges inherent to the sheer size of these databases, they are difficult to use in their current state: the data are unstructured, image content is variable and poorly indexed, and metadata are not standardized. The objective of this thesis is therefore the automatic processing, analysis and interpretation of such databases in order to facilitate their use by cardiology specialists. To this end, the thesis explores supervised machine learning tools, which help exploit these large quantities of cardiac images and find better representations. First, image visualization and interpretation is improved by developing a method for automatic recognition of the acquisition planes commonly used in cardiac imaging. The method is based on learning with random forests and convolutional neural networks, using large image databases in which cardiac view types have been established beforehand. Second, the thesis addresses the automatic processing of cardiac images, with a view to extracting relevant clinical indices. Segmentation of cardiac structures is a key step in this process. To this end, a method based on random forests that exploits original spatio-temporal features for automatic segmentation in 3D and 3D+t images is proposed. Third, supervised learning of cardiac semantics is enriched by a method for collecting user annotations online. Finally, the last part uses random-forest-based machine learning to map cardiac image databases while establishing notions of distance and neighborhood between images. An application is proposed to retrieve, from a database, the images most similar to those of a new patient.
... In order to circumvent the need for similar quantities of manual annotations, recent works have explored ways of reducing human effort. This has been of interest in both the computer vision industry [14]-[18] and the CAI community [6], [19]-[22]. There are two fundamental approaches to alleviate the manual effort. ...
... Alternatively to just reducing the number of labels or the time required for dataset curation, dataset synthesis has emerged as an inexpensive approach to generate annotated images automatically. The work of Heimann et al. [19] proposed to synthetically embed an ultrasound (US) transducer into fluoroscopy images to generate a training set. The aim was to detect the probe's in-plane position, orientation, and scale. ...
Article
Full-text available
Producing manual, pixel-accurate, image segmentation labels is tedious and time-consuming. This is often a rate-limiting factor when large amounts of labeled images are required, such as for training deep convolutional networks for instrument-background segmentation in surgical scenes. No large datasets comparable to industry standards in the computer vision community are available for this task. To circumvent this problem, we propose to automate the creation of a realistic training dataset by exploiting techniques stemming from special effects and harnessing them to target training performance rather than visual appeal. Foreground data is captured by placing sample surgical instruments over a chroma key (a.k.a. green screen) in a controlled environment, thereby making extraction of the relevant image segment straightforward. Multiple lighting conditions and viewpoints can be captured and introduced in the simulation by moving the instruments and camera and modulating the light source. Background data is captured by collecting videos that do not contain instruments. In the absence of pre-existing instrument-free background videos, minimal labeling effort is required, just to select frames that do not contain surgical instruments from videos of surgical interventions freely available online. We compare different methods to blend instruments over tissue and propose a novel data augmentation approach that takes advantage of the plurality of options. We show that by training a vanilla U-Net on semi-synthetic data only and applying a simple post-processing, we are able to match the results of the same network trained on a publicly available manually labeled real dataset.
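The chroma-key pipeline described above can be sketched in a few lines: segment the instrument from a green-screen frame by hue thresholding, then composite it over an instrument-free tissue background. The thresholds below are illustrative assumptions; the paper compares several, more sophisticated blending strategies.

```python
import cv2
import numpy as np

def chroma_key_composite(fg_bgr, bg_bgr, h_range=(35, 85), s_min=80):
    """fg_bgr: green-screen frame with instrument; bg_bgr: tissue background."""
    hsv = cv2.cvtColor(fg_bgr, cv2.COLOR_BGR2HSV)
    green = cv2.inRange(hsv, (h_range[0], s_min, 0), (h_range[1], 255, 255))
    mask = (green == 0).astype(np.float32)[..., None]   # 1 = instrument pixel
    comp = mask * fg_bgr + (1.0 - mask) * bg_bgr        # naive compositing
    label = (mask[..., 0] > 0).astype(np.uint8)         # free segmentation label
    return comp.astype(np.uint8), label
```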
... In order to circumvent the need for similar quantities of manual annotations, recent works have explored ways of reducing human effort. This has been of interest in both the computer vision industry [14]-[18] and the CAI community [6], [19]-[22]. There are two fundamental approaches to alleviate the manual effort. ...
... Alternatively to just reducing the number of labels or the time required for dataset curation, dataset synthesis has emerged as an inexpensive approach to generate annotated images automatically. The work of Heimann et al. [19] proposed to synthetically embed an ultrasound (US) transducer into fluoroscopy images to generate a training set. The aim was to detect the probe's in-plane position, orientation, and scale. ...
Preprint
Producing manual, pixel-accurate, image segmentation labels is tedious and time-consuming. This is often a rate-limiting factor when large amounts of labeled images are required, such as for training deep convolutional networks for instrument-background segmentation in surgical scenes. No large datasets comparable to industry standards in the computer vision community are available for this task. To circumvent this problem, we propose to automate the creation of a realistic training dataset by exploiting techniques stemming from special effects and harnessing them to target training performance rather than visual appeal. Foreground data is captured by placing sample surgical instruments over a chroma key (a.k.a. green screen) in a controlled environment, thereby making extraction of the relevant image segment straightforward. Multiple lighting conditions and viewpoints can be captured and introduced in the simulation by moving the instruments and camera and modulating the light source. Background data is captured by collecting videos that do not contain instruments. In the absence of pre-existing instrument-free background videos, minimal labeling effort is required, just to select frames that do not contain surgical instruments from videos of surgical interventions freely available online. We compare different methods to blend instruments over tissue and propose a novel data augmentation approach that takes advantage of the plurality of options. We show that by training a vanilla U-Net on semi-synthetic data only and applying a simple post-processing, we are able to match the results of the same network trained on a publicly available manually labeled real dataset.
... In this work, we proposed a DL-based framework to enhance the visibility of clinical needles with PA imaging for guiding minimally invasive procedures. As clinical needles have relatively simple geometries whilst background biological tissues such as blood vessels are complex, as opposed to using purely synthetic data [46][47][48][49], a hybrid method was proposed for generating semi-synthetic datasets [50]. The DL model was trained and validated using such semi-synthetic datasets and blind to the test data obtained from tissue-mimicking phantoms, ex vivo tissue and human fingers in vivo. ...
Article
Full-text available
Photoacoustic imaging has shown great potential for guiding minimally invasive procedures by accurate identification of critical tissue targets and invasive medical devices (such as metallic needles). The use of light emitting diodes (LEDs) as the excitation light sources accelerates its clinical translation owing to its high affordability and portability. However, needle visibility in LED-based photoacoustic imaging is compromised primarily due to its low optical fluence. In this work, we propose a deep learning framework based on U-Net to improve the visibility of clinical metallic needles with a LED-based photoacoustic and ultrasound imaging system. To address the complexity of capturing ground truth for real data and the poor realism of purely simulated data, this framework included the generation of semi-synthetic training datasets combining both simulated data to represent features from the needles and in vivo measurements for tissue background. Evaluation of the trained neural network was performed with needle insertions into blood-vessel-mimicking phantoms, pork joint tissue ex vivo and measurements on human volunteers. This deep learning-based framework substantially improved the needle visibility in photoacoustic imaging in vivo compared to conventional reconstruction by suppressing background noise and image artefacts, achieving 5.8 and 4.5 times improvements in signal-to-noise ratio (SNR) and the modified Hausdorff distance (MHD) respectively. Thus, the proposed framework could be helpful for reducing complications during percutaneous needle insertions by accurate identification of clinical needles in photoacoustic imaging.
... One important aspect in this regard is the simulation of further datasets to obtain more comprehensive training data with respect to image contrast and fiber structure. Another aspect where methodological improvements are definitely possible is the currently employed naïve approach to generalize between the simulated and in vivo domains, using approaches of unsupervised domain adaptation and transfer learning (Götz et al., 2014; Heimann et al., 2014; Long et al., 2014, 2016; McKeough et al., 2013; Pan and Yang, 2010; Sener et al., 2016). This also includes the possibility to generalize to images acquired with different settings such as b-value and number of shells. ...
Article
We present a fiber tractography approach based on a random forest classification and voting process, guiding each step of the streamline progression by directly processing raw diffusion-weighted signal intensities. For comparison to the state-of-the-art, i.e. tractography pipelines that rely on mathematical modeling, we performed a quantitative and qualitative evaluation with multiple phantom and in vivo experiments, including a comparison to the 96 submissions of the ISMRM tractography challenge 2015. The results demonstrate the vast potential of machine learning for fiber tractography.
... Fluoroscopic images are generated from the camera point of view using a composite ray caster that simulates the X-ray physical attenuation process. [25][26][27] The 3D catheter shape generated by the simulator is then registered to the camera coordinate frame and projected onto the image plane to generate the 2D catheter shape in the fluoroscopic image. In the conducted simulation, the virtual camera was placed at 1200 mm from the patient, facing the natural plane of the aortic arch, with an image size of 400 × 600 px, focal lengths of fx = fy = 2000 px and a principal point placed at (200, 300) px. ...
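The projection step in this excerpt follows a standard pinhole model; the sketch below projects 3D catheter points in the camera frame using the stated intrinsics (fx = fy = 2000 px, principal point (200, 300) px). The pinhole assumption is mine; the cited simulator may model additional effects.

```python
import numpy as np

# Camera intrinsics from the excerpt above.
K = np.array([[2000.0,    0.0, 200.0],
              [   0.0, 2000.0, 300.0],
              [   0.0,    0.0,   1.0]])

def project(points_cam):
    """points_cam: (N, 3) array of 3-D points in the camera frame (z > 0).
    Returns (N, 2) pixel coordinates on the image plane."""
    uvw = points_cam @ K.T          # [fx*x + cx*z, fy*y + cy*z, z]
    return uvw[:, :2] / uvw[:, 2:3] # perspective division by depth
```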
Article
Full-text available
In current practice, fluoroscopy remains the gold standard for guiding surgeons during endovascular catheterization. The poor visibility of anatomical structures and the absence of depth information make accurate catheter localization and manipulation a difficult task. Overexposure to radiation and use of risk-prone contrast agent also compromise surgeons’ and patients’ health. Alternative approaches using embedded electromagnetic (EM) sensors have been developed to overcome the limitations of fluoroscopy-based interventions. As only a finite number of sensors can be integrated within a catheter, methods that rely on such sensors require the use of interpolation schemes to recover the catheter shape. Since EM sensors are sensitive to external interferences, the outcome is not robust. This paper introduces a probabilistic framework that improves the catheter localization and reduces the dependency on fluoroscopy and contrast agents. Within this framework, the dense 2D information extracted from fluoroscopic images is combined with the discrete pose information of EM sensors to provide a reliable reconstruction of the full three-dimensional catheter shape. Validation in a physics-based simulation environment and in a real-world experimental setup provides promising results and indicates that the proposed framework allows reconstructing the 3D catheter shape with a median root-mean-square error of 3.7 mm with an interquartile range of 0.3 mm.
... One important aspect in this regard is the simulation of further datasets to obtain more comprehensive training data with respect to image contrast and fiber structure. Another aspect where methodological improvements are definitely possible is the currently employed naïve approach to generalize between the simulated and in vivo domains, using approaches of unsupervised domain adaptation and transfer learning (Götz et al., 2014; Heimann et al., 2014; Long et al., 2014, 2016; McKeough et al., 2013; Pan and Yang, 2010; Sener et al., 2016). ...
Article
We present a fiber tractography approach based on a random forest classification and voting process, guiding each step of the streamline progression by directly processing raw diffusion-weighted signal intensities. For comparison to the state-of-the-art, i.e. tractography pipelines that rely on mathematical modeling, we performed a quantitative and qualitative evaluation with multiple phantom and in vivo experiments, including a comparison to the 96 submissions of the ISMRM tractography challenge 2015. The results demonstrate the vast potential of machine learning for fiber tractography.
... Such methods have only recently started to emerge in medical imaging applications. These approaches frequently rely on a small amount of labeled target data ([1,14,15,16,17], to name a few), or can be unsupervised with respect to the target [2,18], which is favorable for tasks where annotation is costly. In the latter case, the transfer is typically achieved by weighting the training samples such that the differences between training and target data are minimized. ...
Article
Supervised learning has been very successful for automatic segmentation of images from a single scanner. However, several papers report deteriorated performances when using classifiers trained on images from one scanner to segment images from other scanners. We propose a transfer learning classifier that adapts to differences between training and test images. This method uses a weighted ensemble of classifiers trained on individual images. The weight of each classifier is determined by the similarity between its training image and the test image. We examine three unsupervised similarity measures, which can be used in scenarios where no labeled data from a newly introduced scanner or scanning protocol is available. The measures are based on a divergence, a bag distance, and on estimating the labels with a clustering procedure. These measures are asymmetric. We study whether the asymmetry can improve classification. Out of the three similarity measures, the bag similarity measure is the most robust across different studies and achieves excellent results on four brain tissue segmentation datasets and three white matter lesion segmentation datasets, acquired at different centers and with different scanners and scanning protocols. We show that the asymmetry can indeed be informative, and that computing the similarity from the test image to the training images is more appropriate than the opposite direction.
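The weighted-ensemble idea in this abstract can be sketched as follows: one classifier per training image, each weighted by how similar its training image is to the test image. The similarity argument is a placeholder for the divergence, bag-distance, or clustering measures studied in the paper; the interface is an assumption.

```python
import numpy as np

def weighted_ensemble_predict(classifiers, train_feats, test_feats, similarity):
    """classifiers: list of fitted per-image classifiers with predict_proba;
    train_feats: list of (n_i, d) feature bags, one per training image;
    similarity(a, b): larger for more similar feature bags."""
    w = np.array([similarity(f, test_feats) for f in train_feats], dtype=float)
    w = w / w.sum()                                   # normalize weights
    probs = np.stack([c.predict_proba(test_feats) for c in classifiers])
    return np.tensordot(w, probs, axes=1)             # weighted posterior average
```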
... The correct framework would consist in transferring knowledge from the source domain (a large database with labeled data) to the target domain (a smaller database, here real data), where the two domains are different but related. Transfer learning methods solve these types of problems and have been applied, for example, to ultrasound transducer localization in fluoroscopy images [Heimann 2014]. Because only unlabeled real images were available, the authors generated simulated training data with automatic labeling. ...
Thesis
The objective of this thesis is to use non-invasive data (body surface potential mapping, BSPM) to personalise the main parameters of a cardiac electrophysiological (EP) model for predicting the response to cardiac resynchronization therapy (CRT). CRT is a clinically proven treatment option for some heart failures. However, these therapies are ineffective in 30% of the treated patients and involve significant morbidity and substantial cost. A precise understanding of the patient-specific cardiac function can help to predict the response to therapy. Until now, such methods required measuring intra-cardiac electrical potentials through an invasive endovascular procedure which can put the patient at risk. We developed a non-invasive EP model personalisation based on a patient-specific simulated database and machine learning regressions. First, we estimated the onset activation location and a global conduction parameter. We extended this approach to multiple onsets and to ischemic patients by means of a sparse Bayesian regression. Moreover, we developed a reference ventricle-torso anatomy in order to perform a common offline regression, and we predicted the response to different pacing conditions from the personalised model. In a second part, we studied the adaptation of the proposed method to the input of 12-lead electrocardiograms (ECG) and the integration in an electro-mechanical model for clinical use. The evaluation of our work was performed on an important dataset (more than 25 patients and 150 cardiac cycles). Besides having comparable results with state-of-the-art ECG imaging methods, the predicted BSPMs show good correlation coefficients with the real BSPMs.
... Machine learning based techniques have also been used (Mountney et al. (2012); Hatt et al. (2015a); Heimann et al. (2014)) but have focused primarily on fully-automatic probe detection rather than 5DOF or 6DOF pose estimation, which is necessary to achieve the accuracy required for most clinical applications. ...
Article
Full-text available
In recent years, registration between x-ray fluoroscopy (XRF) and transesophageal echocardiography (TEE) has been rapidly developed, validated, and translated to the clinic as a tool for advanced image guidance of structural heart interventions. This technology relies on accurate pose-estimation of the TEE probe via standard 2D/3D registration methods. It has been shown that latencies caused by slow registrations can result in errors during untracked frames, and a real-time (>15 Hz) tracking algorithm is needed to minimize these errors. This paper presents two novel similarity metrics designed for accurate, robust, and extremely fast pose-estimation of devices from XRF images: Direct Splat Correlation (DSC) and Patch Gradient Correlation (PGC). Both metrics were implemented in CUDA C, and validated on simulated and clinical datasets against prior methods presented in the literature. It was shown that by combining DSC and PGC in a hybrid method (HYB), target registration errors comparable to previously reported methods were achieved, but at much higher speeds and lower failure rates. In simulated datasets, the proposed HYB method achieved a median projected target registration error (pTRE) of 0.33 mm and a mean registration frame-rate of 12.1 Hz, while previously published methods produced median pTREs greater than 1.5 mm and mean registration frame-rates less than 4 Hz. In clinical datasets, the HYB method achieved a median pTRE of 1.1 mm and a mean registration frame-rate of 20.5 Hz, while previously published methods produced median pTREs greater than 1.3 mm and mean registration frame-rates less than 12 Hz. The proposed hybrid method also had much lower failure rates than previously published methods.
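In the spirit of the PGC metric named above, a gradient-based similarity correlates image gradients of the rendered device with those of the fluoroscopy frame; a minimal sketch follows. The actual DSC/PGC formulations and their CUDA implementations differ in detail, so this is illustrative only.

```python
import numpy as np

def gradient_correlation(a, b):
    """Average normalized cross-correlation of x- and y-gradients."""
    gy_a, gx_a = np.gradient(a.astype(float))
    gy_b, gx_b = np.gradient(b.astype(float))

    def ncc(u, v):
        u = u - u.mean()
        v = v - v.mean()
        return (u * v).sum() / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

    return 0.5 * (ncc(gx_a, gx_b) + ncc(gy_a, gy_b))
```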
... By learning with few or no labeled target examples, DA effectively mitigates the high labeling and data acquisition costs in medical data, which would have to be incurred if a large representative dataset were to be acquired for every domain evolution. Plausible scenarios where DA can be leveraged include upgrades of scanner platforms, multi-center studies where image acquisition protocols may differ, new patient populations, etc. Towards this end, prior applications and approaches reported in the medical imaging literature include training from synthetically generated samples in ultrasound transducer localization (Heimann et al., 2015) and reducing annotation costs in microscopic images by leveraging labeled examples from closely related domains (Becker et al., 2015). Further, in Goetz et al. (2010), random forests were adapted to increase class-specificity through instance weighting under a covariate shift assumption for brain tumor classification with limited labeled data, and in Cheng et al. (2012) DA was employed for classifying Alzheimer's disease stages. ...
Article
In this paper, we propose a supervised domain adaptation (DA) framework for adapting decision forests in the presence of distribution shift between training (source) and testing (target) domains, given few labeled examples. We introduce a novel method for DA through an error-correcting hierarchical transfer relaxation scheme with domain alignment, feature normalization, and leaf posterior reweighting to correct for the distribution shift between the domains. For the first time we apply DA to the challenging problem of extending in vitro trained forests (source domain) for in vivo applications (target domain). The proof-of-concept is provided for in vivo characterization of atherosclerotic tissues using intravascular ultrasound signals, where presence of flowing blood is a source of distribution shift between the two domains. This potentially leads to misclassification upon direct deployment of in vitro trained classifier, thus motivating the need for DA as obtaining reliable in vivo training labels is often challenging if not infeasible. Exhaustive validations and parameter sensitivity analysis substantiate the reliability of the proposed DA framework and demonstrates improved tissue characterization performance for scenarios where adaptation is conducted in presence of only a few examples. The proposed method can thus be leveraged to reduce annotation costs and improve computational efficiency over conventional retraining approaches.
... Fortunately, transfer learning [5] techniques can be employed in order to deal with the differences between source and target data. Such approaches frequently rely on a small amount of labeled target data ([2,6,7], to name a few), or can be unsupervised with respect to the target [1,8], which is favorable for tasks where annotation is costly. In the latter case, the transfer is typically achieved by weighting the training samples such that the differences between training and target data are minimized. ...
Conference Paper
Supervised classification is widely used for image segmen-tation. To work effectively, these techniques need large amounts of labeled training data, that is representative of the test data. Different patient groups, different scanners or different scanning protocols can lead to differences between the images, thus representative data might not be available. Transfer learning techniques can be used to account for these differences, thus taking advantage of all the available data acquired with different protocols. We investigate the use of classifier ensembles, where each classifier is weighted according to the similarity between the data it is trained on, and the data it needs to segment. We examine 3 asymmetric similarity measures that can be used in scenarios where no labeled data from a newly introduced scanner or scanning protocol is available. We show that the asymmetry is informative and the direction of measurement needs to be chosen carefully. We also show that a point set similarity measure is robust across different studies, and out-performs state-of-the-art results on a multi-center brain tissue segmentation task.
... Various methods have been proposed to determine these weights (Fan et al., 2005; Huang et al., 2006; Sugiyama et al., 2008; Zadrozny, 2004). Heimann et al. (2014), for example, used a weighting method of Sugiyama et al. (2010) to improve localization of an ultrasound transducer in X-ray images. Their method was trained on artificial training images of the transducer and applied to real-life images, which introduced a distribution difference between training and target images; yet, due to the way their training data was constructed, they could assume no difference in labeling functions between training and test data. ...
Article
Many automatic segmentation methods are based on supervised machine learning. Such methods have proven to perform well, on the condition that they are trained on a sufficiently large manually labeled training set that is representative of the images to segment. However, due to differences between scanners, scanning parameters, and patients such a training set may be difficult to obtain. We present a transfer-learning approach to segmentation by multi-feature voxelwise classification. The presented method can be trained using a heterogeneous set of training images that may be obtained with different scanners than the target image. In our approach each training image is given a weight based on the distribution of its voxels in the feature space. These image weights are chosen as to minimize the difference between the weighted probability density function (PDF) of the voxels of the training images and the PDF of the voxels of the target image. The voxels and weights of the training images are then used to train a weighted classifier. We tested our method on three segmentation tasks: brain-tissue segmentation, skull stripping, and white-matter-lesion segmentation. For all three applications, the proposed weighted classifier significantly outperformed an unweighted classifier on all training images, reducing classification errors by up to 42%. For brain-tissue segmentation and skull stripping our method even significantly outperformed the traditional approach of training on representative training images from the same study as the target image. Copyright © 2015 Elsevier B.V. All rights reserved.
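The image-weighting idea of this abstract, choosing one weight per training image so that the weighted sum of the images' feature distributions matches the target image's distribution, can be posed for a single scalar feature as a non-negative least-squares fit over histograms. This is a simplified hedged sketch; the paper minimizes a PDF difference over multi-feature distributions.

```python
import numpy as np
from scipy.optimize import nnls

def image_weights(train_feature_sets, target_features, bins=32, rng=(0.0, 1.0)):
    """train_feature_sets: list of 1-D arrays (one scalar feature per voxel,
    one array per training image); target_features: 1-D array for the target."""
    hists = [np.histogram(f, bins=bins, range=rng, density=True)[0]
             for f in train_feature_sets]
    target = np.histogram(target_features, bins=bins, range=rng, density=True)[0]
    A = np.stack(hists, axis=1)          # (bins, n_images) design matrix
    w, _ = nnls(A, target)               # non-negative image weights
    return w / (w.sum() + 1e-12)         # normalize to sum 1
```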
... The appeal of this method is that only recordings x_i and no labels y_i are necessary for adaptation. While the concept of covariate shift has been applied with great success in a number of different medical imaging applications [8,9], the major challenges in transferring it to our problem are the estimation of the high-dimensional β and the high variance introduced by weighting, both addressed in the next two subsections. ...
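One common, generic remedy for the weighting variance raised in the excerpt is to clip the estimated weights β at a quantile and renormalize; a short sketch follows. This illustrates the issue only and is not the specific solution of the cited paper.

```python
import numpy as np

def stabilize_weights(beta, max_quantile=0.95):
    """Bound the largest importance weights to limit estimator variance."""
    cap = np.quantile(beta, max_quantile)
    beta = np.minimum(beta, cap)     # clip heavy-tailed weights
    return beta / beta.mean()        # keep the average weight at 1
```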
Conference Paper
Full-text available
Multispectral imaging in laparoscopy can provide tissue reflectance measurements for each point in the image at multiple wavelengths of light. These reflectances encode information on important physiological parameters not visible to the naked eye. Fast decoding of the data during surgery, however, remains challenging. While model-based methods suffer from inaccurate base assumptions, a major bottleneck related to competing machine learning-based solutions is the lack of labelled training data. In this paper, we address this issue with the first transfer learning-based method to physiological parameter estimation from multispectral images. It relies on a highly generic tissue model that aims to capture the full range of optical tissue parameters that can potentially be observed in vivo. Adaptation of the model to a specific clinical application based on unlabelled in vivo data is achieved using a new concept of domain adaptation that explicitly addresses the high variance often introduced by conventional covariance-shift correction methods. According to comprehensive in silico and in vivo experiments our approach enables accurate parameter estimation for various tissue types without the need for incorporating specific prior knowledge on optical properties and could thus pave the way for many exciting applications in multispectral laparoscopy.
... voxels. We decided to use a logistic regression classifier (LRC) to calculate w because it was previously used successfully [11] and we found that it works fast on large data sets. We trained an LRC that predicts if a voxel is in the complete or the small segmentation. ...
Conference Paper
Full-text available
Current learning-based brain tumor classification methods show good performance but require large datasets of manually annotated training examples. Since image acquisition hardware and setup vary from clinic to clinic, training has to be repeated, and the required time-consuming labeling effort limits a wider applicability of these approaches in clinical routine. We propose an approach that allows labelling of only small and unambiguous parts of the training data. Domain adaptation is applied to correct for the induced sampling error. We validated our approach using multimodal MR scans of 19 patients and showed that our approach reduces the labeling time significantly while giving results that closely match those from a fully annotated training set. This is an important step towards bringing automatic tumor segmentation into clinical routine.
... In this study, it is stated that the features of the first layer of a network are similar to Gabor filters, and it is claimed that initializing a network with the features of pre-trained networks increases the generalization capability of the target model. In the literature, there are many applications that exploit the power of transfer learning for various problems, such as visual tracking [7], facial age and gender classification [8], and ultrasound transducer localization [9]. One recent work [10] focuses on modelling the statistical information of good convolutional filters with a Gaussian Mixture Model (GMM) and eliminates the need for pre-trained networks, similarly to this study. ...
... The experimental results on the RIM-ONE database demonstrate the effectiveness of the proposed algorithm despite the lack of initial labeled samples. Other unsupervised transfer learning problems are addressed in [263][264][265][266]. ...
Article
Full-text available
Medical imaging is a useful tool for disease detection and diagnostic imaging technology has enabled early diagnosis of medical conditions. Manual image analysis methods are labor-intense and they are susceptible to intra- as well as inter-observer variability. Automated medical image analysis techniques can overcome these limitations. In this review, we investigated Transfer Learning (TL) architectures for automated medical image analysis. We discovered that TL has been applied to a wide range of medical imaging tasks, such as segmentation, object identification, disease categorization, and severity grading, to name a few. We could establish that TL provides high-quality decision support and requires less training data when compared to traditional deep learning methods. These advantageous properties arise from the fact that TL models have already been trained on large generic datasets and a task-specific dataset is only used to customize the model. This eliminates the need to train the models from scratch. Our review shows that AlexNet, ResNet, VGGNet, and GoogleNet are the most widely used TL models for medical image analysis. We found that these models can understand medical images, and the customization refines the ability, making these TL models useful tools for medical image analysis.
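The customization step the review describes is typically implemented by taking a network pre-trained on a large generic dataset and re-training only a new task-specific head on the small medical dataset. The sketch below uses torchvision's ResNet-18 as an example backbone; the class count and freezing policy are assumptions, not taken from the review.

```python
import torch.nn as nn
from torchvision import models

def build_tl_model(num_classes=2, freeze_backbone=True):
    # Backbone pre-trained on ImageNet (torchvision >= 0.13 weights API).
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    if freeze_backbone:
        for p in model.parameters():
            p.requires_grad = False        # keep generic features fixed
    # Replace the classification head for the medical task.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model
```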
... In (Rusu et al. 2017), the authors use progressive transfer learning to carry over a robot's experience from a simulated domain to a real one. In (Heimann et al. 2014), the authors develop a system that can learn from unlabeled images to improve localization of objects in a medical imaging application. (Pan, Yang, and others 2010) provides an excellent survey of transfer learning methods. ...
Preprint
Deep neural networks can form high-level hierarchical representations of input data. Various researchers have demonstrated that these representations can be used to enable a variety of useful applications. However, such representations are typically based on the statistics within the data, and may not conform with the semantic representation that may be necessitated by the application. Conditional models are typically used to overcome this challenge, but they require large annotated datasets which are difficult to come by and costly to create. In this paper, we show that semantically-aligned representations can be generated instead with the help of a physics based engine. This is accomplished by creating a synthetic dataset with decoupled attributes, learning an encoder for the synthetic dataset, and augmenting prescribed attributes from the synthetic domain with attributes from the real domain. It is shown that the proposed (SYNTH-VAE-GAN) method can construct a conditional predictive-generative model of human face attributes without relying on real data labels.
... This technique is particularly useful in cases where manual annotation of the training data is too time consuming or would lead to high uncertainties such as interobserver variability in the case of organ-at-risk segmentation. Previous examples of fields utilising this technique include: geology [34,35], biology [36][37][38][39][40], automotive [41,42], medical [43][44][45][46][47][48][49] and robotics [50][51][52]. This technique has also been utilised in radiation oncology to calculate MV linac radiation doses in real patient CTs using DL models which were trained only on synthetic data [53]. ...
Article
Radiation therapy requires clinical linear accelerators to be mechanically and dosimetrically calibrated to a high standard. One important quality assurance test is the Winston-Lutz test, which localises the radiation isocentre of the linac. In the current work we demonstrate a novel method of analysing EPID-based Winston-Lutz QA images using a deep learning model trained only on synthetic image data. In addition, we propose a novel method of generating the synthetic WL images and associated ‘ground-truth’ masks using an optical path-tracing engine to ‘fake’ mega-voltage EPID images. The model, called DeepWL, was trained on 1500 synthetic WL images using data augmentation techniques for 180 epochs. The model was built using Keras with a TensorFlow backend on an Intel Core i5-6500T CPU and trained in approximately 15 h. DeepWL was shown to produce ball bearing and multi-leaf collimator field segmentations with a mean dice coefficient of 0.964 and 0.994 respectively on previously unseen synthetic testing data. When DeepWL was applied to WL data measured on an EPID, the predicted mean displacements were shown to be statistically similar to the Canny edge detection method. However, the DeepWL predictions for the ball bearing locations were shown to correlate better with manual annotations compared with the Canny edge detection algorithm. DeepWL was demonstrated to analyse Winston-Lutz images with an accuracy suitable for routine linac quality assurance, with some statistical evidence that it may outperform Canny edge detection methods in terms of segmentation robustness and the resultant displacement predictions.
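The dice coefficient quoted above measures overlap between a predicted and a reference segmentation mask; a minimal implementation for binary masks:

```python
import numpy as np

def dice(pred, ref):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    inter = np.logical_and(pred, ref).sum()
    denom = pred.sum() + ref.sum()
    return 2.0 * inter / denom if denom else 1.0
```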
... This technique is particularly useful in cases where manual annotation of the training data is too time consuming or would lead to high uncertainties such as inter-observer variability in the case of organ-at-risk segmentation. Previous examples of fields utilizing this technique include: geology [35,36], biology [37][38][39][40][41], automotive [42,43], medical [44][45][46][47][48][49][50] and robotics [51][52][53]. This technique has also been utilized in radiation oncology to calculate MV linac radiation doses in real patient CTs using DL models which were trained only on synthetic data [54]. ...
Preprint
Full-text available
https://arxiv.org/abs/2107.01976 Radiation therapy requires clinical linear accelerators to be mechanically and dosimetrically calibrated to a high standard. One important quality assurance test is the Winston-Lutz test, which localizes the radiation isocentre of the linac. In the current work we demonstrate a novel method of analysing EPID-based Winston-Lutz QA images using a deep learning model trained only on synthetic image data. In addition, we propose a novel method of generating the synthetic WL images and associated ground-truth masks using an optical ray-tracing engine to fake mega-voltage EPID images. The model, called DeepWL, was trained on 1500 synthetic WL images using data augmentation techniques for 180 epochs. The model was built using Keras with a TensorFlow backend on an Intel Core i5-6500T CPU and trained in approximately 15 hours. DeepWL was shown to produce ball bearing and multi-leaf collimator field segmentations with a mean dice coefficient of 0.964 and 0.994 respectively on previously unseen synthetic testing data. When DeepWL was applied to WL data measured on an EPID, the predicted mean displacements were shown to be statistically similar to the Canny edge detection method. However, the DeepWL predictions for the ball bearing locations were shown to correlate better with manual annotations compared with the Canny edge detection algorithm. DeepWL was demonstrated to analyse Winston-Lutz images with an accuracy suitable for routine linac quality assurance, with some statistical evidence that it may outperform Canny edge detection methods in terms of segmentation robustness and the resultant displacement predictions.
... However, in practice, these methods differ in whether they use labeled data from the target domain. Unsupervised transfer is addressed in [Wang et al., 2013; Heimann et al., 2014]. Other works focus on supervised transfer, with a small amount of labeled data from the target domain [Conjeti et al., 2016; Wachinger and Reuter, 2016; Goetz et al., 2016]. ...
Article
Machine learning (ML) algorithms have made a tremendous impact in the field of medical imaging. While medical imaging datasets have been growing in size, a challenge for supervised ML algorithms that is frequently mentioned is the lack of annotated data. As a result, various methods which can learn with less/other types of supervision, have been proposed. We review semi-supervised, multiple instance, and transfer learning in medical imaging, both in diagnosis/detection or segmentation tasks. We also discuss connections between these learning scenarios, and opportunities for future research.
Chapter
In stroke treatment by thrombectomy, blood clots are removed from the vascular system using instruments such as catheters and guidewires. If these instruments are equipped with small electromagnetic (EM) sensors, they can be localized inside the body by an EM field generator (FG). Using this tracking data together with preoperative image data, additional information, such as the current distance to the blood clot, could be displayed to the physician during the intervention, supporting a more effective procedure. With the Liberty TX1, Polhemus Inc. offers a small FG that can be placed close to the intervention site and potentially causes few disturbances in intraoperative image acquisition. In addition, there is the established Aurora system from Northern Digital Inc. (NDI) with its Tabletop FG. Using a standardized measurement protocol, both FGs were tested for accuracy and robustness under clinical conditions. Placed on the patient table, the NDI Tabletop FG achieved a positional accuracy of 0.38 mm and a mean jitter of 0.02 mm; the Polhemus TX1 FG achieved an accuracy of 0.24 mm and a mean jitter of 0.07 mm. Furthermore, a catheter-based intervention was re-enacted using an open-science vessel phantom in order to investigate the influence of the FGs on digital subtraction angiography (DSA). Owing to its small size, the TX1 FG produced hardly any artifacts, in contrast to the Tabletop FG. Because of its compactness, its tendency towards better accuracy results, and its high robustness, the TX1 FG represents an interesting alternative for instrument tracking. Currently, however, the Polhemus system has the limitation that its smallest sensor, with a diameter of 1.8 mm, is slightly larger than the maximum inner diameter of an aspiration catheter used in thrombectomy, which is 1.77 mm.
Article
The fourth industrial revolution is set to integrate entire manufacturing processes using industrial digital technologies such as the Internet of Things, Cloud Computing, and machine learning to improve process productivity, efficiency, and sustainability. Sensors collect the real-time data required to optimise manufacturing processes and are therefore a key technology in this transformation. Ultrasonic sensors have the benefits of being low-cost, in-line, non-invasive, and able to operate in opaque systems. Supervised machine learning models can correlate ultrasonic sensor data to useful information about the manufacturing materials and processes. However, this requires a reference measurement of the process material to label each data point for model training. Labelled data is often difficult to obtain in factory environments, and so a method of training models without it is desirable. This work compares two domain adaptation methods to transfer models across processes, so that no labelled data is required to accurately monitor a target process. The two methods compared are a Single Feature transfer learning approach and Transfer Component Analysis using three features. Ultrasonic waveforms are unique to the sensor used, the attachment procedure, and the contact pressure; therefore, only a small number of transferable features are investigated. Two industrially relevant processes were used as case studies: mixing and cleaning of fouling in pipes. A reflection-mode ultrasonic sensing technique was used, which monitors the sound wave reflected from the interface between the vessel wall and the process material. Overall, the Single Feature method produced the highest prediction accuracies: up to 96.0% and 98.4% to classify the completion of mixing and cleaning, respectively; and R² values of up to 0.947 and 0.999 to predict the time remaining until completion. These results highlight the potential of combining ultrasonic measurements with transfer learning techniques to monitor industrial processes. However, further work is required to study various effects, such as changing sensor location between source and target domains.
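In the spirit of the single-feature transfer described above, the simplest form of cross-process alignment standardizes the chosen ultrasonic feature per domain so that a model fitted on the source feature can be applied to the target. The sketch below illustrates that general idea only; the cited study's exact Single Feature method and its TCA variant are not reproduced here.

```python
import numpy as np

def align_feature(source_feat, target_feat):
    """Standardize one scalar feature per domain (e.g. reflection energy),
    putting both domains on a comparable scale."""
    src = (source_feat - source_feat.mean()) / (source_feat.std() + 1e-12)
    tgt = (target_feat - target_feat.mean()) / (target_feat.std() + 1e-12)
    return src, tgt   # a classifier fitted on src can now be applied to tgt
```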
Article
In Orthopaedics, fracture detection is considered one of the challenging tasks with X-ray images. The proposed methodology uses a novel feature set to classify diaphyseal tibial fractures using an Artificial Neural Network. The task of classification is carried out at two levels. The first level involves the classification of images into normal and fractured. The second level comprises classification into three types of fractures, namely, simple, wedge and complex. Around 12,000 X-ray images are used as a dataset, collected from local hospitals and publicly available musculoskeletal radiographs. Local features such as Hough lines, texture values, number of intersection points, number of fragments and local binary patterns are deployed in the work. Performance-based feature reduction is carried out. The experimentation performed with individual features and with combinations of two, three, four and five features revealed an average classification accuracy of 98.59%. Along with BPNN, other classifiers, namely k-NN and DT, are used. Results show that the method outperforms state-of-the-art works and are encouraging. The work is useful for Orthopaedic practitioners and extendable to other types of bones.
Article
A CNN-based method for cardiac MRI tag tracking was developed and validated. A synthetic data simulator was created to generate large amounts of training data using natural images, a Bloch equation simulation, a broad range of tissue properties, and programmed ground-truth motion. The method was validated using both an analytical deforming cardiac phantom and in vivo data with manually tracked reference motion paths. In the analytical phantom, error was investigated relative to SNR, and accurate results were seen for SNR > 10 (displacement error < 0.3 mm). Excellent agreement was seen in vivo for tag locations (mean displacement difference = -0.02 pixels, 95% CI [-0.73, 0.69]) and calculated cardiac circumferential strain (mean difference = 0.006, 95% CI [-0.012, 0.024]). Automated tag tracking with a CNN trained on synthetic data is both accurate and precise.
Article
Minimally invasive endovascular interventions have evolved rapidly over the past decade, facilitated by breakthroughs in medical imaging and sensing, instrumentation and, most recently, robotics. Catheter-based operations are potentially safer and applicable to a wider patient population due to the reduced comorbidity. As a result, endovascular surgery has become the preferred treatment option for conditions previously treated with open surgery, and as such the number of patients undergoing endovascular interventions is increasing every year. This fact, coupled with a proclivity for reduced working hours, results in a requirement for efficient training and assessment of new surgeons that deviates from the “see one, do one, teach one” model introduced by William Halsted, so that trainees obtain operational expertise in a shorter period. Developing more objective assessment tools based on quantitative metrics is now a recognized need in interventional training, and this manuscript reports the current literature for endovascular skills assessment and the associated emerging technologies. A systematic search was performed on PubMed (MEDLINE), Google Scholar, IEEE Xplore and known journals using the keywords “endovascular surgery”, “surgical skills”, “endovascular skills”, “surgical training endovascular” and “catheter skills”. Focusing explicitly on endovascular surgical skills, we group related works into three categories based on the metrics used: structured scales and checklists, simulation-based metrics and motion-based metrics. This review highlights the key findings in each category and also provides suggestions for new research opportunities towards fully objective and automated surgical assessment solutions.
Article
Machine learning approaches are increasingly successful in image-based diagnosis, disease prognosis, and risk assessment. This paper highlights new research directions and discusses three main challenges related to machine learning in medical imaging: coping with variation in imaging protocols, learning from weak labels, and interpretation and evaluation of results.
Article
Cardiovascular surgeons increasingly resort to catheter-based diagnostic and therapeutic interventions because of their limited invasiveness. Although these approaches allow treatment of patients considered unfit for conventional open surgery, exposure to radiation and high procedural complexity can lead to complications. These factors motivated the introduction of robotic technology offering more dexterous catheters, enhanced visualization and new possibilities in terms of guidance and coordinated control. In addition to improving patient outcomes, teleoperated catheter control can reduce the radiation exposure of surgeons. To limit surgical workload, intuitive mappings between joystick input and the resulting catheter motion are essential. This paper presents and compares two proposed mappings and investigates the benefits of additional visual guidance. The comparison is based on data gathered during an experimental campaign involving 14 novices and three surgeons. The participants were asked to perform an endovascular task in a virtual reality simulator presented in the first part of this paper. Statistical results show significant superiority of one mapping with respect to the other and a significant improvement of performance thanks to additional visual guidance. Future work will focus on translating the results to a physical setup for surgical validation, and the learning effect will be analyzed in more depth.
Article
Research into artificial intelligence (AI) has made tremendous progress over the past decade. In particular, the AI-powered analysis of images and signals has reached human-level performance in many applications owing to the efficiency of modern machine learning methods, in particular deep learning using convolutional neural networks. Research into the application of AI to medical imaging is now very active, especially in the field of cardiovascular imaging because of the challenges associated with acquiring and analysing images of this dynamic organ. In this Review, we discuss the clinical questions in cardiovascular imaging that AI can be used to address and the principal methodological AI approaches that have been developed to solve the related image analysis problems. Some approaches are purely data-driven and rely mainly on statistical associations, whereas others integrate anatomical and physiological information through additional statistical, geometric and biophysical models of the human heart. In a structured manner, we provide representative examples of each of these approaches, with particular attention to the underlying computational imaging challenges. Finally, we discuss the remaining limitations of AI approaches in cardiovascular imaging (such as generalizability and explainability) and how they can be overcome.
Article
Full-text available
3-D pose estimation of instruments is a crucial step towards automatic scene understanding in robotic minimally invasive surgery. Although robotic systems can potentially directly provide joint values, this information is not commonly exploited inside the operating room, due to its possible unreliability, limited access and the time-consuming calibration required, especially for continuum robots. For this reason, standard approaches for 3-D pose estimation involve the use of external tracking systems. Recently, image-based methods have emerged as promising, non-invasive alternatives. While many image-based approaches in the literature have shown accurate results, they generally require either a complex iterative optimization for each processed image, making them unsuitable for real-time applications, or a large number of manually-annotated images for efficient learning. In this paper we propose a self-supervised image-based method, exploiting, at training time only, the imprecise kinematic information provided by the robot. In order to avoid introducing time-consuming manual annotations, the problem is formulated as an auto-encoder, smartly bottlenecked by the presence of a physical model of the robotic instruments and surgical camera, forcing a separation between image background and kinematic content. Validation of the method was performed on semi-synthetic, phantom and in-vivo datasets obtained using a flexible robotized endoscope, showing promising results for real-time image-based 3-D pose estimation of surgical instruments.
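A conceptual sketch of this auto-encoder formulation is given below, assuming PyTorch and single-channel endoscopic images. The differentiable instrument-and-camera model `render_fn`, which forms the physical bottleneck, is hypothetical here; the sketch only illustrates how a reconstruction loss plus a soft kinematic prior can supervise pose without manual labels.

```python
import torch
import torch.nn as nn

class PoseAutoEncoder(nn.Module):
    def __init__(self, render_fn, pose_dim=6):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, pose_dim),      # bottleneck = pose parameters only
        )
        self.render = render_fn           # differentiable physical model (hypothetical)

    def forward(self, img):
        pose = self.encoder(img)
        return pose, self.render(pose)

def self_supervised_loss(img, recon, pose, kin_pose, w_kin=0.1):
    # pixel reconstruction + soft agreement with the imprecise robot kinematics
    return (nn.functional.mse_loss(recon, img)
            + w_kin * nn.functional.mse_loss(pose, kin_pose))
```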
Article
Collecting large databases of annotated medical images is crucial for the validation and testing of feature extraction, statistical analysis and machine learning algorithms. Recent advances in cardiac electromechanical modeling and image synthesis provided a framework to generate synthetic images based on realistic mesh simulations. Nonetheless, their potential to augment an existing database with large amounts of synthetic cases requires further investigation. We build upon these works and propose a revised scheme for synthesizing pathological cardiac sequences from real healthy sequences. Our new pipeline notably involves a much easier registration problem to reduce potential artifacts, and takes advantage of mesh correspondences to generate new data from a given case without additional registration. The output sequences are thoroughly examined in terms of quality and usability on a given application: the assessment of myocardial viability, via the generation of 465 synthetic cine MR sequences (15 healthy and 450 with pathological tissue viability [random location, extent and grade, up to myocardial infarct]). We demonstrate that our methodology (i) improves state-of-the-art algorithms in terms of realism and accuracy of the simulated images, and (ii) is well-suited for the generation of large databases at small computational cost.
Article
Full-text available
The amount of training data that is required to train a classifier scales with the dimensionality of the feature data. In hyperspectral remote sensing, feature data can become very high dimensional, while the amount of training data is often limited. Thus, one of the core challenges in hyperspectral remote sensing is how to perform multi-class classification using only relatively few training data points. In this work, we address this issue by enriching the feature matrix with synthetically generated sample points. The synthetic data are sampled from a GMM fitted to each class of the limited training data. Although the true distribution of features may not be perfectly modeled by the fitted GMM, we demonstrate that a moderate augmentation by these synthetic samples can effectively replace a part of the missing training samples. We show the efficacy of the proposed approach on two hyperspectral datasets; the median gain in classification performance is 5%. It is also encouraging that this performance gain is remarkably stable over large variations in the number of added samples, which makes the method much easier to apply to real-world applications.
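A minimal sketch of this augmentation scheme using scikit-learn's GaussianMixture; the component count and sampling ratio are illustrative choices, not values from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def augment_with_gmm(X, y, n_components=3, ratio=0.5, seed=0):
    """Fit one GMM per class and append synthetic samples to (X, y)."""
    X_parts, y_parts = [X], [y]
    for c in np.unique(y):
        Xc = X[y == c]
        gmm = GaussianMixture(n_components=min(n_components, len(Xc)),
                              covariance_type="diag",
                              random_state=seed).fit(Xc)
        n_syn = max(1, int(ratio * len(Xc)))
        X_syn, _ = gmm.sample(n_syn)
        X_parts.append(X_syn)
        y_parts.append(np.full(n_syn, c))
    return np.vstack(X_parts), np.concatenate(y_parts)
```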
Conference Paper
Full-text available
New minimally invasive interventions such as transcatheter valve procedures exploit multiple imaging modalities to guide tools (fluoroscopy) and visualize soft tissue (transesophageal echocardiography, TEE). Currently, these complementary modalities are visualized in separate coordinate systems and on separate monitors, creating a challenging clinical workflow. This paper proposes a novel framework for fusing TEE and fluoroscopy by detecting the pose of the TEE probe in the fluoroscopic image. Probe pose detection is challenging in fluoroscopy, and conventional computer vision techniques are not well suited. Current research requires manual initialization or the addition of fiducials. The main contribution of this paper is autonomous six-DoF pose detection by combining discriminative learning techniques with a fast binary template library. The pose estimation problem is reformulated to incrementally detect pose parameters by exploiting natural invariances in the image. The theoretical contribution of this paper is validated on synthetic, phantom and in vivo data. The practical application of this technique is supported by accurate results (< 5 mm in-plane error) and a computation time of 0.5 s.
Article
Full-text available
A class of predictive densities is derived by weighting the observed samples in maximizing the log-likelihood function. This approach is effective in cases such as sample surveys or design of experiments, where the observed covariate follows a different distribution than that in the whole population. Under misspecification of the parametric model, the optimal choice of the weight function is asymptotically shown to be the ratio of the density function of the covariate in the population to that in the observations. This is the pseudo-maximum likelihood estimation of sample surveys. The optimality is defined by the expected Kullback–Leibler loss, and the optimal weight is obtained by considering the importance sampling identity. Under correct specification of the model, however, the ordinary maximum likelihood estimate (i.e. the uniform weight) is shown to be optimal asymptotically. For moderate sample size, the situation is in between the two extreme cases, and the weight function is selected by minimizing a variant of the information criterion derived as an estimate of the expected loss. The method is also applied to a weighted version of the Bayesian predictive density. Numerical examples as well as Monte-Carlo simulations are shown for polynomial regression. A connection with the robust parametric estimation is discussed.
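The core statement above compresses into one formula. In the illustrative notation below, p_pop and p_obs denote the covariate densities in the whole population and in the observed sample:

```latex
\hat{\theta}_w = \arg\max_{\theta} \sum_{i=1}^{n} w(x_i)\,\log p(y_i \mid x_i;\theta),
\qquad
w(x) = \frac{p_{\mathrm{pop}}(x)}{p_{\mathrm{obs}}(x)} .
```

Setting w identically to 1 recovers ordinary maximum likelihood, which the paper shows is asymptotically optimal only under correct model specification.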
Conference Paper
Full-text available
We have constructed a frontal face detection system which achieves detection and false-positive rates equivalent to the best published results [7, 5, 6, 4, 1]. This face detection system is most clearly distinguished from previous approaches by its ability to detect faces extremely rapidly. Operating on 384 by 288 pixel images, faces are detected at 15 frames per second on a conventional 700 MHz Intel Pentium III. In other face detection systems, auxiliary information, such as image differences in video sequences or pixel color in color images, has been used to achieve high frame rates. Our system achieves high frame rates working only with the information present in a single grey-scale image. These alternative sources of information can also be integrated with our system to achieve even higher frame rates.
Article
Full-text available
This paper describes a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the Integral Image which allows the features used by our detector to be computed very quickly. The second is a simple and efficient classifier which is built using the AdaBoost learning algorithm (Freund and Schapire, 1995) to select a small number of critical visual features from a very large set of potential features. The third contribution is a method for combining classifiers in a cascade which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems (Sung and Poggio, 1998; Rowley et al., 1998; Schneiderman and Kanade, 2000; Roth et al., 2000). Implemented on a conventional desktop, face detection proceeds at 15 frames per second.
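The first contribution above, the Integral Image, is small enough to sketch directly; the zero padding below is an implementation convenience rather than part of the original description.

```python
import numpy as np

def integral_image(img):
    # one cumulative-sum pass; padded so rectangle sums need no edge cases
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(0).cumsum(1)
    return ii

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] from four lookups, i.e. in O(1)."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]
```

Every Haar-like feature used by the detector is a signed combination of a few such rectangle sums, which is what makes exhaustive scanning affordable.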
Article
Full-text available
We propose an automatic four-chamber heart segmentation system for the quantitative functional analysis of the heart from cardiac computed tomography (CT) volumes. Two topics are discussed: heart modeling and automatic model fitting to an unseen volume. Heart modeling is a nontrivial task since the heart is a complex nonrigid organ. The model must be anatomically accurate, allow manual editing, and provide sufficient information to guide automatic detection and segmentation. Unlike previous work, we explicitly represent important landmarks (such as the valves and the ventricular septum cusps) among the control points of the model. The control points can be detected reliably to guide the automatic model fitting process. Using this model, we develop an efficient and robust approach for automatic heart chamber segmentation in 3-D CT volumes. We formulate the segmentation as a two-step learning problem: anatomical structure localization and boundary delineation. In both steps, we exploit the recent advances in learning discriminative models. A novel algorithm, marginal space learning (MSL), is introduced to solve the 9-D similarity transformation search problem for localizing the heart chambers. After determining the pose of the heart chambers, we estimate the 3-D shape through learning-based boundary delineation. The proposed method has been extensively tested on the largest dataset (with 323 volumes from 137 patients) ever reported in the literature. To the best of our knowledge, our system is the fastest with a speed of 4.0 s per volume (on a dual-core 3.2-GHz processor) for the automatic segmentation of all four chambers.
Conference Paper
The fusion of image data from trans-esophageal echography (TEE) and X-ray fluoroscopy is attracting increasing interest in minimally-invasive treatment of structural heart disease. In order to calculate the needed transform between both imaging systems, we employ a discriminative learning based approach to localize the TEE transducer in X-ray images. Instead of time-consuming manual labeling, we generate the required training data automatically from a single volumetric image of the transducer. In order to adapt this system to real X-ray data, we use unlabeled fluoroscopy images to estimate differences in feature space density and correct covariate shift by instance weighting. An evaluation on more than 1900 images reveals that our approach reduces detection failures by 95% compared to cross validation on the test set and improves the localization error from 1.5 to 0.8 mm. Due to the automatic generation of training data, the proposed system is highly flexible and can be adapted to any medical device with minimal efforts.
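The probabilistic-classification variant of the instance weighting mentioned above admits a compact sketch: a classifier that separates synthetic training features from real unlabeled features yields a density-ratio estimate via Bayes' rule. Feature extraction and the detector itself are placeholders here; only the weighting logic is shown, as a generic sketch rather than the authors' code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def covariate_shift_weights(X_synth, X_real):
    """Importance of each synthetic sample, w(x) ~ p_real(x) / p_synth(x)."""
    X = np.vstack([X_synth, X_real])
    d = np.r_[np.zeros(len(X_synth)), np.ones(len(X_real))]   # 1 = real domain
    clf = LogisticRegression(max_iter=1000).fit(X, d)
    p_real = clf.predict_proba(X_synth)[:, 1]
    # Bayes' rule turns the domain posterior into a density ratio
    return (len(X_synth) / len(X_real)) * p_real / (1.0 - p_real)

# The weights then enter detector training, e.g.
# detector.fit(X_synth, y_synth, sample_weight=covariate_shift_weights(X_synth, X_real))
```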
Article
Discriminative, or (structured) prediction, methods have proved effective for a variety of problems in computer vision; a notable example is 3D monocular pose estimation. All methods to date, however, have relied on the assumption that training (source) and test (target) data come from the same underlying joint distribution. In many real cases, including standard datasets, this assumption is flawed. In the presence of training set bias, learning results in a biased model whose performance degrades on the (target) test set. Under the assumption of covariate shift, we propose an unsupervised domain adaptation approach to address this problem. The approach takes the form of training instance reweighting, where the weights are assigned based on the ratio of training and test marginals evaluated at the samples. Learning with the resulting weighted training samples alleviates the bias in the learned models. We show the efficacy of our approach by proposing weighted variants of kernel regression (KR) and twin Gaussian processes (TGP). We show that our weighted variants outperform their unweighted counterparts and improve on the state-of-the-art performance on the public HumanEva dataset.
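Plugging such ratio estimates into kernel regression amounts to one extra factor in the Nadaraya-Watson weights. A minimal sketch, where the importance vector w is assumed to come from a ratio estimator such as the domain classifier or KLIEP-style methods discussed elsewhere on this page:

```python
import numpy as np

def weighted_kernel_regression(X_tr, y_tr, w, X_te, bandwidth=1.0):
    """Nadaraya-Watson prediction with importance-weighted training samples."""
    d2 = ((X_te[:, None, :] - X_tr[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * bandwidth ** 2)) * w[None, :]   # kernel x importance
    return (K @ y_tr) / K.sum(axis=1)
```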
Article
A basic assumption of statistical learning theory is that training and test data are drawn from the same underlying distribution. Unfortunately, this assumption does not hold in many applications. Instead, ample labeled data might exist in a particular 'source' domain while inference is needed in another, 'target' domain. Domain adaptation methods leverage labeled data from both domains to improve classification on unseen data in the target domain. In this work we survey domain transfer learning methods for various application domains, with a focus on recent work in computer vision.
Article
A major assumption in many machine learning and data mining algorithms is that the training and future data must be in the same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For example, we sometimes have a classification task in one domain of interest, but we only have sufficient training data in another domain of interest, where the latter data may be in a different feature space or follow a different data distribution. In such cases, knowledge transfer, if done successfully, would greatly improve the performance of learning by avoiding much expensive data-labeling efforts. In recent years, transfer learning has emerged as a new learning framework to address this problem. This survey focuses on categorizing and reviewing the current progress on transfer learning for classification, regression, and clustering problems. In this survey, we discuss the relationship between transfer learning and other related machine learning techniques such as domain adaptation, multitask learning and sample selection bias, as well as covariate shift. We also explore some potential future issues in transfer learning research.
Article
Transcatheter aortic valve implantation is a minimally invasive alternative to open-heart surgery for aortic stenosis in which a stent-based bioprosthetic valve is delivered into the heart on a catheter. Limited visualization during this procedure can lead to severe complications. Improved visualization can be provided by live registration of transesophageal echo (TEE) and fluoroscopy images intraoperatively. Since the TEE probe is always visible in the fluoroscopy image, it is possible to track it using fiducial-based single-perspective pose estimation. In this study, inherent probe tracking performance was assessed, and TEE to fluoroscopy registration accuracy and robustness were evaluated. Results demonstrated probe tracking errors of below 0.6 mm and 0.2°, a 2-D RMS registration error of 1.5 mm, and a tracking failure rate of below 1%. In addition to providing live registration and better accuracy and robustness compared to existing TEE probe tracking methods, this system is designed to be suitable for clinical use. It is fully automatic, requires no additional operating room hardware, does not require intraoperative calibration, maintains existing procedure and imaging workflow without modification, and can be implemented in all cardiac centers at extremely low cost.
Conference Paper
Live 3D trans-esophageal echocardiography (TEE) and X-ray fluoroscopy provide complementary imaging information for guiding minimally invasive cardiac interventions. X-ray fluoroscopy is most commonly used for these procedures due to its excellent device visualization. However, its challenges include the 2D projection nature of the images and poor soft tissue contrast, both of which are addressed by the use of live 3D TEE imaging. We propose to integrate 3D TEE imaging with X-ray fluoroscopy, providing the capability to co-visualize both the interventional devices and cardiac anatomy, by accurately registering the images using an electro-magnetic tracking system. Phantom trials validating the proposed registration scheme indicate an average accuracy of 2.04 mm with a standard deviation of 0.59 mm. In the future, this system may benefit the guidance and navigation of interventional cardiac procedures such as mitral valve repair or patent foramen ovale closure.
Article
Density ratio estimation has gathered a great deal of attention recently since it can be used for various data processing tasks. In this paper, we consider three methods of density ratio estimation: (A) the numerator and denominator densities are separately estimated and then the ratio of the estimated densities is computed, (B) a logistic regression classifier discriminating denominator samples from numerator samples is learned and then the ratio of the posterior probabilities is computed, and (C) the density ratio function is directly modeled and learned by minimizing the empirical Kullback-Leibler divergence. We first prove that when the numerator and denominator densities are known to be members of the exponential family, (A) is better than (B) and (B) is better than (C). Then we show that once the model assumption is violated, (C) is better than (A) and (B). Thus in practical situations where no exact model is available, (C) would be the most promising approach to density ratio estimation.
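For concreteness, approach (A) can be sketched in a few lines with Gaussian kernel density estimates; as the abstract argues, this separate-estimation route is typically the weakest once model assumptions are violated, which motivates the direct approach (C).

```python
import numpy as np
from scipy.stats import gaussian_kde

def ratio_by_separate_kde(x_num, x_den, x_query):
    """Approach (A): estimate both densities, then divide (often unstable)."""
    p_num = gaussian_kde(x_num.T)     # gaussian_kde expects (d, n) arrays
    p_den = gaussian_kde(x_den.T)
    return p_num(x_query.T) / p_den(x_query.T)
```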
Article
Density ratio estimation has attracted a great deal of attention in the statistics and machine learning communities since it can be used for solving various statistical data processing tasks such as non-stationarity adaptation, two-sample test, outlier detection, independence test, feature selection/extraction, independent component analysis, causal inference, and conditional probability estimation. When estimating the density ratio, it is preferable to avoid estimating densities since density estimation is known to be a hard problem. In this paper, we give a comprehensive review of density ratio estimation methods based on moment matching, probabilistic classification, and ratio matching.
Article
We review the literature on semi-supervised learning, which is an area in machine learning and more generally, artificial intelligence. There has been a whole spectrum of interesting ideas on how to learn from both labeled and unlabeled data, i.e. semi-supervised learning. This document is a chapter excerpt from the author’s doctoral thesis (Zhu, 2005). However the author plans to update the online version frequently to incorporate the latest development in the field. Please obtain the latest version at http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf
Article
Two-dimensional (2D) X-ray imaging is the dominant imaging modality for cardiac interventions. However, the use of X-ray fluoroscopy alone is inadequate for the guidance of procedures that require soft-tissue information, for example, the treatment of structural heart disease. The recent availability of three-dimensional (3D) trans-esophageal echocardiography (TEE) provides cardiologists with real-time 3D imaging of cardiac anatomy. Increasingly, X-ray imaging is now supported by intra-procedure 3D TEE imaging. We hypothesize that the real-time co-registration and visualization of 3D TEE and X-ray fluoroscopy data will provide a powerful guidance tool for cardiologists. In this paper, we propose a novel, robust and efficient method for performing this registration. The major advantage of our method is that it does not rely on any additional tracking hardware and therefore can be deployed straightforwardly into any interventional laboratory. Our method consists of an image-based TEE probe localization algorithm and a calibration procedure. While the calibration needs to be done only once, the GPU-accelerated registration takes approximately 2 to 15 s to complete, depending on the number of X-ray images used in the registration and the image resolution. The accuracy of our method was assessed using a realistic heart phantom, for which the target registration error (TRE) was less than 2 mm. In addition, we assess the accuracy and the clinical feasibility of our method using five patient datasets, two of which were acquired from cardiac electrophysiology procedures and three from trans-catheter aortic valve implantation procedures. The registration results showed that our technique had mean registration errors of 1.5-4.2 mm and a 95% capture range of 8.7-11.4 mm in terms of TRE.
Article
We present a feasibility study on hybrid echocardiography (echo) and x-ray image guidance for cardiac catheterization procedures. A self-tracked, remotely operated robotic arm with haptic feedback was developed that attached to a standard x-ray table. This was used to safely manipulate a three-dimensional (3D) trans-thoracic echo probe during simultaneous x-ray fluoroscopy and echo acquisitions. By a combination of calibration and tracking of the echo and x-ray systems, it was possible to register the 3D echo images with the 2D x-ray images. Visualization of the combined data was achieved by either overlaying triangulated surfaces extracted from segmented echo data onto the x-ray images or by overlaying volume rendered 3D echo data. Furthermore, in order to overcome the limited field of view of the echo probe, it was possible to create extended field of view (EFOV) 3D echo images by co-registering multiple tracked echo data to generate larger roadmaps for procedure guidance. The registration method was validated using a cross-wire phantom and showed a 2D target registration error of 3.5 mm. The clinical feasibility of the method was demonstrated during two clinical cases for patients undergoing cardiac pacing studies. The EFOV technique was demonstrated using two healthy volunteers.
Article
The ratio of two probability density functions is becoming a quantity of interest these days in the machine learning and data mining communities since it can be used for various data processing tasks such as non-stationarity adaptation, outlier detection, and feature selection. Recently, several methods have been developed for directly estimating the density ratio without going through density estimation and were shown to work well in various practical problems. However, these methods still perform rather poorly when the dimensionality of the data domain is high. In this paper, we propose to incorporate a dimensionality reduction scheme into a density-ratio estimation procedure and experimentally show that the estimation accuracy in high-dimensional cases can be improved.
Article
A situation where training and test samples follow different input distributions is called covariate shift. Under covariate shift, standard learning methods such as maximum likelihood estimation are no longer consistent—weighted variants according to the ratio of test and training input densities are consistent. Therefore, accurately estimating the density ratio, called the importance, is one of the key issues in covariate shift adaptation. A naive approach to this task is to first estimate training and test input densities separately and then estimate the importance by taking the ratio of the estimated densities. However, this naive approach tends to perform poorly since density estimation is a hard task particularly in high dimensional cases. In this paper, we propose a direct importance estimation method that does not involve density estimation. Our method is equipped with a natural cross validation procedure and hence tuning parameters such as the kernel width can be objectively optimized. Furthermore, we give rigorous mathematical proofs for the convergence of the proposed algorithm. Simulations illustrate the usefulness of our approach.
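A compact sketch of the direct estimation idea, in the spirit of KLIEP: model the importance as a non-negative mixture of Gaussian kernels centred at the test points, increase the test log-likelihood of the modelled ratio by projected gradient ascent, and keep its mean over the training sample at one. Kernel width and step size are fixed illustrative values; the paper's built-in cross-validation would normally select them.

```python
import numpy as np

def kliep_style_weights(X_tr, X_te, sigma=1.0, n_iter=500, lr=1e-3):
    def K(A, B):  # Gaussian kernel matrix between row sets A and B
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    K_te = K(X_te, X_te)                        # kernel centres = test points
    K_tr = K(X_tr, X_te)
    alpha = np.ones(len(X_te)) / len(X_te)
    for _ in range(n_iter):
        w_te = K_te @ alpha
        alpha += lr * (K_te.T @ (1.0 / w_te))   # gradient of sum log w(x_te)
        alpha = np.maximum(alpha, 0.0)          # non-negativity constraint
        alpha /= (K_tr @ alpha).mean()          # mean training importance = 1
    return K_tr @ alpha                         # importance of each training point
```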
Conference Paper
In this paper, a new learning framework - probabilistic boosting-tree (PBT) - is proposed for learning two-class and multi-class discriminative models. In the learning stage, the probabilistic boosting-tree automatically constructs a tree in which each node combines a number of weak classifiers (evidence, knowledge) into a strong classifier (a conditional posterior probability). It approaches the target posterior distribution by data augmentation (tree expansion) through a divide-and-conquer strategy. In the testing stage, the conditional probability is computed at each tree node based on the learned classifier, which guides the probability propagation in its sub-trees. The top node of the tree therefore outputs the overall posterior probability by integrating the probabilities gathered from its sub-trees. Also, clustering is naturally embedded in the learning phase, and each sub-tree represents a cluster of a certain level. The proposed framework is very general and has interesting connections to a number of existing methods such as the A* algorithm, decision tree algorithms, generative models, and cascade approaches. In this paper, we show the applications of PBT for classification, detection, and object recognition. We have also applied the framework in segmentation.
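A much-simplified sketch of the tree construction: each node holds a boosted strong classifier, the training data are split by the node posterior, and test-time posteriors are combined along the tree. The original framework propagates soft probabilities and embeds clustering; this hard-split version is only meant to show the recursive structure.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

class PBTNode:
    def __init__(self, depth=0, max_depth=2):
        self.clf = AdaBoostClassifier(n_estimators=20)  # node-level strong classifier
        self.depth, self.max_depth = depth, max_depth
        self.left = self.right = None

    def fit(self, X, y):
        self.clf.fit(X, y)
        if self.depth < self.max_depth:
            p = self.clf.predict_proba(X)[:, 1]
            for side, mask in (("left", p < 0.5), ("right", p >= 0.5)):
                if len(np.unique(y[mask])) == 2:        # expand impure nodes only
                    child = PBTNode(self.depth + 1, self.max_depth).fit(X[mask], y[mask])
                    setattr(self, side, child)
        return self

    def posterior(self, x):
        p = self.clf.predict_proba(x.reshape(1, -1))[0, 1]
        pl = self.left.posterior(x) if self.left else p
        pr = self.right.posterior(x) if self.right else p
        return (1.0 - p) * pl + p * pr      # integrate sub-tree posteriors
```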
Fig. 9. Potential clinical application of interventional image fusion: the 3D TEE images can be shown together with the fluoroscopy data and from the same viewpoint. This enables soft tissue information (from ultrasound) and tool location (from fluoroscopy) to be visualized in the same coordinate system.
Heimann, T., Mountney, P., John, M., Ionasec, R.I., 2013. Learning without labeling: Domain adaptation for ultrasound transducer localization, in: Proc MICCAI, pp. 49-56.
Beijbom, O., 2012. Domain Adaptation for Computer Vision Applications. Technical Report, University of California, San Diego.
Mountney, P., Ionasec, R., Kaiser, M., Mamaghani, S., Wu, W., Chen, T., John, M., Boese, J., Comaniciu, D., 2012. Ultrasound and fluoroscopic images fusion by autonomous ultrasound probe detection, in: Proc MICCAI, Springer, pp. 544-551.