Article

Improving ultrasound video classification: an evaluation of novel deep learning methods in echocardiography


Abstract

Echocardiography is the commonest medical ultrasound examination, but automated interpretation is challenging and hinges on correct recognition of the 'view' (imaging plane and orientation). Current state-of-the-art methods for identifying the view computationally involve 2-dimensional convolutional neural networks (CNNs), but these merely classify individual frames of a video in isolation, and ignore information describing the movement of structures throughout the cardiac cycle. Here we explore the efficacy of novel CNN architectures, including time-distributed networks and two-stream networks, which are inspired by advances in human action recognition. We demonstrate that these new architectures more than halve the error rate of traditional CNNs from 8.1% to 3.9%. These advances in accuracy may be due to these networks' ability to track the movement of specific structures such as heart valves throughout the cardiac cycle. Finally, we show the accuracies of these new state-of-the-art networks are approaching expert agreement (3.6% discordance), with a similar pattern of discordance between views.
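For readers unfamiliar with the two-stream design named in the abstract, the sketch below shows the general pattern in PyTorch: one stream classifies a still frame (appearance), a second classifies a stack of optical-flow fields (motion), and their class scores are fused. All layer sizes and the 10-flow-field input are illustrative assumptions, not the authors' architecture.

```python
# Minimal two-stream view-classifier sketch (PyTorch). Hypothetical shapes:
# the spatial stream sees one RGB frame; the temporal stream sees a stack of
# 10 optical-flow fields (x and y components -> 20 channels).
import torch
import torch.nn as nn

class TwoStreamClassifier(nn.Module):
    def __init__(self, n_views: int = 14, flow_channels: int = 20):
        super().__init__()
        def backbone(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        self.spatial = backbone(3)                # appearance of one frame
        self.temporal = backbone(flow_channels)   # motion across frames
        self.head_s = nn.Linear(64, n_views)
        self.head_t = nn.Linear(64, n_views)

    def forward(self, frame, flow_stack):
        # Late fusion: average the two streams' class scores.
        return 0.5 * (self.head_s(self.spatial(frame))
                      + self.head_t(self.temporal(flow_stack)))

model = TwoStreamClassifier()
scores = model(torch.randn(2, 3, 112, 112), torch.randn(2, 20, 112, 112))
print(scores.shape)  # torch.Size([2, 14])
```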


... We have previously reported on the preparation and annotation of a large patient dataset, covering a range of pathologies and including 14 different echocardiographic views, which we used to evaluate the performance of existing standard CNN architectures. 38 In this study, we will use this dataset to design customized network architectures for the task of echo view classification. ...
... In this section, a brief account of the patient dataset used in this study is provided. A detailed description, including patient characteristics, can be found in Howard et al. 38 A random sample of 374 echocardiographic examinations, each from a different patient and performed between 2010 and 2020, was extracted from Imperial College Healthcare NHS Trust's echocardiogram database. The images were acquired by experienced echocardiographers according to standard protocols, using ultrasound equipment from GE and Philips. ...
... 31 Interestingly, the two views the model found most difficult to differentiate correctly (A4CH-LV versus A5CH, and A2CH versus A3CH) were also the two views on which the two experts disagreed most often. 38 The A4CH view is in anatomical continuity with the A5CH view. The difference is whether the scanning plane has been tilted to bring the aortic valve into view, which would make it A5CH. ...
Article
Purpose: Echocardiography is the most commonly used modality for assessing the heart in clinical practice. In an echocardiographic exam, an ultrasound probe samples the heart from different orientations and positions, thereby creating different viewpoints for assessing the cardiac function. Determining the probe viewpoint is an essential step in automatic echocardiographic image analysis. Approach: In this study, convolutional neural networks are used for the automated identification of 14 different anatomical echocardiographic views (more than any previous study) in a dataset of 8732 videos acquired from 374 patients. A differentiable architecture search approach was used to design small neural network architectures for rapid inference while maintaining high accuracy. The impact of image quality and resolution, training dataset size, and number of echocardiographic view classes on the efficacy of the models was also investigated. Results: In contrast to the deeper classification architectures, the proposed models had significantly fewer trainable parameters (up to a 99.9% reduction), achieved comparable classification performance (accuracy 88.4% to 96%, precision 87.8% to 95.2%, recall 87.1% to 95.1%), and delivered real-time performance with an inference time per image of 3.6 to 12.6 ms. Conclusion: Compared with standard classification neural network architectures, the proposed models are faster and achieve comparable classification performance. They also require less training data. Such models can be used for real-time detection of the standard views.
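The "differentiable architecture search" named in the approach is, at its core, a continuous relaxation over candidate operations. Below is a minimal, hypothetical PyTorch rendering of that core idea (a DARTS-style mixed operation); the candidate set and tensor sizes are illustrative, not the paper's search space.

```python
# Sketch of the core DARTS idea: a "mixed op" whose output is a softmax-
# weighted sum of candidate operations; the weights (alphas) are learned by
# gradient descent alongside the network weights. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Identity(),
        ])
        # One architecture parameter per candidate operation.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

op = MixedOp(16)
y = op(torch.randn(1, 16, 32, 32))
print(y.shape)  # after the search, argmax(alpha) selects the final operation
```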
... Automated anonymisation was performed to remove the patient-identifiable information. A detailed description, including patient characteristics, can be found in Howard et al. [22]. ...
... A CNN model, previously developed in our research group to detect different echocardiographic views [22], was then used to identify and separate the A4C views. A total of 1,000 videos from different patients of varying lengths, containing 1-3 heartbeats, were randomly selected. ...
Article
Tissue Doppler imaging is an essential echocardiographic technique for the non-invasive assessment of myocardial blood velocity. Image acquisition and interpretation are performed by trained operators who visually localise landmarks representing Doppler peak velocities. Current clinical guidelines recommend averaging measurements over several heartbeats. However, this manual process is both time-consuming and disruptive to workflow. An automated system for accurate beat isolation and landmark identification would be highly desirable. A dataset of tissue Doppler images was annotated by three expert cardiologists, providing a gold standard and allowing for observer-variability comparisons. Deep neural networks were trained for fully automated predictions on multiple heartbeats and tested on tissue Doppler strips of arbitrary length. Automated measurements of peak Doppler velocities show good Bland–Altman agreement (average standard deviation of 0.40 cm/s) with consensus expert values, less than the inter-observer variability (0.65 cm/s). Performance is akin to that of individual experts (standard deviation of 0.40 to 0.75 cm/s). Our approach allows more than 26 times as many heartbeats to be analysed, compared with a manual approach. The proposed automated models can accurately and reliably make measurements on tissue Doppler images spanning several heartbeats, with performance indistinguishable from that of human experts, but with significantly shorter processing time. • Novel approach successfully identifies heartbeats from tissue Doppler images • Accurately measures peak velocities on several heartbeats • Framework is fast and can make predictions on arbitrary-length images • Patient dataset and models made public for future benchmark studies
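As context for the Bland–Altman figures quoted above, the snippet below shows how bias and 95% limits of agreement are computed between automated and expert measurements; the velocity values are made up for illustration.

```python
# Bland-Altman agreement between automated and expert peak-velocity
# measurements: bias (mean difference) and 95% limits of agreement.
# Toy numbers for illustration, not the paper's data.
import numpy as np

auto = np.array([8.1, 9.4, 7.2, 10.0, 6.8])    # cm/s, model output
expert = np.array([8.0, 9.9, 7.0, 10.3, 6.5])  # cm/s, expert consensus

diff = auto - expert
bias = diff.mean()
sd = diff.std(ddof=1)
print(f"bias = {bias:.2f} cm/s, limits of agreement = "
      f"[{bias - 1.96*sd:.2f}, {bias + 1.96*sd:.2f}] cm/s")
```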
... By leveraging spatial and temporal information from multiple image frames across an echocardiographic video, DCNN(2D+t) and RNN models have the potential to detect subtle functional/motion changes through a cumulative evaluation of the continuous movements of the heart, and are therefore likely more sensitive than DCNN(2D) models. [21,23] The visualization data support the theory that the DCNN(2D+t) network's ability to discriminate between such classes may be due in part to its ability to track the movement of cardiac structures (basal LV and RV walls) through their simultaneous multi-dimensional motion, effectively increasing data resolution and capturing otherwise "invisible" spatio-temporal imaging information. With a limited receptive field, DCNN(2D+t) is able to "see" only a limited number of frames concurrently with its 3D convolution operations, which enables learning of relatively regional ventricular motions. ...
... With a limited receptive field, DCNN(2D+t) is able to "see" only a limited number of frames concurrently with its 3D convolution operations, which enables learning of relatively regional ventricular motions. [21,22] In the present study, DL networks significantly reduced erroneous human "judgement calls" on TTS. In contrast to the normal distribution of the number of readers making the correct diagnosis of STEMI, the distribution of the number of readers who correctly identified TTS appears random and arbitrary (Fig. 4), and did not reflect the readers' varied training backgrounds and experience. ...
Article
Full-text available
Background We investigate whether deep learning (DL) neural networks can reduce erroneous human "judgment calls" on bedside echocardiograms and help distinguish Takotsubo syndrome (TTS) from anterior wall ST segment elevation myocardial infarction (STEMI). Methods We developed a single-channel (DCNN[2D SCI]), a multi-channel (DCNN[2D MCI]), and a 3-dimensional (DCNN[2D+t]) deep convolutional neural network, and a recurrent neural network (RNN), based on 17,280 still-frame images and 540 videos from 2-dimensional echocardiograms in a 10-year (1 January 2008 to 1 January 2018) retrospective cohort at the University of Iowa (UI) and eight other medical centers. Echocardiograms from 450 UI patients were randomly divided into training and testing sets for internal training, testing, and model construction. Echocardiograms of 90 patients from the other medical centers were used for external validation to evaluate model generalizability. A total of 49 board-certified human readers performed human-side classification on the same echocardiography dataset to compare diagnostic performance and aid data visualization. Findings The DCNN(2D SCI), DCNN(2D MCI), DCNN(2D+t), and RNN models established on the UI dataset for TTS versus STEMI prediction showed mean diagnostic accuracies of 73%, 75%, 80%, and 75%, respectively, and mean diagnostic accuracies of 74%, 74%, 77%, and 73%, respectively, on external validation. DCNN(2D+t) (area under the curve [AUC] 0·787 vs. 0·699, P = 0·015) and RNN models (AUC 0·774 vs. 0·699, P = 0·033) outperformed human readers in differentiating TTS from STEMI by reducing erroneous human judgement calls on TTS. Interpretation Spatio-temporal hybrid DL neural networks reduce erroneous human "judgement calls" in distinguishing TTS from anterior wall STEMI based on bedside echocardiographic videos. Funding University of Iowa Obermann Center for Advanced Studies Interdisciplinary Research Grant, and Institute for Clinical and Translational Science Grant. National Institutes of Health Award (1R01EB025018-01).
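To make the DCNN(2D+t) terminology concrete: a "2D+t" network convolves over height, width, and time, so its receptive field spans a few neighbouring frames, which is the property the excerpts above attribute to its motion sensitivity. The block below is a generic illustration in PyTorch, not the study's model; all shapes are assumptions.

```python
# Sketch of a DCNN(2D+t)-style block: 3D convolutions slide over height,
# width, AND time, so features can encode wall motion across neighbouring
# frames (the limited temporal receptive field described above).
import torch
import torch.nn as nn

video = torch.randn(1, 1, 16, 112, 112)  # (batch, channel, frames, H, W)
block = nn.Sequential(
    nn.Conv3d(1, 8, kernel_size=(3, 3, 3), padding=1),   # sees 3 frames at once
    nn.ReLU(),
    nn.MaxPool3d((2, 2, 2)),    # halves the temporal and spatial extent
    nn.Conv3d(8, 16, kernel_size=(3, 3, 3), padding=1),  # wider temporal context
    nn.ReLU(),
)
print(block(video).shape)  # torch.Size([1, 16, 8, 56, 56])
```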
Conference Paper
Accurate identification of end-diastolic (ED) and end-systolic (ES) frames in echocardiographic cine loops is essential when measuring cardiac function. Manual selection by human experts is challenging and error prone. We present a deep neural network trained and tested on multi-centre patient data for accurate phase detection in apical four-chamber videos of arbitrary length, spanning several heartbeats, with performance indistinguishable from that of human experts.
Article
Background Accurate identification of end-diastolic and end-systolic frames in echocardiographic cine loops is important, yet challenging, even for human experts. Manual frame selection is subject to uncertainty, affecting crucial clinical measurements such as myocardial strain. The ability to detect frames of interest automatically is therefore highly desirable. Methods We have developed deep neural networks, trained and tested on multi-centre patient data, for the accurate identification of end-diastolic and end-systolic frames in apical four-chamber 2D multibeat cine loop recordings of arbitrary length. Seven experienced cardiologists independently labelled the frames of interest, providing a reliable gold standard and allowing for observer-variability measurements. Results When compared with the ground truth, our model shows an average frame difference of -0.09±1.10 and 0.11±1.29 frames for end-diastolic and end-systolic frames, respectively. When applied to patient datasets from a different clinical site, to which the model was blind during its development, average frame differences of -1.34±3.27 and -0.31±3.37 frames were obtained for the two frames of interest. All detection errors fall within the range of inter-observer variability: [-0.87, -5.51]±[2.29, 4.26] and [-0.97, -3.46]±[3.67, 4.68] for ED and ES events, respectively. Conclusions The proposed automated model can identify multiple end-systolic and end-diastolic frames in echocardiographic videos of arbitrary length with performance indistinguishable from that of human experts, but with significantly shorter processing time.
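A common recipe for this kind of arbitrary-length phase detection is per-frame CNN features scanned by a recurrent layer with a per-frame output head. The sketch below illustrates that general pattern only; the layer sizes are assumptions and it is not the authors' published model.

```python
# Illustrative frame-of-interest detector for videos of arbitrary length:
# a small CNN encodes each frame, a bidirectional LSTM scans the sequence,
# and a per-frame head scores ED/ES likelihood.
import torch
import torch.nn as nn

class PhaseDetector(nn.Module):
    def __init__(self, feat: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, feat, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.lstm = nn.LSTM(feat, 32, bidirectional=True, batch_first=True)
        self.head = nn.Linear(64, 2)  # per-frame scores: [ED, ES]

    def forward(self, video):             # video: (B, T, 1, H, W)
        b, t = video.shape[:2]
        f = self.encoder(video.flatten(0, 1)).view(b, t, -1)
        h, _ = self.lstm(f)
        return self.head(h)               # (B, T, 2); peaks mark ED/ES frames

scores = PhaseDetector()(torch.randn(1, 37, 1, 64, 64))  # any T works
print(scores.shape)  # torch.Size([1, 37, 2])
```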
... However, they often do not take advantage of the 3D aspect of the data. The same issue occurs when stacking slice-wise embeddings [22,11,4], applying self-attention [5] for feature aggregation, or using principal component analysis (PCA) [9] to reduce a variable number of embeddings to a fixed size. As an alternative, recurrent neural networks (RNNs) [18] treat the volume as a sequence of ordered slices or their corresponding embeddings. ...
Preprint
Full-text available
The automatic classification of 3D medical data is memory-intensive, and variation in the number of slices between samples is common. Naive solutions such as subsampling can solve these problems, but at the cost of potentially eliminating relevant diagnostic information. Transformers have shown promising performance for sequential data analysis, but their application to long sequences is data-, computationally, and memory-demanding. In this paper, we propose an end-to-end Transformer-based framework that can classify volumetric data of variable length in an efficient fashion. In particular, by randomizing the input slice-wise resolution during training, we enhance the capacity of the learnable positional embedding assigned to each volume slice. Consequently, the positional information accumulated in each embedding generalizes to the neighbouring slices, even for high-resolution volumes at test time. The model is thereby more robust to variable volume length and amenable to different computational budgets. We evaluated the proposed approach on retinal OCT volume classification and achieved a 21.96% average improvement in balanced accuracy on a 9-class diagnostic task, compared to state-of-the-art video transformers. Our findings show that varying the slice-wise resolution of the input during training yields a more informative volume representation than training with a fixed number of slices per volume. Our code is available at: https://github.com/marziehoghbaie/VLFAT.
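The key trick described here, randomizing the slice-wise resolution so positional embeddings generalize across volume lengths, can be illustrated in a few lines of PyTorch. The sampling range and dimensions below are hypothetical; the authors' actual implementation is in the linked repository.

```python
# Illustrative training-time subsampling: randomly vary the number of slices
# per volume so each learnable positional embedding is trained at varying
# effective resolutions. Not the authors' code.
import torch
import torch.nn as nn

max_slices, dim = 128, 96
pos_embed = nn.Parameter(torch.zeros(max_slices, dim))

def sample_training_volume(volume_tokens):
    """volume_tokens: (n_slices, dim) slice embeddings for one volume."""
    n = volume_tokens.shape[0]
    k = torch.randint(low=8, high=n + 1, size=(1,)).item()  # random length
    idx = torch.sort(torch.randperm(n)[:k]).values          # keep slice order
    # Each kept slice receives the embedding of its ORIGINAL position, so
    # neighbouring positions see similar anatomy across training iterations.
    return volume_tokens[idx] + pos_embed[idx]

tokens = sample_training_volume(torch.randn(64, dim))
print(tokens.shape)  # (k, 96), where k varies per call
```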
... Deep learning-based CADx systems have been utilized to determine the optimal views of ultrasound images of various anatomic structures, including fetal, 28 cardiac, 29 and breast 30 structures. Studies have evaluated the use of ultrasound videos to determine BP at the interscalene level 9,31 and to identify anatomic structures in several regions of the body. ...
Preprint
Full-text available
Background Successful ultrasound-guided supraclavicular block (SCB) requires an understanding of sonoanatomy and identification of the optimal view. Segmentation using a convolutional neural network (CNN) is limited in clearly determining the optimal view. The present study describes the development of a computer-aided diagnosis (CADx) system using a CNN that can determine the optimal view for complete SCB in real time. Objective The aim of this study was to develop a computer-aided diagnosis system that helps non-experts determine the optimal view for complete supraclavicular block in real time. Methods Ultrasound videos were retrospectively collected from 881 patients to develop the CADx system (600 for the training and validation set and 281 for the test set). The CADx system included classification and segmentation approaches, with a residual neural network (ResNet) and U-Net, respectively, as backbone networks. In the classification approach, an ablation study was performed to determine the optimal architecture and improve model performance. In the segmentation approach, a cascade structure, in which U-Net is connected to ResNet, was implemented. The performance of the two approaches was evaluated using a confusion matrix. Results With the classification approach, ResNet34 and gated recurrent units with augmentation showed the highest performance, with an average accuracy of 0.901, precision of 0.613, recall of 0.757, F1-score of 0.677, and AUROC of 0.936. With the segmentation approach, U-Net combined with ResNet34 and augmentation performed worse than the classification approach. Conclusions The CADx system described in this study showed high performance in determining the optimal view for SCB. The system could be extended to many anatomical regions and may have the potential to aid clinicians in real-time settings. Trial registration The protocol was registered with the Clinical Trial Registry of Korea (KCT0005822, https://cris.nih.go.kr)
... Therefore, some studies have investigated the performance of three-dimensional convolutional neural networks (3D CNNs) in ultrasound video recognition [13,14], such as echocardiography and fetal ultrasound, achieving better performance than 2D models in these tasks. For ultrasound video, the main difference between 3D and 2D lies in whether they can handle time-direction information. ...
Preprint
Full-text available
The purpose of this study is to develop a computer-aided diagnosis system for classifying benign and malignant lung lesions, and to assist physicians in the real-time analysis of radial-probe endobronchial ultrasound (EBUS) videos. During lung cancer biopsy, physicians use real-time ultrasound images to find suitable lesion locations for sampling. However, most of these images are difficult to classify and contain substantial noise. Previous studies employed 2D convolutional neural networks to differentiate effectively between benign and malignant lung lesions, but doctors still needed to select good-quality images manually, incurring additional labour costs. In addition, a 2D network cannot capture the temporal information in an ultrasound video, so it is difficult to model the relationships between the features of consecutive images. This study designs an automatic diagnosis system based on a 3D neural network, uses the SlowFast architecture as the backbone to fuse temporal and spatial features, and uses the SwAV method of contrastive learning to enhance the noise robustness of the model. The proposed method offers the following advantages: (1) it uses clinical ultrasound videos as model input, reducing the need for physicians to select high-quality images; (2) its high-accuracy classification of benign and malignant lung lesions can assist doctors in clinical diagnosis and reduce the time and risk of surgery; and (3) it classifies well even in the presence of significant image noise. The AUC, accuracy, precision, recall, and specificity of our proposed method on the validation set reached 0.87, 83.87%, 86.96%, 90.91%, and 66.67%, respectively. The results verify the importance of incorporating temporal information and the effectiveness of contrastive learning for feature extraction.
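The SlowFast backbone named above rests on a simple sampling idea: a slow pathway sees a sparse set of frames (appearance semantics) while a fast pathway sees a denser set (motion). The toy snippet below shows only that frame-sampling step, with made-up rates; the two 3D CNN pathways and their lateral fusion are omitted.

```python
# Sketch of SlowFast-style dual-rate frame sampling. Rates are illustrative.
import torch

video = torch.randn(1, 3, 32, 96, 96)   # (B, C, T, H, W)
slow = video[:, :, ::8]                  # every 8th frame -> T=4 (semantics)
fast = video[:, :, ::2]                  # every 2nd frame -> T=16 (motion)
print(slow.shape, fast.shape)
# In the full architecture, each pathway has its own 3D CNN and the fast
# pathway's features are fused laterally into the slow pathway.
```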
... Computer vision operationalizes machines to recognize and analyze still images and videos. Recent advances in DL and computational capabilities have improved software abilities in video classification problems [75]. ...
Article
Full-text available
Exponential growth in data storage and computational power is rapidly narrowing the gap in translating findings from advanced clinical informatics into cardiovascular clinical practice. Specifically, cardiovascular imaging has the distinct advantage of providing a great quantity of data for potentially rich insights, but nuanced interpretation requires a high-level skillset that few individuals possess. A subset of machine learning, deep learning (DL), is a modality that has shown promise, particularly in the areas of image recognition, computer vision, and video classification. Due to a low signal-to-noise ratio, echocardiographic data tend to be challenging to classify; however, utilization of robust DL architectures may help clinicians and researchers automate conventional human tasks and catalyze the extraction of clinically useful data from the petabytes of collected imaging data. The promise extends even further, towards a contactless echocardiographic exam—a dream much needed in this time of uncertainty and social distancing brought on by a stunning pandemic. In the current review, we discuss state-of-the-art DL techniques and architectures that can be used for image and video classification, and future directions in echocardiographic research in the current era.
... On the other hand, Howard et al. (36) aimed to improve the current state of the art in view classification by exploring the efficacy of time-distributed networks and two-stream networks. The dataset used, from Imperial College Healthcare NHS Trust's echocardiogram database, consists of 8,732 videos, each classified as one of 14 views by an expert. ...
Article
Full-text available
Echocardiography is the most frequently used imaging modality in cardiology. However, its acquisition is affected by inter-observer variability and largely dependent on the operator’s experience. In this context, artificial intelligence techniques could reduce these variabilities and provide a user independent system. In recent years, machine learning (ML) algorithms have been used in echocardiography to automate echocardiographic acquisition. This review focuses on the state-of-the-art studies that use ML to automate tasks regarding the acquisition of echocardiograms, including quality assessment (QA), recognition of cardiac views and assisted probe guidance during the scanning process. The results indicate that performance of automated acquisition was overall good, but most studies lack variability in their datasets. From our comprehensive review, we believe automated acquisition has the potential not only to improve accuracy of diagnosis, but also help novice operators build expertise and facilitate point of care healthcare in medically underserved areas.
... Howard et al. applied such a two-stream CNN to automatically determine the scan view from echocardiography data. Such two-stream CNNs can potentially lower the computational overhead for POCUS analysis and classification [54]. ...
Article
Full-text available
When a patient presents to the ED, clinicians often turn to medical imaging to better understand their condition. Traditionally, imaging is collected from the patient and interpreted by a radiologist remotely. However, scanning devices are increasingly equipped with analytical software that can provide quantitative assessments at the patient’s bedside. These assessments often rely on machine learning algorithms as a means of interpreting medical images.
... Other research [14] compared four CNN architectures for classifying 14 echocardiographic views: single-frame classification (2D CNN), multi-frame classification (TD CNN), spatio-temporal convolution (3D CNN), and two-stream classification. The best-performing model was a "two-stream" network using both spatial and optical-flow inputs, with a corresponding error rate of 3.9%. ...
... Using transfer learning, a recently published medical paper shows that ECG videos can be treated as an action-recognition problem, so that state-of-the-art methods from the action-recognition field can be applied to ECG video classification and regression problems (Howard et al., 2020). One of the most prominent approaches is the two-stream network, which runs two pipelines with original image sequences and optical-flow sequences as inputs, respectively (Simonyan and Zisserman, 2014); the same idea has been pushed further by combining it with multi-task learning (Feichtenhofer et al., 2017). ...
Preprint
Full-text available
This project aims to build a model for automatically analyzing echocardiogram (ECG) images. The model comprises two tasks. The first task is to predict the ejection fraction (EF) of the left ventricle (LV). EF is the proportion of blood pumped out of the LV with each contraction and is the most common metric for diagnosing heart failure (McMurray et al., 2012).
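The two-stream pipeline described in the excerpt above needs dense optical flow as the input to its motion stream. One standard way to precompute it is OpenCV's Farneback method, sketched below; the file name is hypothetical, and the 10-field stack mirrors common two-stream practice rather than this project's exact setup.

```python
# Precomputing dense optical flow for a temporal-stream input (Farneback).
import cv2
import numpy as np

cap = cv2.VideoCapture("echo_clip.avi")  # hypothetical path
ok, prev = cap.read()
if not ok:
    raise SystemExit("could not read video")
prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
flows = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    flows.append(flow)   # (H, W, 2): x and y displacement per pixel
    prev = gray
cap.release()
stack = np.concatenate(flows[:10], axis=-1)  # 10 flow fields -> 20 channels
print(stack.shape)
```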
... Several medical image classification tasks have been performed successfully with convolutional neural networks. [10][11][12][13][14][15][16] Plain-film radiographs can be used routinely to identify implants. However, because only three implant models have been studied so far, or classification accuracy has been limited, it has not yet been demonstrated that deep learning can identify implants in a real-world setting. ...
Article
Aim In the preoperative planning of revision knee arthroplasty, it can be challenging to determine the implant manufacturer and type of the primary knee arthroplasty when the implants cannot be identified in time. Deep learning has been shown to improve diagnosis with each iteration in the medical field, and automated deep learning models have been applied to the problem of identifying the manufacturer and model of knee arthroplasty prostheses. The current study aimed to identify the best machine-learning model for detecting knee implants of different manufacturers from plain radiographs, and to compare the efficacy and accuracy of seven models. Material and methods Plain radiographs of 521 knee arthroplasty implants from six manufacturers were taken from anteroposterior and lateral perspectives to train, validate, and test the models. Of the 521 x-ray images, 70% were used for initial training, 10% for testing the models, and 20% for determining their accuracy and validity. The advantage of transfer learning for knee implant detection is that models already trained on a sufficiently large and general dataset can be reused to fulfil the study's objectives. In addition, to establish the efficacy of the models, two orthopaedic consultants specialising in arthroplasty independently identified the manufacturer types. Results Five of the seven models achieved more than 90% accuracy in identifying knee implant types. After 20 training epochs on all seven models, the best-performing model achieved 95.5% accuracy: VGG-16 produced the best results, with a precision of 98.4% and an accuracy of 95.5% on the validation dataset. Furthermore, the study showed that machine learning outperformed the two human experts, who achieved an average accuracy of 78%. Conclusion This study could lead to the development of an automated implant identification tool that helps healthcare professionals make more accurate and quicker decisions.
... Our proof-of-concept study demonstrates that a deep neural network algorithm is capable of being trained to identify germinal matrix hemorrhage on head ultrasound with acceptable accuracy, even with a limited data set. There is a growing number of studies that have assessed machine learning applications with ultrasound imaging, such as in the liver or thyroid, but to our knowledge, none has assessed identification of germinal matrix hemorrhage [11][12][13]. Compared to other neural network models that have been trained to classify abnormalities on imaging, our model utilized a much smaller data set [14][15][16][17]. ...
Article
Full-text available
Background Germinal matrix hemorrhage–intraventricular hemorrhage is among the most common intracranial complications in premature infants. Early detection is important to guide clinical management for improved patient prognosis. Objective The purpose of this study was to assess whether a convolutional neural network (CNN) can be trained via transfer learning to accurately diagnose germinal matrix hemorrhage on head ultrasound. Materials and methods Over a 10-year period, 400 head ultrasounds performed in patients ages 6 months or younger were reviewed. Key sagittal images at the level of the caudothalamic groove were obtained from 200 patients with germinal matrix hemorrhage and 200 patients without hemorrhage; all images were reviewed by a board-certified pediatric radiologist. One hundred cases were randomly allocated from the total for validation and an additional 100 for testing of a CNN binary classifier. Transfer learning and data augmentation were used to train the model. Results The median age of patients was 0 weeks old with a median gestational age of 30 weeks. The final trained CNN model had a receiver operating characteristic area under the curve of 0.92 on the validation set and accuracy of 0.875 on the test set, with 95% confidence intervals of [0.86, 0.98] and [0.81, 0.94], respectively. Conclusion A CNN trained on a small set of images with data augmentation can detect germinal matrix hemorrhage on head ultrasounds with strong accuracy.
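A minimal sketch of the transfer-learning setup this kind of study uses: a backbone pretrained on natural images is frozen and only a new two-class head is trained on the small ultrasound dataset. The backbone choice (ResNet-18), the recent torchvision weights API, and the hyperparameters are assumptions, not the paper's configuration; augmentation is omitted.

```python
# Transfer learning for a small-data binary classifier (illustrative).
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in model.parameters():       # freeze the pretrained feature extractor
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)  # hemorrhage vs. no hemorrhage

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

x, y = torch.randn(4, 3, 224, 224), torch.tensor([0, 1, 0, 1])  # dummy batch
loss = criterion(model(x), y)      # only the new head receives gradients
loss.backward()
optimizer.step()
print(float(loss))
```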
... Protocol 1 was designed to develop and test the accuracy of an ML approach for automated identification of image types and views, similar to recent studies, [4][5][6] and assigning them to "reading stacks," in order to streamline review and interpretation. The convolutional neural network (CNN) was trained to identify image types and recognize 18 standard views from two-dimensional, tissue Doppler and pulsed-wave and continuous-wave Doppler images. ...
Chapter
Many medical ultrasound video recognition tasks involve identifying key anatomical features regardless of when they appear in the video suggesting that modeling such tasks may not benefit from temporal features. Correspondingly, model architectures that exclude temporal features may have better sample efficiency. We propose a novel multi-head attention architecture that incorporates these hypotheses as inductive priors to achieve better sample efficiency on common ultrasound tasks. We compare the performance of our architecture to an efficient 3D CNN video recognition model in two settings: one where we expect not to require temporal features and one where we do. In the former setting, our model outperforms the 3D CNN - especially when we artificially limit the training data. In the latter, the outcome reverses. These results suggest that expressive time-independent models may be more effective than state-of-the-art video recognition models for some common ultrasound tasks in the low-data regime. Code is available at https://github.com/MedAI-Clemson/pda_detection.
Article
Full-text available
A growing number of artificial intelligence (AI)-based systems are being proposed and developed in cardiology, driven by the increasing need to deal with the vast amount of clinical and imaging data with the ultimate aim of advancing patient care, diagnosis and prognostication. However, there is a critical gap between the development and clinical deployment of AI tools. A key consideration for implementing AI tools into real-life clinical practice is their “trustworthiness” by end-users. Namely, we must ensure that AI systems can be trusted and adopted by all parties involved, including clinicians and patients. Here we provide a summary of the concepts involved in developing a “trustworthy AI system.” We describe the main risks of AI applications and potential mitigation techniques for the wider application of these promising techniques in the context of cardiovascular imaging. Finally, we show why trustworthy AI concepts are important governing forces of AI development.
Article
Purpose: This is a foundational study in which multiorgan system point of care ultrasound (POCUS) and machine learning (ML) are used to mimic physician management decisions regarding the functional intravascular volume status (IVS) and need for diuretic therapy. We present this as an impactful use case of an application of ML in aided decision making for clinical practice. IVS represents complex physiologic interactions of the cardiac, renal, pulmonary, and other organ systems. In particular, we focus on vascular congestion and overload as an evolving concept in POCUS diagnosis and clinical relevance. It is critical for physicians to be able to evaluate IVS without disrupting workflow or exposing patients to unnecessary testing, radiation, or cost. This work utilized a small retrospective dataset as a feasibility test for ML binary classification of diuretic administration validated with clinical decision data. Future work will be directed toward artificial intelligence (AI) delivery at the bedside and assessment of the impact on patient-centered outcomes and physician workflow improvement. Approach: We retrospectively reviewed and processed 1039 POCUS video clips, including cardiac, thoracic, and inferior vena cava (IVC) views. Multiorgan POCUS clips were correlated with clinical data extracted from the electronic health record and deidentified for algorithm training and validation. We implemented a two-stream three-dimensional (3D) deep learning approach that fuses heart and IVC data to perform binary classification of the need for diuretic use. Results: Our proposed approach achieves high classification accuracy (84%) for the determination of diuretic use with 0.84 area under the receiver operating characteristic curve. Conclusions: Our two-stream 3D deep neural network is able to classify POCUS video clips that match physicians' classification for or against diuretic use with high accuracy. This serves as a foundational step in the progress toward AI-aided diagnosis and AI implementation in the field of IVS evaluation by POCUS.
Article
Background and purpose: In this review we describe the use of artificial intelligence in the field of echocardiography. Various aspects and terminologies used in artificial intelligence are explained in an easy-to-understand manner and supplemented with illustrations related to echocardiography. Limitations of artificial intelligence, including epistemologic concerns from a philosophical standpoint, are also discussed. Methods: A narrative review of relevant papers was conducted. Conclusion: We provide an overview of the usefulness of artificial intelligence in echocardiography and focus on how it can supplement current day-to-day clinical practice in the assessment of various cardiovascular disease entities. On the other hand, there are significant limitations, including epistemological concerns, which need to be kept in perspective.
Article
Full-text available
Secundum atrial septal defect (ASD) is one of the most common congenital heart diseases (CHDs). This study aims to evaluate the feasibility and accuracy of automatic detection of ASD in children from color Doppler echocardiographic images using convolutional neural networks. We propose a fully automatic detection system for ASD comprising three stages. The first stage identifies four target echocardiographic views (the subcostal view focusing on the atrial septum, the apical four-chamber view, the low parasternal four-chamber view, and the parasternal short-axis view), the four views most useful for diagnosing ASD clinically. The second stage segments the target cardiac structure and detects candidates for ASD. The third stage infers the final detection from the segmentation and detection results of the second stage. The proposed ASD detection system was developed and validated using a training set of 4,031 cases containing 370,057 echocardiographic images and an independent test set of 229 cases containing 203,619 images, of which 105 cases had ASD and 124 had an intact atrial septum. Experimental results showed that the proposed system achieved accuracy, recall, precision, specificity, and F1 score of 0.8833, 0.8545, 0.8577, 0.9136, and 0.8546, respectively, on the image-level averages of the four most clinically useful echocardiographic views. The proposed system can identify ASD automatically and accurately, laying a good foundation for subsequent artificial intelligence diagnosis of CHDs.
Article
Endobronchial ultrasound (EBUS) elastography videos have shown great potential to supplement intrathoracic lymph node diagnosis. However, it is laborious and subjective for the specialists to select the representative frames from the tedious videos and make a diagnosis, and there lacks a framework for automatic representative frame selection and diagnosis. To this end, we propose a novel deep learning framework that achieves reliable diagnosis by explicitly selecting sparse representative frames and guaranteeing the invariance of diagnostic results to the permutations of video frames. Specifically, we develop a differentiable sparse graph attention mechanism that jointly considers frame-level features and the interactions across frames to select sparse representative frames and exclude disturbed frames. Furthermore, instead of adopting deep learning-based frame-level features, we introduce the normalized color histogram that considers the domain knowledge of EBUS elastography images and achieves superior performance. To our best knowledge, the proposed framework is the first to simultaneously achieve automatic representative frame selection and diagnosis with EBUS elastography videos. Experimental results demonstrate that it achieves an average accuracy of 81.29% and area under the receiver operating characteristic curve (AUC) of 0.8749 on the collected dataset of 727 EBUS elastography videos, which is comparable to the performance of the expert-based clinical methods based on manually-selected representative frames.
Chapter
View classification is a key initial step for the analysis of echocardiograms. Typical deep learning classifiers can make highly confident errors unnoticed by human operators, consequential for downstream tasks. Instead of failing, it is important to create a method that alarms “I don’t know” to inform clinicians of potential errors when faced with difficult or novel inputs. This paper proposes Efficient-Evidential Network (Efficient-EvidNet), a lightweight framework designed to classify echocardiogram views and simultaneously provide a sampling-free uncertainty prediction. Evidential uncertainty is used to filter faulty results and flag out the outliers, hence, improving the overall performance. Efficient-EvidNet classifies among 13 standard echo views with 91.9% test accuracy, competitive with other state-of-the-art lightweight networks. Notably, it achieves a 97.6% test accuracy when only reporting on data with low evidential uncertainty. Further, we propose improved techniques for outlier detection, reaching a 0.97 area under the ROC curve for differentiating between cardiac and lung ultrasound, for which the latter is unseen throughout the training. Efficient-EvidNet does not require costly sampling steps for uncertainty estimation and uses a low parameter neural network, providing two key features that are essential for real-time deployment in clinical scenarios.
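The sampling-free uncertainty in Efficient-EvidNet follows the general evidential deep-learning formulation, in which network outputs are read as Dirichlet evidence and uncertainty has a closed form. The snippet below shows that computation in generic terms; it is an illustration of the formulation, not the chapter's implementation.

```python
# Evidential uncertainty in closed form: logits -> Dirichlet evidence ->
# expected probabilities and a scalar uncertainty, with no Monte Carlo
# sampling. Illustrative formulation.
import torch
import torch.nn.functional as F

def evidential_prediction(logits):
    evidence = F.softplus(logits)          # non-negative evidence per class
    alpha = evidence + 1.0                 # Dirichlet concentration parameters
    strength = alpha.sum(-1, keepdim=True)
    probs = alpha / strength               # expected class probabilities
    k = logits.shape[-1]
    uncertainty = k / strength             # in (0, 1]; ~1 means "I don't know"
    return probs, uncertainty

probs, u = evidential_prediction(torch.tensor([[5.0, 0.1, 0.2],
                                               [0.1, 0.2, 0.1]]))
print(u)  # low for the confident first row, high for the ambiguous second
```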
Article
The echocardiogram is a test that is widely used in Heart Disease Diagnoses. However, its analysis is largely dependent on the physician’s experience. In this regard, artificial intelligence has become an essential technology to assist physicians. This study is a Systematic Literature Review (SLR) of primary state-of-the-art studies that used Artificial Intelligence (AI) techniques to automate echocardiogram analyses. Searches on the leading scientific article indexing platforms using a search string returned approximately 1400 articles. After applying the inclusion and exclusion criteria, 118 articles were selected to compose the detailed SLR. This SLR presents a thorough investigation of AI applied to support medical decisions for the main types of echocardiogram (Transthoracic, Transesophageal, Doppler, Stress, and Fetal). The article’s data extraction indicated that the primary research interest of the studies comprised four groups: 1) Improvement of image quality; 2) identification of the cardiac window vision plane; 3) quantification and analysis of cardiac functions, and; 4) detection and classification of cardiac diseases. The articles were categorized and grouped to show the main contributions of the literature to each type of ECHO. The results indicate that the Deep Learning (DL) methods presented the best results for the detection and segmentation of the heart walls, right and left atrium and ventricles, and classification of heart diseases using images/videos obtained by echocardiography. The models that used Convolutional Neural Network (CNN) and its variations showed the best results for all groups. The evidence produced by the results presented in the tabulation of the studies indicates that the DL contributed significantly to advances in echocardiogram automated analysis processes. Although several solutions were presented regarding the automated analysis of ECHO, this area of research still has great potential for further studies to improve the accuracy of results already known in the literature.
Article
Full-text available
Importance. With the booming growth of artificial intelligence (AI), and especially the recent advances in deep learning, the use of advanced deep learning-based methods for medical image analysis has become an active research area in both industry and academia. This paper reviews recent progress in deep learning research for medical image analysis and clinical applications. It also discusses existing problems in the field and provides possible solutions and future directions. Highlights. This paper reviews the advancement of convolutional neural network-based techniques in clinical applications, covering four major human body systems: the nervous system, the cardiovascular system, the digestive system, and the skeletal system. Overall, according to the best available evidence, deep learning models perform well in medical image analysis, but algorithms derived from small-scale medical datasets still impede clinical applicability. Future directions could include federated learning, benchmark dataset collection, and the use of domain knowledge as priors. Conclusion. Recent advanced deep learning technologies have achieved great success in medical image analysis, with high accuracy, efficiency, stability, and scalability. Technological advancements that can alleviate the high demand for high-quality, large-scale datasets could be one of the future developments in this area.
Article
Full-text available
Accurate identification of metallic orthopedic implant design is important for preoperative planning of revision arthroplasty. Surgical records of implant models are frequently unavailable. The aim of this study was to develop and evaluate a convolutional neural network for identifying orthopedic implant models using radiographs. In this retrospective study, 427 knee and 922 hip unilateral anteroposterior radiographs, including 12 implant models from 650 patients, were collated from an orthopedic center between March 2015 and November 2019 to develop classification networks. A total of 198 images paired with autogenerated image masks were used to develop a U-Net segmentation network to automatically zero-mask around the implants on the radiographs. Classification networks processing original radiographs, and two-channel conjoined original and zero-masked radiographs, were ensembled to provide a consensus prediction. Accuracies of five senior orthopedic specialists assisted by a reference radiographic gallery were compared with network accuracy using the McNemar exact test. When evaluated on a balanced unseen dataset of 180 radiographs, the final network achieved 98.9% accuracy (178 of 180) and 100% top-three accuracy (180 of 180). The network performed superiorly to all five specialists (76.1% [137 of 180] median accuracy and 85.6% [154 of 180] best accuracy; both P < .001), with robustness to scan-quality variation and difficult-to-distinguish implants. A neural network model was developed that outperformed senior orthopedic specialists at identifying implant models on radiographs; real-world application can now be readily realized through training on a broader range of implants and joints, supported by all code and radiographs being made freely available. Supplemental material is available for this article. Keywords: Neural Networks, Skeletal-Appendicular, Knee, Hip, Computer Applications-General (Informatics), Prostheses, Technology Assessment, Observer Performance © RSNA, 2021.
Article
Full-text available
Objectives: This paper reports the development, validation, and public availability of a new neural network-based system which attempts to identify the manufacturer and even the model group of a pacemaker or defibrillator from a chest radiograph. Background: Medical staff often need to determine the model of a pacemaker or defibrillator (cardiac rhythm device) quickly and accurately. Current approaches involve comparing a device's radiographic appearance with a manual flow chart. Methods: In this study, radiographic images of 1,676 devices, comprising 45 models from 5 manufacturers were extracted. A convolutional neural network was developed to classify the images, using a training set of 1,451 images. The testing set contained an additional 225 images consisting of 5 examples of each model. The network's ability to identify the manufacturer of a device was compared with that of cardiologists, using a published flowchart. Results: The neural network was 99.6% (95% confidence interval [CI]: 97.5% to 100.0%) accurate in identifying the manufacturer of a device from a radiograph and 96.4% (95% CI: 93.1% to 98.5%) accurate in identifying the model group. Among 5 cardiologists who used the flowchart, median identification of manufacturer accuracy was 72.0% (range 62.2% to 88.9%), and model group identification was not possible. The network's ability to identify the manufacturer of the devices was significantly superior to that of all the cardiologists (p < 0.0001 compared with the median human identification; p < 0.0001 compared with the best human identification). Conclusions: A neural network can accurately identify the manufacturer and even model group of a cardiac rhythm device from a radiograph and exceeds human performance. This system may speed up the diagnosis and treatment of patients with cardiac rhythm devices, and it is publicly accessible online.
Article
Full-text available
Background: Automated cardiac image interpretation has the potential to transform clinical practice in multiple ways, including enabling serial assessment of cardiac function by nonexperts in primary care and rural settings. We hypothesized that advances in computer vision could enable building a fully automated, scalable analysis pipeline for echocardiogram interpretation, including (1) view identification, (2) image segmentation, (3) quantification of structure and function, and (4) disease detection. Methods: Using 14 035 echocardiograms spanning a 10-year period, we trained and evaluated convolutional neural network models for multiple tasks, including automated identification of 23 viewpoints and segmentation of cardiac chambers across 5 common views. The segmentation output was used to quantify chamber volumes and left ventricular mass, determine ejection fraction, and facilitate automated determination of longitudinal strain through speckle tracking. Results were evaluated through comparison to manual segmentation and measurements from 8666 echocardiograms obtained during the routine clinical workflow. Finally, we developed models to detect 3 diseases: hypertrophic cardiomyopathy, cardiac amyloid, and pulmonary arterial hypertension. Results: Convolutional neural networks accurately identified views (eg, 96% for parasternal long axis), including flagging partially obscured cardiac chambers, and enabled the segmentation of individual cardiac chambers. The resulting cardiac structure measurements agreed with study report values (eg, median absolute deviations of 15% to 17% of observed values for left ventricular mass, left ventricular diastolic volume, and left atrial volume). In terms of function, we computed automated ejection fraction and longitudinal strain measurements (within 2 cohorts), which agreed with commercial software-derived values (for ejection fraction, median absolute deviation=9.7% of observed, N=6407 studies; for strain, median absolute deviation=7.5%, n=419, and 9.0%, n=110) and demonstrated applicability to serial monitoring of patients with breast cancer for trastuzumab cardiotoxicity. Overall, we found automated measurements to be comparable or superior to manual measurements across 11 internal consistency metrics (eg, the correlation of left atrial and ventricular volumes). Finally, we trained convolutional neural networks to detect hypertrophic cardiomyopathy, cardiac amyloidosis, and pulmonary arterial hypertension with C statistics of 0.93, 0.87, and 0.85, respectively. Conclusions: Our pipeline lays the groundwork for using automated interpretation to support serial patient tracking and scalable analysis of millions of echocardiograms archived within healthcare systems.
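For the quantification step described above, once end-diastolic and end-systolic left-ventricular volumes are available from segmentation, ejection fraction follows from its definition; the toy values below are illustrative, and the volume estimation itself (e.g., method of disks) is not shown.

```python
# Ejection fraction from end-diastolic and end-systolic LV volumes.
def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """EF (%) = stroke volume / end-diastolic volume * 100."""
    return (edv_ml - esv_ml) / edv_ml * 100.0

print(f"EF = {ejection_fraction(120.0, 50.0):.1f}%")  # EF = 58.3%
```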
Article
Full-text available
Echocardiography is essential to modern cardiology. However, human interpretation limits high-throughput analysis, preventing echocardiography from reaching its full clinical and research potential for precision medicine. Deep learning is a cutting-edge machine-learning technique that has been useful for analyzing medical images but has not yet been widely applied to echocardiography, partly due to the complexity of echocardiograms' multi-view, multi-modality format. The essential first step toward comprehensive computer-assisted echocardiographic interpretation is determining whether computers can learn to recognize standard views. To this end, we anonymized 834,267 transthoracic echocardiogram (TTE) images from 267 patients (20 to 96 years, 51 percent female, 26 percent obese) seen between 2000 and 2017 and labeled them according to standard views. Images covered a range of real-world clinical variation. We built a multilayer convolutional neural network and used supervised learning to classify 15 standard views simultaneously. Eighty percent of the data was randomly chosen for training and 20 percent reserved for validation and testing on never-seen echocardiograms. Using multiple images from each clip, the model classified among 12 video views with 97.8 percent overall test accuracy without overfitting. Even on single low-resolution images, test accuracy among 15 views was 91.7 percent, versus 70.2 to 83.5 percent for board-certified echocardiographers. Confusion matrices, occlusion experiments, and saliency mapping showed that the model finds recognizable similarities among related views and classifies using clinically relevant image features. In conclusion, deep neural networks can classify essential echocardiographic views simultaneously and with high accuracy. Our results provide a foundation for more complex deep learning-assisted echocardiographic interpretation.
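Occlusion experiments like those mentioned above are model-agnostic: mask one image region at a time and record how much the target class score drops; large drops mark regions the classifier relies on. A generic sketch follows; the patch size, stride, and tiny stand-in model are arbitrary choices for illustration.

```python
# Generic occlusion-sensitivity map for any image classifier.
import torch

def occlusion_map(model, image, target, patch=16, stride=16):
    """image: (1, C, H, W); returns a (H//stride, W//stride) score-drop map."""
    model.eval()
    with torch.no_grad():
        base = model(image)[0, target].item()
        _, _, h, w = image.shape
        heat = torch.zeros(h // stride, w // stride)
        for i in range(0, h - patch + 1, stride):
            for j in range(0, w - patch + 1, stride):
                masked = image.clone()
                masked[:, :, i:i + patch, j:j + patch] = image.mean()  # grey patch
                heat[i // stride, j // stride] = base - model(masked)[0, target].item()
    return heat

# Toy usage with a stand-in linear model over 64x64 images and 15 classes.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 15))
heat = occlusion_map(model, torch.randn(1, 3, 64, 64), target=0)
print(heat.shape)  # torch.Size([4, 4])
```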
Article
Skin cancer, the most common human malignancy, is primarily diagnosed visually, beginning with an initial clinical screening and followed potentially by dermoscopic analysis, a biopsy and histopathological examination. Automated classification of skin lesions using images is a challenging task owing to the fine-grained variability in the appearance of skin lesions. Deep convolutional neural networks (CNNs) show potential for general and highly variable tasks across many fine-grained object categories. Here we demonstrate classification of skin lesions using a single CNN, trained end-to-end from images directly, using only pixels and disease labels as inputs. We train a CNN using a dataset of 129,450 clinical images-two orders of magnitude larger than previous datasets-consisting of 2,032 different diseases. We test its performance against 21 board-certified dermatologists on biopsy-proven clinical images with two critical binary classification use cases: keratinocyte carcinomas versus benign seborrheic keratoses; and malignant melanomas versus benign nevi. The first case represents the identification of the most common cancers, the second represents the identification of the deadliest skin cancer. The CNN achieves performance on par with all tested experts across both tasks, demonstrating an artificial intelligence capable of classifying skin cancer with a level of competence comparable to dermatologists. Outfitted with deep neural networks, mobile devices can potentially extend the reach of dermatologists outside of the clinic. It is projected that 6.3 billion smartphone subscriptions will exist by the year 2021 (ref. 13) and can therefore potentially provide low-cost universal access to vital diagnostic care.
Rajpurkar P, Irvin J, Zhu K, et al. CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning. arXiv:1711.05225v3 [cs.CV]. 2017 Dec 25.