Luke Oakden-Rayner’s research while affiliated with the University of Adelaide and other places


Publications (41)


Prospective and external validation of stroke discharge planning machine learning models
  • Article

February 2022 · 98 Reads · 10 Citations · Journal of Clinical Neuroscience

Luke Oakden-Rayner · David K Menon · [...] · Simon Koblar

Machine learning may be able to help predict factors that aid in discharge planning for stroke patients. This study aims to validate previously derived models, on external and prospective datasets, for the prediction of discharge modified Rankin scale (mRS), discharge destination, survival to discharge and length of stay. Data were collected from consecutive patients admitted with ischaemic or haemorrhagic stroke at the Royal Adelaide Hospital from September 2019 to January 2020, and at the Lyell McEwin Hospital from January 2017 to January 2020. The previously derived models were then applied to these datasets with three pre-defined cut-off scores (high-sensitivity, Youden’s index, and high-specificity) to return indicators of performance including area under the receiver operating characteristic curve (AUC), sensitivity and specificity. The prospective and external datasets included 334 and 824 individuals, respectively. The models performed well on both the prospective and external datasets in the prediction of discharge mRS ≤ 2 (AUC 0.85 and 0.87), discharge destination to home (AUC 0.76 and 0.78) and survival to discharge (AUC 0.91 and 0.92). Accurate prediction of length of stay with only admission data remains difficult (AUC 0.62 and 0.66). This study demonstrates successful prospective and external validation of machine learning models using six variables to predict information relevant to discharge planning for stroke patients. Further research is required to demonstrate patient or system benefits following implementation of these models.
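For readers who want to see the mechanics, the sketch below shows how threshold-based validation metrics of this kind can be computed; the label and probability arrays are placeholders, and the study's actual models are not reproduced here. Youden's index simply picks the cut-off maximising sensitivity + specificity - 1.

```python
# A minimal sketch, not the study's actual code: given outcome labels and
# model probabilities (placeholder arrays), compute AUC plus sensitivity and
# specificity at a fixed cut-off, and derive a Youden's-index cut-off.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def validate_at_cutoff(y_true, y_score, cutoff):
    """AUC, sensitivity and specificity at a pre-defined probability cut-off."""
    auc = roc_auc_score(y_true, y_score)
    y_pred = (y_score >= cutoff).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    return auc, tp / (tp + fn), tn / (tn + fp)

def youden_cutoff(y_true, y_score):
    """Cut-off maximising Youden's J = sensitivity + specificity - 1."""
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    return thresholds[np.argmax(tpr - fpr)]
```

In the study's design, the high-sensitivity and high-specificity cut-offs were likewise pre-defined on the derivation data and then applied unchanged to the prospective and external sets.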


Figure 4: Diverging stacked bar chart depicting the first set of radiologist survey responses. CXR, chest X-ray.
Figure 5: Diverging stacked bar chart visualising the second set of radiologist survey responses. AI, artificial intelligence; CXR, chest X-ray.
Table: Demographics and results for the eleven radiologists involved in this study.
Assessment of the effect of a comprehensive chest radiograph deep learning model on radiologist reports and patient outcomes: a real-world observational study
  • Article
  • Full-text available

December 2021 · 114 Reads · 35 Citations · BMJ Open

Objectives Artificial intelligence (AI) algorithms have been developed to detect imaging features on chest X-ray (CXR) with a comprehensive AI model capable of detecting 124 CXR findings being recently developed. The aim of this study was to evaluate the real-world usefulness of the model as a diagnostic assistance device for radiologists. Design This prospective real-world multicentre study involved a group of radiologists using the model in their daily reporting workflow to report consecutive CXRs and recording their feedback on level of agreement with the model findings and whether this significantly affected their reporting. Setting The study took place at radiology clinics and hospitals within a large radiology network in Australia between November and December 2020. Participants Eleven consultant diagnostic radiologists of varying levels of experience participated in this study. Primary and secondary outcome measures Proportion of CXR cases where use of the AI model led to significant material changes to the radiologist report, to patient management, or to imaging recommendations. Additionally, level of agreement between radiologists and the model findings, and radiologist attitudes towards the model were assessed. Results Of 2972 cases reviewed with the model, 92 cases (3.1%) had significant report changes, 43 cases (1.4%) had changed patient management and 29 cases (1.0%) had further imaging recommendations. In terms of agreement with the model, 2569 cases showed complete agreement (86.5%). 390 (13%) cases had one or more findings rejected by the radiologist. There were 16 findings across 13 cases (0.5%) deemed to be missed by the model. Nine out of 10 radiologists felt their accuracy was improved with the model and were more positive towards AI poststudy. Conclusions Use of an AI model in a real-world reporting environment significantly improved radiologist reporting and showed good agreement with radiologists, highlighting the potential for AI diagnostic support to improve clinical practice.


Figure 1: Difference in AUC for detecting simple pneumothorax in the test dataset versus each specific subgroup with adjusted 95% CI. AUC, area under the receiver operating characteristic curve.
Figure 2: Difference in AUC for detecting tension pneumothorax in the test dataset versus each specific subgroup with adjusted 95% CI.
Table: Demographics of the test dataset.
Table: Number of studies with simple or tension pneumothorax within the testing set and within each subgroup.
Table: Difference in AUC values between each specific subgroup and the overall test dataset with adjusted 95% CI.
Do comprehensive deep learning algorithms suffer from hidden stratification? A retrospective study on pneumothorax detection in chest radiography

December 2021 · 139 Reads · 8 Citations · BMJ Open

Objectives To evaluate the ability of a commercially available comprehensive chest radiography deep convolutional neural network (DCNN) to detect simple and tension pneumothorax, as stratified by the following subgroups: the presence of an intercostal drain; rib, clavicular, scapular or humeral fractures or rib resections; subcutaneous emphysema and erect versus non-erect positioning. The hypothesis was that performance would not differ significantly in each of these subgroups when compared with the overall test dataset. Design A retrospective case–control study was undertaken. Setting Community radiology clinics and hospitals in Australia and the USA. Participants A test dataset of 2557 chest radiography studies was ground-truthed by three subspecialty thoracic radiologists for the presence of simple or tension pneumothorax as well as each subgroup other than positioning. Radiograph positioning was derived from radiographer annotations on the images. Outcome measures DCNN performance for detecting simple and tension pneumothorax was evaluated over the entire test set, as well as within each subgroup, using the area under the receiver operating characteristic curve (AUC). A difference in AUC of more than 0.05 was considered clinically significant. Results When compared with the overall test set, performance of the DCNN for detecting simple and tension pneumothorax was statistically non-inferior in all subgroups. The DCNN had an AUC of 0.981 (0.976–0.986) for detecting simple pneumothorax and 0.997 (0.995–0.999) for detecting tension pneumothorax. Conclusions Hidden stratification has significant implications for potential failures of deep learning when applied in clinical practice. This study demonstrated that a comprehensively trained DCNN can be resilient to hidden stratification in several clinically meaningful subgroups in detecting pneumothorax.
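As an illustration of the subgroup comparison this study describes, here is a hedged sketch that bootstraps a confidence interval for the difference between a subgroup AUC and the overall test-set AUC; the arrays and the resampling scheme are assumptions, not the study's actual adjusted CI method.

```python
# Hedged sketch of the subgroup analysis: bootstrap a CI for the difference
# between a subgroup AUC and the overall test-set AUC. `y_true`, `y_score`
# and the boolean mask `in_subgroup` are placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def auc_difference_ci(y_true, y_score, in_subgroup, n_boot=2000, alpha=0.05):
    """Bootstrap CI for AUC(subgroup) - AUC(overall)."""
    diffs = []
    idx = np.arange(len(y_true))
    for _ in range(n_boot):
        b = rng.choice(idx, size=len(idx), replace=True)
        yb, sb, gb = y_true[b], y_score[b], in_subgroup[b]
        if len(np.unique(yb)) < 2 or len(np.unique(yb[gb])) < 2:
            continue  # a resample must contain both classes to compute AUC
        diffs.append(roc_auc_score(yb[gb], sb[gb]) - roc_auc_score(yb, sb))
    return np.quantile(diffs, [alpha / 2, 1 - alpha / 2])

# Under the study's pre-specified margin, a difference larger than 0.05 in
# absolute value would be considered clinically significant.
```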


Figure 1: Heat map produced by a post-hoc explanation method for a deep learning model designed to detect pneumonia in chest x-rays. Brighter colours (red) indicate regions with higher levels of importance according to the deep neural network, and darker colours (blue) indicate regions with lower levels of importance. Reproduced with permission from Rajpurkar et al [21]. CNN=convolutional neural network.
The false hope of current approaches to explainable artificial intelligence in health care

November 2021 · 1,445 Reads · 801 Citations · The Lancet Digital Health

The black-box nature of current artificial intelligence (AI) has caused some to question whether AI must be explainable to be used in high-stakes scenarios such as medicine. It has been argued that explainable AI will engender trust with the health-care workforce, provide transparency into the AI decision making process, and potentially mitigate various kinds of bias. In this Viewpoint, we argue that this argument represents a false hope for explainable AI and that current explainability methods are unlikely to achieve these goals for patient-level decision support. We provide an overview of current explainability techniques and highlight how various failure cases can cause problems for decision making for individual patients. In the absence of suitable explainability methods, we advocate for rigorous internal and external validation of AI models as a more direct means of achieving the goals often associated with explainability, and we caution against having explainability be a requirement for clinically deployed models.
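To make the critique concrete, the sketch below implements one of the simplest post-hoc explanation methods discussed in this literature, an input-gradient saliency map; the classifier and input are placeholders, and heat maps such as the one in Figure 1 come from related gradient-based methods.

```python
# A minimal sketch of one post-hoc explanation method: an input-gradient
# saliency map. The classifier and input are placeholders (untrained model,
# random tensor standing in for a chest x-ray).
import torch
import torchvision.models as models

model = models.resnet18(weights=None)  # stand-in classifier, untrained
model.eval()

x = torch.randn(1, 3, 224, 224, requires_grad=True)  # stand-in chest x-ray
score = model(x)[0].max()  # logit of the top predicted class
score.backward()           # gradient of that score w.r.t. every input pixel

saliency = x.grad.abs().max(dim=1).values  # per-pixel importance map
# Large gradients mark pixels the prediction is locally sensitive to; as the
# Viewpoint argues, this localises a decision without explaining it.
```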


Reading Race: AI Recognises Patient's Racial Identity In Medical Images

July 2021 · 858 Reads · 4 Citations

Background: In medical imaging, prior studies have demonstrated disparate AI performance by race, yet there is no known correlate for race on medical imaging that would be obvious to the human expert interpreting the images. Methods: Using private and public datasets, we evaluate: A) performance quantification of deep learning models to detect race from medical images, including the ability of these models to generalize to external environments and across multiple imaging modalities; B) assessment of possible confounding anatomic and phenotypic population features, such as disease distribution and body habitus, as predictors of race; and C) investigation into the underlying mechanism by which AI models can recognize race. Findings: Standard deep learning models can be trained to predict race from medical images with high performance across multiple imaging modalities. Our findings hold under external validation conditions, as well as when models are optimized to perform clinically motivated tasks. We demonstrate that this detection is not due to trivial proxies or imaging-related surrogate covariates for race, such as underlying disease distribution. Finally, we show that performance persists across all anatomical regions and across the frequency spectrum of the images, suggesting that mitigation efforts will be challenging and demand further study. Interpretation: We emphasize that the model's ability to predict self-reported race is not itself the issue of importance. However, our finding that AI can trivially predict self-reported race, even from corrupted, cropped, and noised medical images, in a setting where clinical experts cannot, creates an enormous risk for all model deployments in medical imaging: if an AI model secretly used its knowledge of self-reported race to misclassify all Black patients, radiologists would not be able to tell using the same data the model has access to.
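The frequency-spectrum experiment mentioned in the findings can be sketched as follows: filter an image in Fourier space, then re-score the filtered copies with the trained model. The FFT filtering below is standard numpy; the model, images and cutoff value are placeholders.

```python
# Hedged sketch of a frequency-domain degradation test: keep only low (or
# high) spatial frequencies, then re-evaluate a trained model on the result.
import numpy as np

def frequency_filter(img, cutoff, mode="low"):
    """Keep only spatial frequencies below (low) or above (high) a radius."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot(yy - h / 2, xx - w / 2)
    mask = radius <= cutoff if mode == "low" else radius > cutoff
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))

# The reported finding: race-prediction performance stays high even on
# filtered copies that are unreadable to radiologists.
```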


Assessing the accuracy of 68Ga‐PSMA PET/CT compared with MRI in the initial diagnosis of prostate malignancy: A cohort analysis of 114 consecutive patients

July 2021 · 14 Reads · 2 Citations · Journal of Medical Imaging and Radiation Oncology

Introduction Prostate cancer diagnosis is shifting towards a minimally invasive approach, maintaining accuracy and efficacy while reducing morbidity. We aimed to assess whether 68Ga-PSMA PET/CT can accurately grade and localise prostatic malignancy using objective methods, compared with pathology and MRI. Methods A retrospective analysis of 114 consecutive patients undergoing staging PSMA PET/CT scans over 12 months was carried out. The SUVmax and the site of highest PSMA activity within the prostate gland were recorded. Pathology/biopsy review assessed the maximum Gleason score (and its location). MRI analysis assessed the highest PIRADS score and its location. The grade, location and size of malignant tissue on biopsy, and PSA, were correlated with the SUVmax and the PIRADS score. Results SUVmax was significantly elevated in cases with PSA ≥10 (P = 0.003) and Gleason score ≥8 (P = 0.0002). SUVmax demonstrated equivalent sensitivity to MRI-PIRADS in predicting Gleason ≥8 disease, with higher specificity when tested under a high-specificity regime (SUVmax ≥10, PIRADS = 5, P = 0.002). Furthermore, the region of highest SUVmax was superior to MRI-PIRADS for localising the highest grade tumour region, correctly identifying 71% of highest grade regions compared to 54% with MRI (P = 0.015). Conclusion PSMA PET/CT is as effective as MRI in identifying high-grade prostate malignancy. Our findings also support previous studies in showing a significant relationship between SUVmax and Gleason grade. These benefits, along with the known advantage in identifying distant metastases and the reduced cost, further support the argument that PSMA PET/CT should be offered as an initial investigation in the workup of prostate cancer.
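The abstract does not name its statistical tests, so as a hedged illustration the sketch below uses a rank-based comparison of SUVmax between Gleason groups and evaluates the pre-defined high-specificity rule (SUVmax ≥ 10) on synthetic placeholder values.

```python
# Illustrative only: a Mann-Whitney U test is an assumption (the abstract
# does not state which test was used), and the SUVmax values below are
# synthetic stand-ins, not study data.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
suv_high = rng.lognormal(mean=2.5, sigma=0.5, size=40)  # Gleason >= 8 (synthetic)
suv_low = rng.lognormal(mean=1.8, sigma=0.5, size=74)   # Gleason <= 7 (synthetic)

stat, p = mannwhitneyu(suv_high, suv_low, alternative="greater")

# Pre-defined high-specificity rule from the abstract: SUVmax >= 10 flags
# Gleason >= 8 disease.
y_true = np.r_[np.ones(len(suv_high)), np.zeros(len(suv_low))].astype(bool)
y_pred = np.r_[suv_high, suv_low] >= 10
sensitivity = y_pred[y_true].mean()
specificity = (~y_pred[~y_true]).mean()
```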


Figure 2: Deep-learning tool interface. The clinical findings detected by the deep-learning model are listed on the interface and an image segmentation overlay is presented. The finding likelihood score and CI are displayed as a bar graph under the x-ray. Patient details have been replaced with dummy data.
Effect of a comprehensive deep-learning model on the accuracy of chest x-ray interpretation by radiologists: a retrospective, multireader multicase study

July 2021 · 444 Reads · 151 Citations · The Lancet Digital Health

Background Chest x-rays are widely used in clinical practice; however, interpretation can be hindered by human error and a lack of experienced thoracic radiologists. Deep learning has the potential to improve the accuracy of chest x-ray interpretation. We therefore aimed to assess the accuracy of radiologists with and without the assistance of a deep-learning model. Methods In this retrospective study, a deep-learning model was trained on 821 681 images (284 649 patients) from five data sets from Australia, Europe, and the USA. 2568 enriched chest x-ray cases from adult patients (≥16 years) who had at least one frontal chest x-ray were included in the test dataset; cases were representative of inpatient, outpatient, and emergency settings. 20 radiologists reviewed cases with and without the assistance of the deep-learning model with a 3-month washout period. We assessed the change in accuracy of chest x-ray interpretation across 127 clinical findings when the deep-learning model was used as a decision support by calculating area under the receiver operating characteristic curve (AUC) for each radiologist with and without the deep-learning model. We also compared AUCs for the model alone with those of unassisted radiologists. If the lower bound of the adjusted 95% CI of the difference in AUC between the model and the unassisted radiologists was more than −0·05, the model was considered to be non-inferior for that finding. If the lower bound exceeded 0, the model was considered to be superior. Findings Unassisted radiologists had a macroaveraged AUC of 0·713 (95% CI 0·645–0·785) across the 127 clinical findings, compared with 0·808 (0·763–0·839) when assisted by the model. The deep-learning model statistically significantly improved the classification accuracy of radiologists for 102 (80%) of 127 clinical findings, was statistically non-inferior for 19 (15%) findings, and no findings showed a decrease in accuracy when radiologists used the deep-learning model. Unassisted radiologists had a macroaveraged mean AUC of 0·713 (0·645–0·785) across all findings, compared with 0·957 (0·954–0·959) for the model alone. Model classification alone was significantly more accurate than unassisted radiologists for 117 (94%) of 124 clinical findings predicted by the model and was non-inferior to unassisted radiologists for all other clinical findings. Interpretation This study shows the potential of a comprehensive deep-learning model to improve chest x-ray interpretation across a large breadth of clinical practice. Funding Annalise.ai.
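The non-inferiority logic in this abstract is simple enough to state as code. The sketch below applies the stated decision rule for one finding, given the lower bound of the adjusted 95% CI; how that CI is estimated (bootstrap, analytic, multiplicity correction) is outside this sketch.

```python
# The stated decision rule, as code. `ci_lower` is the lower bound of the
# adjusted 95% CI for (model AUC - unassisted radiologist AUC) on one finding.
def classify_finding(ci_lower, margin=-0.05):
    if ci_lower > 0:
        return "superior"
    if ci_lower > margin:
        return "non-inferior"
    return "not non-inferior"

assert classify_finding(0.02) == "superior"
assert classify_finding(-0.03) == "non-inferior"
assert classify_finding(-0.08) == "not non-inferior"
```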


Chest radiographs and machine learning – Past, present and future

June 2021 · 68 Reads · 20 Citations · Journal of Medical Imaging and Radiation Oncology

Despite its simple acquisition technique, the chest X‐ray remains the most common first‐line imaging tool for chest assessment globally. Recent evidence for image analysis using modern machine learning points to possible improvements in both the efficiency and the accuracy of chest X‐ray interpretation. While promising, these machine learning algorithms have not provided comprehensive assessment of findings in an image and do not account for clinical history or other relevant clinical information. However, the rapid evolution in technology and evidence base for its use suggests that the next generation of comprehensive, well‐tested machine learning algorithms will be a revolution akin to early advances in X‐ray technology. Current use cases, strengths, limitations and applications of chest X‐ray machine learning systems are discussed.


Fig. 1: Dataset flow diagram. (1) Excluding assessment mammograms. (2) 'Round' refers to one episode of screening, consisting of at least four standard views ('CC' and 'MLO' for each breast). (3) Stratified client-wise by dominant finding.
Fig. 4: Image differences: BSSA images (left) and NYU images (right), not to scale. Two pairs of MLO views showing subtle differences in image contrast and opacity which may affect model generalisability.
Fig. S5: Different image sizes, to scale. Left: NYU image, 4096 × 3328 pixels. Right: BSSA image, 5355 × 4915 pixels.
Table: AUROC for malignancy differentiation. NYU1: image-only models. NYU2: images plus benign and malignant heatmaps as model input, pictured in Figure 2 and described by Wu et al. (2).
Replication of an open-access deep learning system for screening mammography: Reduced performance mitigated by retraining on local data

June 2021 · 130 Reads · 5 Citations

Aim To assess the generalisability of a deep learning (DL) system for screening mammography developed at New York University (NYU), USA (1,2) in a South Australian (SA) dataset. Methods and Materials Clients with pathology-proven lesions (n=3,160) and age-matched controls (n=3,240) were selected from women screened at BreastScreen SA from January 2010 to December 2016 (207,691 clients) and split into training, validation and test subsets (70%, 15% and 15%, respectively). The primary outcome was area under the curve (AUC), in the SA Test Set 1 (SATS1), differentiating invasive breast cancer or ductal carcinoma in situ (n=469) from age-matched controls (n=490) and benign lesions (n=44). The NYU system was tested statically, after training without transfer learning (TL), and after retraining with TL, in versions without (NYU1) and with (NYU2) heatmaps. Results The static NYU1 model AUCs in the NYU test set (NYTS) and SATS1 were 83.0% (95% CI 82.4-83.6%) (2) and 75.8% (95% CI 72.6-78.8%), respectively. Static NYU2 AUCs in the NYTS and SATS1 were 88.6% (95% CI 88.3-88.9%) (2) and 84.5% (95% CI 81.9-86.8%), respectively. Training of NYU1 and NYU2 without TL achieved AUCs in the SATS1 of 65.8% (95% CI 62.2-69.1%) and 85.9% (95% CI 83.5-88.2%), respectively. Retraining of NYU1 and NYU2 with TL resulted in AUCs of 82.4% (95% CI 79.7-84.9%) and 86.3% (95% CI 84.0-88.5%), respectively. Conclusion We did not fully reproduce the reported performance of the NYU system on a local dataset; local retraining with TL approximated this level of performance. Optimising models for local clinical environments may improve performance. The generalisation of DL systems to new environments may be challenging.
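A minimal sketch of the retraining-with-transfer-learning step is shown below, assuming a stand-in architecture, a hypothetical weights file, and synthetic local data; the actual NYU system (Wu et al.) has its own published code, which this does not reproduce.

```python
# Hedged sketch of retraining with transfer learning (TL): start from
# published weights and fine-tune all layers on local data at a low
# learning rate. Everything named here is a placeholder.
import torch
import torchvision.models as models
from torch.utils.data import DataLoader, TensorDataset

model = models.resnet34(weights=None)                 # stand-in backbone
model.fc = torch.nn.Linear(model.fc.in_features, 1)   # binary malignancy head
# model.load_state_dict(torch.load("nyu_pretrained.pt"))  # hypothetical file

images = torch.randn(8, 3, 224, 224)                  # synthetic mammograms
labels = torch.randint(0, 2, (8, 1)).float()
loader = DataLoader(TensorDataset(images, labels), batch_size=4)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # low LR: fine-tune
loss_fn = torch.nn.BCEWithLogitsLoss()

for x, y in loader:
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```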


A survey of clinicians on the use of artificial intelligence in ophthalmology, dermatology, radiology and radiation oncology

March 2021 · 526 Reads · 167 Citations

Artificial intelligence technology has advanced rapidly in recent years and has the potential to improve healthcare outcomes. However, technology uptake will be largely driven by clinicians, and there is a paucity of data regarding the attitudes that clinicians have towards this new technology. In June-August 2019 we conducted an online survey on artificial intelligence of fellows and trainees of three specialty colleges (ophthalmology, radiology/radiation oncology, dermatology) in Australia and New Zealand. There were 632 complete responses (n = 305, 230, and 97, respectively), equating to response rates of 20.4%, 5.1%, and 13.2% for the above colleges, respectively. The majority (n = 449, 71.0%) believed artificial intelligence would improve their field of medicine, and that medical workforce needs would be impacted by the technology within the next decade (n = 542, 85.8%). Improved disease screening and the streamlining of monotonous tasks were identified as key benefits of artificial intelligence. The divestment of healthcare to technology companies and medical liability implications were the greatest concerns. Education was identified as a priority to prepare clinicians for the implementation of artificial intelligence in healthcare. This survey highlights parallels between the perceptions of different clinician groups in Australia and New Zealand about artificial intelligence in medicine. Artificial intelligence was recognized as a valuable technology that will have wide-ranging impacts on healthcare.


Citations (36)


... Research has used historical data and ML to predict discharge destinations [31][32][33][34], discharge timing [35,36], and LOS [37,38]. For instance, Elbattah et al. [39] used ML models to predict LOS and discharge destinations for patients with hip fractures. ...

Reference:

Developing a decision support tool to predict delayed discharge from hospitals using machine learning
Prospective and external validation of stroke discharge planning machine learning models
  • Citing Article
  • February 2022

Journal of Clinical Neuroscience

... AI systems, utilizing machine learning algorithms, are increasingly being deployed to analyze patient data for more accurate pain assessment and personalized pain management strategies. These AI-powered tools can help predict a patient's pain tolerance levels and susceptibility to drug abuse, enabling clinicians to tailor pain management protocols more effectively [55]. This is crucial for high-complexity cardiac pain symptoms in the ICU. ...

Assessment of the effect of a comprehensive chest radiograph deep learning model on radiologist reports and patient outcomes: a real-world observational study

BMJ Open

... The metallic density of ECG leads could be hindering ETT and CVC observation, so we recommend their removal before imaging to potentially enhance AI performance. Training models to accommodate specific hidden stratifications, as demonstrated by a previous study, could also be of help [29]. ...

Do comprehensive deep learning algorithms suffer from hidden stratification? A retrospective study on pneumothorax detection in chest radiography

BMJ Open

... The use of AI in education involves extensive collection of students' personal, behavioral, and performance data. Generative AI models often require vast datasets for effective training, which can lead to privacy breaches if the data are mismanaged or if their sources lack transparency (Ghassemi et al., 2021). ...

The false hope of current approaches to explainable artificial intelligence in health care

The Lancet Digital Health

... This learning mechanism means that CNNs, and thus clinical oncology AI tools that rely on CNNs, are susceptible to bias based on the data they are trained on and how they are implemented. It has been shown that DL models perform very well at race identification on heavily corrupted and noise-injected X-ray and computed tomography (CT) scans, even when not trained for such a task [19]. The same task is considered impossible for human experts. ...

Reading Race: AI Recognises Patient's Racial Identity In Medical Images

... A recent study by Rowe et al. found a moderate correlation between SUV and tumor Gleason grade [34]. Paterson et al. showed a significant relationship between SUVmax, PSA levels, and GS grades [35]. An Italian team evaluated 45 patients who underwent a 68 Ga-PSMA-11 PET/ CT-guided biopsy and found that lesions with a GS score of ≥ 7 had a higher SUVmax than lesions with a GS score of ≤ 6 [36]. ...

Assessing the accuracy of 68Ga‐PSMA PET/CT compared with MRI in the initial diagnosis of prostate malignancy: A cohort analysis of 114 consecutive patients
  • Citing Article
  • July 2021

Journal of Medical Imaging and Radiation Oncology

... Since its publication [28], the dataset has been downloaded 3504 times from BIMCV, from 50 different countries. Its users include private and public health organizations, and academic publications have made use of it [29,30]. ...

Effect of a comprehensive deep-learning model on the accuracy of chest x-ray interpretation by radiologists: a retrospective, multireader multicase study

The Lancet Digital Health

... CXR is an imaging modality in which X-rays are differentially absorbed by the structures of interest; the resulting attenuation of each ray exposes the detector to produce different pixel values (see Figures 1, 2). This results in a two-dimensional representation of a three-dimensional anatomical structure (7). Due to its simplicity and inexpensive nature, it is the most common initial modality used globally (7). ...

Chest radiographs and machine learning – Past, present and future

Journal of Medical Imaging and Radiation Oncology

... Datasets containing a sufficiently large number of validated images manually labeled by radiologists as ground truth have become a precious asset for developers to train and test the performance of their models [50]. However, the usage of large public datasets poses challenges due to the heterogeneity of demographic characteristics and scanning devices, possibly hampering their performance in local facilities [51]. Conversely, a model trained to perform effectively on images acquired in a single institution for a specific patient demographic might not fare well when tested on a large public dataset or on other external datasets [52]. ...

Replication of an open-access deep learning system for screening mammography: Reduced performance mitigated by retraining on local data

... To address reported knowledge deficits and the resultant distrust toward AI, our findings indicate that quality education and training are imperative for developing a skilled workforce to enable AI implementation in AH practices, in line with other research [57][58][59]. Efforts should target increasing general AI knowledge and improving the technical explainability of each AI tool [6,44]. An earlier phase of our study found that AHPs had little to no knowledge, training, or firsthand experience with AI and that efforts to improve this should be tailored to individual AH professional groups. ...

A survey of clinicians on the use of artificial intelligence in ophthalmology, dermatology, radiology and radiation oncology