Article

Recommended Requirements and Essential Elements for Proper Reporting of the Use of Artificial Intelligence Machine Learning Tools in Biomedical Research and Scientific Publications

... The use of ML algorithms, which can analyze large datasets, has garnered significant attention in online mental healthcare, with several models now proposed in this field, including probabilistic latent variable models, linear ML models, random forest models, Latent Dirichlet Allocation (LDA) topic models, elastic net models, inductive logic programming, decision tree models, support vector machines, deep learning (DL), and artificial neural networks (ANN) (27)(28)(29)(30)(31)(32)(33). However, several concerns have been raised about the validity, generalizability, and reliability of results obtained using ML, which can be undermined by insufficient or unrepresentative training datasets, improper model fitting or hyperparameter fine-tuning, improper handling of training datasets resulting in data leakage, and a lack of validation and reproducibility assessments, among others (34)(35)(36)(37). Therefore, to support the robustness of studies using ML algorithms, recent guidelines have proposed six essential elements: justification of the need to use ML, adequacy of the data, description of the algorithm used, results including model accuracy and calibration, availability of the programming code, and discussion of the model's internal and external validation (34)(35)(36)(37). ...
... We focused on the extraction of these outcome measures to avoid underpowered meta-analytic comparisons. For studies reporting on the results of large dataset analysis using ML algorithms, data extraction followed the recent research guidelines and standards for ML studies, which recommend that these studies report on the adequacy of the data for the intended outcomes, model training and fine-tuning, features analyzed, validation, interpretability, and code and data availability (34)(35)(36)(37). Additionally, we extracted the clinical and practical insights and implications that the studies obtained and discussed regarding the implementation of ML. ...
Article
Full-text available
Introduction Online mental healthcare has gained significant attention due to its effectiveness, accessibility, and scalability in the management of mental health symptoms. Despite these advantages over traditional in-person formats, including higher availability and accessibility, issues with low treatment adherence and high dropout rates persist. Artificial intelligence (AI) technologies could help address these issues through powerful predictive models, language analysis, and intelligent dialogue with users; however, the study of these applications remains underexplored. The following mixed-methods review aimed to address this gap by synthesizing the available evidence on the applications of AI in online mental healthcare. Method We searched the following databases: MEDLINE, CINAHL, PsycINFO, EMBASE, and Cochrane. This review included peer-reviewed randomized controlled trials, observational studies, non-randomized experimental studies, and case studies that were selected using the PRISMA guidelines. Data regarding pre- and post-intervention outcomes and AI applications were extracted and analyzed. A mixed-methods approach encompassing meta-analysis and network meta-analysis was used to analyze pre- and post-intervention outcomes, including main effects, depression, anxiety, and study dropouts. We applied the Cochrane risk of bias tool and the Grading of Recommendations Assessment, Development and Evaluation (GRADE) to assess the quality of the evidence. Results Twenty-nine studies were included, revealing a variety of AI applications including triage, psychotherapy delivery, treatment monitoring, therapy engagement support, identification of effective therapy features, and prediction of treatment response, dropout, and adherence. AI-delivered self-guided interventions demonstrated medium to large effects on managing mental health symptoms, with dropout rates comparable to non-AI interventions. The quality of the data was low to very low.
Discussion The review supported the use of AI in enhancing treatment response, adherence, and improvements in online mental healthcare. Nevertheless, given the low quality of the available evidence, this study highlighted the need for additional robust and high-powered studies in this emerging field. Systematic review registration https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=443575, identifier CRD42023443575.
Article
Full-text available
Purpose: The purpose of this study was to use unsupervised machine learning clustering to define the "optimal observed outcome" after surgery for anterior shoulder instability (ASI) and to identify predictors for achieving it. Methods: Medical records, images, and operative reports were reviewed for patients <40 years old undergoing surgery for ASI. Four unsupervised machine learning clustering algorithms partitioned subjects into "optimal observed outcome" or "suboptimal outcome" based on combinations of actually observed outcomes. Demographic, clinical, and treatment variables were compared between groups using descriptive statistics and Kaplan-Meier survival curves. Variables were assessed for prognostic value through multivariate stepwise logistic regression. Results: Two hundred patients with a mean follow-up of 11 years were included. Of these, 146 (64%) obtained the "optimal observed outcome," characterized by decreased postoperative pain (23% vs 52%; P < 0.001), recurrent instability (12% vs 41%; P < 0.001), revision surgery (10% vs 24%; P = 0.015), osteoarthritis (OA) (5% vs 19%; P = 0.005), and restricted motion (161° vs 168°; P = 0.001). Forty-one percent of patients had a "perfect outcome," defined as ideal performance across all outcomes. Time from initial instability to presentation (odds ratio [OR] = 0.96; 95% confidence interval [CI], 0.92-0.98; P = 0.006) and habitual/voluntary instability (OR = 0.17; 95% CI, 0.04-0.77; P = 0.020) were negative predictors of achieving the "optimal observed outcome." A predilection toward subluxations rather than dislocations before surgery (OR = 1.30; 95% CI, 1.02-1.65; P = 0.030) was a positive predictor. Type of surgery performed was not a significant predictor.
Conclusion: After surgery for ASI, 64% of patients achieved the "optimal observed outcome" defined as minimal postoperative pain, no recurrent instability or OA, low revision surgery rates, and increased range of motion, of whom only 41% achieved a "perfect outcome." Positive predictors were shorter time to presentation and predilection toward preoperative subluxations over dislocations. Level of evidence: Retrospective cohort, level IV.
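As an illustration of the unsupervised partitioning this abstract describes, the sketch below runs a plain k-means clustering (one of several algorithms the study could have used; the specific algorithms, features, and data here are invented for illustration, not the study's own).

```python
# Hypothetical sketch: partitioning patients into two outcome groups with
# k-means. Feature names and all patient rows below are invented.
import random

def kmeans(points, k=2, iters=50, seed=0):
    """Plain k-means: returns a cluster label for each point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [min(range(k),
                      key=lambda c: sum((p - q) ** 2
                                        for p, q in zip(pt, centers[c])))
                  for pt in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(vals) / len(members)
                                   for vals in zip(*members))
    return labels

# Each row: (pain score, recurrent instability 0/1, revision surgery 0/1).
patients = [(1, 0, 0), (2, 0, 0), (3, 0, 0),   # plausibly "optimal" profiles
            (8, 1, 1), (9, 1, 0), (7, 1, 1)]   # plausibly "suboptimal"
labels = kmeans(patients, k=2)
print(labels)
```

The point of the approach is that the two group labels emerge from the observed outcomes alone, without a predefined notion of "optimal"; the groups are characterized only afterwards, as in the abstract above.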
Article
Full-text available
The use of artificial intelligence (AI) is rapidly growing across many domains, of which the medical field is no exception. AI is an umbrella term defining the practical application of algorithms to generate useful output, without the need of human cognition. Owing to the expanding volume of patient information collected, known as ‘big data’, AI is showing promise as a useful tool in healthcare research and across all aspects of patient care pathways. Practical applications in orthopaedic surgery include: diagnostics, such as fracture recognition and tumour detection; predictive models of clinical and patient-reported outcome measures, such as calculating mortality rates and length of hospital stay; and real-time rehabilitation monitoring and surgical training. However, clinicians should remain cognizant of AI’s limitations, as the development of robust reporting and validation frameworks is of paramount importance to prevent avoidable errors and biases. The aim of this review article is to provide a comprehensive understanding of AI and its subfields, as well as to delineate its existing clinical applications in trauma and orthopaedic surgery. Furthermore, this narrative review expands upon the limitations of AI and future direction. Cite this article: Bone Joint Res 2023;12(7):447–454.
Article
Full-text available
Purpose: This systematic review aimed to (1) determine the model performance of artificial intelligence (AI) in detecting rotator cuff pathology using different imaging modalities and (2) to compare capability with physicians in clinical scenarios. Methods: The review followed the PRISMA guidelines and was registered on PROSPERO. The criteria were as follows: (1) studies on the application of AI in detecting rotator cuff pathology using medical images, and (2) studies on smart devices for assisting in diagnosis were excluded. The following data were extracted and recorded: statistical characteristics, input features, AI algorithms used, sample sizes of training and testing sets, and model performance. The data extracted from the included studies were narratively reviewed. Results: A total of 14 articles, comprising 23,119 patients, met the inclusion and exclusion criteria. The pooled mean age of the patients was 56.7 years, and the female rate was 56.1%. The area under the curve (AUC) of the algorithmic model to detect rotator cuff pathology from ultrasound images, MRI images, and radiographic series ranged from 0.789 to 0.950, 0.844 to 0.943, and 0.820 to 0.830, respectively. Notably, 1 of the studies reported that AI models based on ultrasound images demonstrated a diagnostic performance similar to that of radiologists. Another comparative study demonstrated that AI models utilizing MRI images exhibited greater accuracy and specificity compared to orthopedic surgeons in the diagnosis of rotator cuff pathology, albeit not in sensitivity. Conclusion: The detection of rotator cuff pathology has been significantly aided by the exceptional performance of AI models. In particular, these models are equally adept as musculoskeletal radiologists in utilizing ultrasound to diagnose rotator cuff pathology. 
Furthermore, AI models exhibit statistically superior levels of accuracy and specificity when utilizing MRI to diagnose rotator cuff pathology, albeit with no marked difference in sensitivity, in comparison to orthopaedic surgeons.
Article
Full-text available
Purpose: The purpose of this study was to evaluate the use of an AI conversational agent during the postoperative recovery of patients undergoing elective hip arthroscopy. Methods: Patients undergoing hip arthroscopy were enrolled in a prospective cohort for their first 6 weeks following surgery. Patients used standard SMS text messaging to interact with an artificial intelligence (AI) chatbot ("Felix") that initiated automated conversations regarding elements of postoperative recovery. Patient satisfaction was measured at 6 weeks after surgery using a Likert scale survey. Accuracy was determined by measuring the appropriateness of chatbot responses, topic recognition, and examples of confusion. Safety was measured by evaluating the chatbot's responses to any questions with potential medical urgency. Results: Twenty-six patients were enrolled with a mean age of 36 years, and 58% (n = 15) were male. Overall, 80% of patients (n = 20) rated the helpfulness of Felix as good or excellent. In the postoperative period, 12/25 (48%) patients reported being worried about a complication but were reassured by Felix and, thus, did not seek medical attention. Of a total of 128 independent patient questions, Felix handled 101/128 questions appropriately (79%), either by addressing them independently or by facilitating contact with the care team. Felix was able to adequately answer patient questions independently 31% of the time (n = 40/128). Of 10 patient questions that were thought to potentially represent patient complications, in 3 cases Felix did not adequately address or recognize the health concern; none of these situations resulted in patient harm. Conclusion: The results of this study demonstrate that the use of a chatbot or conversational agent can enhance the postoperative experience for hip arthroscopy patients, as demonstrated by high levels of patient satisfaction. Level of evidence: Level IV, therapeutic case series.
Article
Full-text available
Total knee arthroplasty (TKA) is widely used in clinical practice as an effective treatment for end-stage knee joint lesions. It can effectively correct joint deformities, relieve painful symptoms, and improve joint function. The reconstruction of lower extremity joint lines and soft tissue balance are important factors related to the durability of the implant; therefore, it is especially important to measure the joint lines and associated angles before TKA. In this article, we review the technological progress in the preoperative measurement of TKA.
Article
Full-text available
The practice of medicine is rapidly transforming as a result of technological breakthroughs. Artificial intelligence (AI) systems are becoming more and more relevant in medicine and orthopaedic surgery as a result of the nearly exponential growth in computer processing power, cloud-based computing, and the development and refinement of medical-task-specific software algorithms. Because of the extensive role of technologies such as medical imaging that bring high sensitivity, specificity, and positive/negative prognostic value to the management of orthopaedic disorders, the field is particularly ripe for the application of machine-based integration of imaging studies, among other applications. Through this review, we seek to promote awareness in the orthopaedics community of the current accomplishments and projected uses of AI and machine learning (ML) as described in the literature. We summarize the current state of the art in the use of ML and AI in five key orthopaedic disciplines: joint reconstruction, spine, orthopaedic oncology, trauma, and sports medicine.
Article
Full-text available
Artificial intelligence (AI) in medicine is a rapidly growing field. In orthopedics, the clinical implementations of AI have not yet reached their full potential. Deep learning algorithms have shown promising results in computed radiographs for fracture detection, classification of osteoarthritis (OA), bone age assessment, and automated measurements of the lower extremities. Studies investigating the performance of AI compared to trained human readers often show equal or better results, although human validation remains indispensable at the current standards. The objective of this narrative review is to give an overview of AI in medicine and summarize the current applications of AI in orthopedic radiography imaging. Owing to differences in AI software and study designs, it is difficult to find a clear structure in this field. To produce more homogeneous studies, open-source access to AI software code and a consensus on study design should be aimed for.
Article
Full-text available
Objective: To explore a new artificial intelligence (AI)-aided method to assist the clinical diagnosis of femoral intertrochanteric fracture (FIF), and further compare its performance with human level to confirm the effect and feasibility of the AI algorithm. Methods: 700 X-rays of FIF were collected and labeled by two senior orthopedic physicians to set up the database, 643 for the training database and 57 for the test database. A Faster-RCNN algorithm was trained to detect FIF on X-rays. The performance of the AI algorithm, including accuracy, sensitivity, missed diagnosis rate, specificity, misdiagnosis rate, and time consumption, was calculated and compared with that of orthopedic attending physicians. Results: Compared with orthopedic attending physicians, the Faster-RCNN algorithm performed better in accuracy (0.88 vs. 0.84 ± 0.04), specificity (0.87 vs. 0.71 ± 0.08), misdiagnosis rate (0.13 vs. 0.29 ± 0.08), and time consumption (5 min vs. 18.20 ± 1.92 min). As for the sensitivity and missed diagnosis rate, there was no statistical difference between the AI and orthopedic attending physicians (0.89 vs. 0.87 ± 0.03 and 0.11 vs. 0.13 ± 0.03). Conclusion: The AI diagnostic algorithm is an available and effective method for the clinical diagnosis of FIF. It could serve as a satisfying clinical assistant for orthopedic physicians.
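All of the rates compared in this abstract derive from a single 2×2 confusion matrix. A minimal sketch follows; the counts are invented, chosen only so the resulting values land near the ones reported, and are not the study's data.

```python
# Metrics derivable from a 2x2 confusion matrix. The counts below are
# invented for illustration; they merely approximate the reported values.
def metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, specificity, plus the two error rates the
    abstract reports (missed diagnosis = 1 - sensitivity,
    misdiagnosis = 1 - specificity)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "missed_diagnosis_rate": 1 - sensitivity,
        "misdiagnosis_rate": 1 - specificity,
    }

m = metrics(tp=40, fn=5, fp=3, tn=20)
print(m)  # accuracy ≈ 0.88, sensitivity ≈ 0.89, specificity ≈ 0.87
```

Note how the missed diagnosis and misdiagnosis rates are simply the complements of sensitivity and specificity, which is why the abstract's pairs (0.89/0.11 and 0.87/0.13) sum to 1.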
Article
Full-text available
Purpose The purpose of this study is to calculate the diagnostic accuracy from the confusion matrix using deep learning (DL) on ultrasound (US) images of Palmer 1B triangular fibrocartilage complex (TFCC) injury. Methods Twenty-nine wrists of 15 healthy volunteers (11 men; mean age, 34.9 years ± 9.7) (control group) and 20 wrists of 17 patients (11 men; mean age, 41.0 years ± 12.2) with TFCC injury (Palmer type IB) (injury group) were included in the study. The diagnosis of Palmer 1B TFCC injury was made using MRI, CT arthrography, and intraoperative arthroscopic findings. 2000 images were provided for each group, 80% of which were randomly selected by AI and used as training data; the remaining data were used as test data. Transfer learning was conducted using three separate pretrained models (GoogLeNet, ResNet50, ResNet101). Model evaluation was performed using a confusion matrix. The area under the receiver operating characteristic (ROC) curve (AUC) was also calculated. Occlusion sensitivity was used to visualize the important features. Results For the prediction of TFCC injury by the DL model, the best accuracy was 0.85 (GoogLeNet), the best recall was 1.0 (ResNet50 and ResNet101), and the best specificity was 0.78 (GoogLeNet). In predicting the TFCC injury for the test data, the best AUC was 0.97 (ResNet101). Visualization of important features showed that AI predicted the presence of injury by focusing on the morphology of the articular disc. Conclusions The DL models predicted Palmer 1B TFCC injury from US images with high accuracy, with best scores of 0.85 for accuracy (GoogLeNet), 1.00 for sensitivity (ResNet50 and ResNet101), and 0.78 for specificity (GoogLeNet). The use of DL for US imaging of Palmer 1B TFCC injury predicted the injury as well as MRI and CTA. Level of evidence: IV; retrospective case series.
Article
Full-text available
The improved treatment of knee injuries critically relies on accurate and cost-effective detection. In recent years, deep-learning-based approaches have monopolized knee injury detection in MRI studies. The aim of this paper is to present the findings of a systematic literature review of knee (anterior cruciate ligament, meniscus, and cartilage) injury detection papers using deep learning. The systematic review was carried out following the PRISMA guidelines on several databases, including PubMed, Cochrane Library, EMBASE, and Google Scholar. Appropriate metrics were chosen to interpret the results. The prediction accuracy of the deep-learning models for the identification of knee injuries ranged from 72.5% to 100%. Deep learning has the potential to perform on par with humans in decision-making tasks related to the MRI-based diagnosis of knee injuries. The limitations of the present deep-learning approaches include data imbalance, model generalizability across different centers, verification bias, lack of related classification studies with more than two classes, and ground-truth subjectivity. There are several possible avenues for further exploration of deep learning for improving MRI-based knee injury diagnosis. Explainability and lightweight design of the deployed deep-learning systems are expected to become crucial enablers for their widespread use in clinical practice.
Article
Full-text available
There is a growing interest in the application of artificial intelligence (AI) to orthopaedic surgery. This review aims to identify and characterise research in this field, in order to understand the extent, range, and nature of this work, and to act as a springboard to stimulate future studies. A scoping review, a form of structured evidence synthesis, was conducted to summarise the use of AI in orthopaedics. A literature search (1946–2019) identified 222 studies eligible for inclusion. These studies were predominantly small and retrospective. There has been significant growth in the number of papers published in the last three years, mainly from the USA (37%). The majority of research used AI for image interpretation (45%) or as a clinical decision tool (25%). Spine (43%), knee (23%), and hip (14%) were the regions of the body most commonly studied. The application of artificial intelligence to orthopaedics is growing. However, the scope of its use so far remains limited, both in terms of its possible clinical applications and the sub-specialty areas of the body that have been studied. A standardised method of reporting AI studies would allow direct assessment and comparison. Prospective studies are required to validate AI tools for clinical use.
Article
Full-text available
Most surgeons are skeptical as to the feasibility of autonomous actions in surgery. Interestingly, many examples of autonomous actions already exist and have been around for years. Since the beginning of this millennium, the field of artificial intelligence (AI) has grown exponentially with the development of machine learning (ML), deep learning (DL), computer vision (CV), and natural language processing (NLP). All of these facets of AI will be fundamental to the development of more autonomous actions in surgery; unfortunately, only a limited number of surgeons have or seek expertise in this rapidly evolving field. As opposed to AI in medicine, AI surgery (AIS) involves autonomous movements. Fortuitously, as the field of robotics in surgery has improved, more surgeons are becoming interested in technology and the potential of autonomous actions in procedures such as interventional radiology, endoscopy, and surgery. The lack of haptics, or the sensation of touch, has hindered the wider adoption of robotics by many surgeons; however, now that the true potential of robotics can be comprehended, the embracing of AI by the surgical community is more important than ever before. Although current complete surgical systems are mainly only examples of tele-manipulation, for surgeons to get to more autonomously functioning robots, haptics is perhaps not the most important aspect. If the goal is for robots to ultimately become more and more independent, perhaps research should not focus on the concept of haptics as it is perceived by humans; instead, the focus should be on haptics as it is perceived by robots/computers. This article will discuss aspects of ML, DL, CV, and NLP as they pertain to the modern practice of surgery, with a focus on current AI issues and advances that will enable us to get to more autonomous actions in surgery.
Ultimately, there may be a paradigm shift that needs to occur in the surgical community as more surgeons with expertise in AI may be needed to fully unlock the potential of AIS in a safe, efficacious and timely manner.
Article
Purpose: The purpose of this study was to develop a deep learning model to accurately detect anterior cruciate ligament (ACL) ruptures on magnetic resonance imaging (MRI) and evaluate its effect on the diagnostic accuracy and efficiency of clinicians. Methods: A training dataset was built from MRIs acquired from January 2017 to June 2021, including patients with knee symptoms, irrespective of ACL ruptures. An external validation dataset was built from MRIs acquired from January 2021 to June 2022, including patients who underwent knee arthroscopy or arthroplasty. Patients with fractures or prior knee surgeries were excluded from both datasets. Subsequently, a deep learning model was developed and validated using these datasets. Clinicians of varying expertise levels in sports medicine and radiology were recruited, and their capacities in diagnosing ACL injuries in terms of accuracy and diagnosing time were evaluated both with and without artificial intelligence (AI) assistance. Results: A deep learning model was developed based on the training dataset of 22,767 MRIs from 5 centers, and verified with an external validation dataset of 4,086 MRIs from 6 centers. The model achieved an area under the receiver operating characteristic curve of 0.980 and a sensitivity and specificity of 95.1%. Thirty-eight clinicians from 25 centers were recruited to diagnose 3,800 MRIs. The AI assistance significantly improved the accuracy of all clinicians, exceeding 96%. Additionally, a notable reduction in diagnostic time was observed. The most significant improvements in accuracy and time efficiency were observed in the trainee groups, suggesting that AI support is particularly beneficial for clinicians with moderately limited diagnostic expertise. Conclusions: This deep learning model demonstrated expert-level diagnostic performance for ACL ruptures, serving as a valuable tool to assist clinicians of various specialties and experience levels in making accurate and efficient diagnoses.
Article
Purpose: The purpose of this study was to analyse the quality and readability of information regarding shoulder stabilisation surgery available from an online AI software (ChatGPT), using standardised scoring systems, and to report on the answers given by the AI. Methods: An open AI model (ChatGPT) was used to answer 23 commonly asked patient questions on shoulder stabilisation surgery. These answers were evaluated for medical accuracy, quality, and readability using the JAMA Benchmark criteria, DISCERN score, Flesch Reading Ease Score (FRES), and Flesch-Kincaid Grade Level (FKGL). Results: The JAMA Benchmark criteria score was 0, the lowest possible score, indicating that no reliable resources were cited. The DISCERN score was 60, which is considered a good score. The areas in which the AI model did not achieve full marks related to the lack of source material used to compile the answers and to some information not fully supported by the literature. The FRES was 26.2, and the FKGL corresponded to the reading level of a college graduate. Conclusion: The answers given to questions relating to shoulder stabilisation surgery were generally of high quality, but a high reading level was required to comprehend the information presented, and it is unclear where the answers came from, as no source material was cited. It is important to note, however, that the ChatGPT software repeatedly referenced the need to discuss these questions with an orthopaedic surgeon and the importance of shared decision making, as well as compliance with surgeon treatment recommendations.
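The FRES and FKGL scores this abstract reports come from two closed-form formulas over word, sentence, and syllable counts. A hedged sketch follows: the syllable counter is a crude vowel-group heuristic (real implementations use dictionaries or more refined rules), and the sample sentences are invented.

```python
# Flesch Reading Ease and Flesch-Kincaid Grade Level, using a crude
# vowel-group syllable heuristic. Sample texts are invented.
import re

def count_syllables(word):
    # Approximate: each run of consecutive vowels counts as one syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences          # words per sentence
    spw = syllables / len(words)          # syllables per word
    fres = 206.835 - 1.015 * wps - 84.6 * spw   # higher = easier
    fkgl = 0.39 * wps + 11.8 * spw - 15.59      # US school grade level
    return fres, fkgl

simple = "The cat sat. The dog ran."
dense = ("Postoperative rehabilitation necessitates individualized "
         "physiotherapeutic supervision.")
print(readability(simple))
print(readability(dense))
```

An FRES of 26.2, as reported above, falls in the band conventionally read as "very difficult"; the FKGL formula expresses the same counts as a US school grade, which is why the abstract describes a college-graduate reading level.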
Article
As the implementation of artificial intelligence in orthopedic surgery research flourishes, so grows the need for responsible use. Related research requires clear reporting of algorithmic error rates. Recent studies show that preoperative opioid use, male sex, and greater body mass index are risk factors for extended postoperative opioid use, but models built on these factors may have high false positive rates. Thus, to be applied clinically when screening patients, these tools require physician and patient input and nuanced interpretation, as their utility diminishes without providers interpreting and acting on the information. Machine learning and artificial intelligence should be viewed as tools that can facilitate these human conversations among patients, orthopedic surgeons, and health care providers.
Article
Machine learning (ML) has become an increasingly common statistical methodology in medical research. In recent years, ML techniques have been used with greater frequency to evaluate orthopaedic data. ML allows for the creation of adaptive predictive models that can be applied to clinical patient outcomes. However, ML models for predicting clinical or safety outcomes may be made available online so that physicians may apply these models to their patients to make predictions. If the algorithms have not been externally validated, then the models are not likely to generalize, and their predictions will suffer from inaccuracy. This is especially important to bear in mind because the recent increase in ML papers in the medical literature includes publications with fundamental flaws.
Article
Purpose: To develop a predictive machine learning model to identify prognostic factors for continued opioid prescriptions after arthroscopic meniscus surgery. Methods: Patients undergoing arthroscopic meniscal surgery, such as meniscus debridement, repair, or revision, at a single institution from 2013 to 2017 were retrospectively followed for up to 1 year postoperatively. Procedural details were recorded, including concomitant procedures, primary versus revision status, and whether a partial debridement or a repair was performed. Intraoperative arthritis severity was measured using the Outerbridge classification. The number of opioid prescriptions in each month was recorded. The primary analysis was a multivariate Cox regression model. We then created a naïve Bayesian model, a machine learning classifier that applies Bayes' theorem with an assumption of independence between variables. Results: A total of 581 patients were reviewed. Postoperative opioid refills occurred in 98 patients (16.9%). Using multivariate logistic modeling, independent risk factors for opioid refills included male sex, larger BMI, and chronic preoperative opioid use, whereas meniscus resection was associated with a decreased likelihood of refills. Concomitant procedures, revision procedures, and the presence of arthritis graded by the Outerbridge classification were not significant predictors of postoperative opioid refills. The naïve Bayesian model for extended postoperative opioid use demonstrated good fit with our cohort, with an area under the curve of 0.79, sensitivity of 94.5%, positive predictive value (PPV) of 83%, and a detection rate of 78.2%. The two most important features in the model were preoperative opioid use and male sex. Conclusion: After arthroscopic meniscus surgery, preoperative opioid consumption and male sex were the most significant predictors of sustained opioid use beyond 1 month postoperatively. Intraoperative arthritis was not an independent risk factor for continued refills.
A machine learning algorithm performed with high accuracy, although with a high false positive rate, to function as a screening tool to identify patients filling additional narcotic prescriptions after surgery.
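The naïve Bayesian classifier described above applies Bayes' theorem under an independence assumption across features. The sketch below shows the idea on two binary features matching those the study found most important; the tiny dataset and class balance are invented, not the study's data.

```python
# Illustrative Bernoulli naive Bayes with Laplace smoothing. The dataset
# (male sex, preoperative opioid use -> postoperative refill) is invented.
def train_nb(X, y, alpha=1.0):
    """Estimate class priors and per-feature P(feature=1 | class)."""
    model = {}
    for cls in set(y):
        rows = [x for x, label in zip(X, y) if label == cls]
        prior = len(rows) / len(X)
        # Laplace smoothing keeps unseen feature/class combinations
        # from collapsing probabilities to zero.
        likelihoods = [(sum(r[j] for r in rows) + alpha)
                       / (len(rows) + 2 * alpha)
                       for j in range(len(X[0]))]
        model[cls] = (prior, likelihoods)
    return model

def predict_nb(model, x):
    def score(cls):
        prior, lik = model[cls]
        p = prior
        # Independence assumption: multiply per-feature likelihoods.
        for xj, pj in zip(x, lik):
            p *= pj if xj else (1 - pj)
        return p
    return max(model, key=score)

# Features: [male, preop_opioid_use]; label 1 = postoperative refill.
X = [[1, 1], [1, 1], [1, 0], [0, 1], [0, 0], [0, 0]]
y = [1, 1, 0, 1, 0, 0]
model = train_nb(X, y)
print(predict_nb(model, [1, 1]))  # → 1
```

Because each feature contributes an independent multiplicative factor, the learned likelihoods double as a crude ranking of feature importance, which is consistent with how the study singles out preoperative opioid use and male sex.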
Article
Machine learning, a subset of artificial intelligence, has become increasingly common in the analysis of orthopaedic data. The resources needed to utilize machine-learning approaches for data analysis have become increasingly accessible to researchers, contributing to a recent influx of research using these techniques. As machine learning becomes increasingly available, misapplication owing to a lack of competence becomes more common. Sensationalized titles, misused vernacular, and a failure to fully vet machine learning–derived algorithms are just a few issues that warrant attention. As the orthopaedic community’s knowledge on this topic grows, the flaws in our understanding of this field will likely become apparent, allowing for rectification and ultimately improvement of how machine learning is utilized in research.
Article
Accurate diagnosis of the etiology of ulnar-sided wrist pain and injury to the triangular fibrocartilage complex, particularly Palmer 1B tears, can prove to be challenging. Multiple peer-reviewed studies have demonstrated that accurate diagnosis and treatment of tears of the triangular fibrocartilage complex through nonoperative and operative means, including arthroscopy, can result in improved patient outcomes and function. One of the keys to successful treatment, however, is accurate diagnosis. While our current imaging modalities help to provide additional data for the assessment of this pathology, magnetic resonance imaging and computed tomography scans have limitations. Thus, employing the power of artificial intelligence and deep learning in ultrasound assessment of this injury is appealing. Efficient integration of this technology into daily practice has the potential to bolster diagnostics not only in large medical centers but also in underserved areas with limited access to magnetic resonance imaging and computed tomography.
Article
Purpose: This study aimed to develop machine learning models to predict hospital admission (overnight stay) as well as short-term complications and readmission rates following ACLR. Furthermore, we sought to compare the ML models with logistic regression models in predicting ACLR outcomes. Methods: The American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) database was queried for patients who underwent elective ACLR from 2012-2018. Artificial neural network (ANN) ML and logistic regression models were developed to predict overnight stay, 30-day postoperative complications, and ACLR-related readmission, and model performance was compared using the area under the ROC curve (AUC). Regression analyses were used to identify variables that were significantly associated with the predicted outcomes. Results: A total of 21,636 elective ACLR cases met inclusion criteria. Variables associated with hospital admission included white race, obesity, hypertension, ASA classification 3 or greater, anesthesia other than general, prolonged operative time, and inpatient setting. The incidence of hospital admission (overnight stay) was 10.2%, that of 30-day complications 1.3%, and that of 30-day readmission for ACLR-related causes 0.9%. Compared to logistic regression models, ANN models reported superior AUC values in predicting overnight stay (0.835 vs. 0.589), 30-day complications (0.742 vs. 0.590), reoperation (0.842 vs. 0.601), ACLR-related readmission (0.872 vs. 0.606), DVT (0.804 vs. 0.608), and surgical site infection (SSI) (0.818 vs. 0.596). Conclusions: The ML models developed in this study demonstrate an application of ML in which data from a national surgical patient registry were used to predict hospital admission and 30-day postoperative complications after elective ACLR. The machine learning models performed well, outperforming regression models in predicting hospital admission and short-term complications following elective ACLR.
ML models performed best when predicting ACLR-related readmissions and reoperations, followed by overnight stay. Level of evidence: IV, retrospective comparative prognostic trial.
Article
Complex statistical approaches are increasingly being used in the orthopaedic literature, and this is especially true in the field of sports medicine. Tools such as machine learning provide the opportunity to analyze certain research areas that would often require the complex assessment of large amounts of data. Generally, decision making is multifactorial and based upon experience, personal capabilities, available utilities, and literature. Given the difficulty associated with determining the optimal patient treatment, many studies have moved toward more complex statistical approaches to create algorithms that take large amounts of data and distill it into a formula that may guide surgeons to better patient outcomes while estimating and even optimizing costs. In the future, this clinical and economic information will play an important role in patient management.
Article
Machine learning (ML) and artificial intelligence (AI) may be described as advanced statistical techniques using algorithms to “learn” to evaluate and predict relationships between input and results without explicit human programming, often with high accuracy. The potentials and pitfalls of ML continue to be explored as predictive modeling grows in popularity. While use of and optimism for AI continue to increase in orthopaedic surgery, there remains little high-quality evidence of its ability to improve patient outcomes. It is up to us as clinicians to provide context for ML models and guide the use of these technologies to optimize the outcome for our patients. Barriers to widespread adoption of ML include poor-quality data, limits to compliant data sharing, few clinicians who are expert in ML statistical techniques, and computing costs including technology, infrastructure, personnel, energy, and updates.
Article
There exists great hope and hype in the literature surrounding applications of artificial intelligence (AI) to orthopaedic surgery. Between 2018-2021, a total of 178 AI-related articles were published in orthopaedics. However, for every two original research papers that apply AI to orthopaedics, a commentary or review is published (30.3%). AI-related research in orthopaedics frequently fails to provide use cases that offer the uninitiated an opportunity to appraise the importance of AI by studying meaningful questions, evaluating unknown hypotheses, or analyzing quality data. The hype perpetuates a feed-forward cycle that relegates AI to a meaningless buzzword by rewarding those with nascent understanding and rudimentary technical know-how who commit several basic errors: (1) inappropriately conflating vernacular (“AI/ML”), (2) repackaging registry data, (3) prematurely releasing internally validated algorithms, (4) overstating the “black box phenomenon” by failing to provide weighted analysis, (5) claiming to evaluate AI rather than the data itself, and (6) withholding full model architecture code. Relevant AI-specific guidelines are forthcoming, but the forced application of the original TRIPOD guidelines, designed for regression analyses, is irrelevant and misleading. To safeguard meaningful use, AI-related research efforts in orthopaedics should be (1) directed towards administrative support over clinical evaluation and management, (2) require the use of the advanced model, and (3) answer a question that was previously unknown, unanswered, or unquantifiable.
Article
With the plethora of machine learning (ML) analyses published in the orthopaedic literature within the last five years, several attempts have been made to enhance our understanding of what exactly ML means and how it is used. At its most fundamental level, ML comprises a branch of artificial intelligence that uses algorithms to analyze and learn from patterns in data without explicit programming or human intervention. In contrast, traditional statistics require a user to specifically choose variables of interest to create a model capable of predicting an outcome, the output of which (1) may be falsely influenced by the variables the user chose to include and (2) does not allow for optimization of performance. Early publications have served as succinct editorials or reviews intended to ease audiences unfamiliar with ML into the complexities that accompany the subject. Most commonly, the focus of these studies concerns the terminology and concepts surrounding ML, as it is important to understand the rationale behind performing such studies. Unfortunately, these publications only touch on the most basic aspects of ML and are too frequently repetitive. Indeed, the conclusions of these articles reiterate that the potential clinical utility of these algorithms remains tangential at best in their current form and caution against premature adoption without external validation. As a result, our perspective and ability to draw our own conclusions from these studies has not advanced, and we are left concluding, with each subsequent study, that yet another algorithm has been published for an outcome of interest but cannot be used until further validation. What readers now need is to return to the principles of the scientific method that they have used to critically assess vast numbers of publications prior to this wave of newly applied statistical methodology – a guide to interpret results such that their own conclusions can be drawn.
Article
Recent research using machine learning and data mining to determine predictors of prolonged opioid use after arthroscopic surgery showed that Artificial Neural Networks showed superior discrimination and calibration. Other machine learning algorithms, such as Naïve Bayes, XGBoost, Gradient Boosting Machine, Random Forest, and Elastic Net, were also reliable despite slightly lower Brier scores and mean areas under the curve. Machine learning and data mining have limitations, however, and outputs are reliant on large sample sizes and the accuracy of big data. Poor-quality data and failure to account for confounding variables are further limitations. There is no doubt that predictive modeling, artificial intelligence, machine learning, and data mining will become a major component of the physician’s practice, and doctors of medicine and related researchers should become familiar with these techniques. Physicians require an understanding of data science for the following reasons: monitoring of large databases could allow early diagnosis of pathologic conditions in individual patients; multiparameter data can be used to assist in the development of care pathways; data visualization could help with interpretation of medical images; understanding artificial intelligence workflow and machine learning will help us with understanding early warning signs of disease; and data science will facilitate personalized medicine with which clinicians can predict treatment outcomes.
Article
Purpose To determine which subspecialties within orthopaedic surgery have applied ML to predict clinically significant outcomes (CSOs) and to determine whether the performance of these models was acceptable through assessing discrimination and other ML metrics. Methods PubMed, EMBASE, and Cochrane Central Register of Controlled Trials databases were queried for articles that used ML to predict achieving the minimal clinically important difference (MCID), patient acceptable symptomatic state (PASS), or substantial clinical benefit (SCB) following orthopaedic surgeries. Data pertaining to demographics, subspecialty, specific machine learning algorithms, and algorithm performance were analyzed. Results Eighteen articles met the inclusion criteria. Seventeen studies developed novel algorithms, while one externally validated an established algorithm. All studies used ML to predict MCID achievement, while three (16.7%) predicted SCB achievement, and none PASS achievement. Seven (38.9%) studies concerned outcomes after spine surgery, six (33.3%) after sports medicine surgery, three (16.7%) after total joint arthroplasty (TJA), and two (11.1%) after shoulder arthroplasty. No studies were found in trauma, hand, elbow, pediatric, or foot/ankle surgery. In spine surgery, c-statistics ranged from 0.65 to 0.92; in hip arthroscopy, from 0.51 to 0.94; in TJA, from 0.63 to 0.89; in shoulder arthroplasty, from 0.70 to 0.95. The majority of studies reported c-statistics on the upper end of these ranges, though populations were heterogeneous. Conclusion Currently available ML algorithms can discriminate between patients likely and unlikely to achieve CSOs as measured by the MCID after spine, TJA, sports medicine, and shoulder surgery, with fair to good performance as evidenced by c-statistics ranging from 0.6 to 0.95 in the majority of analyses. Less evidence is available on the ability of ML to predict achievement of SCB, and no evidence is available for achieving PASS.
Such algorithms may augment shared decision-making practices and allow clinicians to provide more appropriate patient expectations using individualized risk assessments. However, these studies remain limited by variable reporting of performance metrics and CSO quantification methods, inconsistent adherence to predictive modeling guidelines, and limited external validation.
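Several of the abstracts above summarize discrimination as a c-statistic. For a binary outcome, the c-statistic equals the ROC AUC and has a direct rank interpretation: the probability that a randomly chosen positive case is scored higher than a randomly chosen negative case. A minimal sketch (synthetic scores, purely illustrative):

```python
def c_statistic(y_true, y_score):
    """Concordance (c-statistic / ROC AUC): fraction of positive-negative
    pairs in which the positive case gets the higher score; ties count
    as half a concordant pair."""
    positives = [s for s, t in zip(y_score, y_true) if t == 1]
    negatives = [s for s, t in zip(y_score, y_true) if t == 0]
    concordant = sum(1.0 if p > n else 0.5 if p == n else 0.0
                     for p in positives for n in negatives)
    return concordant / (len(positives) * len(negatives))
```

This rank-based reading explains the conventional benchmarks quoted in these reviews: 0.5 is chance-level discrimination, roughly 0.7-0.8 is fair, and above 0.8 is good.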
Article
Purpose The purpose of the current study is to develop a machine learning algorithm to predict total charges after ambulatory hip arthroscopy and create a risk-adjusted payment model based on patient comorbidities. Methods A retrospective review of the New York State Ambulatory Surgery and Services database was performed to identify patients who underwent elective hip arthroscopy between 2015 and 2016. Features included in initial models consisted of patient characteristics, medical comorbidities, and procedure-specific variables. Models were generated to predict total charges using five algorithms. Model performance was assessed by root mean squared error, root mean squared logarithmic error, and the coefficient of determination. Global variable importance and partial dependence curves were constructed to demonstrate the impact of each input feature on total charges. For performance benchmarking, the best candidate model was compared with a multivariate linear regression utilizing the same input features. Results A total of 5,121 patients were included. Median charges after hip arthroscopy were $19,720 (IQR: $12,399-$26,439). The gradient-boosted ensemble model demonstrated the best performance (RMSE: $3,800, 95% CI: $3,700-$3,900; RMSLE: 0.249, 95% CI: 0.24-0.26; R²: 0.73). Major cost drivers included total hours in facility <12 or >15, longer procedure time, performance of a labral repair, age <30, Elixhauser comorbidity index ≥1, African American race, residence in extreme urban and rural areas, and higher household and neighborhood income. Conclusion The gradient-boosted ensemble model effectively predicted total charges after hip arthroscopy. Few modifiable variables were identified other than anesthesia type; nonmodifiable drivers of total charges included duration of care <12 or >15 hours, OR time >100 minutes, age <30, performance of a labral repair, and ECI of >0.
Stratification of patients based on ECI highlighted the increased financial risk borne by physicians under flat reimbursement schedules given variable degrees of comorbidity.
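The three error metrics reported for this charge-prediction model (RMSE, RMSLE, and R²) are simple to compute from predictions and observed values. A sketch with made-up charge figures, purely illustrative:

```python
import math

def regression_metrics(y_true, y_pred):
    """RMSE (absolute error in dollars), RMSLE (relative error, useful for
    right-skewed charges), and R^2 (variance explained)."""
    n = len(y_true)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    rmsle = math.sqrt(sum((math.log1p(t) - math.log1p(p)) ** 2
                          for t, p in zip(y_true, y_pred)) / n)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return rmse, rmsle, 1 - ss_res / ss_tot
```

Reporting RMSLE alongside RMSE is a sensible design choice here: hospital charges are right-skewed, and the logarithmic form penalizes a $1,000 miss on a $5,000 case more heavily than on a $50,000 case.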
Article
Concerns over need for CT radiation dose optimization and reduction led to improved scanner efficiency and introduction of several reconstruction techniques and image processing-based software. The latest technologies use artificial intelligence (AI) for CT dose optimization and image quality improvement. While CT dose optimization has and can benefit from AI, variations in scanner technologies, reconstruction methods, and scan protocols can lead to substantial variations in radiation doses and image quality across and within different scanners. These variations in turn can influence performance of AI algorithms being deployed for tasks such as detection, segmentation, characterization, and quantification. We review the complex relationship between AI and CT radiation dose.
Article
Disruptive innovation completely changes the traditional way that we operate and may only be realized in retrospect. For example, shoulder superior capsule reconstruction (SCR) is a complete change from the traditional methods of treating massive, irreparable rotator cuff tears and pseudoparalysis. Classic examples of disruptions in orthopaedic surgery include distraction osteogenesis, total hip joint replacement arthroplasty, and modern orthopaedic trauma care. Orthopaedic technologies that promise future disruption include artificial intelligence, surgical simulation, and orthopaedic biologics, including mesenchymal stromal cell (MSC) and gene therapy. Most of all, arthroscopic surgery completely changed the way we operate by using new methods and technology. Many never saw it coming. The challenge going forward is to motivate and foster new ideas and research that result in innovation and progress. Skepticism has a place, but not at the expense of transformative ideas, particularly as medical journals offer the alternative of prospective hypothesis testing using the scientific method, followed by unbiased peer review, and publication. Medical journals should be a forum for disruptive research.
Article
Purpose To develop a machine learning algorithm and clinician-friendly tool predicting the likelihood of prolonged opioid use (>90 days) following hip arthroscopy. Methods The Military Data Repository (MDR) was queried for all adult patients undergoing arthroscopic hip surgery between 2012 and 2017. Demographic, health history, and prescription records were extracted for all included patients. Opioid use was divided into preoperative use (30-365 days prior to surgery), perioperative use (30 days prior to surgery through 14 days after surgery), postoperative use (14-90 days after surgery), and prolonged postoperative use (90-365 days after surgery). Six machine learning algorithms (Naïve Bayes, Gradient Boosting Machine, Extreme Gradient Boosting, Random Forest, Elastic Net Regularization, and Artificial Neural Network) were developed. Areas under the receiver operating characteristic curve (AUC) and Brier scores were calculated for each model. Decision curve analysis was applied to assess clinical utility. Local Interpretable Model-Agnostic Explanations (LIME) were used to demonstrate factor weights within the selected model. Results A total of 6,760 patients were included, of which 2,762 (40.9%) filled at least one opioid prescription >90 days after surgery. The Artificial Neural Network model showed superior discrimination and calibration with AUC = 0.71 (95% CI = 0.68-0.74) and Brier score = 0.21 (95% CI = 0.20-0.22). Post-surgical opioid use, age, and preoperative opioid use had the most influence on model outcome. Lesser factors included the presence of a psychological comorbidity and strong history of a substance use disorder. Conclusions The Artificial Neural Network model shows sufficient validity and discrimination for use in clinical practice.
The five identified factors (age, preoperative opioid use, postoperative opioid use, presence of a mental health comorbidity, and presence of a preoperative substance use disorder) accurately predict the likelihood of prolonged opioid use following hip arthroscopy.
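The Brier score reported alongside AUC in this abstract captures both calibration and sharpness: it is the mean squared gap between predicted probability and observed outcome (0 is perfect; an uninformative constant prediction of 0.5 scores 0.25). A minimal sketch with synthetic values:

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the
    binary outcome; lower is better."""
    return sum((p - t) ** 2 for p, t in zip(y_prob, y_true)) / len(y_true)
```

A useful reference point: always predicting the base rate q scores q(1 - q), so at this cohort's 40.9% prevalence a constant prediction would score about 0.24, putting the model's reported 0.21 in context.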
Article
In summary, AI and machine learning are new to Arthroscopy journal, and absent background, the concepts and related research may seem confusing and unapproachable. The fact that AI and machine learning refer to computers designed by humans reminds us of the following: AI and machine learning represent tools to which we, as clinicians and scientists, must adapt. Next, because machine learning is a type of AI in which computers are programmed to improve the algorithms under which they function over time, insight is required to achieve an element of explainability regarding the key data underlining a particular machine-learning prediction. Finally, machine-learning algorithms require validation before they can be applied to data sets different from the data on which they were trained.
Article
Machine learning and artificial intelligence are increasingly used in modern health care, including arthroscopic and related surgery. Multiple high-quality, Level I evidence, randomized, controlled investigations have recently shown the ability of hip arthroscopy to successfully treat femoroacetabular impingement syndrome and labral tears. Contemporary hip preservation practice strives to continually refine and improve the value of care provision. Multiple single-center and multicenter prospective registries continue to grow as part of both United States-based and international hip preservation-specific networks and collaborations. The ability to predict postoperative patient-reported outcomes preoperatively holds great promise with machine learning. Machine learning requires massive amounts of data, which can easily be generated from electronic medical records and both patient- and clinician-generated questionnaires. On top of text-based data, imaging (e.g., plain radiographs, computed tomography, and magnetic resonance imaging) can be rapidly interpreted and used in both clinical practice and research. Formidable computational power is also required, using different advanced statistical methods and algorithms to generate models with the ability to predict individual patient outcomes. Efficient integration of machine learning into hip arthroscopy practice can reduce physicians' "busywork" of data collection and analysis. This can only improve the value of the patient experience, because surgeons have more time for shared decision making, with empathy, compassion, and humanity counterintuitively returning to medicine.
Article
Background: Despite previous reports of improvements for athletes following hip arthroscopy for femoroacetabular impingement syndrome (FAIS), many do not achieve clinically relevant outcomes. The purpose of this study was to develop machine learning algorithms capable of providing patient-specific predictions of which athletes will derive clinically relevant improvement in sports-specific function after undergoing hip arthroscopy for FAIS. Methods: A registry was queried for patients who had participated in a formal sports program or athletic activities before undergoing primary hip arthroscopy between January 2012 and February 2018. The primary outcome was achieving the minimal clinically important difference (MCID) in the Hip Outcome Score-Sports Subscale (HOS-SS) at a minimum of 2 years postoperatively. Recursive feature selection was used to identify the combination of variables, from an initial pool of 26 features, that optimized model performance. Six machine learning algorithms (stochastic gradient boosting, random forest, adaptive gradient boosting, neural network, support vector machine, and elastic-net penalized logistic regression [ENPLR]) were trained using 10-fold cross-validation 3 times and applied to an independent testing set of patients. Models were evaluated using discrimination, decision-curve analysis, calibration, and the Brier score. Results: A total of 1,118 athletes were included, and 76.9% of them achieved the MCID for the HOS-SS. A combination of 6 variables optimized algorithm performance, and specific cutoffs were found to decrease the likelihood of achieving the MCID: preoperative HOS-SS score of ≥58.3, Tönnis grade of 1, alpha angle of ≥67.1°, body mass index (BMI) of >26.6 kg/m2, Tönnis angle of >9.7°, and age of >40 years. The ENPLR model demonstrated the best performance (c-statistic: 0.77, calibration intercept: 0.07, calibration slope: 1.22, and Brier score: 0.14). 
This model was transformed into an online application as an educational tool to demonstrate machine learning capabilities. Conclusions: The ENPLR machine learning algorithm demonstrated the best performance for predicting clinically relevant sports-specific improvement in athletes who underwent hip arthroscopy for FAIS. In our population, older athletes with more degenerative changes, high preoperative HOS-SS scores, abnormal acetabular inclination, and an alpha angle of ≥67.1° achieved the MCID less frequently. Following external validation, the online application of this model may allow enhanced shared decision-making.
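The calibration intercept and slope reported for the ENPLR model are conventionally obtained by refitting a logistic regression of the observed outcomes on the logit of the predicted probabilities; an intercept near 0 and a slope near 1 indicate good calibration, while a slope below 1 suggests overconfident predictions. A self-contained Newton-Raphson sketch (synthetic data, purely illustrative, not this study's method in detail):

```python
import math

def calibration_intercept_slope(y_true, y_prob, iters=25):
    """Fit logit(P(y=1)) = a + b * logit(p_hat) by Newton-Raphson.
    Perfect calibration gives intercept a ~ 0 and slope b ~ 1."""
    logits = [math.log(p / (1 - p)) for p in y_prob]
    a, b = 0.0, 1.0
    for _ in range(iters):
        # Gradient and Hessian of the Bernoulli log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(logits, y_true):
            mu = 1 / (1 + math.exp(-(a + b * x)))
            w = mu * (1 - mu)
            g0 += y - mu
            g1 += (y - mu) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b
```

On the abstract's figures, a slope of 1.22 means the model's predictions were slightly too conservative (observed outcomes varied more steeply than predicted), the opposite failure mode of the more common overfitting-induced slope below 1.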
Article
The use of advanced statistical methods and artificial intelligence including machine learning enables researchers to identify preoperative characteristics predictive of patients achieving minimal clinically important differences in health outcomes after interventions including surgery. Machine learning uses algorithms to recognize patterns in data sets to predict outcomes. The advantages are the ability, using “big data” registries, to infer relations that otherwise would not be readily understood and the ability to continuously improve the model as new data are added. However, machine learning has limitations. Models are only as good as the data incorporated, and data may be misapplied owing to huge data sets and strong computing capabilities, in which spurious correlations may be suggested based on significant P values. Hence, common sense must be applied. The future of outcome prediction studies will most definitely rely on machine learning and artificial intelligence methods.
Article
Objective To compare rib fracture detection and classification by radiologists using CT images with and without a deep learning model. Materials and methods A total of 8,529 chest CT images were collected from multiple hospitals for training the deep learning model. The test dataset included 300 chest CT images acquired using a single CT scanner. The rib fractures were marked in the bone window on each CT slice by experienced radiologists, and the ground truth included 861 rib fractures. We proposed a heterogeneous neural network for rib fracture detection and classification, consisting of a cascaded feature pyramid network and a classification network. The deep learning-based model was evaluated on the external testing data. The precision rate, recall rate, F1-score, and diagnostic time of two junior radiologists with and without the deep learning model were computed, and the chi-square, one-way analysis of variance, and least significant difference tests were used to analyze the results. Results The use of the deep learning model increased detection recall and classification accuracy (0.922 and 0.863) compared with the radiologists alone (0.812 and 0.850). The radiologists achieved a higher precision rate, recall rate, and F1-score for fracture detection when using the deep learning model, at 0.943, 0.978, and 0.960, respectively. When using the deep learning model, the radiologists' reading time decreased from 158.3 ± 35.7 s to 42.3 ± 6.8 s. Conclusion Radiologists achieved the highest performance in diagnosing and classifying rib fractures on CT images when assisted by the deep learning model.
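The precision, recall, and F1-score reported for fracture detection follow directly from counts of true positives, false positives, and missed fractures. As a sketch, the test counts below are our own back-calculation from the reported rates and the 861-fracture ground truth, shown only to make the arithmetic concrete; they are not figures given in the abstract:

```python
def detection_metrics(tp, fp, fn):
    """Precision (share of flagged findings that are real), recall (share
    of real fractures found), and their harmonic mean F1."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Because F1 is a harmonic mean, it is dragged toward the weaker of precision and recall, which is why it is a stricter summary than accuracy for detection tasks with many negatives.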
Article
Background Inappropriate acetabular component angular position is believed to increase the risk of hip dislocation following total hip arthroplasty (THA). However, manual measurement of these angles is time consuming and prone to inter-observer variability. The purpose of this study was to develop a deep learning tool to automate the measurement of acetabular component angles on postoperative radiographs. Methods Two cohorts of 600 anteroposterior (AP) pelvis and 600 cross-table lateral hip postoperative radiographs were used to develop deep learning models to segment the acetabular component and the ischial tuberosities. Cohorts were manually annotated, augmented, and randomly split into train-validation-test datasets on an 8:1:1 basis. Two U-Net convolutional neural network (CNN) models (one for AP and one for cross-table lateral radiographs) were trained for 50 epochs. Image processing was then deployed to measure the acetabular component angles from anatomical landmarks on the predicted masks. Performance of the tool was tested on 80 AP and 80 cross-table lateral radiographs. Results The CNN models achieved a mean Dice Similarity Coefficient of 0.878 and 0.903 on the AP and cross-table lateral test datasets, respectively. The mean difference between human-level and machine-level measurements was 1.35° (σ=1.07°) and 1.39° (σ=1.27°) for the inclination and anteversion angles, respectively. Differences of 5° or more between human-level and machine-level measurements were observed in less than 2.5% of cases. Conclusions We developed a highly accurate deep learning tool to automate the measurement of angular position of acetabular components for use in both clinical and research settings. Level of Evidence: III
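The Dice similarity coefficient used to evaluate the U-Net segmentations above is twice the overlap of the predicted and ground-truth masks divided by their combined size, ranging from 0 (no overlap) to 1 (identical masks). A minimal sketch on toy binary masks (nested lists stand in for image arrays):

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity on binary masks: 2|A intersect B| / (|A| + |B|)."""
    intersection = sum(a * b for row_a, row_b in zip(mask_a, mask_b)
                       for a, b in zip(row_a, row_b))
    size_a = sum(sum(row) for row in mask_a)
    size_b = sum(sum(row) for row in mask_b)
    return 2 * intersection / (size_a + size_b)
```

Because both numerator and denominator count only foreground pixels, Dice ignores the (typically vast) background and so is a stricter score than plain pixel accuracy for small structures like an acetabular component.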
Article
Background: Fresh osteochondral allograft transplantation (OCA) is an effective method of treating symptomatic cartilage defects of the knee. This cartilage restoration technique involves the single-stage implantation of viable, mature hyaline cartilage into the chondral or osteochondral lesion. Predictive models for reaching the clinically meaningful outcome among patients undergoing OCA for cartilage lesions of the knee remain under investigation. Purpose: To apply machine learning to determine which preoperative variables are predictive for achieving the minimal clinically important difference (MCID) and substantial clinical benefit (SCB) at 1 and 2 years after OCA for cartilage lesions of the knee. Study design: Case-control study; Level of evidence, 3. Methods: Data were analyzed for patients who underwent OCA of the knee by 2 high-volume fellowship-trained cartilage surgeons before May 1, 2018. The International Knee Documentation Committee questionnaire (IKDC), Knee Outcome Survey-Activities of Daily Living (KOS-ADL), and Mental Component (MCS) and Physical Component (PCS) Summaries of the 36-Item Short Form Health Survey (SF-36) were administered preoperatively and at 1 and 2 years postoperatively. A total of 84 predictive models were created using 7 unique architectures to detect achievement of the MCID for each of the 4 outcome measures and the SCB for the IKDC and KOS-ADL at both time points. Data inputted into the models included previous and concomitant surgical history, laterality, sex, age, body mass index (BMI), intraoperative findings, and patient-reported outcome measures (PROMs). Shapley Additive Explanations (SHAP) analysis identified predictors of reaching the MCID and SCB. Results: Of the 185 patients who underwent OCA for the knee and met eligibility criteria from an institutional cartilage registry, 135 (73%) patients were available for the 1-year follow-up and 153 (83%) patients for the 2-year follow-up. 
In predicting outcomes after OCA in terms of the IKDC, KOS-ADL, MCS, and PCS at 1 and 2 years, areas under the receiver operating characteristic curve (AUCs) of the top-performing models ranged from fair (0.72) to excellent (0.94). Lower baseline mental health (MCS), higher baseline physical health (PCS) and knee function scores (KOS-ADL, IKDC Subjective), lower baseline activity demand (Marx, Cincinnati sports), worse pain symptoms (Cincinnati pain, SF-36 pain), and higher BMI were thematic predictors contributing to failure to achieve the MCID or SCB at 1 and 2 years postoperatively. Conclusion: Our machine learning models were effective in predicting outcomes and elucidating the relationships between baseline factors contributing to achieving the MCID for OCA of the knee. Patients who preoperatively report poor mental health, catastrophize pain symptoms, compensate with higher physical health and knee function, and exhibit lower activity demands are at risk for failing to reach clinically meaningful outcomes after OCA of the knee.
Article
From imaging interpretation and health monitoring to drug development, the role of artificial intelligence (AI) in medicine has increased. But AI is not ready to replace humans when it comes to the diagnosis of sports medicine conditions. Rather, in highly specialized fields such as sports medicine, when it comes to interpretation of diagnostic studies such as magnetic resonance imaging scans (that are more sophisticated than simple radiographs), experts outperform AI systems at present. Key features of clinical practice, such as the physical examination, in-person consultation, and ultimately, decision making, cannot be easily replaced. As every novel “smart” tool is incorporated into our lives, we need to be ready to embrace its use, but we also ought to be critical of its implementation and seek transparency at every step of the process. We cannot afford to see AI as an antagonistic element in our practices but rather as a valuable assistant that could someday improve diagnostic accuracy.
Article
Purpose: Recovery following elective knee arthroscopy can be compromised by prolonged postoperative opioid utilization, yet an effective, validated risk calculator for this outcome remains elusive. The purpose of this study was to develop and validate a machine-learning algorithm that can reliably predict prolonged opioid consumption in patients following elective knee arthroscopy. Methods: A retrospective review of an institutional outcome database was performed at a tertiary academic medical centre to identify adult patients who underwent knee arthroscopy between 2016 and 2018. Extended postoperative opioid consumption was defined as opioid consumption at least 150 days following surgery. Five machine-learning algorithms were assessed for their ability to predict this outcome. Performance was assessed through discrimination, calibration, and decision curve analysis. Results: Overall, of the 381 patients included, 60 (20.3%) demonstrated sustained postoperative opioid consumption. The factors predictive of prolonged postoperative opioid prescriptions were reduced preoperative scores on the following patient-reported outcomes: the IKDC, KOOS ADL, VR12 MCS, KOOS pain, and KOOS Sport and Activities. The ensemble model achieved the best performance based on discrimination (AUC = 0.74), calibration, and decision curve analysis. This model was integrated into a web-based open-access application able to provide both predictions and explanations. Conclusion: Following appropriate external validation, the algorithm developed here could support timely identification of patients at risk of extended opioid use. Reduced scores on preoperative patient-reported outcomes, symptom duration, and perioperative oral morphine equivalents were identified as novel predictors of prolonged postoperative opioid use. The predictive model can be easily deployed in the clinical setting to identify at-risk patients, allowing providers to optimize modifiable risk factors and counsel patients appropriately preoperatively. Level of evidence: III.
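The third evaluation step named in this abstract, decision curve analysis, compares models by net benefit across threshold probabilities. The sketch below uses the standard net-benefit definition and fully synthetic data; the function name and the data are assumptions for illustration, not code or results from the study.

```python
# Hedged sketch of decision curve analysis: net benefit of using a model
# to decide which patients to flag, at a given risk threshold pt.
import numpy as np

def net_benefit(y_true, y_prob, pt):
    """Net benefit of intervening on patients whose predicted risk >= pt."""
    treat = y_prob >= pt
    tp = np.sum(treat & (y_true == 1))  # correctly flagged
    fp = np.sum(treat & (y_true == 0))  # unnecessarily flagged
    n = len(y_true)
    # False positives are down-weighted by the odds of the threshold.
    return tp / n - fp / n * pt / (1 - pt)

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=300)
y_prob = np.clip(0.25 * y_true + rng.uniform(0, 0.75, size=300), 0, 1)

for pt in (0.1, 0.2, 0.3):
    print(f"pt={pt:.1f}  net benefit={net_benefit(y_true, y_prob, pt):+.3f}")
```

Plotting net benefit over a range of thresholds, against the "treat all" and "treat none" strategies, yields the decision curve used to judge clinical usefulness.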
Article
Purpose: To develop machine learning algorithms to predict failure to achieve clinically significant satisfaction after hip arthroscopy. Methods: Consecutive primary hip arthroscopy patients treated between January 2012 and January 2017 were queried. Five supervised machine learning algorithms were developed on a training set of patients and internally validated on an independent testing set by discrimination, Brier score, calibration, and decision-curve analysis. The minimal clinically important difference (MCID) for the visual analog scale (VAS) for satisfaction was derived using an anchor-based method and used as the primary outcome. Results: A total of 935 patients were included, of whom 148 (15.8%) failed to achieve the MCID for VAS satisfaction at a minimum of 2 years postoperatively. The best-performing algorithm was the neural network model (c-statistic: 0.94; calibration intercept: -0.43; calibration slope: 0.94; Brier score: 0.050). The five most important features for predicting failure to achieve the MCID for VAS satisfaction were a history of anxiety/depression, lateral center edge angle, preoperative symptom duration exceeding 2 years, presence of one or more drug allergies, and workers' compensation status. Conclusions: Supervised machine learning algorithms demonstrated excellent discrimination and performance for predicting clinically significant satisfaction after hip arthroscopy, though this analysis was performed on a single patient population. External validation is required to confirm the performance of these algorithms.
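The metrics this abstract reports (c-statistic, Brier score, calibration intercept and slope) can be computed as below. The data are synthetic, and the logistic-recalibration approach to estimating calibration slope and intercept is a common convention assumed here, not code from the paper.

```python
# Illustrative computation of discrimination and calibration metrics
# on synthetic predicted probabilities.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score

rng = np.random.default_rng(2)
y_true = rng.integers(0, 2, size=400)
y_prob = np.clip(0.3 * y_true + rng.uniform(0.05, 0.65, size=400), 0.01, 0.99)

brier = brier_score_loss(y_true, y_prob)   # mean squared error of probabilities
c_stat = roc_auc_score(y_true, y_prob)     # discrimination (c-statistic)

# Calibration: regress the outcome on the logit of the predicted risk.
# An ideal model has slope = 1 and intercept = 0.
logit = np.log(y_prob / (1 - y_prob)).reshape(-1, 1)
recal = LogisticRegression(C=1e9).fit(logit, y_true)  # effectively unpenalized

print(f"Brier={brier:.3f}  c-statistic={c_stat:.2f}  "
      f"slope={recal.coef_[0][0]:.2f}  intercept={recal.intercept_[0]:.2f}")
```

A slope below 1 indicates predictions that are too extreme, and a negative intercept (as reported above, -0.43) indicates systematic overestimation of risk.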