Switchboard analysis of all the different algorithms

Source publication
Article
Full-text available
A crucial bottleneck in medical artificial intelligence (AI) is high-quality labeled medical datasets. In this paper, we test a large variety of wisdom of the crowd algorithms to label medical images that were initially classified by individuals recruited through an app-based platform. Individuals classified skin lesions from the International Skin...

Context in source publication

Context 1
... now conduct a full switchboard analysis to test the different hybrid algorithms. We present our results in Fig. 6 and make the following observations. When no prevalence information is used, accuracy ranges from 77.6% to 80.5% and balanced accuracy from 69.0% to 78.9%, depending on the aggregation algorithm. The worst-performing algorithms, especially on the balanced accuracy metric, select only the top performers and ...
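To make the two reported metrics concrete, the sketch below shows a minimal, illustrative aggregation of crowd labels by majority vote and how accuracy and balanced accuracy are computed; it is not the paper's switchboard of hybrid algorithms, and the data, labels, and function names are hypothetical. Balanced accuracy averages per-class recall, which is why it can drop well below plain accuracy when an algorithm neglects rare lesion classes.

```python
# Illustrative sketch only: majority-vote aggregation of crowd labels and the
# two evaluation metrics (accuracy vs. balanced accuracy) used in the analysis.
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label among one image's crowd classifications."""
    return Counter(labels).most_common(1)[0][0]

def accuracy(preds, truth):
    """Fraction of images whose aggregated label matches the reference label."""
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)

def balanced_accuracy(preds, truth):
    """Mean per-class recall; robust to class imbalance (e.g. rare lesion types)."""
    recalls = []
    for c in set(truth):
        idx = [i for i, t in enumerate(truth) if t == c]
        recalls.append(sum(preds[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# Toy example: three images, each labeled by five crowd workers (hypothetical data).
crowd_labels = [
    ["benign", "benign", "malignant", "benign", "benign"],
    ["malignant", "malignant", "benign", "malignant", "malignant"],
    ["benign", "malignant", "benign", "benign", "malignant"],
]
truth = ["benign", "malignant", "malignant"]
preds = [majority_vote(ls) for ls in crowd_labels]
print(accuracy(preds, truth), balanced_accuracy(preds, truth))
```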

Citations

... Crowd Wisdom Platforms use the collective intelligence of crowds to generate ideas, and are typically used in decision making and problem solving [60]. Despite being successful in generating ideas, crowd wisdom platforms have limitations in data quality control [61], bias, accuracy, scalability [62], and privacy [63]. Several studies show that the integration of AI into crowd wisdom platforms addresses limitations such as data quality control, bias, accuracy, privacy, and the lack of expertise. ...
... CI approaches harness the contributions of multiple experts to reduce errors and find creative solutions to complex problems [26,27]. In medical diagnostics, several studies have found that the collective solution of multiple diagnosticians outperforms the average individual across a range of medical contexts [28][29][30][31][32][33][34]. These studies have focused on binary or small-scale decision problems (e.g., detecting a specific condition), but CI has also proved successful in open-ended medical problems. ...
Preprint
Full-text available
Artificial intelligence systems, particularly large language models (LLMs), are increasingly being employed in high-stakes decisions that impact both individuals and society at large, often without adequate safeguards to ensure safety, quality, and equity. Yet LLMs hallucinate, lack common sense, and are biased, shortcomings that may reflect LLMs' inherent limitations and thus may not be remedied by more sophisticated architectures, more data, or more human feedback. Relying solely on LLMs for complex, high-stakes decisions is therefore problematic. Here we present a hybrid collective intelligence system that mitigates these risks by leveraging the complementary strengths of human experience and the vast information processed by LLMs. We apply our method to open-ended medical diagnostics, combining 40,762 differential diagnoses made by physicians with the diagnoses of five state-of-the-art LLMs across 2,133 medical cases. We show that hybrid collectives of physicians and LLMs outperform both single physicians and physician collectives, as well as single LLMs and LLM ensembles. This result holds across a range of medical specialties and professional experience, and can be attributed to humans' and LLMs' complementary contributions that lead to different kinds of errors. Our approach highlights the potential for collective human and machine intelligence to improve accuracy in complex, open-ended domains like medical diagnostics.
Article
AI systems, particularly large language models (LLMs), are increasingly being employed in high-stakes decisions that impact both individuals and society at large, often without adequate safeguards to ensure safety, quality, and equity. Yet LLMs hallucinate, lack common sense, and are biased, shortcomings that may reflect LLMs' inherent limitations and thus may not be remedied by more sophisticated architectures, more data, or more human feedback. Relying solely on LLMs for complex, high-stakes decisions is therefore problematic. Here, we present a hybrid collective intelligence system that mitigates these risks by leveraging the complementary strengths of human experience and the vast information processed by LLMs. We apply our method to open-ended medical diagnostics, combining 40,762 differential diagnoses made by physicians with the diagnoses of five state-of-the-art LLMs across 2,133 text-based medical case vignettes. We show that hybrid collectives of physicians and LLMs outperform both single physicians and physician collectives, as well as single LLMs and LLM ensembles. This result holds across a range of medical specialties and professional experience and can be attributed to humans' and LLMs' complementary contributions that lead to different kinds of errors. Our approach highlights the potential for collective human and machine intelligence to improve accuracy in complex, open-ended domains like medical diagnostics.
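The abstract above does not detail how physician and LLM differential diagnoses are combined, so the following is only a minimal sketch of one plausible pooling scheme (Borda-style rank aggregation with a top-k hit check); the function names, scoring rule, and example diagnoses are assumptions for illustration, not the authors' method.

```python
# Minimal sketch (assumed scheme, not the cited paper's method): pool ranked
# differential diagnoses from physicians and LLMs into one hybrid list via
# Borda-style scoring, then test whether the true diagnosis lands in the top-k.
from collections import defaultdict

def pool_differentials(ranked_lists, list_len=10):
    """Score each diagnosis by its summed (list_len - rank) across contributors."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, dx in enumerate(ranking):
            scores[dx] += max(list_len - rank, 0)
    return sorted(scores, key=scores.get, reverse=True)

def top_k_hit(pooled, true_dx, k=3):
    """True if the reference diagnosis appears among the pooled top-k candidates."""
    return true_dx in pooled[:k]

# Hypothetical case: two physicians and two LLMs each provide a ranked differential.
physician_dxs = [["pneumonia", "bronchitis", "pulmonary embolism"],
                 ["bronchitis", "pneumonia", "asthma"]]
llm_dxs = [["pulmonary embolism", "pneumonia", "pericarditis"],
           ["pneumonia", "pulmonary embolism", "bronchitis"]]
pooled = pool_differentials(physician_dxs + llm_dxs)
print(pooled[:3], top_k_hit(pooled, "pneumonia", k=1))
```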
Article
Context.— Generative artificial intelligence (GAI) is a promising new technology with the potential to transform communication and workflows in health care and pathology. Although new technologies offer advantages, they also come with risks that users, particularly early adopters, must recognize. Given the fast pace of GAI developments, pathologists may find it challenging to stay current with the terminology, technical underpinnings, and latest advancements. Building this knowledge base will enable pathologists to grasp the potential risks and impacts that GAI may have on the future practice of pathology. Objective.— To present key elements of GAI development, evaluation, and implementation in a way that is accessible to pathologists and relevant to laboratory applications. Data Sources.— Information was gathered from recent studies and reviews from PubMed and arXiv. Conclusions.— GAI offers many potential benefits for practicing pathologists. However, the use of GAI in clinical practice requires rigorous oversight and continuous refinement to fully realize its potential and mitigate inherent risks. The performance of GAI is highly dependent on the quality and diversity of the training and fine-tuning data, which can also propagate biases if not carefully managed. Ethical concerns, particularly regarding patient privacy and autonomy, must be addressed to ensure responsible use. By harnessing these emergent technologies, pathologists will be well placed to continue forward as leaders in diagnostic medicine.