Preprint

EVINCE: Optimizing Adversarial LLM Dialogues via Conditional Statistics and Information Theory

Abstract

This paper introduces EVINCE (Entropy and Variation IN Conditional Exchanges), a dialogue framework advancing Artificial General Intelligence (AGI) by enhancing versatility, adaptivity, and reasoning in large language models (LLMs). Leveraging adversarial debate and a novel dual entropy theory, EVINCE improves prediction accuracy, robustness, and stability in LLMs by integrating statistical modeling, information theory, and machine learning to balance diverse perspective exploration with strong prior exploitation. The framework's effectiveness is demonstrated through consistent convergence of information-theoretic metrics, particularly improved mutual information, fostering productive LLM collaboration. We apply EVINCE to healthcare, showing improved disease diagnosis, and discuss its broader implications for decision-making across domains. This work provides theoretical foundations and empirical validation for EVINCE, paving the way for advancements in LLM collaboration and AGI development.
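To make the abstract's information-theoretic vocabulary concrete, below is a minimal sketch (not the paper's implementation) of how entropy and mutual information might be computed over two agents' probability distributions across a shared set of candidate answers; the distributions, the three-label set, and the independence-based joint are illustrative assumptions only.

```python
# Minimal sketch: entropy and mutual information between two debating agents'
# predictions over the same candidate labels. Values are illustrative.
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(joint):
    """Mutual information I(X;Y) in bits from a joint probability table."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)   # marginal of agent A
    py = joint.sum(axis=0, keepdims=True)   # marginal of agent B
    indep = px @ py                         # product of marginals
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / indep[mask])))

# Hypothetical distributions of two agents over three candidate diagnoses.
agent_a = np.array([0.7, 0.2, 0.1])
agent_b = np.array([0.4, 0.4, 0.2])

# Illustrative joint built under an independence assumption; in an actual
# dialogue the joint would be estimated from paired, round-by-round outputs.
joint = np.outer(agent_a, agent_b)

print("H(A)   =", round(entropy(agent_a), 3), "bits")
print("H(B)   =", round(entropy(agent_b), 3), "bits")
print("I(A;B) =", round(mutual_information(joint), 3), "bits")  # 0 under independence
```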


References
Conference Paper
This study introduces SocraHealth, an innovative method using Large Language Models (LLMs) for medical diagnostics. By engaging LLM-based agents in structured debates, SocraHealth not only refines diagnoses but also corrects historical record inaccuracies, utilizing patient data effectively. The case study, featuring GPT-4 and Bard across two experiments, showcases this approach's success in producing logical, hallucination-free debates. Demonstrating a significant advancement over traditional diagnostic techniques, SocraHealth highlights the transformative power of LLMs in healthcare, especially in enhancing diagnostic accuracy and rectifying past diagnostic errors.
Conference Paper
This study explores the architectural advancements of large language models (LLMs), with a particular focus on the GPT-4 model. We begin with a thorough analysis of GPT-4's distinctive features, including its polydisciplinary and polymodal data representation, the balanced approach in its algorithmic training, and the synergistic blend of human-driven insights with data-centric learning processes. Building upon these insights, we introduce SocraSynth, a "reasoning layer" thoughtfully crafted to augment knowledge discovery and bolster analytical reasoning across an ensemble of LLMs. SocraSynth is designed to facilitate a generative process through multi-agent analytical discussions, followed by the evaluation of the resultant arguments for their "reasonableness." This approach significantly enhances interdisciplinary information discovery and complex reasoning, strategically addressing major challenges faced by LLMs, such as the production of contextually inaccurate responses (hallucinations) and entrenched statistical biases. Implementing SocraSynth across various application domains marks a significant advancement in overcoming the limitations of current LLMs, paving the way for more reliable and sophisticated AI-driven analytical tools.
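The multi-agent discussion followed by a reasonableness assessment can be pictured with the minimal sketch below; it is not SocraSynth's actual code, and `ask_llm` is a hypothetical placeholder for whatever LLM client one uses.

```python
# Minimal sketch of a two-agent debate loop with a final judgment of
# "reasonableness". `ask_llm` is a hypothetical stand-in for a real LLM call.
def ask_llm(model: str, prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def debate(topic: str, rounds: int = 3) -> str:
    transcript = []
    stance = {"agent_a": "argue in favor", "agent_b": "argue against"}
    for r in range(rounds):
        for agent, role in stance.items():
            context = "\n".join(transcript)
            prompt = (
                f"Topic: {topic}\n"
                f"Debate so far:\n{context}\n"
                f"You are {agent}. In this round, {role}, addressing the "
                f"strongest points made by the other side."
            )
            reply = ask_llm(model=agent, prompt=prompt)
            transcript.append(f"[round {r + 1}] {agent}: {reply}")
    # A third model assesses both sides and synthesizes a conclusion.
    return ask_llm(
        model="judge",
        prompt="Assess both sides for reasonableness and produce a "
               "balanced conclusion:\n" + "\n".join(transcript),
    )

# verdict = debate("Should article X be flagged for statistical bias?")  # needs a real ask_llm
```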
Conference Paper
This paper presents a systematic approach to using the Socratic method in developing prompt templates that effectively interact with large language models, including GPT-3. Various methods are examined, and those that yield precise answers and justifications while fostering creativity and imagination to enhance creative writing are identified. Techniques such as definition, elenchus, dialectic, maieutics, generalization, and counterfactual reasoning are discussed for their application in engineering prompt templates and their connections to inductive, deductive, and abductive reasoning. Through examples, the effectiveness of these dialogue and reasoning methods is demonstrated. An interesting observation is made that when the task's goal and user intent are conveyed to GPT-3 via ChatGPT before the start of a dialogue, the large language model seems to connect to the external context expressed in the intent and perform more effectively. Index Terms: large language model, natural language processing, prompting, the Socratic method.
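By way of illustration only (these are invented examples, not the paper's templates), Socratic techniques like those listed above might be encoded as parameterized prompt templates:

```python
# Illustrative prompt templates loosely following the Socratic techniques
# named above; the wording is invented for illustration, not taken from the paper.
SOCRATIC_TEMPLATES = {
    "definition": "Define '{term}' precisely, listing necessary and "
                  "sufficient conditions, before answering: {question}",
    "elenchus": "Here is a claim: {claim}. Cross-examine it: list its "
                "assumptions and test each for consistency.",
    "dialectic": "Present the strongest argument for and against {claim}, "
                 "then reconcile them into a refined position.",
    "maieutics": "Ask three guiding questions that would help me reach "
                 "the answer to {question} on my own.",
    "counterfactual": "Answer {question}, then explain how the answer "
                      "changes if {assumption} were false.",
}

prompt = SOCRATIC_TEMPLATES["elenchus"].format(
    claim="larger context windows always improve reasoning"
)
print(prompt)
```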
Article
Ensemble approaches to classification and regression have attracted a great deal of interest in recent years. These methods can be shown both theoretically and empirically to outperform single predictors on a wide range of tasks. One of the elements required for accurate prediction when using an ensemble is recognised to be error “diversity”. However, the exact meaning of this concept is not clear from the literature, particularly for classification tasks. In this paper we first review the varied attempts to provide a formal explanation of error diversity, including several heuristic and qualitative explanations in the literature. For completeness of discussion we include not only the classification literature but also some excerpts of the rather more mature regression literature, which we believe can still provide some insights. We proceed to survey the various techniques used for creating diverse ensembles, and categorise them, forming a preliminary taxonomy of diversity creation methods. As part of this taxonomy we introduce the idea of implicit and explicit diversity creation methods, and three dimensions along which these may be applied. Finally we propose some new directions that may prove fruitful in understanding classification error diversity.
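As one concrete (and much-simplified) example of an error-diversity statistic, the sketch below computes pairwise disagreement and Yule's Q-statistic for two classifiers; both measures are common choices in this literature and are used here purely for illustration, not as the survey's preferred definitions.

```python
# Minimal sketch: two simple diversity measures for a pair of classifiers.
import numpy as np

def disagreement(pred_a, pred_b):
    """Fraction of examples on which the two classifiers differ."""
    return float(np.mean(np.asarray(pred_a) != np.asarray(pred_b)))

def q_statistic(pred_a, pred_b, y_true):
    """Yule's Q over correctness patterns: values near +1 mean highly
    correlated errors; values near 0 or negative indicate diverse errors."""
    a_ok = np.asarray(pred_a) == np.asarray(y_true)
    b_ok = np.asarray(pred_b) == np.asarray(y_true)
    n11 = np.sum(a_ok & b_ok)      # both correct
    n00 = np.sum(~a_ok & ~b_ok)    # both wrong
    n10 = np.sum(a_ok & ~b_ok)     # only A correct
    n01 = np.sum(~a_ok & b_ok)     # only B correct
    denom = n11 * n00 + n01 * n10
    return float((n11 * n00 - n01 * n10) / denom) if denom else 0.0

# Toy labels and predictions, purely illustrative.
y      = [0, 1, 1, 0, 1, 0, 1, 1]
pred_a = [0, 1, 0, 0, 1, 1, 1, 1]
pred_b = [0, 0, 1, 0, 1, 0, 0, 1]
print("disagreement:", disagreement(pred_a, pred_b))
print("Q-statistic :", q_statistic(pred_a, pred_b, y))
```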
Article
We investigate the properties of a metric between two distributions, the Earth Mover's Distance (EMD), for content-based image retrieval. The EMD is based on the minimal cost that must be paid to transform one distribution into the other, in a precise sense, and was first proposed for certain vision problems by Peleg, Werman, and Rom. For image retrieval, we combine this idea with a representation scheme for distributions that is based on vector quantization. This combination leads to an image comparison framework that often accounts for perceptual similarity better than other previously proposed methods. The EMD is based on a solution to the transportation problem from linear optimization, for which efficient algorithms are available, and also allows naturally for partial matching. It is more robust than histogram matching techniques, in that it can operate on variable-length representations of the distributions that avoid quantization and other binning problems typical of histograms. When used to compare distributions with the same overall mass, the EMD is a true metric. In this paper we focus on applications to color and texture, and we compare the retrieval performance of the EMD with that of other distances.
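For the special case of equal-mass one-dimensional histograms, the transportation problem underlying the EMD has a closed form: the L1 distance between cumulative distributions. The sketch below illustrates only that case (the general case requires a linear-programming solver), and the toy histograms are invented for illustration.

```python
# Minimal sketch: EMD between two equal-mass 1-D histograms. In this special
# case the minimal transport cost equals the sum of absolute differences
# between the cumulative distributions, scaled by the bin spacing.
import numpy as np

def emd_1d(hist_p, hist_q, bin_width=1.0):
    p = np.asarray(hist_p, dtype=float)
    q = np.asarray(hist_q, dtype=float)
    p, q = p / p.sum(), q / q.sum()          # normalize to equal total mass
    return float(np.sum(np.abs(np.cumsum(p) - np.cumsum(q))) * bin_width)

# Two toy color histograms over the same 5 bins.
print(emd_1d([4, 3, 2, 1, 0], [0, 1, 2, 3, 4]))  # mass must travel far: large EMD
print(emd_1d([4, 3, 2, 1, 0], [3, 4, 2, 1, 0]))  # nearly identical: small EMD
```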
Conference Paper
This research addresses the dual objectives of improving the quality and reducing bias in Wikipedia and news articles. Quality is evaluated in terms of breadth, depth, accuracy, and neutrality, reflecting both the soundness of the content and the authority of references. We conceptualize bias as any tilt or slant in the information presented. Our methodology employs multiple Large Language Models (LLMs) in a novel way to appraise and refine articles from different standpoints. One LLM acts as an advocate for the article's current state, promoting its strengths and integrity. Concurrently, other LLMs scrutinize and challenge the article, applying defined metrics for quality and impartiality. This dialectical approach culminates in a synthesized enhancement that consolidates diverse insights, thus advancing the article's quality by diminishing bias. Our empirical findings substantiate the effectiveness of this technique in concurrently advancing the neutrality and caliber of content.
Article
Background: Diagnostic errors cause substantial preventable harms worldwide, but rigorous estimates for total burden are lacking. We previously estimated diagnostic error and serious harm rates for key dangerous diseases in major disease categories and validated plausible ranges using clinical experts. Objective: We sought to estimate the annual US burden of serious misdiagnosis-related harms (permanent morbidity, mortality) by combining prior results with rigorous estimates of disease incidence. Methods: Cross-sectional analysis of US-based nationally representative observational data. We estimated annual incident vascular events and infections from 21.5 million (M) sampled US hospital discharges (2012–2014). Annual new cancers were taken from US-based registries (2014). Years were selected for coding consistency with prior literature. Disease-specific incidences for 15 major vascular events, infections and cancers (‘Big Three’ categories) were multiplied by literature-based rates to derive diagnostic errors and serious harms. We calculated uncertainty estimates using Monte Carlo simulations. Validity checks included sensitivity analyses and comparison with prior published estimates. Results: Annual US incidence was 6.0 M vascular events, 6.2 M infections and 1.5 M cancers. Per ‘Big Three’ dangerous disease case, weighted mean error and serious harm rates were 11.1% and 4.4%, respectively. Extrapolating to all diseases (including non-‘Big Three’ dangerous disease categories), we estimated total serious harms annually in the USA to be 795 000 (plausible range 598 000–1 023 000). Sensitivity analyses using more conservative assumptions estimated 549 000 serious harms. Results were compatible with setting-specific serious harm estimates from inpatient, emergency department and ambulatory care. The 15 dangerous diseases accounted for 50.7% of total serious harms and the top 5 (stroke, sepsis, pneumonia, venous thromboembolism and lung cancer) accounted for 38.7%. Conclusion: An estimated 795 000 Americans become permanently disabled or die annually across care settings because dangerous diseases are misdiagnosed. Just 15 diseases account for about half of all serious harms, so the problem may be more tractable than previously imagined.
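The core arithmetic behind such estimates, multiplying incidence by error and harm rates and propagating uncertainty with Monte Carlo draws, can be sketched as follows; the incidence figures are taken from the abstract, while the harm-rate distribution and its application to all incident cases are illustrative assumptions rather than the study's actual inputs.

```python
# Minimal sketch of incidence x harm-rate arithmetic with Monte Carlo
# uncertainty. Incidence figures come from the abstract; the Beta
# distribution around a ~4.4% serious-harm rate is an illustrative
# assumption (the study uses disease-specific rates per dangerous-disease case).
import numpy as np

rng = np.random.default_rng(0)
incidence = {"vascular": 6.0e6, "infection": 6.2e6, "cancer": 1.5e6}

n_sims = 100_000
total_harms = np.zeros(n_sims)
for disease, n_cases in incidence.items():
    harm_rate = rng.beta(44, 956, size=n_sims)   # mean roughly 0.044, assumed
    total_harms += n_cases * harm_rate

lo, mid, hi = np.percentile(total_harms, [2.5, 50, 97.5])
print(f"illustrative serious harms/year: {mid:,.0f} (range {lo:,.0f}-{hi:,.0f})")
```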
Article
In the first part of the paper we consider the problem of dynamically apportioning resources among a set of options in a worst-case on-line framework. The model we study can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting. We show that the multiplicative weight-update rule of Littlestone and Warmuth can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems. We show how the resulting learning algorithm can be applied to a variety of problems, including gambling, multiple-outcome prediction, repeated games, and prediction of points in R^n. In the second part of the paper we apply the multiplicative weight-update technique to derive a new boosting algorithm. This boosting algorithm does not require any prior knowledge about the performance of the weak learning algorithm. We also study generalizations of the new boosting algorithm to the problem of learning functions whose range, rather than being binary, is an arbitrary finite set or a bounded segment of the real line.
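A minimal sketch of the multiplicative weight-update (Hedge) rule described in the first part of the paper: each expert's weight is scaled by beta raised to its loss after every round, so consistently poor experts lose influence exponentially fast. The losses and parameters below are illustrative assumptions.

```python
# Minimal sketch of the Hedge (multiplicative weight-update) rule.
import numpy as np

def hedge(loss_matrix, beta=0.8):
    """loss_matrix[t, i] = loss of expert i in round t, in [0, 1].
    Returns the final normalized weights and the learner's cumulative loss."""
    n_rounds, n_experts = loss_matrix.shape
    weights = np.ones(n_experts)
    learner_loss = 0.0
    for t in range(n_rounds):
        probs = weights / weights.sum()          # distribution over experts
        learner_loss += float(probs @ loss_matrix[t])
        weights *= beta ** loss_matrix[t]        # multiplicative update
    return weights / weights.sum(), learner_loss

# Toy run: expert 0 usually has low loss, expert 2 usually has high loss.
rng = np.random.default_rng(1)
losses = rng.random((50, 3)) * np.array([0.2, 0.5, 0.9])
final_weights, total_loss = hedge(losses)
print("final weights:", np.round(final_weights, 3))
print("learner cumulative loss:", round(total_loss, 2))
```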
Article
A compilation of articles on critical thinking in which the author argues for placing critical thinking at the core of educational reform, stresses the importance of developing more complex forms of thinking and learning to meet the challenges of a rapidly changing world, and describes the indispensable role such skills will play in future economic development.
Article
Jaynes's principle of maximum entropy and Kullback's principle of minimum cross-entropy (minimum directed divergence) are shown to be uniquely correct methods for inductive inference when new information is given in the form of expected values. Previous justifications use intuitive arguments and rely on the properties of entropy and cross-entropy as information measures. The approach here assumes that reasonable methods of inductive inference should lead to consistent results when there are different ways of taking the same information into account (for example, in different coordinate systems). This requirement is formalized as four consistency axioms. These are stated in terms of an abstract information operator and make no reference to information measures. It is proved that the principle of maximum entropy is correct in the following sense: maximizing any function but entropy will lead to inconsistency unless that function and entropy have identical maxima. In other words, given information in the form of constraints on expected values, there is only one distribution satisfying the constraints that can be chosen by a procedure that satisfies the consistency axioms; this unique distribution can be obtained by maximizing entropy. This result is established both directly and as a special case (uniform priors) of an analogous result for the principle of minimum cross-entropy. Results are obtained both for continuous probability densities and for discrete distributions.
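As a worked example of inference from an expected-value constraint, the sketch below computes the maximum-entropy distribution over {1, ..., 6} with a prescribed mean; the exponential form p_i proportional to exp(lambda * x_i) follows from the Lagrangian of the entropy objective, and the target mean of 4.5 is the classic loaded-die illustration, not anything from this paper.

```python
# Minimal sketch of maximum-entropy inference from an expected value:
# over support {1,...,6} with a prescribed mean, the maximizing distribution
# has the form p_i proportional to exp(lam * x_i); solve for lam by bisection.
import numpy as np

def maxent_given_mean(support, target_mean, lo=-10.0, hi=10.0, tol=1e-10):
    x = np.asarray(support, dtype=float)

    def mean_for(lam):
        w = np.exp(lam * x)
        return float((w @ x) / w.sum())

    lam = 0.0
    for _ in range(200):                      # bisection on the multiplier
        lam = 0.5 * (lo + hi)
        if mean_for(lam) < target_mean:
            lo = lam
        else:
            hi = lam
        if hi - lo < tol:
            break
    w = np.exp(lam * x)
    return w / w.sum()

p = maxent_given_mean([1, 2, 3, 4, 5, 6], target_mean=4.5)
print(np.round(p, 4))                         # skewed toward larger faces
print("mean:", round(float(p @ np.arange(1, 7)), 4))
```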
Chi-Min Chan, Weize Chen, Yusheng Su, Jianxuan Yu, Wei Xue, Shanghang Zhang, Jie Fu, and Zhiyuan Liu. 2023. ChatEval: Towards better LLM-based evaluators through multi-agent debate. Preprint, arXiv:2308.07201.
Edward Y. Chang. 2024b. The Path to Artificial General Intelligence: Insights from Adversarial LLM Dialogue. SocraSynth.com.
Yilun Du, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, and Igor Mordatch. 2023. Improving factuality and reasoning in language models through multiagent debate. Preprint, arXiv:2305.14325.
Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, and Geoffrey E. Hinton. 1991. Adaptive Mixtures of Local Experts. Neural Computation, 3(1):79-87.
Leonid V. Kantorovich. 1942. On the translocation of masses. Doklady Akademii Nauk, 37(7-8):199-201.
Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Zhaopeng Tu, and Shuming Shi. 2023. Encouraging divergent thinking in large language models through multi-agent debate. Preprint, arXiv:2305.19118.
Julian Michael, Salsabila Mahdi, David Rein, Jackson Petty, Julien Dirani, Vishakh Padmakumar, and Samuel R. Bowman. 2023. Debate helps supervise unreliable experts. Preprint, arXiv:2311.08702.
David E. Newman-Toker, Kevin M. McDonald, Christopher J. Dy, and Linda T. Kohn. 2023a. Serious Harm From Diagnostic Error in US Healthcare Systems: Estimate of Its Magnitude and Cost. BMJ Quality & Safety, 32(7):549-557.
Yu-Shao Peng, Kai-Fu Tang, Hsuan-Tien Lin, et al. 2018. Refuel: exploring sparse features in deep reinforcement learning for fast disease diagnosis. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS'18, pages 7333-7342.
Sang Michael Xie, Aditi Raghunathan, Percy Liang, and Tengyu Ma. 2021. An explanation of in-context learning as implicit Bayesian inference. In International Conference on Learning Representations (ICLR).