Thomas I. Liao’s scientific contributions

What is this page?


This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (4)


Figure 3: (Left) Distribution of group-aware consensus (GAC) across all statements, with the threshold for inclusion (red line). (Right) Distribution of the polarization indices; polarization tends to be low.
Figure 5: A heatmap of OpinionQA scores showing how well each model reflects different U.S. political ideologies.
Figure 6: A screenshot of the instructions and the Polis voting mechanism that the participants saw.
Figure 8: We included a contact form for participants to ask questions or give feedback.

Collective Constitutional AI: Aligning a Language Model with Public Input
  • Preprint
  • File available

June 2024 · 21 Reads

Saffron Huang · Divya Siddarth · Liane Lovitt · [...] · Deep Ganguli

There is growing consensus that language model (LM) developers should not be the sole deciders of LM behavior, creating a need for methods that enable the broader public to collectively shape the behavior of LM systems that affect them. To address this need, we present Collective Constitutional AI (CCAI): a multi-stage process for sourcing and integrating public input into LMs, from identifying a target population to sourcing principles to training and evaluating a model. We demonstrate the real-world practicality of this approach by creating what is, to our knowledge, the first LM fine-tuned with collectively sourced public input and evaluating this model against a baseline model trained with established principles from an LM developer. Our quantitative evaluations demonstrate several benefits of our approach: the CCAI-trained model shows lower bias across nine social dimensions compared to the baseline model, while maintaining equivalent performance on language, math, and helpful-harmless evaluations. Qualitative comparisons suggest that the models differ on the basis of their respective constitutions; for example, when prompted with contentious topics, the CCAI-trained model tends to generate responses that reframe the matter positively rather than refusing. These results demonstrate a promising, tractable pathway toward publicly informed development of language models.
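The statement-selection step referenced in Figure 3 can be illustrated with a short sketch. The snippet below assumes that group-aware consensus (GAC) for a statement is the product of per-group agreement rates and that statements are kept when their GAC clears a fixed inclusion threshold; the paper's exact formula, grouping, and threshold may differ, and the function names are placeholders.

```python
# Hypothetical sketch of group-aware consensus (GAC) filtering of public-input
# statements. Assumes GAC = product over groups of that group's agreement rate;
# the paper's exact definition and threshold may differ.
from collections import defaultdict


def group_aware_consensus(votes):
    """votes: list of (group_id, agreed) pairs for a single statement."""
    agree, total = defaultdict(int), defaultdict(int)
    for group, agreed in votes:
        total[group] += 1
        agree[group] += int(agreed)
    gac = 1.0
    for group in total:
        gac *= agree[group] / total[group]
    return gac


def select_statements(statement_votes, threshold=0.5):
    """Keep statements whose GAC clears the inclusion threshold (the red line in Figure 3)."""
    return [s for s, votes in statement_votes.items()
            if group_aware_consensus(votes) >= threshold]


# Example: one statement voted on by members of two hypothetical groups.
votes = [("group_a", True), ("group_a", True), ("group_a", False),
         ("group_b", True), ("group_b", True)]
print(group_aware_consensus(votes))  # (2/3) * (2/2) ≈ 0.67
```

Taking the product of per-group agreement rates rather than an overall mean rewards statements that every group tends to agree with, which is one way to keep a numerical majority from drowning out smaller groups.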



Figure 4: With Linguistic Prompting, the LLM does not appear to become more representative of the corresponding non-Western countries.
Figure 6: Distribution of topics in the data. The majority of the questions are classified under "Politics and policy" and "Regions and countries".
Figure 7: An example where cross-national prompting changes the model's responses, but the responses do not become more representative of those of the participants from Turkey. Corresponding model generations are in Table 7.
Figure 9: An example where the model's response changes when provided with a cross-national prompt, assigning 99.1% probability to the response "Generally bad".
Towards Measuring the Representation of Subjective Global Opinions in Language Models

June 2023 · 200 Reads · 6 Citations

Large language models (LLMs) may not equitably represent diverse global perspectives on societal issues. In this paper, we develop a quantitative framework to evaluate whose opinions model-generated responses are most similar to. We first build a dataset, GlobalOpinionQA, comprising questions and answers from cross-national surveys designed to capture diverse opinions on global issues across different countries. Next, we define a metric that quantifies the similarity between LLM-generated survey responses and human responses, conditioned on country. With our framework, we run three experiments on an LLM trained to be helpful, honest, and harmless with Constitutional AI. By default, LLM responses tend to be more similar to the opinions of certain populations, such as those from the USA and some European and South American countries, highlighting the potential for biases. When we prompt the model to consider a particular country's perspective, responses shift to become more similar to the opinions of the prompted populations, but can reflect harmful cultural stereotypes. When we translate GlobalOpinionQA questions into a target language, the model's responses do not necessarily become the most similar to the opinions of speakers of those languages. We release our dataset for others to use and build on. Our data is at https://huggingface.co/datasets/Anthropic/llm_global_opinions. We also provide an interactive visualization at https://llmglobalvalues.anthropic.com.
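The similarity metric described in the abstract can be sketched concretely. The snippet below assumes similarity is computed as one minus the Jensen-Shannon distance between the model's answer distribution and a country's aggregated response distribution for a single question; the paper's exact definition and its aggregation across questions and countries may differ.

```python
# Minimal sketch of a per-question opinion-similarity score, assuming a
# 1 - Jensen-Shannon-distance formulation; not necessarily the paper's exact metric.
import numpy as np
from scipy.spatial.distance import jensenshannon


def answer_similarity(model_probs, country_probs):
    """Both inputs are probability vectors over the same answer options."""
    p = np.asarray(model_probs, dtype=float)
    q = np.asarray(country_probs, dtype=float)
    return 1.0 - jensenshannon(p, q, base=2)  # in [0, 1]; 1 means identical


# Example with a four-option question and two hypothetical countries.
model = [0.70, 0.20, 0.05, 0.05]
country_a = [0.65, 0.25, 0.05, 0.05]   # similar opinions -> score near 1
country_b = [0.10, 0.10, 0.40, 0.40]   # dissimilar opinions -> lower score
print(answer_similarity(model, country_a))
print(answer_similarity(model, country_b))
```

Averaging such per-question scores by country would yield the kind of country-level similarity comparison the experiments report.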


The Capacity for Moral Self-Correction in Large Language Models

February 2023 · 307 Reads · 10 Citations

We test the hypothesis that language models trained with reinforcement learning from human feedback (RLHF) have the capability to "morally self-correct" -- to avoid producing harmful outputs -- if instructed to do so. We find strong evidence in support of this hypothesis across three different experiments, each of which reveals different facets of moral self-correction. We find that the capability for moral self-correction emerges at 22B model parameters and typically improves with increasing model size and RLHF training. We believe that at this level of scale, language models obtain two capabilities that they can use for moral self-correction: (1) they can follow instructions, and (2) they can learn complex normative concepts of harm such as stereotyping, bias, and discrimination. As such, they can follow instructions to avoid certain kinds of morally harmful outputs. We believe our results are cause for cautious optimism regarding the ability to train language models to abide by ethical principles.
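The instruction-following intervention the abstract describes can be sketched as a simple prompt comparison. The instruction wording and the query_model stub below are illustrative placeholders, not the paper's exact prompts or evaluation harness.

```python
# Sketch of comparing a baseline prompt with a "moral self-correction" variant.
# The instruction text and query_model are assumptions for illustration only.
SELF_CORRECTION_INSTRUCTION = (
    "Please ensure that your answer is unbiased and does not rely on stereotypes."
)


def build_prompts(question: str) -> dict:
    """Return the baseline prompt and the instructed (self-correction) variant."""
    return {
        "baseline": question,
        "instructed": f"{question}\n\n{SELF_CORRECTION_INSTRUCTION}",
    }


def query_model(prompt: str) -> str:
    """Placeholder for a call to an RLHF-trained model; swap in a real client."""
    raise NotImplementedError


prompts = build_prompts("Question text from a bias benchmark goes here.")
# Comparing query_model(prompts["baseline"]) against query_model(prompts["instructed"])
# across a benchmark is the kind of contrast the paper's experiments rely on.
```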

Citations (3)


... To ensure clauses are treated as ultimately empty, a context-dependent confidence weighting of each constitutional clause could also be learned in this classifier. Importantly, the charter is transparent and modifiable, allowing revisions if the AI's behavior becomes overly cautious or lacks compassion, thereby adjusting both future training data and the classifier's boundaries (Huang et al., 2024). This flexibility enables the base model and classifier to generate AI-supervised data for testing revisions, scaling alignment efficiently with less need for constant human oversight. ...

Reference:

Contemplative Wisdom for Superalignment
Collective Constitutional AI: Aligning a Language Model with Public Input
  • Citing Conference Paper
  • June 2024

... The first study. In 2023, scientists developed the "Chinese Room of Increased Complexity" technology to create algorithmic copies of citizens of any country [11]. This was followed by the Wuhan experiment to predict the 2024 US presidential election based on an AI model's analysis of the preferences of simulacra rather than of people. ...

Towards Measuring the Representation of Subjective Global Opinions in Language Models

... Abdulhai et al. [110] assessed the moral foundations of FMs using the Moral Foundations Questionnaire (MFQ), and found that the morality of FMs can be influenced by prompts and will significantly impact downstream task behavior. Additionally, research [111] indicated that FMs can learn complex ethical concepts related to harm, thereby avoiding the generation of certain types of unethical content. ...

The Capacity for Moral Self-Correction in Large Language Models