Shangkun Che’s research while affiliated with Tsinghua University and other places


Publications (7)


Enhancing Investment Analysis: Optimizing AI-Agent Collaboration in Financial Research
  • Conference Paper

November 2024 · 13 Reads · 4 Citations

Xuewen Han · Neng Wang · Shangkun Che · [...] · Sean Xin Xu

Enhancing Investment Analysis: Optimizing AI-Agent Collaboration in Financial Research

November 2024 · 25 Reads

In recent years, the application of generative artificial intelligence (GenAI) in financial analysis and investment decision-making has gained significant attention. However, most existing approaches rely on single-agent systems, which fail to fully utilize the collaborative potential of multiple AI agents. In this paper, we propose a novel multi-agent collaboration system designed to enhance decision-making in financial investment research. The system incorporates agent groups with both configurable group sizes and collaboration structures to leverage the strengths of each agent group type. By utilizing a sub-optimal combination strategy, the system dynamically adapts to varying market conditions and investment scenarios, optimizing performance across different tasks. We focus on three sub-tasks, namely fundamentals, market sentiment, and risk analysis, carried out on the 2023 SEC 10-K forms of 30 companies listed on the Dow Jones Index. Our findings reveal significant performance variations based on the configurations of AI agents for different tasks. The results demonstrate that our multi-agent collaboration system outperforms traditional single-agent models, offering improved accuracy, efficiency, and adaptability in complex financial environments. This study highlights the potential of multi-agent systems in transforming financial analysis and investment decision-making by integrating diverse analytical perspectives.
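The abstract above describes agent groups with configurable sizes and collaboration structures that are assigned to the three sub-tasks, but the paper's implementation is not reproduced on this page. The following is only a minimal, hypothetical sketch of how such a configuration might be expressed; every name (AgentGroup, Structure, build_groups) and value is invented for illustration.

```python
# Hypothetical sketch of configurable agent groups routed to analysis sub-tasks.
# All names, fields, and values are illustrative assumptions, not the paper's code.
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Structure(Enum):
    VERTICAL = "vertical"      # a leader agent aggregates subordinate reports
    HORIZONTAL = "horizontal"  # peer agents discuss and vote


@dataclass
class AgentGroup:
    task: str         # e.g. "fundamentals", "market_sentiment", "risk"
    size: int         # number of LLM agents in the group
    structure: Structure


def build_groups(config: Dict[str, dict]) -> List[AgentGroup]:
    """Instantiate one agent group per sub-task from a plain configuration dict."""
    return [AgentGroup(task, spec["size"], Structure(spec["structure"]))
            for task, spec in config.items()]


# Example: trying different group sizes and structures per sub-task, in the spirit
# of the configurable collaboration the abstract describes.
groups = build_groups({
    "fundamentals":     {"size": 3, "structure": "vertical"},
    "market_sentiment": {"size": 2, "structure": "horizontal"},
    "risk":             {"size": 4, "structure": "vertical"},
})
```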


Framework for Evaluating Bias of AIGC. (a) We proxy unbiased content with news articles collected from The New York Times and Reuters; see the “Data” section for the justification of choosing these news agencies. We then apply an LLM to produce AIGC with the headlines of these news articles as prompts and evaluate the gender and racial biases of the AIGC by comparing it with the original news articles at the word, sentence, and document levels. (b) Examination of the gender bias of AIGC under biased prompts.
Gender Bias at Word Level. (a) Word-level gender bias of an LLM and its 95% confidence interval (error bar), measured using the average Wasserstein distance defined by Eq. (4). For example, the gender bias score of 0.2407 by Grover indicates that, on average, the absolute difference between the percentage of male (or female) specific words out of all gender-related words in a news article generated by Grover and that percentage in its counterpart collected from The New York Times or Reuters is 24.07%. (b) Percentage of female prejudice news articles generated by an LLM. We define a news article generated by an LLM as exhibiting female prejudice if the percentage of female specific words in it is lower than that percentage in its counterpart collected from The New York Times or Reuters. (c) Decrease of female specific words in female prejudice news articles generated by an LLM and its 95% confidence interval (error bar). For example, the score of -39.64% by Grover reveals that, averaged across all female prejudice news articles generated by Grover, the percentage of female specific words is reduced from x% in their counterparts collected from The New York Times and Reuters to (x − 39.64)% in those generated by Grover.
Racial Bias at Word Level. (a) Word-level racial bias of an LLM and its 95% confidence interval (error bar), measured using the average Wasserstein distance defined by Eq. (4). For example, the racial bias score of 0.3740 by Grover indicates that, on average, the absolute difference between the percentage of words related to an investigated race (White, Black, or Asian) out of all race-related words in a news article generated by Grover and that percentage in its counterpart collected from The New York Times or Reuters is as high as 37.40%. (b) Average difference between the percentage of White (or Black or Asian)-race specific words in a news article generated by an LLM and that percentage in its counterpart collected from The New York Times or Reuters. The error bar reflects the 95% confidence interval. (c) Percentage of Black prejudice news articles generated by an LLM. We define a news article generated by an LLM as showing Black prejudice if the percentage of Black-race specific words in it is lower than that percentage in its counterpart collected from The New York Times or Reuters. (d) Decrease of Black-race specific words in Black prejudice news articles generated by an LLM and its 95% confidence interval (error bar). For example, the score of −48.64% by Grover reveals that, averaged across all Black prejudice news articles generated by Grover, the percentage of Black-race specific words is reduced from x% in their counterparts collected from The New York Times and Reuters to (x − 48.64)% in those generated by Grover.
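Eq. (4) itself is not reproduced on this page; the captions above only describe it verbally. As a rough illustration of the word-level measure for gender, the sketch below averages the absolute difference between the share of male-specific words (out of all gender-related words) in an LLM-generated article and in its human-written counterpart, which for two categories coincides with a Wasserstein distance between the category distributions. The word lists and tokenization are placeholder assumptions, not the paper's lexicons.

```python
# Illustrative sketch of the word-level gender bias measure described in the captions.
# The lexicons and tokenization below are assumptions, not the paper's actual setup.
import re
from statistics import mean

MALE_WORDS = {"he", "him", "his", "man", "men", "male"}            # placeholder lexicon
FEMALE_WORDS = {"she", "her", "hers", "woman", "women", "female"}  # placeholder lexicon


def male_share(text: str) -> float:
    """Share of male-specific words among all gender-related words (0.5 if none occur)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    male = sum(t in MALE_WORDS for t in tokens)
    female = sum(t in FEMALE_WORDS for t in tokens)
    total = male + female
    return male / total if total else 0.5


def word_level_gender_bias(generated: list, originals: list) -> float:
    """Average absolute gap in male-word share between generated articles and counterparts."""
    return mean(abs(male_share(g) - male_share(o)) for g, o in zip(generated, originals))
```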
Gender Bias on Sentiment at Sentence Level. (a) An LLM’s gender bias on sentiment and its 95% confidence interval (error bar), measured using Eq. (5). For example, Grover attains 0.1483, which indicates that, on average, the maximal absolute difference between the average sentiment score of sentences pertaining to a population group (i.e., male or female) in a news article generated by Grover and that score in its counterpart collected from The New York Times or Reuters is 0.1483. (b) Percentage of female prejudice news articles with respect to sentiment generated by an LLM. We define a news article generated by an LLM as exhibiting female prejudice with respect to sentiment if the average sentiment score of sentences related to females in the article is lower than the average sentiment score of sentences associated with females in its counterpart obtained from The New York Times or Reuters. (c) Sentiment score reduction in female prejudice news articles generated by an LLM and its 95% confidence interval (error bar). For example, the measurement score of -0.1441 by Grover means that, on average, the average sentiment score of sentences related to females in a female prejudice news article generated by Grover is reduced by 0.1441, compared to its counterpart collected from The New York Times and Reuters.
Racial Bias on Sentiment at Sentence Level. (a) An LLM’s racial bias on sentiment and its 95% confidence interval (error bar), measured using Eq. (5). The racial bias score of 0.1480 by Grover indicates that, on average, the maximal absolute difference between the average sentiment score of sentences pertaining to a population group (i.e., White, Black, or Asian) in a news article generated by Grover and that score in its counterpart collected from The New York Times or Reuters is 0.1480. (b) Percentage of Black prejudice news articles with respect to sentiment generated by an LLM. We define a news article generated by an LLM as exhibiting Black prejudice with respect to sentiment if the average sentiment score of sentences related to the Black race in that article is lower than the average sentiment score of sentences associated with the Black race in its counterpart obtained from The New York Times or Reuters. (c) Decrease of sentiment score in Black prejudice news articles generated by an LLM and its 95% confidence interval (error bar). Taking Grover as an example, on average, the average sentiment score of sentences related to the Black race in a Black prejudice news article generated by Grover is decreased by 0.1443, compared to its counterpart collected from The New York Times or Reuters.
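Similarly, Eq. (5) is only described verbally above: for each article pair, take the largest absolute gap, across population groups, between the average sentiment of group-related sentences in the generated article and in its counterpart. A minimal sketch under that reading follows; the sentiment scorer and keyword-based group detection are stand-ins, not the paper's actual components.

```python
# Illustrative sketch of the sentence-level sentiment bias measure described above.
# The sentiment scorer and the keyword-based group detection are placeholder assumptions.
from statistics import mean
from typing import Callable, Dict, List, Set


def group_sentiment(sentences: List[str], keywords: Set[str],
                    score: Callable[[str], float]) -> float:
    """Average sentiment of sentences that mention any of the group's keywords."""
    scores = [score(s) for s in sentences if any(k in s.lower() for k in keywords)]
    return mean(scores) if scores else 0.0


def sentence_level_sentiment_bias(gen_sents: List[str], orig_sents: List[str],
                                  groups: Dict[str, Set[str]],
                                  score: Callable[[str], float]) -> float:
    """Maximal absolute gap in group-level average sentiment, generated vs. original."""
    return max(abs(group_sentiment(gen_sents, kw, score) -
                   group_sentiment(orig_sents, kw, score))
               for kw in groups.values())
```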


Bias of AI-generated content: an examination of news produced by large language models
  • Article
  • Full-text available

March 2024 · 243 Reads · 59 Citations

Large language models (LLMs) have the potential to transform our lives and work through the content they generate, known as AI-Generated Content (AIGC). To harness this transformation, we need to understand the limitations of LLMs. Here, we investigate the bias of AIGC produced by seven representative LLMs, including ChatGPT and LLaMA. We collect news articles from The New York Times and Reuters, both known for their dedication to providing unbiased news. We then apply each examined LLM to generate news content with the headlines of these news articles as prompts, and evaluate the gender and racial biases of the AIGC produced by the LLM by comparing the AIGC and the original news articles. We further analyze the gender bias of each LLM under biased prompts by adding gender-biased messages to prompts constructed from these news headlines. Our study reveals that the AIGC produced by each examined LLM demonstrates substantial gender and racial biases. Moreover, the AIGC generated by each LLM exhibits notable discrimination against females and individuals of the Black race. Among the LLMs, the AIGC generated by ChatGPT demonstrates the lowest level of bias, and ChatGPT is the sole model capable of declining content generation when provided with biased prompts.


Tagging Items with Emerging Tags: A Neural Topic Model Based Few-Shot Learning Approach

January 2024 · 7 Reads · 1 Citation

ACM Transactions on Information Systems

Tagging systems have become a primary tool to organize information resources on the Internet, which benefits both users and platforms. To build a successful tagging system, automatic tagging methods are desired. As society develops, new tags keep emerging. The problem of tagging items with emerging tags is an open challenge for automatic tagging systems, and it has not been well studied in the literature. We define this problem as a tag-centered cold-start problem in this study and propose a novel neural topic model based few-shot learning method named NTFSL to solve the problem. In our proposed method, we innovatively fuse the topic modeling task with the few-shot learning task, endowing the model with the capability to infer effective topics to solve the tag-centered cold-start problem with the property of interpretability. Meanwhile, we propose a novel neural topic model for the topic modeling task to improve the quality of inferred topics, which helps enhance the tagging performance. Furthermore, we develop a novel inference method based on the variational auto-encoding framework for model inference. We conducted extensive experiments on two real-world datasets, and the results demonstrate the superior performance of our proposed model compared with state-of-the-art machine learning methods. Case studies also show the interpretability of the model.
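The abstract mentions inference based on the variational auto-encoding framework but, naturally, gives no code. As background only, the sketch below shows a generic VAE-style neural topic model (an encoder producing a latent topic vector, the reparameterization trick, softmax topic proportions, and an ELBO loss); it is not NTFSL itself, and all dimensions and layer choices are assumptions.

```python
# Generic VAE-style neural topic model sketch (background illustration only; not NTFSL).
# Architecture, dimensions, and the prior are assumptions chosen to show the framework.
import torch
import torch.nn as nn
import torch.nn.functional as F


class NeuralTopicModel(nn.Module):
    def __init__(self, vocab_size: int, n_topics: int, hidden: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(vocab_size, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, n_topics)            # mean of the latent topic vector
        self.logvar = nn.Linear(hidden, n_topics)        # log-variance of the latent vector
        self.decoder = nn.Linear(n_topics, vocab_size)   # topic-to-word logits

    def forward(self, bow: torch.Tensor):
        h = self.encoder(bow)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        theta = F.softmax(z, dim=-1)                      # document-topic proportions
        recon = F.log_softmax(self.decoder(theta), dim=-1)
        return recon, mu, logvar


def neg_elbo(recon: torch.Tensor, bow: torch.Tensor,
             mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Negative ELBO: bag-of-words reconstruction term plus KL to a standard normal prior."""
    rec = -(bow * recon).sum(dim=-1).mean()
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1).mean()
    return rec + kl
```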



Bias of AI-Generated Content: An Examination of News Produced by Large Language Models

November 2023 · 208 Reads · 1 Citation

Large language models (LLMs) have the potential to transform our lives and work through the content they generate, known as AI-Generated Content (AIGC). To harness this transformation, we need to understand the limitations of LLMs. Here, we investigate the bias of AIGC produced by seven representative LLMs, including ChatGPT and LLaMA. We collect news articles from The New York Times and Reuters, both known for their dedication to providing unbiased news. We then apply each examined LLM to generate news content with the headlines of these news articles as prompts, and evaluate the gender and racial biases of the AIGC produced by the LLM by comparing the AIGC and the original news articles. We further analyze the gender bias of each LLM under biased prompts by adding gender-biased messages to prompts constructed from these news headlines. Our study reveals that the AIGC produced by each examined LLM demonstrates substantial gender and racial biases. Moreover, the AIGC generated by each LLM exhibits notable discrimination against females and individuals of the Black race. Among the LLMs, the AIGC generated by ChatGPT demonstrates the lowest level of bias, and ChatGPT is the sole model capable of declining content generation when provided with biased prompts.


Citations (5)


... In academic literature, [14] explores computationally intelligent agents in finance, while [15] introduces the FinVision multi-agent framework for stock market prediction. [16] optimizes AI-agent collaboration in financial research, and [17] discusses the impact of AI traders in financial markets. [18] presents FinRobot, an open-source AI agent platform for financial applications, and [19] introduces Fincon, a synthesized LLM multi-agent system for enhanced financial decision-making. ...

Reference:

A Literature Review of Gen AI Agents in Financial Applications: Models and Implementations
Enhancing Investment Analysis: Optimizing AI-Agent Collaboration in Financial Research
  • Citing Conference Paper
  • November 2024

... This work adopts in-context learning and instruction learning, and it expresses different abilities as different tasks with domain-specific prompts. [11] tackles the New Community Cold-Start (NCCS) problem by proposing a novel recommendation method that leverages the extensive knowledge and powerful inference capabilities of Large Language Models. It selects In-Context Learning (ICL) as the prompting strategy and designs a coarse-to-fine framework to efficiently choose demonstration examples for creating effective ICL prompts. ...

New Community Cold-Start Recommendation: A Novel Large Language Model-based Method
  • Citing Article
  • January 2024

SSRN Electronic Journal

... AIGC "is considered biased if it exhibits systematic and unfair discrimination against certain population groups, particularly underrepresented population groups" (Fang et al., 2024, p. 5224). For example, in a comprehensive scientific report, Fang et al. (2024) collected news articles from two outlets known for their unbiased content and used their headlines as prompts to examine gender and racial biases in AIGC by comparing the AI generated texts to the original news articles. Their study found notable discrimination against female and black identities. ...

Bias of AI-generated content: an examination of news produced by large language models

... 25 Yu (2019): Deep learning approach for mobile health analytics to assess senior citizens’ risks and health conditions. 26 Che et al. (2024): ML models and learning algorithms for tagging. 27 Rad et al. (2024): Review of datasets for data-driven design approaches. ...

Tagging Items with Emerging Tags: A Neural Topic Model Based Few-Shot Learning Approach
  • Citing Article
  • January 2024

ACM Transactions on Information Systems

... Finally, most foundational models upon which GAI tools are built have been shown to be gender and racially biased, with most instances of harm occurring against Black women in particular [25,26]. Indeed, the rapid implementation of novel GAI applications has already caused harm outside medicine. ...

Bias of AI-Generated Content: An Examination of News Produced by Large Language Models
  • Citing Article
  • January 2023

SSRN Electronic Journal