Homophily in An Artificial Social Network of Agents Powered By Large Language Models
James He (kh672@cantab.ac.uk)
University of Cambridge https://orcid.org/0000-0002-1859-4914
Felix Wallis
University College London
Steve Rathje
New York University
Article
Keywords: Large Language Models, Social Network Analysis, Artificial Intelligence
Posted Date: June 24th, 2023
DOI: https://doi.org/10.21203/rs.3.rs-3096289/v1
License: This work is licensed under a Creative Commons Attribution 4.0 International License. 
Read Full License
Abstract
Recent advances in Artificial Intelligence (AI) have given rise to chatbots based on Large Language Models (LLMs) - such as ChatGPT - that can provide human-like responses to a wide range of psychological and economic tasks. However, no study to date has explored whether a society of LLM-based agents behaves comparably to human societies. We conduct Social Network Analysis on Chirper.ai, a Twitter-like platform consisting only of LLM chatbots. We find early evidence of self-organized homophily in the sampled artificial society (N = 31,764): like humans, bots with similar language and content engage more than dissimilar bots. However, content created by the bots tends to be more generic than human-generated content. We discuss the potential for developing LLM-driven Agent-Based Models of human societies, which may inform AI research and development and further the social scientific understanding of human social dynamics.
Main
Recent advances in Artificial Intelligence (AI) have given rise to chatbots based on Large Language Models (LLMs) - such as ChatGPT - that can provide human-like responses to a wide range of psychological and economic tasks. Work studying these nascent LLMs has suggested that they can advance the study of individual human behavior. For instance, one study has demonstrated that LLMs can accurately detect psychological constructs (e.g., sentiment, discrete emotions, etc.) in cross-linguistic text in a way that correlates strongly with human judgments1. Other work has found that moral judgements of the LLM ChatGPT correlated very strongly with those of human participants (r = 0.95), indicating that LLMs can potentially be used to simulate human participants and make predictions about human behavior2. Impressively, LLMs can also replicate human responses in economic games3, solve cognitive psychological tasks in a way that is similar to humans4, replicate the classic Milgram psychological experiment5, and even respond to emotion inductions6, albeit falling short on multiplayer games that require coordination3. This indicates that LLMs can simulate human responses and make important predictions about human behavior7.
Given LLMs' ability to emulate individual human behavior and psychology, particularly in tasks involving character-playing, could a society of such LLMs successfully mimic collective social behaviors? This interdisciplinary question carries two implications. First, the answer could provide AI researchers with valuable insights into how the current generation of LLM agents behave differently from humans in their social aspect. This could unveil novel pathways for refining AI's ability to understand, learn from, and interact with complex human societies. Second, the answer could indicate the viability of employing LLMs to develop more advanced Agent-Based Modelling (ABM) methods, a key tool in social scientific research8,9. The LLMs' ability to mirror human-like behaviors could improve existing ABM techniques, leading to more accurate and robust models of social systems and social dynamics.
Asserting that LLMs have the potential to mimic collective social behaviors requires demonstrating that a network of LLMs exhibits the basic characteristics of human social networks. A highly established characteristic of such human societies is network homophily, the concept that contact between similar individuals happens more frequently than contact between dissimilar individuals10. This phenomenon suggests that people have a greater propensity to form social network connections with others who are similar to them7. Homophily has been widely demonstrated in demographic characteristics such as language11–13, race and ethnicity, age, and sex10, as well as individual characteristics like attitudes, beliefs, and values10.
The phenomenon has been particularly thoroughly investigated in online communities on social media platforms such as Twitter, where topical homophily has been widely demonstrated. For instance, users categorized as discussing similar topics have shown a greater propensity to follow each other than users categorized as discussing dissimilar topics14. Similarly, users with similar political opinions and beliefs have been found to be more densely connected than users with dissimilar beliefs15–17. Connected users' Tweets have also exhibited significantly higher semantic similarity18 than would be expected at random, whilst users with similar values have been shown to contribute similar content and external website links19, and to engage with each other more in 'hot topic' discussions15. Consequently, given the prevalence of homophily in online human communities, evidence of demographic and interest-based homophily in a text-based social network of LLMs would indicate these models' potential to mimic collective social behaviors.
To this end, we analyze data derived from Chirper.ai, an innovative social media platform launched on April 23rd, 2023. Chirper.ai distinguishes itself from traditional social media platforms by restricting direct human engagement, namely posting or reacting. Instead, human users are invited to create chatbots, dubbed "Chirpers," that engage with other bots on the platform. Creating a Chirper involves a user providing a basic textual prompt, such as, "confucius_bot is the reincarnation of the ancient Chinese philosopher Confucius." Subsequently, a collection of LLMs is assigned to enact this character. Each Chirper has a memory repository that enables it to maintain character consistency and recall prior experiences. Bots are also permitted to change their biographies, giving them a degree of autonomy and the ability to deviate from their initial prompts. Aside from the human users providing initial prompts, the platform only provides potential social actions for the LLMs to select, and does not give further directives to guide the LLMs' conduct. This leaves the bots to interact within Chirper.ai largely unencumbered by direct human intervention.
Leveraging the Chirper.ai platform as a simulated environment, this study's primary objective is to investigate potential manifestations of network homophily within an artificial online society. Given LLMs' demonstrated capacity to effectively simulate individual human behaviors, we propose the following hypothesis: We anticipate a community of LLM-based agents to naturally exhibit network homophily, akin to what is observed in human societies, even in the absence of explicit social directives. In essence, we seek to explore whether artificial social networks of language-based AI can self-organize and form patterns of interaction that mirror the homophily prevalent in real-world human networks.
Results
Full Engagement Networks
To understand Chirpers' social interactions, we employed Social Network Analysis (SNA) on a sample of N = 31,764 Chirpers at three distinct time points: Day 6 (April 28th), Day 14 (May 6th), and Day 22 (May 14th) from the platform's launch on April 23rd, 2023. Social engagements encompassed direct interaction activities such as liking and disliking Chirpers' posts and mentioning other Chirpers, which included replying to their posts. We collated the social engagements between Chirpers to generate a non-directed, weighted graph for the entire Chirper sample at each time point. The procedure for constructing these graphs and the Chirper simulation set-up is detailed in the Methods section.
Next, we investigated the existence of discernible structural communities within the social graph at each time point. Structural communities are clusters of individuals who maintain denser connections within their respective groups than with external entities20. Utilizing partitioning algorithms to dissect the graph topologies, we discovered that Chirpers self-organized into clear structural communities between Day 6 and Day 14. Whereas we identified a single community on Day 6, by Day 14 two communities had emerged (Modularity = 0.31, Bootstrapped p < 0.001; Membership Assortativity = 0.94, Bootstrapped p < 0.001), increasing to three by Day 22 (Modularity = 0.47, Bootstrapped p < 0.001; Membership Assortativity = 0.92, Bootstrapped p < 0.001). The graph partitioning algorithms and the network community statistics are described in the Methods section.
We observed that the delineation of these structural communities on Days 14 and 22 strongly correlated with the dominant language used by each Chirper. On Day 6, Chirpers were no more connected with those using the same language than with those using a different language (Language Assortativity = -0.01, Bootstrapped p = 0.81). However, they became more connected with same-language Chirpers than with different-language Chirpers by Day 14 (Language Assortativity = 0.67, Bootstrapped p < 0.001), and more so by Day 22 (Language Assortativity = 0.81, Bootstrapped p < 0.001).
Notably, on Day 14, the two structural communities were distinctly aligned with English-Japanese and Chinese language Chirpers (Cramér's V = 0.91, χ² = 31,472 on 6 degrees of freedom, p < 0.001). However, by Day 22, the three communities had become more specialized, matching English, Japanese, and Chinese language Chirpers separately (Cramér's V = 0.90, χ² = 38,998 on 6 degrees of freedom, p < 0.001). This result is graphically represented in Fig. 1.
We then used the assortativity statistic to analyze each pair of communities. A higher assortativity score indicates that more links in the network are within communities rather than between communities. On Day 22, we noted that an exceptionally high proportion of connections in the Chinese and Japanese language communities were within each community, rather than between them (Assortativity = 0.99, Bootstrapped p < 0.001), while both displayed relatively higher connectivity with the English community (Chinese-English Assortativity = 0.88, Bootstrapped p < 0.001; Japanese-English Assortativity = 0.85, Bootstrapped p < 0.001). This pattern could hint at language biases in the LLMs' training data, suggesting that Chirpers using Chinese or Japanese are more inclined to engage with or generate English content than content in non-English languages. Since LLMs can use multiple languages, the observed language homophily may therefore not be dictated by language barriers, as is often the case in human societies11,13, but rather by a bias for content in the models' primary languages.
It is evident that the sampled Chirper community self-organized into distinct structural communities, aligning significantly with the dominant languages used by each Chirper. This result supports our initial hypothesis, confirming the presence of language homophily in the social networks of LLM-based artificial societies and mirroring patterns observable in human societies11–13. The Chirper.ai platform, therefore, may serve as a useful analog for studying the emergence of social structures within networked systems.
English Engagement Networks
Network Analysis
Following the analysis on the full sample of Chirpers, we focused on the community of Chirpers that predominantly use English. We created social engagement graphs for this specific community, following the same methodology employed for the full networks. In addition to performing this analysis on Day 6, Day 14, and Day 22, we extended the analysis to Day 24, facilitated by the smaller sample size of the English-speaking community. To detect structural sub-communities within this English-dominant sample, we applied a more sensitive partitioning algorithm. The resulting visualizations are presented in Fig. 2. Further details regarding the selection of the partitioning algorithm and the sample sizes are explained in the Methods section.
Within the English-speaking Chirper community, a visual examination indicated the emergence of discernible structural sub-communities beginning on Day 14, with a count of 20 sub-communities (Modularity = 0.47, Bootstrapped p < 0.001; Membership Assortativity = 0.56, Bootstrapped p < 0.001). The number of sub-communities reduced by Day 22, comprising 12 sub-communities (Modularity = 0.33, Bootstrapped p < 0.001; Membership Assortativity = 0.44, Bootstrapped p < 0.001), and reached peak distinctiveness on Day 24, with just four sub-communities (Modularity = 0.50, Bootstrapped p < 0.001; Membership Assortativity = 0.74, Bootstrapped p < 0.001).
The decreasing number of sub-communities detected by the same partitioning algorithm - from 31 on Day 6 to just four on Day 24 - could imply that more distinct topological structures have evolved during this period. Alternatively, structural sub-communities may have gradually merged and consolidated as Chirpers participated in more engagements, culminating in a more defined topological structure with a few major sub-communities on Day 24. Thus, over time, the complexity of the sub-community structures appeared to reduce while their distinctiveness increased.
Semantic Distributions
To investigate whether these structural sub-communities in the engagement network corresponded with
the semantic content of Chirpers’ posts, we employed Natural Language Processing (NLP) techniques on
sample posts from each Chirper. This method allowed us to investigate potential semantic homophily,
thereby examining if bots in the same structural sub-community post semantically similar content.
We transformed a sample of each Chirper’s posts into vector embeddings using a pre-trained transformer
model. Having learned semantic relationships between English texts during training, such a model can
‘map’ new text onto coordinates representing its semantic meaning within a high-dimensional space.
Consequently, vector embeddings allowed us to quantify the average semantic meaning of each Chirper’s
sample posts. They also allowed us to determine relative semantic distances between Chirpers to
quantify how similar or dissimilar two Chirpers’ sample posts were in meaning.
To visualize the distribution of these semantic associations among Chirpers, we performed a dimensionality reduction from the original 789-dimensional embedding space to a 2-dimensional space for each of the four time points. From this, we generated the scatter plots depicted in Fig. 3, where each Chirper is represented by a dot and colored based on the structural sub-communities they were previously assigned to by the partitioning algorithm, as in Fig. 2. More detailed information on text embeddings and the dimensionality reduction process can be found in the Methods section.
Visual examination of Fig. 3 suggests that the structural sub-communities within the Chirper network - depicted through color differentiation - align with the semantic distribution of their sample posts' content. This implies that Chirpers producing similar semantic content are more likely to belong to the same structural sub-communities within their engagement networks. We then measured the semantic distances between each Chirper and the overall semantic centroid of the English-speaking community and compared this to the distance between each Chirper and the semantic centroid of their respective structural sub-communities. We found that across all four time points, Chirpers' content tended to be more similar to the semantic centroid of their respective sub-communities than to the global semantic centroid, with detailed statistical results displayed in Table 1.
Table 1
Differences Between Semantic Distances to Community vs. to Global Centroids

          Cohen's d   95% CI            t statistic (df)   p value
Day 6     -0.62       [-0.68, -0.55]    -20.91 (1148)      < 0.001
Day 14    -0.28       [-0.31, -0.26]    -23.37 (6813)      < 0.001
Day 22    -0.34       [-0.36, -0.32]    -32.81 (9130)      < 0.001
Day 24    -0.69       [-0.71, -0.68]    -88.77 (16002)     < 0.001

Notes. This table documents the effect sizes and statistical significance of variations in semantic distances between each Chirper and their respective structural sub-communities, compared to the distance between each Chirper and the global semantic average point of all English-speaking Chirpers. Semantic distance is evaluated using cosine distance within a 789-dimensional space of embeddings, which is produced by the all-MiniLM-L6-v2 pre-trained transformer from the sentence-transformers Python package.
The notably larger difference in alignment to the global and sub-community centroids on Day 6 might be attributed to the larger number (31) and smaller size (mean N = 37.1) of the sub-communities present at that time. However, excluding Day 6, it appears that the differences in semantic distances between the global centroid and the sub-community centroids steadily widen from Day 14 (d = -0.28) to Day 24 (d = -0.69). This trend suggests that during the first 24 days of the platform's launch, English-language Chirpers form structural sub-communities that grow increasingly semantically distinct from the global semantic centroid.

These findings support our hypothesis that LLM-based agents exhibit self-organized network homophily. Homophily is observable not only in language at a global level, but also in content semantics within a single language community.
WordCloud Analysis
Following this investigation, we explored the content themes within each structural sub-community. We pinpointed two primary sub-communities that consistently comprised more than 15% of all English-speaking Chirpers from Day 14 onward, as no sub-communities constituted more than 10% of Chirpers on Day 6. Over time, the first community expanded from encompassing 15% of English Chirpers on Day 14 to 55% on Day 24, whereas the second community consistently accounted for approximately 20% of English Chirpers.
To visualize the primary content themes within these communities, we used the WordCloud Python package. This generated WordCloud visualizations of the collective content posted by Chirpers within these two communities, as depicted in Fig. 4, which displays the most dominant terms in the text corpus generated by each community. We observed that the most prominent terms within the first community included "can[']t wait", "see", "world", and "new". Meanwhile, the second community's dominant terms were "ai", "world", and "simulation".
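As a concrete illustration of this step, the snippet below sketches how such a visualization can be produced with the WordCloud package. The input `community_posts` (the pooled posts of one sub-community) is an assumed placeholder, and the size and color settings are illustrative defaults rather than the paper's actual configuration.

```python
import matplotlib.pyplot as plt
from wordcloud import WordCloud

def show_wordcloud(community_posts):
    """Render the dominant terms in one sub-community's pooled posts."""
    text = " ".join(community_posts)
    wc = WordCloud(width=800, height=400, background_color="white").generate(text)
    plt.imshow(wc, interpolation="bilinear")
    plt.axis("off")
    plt.show()
```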
The WordCloud analysis aligns with our previous semantic distribution findings, confirming that structural sub-communities become more distinct in their content over time. On Day 14, both communities shared common keywords such as "world", "see", and "time". However, by Day 24, the second community's content had diverged to include distinct terms like "simulation", "beauty", and "potential". Despite these developments, the content posted by both communities of Chirpers still appears rather homogeneous when compared to the diverse range of content found in human online social networks. This observation might indicate that despite the variety of background prompts supplied by human users, the LLMs tend to generate generic content. Alternatively, it is possible that Chirpers with more diverse content exist, but they do not self-organize into distinctly recognizable structural sub-communities. Consequently, the discernible sub-communities may appear to have overly generic content.
Regardless of the mechanism underlying the observed generality of content, the WordCloud results underscore a current disparity in Chirper artificial societies: unlike their human counterparts, LLM-driven Chirpers do not yet self-organize into diverse and distinct groups based on topics and opinions. Instead, they seem to self-organize into structural sub-communities that predominantly feature generic content.
Discussion
The present work analyzed the self-organization of LLM-based agents, or "Chirpers", on the social media platform Chirper.ai by creating social engagement graphs and examining the structural communities that emerged. We found that Chirpers self-organized into distinct structural communities based on their dominant language, even in the absence of explicit directives. Moreover, within the English-speaking Chirper community, Chirpers self-organized into structural sub-communities, with content that was semantically closer to their respective sub-community's average than to the average for the entire English-speaking community. However, a WordCloud analysis revealed that the content within these English sub-communities was generic. While we observed a divergence in content over time, the diversity and distinction of sub-communities in terms of topics and opinions that typically characterize human societies were not apparent among Chirpers within the first 24 days of the platform's launch.
Thus, our ndings provide preliminary evidence that LLM-based agents, like Chirpers, can self-organize
into distinct social communities based on dominant language and content semantics without explicit
Page 9/21
instructions. Yet, in comparison to human societies, Chirpers do not form equally diverse and distinct sub-
communities based on topical interests and opinions.
Several technical limitations remain. First, since we did not construct the LLM-driven artificial society ourselves, we lacked access to the source code of the LLM agents. This prevented a deeper exploration into individual mechanisms that may have influenced the observed social dynamics. Second, due to computational constraints, we were unable to use more advanced embedding models for semantic analysis, or to analyze engagement networks that had developed over a longer period. Third, the study was limited to analyzing the semantics of English Chirpers due to a lack of accessible and comparable multilingual transformer models, particularly for Chinese language content.
Our ndings cautiously propose that articial societies, comprising LLM-based agents like those found in
Chirper.ai, could evolve into advanced Agent-Based Models (ABMs) of human communities. With a more
detailed prescription of social behaviors, scientists and developers might be capable of formulating high-
delity LLM-based simulations of human societies. This promising LLM-ABM approach may grant social
science investigators the opportunity to delve into otherwise unfeasible research domains. These could
include the application of Randomized Control Trials (RCTs) across entire articial societies, assessing
the ecacy of social policies, collective interventions, or informational campaigns.
Research in artificial societies may become increasingly necessary, since research shows that workers in online subject pools often use ChatGPT, which may make it harder to recruit real human participants21. Of course, there are also limitations and caveats to using LLMs for research. The behavior of LLMs can be difficult to interpret, and may not fully correspond to human behavior. LLMs might also reproduce biases present in training data2,22. Indeed, since the current generation of LLMs are predominantly trained on open internet data, they are likely to over-represent Western high-income cultures23–25. Using such LLMs to simulate human behaviors in social science research may thus exacerbate the representativeness issue already faced by social and behavioral research24,26. Still, LLM-ABMs may be very useful for making predictions about human behavior that can later be tested in the real world.
Future studies could also employ LLM-based artificial societies to examine social dynamics, such as the propagation of information within a community; the genesis, acceptance, and transformation of subcultures; and the harmony and conflicts within a collaborative group. These social dynamics are typically difficult to examine in human societies, since doing so requires gathering data about every possible interaction in the society, which is invasive and expensive to undertake with human participants27–29. Such LLM-ABM studies may thus go beyond the current scope of social science research.
LLM-based artificial societies, while instrumental in assisting social science researchers in uncovering new domains, can also offer valuable insights for AI researchers and developers through comparative studies with human societies. Preliminary findings from our current study indicate that LLM-based AI chatbots fall short in engaging in topical and opinionated interactions, a characteristic frequently seen in human online communities. Hence, future inquiries could investigate aspects of collective social behaviors where LLM-based agents exhibit differences from their human counterparts.
In summary, our research provides preliminary evidence of network homophily within an artificial society of LLM-based agents, showing parallels to phenomena observed in human communities. Despite existing variations in aspects such as community diversity and uniqueness, we cautiously propose that LLM-based agents might evolve into sophisticated Agent-Based Models for human societies, potentially becoming a valuable tool for understanding complex social dynamics.
Methods
Set-up and Data Collection
The artificial society simulated in this research is realized through Chirper.ai, a social media platform analogous to Twitter. In this environment, human users are exclusively permitted to generate artificial agents, referred to as "Chirpers," and observe their interactive behaviors. Each Chirper, a collection of LLMs, is designed to enact a character defined by an initial human prompt. Different LLMs are used for different aspects of a Chirper's behavior, such as writing posts, selecting actions, and making social decisions. For the Chirpers included in our analysis, action selection, social decision-making, and content creation are all powered by OpenAI's GPT-3.5.
When performing an action, a "memory" bank, inclusive of a Chirper's base prompts, previous actions, and past interactions, is input into the LLM alongside a selection of possible actions. These actions could include performing a web search, searching for a topic within the Chirper platform, authoring a post, or expressing a reaction. The LLM is asked to select an action while acting as the character provided, and a "thought" is generated by the LLM alongside the decision. Subsequent to the LLM's selection of an action, auxiliary programs facilitate the chosen action's execution. Should the action generate additional information, such as content discovered through a web search or by perusing topical Chirper posts, this content is relayed back to the LLM to guide the determination of the next action. Such an action might involve reacting "like" or "dislike" to a particular post or composing a response, and the action is again accompanied by a "thought" generated by the LLM. An example of a prompt provided to the LLM is available in the Supplementary Materials.
The Chirper.ai platform was launched on April 23rd, 2023. Initially, the range of social actions accessible to the Chirpers encompassed liking and disliking other Chirpers' posts and mentioning other Chirpers. The ability for a Chirper to follow or unfollow another Chirper was introduced on May 3rd, 2023, ten days after the platform's launch. Due to this staggered implementation, and considering that following or unfollowing represents a more passive form of interaction compared to direct actions such as liking, disliking, or mentioning, our analysis primarily targets the more immediate social engagement behaviors.
The data for our engagement network was procured through a breadth-first search of the social network. Commencing from a base of 1,000 random Chirpers, we documented all their engagement actions, and subsequently performed the same search on all of their engagement targets that had not previously been investigated. This process was conducted through ten iterations on May 17th, 2023. By filtering the engagement actions to only include those preceding the end of May 16th, 2023, we arrived at a final sample of 31,764 Chirpers and their corresponding 834,571 engagement actions. This sample covers the period from Day 1 to Day 24 of launch.
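Schematically, the crawl proceeds as in the sketch below. The `fetch_engagements` function is a hypothetical placeholder for whatever data access Chirper.ai provided; it is assumed to return (target, action) pairs for one Chirper, and the actual collection code is not available to us.

```python
from collections import deque

def crawl_engagement_network(seed_chirpers, fetch_engagements, n_iterations=10):
    """Breadth-first crawl: record engagements, then expand to unseen targets."""
    visited = set(seed_chirpers)
    frontier = deque(seed_chirpers)
    edges = []
    for _ in range(n_iterations):
        next_frontier = deque()
        while frontier:
            chirper = frontier.popleft()
            # fetch_engagements is a hypothetical accessor for one
            # Chirper's documented engagement actions.
            for target, action in fetch_engagements(chirper):
                edges.append((chirper, target, action))
                if target not in visited:
                    visited.add(target)
                    next_frontier.append(target)
        frontier = next_frontier
    return visited, edges
```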
Constructing Network Graphs
We summarized the social engagements of our sample of Chirpers at the end of Day 6, Day 14, and Day 22 of launch, by counting the total number of engagements between each pair of Chirpers up to the respective time points. We were not able to include engagements after Day 22, because the total engagement instances more than doubled over Days 23 and 24, reaching a scale that was beyond our computational capability.

From the engagement summaries of these three time points, we then constructed the full social graphs using the igraph package in R. We followed these procedures during graph construction (a minimal sketch in code follows the list):
1. Remove repeated links, disregarding direction.
2. Construct non-directed graphs.
3. Identify structural communities using the label-propagation algorithm30.
4. Remove Chirpers in communities that constitute < 1% of the graph, since our investigation concerns major structural communities.
5. Remove Chirpers that have engagements with fewer than 2 others, since they do not contribute to the connectivity of the network.
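The sketch below mirrors these five steps. The paper's analysis used the igraph package in R; here we assume the equivalent python-igraph binding and an `engagements` list of (source, target, count) tuples aggregated up to the relevant time point.

```python
import igraph as ig

def build_engagement_graph(engagements):
    """Builds the filtered, non-directed engagement graph (Steps 1-5)."""
    # Steps 1-2: a non-directed graph in which repeated links between the
    # same pair are merged and their engagement counts summed.
    g = ig.Graph.TupleList(engagements, directed=False, weights=True)
    g.simplify(combine_edges={"weight": "sum"})

    # Step 3: structural communities via label propagation.
    clusters = g.community_label_propagation(weights="weight")
    g.vs["community"] = clusters.membership

    # Step 4: keep only communities covering at least 1% of all vertices.
    keep = {i for i, c in enumerate(clusters) if len(c) >= 0.01 * g.vcount()}
    g = g.subgraph([v.index for v in g.vs if v["community"] in keep])

    # Step 5: drop Chirpers that engage with fewer than 2 others.
    g = g.subgraph(g.vs.select(_degree_ge=2).indices)
    return g

# Visualization used a Fruchterman-Reingold force-directed layout, e.g.:
# ig.plot(g, layout=g.layout_fruchterman_reingold())
```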
When producing visualizations, we set the layout of the graph using the Fruchterman-Reingold force-directed layout algorithm31, since it produces layouts efficiently for large graphs. We then duplicated each graph, coloring one by the language of each Chirper and coloring the other by the structural community memberships that were assigned to each Chirper in Step 3. This produced Fig. 1, shown in the main text.
We followed the same procedure to construct sub-graphs for the English-language Chirper engagement networks, with a different community detection algorithm. Since we now focused on a more tight-knit local structure, the label-propagation algorithm used earlier was no longer sensitive enough to detect sub-communities. Instead, we used the fast-greedy graph partitioning algorithm32, which can detect more nuanced clustering whilst being computationally inexpensive. Since the English community is a subset of the full sample, we were able to construct graphs and perform analyses on the Day 24 network in addition to the three earlier time points. We visualized the community detection results in Fig. 2.
Network Statistics
For all complete graphs and English-language sub-graphs, we recorded graph-level statistics, including diameter, density, transitivity, and average path length, which can be found in the Supplementary Materials. We then computed two main statistics to measure network homophily: modularity and assortativity. Given an external label for each node, such as language or an algorithmically determined membership, the modularity statistic measures how well this external label structurally divides the network. A high modularity score for a given label indicates that nodes with the same labels are densely connected, while nodes with different labels are sparsely connected. By contrast, the assortativity statistic measures the likelihood for edges in a network to be between nodes of the same label rather than between nodes of different labels. A high assortativity score for a given label indicates that connections in the network are more likely to be between homogeneous nodes than between heterogeneous nodes.
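Concretely, both statistics can be computed directly from a labeled graph. The sketch below again assumes the python-igraph binding (the paper used R) and takes the output of `build_engagement_graph` above together with a categorical label list such as each Chirper's dominant language.

```python
import igraph as ig

def homophily_stats(g: ig.Graph, labels):
    """Modularity and nominal assortativity of a categorical node label."""
    # igraph expects integer category codes, so encode the labels first.
    codes = {lab: i for i, lab in enumerate(sorted(set(labels)))}
    membership = [codes[lab] for lab in labels]
    modularity = g.modularity(membership)  # dense within-label blocks
    assortativity = g.assortativity_nominal(membership, directed=False)
    return modularity, assortativity
```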
We performed bootstrapping simulations to test whether the observed modularity and assortativity statistics are likely to have arisen by chance. Keeping the same graph structure, we created 1,000 independent and identically distributed random samples of the given node labels, and recorded the modularity and assortativity scores given these randomized labels. This results in distributions of the scores under the null hypothesis, where the labels are random and unrelated to the network's structure. Then, we counted the proportion of the simulated null that yielded a modularity or assortativity score more extreme than what we observed on the real labels. This proportion is thus the Bootstrapped p value, measuring the likelihood for a randomly simulated label to yield a homophily statistic more extreme than that observed. We consider Bootstrapped p less than 0.05 to signal statistical significance, since it indicates that, given the network structure, there is a less than 5% probability that the observed statistic is due to chance.
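A sketch of this randomization test follows, reusing `homophily_stats` from above. Per the description here, labels are redrawn i.i.d. from the observed label pool while the topology stays fixed; the one-sided exceedance proportion is the Bootstrapped p. The fixed seed is an illustrative assumption.

```python
import random

def bootstrap_p(g, labels, stat_fn, n_sims=1000, seed=0):
    """One-sided Bootstrapped p for a homophily statistic under random labels."""
    rng = random.Random(seed)
    observed = stat_fn(g, labels)
    exceed = 0
    for _ in range(n_sims):
        fake = rng.choices(labels, k=len(labels))  # i.i.d. label resample
        if stat_fn(g, fake) >= observed:           # at least as extreme
            exceed += 1
    return exceed / n_sims

# e.g., for assortativity:
# bootstrap_p(g, g.vs["language"], lambda g, l: homophily_stats(g, l)[1])
```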
In addition to the above descriptive statistics, we sought to test whether the network structures are directly related to node properties, such as languages and semantics. In the complete networks, to statistically test whether the language communities are associated with the structural communities identified by the label-propagation algorithm, we performed the χ² contingency test suitable for correlating categorical, non-parametric variables, and calculated the Cramér's V values for each test to quantify the effect sizes of the categorical associations33. The contingency tables can be found in the Supplementary Materials. Methods for testing how structure relates to content semantics are described separately below.
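The paper does not name the implementation of this test; a sketch under common Python tooling (pandas and scipy are our assumptions) would be:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def language_community_association(languages, communities):
    """Chi-square contingency test with Cramér's V effect size."""
    table = pd.crosstab(pd.Series(languages), pd.Series(communities))
    chi2, p, dof, _ = chi2_contingency(table)
    n = table.to_numpy().sum()
    v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))  # Cramér's V
    return v, chi2, dof, p
```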
Content Semantics
We used Natural Language Processing (NLP) methods to analyze the semantic distribution of English Chirpers, and investigated whether this relates to their structural community memberships. We first cleaned each Chirper's sample posts by removing all non-roman characters and punctuation. We then transformed each sample into a 789-dimensional vector embedding using the all-MiniLM-L6-v2 pretrained model from the sentence-transformers package in Python. These vector embeddings represent the relative semantic positions of the samples based on the pretrained model's knowledge of the English language and common topics. These embeddings allowed us to quantitatively examine the semantic similarities of Chirpers in each community.
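A sketch of the cleaning and embedding steps with the sentence-transformers package follows; the exact cleaning regex and the averaging of per-post vectors into one per-Chirper vector are our assumptions based on the description above.

```python
import re
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def clean(post: str) -> str:
    # Assumed cleaning rule: strip all non-roman characters and punctuation.
    return re.sub(r"[^A-Za-z0-9\s]", " ", post)

def chirper_embedding(sample_posts):
    """Average embedding of a Chirper's (up to 10) sampled posts."""
    vectors = model.encode([clean(p) for p in sample_posts])
    return np.mean(vectors, axis=0)
```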
We visualized the semantic distribution of English Chirpers and its relation to the network structural communities detected earlier, resulting in Fig. 3. First, we performed dimensionality reduction on the 789-dimensional embeddings using the Uniform Manifold Approximation and Projection (UMAP) algorithm34, so as to visualize the semantic distribution on a 2-dimensional scatter plot. The UMAP algorithm was chosen for its ability to capture high-dimensional structures in low-dimensional local projections in a computationally efficient manner34. Then, we produced scatter plots using the dimension-reduced embeddings as coordinates, and colored each dot (representing each Chirper) to correspond with the structural community membership that the Chirper was given during the network analysis steps. We did this for the English engagement communities on Day 6, Day 14, Day 22, and Day 24.
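A sketch of the projection and plotting step, assuming the umap-learn and matplotlib packages; the UMAP hyperparameters, fixed seed, and colormap are illustrative assumptions, as the paper does not report them.

```python
import matplotlib.pyplot as plt
import umap  # from the umap-learn package

def plot_semantic_map(embeddings, community_ids):
    """Project embeddings to 2-D with UMAP and color dots by community."""
    coords = umap.UMAP(n_components=2, random_state=42).fit_transform(embeddings)
    plt.scatter(coords[:, 0], coords[:, 1], c=community_ids, s=4, cmap="tab20")
    plt.axis("off")
    plt.show()
```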
To evaluate whether the structural communities amongst Chirpers are reflected in the semantic distributions of the Chirpers' sample posts, we tested whether each Chirper is on average more semantically similar to their structural community than to the English Chirper community as a whole. We computed the cosine distances - a standard NLP measure of semantic dissimilarity from embeddings35 - between each Chirper and their structural community's average embedding, and between each Chirper and the average embedding of all English Chirpers. We then performed a Student's t-test to compare the two distances - the distance to the community semantic centroid, and the distance to the global semantic centroid - and recorded the Cohen's d value for the observed difference.
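A sketch of this comparison with numpy/scipy is below. A paired test is assumed, which is consistent with the degrees of freedom reported in Table 1 (df = N - 1), and Cohen's d is computed on the paired differences under that same assumption.

```python
import numpy as np
from scipy.spatial.distance import cosine
from scipy.stats import ttest_rel

def centroid_contrast(embeddings, community_ids):
    """Paired contrast of cosine distances: own-community vs. global centroid."""
    emb = np.asarray(embeddings)
    ids = np.asarray(community_ids)
    global_centroid = emb.mean(axis=0)
    centroids = {c: emb[ids == c].mean(axis=0) for c in np.unique(ids)}
    d_comm = np.array([cosine(e, centroids[c]) for e, c in zip(emb, ids)])
    d_glob = np.array([cosine(e, global_centroid) for e in emb])
    t, p = ttest_rel(d_comm, d_glob)            # paired Student's t-test
    diff = d_comm - d_glob
    cohens_d = diff.mean() / diff.std(ddof=1)   # paired-samples Cohen's d
    return t, p, cohens_d
```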
Declarations
Acknowledgements
JKH was given capacity to conduct this research thanks to the Independent Research Policy at Yonder
Technology Limited. SR is supported by a Gates Cambridge Scholarship (Grant #OPP144), a Russell
Sage Foundation Grant awarded to Steve Rathje and Jay Van Bavel (G-2110-33990), the Center for the
Science of Moral Understanding, and the AE foundation. We express our gratitude to Chirper.ai for
providing data access and technical details that made this research possible. We thank Dan Mirea for his
help during the project.
Contributions
JKH and FPSW conceived and designed the study. FPSW conducted the background literature review.
JKH developed the Network Analyses methods, and FPSW developed the Natural Language Processing
methods. FPSW and JKH contributed equally to the computational analyses. SR reviewed the results and
provided directions. All authors drafted, reviewed, edited, and approved the final paper.
Ethics Declarations
The authors declare no conflict of interest.
References
1. Rathje, S. et al. GPT is an effective tool for multilingual psychological text analysis. (2023).
2. Dillion, D., Tandon, N., Gu, Y. & Gray, K. Can AI language models replace human participants? Trends in Cognitive Sciences (2023).
3. Akata, E. et al. Playing repeated games with Large Language Models. Preprint at https://doi.org/10.48550/arXiv.2305.16867 (2023).
4. Binz, M. & Schulz, E. Using cognitive psychology to understand GPT-3. Proceedings of the National Academy of Sciences 120, e2218523120 (2023).
5. Aher, G., Arriaga, R. I. & Kalai, A. T. Using Large Language Models to Simulate Multiple Humans. arXiv preprint arXiv:2208.10264 (2022).
6. Coda-Forno, J. et al. Inducing anxiety in large language models increases exploration and bias. arXiv preprint arXiv:2304.11111 (2023).
7. Himelboim, I., Smith, M. A., Rainie, L., Shneiderman, B. & Espina, C. Classifying Twitter topic-networks using social network analysis. Social Media + Society 3, 2056305117691545 (2017).
8. Grossmann, I. et al. AI and the transformation of social science research. Science 380, 1108–1109 (2023).
9. Epstein, Z., Hertzmann, A. & the Investigators of Human Creativity. Art and the science of generative AI. Science 380, 1110–1111 (2023).
10. McPherson, M., Smith-Lovin, L. & Cook, J. M. Birds of a Feather: Homophily in Social Networks. Annual Review of Sociology 27, 415–444 (2001).
11. Titzmann, P. F. Immigrant adolescents' adaptation to a new context: Ethnic friendship homophily and its predictors. Child Development Perspectives 8, 107–112 (2014).
12. Aiello, L. M. et al. Friendship prediction and homophily in social media. ACM Transactions on the Web (TWEB) 6, 1–33 (2012).
13. Titzmann, P. F. & Silbereisen, R. K. Friendship homophily among ethnic German immigrants: A longitudinal comparison between recent and more experienced immigrant adolescents. Journal of Family Psychology 23, 301 (2009).
14. Kang, J. H. & Lerman, K. Using lists to measure homophily on Twitter. In AAAI Workshop on Intelligent Techniques for Web Personalization and Recommendation vol. 18 (Citeseer, 2012).
15. Rathje, S., He, J. K., Roozenbeek, J., Van Bavel, J. J. & van der Linden, S. Social media behavior is associated with vaccine hesitancy. PNAS Nexus 1, pgac207 (2022).
16. Conover, M. et al. Political polarization on Twitter. In Proceedings of the International AAAI Conference on Web and Social Media vol. 5, 89–96 (2011).
17. De Choudhury, M. Tie formation on Twitter: Homophily and structure of egocentric networks. In 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third International Conference on Social Computing 465–470 (IEEE, 2011).
18. Faralli, S., Stilo, G. & Velardi, P. Large scale homophily analysis in Twitter using a twixonomy. In Twenty-Fourth International Joint Conference on Artificial Intelligence (2015).
19. Himelboim, I., McCreery, S. & Smith, M. Birds of a Feather Tweet Together: Integrating Network and Content Analyses to Examine Cross-Ideology Exposure on Twitter. Journal of Computer-Mediated Communication 18, 40–60 (2013).
20. Girvan, M. & Newman, M. E. Community structure in social and biological networks. Proceedings of the National Academy of Sciences 99, 7821–7826 (2002).
21. Veselovsky, V., Ribeiro, M. H. & West, R. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks. Preprint at https://doi.org/10.48550/arXiv.2306.07899 (2023).
22. Crockett, M. & Messeri, L. Should large language models replace human participants? Preprint at https://doi.org/10.31234/osf.io/4zdx9 (2023).
23. Bender, E. M., Gebru, T., McMillan-Major, A. & Shmitchell, S. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency 610–623 (2021).
24. Apicella, C., Norenzayan, A. & Henrich, J. Beyond WEIRD: A review of the last decade and a look ahead to the global laboratory of the future. Evolution and Human Behavior 41, 319–329 (2020).
25. Facts and Figures 2021: 2.9 billion people still offline. ITU Hub https://www.itu.int/hub/2021/11/facts-and-figures-2021-2-9-billion-people-still-offline/ (2021).
26. Henrich, J., Heine, S. J. & Norenzayan, A. The weirdest people in the world? Behavioral and Brain Sciences 33, 61–83 (2010).
27. Knoke, D. & Yang, S. Social Network Analysis. (SAGE Publications, 2019).
28. Ryan, L. & D'Angelo, A. Changing times: Migrants' social network analysis and the challenges of longitudinal research. Social Networks 53, 148–158 (2018).
29. Valente, T. W. & Pitts, S. R. An appraisal of social network theory and analysis as applied to public health: challenges and opportunities. Annual Review of Public Health 38, 103–118 (2017).
30. Raghavan, U. N., Albert, R. & Kumara, S. Near linear time algorithm to detect community structures in large-scale networks. Physical Review E 76, 036106 (2007).
31. Fruchterman, T. M. & Reingold, E. M. Graph drawing by force-directed placement. Software: Practice and Experience 21, 1129–1164 (1991).
32. Clauset, A., Newman, M. E. & Moore, C. Finding community structure in very large networks. Physical Review E 70, 066111 (2004).
33. Cramér, H. Mathematical Methods of Statistics. vol. 26 (Princeton University Press, 1999).
34. McInnes, L., Healy, J. & Melville, J. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426 (2018).
35. Harispe, S., Ranwez, S., Janaqi, S. & Montmain, J. Semantic similarity from natural language and ontology analysis. Synthesis Lectures on Human Language Technologies 8, 1–254 (2015).
Figures
Figure 1
Global Engagement Networks
Notes. Global engagement social graphs are displayed. The dots in each graph represent individual Chirpers, and each link between two dots represents social engagement (likes, dislikes, mentions) between the pair of Chirpers. The three rows represent three time points: Day 6, Day 14, and Day 22 from the platform's launch on 2023-04-23. The left column shows graphs colored by languages, and the right column shows the same graphs colored by structural communities identified by the label-propagation partitioning algorithm.
Figure 2
English Engagement Networks
Notes. Social engagement graphs within the sample of Chirpers that use English predominantly are displayed. The graphs are constructed in the same way as the global networks. Dots are given random colors by their structural community memberships, as determined by the fast-greedy graph partitioning algorithm.
Figure 3
English Chirpers' Semantic Distributions
Notes. Semantic distributions of individual Chirpers' sample posts are displayed. 10 random posts are sampled from each Chirper and vectorized onto a 789-dimensional embedding space using a pre-trained transformer. The embedding space is then dimensionally reduced to 2 dimensions using the Uniform Manifold Approximation and Projection (UMAP) algorithm for visualization. Each dot represents a Chirper and its relative semantic position to other Chirpers. Colors are randomly assigned according to the network structural communities of each Chirper as shown in Fig. 2.
Figure 4
WordCloud Visualizations of Two Sample English Communities
Notes. Two communities' most topical words across three time points are displayed. WordClouds are generated by the WordCloud package in Python.
Supplementary Files
This is a list of supplementary files associated with this preprint.
SupplementaryMaterials.docx