All content in this area was uploaded by Molood Arman on May 21, 2025
MAPS: Modeling Co-Existing Subjective
Perspectives and Shared Meaning in Multi-Agent
Cognitive Dialogue
Molood Arman
arman.molood@gmail.com
Clément Bonnafous
cbonnafo31@guardiaschool.fr
Preprint – This is a preliminary version. A formal submission is in
progress.
Abstract
Human dialogue involves more than exchanging information—it expresses be-
liefs, emotions, and subjective cognitive styles. Yet current AI dialogue systems
often enforce semantic uniformity, sacrificing diversity and interpretability. We
present MAPS (Multi-Agent Perspective Spaces), a novel framework that mod-
els dialogue between cognitively distinct agents through domain-weighted profiles,
dynamic GRU-based memory, and interpretable token-level attention. MAPS en-
ables agents to maintain individualized reasoning while progressively converging
on shared meaning. Evaluations on EmpatheticDialogues, TopicalChat, and MultiWOZ
show that MAPS supports semantic alignment without collapsing subjectivity.
Our results demonstrate a path toward cognitively grounded, interpretable
dialogue systems that balance expressiveness and coherence.
Keywords: multi-agent dialogue, subjective reasoning, interpretability, shared mean-
ing, cognitive AI
1 Introduction
Dialogue is more than information exchange — it is a delicate process of negotiating
meaning between diverse subjective worlds. Human conversations are shaped not only
by the need to converge on shared understanding, but also by individual beliefs, emo-
tional tones, cognitive biases, and interpretative stances. These subjective elements
enrich dialogue, fostering empathy, creativity, and interpretability. Yet, most current
AI dialogue systems treat language generation as a purely semantic task—optimizing
for coherence and accuracy while suppressing the cognitive heterogeneity that defines
real-world communication.
Existing approaches to dialogue modeling typically fall into two broad paradigms.
On one side, large-scale neural dialogue systems generate fluent responses, but treat
agents as stateless transformers of input, devoid of stable internal perspectives or reflec-
tive processes. On the other, symbolic and modular systems offer greater transparency
and control, but struggle with open-domain flexibility and generalization. Critically,
neither approach effectively addresses the dynamic balance between subjective real-
ization and shared meaning — a balance that is essential for human-like conversational
intelligence.
In this work, we introduce MAPS (Multi-Agent Perspective Spaces), a novel
framework for modeling co-existing subjective perspectives within collaborative di-
alogue. Rather than enforcing strict semantic convergence or collapsing agent indi-
viduality, MAPS allows cognitively diverse agents to retain distinct internal profiles
while interacting in structured exchanges aimed at mutual understanding. Inspired by
philosophical traditions of deliberative dialogue and cognitive theories of distributed
reasoning, MAPS operationalizes subjectivity through three key mechanisms: domain-weighted
perspective adapters, dynamic GRU-based memory modules, and token-level
attention visualizations.
These components collectively enable MAPS agents to reason, adapt, and con-
tribute based on individualized cognitive profiles while remaining aligned to a shared
semantic space. The result is a cognitively interpretable framework in which internal
reasoning pathways—such as domain influence and attention salience—can be ana-
lyzed and visualized, providing transparent insights into agent decision-making.
We validate MAPS through extensive experiments across three domains: emotion-
ally nuanced conversations (EmpatheticDialogues), open-domain discussions (Topi-
calChat), and task-oriented interactions (MultiWOZ). Results show that MAPS agents
achieve semantic convergence without sacrificing individuality, producing responses
that are both effective and human-like in their diversity. We also conduct ablation stud-
ies and cross-domain evaluations to demonstrate the robustness and generality of our
approach.
Contributions. Our key contributions are as follows:
• We propose MAPS, a cognitively grounded multi-agent dialogue framework that
enables subjective reasoning diversity and shared meaning construction through
domain-weighted profiles, dynamic memory, and perspective-conditioned gen-
eration.
• We introduce an interpretability mechanism that makes visible the cognitive pro-
cesses of dialogue agents, including domain influence, token salience, and sub-
jective divergence.
• We conduct comprehensive evaluations across emotionally expressive, open-
domain, and goal-driven dialogues, demonstrating that MAPS supports semantic
coherence without collapsing agent individuality.
By modeling agents as cognitively aware and socially situated participants—capable
of acting, reflecting, and negotiating meaning—MAPS bridges the gap between sym-
bolic reasoning and large-scale neural generation, offering a new path toward human-
aligned, interpretable dialogue systems.
2 Related Work
Interpretability in Dialogue Models. Interpretability in dialogue models has been
approached through latent variable modeling and explicit dialogue planning. Hudeček
and Dušek [5] proposed a variational RNN that learns discrete latent dialogue acts,
yielding interpretable action codes such as ask price or confirm. Similarly, Dialogue
Distillery [1] distills large language models into symbolic flow graphs, combining neu-
ral flexibility with explicit plan representations. While effective, these approaches often
constrain open-endedness to achieve interpretability.
Memory-Augmented and Cognitive Architectures. Explicit memory systems
and cognitive-inspired modules have been proposed to enhance dialogue reasoning
transparency. Hou et al. (2024) [4] integrated human-like memory hierarchies into
LLMs, while Sumers et al. (2024) [10] introduced CoALA, a blueprint featuring mod-
ular memory and internal planners. Personality-based theory-of-mind approaches [12]
further improved adaptivity in negotiation tasks by inferring interlocutor traits. How-
ever, these methods typically focus on single-agent reasoning.
Empathy and Cognitive Reasoning. Emotion-aware dialogue models aim to ex-
plain empathetic responses via intermediate reasoning stages. Commonsense-augmented
models like CEM [9] infer user intent and emotions, whereas chain-of-thought reason-
ing frameworks [14] explicitly model user emotional causes and support strategies.
While interpretable, these systems mainly address empathy generation within single-
agent paradigms.
Multi-Agent Dialogue Systems. Research on multi-agent dialogue spans coop-
erative tasks, negotiation, and emergent communication. Early referential games [6]
demonstrated agents’ ability to develop communication protocols, though often with-
out human interpretability. CAMEL [7] and Solo Performance Prompting [11] ex-
plored role-based collaborative LLM dialogues, showing improved task-solving capa-
bilities. Competitive dialogue agents, such as CICERO (Meta AI, 2022) [2], modelled
strategic negotiation but often lacked transparent reasoning traces. Existing multi-agent
systems thus focus either on cooperation or competition, with limited attention to per-
sistent agent individuality and subjective reasoning.
Summary. While prior work has introduced valuable approaches for interpretable
single-agent and multi-agent dialogue, most systems either constrain flexibility or ne-
glect cognitive individuality among agents. In the following section, we introduce
MAPS, a multi-agent framework that advances cognitive interpretability by modeling
stable agent personalities, domain-weighted subjectivity, and Socratic critique to sim-
ulate nuanced, human-like collaborative reasoning.
3 MAPS: Model and Methods
MAPS (Multi-Agent Perspective Spaces) models dialogue as a collaborative reasoning
process among multiple cognitively diverse agents. Each agent maintains a subjective
perspective while contributing to the construction of a shared understanding. In this
section, we describe the key components of the model.
3.1 Shared Semantic Space
To facilitate semantic alignment, MAPS aggregates agent-specific embeddings into a
unified representation. Given per-agent token embeddings $E^{(i)}_t$ at dialogue turn $t$ for
each agent $i = 1, \dots, N$, the shared semantic embedding is computed as:
\[ E^{\text{shared}}_t = \frac{1}{N} \sum_{i=1}^{N} E^{(i)}_t. \]
This shared space serves as the common ground, providing a reference point for
agents to align while retaining individual perspectives.
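The averaging step can be illustrated with a minimal sketch. It assumes each agent's embedding for a turn has already been pooled into a single fixed-length vector; the function name `shared_embedding` and the toy 3-dimensional values are illustrative and do not come from the paper.

```python
from typing import List

Vector = List[float]


def shared_embedding(agent_embeddings: List[Vector]) -> Vector:
    """Element-wise mean of the per-agent embeddings for one dialogue turn."""
    if not agent_embeddings:
        raise ValueError("need at least one agent embedding")
    dim = len(agent_embeddings[0])
    if any(len(e) != dim for e in agent_embeddings):
        raise ValueError("all agent embeddings must have the same dimension")
    n = len(agent_embeddings)
    return [sum(e[k] for e in agent_embeddings) / n for k in range(dim)]


# Two agents with toy 3-dimensional embeddings (hypothetical values).
e_agent1 = [1.0, 0.0, 2.0]
e_agent2 = [3.0, 2.0, 0.0]
e_shared = shared_embedding([e_agent1, e_agent2])
print(e_shared)
```

Because the mean weights every agent equally, no single profile dominates the common reference point.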
3.2 Domain-Conditioned Perspective Adapter
While $E^{\text{shared}}_t$ captures collective meaning, each agent conditions its private realization
based on its domain expertise and personality traits. Each agent $i$ is associated with
a domain weight vector $w^{(i)} \in [0,1]^n$ encoding its relevance and priority across $n$
cognitive or task domains. The agent’s subjective interpretation is then produced via:
\[ E^{(i)}_{\text{private},t} = \mathrm{MLP}\big(E^{\text{shared}}_t \oplus w^{(i)}\big), \]
where $\oplus$ denotes vector concatenation and MLP is a learnable multilayer perceptron.
This mechanism introduces agent-specific biases into the otherwise shared representation,
enabling subjective realization.
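The concatenate-then-transform step can be sketched in plain Python as follows. Several details are assumptions made for illustration only: the tanh activation, the random initialization, the toy dimensions, the use of one adapter shared by both agents, and the three hypothetical domains (emotional, existential, logical). None of these are specified in the text.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Affine map; `weight` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    """Random weights and zero biases (illustrative initialization)."""
    weight = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weight, [0.0] * n_out


class PerspectiveAdapter:
    """Two-layer MLP applied to the concatenation [e_shared ; domain_weights]."""

    def __init__(self, embed_dim: int, n_domains: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.layer1 = init_layer(embed_dim + n_domains, hidden_dim, rng)
        self.layer2 = init_layer(hidden_dim, embed_dim, rng)

    def __call__(self, e_shared: Vector, domain_weights: Vector) -> Vector:
        x = list(e_shared) + list(domain_weights)  # vector concatenation
        hidden = [math.tanh(v) for v in linear(x, *self.layer1)]
        return linear(hidden, *self.layer2)


adapter = PerspectiveAdapter(embed_dim=4, n_domains=3, hidden_dim=8)
e_shared = [0.2, -0.1, 0.4, 0.0]
w_spiritual = [0.9, 0.8, 0.1]  # hypothetical weights: emotional, existential, logical
w_rational = [0.1, 0.2, 0.9]
e_private_1 = adapter(e_shared, w_spiritual)
e_private_2 = adapter(e_shared, w_rational)
```

The two agents receive the same shared embedding, so any difference between `e_private_1` and `e_private_2` comes entirely from their domain weight vectors.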
3.3 Dynamic Memory via GRU
To enable temporal reasoning and contextual awareness, each agent maintains a dynamic
memory state. The private representation $E^{(i)}_{\text{private},t}$ is integrated into the agent’s
recurrent state using a gated recurrent unit (GRU):
\[ h^{(i)}_t = \mathrm{GRU}\big(E^{(i)}_{\text{private},t},\, h^{(i)}_{t-1}\big), \]
where $h^{(i)}_{t-1}$ is the agent’s memory from the previous turn. This state encodes the
agent’s evolving perspective and serves as the basis for generating responses and further
reasoning.
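For reference, the update inside a GRU cell can be written out explicitly. The form below is the textbook GRU formulation, given here as an assumption because the text does not say which variant MAPS uses. The symbol $x_t$ stands for the private representation, and the matrices $W_\ast$, $U_\ast$ and biases $b_\ast$ are generic learnable parameters that do not appear in the original.

```latex
\begin{aligned}
x_t &= E^{(i)}_{\text{private},t} \\
z_t &= \sigma\big(W_z x_t + U_z h^{(i)}_{t-1} + b_z\big) && \text{(update gate)} \\
r_t &= \sigma\big(W_r x_t + U_r h^{(i)}_{t-1} + b_r\big) && \text{(reset gate)} \\
\tilde{h}_t &= \tanh\big(W_h x_t + U_h (r_t \odot h^{(i)}_{t-1}) + b_h\big) && \text{(candidate state)} \\
h^{(i)}_t &= (1 - z_t) \odot h^{(i)}_{t-1} + z_t \odot \tilde{h}_t && \text{(new memory)}
\end{aligned}
```

The update gate $z_t$ controls how much of the previous turn's memory is overwritten by the current perspective-conditioned input. Library implementations differ in small details, such as swapping the roles of $z_t$ and $1 - z_t$.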
3.4 Token-Level Self-Attention
Within each agent, token-level self-attention is applied to $E^{(i)}_{\text{private},t}$ to model intra-agent
salience and focus. The resulting attention weights highlight which tokens the
agent considers most relevant during reasoning and generation. This mechanism pro-
vides interpretable insights into the internal decision-making pathways of each agent,
enabling analysis of how individual words and concepts influence the final output.
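A minimal sketch of how token-level attention weights can be computed and summarized is given below. It assumes plain scaled dot-product attention with no learned query/key projections and uses toy 2-dimensional token vectors; the actual attention parameterization in MAPS is not specified in the text, and the helper names are illustrative.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_weights(tokens: List[Vector]) -> List[Vector]:
    """Row q holds the attention that token q pays to every token."""
    scale = math.sqrt(len(tokens[0]))
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in tokens])
        for q in tokens
    ]


def token_salience(weights: List[Vector]) -> Vector:
    """Average attention received by each token (column mean)."""
    n = len(weights)
    return [sum(row[j] for row in weights) / n for j in range(n)]


tokens = ["she", "is", "gone"]
embeddings = [[0.1, 0.0], [0.0, 0.1], [1.0, 1.0]]  # hypothetical token vectors
weights = self_attention_weights(embeddings)
salience = token_salience(weights)
most_salient = tokens[salience.index(max(salience))]
```

Plotting `weights` as a heatmap, or `salience` as a bar chart per agent, gives the kind of token-level view described above.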
3.5 Interpretability by Design
Each module in MAPS contributes to cognitive interpretability: the shared semantic
space reflects collective meaning, the perspective adapter models subjective biases,
the memory module captures agent histories, and self-attention reveals token-level
salience. Together, they enable MAPS to produce dialogues that are both pragmati-
cally coherent and reflective of diverse internal viewpoints.
4 Experimental Setup
4.1 Dataset
We conduct experiments using the EmpatheticDialogues corpus [8]¹, a benchmark
dataset containing emotionally rich and subjective conversational exchanges. This
dataset was selected to evaluate MAPS’s ability to model individualized agent per-
spectives in the presence of complex emotional and contextual cues.
4.2 Agent Profiles
To simulate cognitive and affective diversity, we define two distinct agent profiles with
complementary characteristics:
• Spiritual / High Emotion: Emphasizes emotional, existential, and empathetic
dimensions, reflecting a subjective and affect-oriented reasoning style.
• Rational / Low Emotion: Prioritizes logical and confidence-related dimensions,
embodying a detached and analytical reasoning stance.
These profiles are instantiated through predefined domain weight vectors applied
in the perspective adapter module, encoding each agent’s subjective emphasis across
cognitive domains.
4.3 Model Architecture
MAPS integrates multiple modules to support subjective realization and cognitive in-
terpretability:
• Encoder: Sentence-BERT (all-MiniLM-L6-v2; 384-dimensional) is used to produce
token-level and utterance-level embeddings for each dialogue turn, providing
a compact yet semantically rich representation.
• Perspective Adapter: A two-layer multilayer perceptron (MLP) receives concatenated
shared representations and agent-specific domain weights, transforming
them into private, perspective-conditioned embeddings.
¹ https://huggingface.co/datasets/empathetic_dialogues
• Memory Module: Each agent maintains a dynamic internal state using a one-layer
gated recurrent unit (GRU; hidden size 384) to capture temporal dependencies
and evolving subjective viewpoints.
• Response Generator: FLAN-T5 Large, an instruction-tuned encoder-decoder language
model, is used to generate natural language responses conditioned on the agent’s private
representation and contextual cues. The generator is guided using few-shot
prompting to align output style with agent profiles.
4.4 Training Procedure
During training, domain weights for each agent are optimized using the Adam opti-
mizer (learning rate 0.05) over 20 epochs. The training objective minimizes inter-agent
embedding distance while preserving profile-driven subjectivity, ensuring agents con-
verge semantically without collapsing individuality.
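The objective is described only in words. One way such an objective could be written, offered purely as an illustrative assumption and not as the authors' loss, is a convergence term on the hidden states plus a penalty that keeps each learned weight vector close to its predefined profile. Here $w^{(i)}_0$ (the predefined profile for agent $i$), the trade-off coefficient $\lambda$, and the number of turns $T$ are hypothetical symbols introduced for this sketch.

```latex
\mathcal{L}\big(w^{(1)}, w^{(2)}\big)
  = \underbrace{\frac{1}{T} \sum_{t=1}^{T}
      \big\lVert h^{(1)}_t - h^{(2)}_t \big\rVert_2^2}_{\text{inter-agent distance}}
  \; + \;
  \lambda \underbrace{\sum_{i=1}^{2}
      \big\lVert w^{(i)} - w^{(i)}_0 \big\rVert_2^2}_{\text{profile preservation}}
```

Under this reading, a larger $\lambda$ would favor individuality over convergence. The weights would also need to remain in $[0,1]^n$, for example by clipping after each Adam step.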
4.5 Interpretability and Visualization
In addition to textual outputs, MAPS produces interpretable intermediate signals. Token-
level self-attention visualizations reveal which parts of the input influenced each agent’s
reasoning, while domain influence plots illustrate how cognitive profiles shape private
representations. These analyses offer insights into the balance between semantic con-
vergence and subjective realization achieved by each agent during dialogue generation.
5 Evaluation and Results
5.1 Evaluation Metrics
To assess MAPS’s ability to balance semantic alignment and subjective individuality,
we employ three complementary metrics:
• Semantic Bias: The Euclidean distance $\lVert h^{(1)}_t - h^{(2)}_t \rVert$ between agent hidden
states, measured per dialogue turn and averaged across epochs. Lower values
indicate stronger semantic convergence.
• Distinct-2: A lexical diversity metric measuring the ratio of unique bigrams to
total bigrams in generated responses. Higher scores reflect more varied and less
templated responses.
• Subjectivity Score: The mean squared distance between private and shared representations,
computed as $\frac{1}{N} \sum_i \lVert E^{(i)}_{\text{private}} - E^{\text{shared}} \rVert^2$. Higher values indicate
greater subjective realization per agent.
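The three metrics can be sketched in a few lines of Python. The whitespace tokenization with lowercasing, the pooling of bigrams across all responses, and the function names are assumptions made for this illustration; the text does not state how responses are tokenized or aggregated.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def semantic_bias(h1: Vector, h2: Vector) -> float:
    """Euclidean distance between two agents' hidden states for one turn."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


def distinct_2(responses: List[str]) -> float:
    """Unique bigrams divided by total bigrams, pooled over all responses."""
    bigrams = []
    for text in responses:
        toks = text.lower().split()  # assumption: simple whitespace tokenization
        bigrams.extend(zip(toks, toks[1:]))
    return len(set(bigrams)) / len(bigrams) if bigrams else 0.0


def subjectivity_score(private: List[Vector], shared: Vector) -> float:
    """Mean over agents of the squared distance to the shared embedding."""
    return sum(
        sum((p - s) ** 2 for p, s in zip(e, shared)) for e in private
    ) / len(private)


bias = semantic_bias([0.0, 0.0], [3.0, 4.0])
diversity = distinct_2(["she is gone", "she is here"])
subjectivity = subjectivity_score([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])
```

In the toy call to `distinct_2`, the bigram ("she", "is") occurs twice, so three of the four bigrams are unique.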
5.2 Semantic Alignment vs. Subjectivity
As illustrated in Figure 1, agents progressively align their semantic representations
throughout training. Semantic bias decreases across all dialogues, suggesting that
MAPS facilitates convergence despite agents maintaining distinct cognitive profiles.
Figure 1: Semantic bias reduction across dialogues. Agents gradually align while pre-
serving subjective individuality.
Quantitative results in Table 1 further confirm this dynamic. Notably, while seman-
tic bias remains low at convergence, subjectivity scores remain relatively high, indi-
cating that agents continue to maintain distinct realization styles even after achieving
semantic coherence.
Table 1: Quantitative results across selected dialogues.
Dialogue   Semantic Bias (↓)   Distinct-2 (↑)   Subjectivity (↑)
1          0.074               0.12             0.82
5          0.084               0.31             0.91
5.3 Visual Interpretability: Domain and Token Analysis
To better understand how agents realize their subjective interpretations, we visualize
domain influence and token-level attention.
Domain Influence. Figure 2 demonstrates that even when responses are aligned,
agents emphasize different cognitive dimensions. For example, Agent 1 (Spiritual /
High Emotion) assigns higher relevance to existential and empathetic domains, while
Agent 2 (Rational / Low Emotion) prioritizes logical and confidence-related dimensions.
Token Attention. Complementary analysis of token-level attention (Figure 3)
reveals how agents focus on different linguistic cues. While Agent 1 tends to emphasize
emotionally charged or existential terms, Agent 2 often attends to factual or
clarification-related tokens, highlighting differences in reasoning pathways.
Figure 2: Domain-influence patterns: Agent 1 emphasizes existential/empathetic domains,
while Agent 2 highlights logical/confidence-related dimensions. Panels: Dialogue 1 (Agent 1
and Agent 2 domain influence) and Dialogue 5 (Agent 1 and Agent 2 domain influence).
5.4 Qualitative Case Analysis
Dialogue 1. Both agents produce similar responses (“She is gone”), indicating semantic
convergence. However, domain and token analysis reveal differing emphasis:
Agent 1 focuses on existential loss, while Agent 2 emphasizes acknowledgment and
factuality.
Dialogue 5. In ambiguous contexts, agent behaviors diverge more distinctly. Agent 1
(Spiritual / High Emotion) expresses empathetic interpretation (“I can understand that
you are feeling this way”), whereas Agent 2 (Rational / Low Emotion) issues more
detached, analytical responses (“I’m not sure what you mean”). This reflects sustained
subjective divergence, aligned with each agent’s cognitive profile.
5.5 Additional Experiments: TopicalChat and MultiWOZ
To evaluate the generalization of MAPS across dialogue domains with varying degrees
of complexity and subjectivity, we extended our experiments to include two additional
datasets:
• TopicalChat [3]: A knowledge-grounded dialogue dataset with casual conversations
about popular topics. TopicalChat offers dialogues with moderately subjective
interpretations and conversational variety.
• MultiWOZ v2.2 [13]: A large multi-domain task-oriented dialogue dataset featuring
complex service-based scenarios. MultiWOZ provides more constrained,
goal-driven dialogues that often admit less subjective variation in responses.
Figure 3: Token-level attention heatmaps reveal divergent internal pathways despite
surface alignment. Panels: Dialogue 1 (Agent 1 and Agent 2 token attention) and
Dialogue 5 (Agent 1 and Agent 2 token attention).
The same MAPS configuration, architecture, and agent profiles were employed
without task-specific tuning. We selected 10 dialogues per dataset and followed iden-
tical evaluation protocols, including tracking semantic bias, response diversity, and
relevance.
5.6 Results on TopicalChat and MultiWOZ
Semantic Convergence. In both datasets, agents demonstrated progressive reduc-
tion in semantic bias over multi-agent exchanges. Notably, convergence was slower
in TopicalChat than in MultiWOZ—reflecting the higher interpretative flexibility of
open-domain conversations.
Response Diversity. Figure 4 plots final semantic bias against response diversity. A
weak inverse relationship is observed. Some dialogues maintain high diversity even
under low bias, highlighting MAPS’s capacity to preserve subjectivity while aligning
meaning.
Figure 4: Relationship between final semantic bias and response diversity across dia-
logues (MultiWOZ). Each point represents one dialogue.
Qualitative Analysis. In deterministic, goal-oriented settings such as MultiWOZ,
agent responses tend to be more uniform (e.g., “Yes, please book it”), yet still pre-
serve subtle subjective framing (e.g., polite vs. factual confirmations). In contrast,
TopicalChat prompts more interpretative divergence—agents express opinions or nu-
anced emotional stances (e.g., “That must be very frustrating” vs. “It’s difficult, but
manageable”).
Summary. These findings confirm MAPS’s flexibility in adapting to both constrained
and open-ended dialogue settings. In goal-driven domains (e.g., MultiWOZ), agents
converge pragmatically while maintaining a minimal level of subjective individuality.
In contrast, in more conversationally open domains (e.g., TopicalChat), MAPS encour-
ages more expressive, cognitively diverse agent behaviors.
5.7 Overall Summary
Taken together, these results indicate that MAPS agents are capable of achieving se-
mantic coherence without collapsing into cognitive uniformity. Domain-weighted adapters
and dynamic GRU memories enable each agent to maintain a unique subjective view-
point, while token-level attention and semantic bias tracking provide interpretable win-
dows into internal reasoning. These combined capabilities validate MAPS as a cogni-
tively interpretable dialogue framework that balances shared meaning with individual
realization.
6 Benchmark Evaluation
We further conducted a benchmark evaluation comparing our main MAPS algorithm
with two baseline variants to assess the contribution of the dynamic GRU memory
module and the perspective adapters.
6.1 Benchmark Evaluation with Ablation Studies
To rigorously evaluate the contributions of different components of MAPS, we con-
ducted a systematic benchmark evaluation involving ablation studies. We compared
the full MAPS algorithm against two variants:
• MAPS without GRU: Removes the dynamic memory module, relying solely on
the perspective adapter.
• Full Shallow Model: Removes both the dynamic memory (GRU) and the perspective
adapter. Agents share a unified semantic representation without any
subjective realization.
We evaluated these models on the MultiWOZ dataset, using metrics for Bias (se-
mantic alignment), Diversity, and Relevance.
Table 2: Benchmark results comparison across model variants (MultiWOZ). A dash (-) marks
values that are not reported; no Bias column is reported for the Full Shallow Model.
          Full MAPS                      MAPS without GRU               Full Shallow Model
Dialogue  Bias    Diversity  Relevance   Bias    Diversity  Relevance   Diversity  Relevance
1         0.082   0.000      0.128       0.002   0.000      0.082       0.907      0.062
2         0.088   0.000      0.235       0.002   0.000      0.227       0.000      0.227
3         0.083   0.000      0.138       0.002   0.000      0.179       0.382      0.158
4         0.084   0.460      0.443       0.002   0.000      0.632       0.000      0.632
5         0.084   0.000      0.838       0.002   0.000      0.838       0.000      0.838
6         0.088   0.365      0.539       -       -          -           -          -
7         0.083   0.390      0.241       -       -          -           -          -
8         0.084   0.349      0.220       -       -          -           -          -
9         0.082   0.447      0.138       -       -          -           -          -
10        0.090   0.000      0.309       -       -          -           -          -
Avg.      0.085   0.201      0.323       0.002   0.000      0.392       0.258      0.383
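The averages in Table 2 are taken over different numbers of dialogues (ten for Full MAPS, five for the two ablations). The short check below recomputes the Relevance averages from the table entries and adds a like-for-like Full MAPS average over Dialogues 1 to 5. The variable names are illustrative, and the five-dialogue Full MAPS figure is derived here rather than reported in the paper.

```python
from statistics import mean

# Relevance values copied from Table 2 (MultiWOZ).
full_maps_relevance = [0.128, 0.235, 0.138, 0.443, 0.838,
                       0.539, 0.241, 0.220, 0.138, 0.309]
no_gru_relevance = [0.082, 0.227, 0.179, 0.632, 0.838]   # Dialogues 1-5 only
shallow_relevance = [0.062, 0.227, 0.158, 0.632, 0.838]  # Dialogues 1-5 only

avg_full_all = round(mean(full_maps_relevance), 3)         # matches the Avg. row
avg_full_first5 = round(mean(full_maps_relevance[:5]), 3)  # derived, not in the table
avg_no_gru = round(mean(no_gru_relevance), 3)
avg_shallow = round(mean(shallow_relevance), 3)

print(avg_full_all, avg_full_first5, avg_no_gru, avg_shallow)
```

Restricting Full MAPS to the same five dialogues as the ablations gives a fairer basis for comparing the Relevance columns.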
6.2 Benchmark Analysis and Discussion
Table 2 summarizes the results of our ablation experiments. We observe several notable
patterns:
• Semantic Bias (Alignment): The Full MAPS model shows moderate bias (0.082 to 0.090
per dialogue), reflecting its capacity to balance semantic coherence with subjective
diversity. Removing the GRU drives bias to nearly zero (0.002) but loses temporal
coherence. Table 2 reports no bias values for the shallow model; its agents share a
single representation, so it sacrifices nuanced differentiation.
• Diversity: The Full MAPS model achieves non-zero diversity in five of the ten dialogues
(0.349 to 0.460), indicating that it can preserve distinct cognitive perspectives.
The variant without GRU scores 0.000 on every evaluated dialogue, and the shallow
model shows sporadic and inconsistent diversity scores (non-zero on two of five
dialogues), suggesting limited ability to capture subjective differences explicitly.
• Relevance: Full MAPS attains higher relevance than both ablations on Dialogues 1
and 2, matches them on Dialogue 5, and scores lower on Dialogues 3 and 4. Its average
(0.323) covers ten dialogues, whereas the ablation averages (0.392 and 0.383) cover
only five, so the averages are not directly comparable and Table 2 does not by itself
establish a consistent relevance advantage for the full model.
These benchmark comparisons reinforce the value of both the GRU dynamic mem-
ory and the perspective adapter modules within MAPS, emphasizing their roles in
achieving semantic coherence, subjective realization, and dialogue relevance.
6.3 Visual Benchmark Analysis
Figure 5 illustrates the relative performance of model variants across the three metrics,
emphasizing the critical role played by the complete MAPS architecture in managing
trade-offs between coherence and subjectivity.
Figure 5: Benchmark evaluation comparing MAPS with No-GRU and Full Shallow
models across dialogues from MultiWOZ dataset.
7 Discussion
Our experiments demonstrate that cognitive interpretability in dialogue systems can
be substantially enhanced through multi-agent collaboration with individualized rea-
soning profiles. Unlike prior approaches relying on single-agent attention mechanisms
or symbolic planning, MAPS introduces structured diversity and subjectivity through
agent-specific domain conditioning and memory.
Core Contributions
• Semantic Convergence Without Uniform Cognition. MAPS enables multiple
cognitively diverse agents to collaboratively align their understanding of dialogue
context. Our results show that semantic bias between agents decreases
during training, while private representations retain subject-specific nuance.
• Interpretable Reasoning via Domain Conditioning. By incorporating domain-weighted
profiles for each agent, MAPS makes individual contributions interpretable
at a higher cognitive level. Instead of relying solely on token-level attention,
decisions are attributed to reasoning styles grounded in emotional, logical,
or existential dimensions.
• Balanced Diversity and Relevance. MAPS supports both subjective variation
and pragmatic effectiveness. Our benchmark evaluation shows that removing key
modules (memory or perspective adaptation) reduces either relevance or diversity
on some dialogues, suggesting that the full architecture supports balanced performance.
Extended Multi-Agent Evaluation
In addition to the initial 2-agent experiments on the EmpatheticDialogues dataset, we
evaluated MAPS in more complex scenarios using TopicalChat and MultiWOZ with
4 agents. This extension allowed us to assess the scalability of MAPS in both open-
domain and goal-oriented dialogue settings.
We found that increasing the number of agents preserved the semantic stability of
the shared representation while enriching the diversity of perspectives. In TopicalChat,
agents exhibited more interpretive divergence, while in MultiWOZ, responses were
more aligned yet stylistically varied. These results demonstrate that MAPS generalizes
effectively across both dataset types and agent populations.
Limitations and Future Work
While MAPS presents a promising approach to cognitively interpretable dialogue, sev-
eral limitations remain:
• Manual Domain Profiles. Current agent profiles are manually defined. Future
work should explore learning domain weights automatically from data or context
to enable dynamic personality adaptation.
• Scaling Challenges. As the number of agents increases, dialogue outputs become
harder to summarize and evaluate. Structured summarization or graph-based
dialogue visualization methods may help interpret multi-agent dynamics
at scale.
• Short-Term Memory. While agents use GRU-based internal memory within
a session, they do not yet retain long-term conversational history. Introducing
persistent memory could support richer inter-agent relationships and evolving
behavior over time.
• Evaluation of Subjectivity. Metrics such as diversity and bias provide useful
proxies, but subjective realization remains difficult to quantify. Human evaluations
focusing on coherence, interpretability, and perceived personality expression
could complement automatic metrics.
Conclusion
Our findings validate MAPS as a cognitively grounded framework that supports both
interpretability and meaningful agent individuality. By extending the architecture to
multiple datasets and scaling the number of agents, we demonstrated its flexibility
across conversational contexts. Future extensions may include adaptive personality
learning, multimodal settings, or integration with human collaborators for real-world
deployment.
8 Conclusion and Future Work
We presented MAPS (Multi-Agent Perspective Spaces), a cognitively grounded dia-
logue framework that models co-existing subjective perspectives in multi-agent con-
versational settings. By integrating domain-weighted adapters, dynamic memory, and
token-level interpretability, MAPS enables agents to express individualized reasoning
styles while collaboratively constructing shared meaning. Unlike traditional dialogue
systems that enforce semantic uniformity, MAPS supports cognitively diverse agents
capable of transparent, interpretable, and pragmatically aligned communication.
Our results across EmpatheticDialogues, TopicalChat, and MultiWOZ demonstrate
the framework’s adaptability to both emotionally expressive and task-oriented domains.
In 2-agent settings, MAPS achieves semantic convergence without collapsing individ-
ual perspectives. In extended 4-agent simulations, the system maintains coherent di-
alogue while capturing richer, multi-dimensional viewpoints — highlighting its scala-
bility and flexibility.
Future Work
Building on these findings, we identify several key directions for advancing MAPS:
• Learning Agent Profiles from Data. Current domain weights are manually
defined. Future work will explore learning agent personality traits and cognitive
emphases directly from dialogue corpora using unsupervised or reinforcement-based
methods.
• Dynamic Adaptation of Subjectivity. Enabling agents to adapt their reasoning
profiles over time, either in response to evolving dialogue context or through
interaction history, could enhance realism and long-term engagement.
• Multimodal and Situated Interaction. Extending MAPS to process non-verbal
cues (e.g., gesture, tone, gaze) could support richer, embodied dialogue systems
suitable for virtual agents, assistive robotics, or social simulation.
• Human-AI Collaborative Scenarios. Incorporating human participants alongside
agents would allow for evaluation of MAPS in real-world collaborative contexts,
such as decision-making, tutoring, or co-creative tasks.
• Interpretable Multi-Agent Planning. Integrating MAPS with symbolic reasoning
modules or goal-oriented planners could yield interpretable agent coalitions
that reason about plans, commitments, and shared goals from diverse viewpoints.
Closing Remarks
By embracing subjective diversity rather than suppressing it, MAPS offers a novel
paradigm for cognitively interpretable dialogue. The framework provides a founda-
tion for future systems where meaning is not only exchanged but actively negoti-
ated—reflecting the richness of human communication. Our vision is to develop MAPS
into a socially aware and transparently reasoned multi-agent architecture capable of op-
erating in complex, collaborative, and human-centric environments.
Acknowledgements
We gratefully acknowledge the creators of the EmpatheticDialogues, TopicalChat, and
MultiWOZ datasets for making their work publicly available. We also thank the open-
source community—particularly the contributors to HuggingFace, PyTorch, and the
SentenceTransformers ecosystem—for providing essential tools and infrastructure that
enabled this research. Finally, we appreciate the broader research community’s ongo-
ing efforts toward transparent, interpretable, and socially aligned AI systems.
References
[1] Ed H. Chi et al. Chirpy cardinal: Dialogue distillery—crafting interpolable, in-
terpretable, and introspectable dialogue from llms. In Proceedings of the Alexa
Prize SocialBot Grand Challenge 5, 2023.
[2] Meta Fundamental AI Research Diplomacy Team (FAIR), Anton Bakhtin,
Noam Brown, Emily Dinan, Gabriele Farina, Colin Flaherty, Daniel Fried, An-
drew Goff, Jonathan Gray, Hengyuan Hu, et al. Human-level play in the game
of diplomacy by combining language models with strategic reasoning. Science,
378(6624):1067–1074, 2022.
[3] Karthik Gopalakrishnan, Behnam Hedayatnia, Qinlang Chen, Anna Gottardi,
Sanjeev Kwatra, Anu Venkatesh, Raefer Gabriel, and Dilek Hakkani-Tur.
Topical-chat: Towards knowledge-grounded open-domain conversations. arXiv
preprint arXiv:2308.11995, 2023.
[4] Yuki Hou, Haruki Tamoto, and Homei Miyashita. “My agent understands me
better”: Integrating dynamic human-like memory recall and consolidation in llm-
based agents. In Extended Abstracts of the CHI Conference on Human Factors in
Computing Systems (CHI EA ’24), Honolulu, HI, USA, 2024. ACM.
[5] Vojtěch Hudeček and Ondřej Dušek. Learning interpretable latent dialogue actions
with less supervision. In Proceedings of the 2nd Conference of the Asia-
Pacific Chapter of the Association for Computational Linguistics and the 12th In-
ternational Joint Conference on Natural Language Processing (Volume 1: Long
Papers), pages 297–308, Online, November 2022. Association for Computational
Linguistics.
[6] Angeliki Lazaridou, Karl Moritz Hermann, Karl Tuyls, and Stephen Clark. Emer-
gence of linguistic communication from referential games with symbolic and
pixel input. arXiv preprint arXiv:1804.03984, 2018.
[7] Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin,
and Bernard Ghanem. CAMEL: Communicative agents for “mind” exploration
of large language model society. In Advances in Neural Information Processing
Systems 36 (NeurIPS 2023), 2023.
[8] Hannah Rashkin, Eric Michael Smith, Margaret Li, and Y-Lan Boureau. Towards
empathetic open-domain conversation models: A new benchmark and dataset. In
Proceedings of the 57th Annual Meeting of the Association for Computational
Linguistics, pages 5370–5381, 2019.
[9] Sahand Sabour, Chujie Zheng, and Minlie Huang. CEM: Commonsense-aware
empathetic response generation. In Proceedings of the 36th AAAI Conference on
Artificial Intelligence (AAAI), 2022. AAAI Press.
[10] Theodore R. Sumers, Shunyu Yao, Karthik Narasimhan, and Thomas L. Griffiths.
Cognitive architectures for language agents. arXiv preprint arXiv:2309.02427,
2023.
[11] Zhenhailong Wang, Shaoguang Mao, Wenshan Wu, Tao Ge, Furu Wei, and Heng
Ji. Unleashing the emergent cognitive synergy in large language models: A task-
solving agent through multi-persona self-collaboration. In Proceedings of the
2024 Conference of the North American Chapter of the Association for Computa-
tional Linguistics: Human Language Technologies, pages 257–279, Mexico City,
Mexico, June 2024. Association for Computational Linguistics.
[12] Runzhe Yang, Jingxiao Chen, and Karthik Narasimhan. Improving dialog sys-
tems for negotiation with personality modeling. In Proceedings of the 59th
Annual Meeting of the Association for Computational Linguistics (Volume 1:
Long Papers), pages 681–693. Association for Computational Linguistics, Au-
gust 2021.
[13] Xiaoxue Zang, Abhinav Rastogi, Srinivas Sunkara, Raghav Gupta, Jianguo
Zhang, and Jindong Chen. MultiWOZ 2.2: A dialogue dataset with additional
annotation corrections and state tracking baselines. arXiv preprint
arXiv:2007.12720, 2020.
[14] Tenggan Zhang, Xinjie Zhang, Jinming Zhao, Li Zhou, and Qin Jin. ESCoT:
Towards interpretable emotional support dialogue systems. In Proceedings of the
62nd Annual Meeting of the Association for Computational Linguistics (Volume
1: Long Papers), pages 13395–13412, Bangkok, Thailand, August 2024. Associ-
ation for Computational Linguistics.