Ufuk Topcu's research while affiliated with the University of Texas at Austin and other institutions


Publications (604)


Figure 1: A depiction of our framework. (Cyan) We distill information about the dynamics of a prior data set into a goal-conditioned value estimator Ṽ_g. (Red) An expert provides demonstrations for a new target task. (Purple) The framework combines them to construct a potential function Φ(s) (8), implicitly estimating the number of steps needed to reach the task-specific goal from state s. For this estimate, Φ_j(s) first measures (7) the number of steps needed to reach the j-th demonstration τ_j via Ṽ_g, and then to follow it to the goal via V_d^j (6). (Green) We use the overall potential to synthesize dense dynamics-aware rewards for the target task.
Figure 4: Potential heatmap for the push slot with single reset grid. The agent constructs V_d^j(s_t^j) + Ṽ_g(s; s_t^j) in (7) and (8) from a fixed state s (green) and different demonstrated states (purple to yellow). The state chosen by the maximizations in (8) when constructing Φ is marked in pink.
Dense Dynamics-Aware Reward Synthesis: Integrating Prior Experience with Demonstrations
  • Preprint
  • File available

December 2024 · 2 Reads

Cevahir Koprulu · Po-han Li · Tianyu Qiu · [...] · Ufuk Topcu

Many continuous control problems can be formulated as sparse-reward reinforcement learning (RL) tasks. In principle, online RL methods can automatically explore the state space to solve each new task. However, discovering sequences of actions that lead to a non-zero reward becomes exponentially more difficult as the task horizon increases. Manually shaping rewards can accelerate learning for a fixed task, but it is an arduous process that must be repeated for each new environment. We introduce a systematic reward-shaping framework that distills the information contained in 1) a task-agnostic prior data set and 2) a small number of task-specific expert demonstrations, and then uses these priors to synthesize dense dynamics-aware rewards for the given task. This supervision substantially accelerates learning in our experiments, and we provide analysis demonstrating how the approach can effectively guide online learning agents to faraway goals.
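
The supervision described above builds on potential-based reward shaping, in which adding a term of the form γΦ(s') − Φ(s) to the sparse reward is known to preserve optimal policies. Below is a minimal sketch of that shaping step; the toy distance-based potential is a hypothetical stand-in for the paper's Φ(s), which is instead constructed from the value estimator and demonstrations.

```python
import numpy as np

def shaped_reward(r, s, s_next, phi, gamma=0.99):
    """Potential-based shaping: adding gamma * phi(s') - phi(s) to the
    reward provably preserves the optimal policy of the original task."""
    return r + gamma * phi(s_next) - phi(s)

# Toy potential: negative distance to the goal, a crude "steps to go"
# estimate playing the role of the paper's Phi(s).
goal = np.array([1.0, 1.0])
phi = lambda s: -np.linalg.norm(np.asarray(s) - goal)

# A transition that moves toward the goal earns a positive dense signal
# even though the sparse task reward is zero.
print(shaped_reward(0.0, [0.0, 0.0], [0.5, 0.5], phi))  # ~0.71
```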


How Media Competition Fuels the Spread of Misinformation

November 2024 · 5 Reads

Competition among news sources may encourage some sources to share fake news and misinformation to influence the public. While sharing misinformation may lead to a short-term gain in audience engagement, it may damage the reputation of these sources, resulting in a loss of audience. To understand the rationale behind sharing misinformation, we model the competition as a zero-sum sequential game, where each news source influences individuals based on its credibility (how trustworthy the public perceives it to be) and the individual's opinion and susceptibility. In this game, news sources can decide whether to share factual information to enhance their credibility or to disseminate misinformation for greater immediate attention at the cost of losing credibility. We employ the quantal response equilibrium concept, which accounts for the bounded rationality of human decision-making, allowing for imperfect or probabilistic choices. Our analysis shows that the resulting equilibria for this game reproduce the credibility-bias distribution observed in real-world news sources, with hyper-partisan sources more likely to spread misinformation than centrist ones. It further illustrates that disseminating misinformation can polarize the public. Notably, our model reveals that when one player increases misinformation dissemination, the other player is likely to follow, exacerbating the spread of misinformation. We conclude by discussing potential strategies to mitigate the spread of fake news and promote a more factual and reliable information landscape.
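
To make the equilibrium concept concrete, here is a minimal sketch of a logit quantal response equilibrium for a two-action zero-sum game, computed by damped fixed-point iteration over smoothed best responses. The payoff matrix and rationality parameter are illustrative assumptions, not the paper's credibility game.

```python
import numpy as np

def logit_response(payoffs, opp_mix, lam):
    """Bounded-rational (softmax) response to the opponent's mixed strategy."""
    expected = payoffs @ opp_mix
    z = np.exp(lam * (expected - expected.max()))  # stabilized softmax
    return z / z.sum()

# Illustrative zero-sum payoffs for the row player; actions are, say,
# "share facts" (0) vs. "share misinformation" (1). Column player gets -A.
A = np.array([[1.0, -0.5],
              [2.0, -1.0]])
lam = 2.0                 # rationality: 0 = uniform play, large = best response
p = q = np.ones(2) / 2
for _ in range(500):      # damped fixed-point iteration toward the logit QRE
    p = 0.9 * p + 0.1 * logit_response(A, q, lam)
    q = 0.9 * q + 0.1 * logit_response(-A.T, p, lam)
print(p, q)
```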


Any2Any: Incomplete Multimodal Retrieval with Conformal Prediction

November 2024 · 1 Read

Autonomous agents perceive and interpret their surroundings by integrating multimodal inputs, such as vision, audio, and LiDAR. These perceptual modalities support retrieval tasks, such as place recognition in robotics. However, current multimodal retrieval systems encounter difficulties when parts of the data are missing due to sensor failures or inaccessibility, such as silent videos or LiDAR scans lacking RGB information. We propose Any2Any, a novel retrieval framework that addresses scenarios where both query and reference instances have incomplete modalities. Unlike previous methods limited to the imputation of two modalities, Any2Any handles any number of modalities without training generative models. It calculates pairwise similarities with cross-modal encoders and employs a two-stage calibration process with conformal prediction to align the similarities. Any2Any enables effective retrieval across multimodal datasets, e.g., text-LiDAR and text-time series. It achieves a Recall@5 of 35% on the KITTI dataset, which is on par with baseline models with complete modalities.
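
One way to see why calibration helps: raw similarity scores from different cross-modal encoders live on different scales, so they cannot be fused directly. The sketch below, with synthetic calibration data, converts each score into a rank-based conformal p-value so that scores from different modality pairs become comparable; Any2Any's actual two-stage procedure differs in its details.

```python
import numpy as np

def conformal_pvalue(score, calib_scores):
    """Rank-based calibration: fraction of calibration similarities
    that the query score exceeds, smoothed by the usual (n+1) correction."""
    calib_scores = np.sort(calib_scores)
    rank = np.searchsorted(calib_scores, score, side="right")
    return (rank + 1) / (len(calib_scores) + 1)

rng = np.random.default_rng(0)
# Synthetic calibration similarities for two encoder pairs on different scales.
calib_txt_img = rng.normal(0.2, 0.1, 1000)
calib_txt_lidar = rng.normal(0.6, 0.2, 1000)

# Raw scores 0.35 and 0.65 are incomparable; their p-values are not.
p_img = conformal_pvalue(0.35, calib_txt_img)
p_lidar = conformal_pvalue(0.65, calib_txt_lidar)
# With a missing modality pair, its entry would be np.nan and nanmean skips it.
print(np.nanmean([p_img, p_lidar]))
```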


Figure 7: Probability of specification compliance during fine-tuning (color band shows standard deviation (SD)).
Know Where You're Uncertain When Planning with Multimodal Foundation Models: A Formal Framework

November 2024 · 16 Reads

Multimodal foundation models offer a promising framework for robotic perception and planning by processing sensory inputs to generate actionable plans. However, addressing uncertainty in both perception (sensory interpretation) and decision-making (plan generation) remains a critical challenge for ensuring task reliability. We present a comprehensive framework to disentangle, quantify, and mitigate these two forms of uncertainty. We first introduce a framework for uncertainty disentanglement, isolating perception uncertainty arising from limitations in visual understanding and decision uncertainty relating to the robustness of generated plans. To quantify each type of uncertainty, we propose methods tailored to the unique properties of perception and decision-making: we use conformal prediction to calibrate perception uncertainty and introduce Formal-Methods-Driven Prediction (FMDP) to quantify decision uncertainty, leveraging formal verification techniques for theoretical guarantees. Building on this quantification, we implement two targeted intervention mechanisms: an active sensing process that dynamically re-observes high-uncertainty scenes to enhance visual input quality, and an automated refinement procedure that fine-tunes the model on high-certainty data, improving its capability to meet task specifications. Empirical validation in real-world and simulated robotic tasks demonstrates that our uncertainty disentanglement framework reduces variability by up to 40% and enhances task success rates by 5% compared to baselines. These improvements are attributed to the combined effect of both interventions and highlight the importance of uncertainty disentanglement, which facilitates targeted interventions that enhance the robustness and reliability of autonomous systems.
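
As a rough illustration of the perception-side quantification, the sketch below uses split conformal prediction on a held-out calibration set to choose an uncertainty threshold with a coverage guarantee, then triggers re-observation only for scenes above it. The nonconformity scores and the reobserve hook are hypothetical placeholders, not the paper's implementation.

```python
import numpy as np

def conformal_threshold(nonconformity, alpha=0.1):
    """Split-conformal quantile of calibration scores giving 1 - alpha coverage."""
    n = len(nonconformity)
    q = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(nonconformity, min(q, 1.0))

rng = np.random.default_rng(1)
calib = rng.exponential(0.5, 500)           # placeholder calibration scores
tau = conformal_threshold(calib, alpha=0.1)

def perceive(scene, uncertainty, reobserve):
    """Intervene (re-observe) only when calibrated uncertainty is too high."""
    return reobserve(scene) if uncertainty > tau else scene

print(perceive("scene_1", 2.0, lambda s: s + "_reobserved"))
```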



Human-Agent Coordination in Games under Incomplete Information via Multi-Step Intent

October 2024 · 3 Reads

Strategic coordination between autonomous agents and human partners under incomplete information can be modeled as turn-based cooperative games. We extend a turn-based game under incomplete information, the shared-control game, to allow players to take multiple actions per turn rather than a single action. The extension enables the use of multi-step intent, which we hypothesize will improve performance in long-horizon tasks. To synthesize cooperative policies for the agent in this extended game, we propose an approach featuring a memory module that maintains a running probabilistic belief over the environment dynamics and an online planning algorithm called IntentMCTS. This algorithm strategically selects the next action by leveraging any communicated multi-step intent via reward augmentation while considering the current belief. Agent-to-agent simulations in the Gnomes at Night testbed demonstrate that IntentMCTS requires fewer steps and control switches than baseline methods. A human-agent user study corroborates these findings, showing an 18.52% higher success rate compared to the heuristic baseline and a 5.56% improvement over the single-step prior work. Participants also report lower cognitive load, lower frustration, and higher satisfaction with the IntentMCTS agent partner.
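
A minimal sketch of the reward-augmentation idea is given below: during planning, actions that match the partner's communicated multi-step intent receive a bonus on top of the environment reward. The intent encoding and bonus value are illustrative assumptions; in IntentMCTS this augmentation sits inside the tree search rather than as a standalone function.

```python
def augmented_reward(base_reward, action, step, intent, bonus=0.5):
    """Add a bonus when the action matches the partner's intended action
    at this step of their communicated multi-step plan."""
    if step < len(intent) and action == intent[step]:
        return base_reward + bonus
    return base_reward

# Partner communicates a three-step intent; only matching actions get the bonus.
intent = ["up", "up", "left"]
print(augmented_reward(0.0, "up", 1, intent))    # 0.5
print(augmented_reward(0.0, "down", 2, intent))  # 0.0
```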


Figure 1: (a) Snapshot of an episode from the navigation game example. (b-c) Regularized and standard Nash equilibrium policy matrices, denoted as P̂_t in Eq. (14) and P_t in Eq. (13). A policy matrix P_t = [P_t^{1⊤}, ..., P_t^{N⊤}]⊤, aggregated over players, maps all players' states to joint controls: u_t = −P_t x_t − α_t. Each policy matrix is divided into N × N blocks.
Figure 3: Snapshot of the proposed approach for an 8-player navigation game and corresponding policy matrices.
Figure 4: (a) A three-player formation game: player 1 tracks a reference trajectory shown in grey while players 2 and 3 maintain a formation with respect to player 1. (b) Convergence of regularized policies with different λ.
Figure 6: Formation game trajectories under noise level 520 with different regularization levels; left to right: λ = 0, 1.5, 5 (optimal), 15.
Policies with Sparse Inter-Agent Dependencies in Dynamic Games: A Dynamic Programming Approach

October 2024 · 19 Reads

Common feedback strategies in multi-agent dynamic games require all players' state information to compute control strategies. However, in real-world scenarios, sensing and communication limitations between agents make full state feedback expensive or impractical, and such strategies can become fragile when state information from other agents is inaccurate. To address these limitations, we propose a regularized dynamic programming approach for finding sparse feedback policies that selectively depend on the states of a subset of agents in dynamic games. The proposed approach solves convex adaptive group Lasso problems to compute sparse policies approximating Nash equilibrium solutions. We prove the regularized solutions' asymptotic convergence to a neighborhood of Nash equilibrium policies in linear-quadratic (LQ) games. We extend the proposed approach to general non-LQ games via an iterative algorithm. Empirical results in multi-robot interaction scenarios show that the proposed approach effectively computes feedback policies with varying sparsity levels. When agents have noisy observations of other agents' states, simulation results indicate that the proposed regularized policies consistently achieve lower costs than standard Nash equilibrium policies, by up to 77%, for all interacting agents whose costs are coupled with other agents' states.
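
The sparsity mechanism can be illustrated by the group-lasso proximal step (block soft-thresholding): each block of the feedback matrix couples one agent's state into the control, and any block whose norm falls below the regularization weight is zeroed out, severing that inter-agent dependency. A minimal sketch with illustrative dimensions:

```python
import numpy as np

def group_soft_threshold(policy_blocks, lam):
    """Proximal step of the group lasso: shrink each block of the feedback
    matrix and zero out blocks whose Frobenius norm is at most lam, which
    removes the corresponding inter-agent dependency."""
    out = []
    for B in policy_blocks:
        norm = np.linalg.norm(B)
        out.append(np.zeros_like(B) if norm <= lam else (1 - lam / norm) * B)
    return out

rng = np.random.default_rng(2)
# Three agents' blocks with weak, medium, and strong coupling.
blocks = [rng.normal(0, s, (2, 4)) for s in (0.05, 0.5, 1.0)]
sparse = group_soft_threshold(blocks, lam=0.3)
print([float(np.linalg.norm(B)) for B in sparse])  # weakest block is zeroed
```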


Figure 1: An example of the Goedendag game: The example shows two players P1, P2 and five states q1, ..., q5. The game board comprises four hexagons; the blue and red hexagons indicate where P1's and P2's pieces are located. The solid arrows are actions the players take, and the dashed arrows are state transitions. Each transition is triggered by the action of the same color. q5 is a terminal draw state in which neither player wins.
Figure 5: Cross-entropy loss at every epoch during the fine-tuning procedure. The area and action agents converge to a lower loss compared to the end-to-end agent, indicating potentially better performance.
Reasoning, Memorization, and Fine-Tuning Language Models for Non-Cooperative Games

October 2024 · 7 Reads

We develop a method that integrates the tree of thoughts and multi-agent framework to enhance the capability of pre-trained language models in solving complex, unfamiliar games. The method decomposes game-solving into four incremental tasks -- game summarization, area selection, action extraction, and action validation -- each assigned to a specific language-model agent. By constructing a tree of thoughts, the method simulates reasoning paths and allows agents to collaboratively distill game representations and tactics, mitigating the limitations of language models in reasoning and long-term memorization. Additionally, an automated fine-tuning process further optimizes the agents' performance by ranking query-response pairs based on game outcomes, e.g., winning or losing. We apply the method to a non-cooperative game and demonstrate a 65 percent winning rate against benchmark algorithms, with an additional 10 percent improvement after fine-tuning. In contrast to existing deep learning algorithms for game solving that require millions of training samples, the proposed method consumes approximately 1000 training samples, highlighting its efficiency and scalability.
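
As a rough illustration of the outcome-based fine-tuning step, the sketch below ranks recorded query-response pairs by the game result they led to, yielding preference-ordered data for fine-tuning. The record fields and scoring are hypothetical placeholders.

```python
OUTCOME_SCORE = {"win": 1.0, "draw": 0.0, "loss": -1.0}

def rank_for_finetuning(records):
    """Order (query, response, outcome) records so that responses leading
    to the best game outcomes come first in the fine-tuning set."""
    return sorted(records, key=lambda r: OUTCOME_SCORE[r["outcome"]], reverse=True)

records = [
    {"query": "board state A", "response": "move to area 3", "outcome": "loss"},
    {"query": "board state A", "response": "move to area 1", "outcome": "win"},
]
print(rank_for_finetuning(records)[0]["response"])  # the winning response
```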


Rules to convert abstract syntax trees to FSA-based representations. The keyword_processor handles these conversions. The keywords that define the grammar are in bold.
Joint Verification and Refinement of Language Models for Safety-Constrained Planning

October 2024 · 4 Reads

Although pre-trained language models can generate executable plans (e.g., programmatic policies) for solving robot tasks, the generated plans may violate task-relevant logical specifications due to the models' black-box nature. A significant gap remains between the language models' outputs and verifiable executions of plans. We develop a method to generate executable plans and formally verify them against task-relevant safety specifications. Given a high-level task description in natural language, the proposed method queries a language model to generate plans in the form of executable robot programs. It then converts the generated plan into an automaton-based representation, allowing formal verification of the automaton against the specifications. We prove that given a set of verified plans, the composition of these plans also satisfies the safety specifications. This proof ensures the safety of complex, multi-component plans, obviating the computational complexity of verifying the composed plan. We then propose an automated fine-tuning process that refines the language model to generate specification-compliant plans without the need for human labeling. The empirical results show a 30 percent improvement in the probability of generating plans that meet task specifications after fine-tuning.
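
To illustrate the verification step, the sketch below checks a plan, abstracted as a finite sequence of labeled actions, against a toy safety automaton with an absorbing error state; a plan satisfies the specification iff its run never reaches that state. The automaton (never grasp before aligning) is an invented example, not the paper's FSA construction.

```python
# Toy safety automaton: (state, label) -> next state; "err" is absorbing.
SPEC = {
    ("init", "align"): "ready",
    ("init", "grasp"): "err",
    ("ready", "grasp"): "ready",
    ("ready", "align"): "ready",
}

def satisfies(plan_labels, spec, start="init"):
    """Run the plan through the automaton; safe iff 'err' is never reached."""
    state = start
    for label in plan_labels:
        state = spec.get((state, label), "err")
        if state == "err":
            return False
    return True

print(satisfies(["align", "grasp"], SPEC))  # True
print(satisfies(["grasp"], SPEC))           # False
```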


Reduce, Reuse, Recycle: Categories for Compositional Reinforcement Learning

October 2024

·

4 Reads

In reinforcement learning, conducting task composition by forming cohesive, executable sequences from multiple tasks remains challenging, yet the ability to (de)compose tasks is a linchpin in developing robotic systems capable of learning complex behaviors. Compositional reinforcement learning is beset with difficulties, including the high dimensionality of the problem space, scarcity of rewards, and absence of system robustness after task composition. To surmount these challenges, we view task composition through the prism of category theory, a mathematical discipline exploring structures and their compositional relationships. The categorical properties of Markov decision processes untangle complex tasks into manageable sub-tasks, allowing for strategic reduction of dimensionality, facilitating more tractable reward structures, and bolstering system robustness. Experimental results support the categorical theory of reinforcement learning by enabling skill reduction, reuse, and recycling when learning complex robotic arm tasks.


Citations (30)


... While FNOs and DeepONet operate in general Banach spaces, the Basis-to-Basis (B2B) approach [82] leverages the geometric properties of Hilbert spaces to generate more interpretable and generalizable operators. By exploiting the inner product structure, spectral theory, and optimization geometry of Hilbert spaces, B2B learns basis functions for both input and output spaces, and then maps between their coefficients. ...

Reference:

Machine Learning Aided Modeling of Granular Materials: A Review
Basis-to-Basis Operator Learning Using Function Encoders

... In the past decade, researchers have increasingly focused on studying misinformation, particularly its production [4,7,8], dissemination [1,5,9,10], detection [11-15], and countering [16-20]. However, the potential relationship between competition among news sources for public opinion and the production of misinformation remains poorly understood. ...

Control of Misinformation with Safety and Engagement Guarantees
  • Citing Conference Paper
  • July 2024

... U_α(u) was introduced initially in the context of network congestion control by Mo and Walrand (2000). This utility function is widely used in communication networks for resource allocation (Shakkottai, Srikant, et al. 2008) and, more recently, in applications like air traffic management (Yu et al. 2023). When α = 0, U_α(u) reduces to the sum of the individual utilities, i.e., Σ_{1≤i≤n} u_i, which is referred to as the "utilitarian" criterion (Xinying Chen and Hooker 2023). ...

Alpha-Fair Routing in Urban Air Mobility with Risk-Aware Constraints
  • Citing Conference Paper
  • July 2024

... In the past decade, researchers have increasingly focused on studying misinformation, particularly its production [4,7,8], dissemination [1,5,9,10], detection [11-15], and countering [16-20]. However, the potential relationship between competition among news sources for public opinion and the production of misinformation remains poorly understood. ...

Optimization-Based Countering of Misinformation on Social Networks
  • Citing Conference Paper
  • July 2024

... Direct communication offers a solution by enabling the exchange of relevant information between players. Recent research has explored the use of large language models to interpret intentions and predict actions through natural language, enhancing alignment and coordination within the human-AI team (Guan et al. 2023; Chen, Fried, and Topcu 2024; Liu et al. 2024). However, the effectiveness of such approaches is constrained by the limitations of the language module and the high latency associated with API calls during the inference process (Liang et al. 2023). ...

Human-Agent Cooperation in Games under Incomplete Information through Natural Language Communication
  • Citing Conference Paper
  • August 2024

... MM-Gaussian [30] develops a relocalization module designed to correct the system's trajectory in the event of localization failures. Meanwhile, MM3DGS-SLAM [31] uses a visual-inertial framework to optimize camera poses and addresses the scale ambiguity of monocular depth estimates. ...

MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements

... However, identifying all the possible local GNEs is required to enable effective coordination. Therefore, computing multiple local GNEs is critical to aligning players' preferences over different equilibria [7] or helping coordinate players' actions via recommendation [8]. Furthermore, different local GNEs correspond to different modes of interaction in multi-agent interactive settings. ...

Coordination in Noncooperative Multiplayer Matrix Games via Reduced Rank Correlated Equilibria

IEEE Control Systems Letters

... The learning of RMs from noisy labels has previously been considered only by Verginis et al. (2024). Their work focuses on optimizing the noisy labelling function to model the perfect labelling function after thresholding, enabling the use of existing algorithms for RM learning. ...

Joint Learning of Reward Machines and Policies in Environments with Partially Known Semantics
  • Citing Article
  • May 2024

... Deceptive policy synthesis via KL divergence minimization admits a convex formulation with dimension polynomial in the MDP size. One may also formulate similar KL divergence minimization problems for partially observable agent dynamics [19], continuous dynamics [29], and stochastic games [20]. Our work contrasts with [18,19,29] in its addition of multiple supervised agents. ...

Identity concealment games: How I learned to stop revealing and love the coincidences
  • Citing Article
  • March 2024

Automatica

... Additionally, our methodology has been tested only in a single racing car scenario. To better understand its effectiveness, exploring its performance in a real racing setting with multiple cars and common maneuvers such as overtaking would be valuable, similar to the multi-vehicle approaches presented in [31], [32]. However, investigating such scenarios is beyond the scope of this work and remains a potential avenue for future research. ...

Hierarchical Control for Cooperative Teams in Competitive Autonomous Racing
  • Citing Article
  • May 2024

IEEE Transactions on Intelligent Vehicles