Chao Qian’s research while affiliated with Nanjing University and other places


Publications (160)


Fig. 1: The illustrations of micro-bumping, hybrid bonding, and monolithic 3D integration technologies are presented in subfigures (a), (b), and (c), respectively. The comparison of the dimensions of the three bonding vias is shown in subfigure (d).
Fig. 3: Overview of the proposed 3D backend implementation flow. The specific open-source tool adopted for each stage is marked in red.
Fig. 7: Heat map visualization of Hier-RTLMP-2D and Open3D-Tiling on design black_parrot. While the 3D-IC architecture achieves reduced power consumption, it exhibits a higher peak temperature due to increased integration density.
Fig. 8: Runtime breakdown comparison between DREAMPlace-2D and Open3D-DMP for case bp_quad. The time spent by each component is normalized by 6336 seconds, the total runtime of DREAMPlace-2D.
Open3DBench: Open-Source Benchmark for 3D-IC Backend Implementation and PPA Evaluation
  • Preprint
  • File available

March 2025 · 14 Reads

Yunqi Shi · Chengrui Gao · Wanqi Ren · [...]

This work introduces Open3DBench, an open-source 3D-IC backend implementation benchmark built upon the OpenROAD-flow-scripts framework, enabling comprehensive evaluation of power, performance, area, and thermal metrics. Our proposed flow supports modular integration of 3D partitioning, placement, 3D routing, RC extraction, and thermal simulation, aligning with advanced 3D flows that rely on commercial tools and in-house scripts. We present two foundational 3D placement algorithms: Open3D-Tiling, which emphasizes regular macro placement, and Open3D-DMP, which enhances wirelength optimization through cross-die co-placement with the analytical placer DREAMPlace. Experimental results show significant improvements in area (51.19%), wirelength (24.06%), timing (30.84%), and power (5.72%) compared to 2D flows. The results also highlight that better wirelength does not necessarily lead to PPA gains, emphasizing the need for PPA-driven methods. Open3DBench offers a standardized, reproducible platform for evaluating 3D EDA methods, effectively bridging the gap between open-source tools and commercial solutions in 3D-IC design.


Fig. 3. Visualization of a specific critical path optimized using different distance losses. The slack of each path is given at the top of each figure.
Timing-Driven Global Placement by Efficient Critical Path Extraction

Timing optimization during the global placement of integrated circuits has been a significant focus for decades, yet it remains a complex, unresolved issue. Recent analytical methods typically use pin-level timing information to adjust net weights, which is fast and simple but neglects the path-based nature of the timing graph. Existing path-based methods, however, cannot balance accuracy and efficiency due to the exponential growth in the number of critical paths. In this work, we propose a GPU-accelerated timing-driven global placement framework, integrating accurate path-level information into the efficient DREAMPlace infrastructure. It optimizes a fine-grained pin-to-pin attraction objective and is facilitated by efficient critical path extraction. We also design a quadratic distance loss function specifically to align with the RC timing model. Experimental results demonstrate that our method significantly outperforms the current leading timing-driven placers, achieving an average improvement of 40.5% in total negative slack (TNS) and 8.3% in worst negative slack (WNS), as well as an improvement in half-perimeter wirelength (HPWL).
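A minimal sketch of the kind of pin-to-pin attraction term with a quadratic distance loss described above; the tensor names and the way critical path edges are supplied are illustrative assumptions, not the paper's actual implementation.

```python
import torch

def quadratic_pin_attraction(pin_x, pin_y, src_idx, dst_idx, weights):
    """Quadratic pin-to-pin attraction loss over extracted critical path edges.

    pin_x, pin_y : 1-D tensors of pin coordinates (require gradients).
    src_idx, dst_idx : index tensors of driver/sink pins on critical path edges.
    weights : per-edge weights, e.g. derived from slack criticality.
    """
    dx = pin_x[src_idx] - pin_x[dst_idx]
    dy = pin_y[src_idx] - pin_y[dst_idx]
    # The squared distance mirrors the quadratic dependence of RC (Elmore-style)
    # delay on wirelength, which motivates a quadratic rather than linear loss.
    return torch.sum(weights * (dx * dx + dy * dy))

# Toy usage: three pins, two critical edges.
pin_x = torch.tensor([0.0, 5.0, 9.0], requires_grad=True)
pin_y = torch.tensor([0.0, 2.0, 1.0], requires_grad=True)
loss = quadratic_pin_attraction(pin_x, pin_y,
                                torch.tensor([0, 1]), torch.tensor([1, 2]),
                                torch.tensor([1.0, 0.5]))
loss.backward()  # gradients pull connected critical pins toward each other
```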



Stochastic Population Update Provably Needs An Archive in Evolutionary Multi-objective Optimization

January 2025 · 6 Reads

Evolutionary algorithms (EAs) have been widely applied to multi-objective optimization, due to their nature of population-based search. Population update, a key component in multi-objective EAs (MOEAs), is usually performed in a greedy, deterministic manner. However, recent studies have questioned this practice and shown that stochastic population update (SPU), which gives inferior solutions a chance to be preserved, can help MOEAs jump out of local optima more easily. While introducing randomness into the population update process boosts the exploration of MOEAs, a drawback is that the population may not always preserve the best solutions found so far, thus requiring a large population. Intuitively, a possible solution to this issue is to introduce an archive that stores the best solutions ever found. In this paper, we theoretically show that using an archive can allow a small population and accelerate the search of SPU-based MOEAs substantially. Specifically, we analyze the expected running time of two well-established MOEAs, SMS-EMOA and NSGA-II, with SPU for solving a commonly studied bi-objective problem, OneJumpZeroJump, and prove that using an archive can bring (even exponential) speedups. The comparison between SMS-EMOA and NSGA-II also suggests that the (μ+μ) update mode may be more suitable for SPU than the (μ+1) update mode. Furthermore, our derived running time bounds for using SPU alone are significantly tighter than previously known ones. Our theoretical findings are also empirically validated on a well-known practical problem, the multi-objective traveling salesperson problem. We hope this work may provide theoretical support to explore different ideas of designing algorithms in evolutionary multi-objective optimization.
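A minimal sketch, under simplifying assumptions, of the two ingredients the abstract combines: a stochastic population update (occasionally discarding a random member instead of the worst) and an archive that retains every non-dominated solution found. The steady-state loop, the scalar "worst" surrogate, and the user-supplied `mutate`/`f` are illustrative, not the analyzed SMS-EMOA/NSGA-II variants.

```python
import random

def dominates(a, b):
    """Objective vector a Pareto-dominates b (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def update_archive(archive, x, fx):
    """The archive keeps every non-dominated (solution, objectives) pair found so far."""
    if any(dominates(fa, fx) for _, fa in archive):
        return archive
    return [(a, fa) for a, fa in archive if not dominates(fx, fa)] + [(x, fx)]

def spu_with_archive(init_pop, mutate, f, generations, p_random=0.5):
    """Steady-state MOEA loop with stochastic population update plus an archive."""
    pop = [(x, f(x)) for x in init_pop]
    archive = []
    for x, fx in pop:
        archive = update_archive(archive, x, fx)
    for _ in range(generations):
        parent, _ = random.choice(pop)
        child = mutate(parent)
        fc = f(child)
        archive = update_archive(archive, child, fc)   # best-so-far is never lost
        pop.append((child, fc))
        if random.random() < p_random:
            # Stochastic update: remove a uniformly random member, so an
            # inferior solution occasionally survives and aids exploration.
            pop.pop(random.randrange(len(pop)))
        else:
            # Deterministic update: remove the member worst under a crude
            # scalar surrogate (a real MOEA would use dominance/hypervolume).
            pop.remove(min(pop, key=lambda p: sum(p[1])))
    return archive
```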


Pareto Optimization with Robust Evaluation for Noisy Subset Selection

January 2025

Subset selection is a fundamental problem in combinatorial optimization, with a wide range of applications such as influence maximization and sparse regression. The goal is to select a subset of limited size from a ground set in order to maximize a given objective function. However, the evaluation of the objective function in real-world scenarios is often noisy. Previous algorithms, including the greedy algorithm and the multi-objective evolutionary algorithms POSS and PONSS, either struggle in noisy environments or consume excessive computational resources. In this paper, we focus on the noisy subset selection problem with a cardinality constraint, where the evaluation of a subset is noisy. We propose a novel approach based on Pareto Optimization with Robust Evaluation for noisy subset selection (PORE), which maximizes a robust evaluation function and minimizes the subset size simultaneously. PORE can efficiently identify well-structured solutions while using computational resources economically, addressing the limitations observed in PONSS. Our experiments, conducted on real-world datasets for influence maximization and sparse regression, demonstrate that PORE significantly outperforms previous methods, including the classical greedy algorithm, POSS, and PONSS. Further validation through ablation studies confirms the effectiveness of our robust evaluation function.
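A sketch of the bi-objective (value, subset size) Pareto optimization loop the abstract describes, assuming a simple stand-in for the robust evaluation: averaging several independent noisy evaluations. This averaging and the function names are illustrative assumptions, not PORE's actual evaluation function.

```python
import random

def robust_eval(noisy_f, subset, repeats=5):
    """Illustrative robust evaluation: average independent noisy evaluations
    to damp the noise (a stand-in for the paper's evaluation function)."""
    return sum(noisy_f(subset) for _ in range(repeats)) / repeats

def pareto_subset_selection(n, noisy_f, budget, k):
    """POSS-style bi-objective loop: maximize robust value, minimize subset size."""
    def dominates(a, b):
        return a[0] >= b[0] and a[1] <= b[1] and a != b

    empty = frozenset()
    population = {empty: (robust_eval(noisy_f, empty), 0)}
    for _ in range(budget):
        parent = random.choice(list(population))
        # Bit-wise mutation: flip each element in/out with probability 1/n.
        child = set(parent)
        for i in range(n):
            if random.random() < 1.0 / n:
                child.symmetric_difference_update({i})
        child = frozenset(child)
        obj = (robust_eval(noisy_f, child), len(child))
        if not any(dominates(o, obj) for o in population.values()):
            population = {s: o for s, o in population.items() if not dominates(obj, o)}
            population[child] = obj
    # Return the best subset found that respects the cardinality constraint.
    feasible = {s: o for s, o in population.items() if o[1] <= k}
    return max(feasible, key=lambda s: feasible[s][0])
```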


Pareto Set Learning for Multi-Objective Reinforcement Learning

January 2025 · 5 Reads

Multi-objective decision-making problems have emerged in numerous real-world scenarios, such as video games, navigation, and robotics. Considering the clear advantages of Reinforcement Learning (RL) in optimizing decision-making processes, researchers have delved into the development of Multi-Objective RL (MORL) methods for solving multi-objective decision problems. However, previous methods either cannot obtain the entire Pareto front, or employ only a single policy network for all the preferences over multiple objectives, which may not produce personalized solutions for each preference. To address these limitations, we propose a novel decomposition-based framework for MORL, Pareto Set Learning for MORL (PSL-MORL), which harnesses the generation capability of a hypernetwork to produce the parameters of the policy network for each decomposition weight, generating relatively distinct policies for the various scalarized subproblems with high efficiency. PSL-MORL is a general framework that is compatible with any RL algorithm. Our theoretical results guarantee the superiority of PSL-MORL's model capacity and the optimality of the obtained policy network. Through extensive experiments on diverse benchmarks, we demonstrate the effectiveness of PSL-MORL in achieving dense coverage of the Pareto front, significantly outperforming state-of-the-art MORL methods in the hypervolume and sparsity indicators.
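A minimal sketch of the core mechanism: a hypernetwork that maps a preference (decomposition weight) vector to the parameters of a small policy network, so different preferences yield different policies from shared parameters. Layer sizes and the one-hidden-layer policy are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PolicyHypernetwork(nn.Module):
    """Illustrative hypernetwork: preference vector -> weights of a small policy MLP."""

    def __init__(self, n_objectives, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.obs_dim, self.act_dim, self.hidden = obs_dim, act_dim, hidden
        n_params = (obs_dim * hidden + hidden) + (hidden * act_dim + act_dim)
        self.generator = nn.Sequential(
            nn.Linear(n_objectives, 128), nn.ReLU(), nn.Linear(128, n_params)
        )

    def forward(self, preference, obs):
        p = self.generator(preference)
        o, h, a = self.obs_dim, self.hidden, self.act_dim
        w1, p = p[: o * h].view(h, o), p[o * h:]   # unpack generated parameters
        b1, p = p[:h], p[h:]
        w2, p = p[: h * a].view(a, h), p[h * a:]
        b2 = p[:a]
        x = torch.relu(obs @ w1.T + b1)
        return x @ w2.T + b2                        # action logits for this preference

# Toy usage: distinct preferences yield distinct policies from shared parameters.
net = PolicyHypernetwork(n_objectives=2, obs_dim=4, act_dim=3)
obs = torch.randn(4)
logits_a = net(torch.tensor([0.8, 0.2]), obs)
logits_b = net(torch.tensor([0.2, 0.8]), obs)
```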


Monte Carlo Tree Search based Space Transfer for Black-box Optimization

December 2024 · 3 Reads

Bayesian optimization (BO) is a popular method for computationally expensive black-box optimization. However, traditional BO methods need to solve new problems from scratch, leading to slow convergence. Recent studies try to extend BO to a transfer learning setup to speed up the optimization, where search space transfer is one of the most promising approaches and has shown impressive performance on many tasks. However, existing search space transfer methods either lack an adaptive mechanism or are not flexible enough, making it difficult to efficiently identify a promising search space during the optimization process. In this paper, we propose a search space transfer learning method based on Monte Carlo tree search (MCTS), called MCTS-transfer, to iteratively divide, select, and optimize in a learned subspace. MCTS-transfer can not only provide a well-performing search space for warm-starting but also adaptively identify and leverage the information of similar source tasks to reconstruct the search space during the optimization process. Experiments on synthetic functions, real-world problems, Design-Bench, and hyper-parameter optimization show that MCTS-transfer achieves superior performance compared to other search space transfer methods under different settings. Our code is available at https://github.com/lamda-bbo/mcts-transfer.
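A generic sketch of the divide/select/optimize loop over search space regions that the abstract describes: a tree of axis-aligned subregions, UCB-based selection of the region to optimize next, and reward backpropagation. The node statistics and random splitting rule are illustrative assumptions; the actual MCTS-transfer algorithm additionally incorporates source-task information.

```python
import math
import random

class RegionNode:
    """A node covering an axis-aligned subregion of the search space."""
    def __init__(self, bounds):
        self.bounds = bounds              # list of (low, high) pairs, one per dimension
        self.children = []
        self.visits, self.value = 0, 0.0

    def ucb(self, parent_visits, c=1.4):
        if self.visits == 0:
            return float("inf")
        return (self.value / self.visits
                + c * math.sqrt(math.log(max(parent_visits, 1)) / self.visits))

def select_path(root):
    """Walk down the tree by UCB and return the root-to-leaf path of regions."""
    path = [root]
    while path[-1].children:
        parent = path[-1]
        path.append(max(parent.children, key=lambda ch: ch.ucb(parent.visits)))
    return path

def split(leaf):
    """Divide a leaf region in half along a random dimension."""
    dim = random.randrange(len(leaf.bounds))
    low, high = leaf.bounds[dim]
    mid = (low + high) / 2.0
    for half in ((low, mid), (mid, high)):
        child_bounds = list(leaf.bounds)
        child_bounds[dim] = half
        leaf.children.append(RegionNode(child_bounds))

def backpropagate(path, reward):
    """Credit the reward (e.g. the best objective value found by a local BO run
    inside the leaf region) to every region on the selected path."""
    for node in path:
        node.visits += 1
        node.value += reward
```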


Reinforcement Learning Policy as Macro Regulator Rather than Macro Placer

December 2024

In modern chip design, placement aims at placing millions of circuit modules, which is an essential step that significantly influences power, performance, and area (PPA) metrics. Recently, reinforcement learning (RL) has emerged as a promising technique for improving placement quality, especially macro placement. However, current RL-based placement methods suffer from long training times, low generalization ability, and inability to guarantee PPA results. A key issue lies in the problem formulation, i.e., using RL to place from scratch, which limits the useful information available and leads to inaccurate rewards during the training process. In this work, we propose an approach that utilizes RL for the refinement stage, which allows the RL policy to learn how to adjust existing placement layouts, thereby receiving sufficient information for the policy to act and obtaining relatively dense and precise rewards. Additionally, we introduce the concept of regularity during training, which is considered an important metric in the chip design industry but is often overlooked in current RL placement methods. We evaluate our approach on the ISPD 2005 and ICCAD 2015 benchmarks, comparing the global half-perimeter wirelength and regularity of our proposed method against several competitive approaches. In addition, we test the PPA performance using commercial software, showing that RL as a regulator can achieve significant PPA improvements. Our RL regulator can fine-tune placements from any method and enhance their quality. Our work opens up new possibilities for the application of RL in placement, providing a more effective and efficient approach to optimizing chip design. Our code is available at https://github.com/lamda-bbo/macro-regulator.
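A minimal sketch of the "regulator" formulation under stated assumptions: the policy starts from an existing placement and proposes small macro adjustments, and the reward is the resulting HPWL improvement plus a regularity term. The alignment-to-grid proxy for regularity and the `pins_fn` interface are illustrative assumptions, not the paper's reward design.

```python
import numpy as np

def hpwl(pins_by_net):
    """Half-perimeter wirelength of a placement, summed over nets."""
    total = 0.0
    for pins in pins_by_net:             # pins: array of (x, y) positions for one net
        total += (pins[:, 0].max() - pins[:, 0].min()
                  + pins[:, 1].max() - pins[:, 1].min())
    return total

def refine_step(macro_xy, move, pins_fn, alpha=0.1):
    """One regulator step: nudge an existing macro placement and reward the improvement.

    macro_xy : (n_macros, 2) float array of current macro positions.
    move     : (macro index, dx, dy) proposed by the RL policy.
    pins_fn  : maps macro positions to per-net pin coordinates (design-specific).
    """
    idx, dx, dy = move
    before = hpwl(pins_fn(macro_xy))
    new_xy = macro_xy.copy()
    new_xy[idx] += (dx, dy)
    after = hpwl(pins_fn(new_xy))
    # Illustrative regularity proxy: penalize distance from an integer grid.
    regularity_bonus = -alpha * np.abs(new_xy[idx] - np.round(new_xy[idx])).sum()
    reward = (before - after) + regularity_bonus   # dense reward from refining, not placing from scratch
    return new_xy, reward
```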


Analyzing the Expected Hitting Time of Evolutionary Computation-Based Neural Architecture Search Algorithms

December 2024 · 10 Reads · 1 Citation

IEEE Transactions on Emerging Topics in Computational Intelligence

Evolutionary computation-based neural architecture search (ENAS) is a popular technique for automating the architecture design of deep neural networks. Despite its groundbreaking applications, there has been no theoretical study of ENAS. The expected hitting time (EHT) is one of the most important theoretical issues, since it implies the average computational time complexity. This paper proposes a general method that integrates theory and experiment for estimating the EHT of ENAS algorithms, comprising common configuration, search space partitioning, transition probability estimation, population distribution fitting, and hitting time analysis. By exploiting the proposed method, we consider (λ+λ)-ENAS algorithms with different mutation operators and estimate lower bounds on the EHT. Furthermore, we study the EHT on the NAS-Bench-101 problem, and the results demonstrate the validity of the proposed method. To the best of our knowledge, this work is the first attempt to establish a theoretical foundation for ENAS algorithms.
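A small worked example of the last step in such a pipeline: once level-to-level transition probabilities have been estimated over a partition of the search space, the expected hitting time of the optimal level follows from the standard absorbing Markov chain formula t = (I − Q)⁻¹·1. The toy transition matrix below is assumed for illustration and is not taken from the paper.

```python
import numpy as np

def expected_hitting_time(P):
    """Expected hitting time of the last (absorbing, optimal) state from each
    transient state of a Markov chain with transition matrix P.

    Uses the fundamental-matrix formula t = (I - Q)^{-1} 1, where Q is P
    restricted to the transient (non-optimal) states."""
    Q = P[:-1, :-1]                      # transitions among non-optimal levels
    n = Q.shape[0]
    return np.linalg.solve(np.eye(n) - Q, np.ones(n))

# Toy example: three non-optimal levels plus one optimal (absorbing) level.
P = np.array([
    [0.7, 0.2, 0.1, 0.0],
    [0.0, 0.8, 0.1, 0.1],
    [0.0, 0.0, 0.6, 0.4],
    [0.0, 0.0, 0.0, 1.0],
])
print(expected_hitting_time(P))          # EHT from each starting level
```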


Open and real-world human-AI coordination by heterogeneous training with communication

November 2024 · 5 Reads

Frontiers of Computer Science (electronic)

Human-AI coordination aims to develop AI agents capable of effectively coordinating with human partners, making it a crucial aspect of cooperative multi-agent reinforcement learning (MARL). Achieving satisfactory performance from AI agents has been a long-standing challenge. Recently, ad hoc teamwork and zero-shot coordination have shown promising advancements in open-world settings, requiring agents to coordinate efficiently with a range of unseen human partners. However, these methods usually rely on an overly idealistic scenario, assuming homogeneity between the agent and the partner, which deviates from real-world conditions. To facilitate the practical deployment and application of human-AI coordination in open and real-world environments, we propose the first benchmark for open and real-world human-AI coordination (ORC), called ORCBench. ORCBench includes widely used human-AI coordination environments. Notably, within the context of real-world scenarios, ORCBench considers heterogeneity between AI agents and partners, encompassing variations in capabilities and observations, which aligns more closely with real-world applications. Furthermore, we introduce a framework known as Heterogeneous training with Communication (HeteC) for ORC. HeteC builds upon a heterogeneous training framework and enhances partner population diversity by using mixed partner training and frozen historical partners. Additionally, HeteC incorporates a communication module that enables human partners to communicate with AI agents, mitigating the adverse effects of partially observable environments. Through a series of experiments, we demonstrate the effectiveness of HeteC in improving coordination performance. Our contribution serves as an initial but important step towards addressing the challenges of ORC.
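A minimal sketch of the mixed partner training idea mentioned above: the learner trains mostly against the current partner population, occasionally against frozen historical snapshots to preserve diversity. The sampling probability and buffer size are illustrative assumptions, not HeteC's actual configuration.

```python
import copy
import random

def sample_partner(current_partners, frozen_history, p_frozen=0.3):
    """Mixed partner sampling: mostly use the current partner population,
    occasionally a frozen historical snapshot to keep the population diverse."""
    if frozen_history and random.random() < p_frozen:
        return random.choice(frozen_history)
    return random.choice(current_partners)

def snapshot(partner, frozen_history, max_size=20):
    """Freeze a copy of a partner policy and add it to the history buffer."""
    frozen_history.append(copy.deepcopy(partner))
    if len(frozen_history) > max_size:
        frozen_history.pop(0)
```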


Citations (43)


... Lu et al. [9] further demonstrated that the running times of R-NSGA-II on the OneJumpZeroJump and OneMinMax functions are all asymptotically more efficient than the NSGA-II. Bian et al. [10], [11] reduced the running times of NSGA-II and SMS-EMOA on certain problems. Furthermore, they proved the time complexity of the standard NSGA-II on the LOTZ problem [12]. ...

Reference:

Multiple-gain Estimation for Running Time of Evolutionary Combinatorial Optimization
Stochastic Population Update Can Provably Be Helpful in Multi-Objective Evolutionary Algorithms
  • Citing Article
  • February 2025

Artificial Intelligence

... Furthermore, some other works focus on the robustness when coordinating with different teammates, referring to adhoc teamwork (Stone et al. 2010;Gu et al. 2022;Mirsky et al. 2022), or zero-shot coordination (ZSC) (Hu et al. 2020;Lupu et al. 2021;Xue et al. 2022a). The former methods aim at creating an autonomous agent that can efficiently and robustly collaborate with previously unknown teammates on tasks to which they are all individually capable of contributing as team members. ...

Heterogeneous Multi-Agent Zero-Shot Coordination by Coevolution
  • Citing Article
  • January 2024

IEEE Transactions on Evolutionary Computation

... Hence the classic NSGA-II obtains a Θ(n² log n) runtime (function evaluations) only with the smallest admissible population size of Θ(n). Recently, a bound of O(Nn log n) function evaluations was also proven for the SPEA2 (Ren et al. 2024a). For completeness, we note that the simplistic SEMO algorithm finds the full Pareto front of OMM in O(n² log n) iterations and function evaluations (Giel and Lehre 2010). ...

A First Running Time Analysis of the Strength Pareto Evolutionary Algorithm 2 (SPEA2)
  • Citing Chapter
  • September 2024

... MNC content was calculated using glucosamine (GluN) and muramic acid (MurA), which were extracted and quantified using the gas chromatography methods provided by Zhang et al. (2024). MNC content was calculated as follows (Hu et al. 2024): where GluN is the content of glucosamine in soil (g kg −1 ); MurA is the content of muramic acid in soil (g kg −1 ). ...

Reducing the uncertainty in estimating soil microbial-derived carbon storage
  • Citing Article
  • August 2024

Proceedings of the National Academy of Sciences

... The first runtime analysis has been provided for the classical knapsack problem (Nikfarjam et al., 2022) where it has been shown that QD is able to solve this problem in expected pseudo-polynomial time. Afterwards, runtime results have been proven for the computation of minimum spanning trees and the optimisation of submodular monotone functions under cardinality constraints and its generalisation to approximately submodular functions as well as the classical set cover problems (Qian et al., 2024). The mentioned studies have in common that they only seek on a single solution. ...

Quality-Diversity Algorithms Can Provably Be Helpful for Optimization
  • Citing Conference Paper
  • August 2024

... However, it often requires additional search procedures (e.g., Monte Carlo Tree Search for TSP) to achieve high-quality solutions (Fu et al., 2021;Qiu et al., 2022;Sun & Yang, 2023). In contrast, dynamic SSR (Fang et al., 2024;Gao et al., 2024;Wang et al., 2024) adaptively adjusts the candidate node set at each construction step based on real-time problem states, enabling more effective search space reduction for constructive NCO methods. Despite their advantages, existing dynamic SSR methods are fundamentally ...

Towards Generalizable Neural Solvers for Vehicle Routing Problems via Ensemble with Transferrable Local Policy
  • Citing Conference Paper
  • August 2024

... Algorithm-level methods (Zhou and Liu, 2005;Fernández et al., 2018;He et al., 2024) attempt to mitigate the preference for majority classes by modifying existing machine learning algorithms. However, these methods often require extensive domain knowledge and experimentation, which may not be ideal for few-shot scenarios. ...

Multi-Class Imbalance Problem: A Multi-Objective Solution
  • Citing Article
  • July 2024

Information Sciences

... However, in this case, anchors referred to promising candidate solutions in the case of vanilla BO and were unrelated to human uncertainty in PBO like us. KDE was also employed to learn the distribution of contextual variables in BO [21]. ...

Stochastic Bayesian Optimization with Unknown Continuous Context Distribution via Kernel Density Estimation
  • Citing Article
  • March 2024

Proceedings of the AAAI Conference on Artificial Intelligence