Jun Wang

About

578
Publications
69,201
Reads
11,300
Citations

Publications (578)
Preprint
Full-text available
The necessity for cooperation among intelligent machines has popularised cooperative multi-agent reinforcement learning (MARL) in the artificial intelligence (AI) research community. However, many research endeavors have been focused on developing practical MARL algorithms whose effectiveness has been studied only empirically, thereby lacking theor...
Preprint
Deriving a good variable selection strategy in branch-and-bound is essential for the efficiency of modern mixed-integer programming (MIP) solvers. With MIP branching data collected during the previous solution process, learning-to-branch methods have recently become superior to heuristics. As branch-and-bound is naturally a sequential decision ma...
Preprint
Reinforcement learning algorithms require a large number of samples; this often limits their real-world applications on even simple tasks. Such a challenge is more pronounced in multi-agent tasks, as each step of operation is more costly, requiring communications or shifting of resources. This work aims to improve data efficiency of multi-agent con...
Preprint
Oligodendrocytes are the most iron-rich cells in the brain. Studies have shown that oligodendrocytes are very sensitive to oxidative stress, and iron overload is more likely to cause damage to oligodendrocytes. The purpose of this experiment was to investigate the damaging effect and mechanism of ferric ammonium citrate (FAC) on MO3.13 oligodendroc...
Article
Full-text available
Estrogen is a steroid hormone produced mainly by the ovaries. It has been found that estrogen could regulate iron metabolism in neurons and astrocytes in different ways. The role of estrogen on iron metabolism in microglia is currently unknown. In this study, we investigated the effect and mechanism of 17β-estrogen (E2) on iron transport proteins....
Article
In this work we rigorously analyse assumptions inherent to black-box optimisation hyper-parameter tuning tasks. Our results on the Bayesmark benchmark indicate that heteroscedasticity and non-stationarity pose significant challenges for black-box optimisers. Based on these findings, we propose a Heteroscedastic and Evolutionary Bayesian Optimisatio...
Conference Paper
Fictitious play (FP) is one of the most fundamental game-theoretical learning frameworks for computing Nash equilibrium in n-player games, which builds the foundation for modern multi-agent learning algorithms. Although FP has provable convergence guarantees on zero-sum games and potential games, many real-world problems are often a mixture of both...
Article
Periodontium possesses stem cell populations for its self‐maintenance and regeneration, and has been proved to be an optimal stem cell source for tissue engineering. In vitro studies have shown that stem cells can be isolated from periodontal ligament, alveolar bone marrow and gingiva. In recent years, more studies have focused on identification of...
Preprint
Safe exploration is a challenging and important problem in model-free reinforcement learning (RL). Often the safety cost is sparse and unknown, which unavoidably leads to constraint violations -- a phenomenon ideally to be avoided in safety-critical applications. We tackle this problem by augmenting the state-space with a safety state, which is non...
Preprint
Many real-world settings involve costs for performing actions; transaction costs in financial systems and fuel costs being common examples. In these settings, performing actions at each time step quickly accumulates costs leading to vastly suboptimal outcomes. Additionally, repeatedly acting produces wear and tear and ultimately, damage. Determinin...
Preprint
Efficient reinforcement learning (RL) involves a trade-off between "exploitative" actions that maximise expected reward and "explorative" ones that sample unvisited states. To encourage exploration, recent approaches proposed adding stochasticity to actions, separating exploration and exploitation phases, or equating reduction in uncertainty with...
Preprint
Large sequence models (SMs) such as the GPT series and BERT have displayed outstanding performance and generalization capabilities on vision, language, and recently reinforcement learning tasks. A natural follow-up question is how to abstract multi-agent decision making into an SM problem and benefit from the prosperous development of SMs. In this paper,...
Preprint
In multi-agent systems, intelligent agents are tasked with making decisions that have optimal outcomes when the actions of the other agents are as expected, whilst also being prepared for unexpected behaviour. In this work, we introduce a new risk-averse solution concept that allows the learner to accommodate unexpected actions by finding the minim...
Article
Full-text available
Objectives: This cross-sectional study aimed to evaluate the associations among orthodontic history, psychological status, and temporomandibular-related quality of life. Methods: A questionnaire was developed and distributed to students in a local college, containing questions about demographic information, the Patient Health Questionnaire-4 (PH...
Preprint
Faced with problems of increasing complexity, recent research in Bayesian Optimisation (BO) has focused on adapting deep probabilistic models as flexible alternatives to Gaussian Processes (GPs). In a similar vein, this paper investigates the feasibility of employing state-of-the-art probabilistic transformers in BO. Upon further investigation, we...
Preprint
Full-text available
In the intelligent communication field, deep learning (DL) has attracted much attention due to its strong fitting ability and data-driven learning capability. Compared with the typical DL feedforward network structures, an enhancement structure with direct data feedback has been studied and proved to have better performance than the feedforward net...
Preprint
Full-text available
Reinforcement learning has achieved tremendous success in many complex decision making tasks. When it comes to deploying RL in the real world, safety concerns are usually raised, leading to a growing demand for safe reinforcement learning algorithms, such as in autonomous driving and robotics scenarios. While safety control has a long history, the...
Preprint
Mobile communication standards were developed for enhancing transmission and network performance by utilizing more radio resources and improving spectrum and energy efficiency. How to effectively address diverse user requirements and guarantee everyone's Quality of Experience (QoE) remains an open problem. The future Sixth Generation (6G) system ca...
Article
Full-text available
Purpose: To evaluate the relationship between oral habits, psychological status, and temporomandibular-related quality of life among college students. Materials and methods: An online questionnaire was sent to college students who were willing to participate in this anonymous survey, which contained questions about the demographic characteristic...
Preprint
Modelling prosody variation is critical for synthesizing natural and expressive speech in end-to-end text-to-speech (TTS) systems. In this paper, a cross-utterance conditional VAE (CUC-VAE) is proposed to estimate a posterior probability distribution of the latent prosody features for each phoneme by conditioning on acoustic features, speaker infor...
Preprint
Full-text available
Fictitious play (FP) is one of the most fundamental game-theoretical learning frameworks for computing Nash equilibrium in $n$-player games, which builds the foundation for modern multi-agent learning algorithms. Although FP has provable convergence guarantees on zero-sum games and potential games, many real-world problems are often a mixture of bo...
Article
Bone stromal cells are critical for bone homeostasis and regeneration. Growing evidence suggests that non-stem bone niche cells support bone homeostasis and regeneration via paracrine mechanisms, which remain to be elucidated. Here, we show that physiologically quiescent SM22α-lineage stromal cells expand after bone injury to regulate diverse proce...
Article
Background: To date, controversies still exist regarding the exact cellular origin and regulatory mechanisms of periodontium development, which hinders efforts to achieve ideal periodontal tissue regeneration. Axin2-expressing cells in the periodontal ligament (PDL) have been shown to be a novel progenitor cell population that is essential for per...
Article
Learning to rank from logged user feedback, such as clicks or purchases, is a central component of many real-world information systems. Different from human-annotated relevance labels, the user feedback is always noisy and biased. Many existing learning to rank methods infer the underlying relevance of query–item pairs based on different assumption...
Preprint
Full-text available
Green's function plays a significant role in both theoretical analysis and numerical computing of partial differential equations (PDEs). However, in most cases, Green's function is difficult to compute. The difficulties are threefold. Firstly, compared with the original PDE, the dimension of Green's function is doubled, making it i...
Preprint
Full-text available
Mainstream numerical Partial Differential Equation (PDE) solvers require discretizing the physical domain using a mesh. Mesh movement methods aim to improve the accuracy of the numerical solution by increasing mesh resolution where the solution is not well-resolved, whilst reducing unnecessary resolution elsewhere. However, mesh movement methods, s...
Article
A range of heteroleptic aluminium(III) formamidinate/formamidine complexes have been prepared involving metathesis reactions between AlX3 (X = Cl, Br, I) and alkali metal formamidinates. The mononuclear, bis-substituted complexes of the composition [Al(XylForm)2Cl] (XylForm = N,N′-bis(2,6-dimethylphenyl)formamidinate) (1), [Al(XylForm)2I]·PhMe (2),...
Article
Many industrial practitioners are facing the challenge of solving large-scale scheduling problems within a limited time. In this paper, we propose a novel bilevel scheduler based on constraint Markov Decision Process to solve large-scale flexible flow shop scheduling problems (FFSP). There are many intelligent algorithms proposed to solve FFSP, but...
Article
Full-text available
Normal development of craniofacial sutures is crucial for cranial and facial growth in all three dimensions. These sutures provide a unique niche for suture stem cells (SuSCs), which are indispensable for homeostasis, damage repair as well as stress balance. Expansion appliances are now routinely used to treat underdevelopment of the skull and maxi...
Article
Objective: To provide reliable prediction models based on dentoskeletal and soft tissue variables for customizing maxillary incisor positions and to optimize digitalized orthodontic treatment planning. Methods: This study included 244 Chinese women (age, 18-40 years old) with esthetic profiles after orthodontic treatment with fixed appliances (1...
Article
Full-text available
Mechanical force, being so ubiquitous that it is often taken for granted and overlooked, is now gaining the spotlight, with reams of evidence corroborating its crucial roles in the living body. The bone, particularly, experiences manifold extraneous forces like strain and compression, as well as intrinsic cues like fluid shear stress and physical pr...
Preprint
Full-text available
Recent progress in state-only imitation learning extends the scope of applicability of imitation learning to real-world settings by relieving the need for observing expert actions. However, existing solutions only learn to extract a state-to-action mapping policy from the data, without considering how the expert plans to the target. This hinders th...
Article
Introduction: Orthodontic students need to accurately identify cephalometric landmarks to perform cephalometric measurements, which is the prerequisite to proper orthodontic diagnosis and treatment. To provide insights into future cephalometric education, we compared the performance of different methods that can be used in tracing practice, includin...
Article
Full-text available
Regenerating periodontal bone tissues in the aggravated inflammatory periodontal microenvironment under diabetic conditions is a great challenge. Here, a polydopamine-mediated graphene oxide (PGO) and hydroxyapatite nanoparticle (PHA)-incorporated conductive alginate/gelatin (AG) scaffold is developed to accelerate periodontal bone regeneration by...
Preprint
It is evident that representation learning can improve a model's performance over multiple downstream tasks in many real-world scenarios, such as image classification and recommender systems. Existing learning approaches rely on establishing the correlation (or its proxy) between features and the downstream task (labels), which typically results in...
Preprint
We consider a context-dependent Reinforcement Learning (RL) setting, which is characterized by: a) an unknown finite number of not directly observable contexts; b) abrupt (discontinuous) context changes occurring during an episode; and c) Markovian context evolution. We argue that this challenging case is often met in applications and we tackle it...
Preprint
Satisfying safety constraints almost surely (or with probability one) can be critical for deployment of Reinforcement Learning (RL) in real-life applications. For example, plane landing and take-off should ideally occur with probability one. We address the problem by introducing Safety Augmented (Saute) Markov Decision Processes (MDPs), where the s...
Preprint
Full-text available
Since it was proposed in the 1970s, the Non-Equilibrium Green Function (NEGF) method has been recognized as a standard approach to quantum transport simulations. Although it achieves superior simulation accuracy, its tremendous computational cost makes it impractical for high-throughput simulation tasks such as sensitivity analysis, inverse design, etc...
Article
In the intelligent communication field, deep learning (DL) has attracted much attention due to its strong fitting ability and data-driven learning capability. Compared with the typical DL feedforward network structures, an enhancement structure with direct data feedback has been studied and proved to have better performance than the feedforward net...
Article
Full-text available
Disrupted iron homeostasis in the substantia nigra pars compacta (SNpc) is an important pathological mechanism in Parkinson’s disease (PD). It is unclear what role microglia play in iron metabolism and selective iron deposition in the SNpc of PD brain. In this study, we observed that 6-hydroxydopamine (6-OHDA) induced the expression of divalent met...
Preprint
The Schrödinger equation is at the heart of modern quantum mechanics. Since exact solutions of the ground state are typically intractable, standard approaches approximate Schrödinger equation as forms of nonlinear generalized eigenvalue problems $F(V)V = SV\Lambda$ in which $F(V)$, the matrix to be decomposed, is a function of its own top-$k$ s...
Article
η⁶-Arene(iodidoaluminato)lanthanoid(II) complexes, {[Ln(η⁶-C6H5Me)(AlI4)2]}n [Ln = Sm, 1, Eu, 2, Yb, 3; C6H5Me = toluene] have been prepared by reactions of in situ generated aluminium triiodide with the corresponding lanthanoid metals and 1,2-diiodoethane in toluene (molar ratio : 2:1:1). Compounds 1-3 are polymeric and the lanthanoid(II) atom of...
Preprint
Antibodies are canonically Y-shaped multimeric proteins capable of highly specific molecular recognition. The CDRH3 region located at the tip of variable chains of an antibody dominates antigen-binding specificity. Therefore, it is a priority to design optimal antigen-specific CDRH3 regions to develop therapeutic antibodies to combat harmful pathog...
Preprint
Many real-world scenarios involve a team of agents that have to coordinate their policies to achieve a shared goal. Previous studies mainly focus on decentralized control to maximize a common reward and barely consider the coordination among control policies, which is critical in dynamic and complicated environments. In this work, we propose factor...
Preprint
Debiased recommendation has recently attracted increasing attention from both industry and academic communities. Traditional models mostly rely on the inverse propensity score (IPS), which can be hard to estimate and may suffer from the high variance issue. To alleviate these problems, in this paper, we propose a novel debiased recommendation frame...
Article
Full-text available
Several new trivalent dinuclear rare earth 2,2’-methylenebis(6-tert-butyl-4-methylphenolate) (mbmp2-) complexes with the general form [Ln2(mbmp)3(thf)n] (Ln = Sm 1, Tb 2 (n = 2), and Ho 3, Yb 4 (n = 3)), and a tetravalent cerium complex [Ce(mbmp)2(thf)2] (5) have been synthesised by RTP (redox transmetallation/protolysis) reactions fro...
Article
Mesenchymal stem cells (MSCs) are remarkable and noteworthy. Identification of markers for MSCs enables the study of their niche in vivo. It has been identified that glioma-associated oncogene 1 positive (Gli1+) cells are mesenchymal stem cells supporting homeostasis and injury repair, especially in the skeletal system and teeth. This review outlin...
Article
η6‐Arene(iodido‐/bromido‐aluminato)lanthanoid(III) complexes, [Ln(η6‐C6H5Me)(AlI4)3] [Ln=La (1), Ce (2), Nd (3), (Gd) (4); C6H5Me=toluene], [Ln(η6‐C6H3Me3‐1,3,5)(AlI4)3] [Ln=La (5), Ce (6), Pr (7), Nd (8), Sm (9), Gd (10); C6H3Me3‐1,3,5=mesitylene], and [Ln(η6‐C6H5Me)(AlBr4)3] [Ln=La (11), Nd (12), Sm (13)] were prepared by reactions of aluminium t...
Article
Cryptocaryon irritans, a holotrichous ciliate parasitic protozoan, can trigger marine white spot disease and cause substantial economic losses in mariculture. However, methods of preventing and curing the disease have negatively affected fish, humans, other organisms, and the natural environment. The antiparasitic activity of some antimicrobial peptid...
Article
Full-text available
In this paper, we propose SAMBA, a novel framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics. Our method builds upon PILCO to enable active exploration using novel acquisition functions for out-of-sample Gaussian process evaluation optimised through a multi-objective probl...
Preprint
In recent years, gradient-based Meta-RL (GMRL) methods have achieved remarkable successes in either discovering effective online hyperparameters for a single task (Xu et al., 2018) or learning good initialisation for multi-task transfer learning (Finn et al., 2017). Despite the empirical successes, it is often neglected that computing meta gradien...
Article
Background: Temporomandibular joint osteoarthritis (TMJ-OA) causes severe symptoms such as chewing difficulties, acute pain and even maxillofacial deformity. However, there is hardly any effective disease-curing strategy because of uncertainty in etiology. An animal model is an excellent tool to investigate the mechanism, prevention and treatment on di...
Article
Full-text available
Objective: To assess the differences in hyoid bone position in patients with and without temporomandibular joint osteoarthrosis (TMJOA). Methods: The present cross-sectional study was conducted in 427 participants whose osseous status was evaluated using cone-beam computed tomography and classified into normal, indeterminate osteoarthrosis (OA),...
Preprint
Offline reinforcement learning leverages static datasets to learn optimal policies with no necessity to access the environment. This technique is desirable for multi-agent learning tasks due to the expensiveness of agents' online interactions and the demanding number of samples during training. Yet, in multi-agent reinforcement learning (MARL), the...
Preprint
Efficient exploration is important for reinforcement learners (RL) to achieve high rewards. In multi-agent systems, coordinated exploration and behaviour is critical for agents to jointly achieve optimal outcomes. In this paper, we introduce a new general framework for improving coordination and performance of multi-agent reinforcement learners (MA...
Article
Full-text available
Regulated cell death (RCD) is a ubiquitous process in living organisms that is essential for tissue homeostasis or to restore biological balance under stress. Over the decades, various forms of RCD have been reported and are increasingly being found to be involved in human pathologies and clinical outcomes. We focus on five high-profile forms of RCD, i...
Article
Full-text available
Osteoporosis is a prevalent bone disorder characterized by bone mass reduction and deterioration of bone microarchitecture leading to bone fragility and fracture risk. In recent decades, knowledge regarding the etiological mechanisms emphasizes that inflammation, oxidative stress and senescence of bone cells contribute to the development of osteopo...
Preprint
Full-text available
Optimising the quality-of-results (QoR) of circuits during logic synthesis is a formidable challenge necessitating the exploration of exponentially sized search spaces. While expert-designed operations aid in uncovering effective sequences, the increase in complexity of logic circuits favours automated procedures. Inspired by the successes of machi...
Preprint
In this paper, we shed new light on the generalization ability of deep learning-based solvers for Traveling Salesman Problems (TSP). Specifically, we introduce a two-player zero-sum framework between a trainable Solver and a Data Generator, where the Solver aims to solve the task instances provided by the Generator, and the Generator...
Preprint
Exploring in an unknown system can place an agent in dangerous situations, exposing it to potentially catastrophic hazards. Many current approaches for tackling safe learning in reinforcement learning (RL) lead to a trade-off between safe exploration and fulfilling the task. Though these methods possibly incur fewer safety violations, they often also...
Conference Paper
Top-N recommendation, which aims to learn user ranking-based preference, has long been a fundamental problem in a wide range of applications. Traditional models usually motivate themselves by designing complex or tailored architectures based on different assumptions. However, the training data of recommender systems can be extremely sparse and imbal...
Article
Hydrogels consisting of a three-dimensional hydrophilic network of biocompatible polymers have been widely used in tissue engineering. Owing to their tunable mechanical properties, hydrogels have been applied in both hard and soft tissues. However, most hydrogels lack self-adhesive properties that enable integration with surrounding tissues, which...
Preprint
We study a novel setting in Online Markov Decision Processes (OMDPs) where the loss function is chosen by a non-oblivious strategic adversary who follows a no-external regret algorithm. In this setting, we first demonstrate that MDP-Expert, an existing algorithm that works well with oblivious adversaries can still apply and achieve a policy regret...
Preprint
Full-text available
Developing reinforcement learning algorithms that satisfy safety constraints is becoming increasingly important in real-world applications. In multi-agent reinforcement learning (MARL) settings, policy optimisation with safety awareness is particularly challenging because each individual agent has to not only meet its own safety constraints, but al...