Zhi-Hua Zhou
Nanjing University | NJU · Department of Computer Science & Technology

PhD

About

522 Publications
161,642 Reads
45,772 Citations

Publications (522)
Preprint
We investigate online Markov Decision Processes (MDPs) with adversarially changing loss functions and known transitions. We choose dynamic regret as the performance measure, defined as the performance difference between the learner and any sequence of feasible changing policies. The measure is strictly stronger than the standard static regret that...
Article
Given a ground set of items, the result diversification problem aims to select a subset with high “quality” and “diversity” while satisfying some constraints. It arises in various real-world artificial intelligence applications, such as web-based search, document summarization and feature selection, and also has applications in other areas, e.g., c...
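For intuition, a common way to instantiate this objective is to trade off a quality term against pairwise diversity under a cardinality constraint. The sketch below is a plain greedy baseline under that assumption; `quality`, `distance`, and the sum-of-distances objective are illustrative choices, not necessarily the formulation analyzed in this work.

```python
def greedy_diversify(items, quality, distance, k):
    # Illustrative greedy baseline for result diversification: repeatedly add
    # the item that maximizes its own quality plus its total distance to the
    # already-selected items, until k items are chosen.
    selected, candidates = [], set(items)
    while len(selected) < k and candidates:
        best = max(candidates,
                   key=lambda v: quality(v) + sum(distance(v, u) for u in selected))
        selected.append(best)
        candidates.discard(best)
    return selected
```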
Chapter
Ensemble learning trains and combines multiple base learners for a single learning task, and has been among the state-of-the-art learning techniques. Ensemble pruning tries to select a subset of base learners instead of combining them all, with the aim of achieving a better generalization performance as well as a smaller ensemble size. Previous met...
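As a concrete (hypothetical) illustration of the pruning idea, the sketch below greedily grows a subensemble by validation accuracy of the majority vote; it is not the chapter's method, and `preds`/`y` are assumed to be 0/1 arrays.

```python
import numpy as np

def greedy_prune(preds, y, max_size):
    # Hypothetical ordering-based pruning sketch (not the chapter's method):
    # greedily add the base learner that most improves the majority-vote
    # accuracy on a validation set.
    # preds: (n_learners, n_samples) array of 0/1 predictions; y: 0/1 labels.
    chosen = []
    for _ in range(max_size):
        def acc_with(i):
            votes = preds[chosen + [i]].mean(axis=0) >= 0.5
            return (votes == y).mean()
        candidates = [i for i in range(len(preds)) if i not in chosen]
        if not candidates:
            break
        chosen.append(max(candidates, key=acc_with))
    return chosen
```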
Preprint
The standard supervised learning paradigm works effectively when training data shares the same distribution as the upcoming testing samples. However, this assumption is often violated in real-world applications, especially when testing data appear in an online fashion. In this paper, we formulate and investigate the problem of online label shift (O...
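For context, the classical correction for label shift rescales the source model's posteriors by the ratio of test to training class priors; in the online setting the test priors would have to be estimated on the fly. A minimal sketch of that correction step (not necessarily this paper's algorithm):

```python
import numpy as np

def correct_label_shift(posteriors, prior_train, prior_test):
    # Classical label-shift correction: under label shift, p(x|y) is fixed
    # while p(y) changes, so rescaling the source model's posteriors by the
    # prior ratio and renormalizing recovers the target posteriors.
    # prior_test would be estimated online in this setting (assumption).
    w = prior_test / prior_train              # shape (C,)
    p = posteriors * w                        # broadcast over (n, C)
    return p / p.sum(axis=1, keepdims=True)
```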
Article
Complex-valued neural networks have attracted increasing attention in recent years, yet the advantages of complex-valued networks over real-valued ones remain an open question. This work takes one step in this direction by introducing the complex-reaction network with a fully-connected feed-forward architecture. We prove the uni...
Preprint
We consider the problem of combining and learning over a set of adversarial bandit algorithms with the goal of adaptively tracking the best one on the fly. The CORRAL algorithm of Agarwal et al. (2017) and its variants (Foster et al., 2020a) achieve this goal with a regret overhead of order $\widetilde{O}(\sqrt{MT})$ where $M$ is the number of base...
Preprint
We investigate online convex optimization in non-stationary environments and choose the \emph{dynamic regret} as the performance measure, defined as the difference between cumulative loss incurred by the online algorithm and that of any feasible comparator sequence. Let $T$ be the time horizon and $P_T$ be the path-length that essentially reflects...
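In symbols, following the definitions in the abstract (with $x_t$ the learner's decision and $u_1, \dots, u_T$ any feasible comparator sequence):

```latex
\text{D-Regret}_T(u_1,\dots,u_T) = \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(u_t),
\qquad
P_T = \sum_{t=2}^{T} \lVert u_t - u_{t-1} \rVert_2 .
```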
Chapter
This chapter will introduce ABL (ABductive Learning), a new paradigm which integrates machine learning and logical reasoning in a balanced loop enabling them to work together in a mutually beneficial way.
Preprint
Flexible Transmitter Network (FTNet) is a recently proposed bio-plausible neural network and has achieved competitive performance with the state-of-the-art models when handling temporal-spatial data. However, there remains an open problem about the theoretical understanding of FTNet. This work investigates the theoretical properties of one-hidden-l...
Preprint
Mimicking and learning the long-term memory of efficient markets is a fundamental problem at the intersection of machine learning and financial economics for sequential data. Despite the prominence of this issue, current treatments either remain largely limited to heuristic techniques or rely significantly on periodogram or Gaussianity assumption...
Preprint
Given a ground set of items, the result diversification problem aims to select a subset with high "quality" and "diversity" while satisfying some constraints. It arises in various real-world artificial intelligence applications, such as web-based search, document summarization and feature selection, and also has applications in other areas, e.g., c...
Article
We introduce Isolation Distributional Kernel as a new way to measure the similarity between two distributions. Existing approaches based on kernel mean embedding, which convert a point kernel to a distributional kernel, have two key issues: the point kernel employed has a feature map with intractable dimensionality; and it is data independent. This...
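To make the construction concrete, here is an illustrative sketch of an isolation-based distributional similarity: random data-dependent partitions define a finite-dimensional feature map, a distribution is embedded as the mean of that map, and similarity is a dot product. The names and the nearest-sample partitioning scheme are our assumptions, not the paper's exact construction.

```python
import numpy as np

def isolation_partitions(X_ref, psi=16, t=100, seed=0):
    # Each of t partitionings samples psi reference points; a point maps to a
    # one-hot vector over the cell (nearest sample) it falls into, so the
    # feature map adapts to the data distribution.
    rng = np.random.default_rng(seed)
    return [X_ref[rng.choice(len(X_ref), psi, replace=False)] for _ in range(t)]

def embed(partitions, X):
    # Kernel mean embedding of the empirical distribution of X:
    # average of the one-hot cell indicators across all partitionings.
    blocks = []
    for centers in partitions:
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        onehot = np.eye(len(centers))[d.argmin(1)]
        blocks.append(onehot.mean(0))
    return np.concatenate(blocks) / len(partitions)

def distributional_similarity(partitions, X, Y):
    # Similarity of two distributions = dot product of their embeddings.
    return embed(partitions, X) @ embed(partitions, Y)
```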
Preprint
Multivariate time series (MTS) prediction is ubiquitous in real-world fields, but MTS data often contains missing values. In recent years, there has been an increasing interest in using end-to-end models to handle MTS with missing values. To generate features for prediction, existing methods either merge all input dimensions of MTS or tackle each i...
Preprint
Full-text available
In many real-world imitation learning tasks, the demonstrator and the learner have to act in different but full observation spaces. This situation creates significant obstacles for existing imitation learning approaches, even when they are combined with traditional space adaptation techniques. The main challenge lies in bridging expert's...
Article
Given a publicly available pool of machine learning models constructed for various tasks, when a user plans to build a model for her own machine learning application, is it possible to build upon models in the pool such that the previous efforts on these existing models can be reused rather than starting from scratch? Here, a grand challenge is how...
Preprint
Full-text available
One of the key issues for imitation learning lies in making policy learned from limited samples to generalize well in the whole state-action space. This problem is much more severe in high-dimensional state environments, such as game playing with raw pixel inputs. Under this situation, even state-of-the-art adversary-based imitation learning algori...
Conference Paper
Full-text available
One of the key issues for imitation learning lies in making policy learned from limited samples to generalize well in the whole state-action space. This problem is much more severe in high-dimensional state environments, such as game playing with raw pixel inputs. Under this situation, even state-of-the-art adversary-based imitation learning algori...
Article
The learnware paradigm attempts to change the current style of machine learning deployment, i.e., user builds her own machine learning application almost from scratch, to a style where the previous efforts of other users can be reused, given a publicly available pool of machine learning models constructed by previous users for various tasks. Each l...
Article
In conventional supervised learning, a training dataset is given with ground-truth labels from a known label set, and the learned model will classify unseen instances to known labels. In real situations, when the learned models do not work well, users generally attribute the failure to the inadequate selection of learning algorithms or the lack of...
Article
Feature evolvable learning has been widely studied in recent years, where old features vanish and new features emerge when learning with streams. Conventional methods usually assume that a label will be revealed after prediction at each time step. However, in practice, this assumption may not hold, as no label will be given at most tim...
Article
Learning with feature evolution studies the scenario where the features of the data streams can evolve, i.e., old features vanish and new features emerge. Its goal is to keep the model always performing well even when the features happen to evolve. To tackle this problem, canonical methods assume that the old features will vanish simultaneously and...
Article
In plenty of real-life tasks, strongly supervised information is hard to obtain, and thus weakly supervised learning has drawn considerable attention recently. This paper investigates the problem of learning from incomplete and inaccurate supervision, where only a limited subset of training data is labeled but potentially with noise. This setting i...
Preprint
We study the problem of Online Convex Optimization (OCO) with memory, which allows loss functions to depend on past decisions and thus captures temporal effects of learning problems. In this paper, we introduce dynamic policy regret as the performance measure to design algorithms robust to non-stationary environments, which competes algorithms' dec...
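One way to write the measure described, under the assumption that each loss depends on the last $m$ decisions (our notation, with a time-varying comparator sequence $u_1,\dots,u_T$):

```latex
\text{D-Regret}_T = \sum_{t=1}^{T} f_t(x_{t-m+1},\dots,x_t) - \sum_{t=1}^{T} f_t(u_{t-m+1},\dots,u_t).
```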
Article
Most studies about deep learning are based on neural network models, where many layers of parameterized nonlinear differentiable modules are trained by backpropagation. Recently, it has been shown that deep learning can also be realized by non-differentiable modules trained without backpropagation, known as deep forest. We identify that deep forest h...
Preprint
We introduce Isolation Distributional Kernel as a new way to measure the similarity between two distributions. Existing approaches based on kernel mean embedding, which convert a point kernel to a distributional kernel, have two key issues: the point kernel employed has a feature map with intractable dimensionality; and it is {\em data independent}...
Preprint
Full-text available
Feature evolvable learning has been widely studied in recent years, where old features vanish and new features emerge when learning with streams. Conventional methods usually assume that a label will be revealed after prediction at each time step. However, in practice, this assumption may not hold, as no label will be given at most tim...
Preprint
We investigate online convex optimization in non-stationary environments and choose the dynamic regret as the performance measure, defined as the difference between cumulative loss incurred by the online algorithm and that of any feasible comparator sequence. Let $T$ be the time horizon and $P_T$ be the path-length that essentially reflects the non...
Preprint
Full-text available
Gradient Boosting Machine has proven to be a successful function approximator and has been widely used in a variety of areas. However, since the base learners must be trained in sequential order, it is infeasible to parallelize the training process across base learners for speed-up. In addition, under online or incremental l...
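The sequential bottleneck is visible in a minimal GBM sketch for squared loss: each tree fits the residuals left by all previous trees, so round $t$ cannot start before round $t-1$ finishes (illustrative code, not this paper's method).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbm_fit(X, y, n_rounds=50, lr=0.1):
    # Minimal gradient boosting for squared loss: each tree fits the current
    # residuals (the negative gradient), so round t depends on rounds 1..t-1
    # and the base learners cannot be trained in parallel.
    F = np.zeros(len(y))
    trees = []
    for _ in range(n_rounds):
        residual = y - F
        tree = DecisionTreeRegressor(max_depth=3).fit(X, residual)
        F += lr * tree.predict(X)
        trees.append(tree)
    return trees
```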
Article
Many challenges remain when applying machine learning algorithms in unknown environments, especially those with limited training data. To handle the data insufficiency and take a further step towards robust learning, we adopt the learnware notion, which equips a model with an essential reusable property---the model learned in a relat...
Preprint
Full-text available
Background: Mounting evidence suggests that there is an undetected pool of COVID-19 asymptomatic but infectious cases. Estimating the number of asymptomatic infections has been crucial to understanding the virus and containing its spread; this number is, however, hard to count accurately. Methods: We propose an approach of machine learning based fine-grai...
Article
Full-text available
Multi-label support vector machine (Rank-SVM) is a classic and effective algorithm for multi-label classification. The pivotal idea is to maximize the minimum margin of label pairs, which is extended from SVM. However, recent studies disclosed that maximizing the minimum margin does not necessarily lead to better generalization performance, and ins...
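In the usual Rank-SVM notation (our rendering), the functional margin of an example $(x, Y)$, with $\bar{Y}$ its irrelevant labels, is the worst case over label pairs:

```latex
\gamma(x, Y) = \min_{(p,q) \in Y \times \bar{Y}} \langle w_p - w_q, \, x \rangle ,
```

and Rank-SVM maximizes the minimum of this margin over the training set.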
Article
Full-text available
In many real-world applications, data are often collected in the form of a stream, and thus the distribution usually changes in nature, which is referred to as concept drift in the literature. We propose a novel and effective approach to handle concept drift via model reuse, that is, reusing models trained on previous data to tackle the changes. Ea...
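One simple (hypothetical) instantiation of model reuse under drift is to keep previously trained models and combine them with multiplicative weights, so models matching the current distribution gain influence; the paper's actual approach is more refined.

```python
import numpy as np

def predict_and_update(models, weights, x, y, eta=0.5):
    # Hypothetical model-reuse sketch for concept drift: combine models
    # trained on earlier data by weighted vote, then multiplicatively
    # downweight models that err on current data. Labels are in {-1, +1}.
    preds = np.array([m.predict([x])[0] for m in models])
    y_hat = np.sign(weights @ preds)
    weights = weights * np.exp(-eta * (preds != y))
    return y_hat, weights / weights.sum()
```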
Preprint
In conventional supervised learning, a training dataset is given with ground-truth labels from a known label set, and the learned model will classify unseen instances to the known labels. In this paper, we study a new problem setting in which there are unknown classes in the training dataset misperceived as other labels, and thus their existence ap...
Preprint
Given a publicly available pool of machine learning models constructed for various tasks, when a user plans to build a model for her own machine learning application, is it possible to build upon models in the pool such that the previous efforts on these existing models can be reused rather than starting from scratch? Here, a grand challenge is how...
Conference Paper
Full-text available
Perception and reasoning are two representative abilities of intelligence that are integrated seamlessly during human problem-solving processes. In the area of artificial intelligence (AI), the two abilities are usually realised by machine learning and logic programming, respectively. However, the two categories of techniques were developed separat...
Preprint
In multi-label learning, each instance is associated with multiple labels and the crucial task is how to leverage label correlations in building models. Deep neural network methods usually jointly embed the feature and label information into a latent space to exploit label correlations. However, the success of these methods highly depends on the pr...
Article
Diverse applications involving complicated data objects such as proteins and images are solved by applying multi-instance learning (MIL) algorithms. However, few MIL algorithms can deal with problems in an open and dynamic environment, where new categories of samples emerge. In this type of emerging novel class setting, algorithms should be able to...
Preprint
In this paper, we study the problem of learning with augmented classes (LAC), where new classes that do not appear in the training dataset might emerge in the testing phase. The mixture of known classes and new classes in the testing distribution makes the LAC problem quite challenging. Our discovery is that by exploiting cheap and vast unlabeled d...
Article
Internet companies face the need to handle large-scale machine learning applications on a daily basis, and distributed implementations of machine learning algorithms that can handle extra-large-scale tasks with strong performance are widely needed. Deep forest is a recently proposed deep learning framework which uses tree ensembles as its bui...
Conference Paper
Full-text available
Experience reuse is key to sample-efficient reinforcement learning. One of the critical issues is how the experience is represented and stored. Previously, the experience can be stored in the forms of features, individual models, and the average model, each lying at a different granularity. However, new tasks may require experience across multiple...
Conference Paper
Partial label learning deals with training examples each associated with a set of candidate labels, among which only one label is valid. Previous studies typically assume that the candidate label sets are provided for all training examples. In many real-world applications such as video character classification, however, it is generally difficult to...
Preprint
Bandit Convex Optimization (BCO) is a fundamental framework for modeling sequential decision-making with partial information, where the only feedback available to the player is the one-point or two-point function values. In this paper, we investigate BCO in non-stationary environments and choose the \emph{dynamic regret} as the performance measure,...
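For reference, with one-point feedback the classical gradient estimator (in the style of Flaxman et al.) queries a single perturbed point and is unbiased for a smoothed version of the loss:

```latex
\hat{g}_t = \frac{d}{\delta} \, f_t\bigl(x_t + \delta u_t\bigr) \, u_t,
\qquad u_t \sim \mathrm{Unif}(\mathbb{S}^{d-1}).
```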
Conference Paper
Set-level problems are as important as instance-level problems. The core of solving set-level problems is how to measure the similarity between two sets. This paper investigates data-dependent kernels that are derived directly from data. We introduce Isolation Set-Kernel which is solely dependent on data distribution, requiring neither class infor...
Conference Paper
In plenty of real-life tasks, strongly supervised information is hard to obtain, such that there is not sufficient high-quality supervision to make traditional learning approaches succeed. Therefore, weakly supervised learning has drawn considerable attention recently. In this paper, we consider the problem of learning from incomplete and inaccurat...
Preprint
To deal with changing environments, a new performance measure---adaptive regret, defined as the maximum static regret over any interval, is proposed in online learning. Under the setting of online convex optimization, several algorithms have been successfully developed to minimize the adaptive regret. However, existing algorithms lack universality...
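In symbols, the adaptive regret described here is the worst static regret over all contiguous intervals:

```latex
\text{A-Regret}_T = \max_{[r,s] \subseteq [T]} \left( \sum_{t=r}^{s} f_t(x_t) - \min_{x \in \mathcal{X}} \sum_{t=r}^{s} f_t(x) \right).
```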
Preprint
Unsupervised domain adaptation aims to transfer the classifier learned from the source domain to the target domain in an unsupervised manner. With the help of target pseudo-labels, aligning class-level distributions and learning the classifier in the target domain are two widely used objectives. Existing methods often separately optimize these two...
Preprint
We study the problem of computing the minimum adversarial perturbation of the Nearest Neighbor (NN) classifiers. Previous attempts either conduct attacks on continuous approximations of NN models or search for the perturbation by some heuristic methods. In this paper, we propose the first algorithm that is able to compute the minimum adversarial pe...
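To see why this is nontrivial, even a naive scheme for 1-NN only yields an upper bound on the minimum perturbation: move the input toward each differently labeled training point and bisect for the flip distance. The sketch below is that baseline, not the exact algorithm proposed here.

```python
import numpy as np

def nn_predict(X_train, y_train, x):
    return y_train[np.argmin(np.linalg.norm(X_train - x, axis=1))]

def naive_min_perturbation(X_train, y_train, x):
    # Upper-bounds the minimum adversarial perturbation of a 1-NN classifier:
    # move x straight toward each differently labeled training point and
    # bisect for the smallest step that flips the prediction.
    y0 = nn_predict(X_train, y_train, x)
    best = np.inf
    for z in X_train[y_train != y0]:
        lo, hi = 0.0, 1.0                    # fraction of the way toward z
        while hi - lo > 1e-6:
            mid = (lo + hi) / 2
            if nn_predict(X_train, y_train, x + mid * (z - x)) != y0:
                hi = mid
            else:
                lo = mid
        best = min(best, hi * np.linalg.norm(z - x))
    return best
```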
Conference Paper
Manifold clustering, which regards clusters as groups of points around compact manifolds, has been realized as a promising generalization of traditional clustering. A number of linear or nonlinear manifold clustering approaches have been developed recently. Although they have attained better performances than traditional clustering methods in many...
Chapter
This chapter presents the switch analysis approach for analyzing the running time complexity of evolutionary algorithms. Switch analysis works by comparing two optimization processes, thus can help analyze a complicated optimization process by comparing with a simpler reference process. It is applied to prove the expected running time lower bound o...
Chapter
This chapter studies the evolutionary learning method for the selective ensemble learning problem, which needs to select some component learners out of all learners. We show that a Pareto optimization algorithm, POSE, solves the learning problem better than previous ordering-based selective ensemble methods as well as the heuristic single-objective opt...
Chapter
This chapter presents the parallel version of Pareto optimization algorithm, PPOSS, for subset selection. We disclose that the parallelization does not break the effectiveness of Pareto optimization while reducing the total time. Moreover, given sufficient processors, PPOSS can be both faster and more accurate than parallel greedy methods. The effi...
Chapter
This chapter studies the influence of solution representation by comparing genetic programming with the genetic algorithm, which employ tree representation and vector representation, respectively. We show that tree representation can lead to better running time than vector representation on two classical combinatorial problems.
Chapter
This chapter studies the general subset selection problem that is involved in various learning problems. Based on Pareto optimization, we present the POSS algorithm, which achieves equal or better performance than the greedy algorithm. The advantage is also supported by the experimental results.
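A minimal sketch of POSS under the standard formulation (maximize $f(s)$ subject to $|s| \le k$): treat subset size as a second objective, evolve a non-dominated archive by bit-wise mutation, and finally return the best feasible solution. Details such as the iteration budget are illustrative.

```python
import numpy as np

def poss(f, n, k, iters=10000, seed=0):
    # Pareto Optimization for Subset Selection (sketch): cast the problem as
    # the bi-objective task (maximize f(s), minimize |s|) and keep an archive
    # of non-dominated binary solutions.
    rng = np.random.default_rng(seed)
    archive = [np.zeros(n, dtype=bool)]
    for _ in range(iters):
        s = archive[rng.integers(len(archive))].copy()
        s ^= rng.random(n) < 1.0 / n                 # flip each bit w.p. 1/n
        fs, size = f(s), int(s.sum())
        # keep s only if no archived solution weakly dominates it
        if any(f(t) >= fs and t.sum() <= size for t in archive):
            continue
        archive = [t for t in archive if not (fs >= f(t) and size <= t.sum())]
        archive.append(s)
    feasible = [t for t in archive if t.sum() <= k]
    return max(feasible, key=f)
```

Repeatedly re-evaluating $f$ on archived solutions is wasteful; a practical implementation would cache those values alongside each archived solution.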
Chapter
This chapter studies how to deal with infeasible solutions when evolutionary algorithms are used for constrained optimization. We derive sufficient and necessary conditions to judge the usefulness of infeasible solutions in concrete problems. We then disclose that Pareto optimization, transforming the original constrained optimization problem into...
Chapter
This chapter studies an extension of the subset selection problem, i.e., maximizing monotone k-submodular functions subject to a size constraint. Based on Pareto optimization, we present the POkSS algorithm for the problem, which is proven to achieve state-of-the-art performance and is verified empirically on the applications of influence maximiza...
Chapter
This chapter studies the easiest and hardest instances of a problem class with respect to a given evolutionary algorithm, in order to better understand the algorithm. Through the derived theorem, the easiest and hardest functions in the pseudo-Boolean function class with a unique global optimal solution are identified for (1+1)-EA with any mutation probabi...
Chapter
This chapter introduces preliminaries, including basic evolutionary algorithms, pseudo-Boolean functions used in theoretical studies, and basic knowledge for analyzing the running time complexity of evolutionary algorithms.
Chapter
This chapter studies the relationship among different analysis approaches for running time complexity of evolutionary algorithms, through the defined reducibility relation between two approaches. Consequently, we find that switch analysis can serve as a unified analysis approach, as other approaches can be reduced to switch analysis. This unificati...
Chapter
This chapter studies minimizing the ratio $f/g$, such as optimizing the F-measure objective in machine learning tasks. We prove the approximation bound of a greedy-style algorithm, as well as a Pareto optimization based algorithm PORM, where PORM is shown to be able to achieve better performance. The advantage of PORM is also verified by empiri...
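A greedy-style scheme for such ratio minimization typically adds, at each step, the item with the smallest marginal ratio and returns the best set encountered; the sketch below follows that pattern (our rendering, with an `eps` guard against zero denominators), not PORM itself.

```python
def greed_ratio(f, g, ground, eps=1e-12):
    # Greedy-style sketch for minimizing f(S)/g(S): repeatedly add the item
    # with the smallest marginal ratio, then return the best prefix seen.
    S, best, best_ratio = set(), set(), float("inf")
    remaining = set(ground)
    while remaining:
        v = min(remaining,
                key=lambda v: (f(S | {v}) - f(S)) / max(g(S | {v}) - g(S), eps))
        S = S | {v}
        remaining.discard(v)
        ratio = f(S) / max(g(S), eps)
        if ratio < best_ratio:
            best, best_ratio = set(S), ratio
    return best, best_ratio
```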
Preprint
In this work, we consider a challenging training-time attack that modifies training data with bounded perturbation, hoping to manipulate the behavior (both targeted and non-targeted) of any correspondingly trained classifier at test time when facing clean samples. To achieve this, we propose to use an auto-encoder-like network to generate the pe...
Chapter
This chapter studies the approximation performance of evolutionary algorithms through the SEIP framework. SEIP adopts an isolation function to manage competition among solutions and offers a general characterization of approximation behaviors. The framework is applied to the set cover problem, delivering an approximation ratio that matches the asy...
Chapter
This chapter studies the subset selection problem under multiplicative and additive noise. We disclose that the greedy algorithm and POSS achieve nearly the same approximation guarantee under noise. Moreover, the PONSS algorithm, which uses a noise handling strategy, can achieve a better approximation ratio for independently and identically dis...
Chapter
This chapter studies the influence of noise on evolutionary algorithms. We disclose that noise is not always bad: for hard problems, noise can be helpful, while for easy problems, it can be harmful. The findings are verified in the experiments. We also prove that the two common strategies, i.e., threshold selection and sampling, can bring robus...
Chapter
This chapter studies the influence of population on evolutionary algorithms. We show that, on one hand, a population can be unhelpful for simple functions such as OneMax and LeadingOnes, as shown by the derived running time lower bounds; on the other hand, in the presence of noise, using a population can enhance the robustness against noise.
Chapter
This chapter studies the influence of recombination operators. We show that, in multi-objective evolutionary optimization, recombination operators are useful because they accelerate the filling of the Pareto front. This principle may also hold in more situations.
Chapter
This chapter presents the convergence-based analysis approach for analyzing the running time complexity of evolutionary algorithms, which is derived from bridging two fundamental theoretical issues. The approach is applied to show the exponential lower bound of the expected running time for (1+1)-EA and randomized local search solving the constrain...
Preprint
We investigate online convex optimization in changing environments, and choose the adaptive regret as the performance measure. The goal is to achieve a small regret over every interval so that the comparator is allowed to change over time. Different from previous works that only utilize the convexity condition, this paper further exploits smoothnes...
Article
Full-text available
The use of distance metrics such as the Euclidean or Manhattan distance for nearest neighbour algorithms allows for interpretation as a geometric model, and it has been widely assumed that the metric axioms are a necessary condition for many data mining tasks. We show that this assumption can in fact be an impediment to producing effective models....
Preprint
Stochastic approximation (SA) is a classical approach for stochastic convex optimization. Previous studies have demonstrated that the convergence rate of SA can be improved by introducing either smoothness or strong convexity condition. In this paper, we make use of smoothness and strong convexity simultaneously to boost the convergence rate. Let $...
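For context, the classical projected SA update and its textbook rates (our notation; the abstract's improved rate is truncated above): general convex objectives give $O(1/\sqrt{T})$, and $\lambda$-strongly convex ones give $O(1/(\lambda T))$.

```latex
x_{t+1} = \Pi_{\mathcal{X}}\bigl(x_t - \eta_t \hat{g}_t\bigr),
\qquad \mathbb{E}\bigl[\hat{g}_t \mid x_t\bigr] \in \partial f(x_t).
```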
Book
Many machine learning tasks involve solving complex optimization problems, such as working on non-differentiable, non-continuous, and non-unique objective functions; in some cases it can prove difficult to even define an explicit objective function. Evolutionary learning applies evolutionary algorithms to address optimization problems in machine le...
Book
The three-volume set LNAI 11439, 11440, and 11441 constitutes the thoroughly refereed proceedings of the 23rd Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2019, held in Macau, China, in April 2019. The 137 full papers presented were carefully reviewed and selected from 542 submissions. The papers present new ideas, original...
Conference Paper
This paper investigates data-dependent kernels that are derived directly from data. This has been an outstanding issue for about two decades and has hampered the development of kernel-based methods. We introduce Isolation Kernel, which is solely dependent on the data distribution, requiring neither class information nor explicit learning to be a classifie...
Article
We consider multi-label crowdsourcing learning in two scenarios. In the first scenario, we aim at inferring instances' ground-truth labels given the crowds' annotations. We propose two approaches, NAM and RAM (Neighborhood/Relevance Aware Multi-label crowdsourcing), modeling the crowds' expertise and label correlations from different perspectives. Extended from...