About
42
Publications
2,748
Reads
507
Citations
Introduction
Research interests include machine learning and data mining.
Publications (42)
The development of the civil aviation industry has contributed to a steady increase in the number of daily flight operations at airports, which in turn has led to increasingly complex airport ground layouts. To aid airport managers in understanding the operational situation on the airport surface, this paper introduces a predictive model for airpor...
Accurate prediction of the degree of airport delays under the influence of convective weather is crucial for collaborative traffic management implementation and improving the efficiency of airport operations. However, existing studies usually only consider numerical-type quantitative features of weather-affected traffic in their models, and lack th...
Since air traffic complexity determines the workload of controllers, it is a popular topic in the research field. Benefiting from deep learning, this paper proposes an air traffic complexity assessment method based on the deep metric of air traffic images. An Ordered Deep Metric (ODM) is proposed to measure the similarity of the ordered samples. Fo...
In order to quantify the degree of influence of weather on traffic situations in real time, this paper proposes a terminal traffic situation prediction model under the influence of weather (TSPM-W) based on deep learning approaches. First, a feature set for predicting traffic situations is constructed based on data such as weather, traffic demand,...
Semi-supervised learning is ubiquitous in real-world machine learning applications due to its good performance in handling data where only a small number of samples are labeled while most of them are unlabeled. Transductive support vector machine (TSVM) is an important semi-supervised learning method which formulates the problem as a nonconvex c...
Since semi-supervised learning can use fewer labelled samples to train a better model, semi-supervised methods are becoming popular in data mining. As an important algorithm of semi-supervised support vector machines (S3VM), transductive support vector machine (TSVM) sometimes may get worse models trained on both labelled samples and unla...
As a special case of multi-classification, ordinal regression (also known as ordinal classification) is a popular method to tackle the multi-class problems with samples marked by a set of ranks. Semi-supervised ordinal regression (SSOR) is especially important for data mining applications because semi-supervised learning can make use of the unlabel...
k nearest neighbor (kNN) is a simple and widely used classifier; it can achieve comparable performance with more complex classifiers including decision tree and artificial neural network. Therefore, kNN has been listed as one of the top 10 algorithms in machine learning and data mining. On the other hand, in many classification problems, such as me...
A sector is a basic unit of airspace whose operation is managed by air traffic controllers. The operation complexity of a sector plays an important role in air traffic management systems, such as airspace reconfiguration, air traffic flow management, and allocation of air traffic controller resources. Therefore, accurate evaluation of the sector ope...
Air traffic complexity is a critical indicator for air traffic operation, and plays an important role in air traffic management (ATM), such as airspace reconfiguration, air traffic flow management and allocation of air traffic controllers (ATCos). Recently, many machine learning techniques have been used to evaluate air traffic complexity by constr...
Symbolic approximation representation is a key problem in time series which can significantly affect the accuracy and efficiency of data mining. However, since currently used methods divide the original sequence into segments with equal size, they ignore one of the most important features of time series: the trend. To overcome the defect of equal-s...
Ordinal regression (OR), also called ordinal classification, is a special multi-classification designed for problems with ordered classes. Imbalanced data hinders the performance of classification algorithms, especially for OR algorithms, as imbalanced class distributions often arise in OR problems. In this article, we address an active learning ba...
Miguel Zhang, H Xie, Y Zhang, [...], Z Wu
Accurate classification of the traffic busyness of airspace units is a necessary prerequisite for the implementation of airspace management and traffic management. In this paper, the complexity index of waypoints is first calculated based on a waypoint complexity model, and then the busyness of waypoints in different time periods is evaluated and...
Semi-Supervised Support Vector Machine (S3VM) is one of the most popular methods for semi-supervised learning. To avoid the trivial solution of classifying all the unlabeled examples into the same class, a balancing constraint is often used with S3VM (denoted as BCS3VM). Recently, a novel incremental learning algorithm (IL-S3VM) based on the path followi...
Due to the highly dynamic nature of flight operations, flight delay prediction has been a global problem. At the same time, existing traditional prediction models have difficulty capturing sequence information of delay, which may be caused by the subsequent transmission of delay. In this paper, a delay prediction method based on Long Short-T...
Purpose: In large-scale monitoring systems, sensors in different locations are deployed to collect massive amounts of useful time-series data, which can help in real-time data analytics and its related applications. However, affected by the hardware devices themselves, sensor nodes often fail to work, resulting in a common phenomenon that the collected data are incompl...
In the class imbalanced learning scenario, traditional machine learning algorithms focusing on optimizing the overall accuracy tend to achieve poor classification performance especially for the minority class in which we are most interested. To solve this problem, many effective approaches have been proposed. Among them, the bagging ensemble method...
Feature selection plays an important role in data mining and recognition, especially in the large scale text, image and biological data. Specifically, the class label information is unavailable to guide the selection of minimal feature subset in unsupervised feature selection, which is challenging and interesting. An unsupervised feature selection...
Feature selection (FS) plays an important role in data mining and recognition, especially regarding large scale text, images and biological data. The Markov blanket provides a complete and sound solution to the selection of optimal features in supervised feature selection, and investigates thoroughly the relevance of features relating to class and...
AdaBoost has been theoretically and empirically proved to be a very successful ensemble learning algorithm, which iteratively generates a set of diverse weak learners and combines their outputs using the weighted majority voting rule as the final decision. However, in some cases, AdaBoost leads to overfitting especially for mislabeled noisy trainin...
The importance of metrics in machine learning and pattern recognition algorithms has led to an increasing interest for optimizing distance metrics in recent years. Most of the state-of-the-art methods focus on learning Mahalanobis distances and the learned metrics are in turn heavily used for the nearest neighbor-based classification (NN). However,...
In the past two decades, some successful ensemble learning algorithms have been proposed, such as Bagging, AdaBoost, and DECORATE. Although all adopt diversity-based combination, the former two algorithms are generated by manipulating the original training set, while the last is generated by augmenting the original training set using artificial t...
Data acquisition anomalies often occur in remote monitoring. In this paper, a software solution is presented to discover the abnormal monitoring node and predict the monitored data, rather than using hardware maintenance. Firstly, by analyzing the distribution characteristics of the monitoring data from each node, the highly correlated nodes of the...
Airport noise prediction is of vital importance to the planning and designing of airports and flights, as well as in controlling airport noise. Benefiting from the development of the Internet of Things, large-scale noise monitoring systems have been developed and applied to monitor airport noise, and thus a large amount of real-time noise data has b...
Fisher discriminant analysis (FDA) is a classic supervised dimensionality reduction method in statistical pattern recognition. FDA can maximize the scatter between different classes, while minimizing the scatter within each class. As it only utilizes the labeled data and ignores the unlabeled data in the analysis process of FDA, it cannot be used t...
Flight delay prediction remains an important research topic due to the dynamics of the flight operating process. To solve this problem, a dynamic data-driven approach from the control area has been introduced, in which real-time data was collected and injected into the prediction process to get more accurate and reliable results. In the case of pre...
Flight delay prediction remains an important research topic due to dynamic nature in flight operation and numerous delay factors. Dynamic data-driven application system in the control area can provide a solution to this problem. However, in order to apply the approach, a state-space flight delay model needs to be established to represent the relati...
Flight delay prediction remains an important research topic due to its dynamic nature. A dynamic data-driven approach might provide a solution to this problem. To apply the approach, a flight delay state-space model is required to represent the relationships among system states, as well as the relationships between system states and input/output variables. Bas...
Flight delay early warning can reduce the negative impact of the delay. Determining the delay grade of each interval is essentially a multi-class classification problem. This paper presents a flight delay early warning model based on a fuzzy support vector machine with weighted margin (WMSVM), which adjusts the penalties to samples and the margins...
Ranking support vector machine (RSVM) learning is equivalent to solving a convex quadratic programming problem. Currently, exact online ranking learning still faces some difficulties. This paper presents an exact and effective method that can solve the online ranking learning problem and shows the feasibility and finite convergence of the algor...