Bo Yang

Jilin University, Yung-chi, Jilin Sheng, China

Publications (59) · 27.09 Total impact

  • ABSTRACT: Proper parameter settings and feature selection are of great importance to the efficiency and accuracy of a support vector machine (SVM). In this paper, we propose a parallel time variant particle swarm optimization (TVPSO) algorithm to simultaneously perform the parameter optimization and feature selection for SVM, termed PTVPSO-SVM. It is implemented in a parallel environment using Parallel Virtual Machine (PVM). In the proposed method, a weighted function is adopted to design the objective function of PSO, which takes into account the average classification accuracy rates (ACC) of SVM, the number of support vectors (SVs) and the selected features simultaneously. Furthermore, mutation operators are introduced to overcome the premature convergence of the PSO algorithm. In addition, an improved binary PSO algorithm is employed to enhance the performance of PSO in the feature selection task. The performance of the proposed method is compared with that of other methods on a comprehensive set of 30 benchmark data sets. The empirical results demonstrate that the proposed method can not only obtain much more appropriate model parameters, a discriminative feature subset and smaller sets of SVs, but also significantly reduce the computational time while giving high predictive accuracy.
    Applied Mathematics and Computation 07/2014; 239:180–197. · 1.35 Impact Factor
  • Applied Soft Computing. 01/2014; 24:585–596.
  • ABSTRACT: Given a network and a group of target nodes, the task of proximity alignment is to find a sequence of nodes that are the most relevant to the targets in terms of the linkage structure of the network. Proximity alignment has important applications in many areas, such as online recommendation in e-commerce and infectious disease control in public healthcare. Although great efforts have been made to design various metrics of similarity and centrality in terms of network structure, to the best of our knowledge no study in the literature addresses proximity alignment by explicitly and adequately exploring the intrinsic connections between macroscopic community structure and microscopic node proximities. However, the influence of community structure on proximity alignment is indispensable, not only because communities are ubiquitous in real-world networks but also because they can characterize node proximity in a more natural way. In this work, a novel proximity alignment method called the PAA is proposed to address this problem. The PAA first decomposes the given network into communities based on its global structure and then computes node proximities based on the local structure of the communities. In this way, the solution of the PAA is expected to be more reasonable, in the sense that both global and local relevance among nodes are sufficiently considered during proximity alignment. To handle large-scale networks, the PAA is implemented with a proposed online-offline schema, in which expensive computations such as community detection are done offline so that online queries can be answered quickly by calculating node proximities efficiently based on indexed communities. The efficacy and the applications of the PAA have been validated and demonstrated. Our work shows that the PAA outperforms existing methods and enables us to explore real-world networks from a novel perspective.
    Integrated Computer Aided Engineering 01/2014; 21(1):59-76. · 3.37 Impact Factor
  • ABSTRACT: Discovery of communities in complex networks is a fundamental data analysis task in various domains. Generative models are a promising class of techniques for identifying modular properties from networks, and they have been actively discussed recently. However, most of them cannot preserve the degree sequence of networks, which distorts the community detection results. Rather than using a blockmodel as most current works do, here we generalize a configuration model, namely, a null model of modularity, to solve this problem. By decomposing and combining sub-graphs according to the soft community memberships, our model gains the ability to describe community structures, something the original model does not have. Also, as with the original model, it fixes the expected degree sequence to be the same as that of the observed network. We combine both the community property and degree sequence preservation into a single unified model, which gives better community results compared with other models. Thereafter, we learn the model using nonnegative matrix factorization and determine the number of communities by applying consensus clustering. We test this approach both on synthetic benchmarks and on real-world networks, and compare it with two similar methods. The experimental results demonstrate the superior performance of our method over competing methods in detecting both disjoint and overlapping communities.
    Journal of Statistical Mechanics Theory and Experiment 09/2013; 2013(09):P09013. · 1.87 Impact Factor
  • ABSTRACT: Network community mining algorithms aim at efficiently and effectively discovering all communities in a given network. Many related methods have been proposed and applied to different areas including social network analysis, gene network analysis and web clustering engines. Most of the existing methods for mining communities are centralized. In this paper, we present a multi-agent based decentralized algorithm, in which a group of autonomous agents work together to mine a network through a proposed self-aggregation and self-organization mechanism. Thanks to its decentralized nature, our method is potentially suitable for dealing with distributed networks, whose global structures are hard to obtain due to their geographical distribution, decentralized control or huge size. The effectiveness of our method has been tested against different benchmark networks.
    Mathematical and Computer Modelling 06/2013; 57(s 11–12):2998–3008. · 1.42 Impact Factor
  • ABSTRACT: In this paper, we propose a multi-layer ant-based algorithm, MABA, which detects communities in networks by locally optimizing modularity using individual ants. The basic version of MABA, namely SABA, combines a self-avoiding label propagation technique with a simulated annealing strategy for ant diffusion in networks. Once the communities are found by SABA, the method can be reapplied to a higher-level network in which each obtained community is regarded as a new vertex. Repeating this process iteratively yields MABA. Thanks to the intrinsic multi-level nature of our algorithm, it has the potential to unfold multi-scale hierarchical structures. Furthermore, MABA can mitigate the resolution limit of modularity. The proposed MABA has been evaluated on both computer-generated benchmarks and widely used real-world networks, and has been compared with a set of competitive algorithms. Experimental results demonstrate that MABA is both effective and efficient (running in near linear time with respect to the size of the network) for discovering communities.
    03/2013;
  • ABSTRACT: To further improve the performance of current genetic algorithms for discovering communities, a local search based genetic algorithm, GALS, is proposed here. The core of GALS is a local search based mutation technique. To overcome the drawbacks of traditional mutation methods, the paper develops the concept of a marginal gene, and then the local monotonicity of the modularity function Q is deduced from each node's local view. Based on these two elements, a new mutation method combined with a local search strategy is presented. GALS has been evaluated on both synthetic benchmarks and several real networks, and compared with some currently competing algorithms. Experimental results show that GALS is highly effective and efficient for discovering community structure.
    International Journal of Computational Intelligence Systems 03/2013; 6(2). · 1.47 Impact Factor
  • ABSTRACT: The real-world continuous double auction (CDA) market is a dynamic environment. However, most of the existing agent bidding strategies are designed only for static markets. A new bidding strategy is necessary for more practical simulations and applications. In this paper, we present a novel agent-based computing approach called the GDX Plus (GDXP) model. In the proposed model, trades are decided according to historical market events combined with a forecast of market trends. The GDXP model employs a dynamic adjustment mechanism to make the bidding strategy adapt to shocks in a dynamic environment. The experimental results of the comparison between GDXP and other typical models, with respect to both static and dynamic CDA markets, demonstrate the performance of the GDXP model.
    Web Intelligence and Agent Systems 01/2013; 11(1):55-65.
  • ABSTRACT: Community structure is ubiquitous in real-world networks and community detection is of fundamental importance in many applications. Although considerable efforts have been made to address the task, seeking a good trade-off between effectiveness and efficiency, especially in the case of large-scale networks, remains challenging. This paper explores the nature of community structure from a probabilistic perspective and introduces a novel community detection algorithm named PMC, which stands for probabilistically mining communities, to meet this challenge. In PMC, community detection is modeled as a constrained quadratic optimization problem that can be efficiently solved by a random walk based heuristic. The performance of PMC has been rigorously validated through comparisons with six representative methods on both synthetic and real-world networks of different scales. Moreover, two applications of analyzing real-world networks by means of PMC have been demonstrated.
    Data & Knowledge Engineering. 01/2013; 83:20–38.
  • ABSTRACT: Discovery of communities in complex networks is a fundamental data analysis problem with applications in various domains. Most of the existing approaches have focused on discovering communities of nodes, while recent studies have shown great advantages and utility in knowing the communities of links in networks. From this new perspective, we propose a link dynamics based algorithm, called UELC, for identifying link communities of networks. In UELC, the stochastic process of a link–node–link random walk is employed to unfold an embedded bipartition structure of links in a network. The local mixing properties of the Markov chain underlying the random walk are then utilized to extract two emerging link communities. Further, the random walk and the bipartitioning processes are wrapped in an iterative subdivision strategy to recursively identify link partitions that segregate the network links into multiple subdivisions. We evaluate the performance of the new method on synthetic benchmarks and demonstrate its utility on real-world networks. Our experimental results show that our method is highly effective for discovering link communities in complex networks. As a comparison, we also extend UELC to extract communities of nodes, and show that it is effective for node community identification.
    Journal of Statistical Mechanics Theory and Experiment 10/2012; 2012(10):P10015. · 1.87 Impact Factor
  • ABSTRACT: Network communities refer to groups of vertices within which the connecting links are dense but between which they are sparse. A network community mining problem (or NCMP for short) is concerned with finding all such communities in a given network. A wide variety of applications can be formulated as NCMPs, ranging from social and/or biological network analysis to web mining and searching. So far, many algorithms addressing NCMPs have been developed, and most of them fall into the categories of either optimization based or heuristic methods. Distinct from the existing studies, the work presented in this paper explores the notion of network communities and their properties based on the dynamics of a naturally introduced stochastic model. In the paper, a relationship between the hierarchical community structure of a network and the local mixing properties of such a stochastic model is established using large-deviation theory. Topological information regarding the community structures hidden in networks can be inferred from their spectral signatures. Based on this relationship, this work proposes a general framework for characterizing, analyzing, and mining network communities. Utilizing the two basic properties of metastability, i.e., being locally uniform and temporarily fixed, an efficient implementation of the framework, called the LM algorithm, has been developed that can scalably mine communities hidden in large-scale networks. The effectiveness and efficiency of the LM algorithm have been theoretically analyzed as well as experimentally validated.
    IEEE Transactions on Knowledge and Data Engineering 03/2012; · 1.89 Impact Factor
  • IEEE Trans. Knowl. Data Eng. 01/2012; 24:326-337.
  • ABSTRACT: Malaria transmission can be affected by multiple or even hidden factors, making it difficult to predict, in a timely and accurate manner, the impact of the elimination and eradication programs that have been undertaken and the potential resurgence and spread that may continue to emerge. One current approach is to develop and deploy surveillance systems in an attempt to identify these as early as possible and thus to enable policy makers to modify and implement strategies for further preventing transmission. Most of the surveillance data will be temporal and spatial in nature. From an interdisciplinary point of view, it is interesting to ask the following important and challenging question: based on the available surveillance data in temporal and spatial forms, how can we build a more effective surveillance mechanism for monitoring and early detection of the relative prevalence and transmission patterns of malaria? What we can note from the existing clustering-based surveillance software systems is that they do not infer the underlying transmission networks of malaria. However, such networks can be quite informative and insightful, as they characterize how malaria is transmitted from one place to another. They can in turn allow public health policy makers and researchers to uncover hidden and interacting factors such as environment, genetics and ecology, and to discover or predict malaria transmission patterns and trends. The network perspective further extends the present approaches to modelling malaria transmission based on a set of chosen factors. In this article, we survey the related work on transmission network inference, and discuss how such an approach can be utilized in developing an effective computational means for inferring malaria transmission networks based on partial surveillance data, and what methodological steps and issues may be involved in its formulation and validation.
    Infectious diseases of poverty. 01/2012; 1(1):11.
  • ABSTRACT: In this paper, we present an enhanced fuzzy k-nearest neighbor (FKNN) classifier based computer aided diagnostic (CAD) system for thyroid disease. The neighborhood size k and the fuzzy strength parameter m in the FKNN classifier are adaptively specified by the particle swarm optimization (PSO) approach. Adaptive control parameters, including time-varying acceleration coefficients (TVAC) and time-varying inertia weight (TVIW), are employed to efficiently control the local and global search ability of the PSO algorithm. In addition, we have validated the effectiveness of principal component analysis (PCA) in constructing a more discriminative subspace for classification. The effectiveness of the resultant CAD system, termed PCA-PSO-FKNN, has been rigorously evaluated against the thyroid disease dataset, which is commonly used among researchers who use machine learning methods for thyroid disease diagnosis. Compared to the existing methods in previous studies, the proposed system has achieved the highest classification accuracy reported so far via 10-fold cross-validation (CV) analysis, with a mean accuracy of 98.82% and a maximum accuracy of 99.09%. Promisingly, the proposed CAD system might serve as a new candidate among powerful tools for diagnosing thyroid disease with excellent performance.
    Journal of Medical Systems 12/2011; 36(5):3243-54. · 1.78 Impact Factor
  • Bo Yang, Jiming Liu, Dayou Liu
    ABSTRACT: Complex network theory provides a means for modeling and analyzing complex systems that consist of multiple and interdependent components. Among the studies on complex networks, structural analysis is of fundamental importance as it presents a natural route to understanding the dynamics, as well as to synthesizing or optimizing the functions, of networks. A wide spectrum of structural patterns of networks has been reported in the past decade, such as communities, multipartites, bipartite, hubs, authorities, outliers, and bow ties, among others. In this paper, we are interested in tackling the challenging task of characterizing and extracting multiplex patterns (multiple patterns as mentioned previously coexisting in the same networks in a complicated manner), which so far has not been explicitly and adequately addressed in the literature. Our work shows that such multiplex patterns can be well characterized as well as effectively extracted by means of a granular stochastic blockmodel, together with a set of related algorithms proposed here based on some machine learning and statistical inference ideas. These models and algorithms enable us to further explore complex networks from a novel perspective.
    IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics: a publication of the IEEE Systems, Man, and Cybernetics Society 10/2011; 42(2):469-81. · 3.01 Impact Factor
  • ABSTRACT: The detection of overlapping communities in complex networks has motivated recent research in relevant fields. Aiming to address this problem, we propose a Markov-dynamics-based algorithm, called UEOC, which means 'unfold and extract overlapping communities'. In UEOC, when identifying each natural community that overlaps, a Markov random walk method combined with a constraint strategy, which is based on the corresponding annealed network (degree conserving random network), is performed to unfold the community. Then, a cutoff criterion with the aid of a local community function, called conductance, which can be thought of as the ratio between the number of edges inside the community and those leaving it, is presented to extract this emerged community from the entire network. The UEOC algorithm depends on only one parameter whose value can be easily set, and it requires no prior knowledge of the hidden community structures. The proposed UEOC has been evaluated both on synthetic benchmarks and on some real-world networks, and has been compared with a set of competing algorithms. The experimental results show that UEOC is highly effective and efficient for discovering overlapping communities.
    Journal of Statistical Mechanics Theory and Experiment 05/2011; 2011(05):P05031. · 1.87 Impact Factor
  • ABSTRACT: This study proposes an efficient non-parametric classifier for bankruptcy prediction using an adaptive fuzzy k-nearest neighbor (FKNN) method, where the number of nearest neighbors k and the fuzzy strength parameter m are adaptively specified by the particle swarm optimization (PSO) approach. In addition to performing the parameter optimization for FKNN, PSO is utilized to choose the most discriminative subset of features for prediction as well. Time varying acceleration coefficients (TVAC) and inertia weight (TVIW) are employed to efficiently control the local and global search ability of PSO. Moreover, both the continuous and binary PSO are implemented in parallel on a multi-core platform. The resultant bankruptcy prediction model, named PTVPSO-FKNN, is compared with three classification methods on a real-world case. The obtained results clearly confirm the superiority of the developed model as compared to the other three methods in terms of classification accuracy, Type I error, Type II error and AUC (area under the receiver operating characteristic (ROC) curve) criterion. It is also observed that PTVPSO-FKNN is a powerful feature selection tool which has identified a subset of the most discriminative features. Additionally, the proposed model has gained a great deal of efficiency in terms of CPU time owing to the parallel implementation. Keywords: Fuzzy k-nearest neighbor; Parallel computing; Particle swarm optimization; Feature selection; Bankruptcy prediction
    05/2011: pages 249-264;
  • ABSTRACT: Breast cancer is becoming a leading cause of death among women worldwide; meanwhile, it is confirmed that early detection and accurate diagnosis of this disease can ensure long survival of the patients. In this paper, a swarm intelligence technique based support vector machine classifier (PSO-SVM) is proposed for breast cancer diagnosis. In the proposed PSO-SVM, the issues of model selection and feature selection in SVM are solved simultaneously under a particle swarm optimization (PSO) framework. A weighted function is adopted to design the objective function of PSO, which takes into account the average accuracy rates of SVM (ACC), the number of support vectors (SVs) and the selected features simultaneously. Furthermore, time varying acceleration coefficients (TVAC) and inertia weight (TVIW) are employed to efficiently control the local and global search in the PSO algorithm. The effectiveness of PSO-SVM has been rigorously evaluated against the Wisconsin Breast Cancer Dataset (WBCD), which is commonly used among researchers who use machine learning methods for breast cancer diagnosis. The proposed system is compared with the grid search method with feature selection by F-score. The experimental results demonstrate that the proposed approach not only obtains much more appropriate model parameters and a discriminative feature subset, but also needs a smaller set of SVs for training, while giving high predictive accuracy. In addition, compared to the existing methods in previous studies, the proposed system can also be regarded as a promising success, with an excellent classification accuracy of 99.3% via 10-fold cross validation (CV) analysis. Moreover, a combination of five informative features is identified, which might provide important insights into the nature of breast cancer and give physicians an important clue as to where to pay closer attention. We believe the promising result can help physicians make very accurate diagnostic decisions in clinical breast cancer diagnosis.
    Journal of Medical Systems 05/2011; 36(4):2505-19. · 1.78 Impact Factor
  • ABSTRACT: In this paper, we present a three-stage expert system based on a hybrid support vector machines (SVM) approach to diagnose thyroid disease. Focusing on feature selection, the first stage aims at constructing diverse feature subsets with different discriminative capability. Switching from feature selection to model construction, in the second stage, the obtained feature subsets are fed into the designed SVM classifier for training an optimal predictor model whose parameters are optimized by particle swarm optimization (PSO). Finally, the obtained optimal SVM model proceeds to perform the thyroid disease diagnosis tasks using the most discriminative feature subset and the optimal parameters. The effectiveness of the proposed expert system (FS-PSO-SVM) has been rigorously evaluated against the thyroid disease dataset, which is commonly used among researchers who use machine learning methods for thyroid disease diagnosis. The proposed system has been compared with two other related methods, the SVM based on the grid search technique (Grid-SVM) and the SVM based on grid search and principal component analysis (PCA-Grid-SVM), in terms of their classification accuracy. Experimental results demonstrate that FS-PSO-SVM significantly outperforms the other two. In addition, compared to the existing methods in previous studies, the proposed system has achieved the highest classification accuracy reported so far by the 10-fold cross-validation (CV) method, with a mean accuracy of 97.49% and a maximum accuracy of 98.59%. Promisingly, the proposed FS-PSO-SVM expert system might serve as a new candidate among powerful tools for diagnosing thyroid disease with excellent performance.
    Journal of Medical Systems 02/2011; 36(3):1953-63. · 1.78 Impact Factor
  • ABSTRACT: Bankruptcy prediction is one of the most important issues in financial decision-making. Constructing effective corporate bankruptcy prediction models in time is essential to help companies or banks avoid bankruptcy. This study proposes a novel bankruptcy prediction model based on an adaptive fuzzy k-nearest neighbor (FKNN) method, where the neighborhood size k and the fuzzy strength parameter m are adaptively specified by the continuous particle swarm optimization (PSO) approach. In addition to performing the parameter optimization for FKNN, PSO is also utilized to choose the most discriminative subset of features for prediction. Adaptive control parameters, including time-varying acceleration coefficients (TVAC) and time-varying inertia weight (TVIW), are employed to efficiently control the local and global search ability of the PSO algorithm. Moreover, both the continuous and binary PSO are implemented in parallel on a multi-core platform. The proposed bankruptcy prediction model, named PTVPSO-FKNN, is compared with five other state-of-the-art classifiers on two real-life cases. The obtained results clearly confirm the superiority of the proposed model in terms of classification accuracy, Type I error, Type II error and area under the receiver operating characteristic curve (AUC) criterion. The proposed model also demonstrates its ability to identify the most discriminative financial ratios. Additionally, the proposed model substantially reduces computational time owing to its parallel implementation. Promisingly, PTVPSO-FKNN might serve as a new candidate among powerful early warning systems for bankruptcy prediction with excellent performance.
    Knowledge-Based Systems. 01/2011; 24:1348-1359.