ABSTRACT: In this paper, a new support vector machine (SVM) parameter tuning scheme that uses the fruit fly optimization algorithm (FOA) is proposed. Termed FOA-SVM, the scheme is successfully applied to medical diagnosis. In the proposed FOA-SVM, the FOA technique effectively and efficiently tunes the parameters of the SVM. Additionally, the effectiveness and efficiency of FOA-SVM are rigorously evaluated on four well-known medical datasets, namely the Wisconsin breast cancer, Pima Indians diabetes, Parkinson, and thyroid disease datasets, in terms of classification accuracy, sensitivity, specificity, AUC (the area under the receiver operating characteristic (ROC) curve), and processing time. Four competitive counterparts are employed for comparison: particle swarm optimization based SVM (PSO-SVM), genetic algorithm based SVM (GA-SVM), bacterial foraging optimization based SVM (BFO-SVM), and grid search based SVM (Grid-SVM). The empirical results demonstrate that the proposed FOA-SVM method obtains much more appropriate model parameters and significantly reduces the computational time, yielding high classification accuracy. Promisingly, the proposed method can serve as a useful clinical tool for medical decision making.
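As a rough illustration of how FOA can drive parameter search, the sketch below minimizes a toy surrogate in place of SVM cross-validation error. The swarm-update scheme, step size, and the surrogate itself are illustrative assumptions, not the paper's exact formulation.

```python
import random

def foa_minimize(fitness, n_flies=20, n_gens=100, step=0.5, seed=0):
    """Minimal fruit fly optimization (FOA) sketch for a 2-D search space.
    Smell phase: flies scatter randomly around the swarm location;
    vision phase: the swarm relocates to the best fly found so far."""
    rng = random.Random(seed)
    cx, cy = rng.uniform(-1, 1), rng.uniform(-1, 1)   # initial swarm location
    best_val, best_pos = fitness(cx, cy), (cx, cy)
    for _ in range(n_gens):
        for _ in range(n_flies):
            x = cx + rng.uniform(-step, step)          # smell-based random step
            y = cy + rng.uniform(-step, step)
            v = fitness(x, y)
            if v < best_val:
                best_val, best_pos = v, (x, y)
        cx, cy = best_pos                              # vision-based relocation
    return best_pos, best_val

# Hypothetical surrogate standing in for SVM cross-validation error over
# (log C, log gamma); a real FOA-SVM would train and validate an SVM here.
surrogate = lambda c, g: (c - 0.3) ** 2 + (g + 0.2) ** 2
best_pos, best_val = foa_minimize(surrogate)
```

In a real FOA-SVM, `surrogate` would be replaced by a cross-validation routine, and the returned position would be mapped back to the SVM's regularization and kernel parameters.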
ABSTRACT: The number of overweight people continues to rise across the world. Studies have shown that being overweight increases health risks, such as high blood pressure, diabetes mellitus, coronary heart disease, and certain forms of cancer. Therefore, identifying overweight status is critical to preventing and decreasing health risks. This study explores a new technique that uses blood and biochemical measurements to recognize the overweight condition. A new machine learning technique, an extreme learning machine, was developed to accurately detect overweight status from a pool of 225 overweight and 251 healthy subjects. The group included 179 males and 297 females. The detection method was rigorously evaluated against this real-life dataset in terms of accuracy, sensitivity, specificity, and the AUC (area under the receiver operating characteristic (ROC) curve) criterion. Additionally, feature selection was investigated to identify factors correlated with overweight status. The results demonstrate that there are significant differences in blood and biochemical indexes between healthy and overweight people (p-value < 0.01). According to the feature selection, the most important correlated indexes are creatinine, hemoglobin, hematocrit, uric acid, red blood cells, high-density lipoprotein, alanine transaminase, triglyceride, and γ-glutamyl transpeptidase. These are consistent with the results of the Spearman test analysis. The proposed method holds promise as a new, accurate method for identifying overweight status in subjects.
ABSTRACT: Foreign fibers in cotton seriously affect the quality of cotton products. Online detection systems for foreign fibers based on machine vision are efficient tools for minimizing the harmful effects of foreign fibers. An optimal feature set of small size and high accuracy can efficiently improve the performance of online detection systems. To find such optimal feature sets, a two-stage feature selection algorithm combining IG (Information Gain) and BPSO (Binary Particle Swarm Optimization) is proposed for foreign fiber data. In the first stage, the IG approach is used to filter noisy features; in the second stage, BPSO uses the classifier accuracy as a fitness function to select the most discriminating features. The proposed algorithm is tested on a foreign fiber dataset. The experimental results show that the proposed algorithm can efficiently find feature subsets of smaller size and higher accuracy than other algorithms.
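The stage-one IG filter can be sketched as the textbook information-gain computation on discrete features; the BPSO stage and the actual foreign fiber features are omitted here.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for l in labels:
        counts[l] = counts.get(l, 0) + 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def info_gain(feature_values, labels):
    """Information gain of a discrete feature: class entropy minus the
    entropy that remains once the feature's value is known."""
    n = len(labels)
    by_val = {}
    for v, l in zip(feature_values, labels):
        by_val.setdefault(v, []).append(l)
    conditional = sum(len(ls) / n * entropy(ls) for ls in by_val.values())
    return entropy(labels) - conditional
```

A perfectly class-aligned feature scores the full class entropy, while a feature independent of the class scores zero, which is what makes IG a cheap first-pass noise filter.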
ABSTRACT: With the advent of the big data era, efficiently and effectively querying useful information on the Web, the largest heterogeneous data source in the world, is becoming increasingly challenging. Page ranking is an essential component of search engines because it determines the presentation order of the tens of millions of returned pages associated with a single query. It therefore plays a significant role in determining search quality and the user experience of information retrieval. When measuring the authority of a web page, most methods focus on the quantity and quality of the neighborhood pages that point to it via inbound hyperlinks. However, these methods ignore the diversity of such neighborhood pages, which we believe is an important metric for objectively evaluating web page authority. In contrast to true authority pages, which usually attract a large number of inbound hyperlinks from a wide variety of sources, fake authorities that boost their page rank using techniques such as link farms find it difficult to attain high diversity of inbound hyperlinks, owing to the prohibitively high cost. We propose a probabilistic counting based method to quantitatively and efficiently compute the diversity of inbound hyperlinks. We then propose a novel link-based ranking algorithm, named Drank, which ranks pages by simultaneously analyzing the quantity, quality, and diversity of their inbound hyperlinks. Validations on both synthetic and real-world data show that Drank outperforms other state-of-the-art methods in terms of both finding high-quality pages and suppressing web spam.
No preview · Article · Jan 2015 · Knowledge-Based Systems
ABSTRACT: Feature selection plays an important role in the machine-vision-based online detection of foreign fibers in cotton because it improves detection accuracy and speed. Feature sets of foreign fibers in cotton are multi-character feature sets: a high-quality feature set consists of three classes of features, namely color, texture, and shape features. Multi-character feature sets naturally carry a space constraint, which leads to a smaller feature space than a general feature set with the same number of features; however, existing algorithms do not consider this characteristic and treat multi-character feature sets as general feature sets. This paper proposes an improved ant colony optimization algorithm for feature selection, whose objective is to find the (near-)optimal subsets in multi-character feature sets. In the proposed algorithm, a group constraint is adopted to restrict the subset construction process and the probability transition, reducing the effect of invalid subsets and improving convergence efficiency. As a result, the algorithm can effectively find high-quality subsets in the feature space of multi-character feature sets. The proposed algorithm is tested on datasets of foreign fibers in cotton, and comparisons with other methods are made. The experimental results show that the proposed algorithm can find high-quality subsets of smaller size and high classification accuracy. This is very important for improving the performance of online detection systems for foreign fibers in cotton.
Full-text · Article · Nov 2014 · Applied Soft Computing
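The group constraint on subset construction can be sketched as below: one ant draws features group by group so that every character class (color / texture / shape) is represented. The feature groups, pheromone values, and roulette selection are illustrative assumptions rather than the paper's exact operators.

```python
import random

def construct_subset(pheromone, groups, k_per_group, rng):
    """One ant's group-constrained subset construction: pick k_per_group
    features from each group, with selection probability proportional to
    pheromone (roulette-wheel sampling without replacement)."""
    subset = []
    for group in groups:
        candidates = list(group)
        for _ in range(k_per_group):
            weights = [pheromone[f] for f in candidates]
            r = rng.random() * sum(weights)
            acc = 0.0
            for f, w in zip(candidates, weights):
                acc += w
                if acc >= r:
                    subset.append(f)
                    candidates.remove(f)
                    break
    return subset

# Hypothetical pheromone levels over 8 features split into 3 character groups:
pheromone = {f: 1.0 + 0.1 * f for f in range(8)}
groups = [[0, 1, 2], [3, 4], [5, 6, 7]]
subset = construct_subset(pheromone, groups, 1, random.Random(42))
```

Because each group contributes a fixed quota, every constructed subset is valid with respect to the multi-character space constraint, which is the point of restricting the construction process rather than filtering subsets afterwards.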
ABSTRACT: Proper parameter settings of the support vector machine (SVM) and feature selection are of great importance to its efficiency and accuracy. In this paper, we propose a parallel time-variant particle swarm optimization (TVPSO) algorithm to simultaneously perform parameter optimization and feature selection for SVM, termed PTVPSO-SVM. It is implemented in a parallel environment using Parallel Virtual Machine (PVM). In the proposed method, a weighted function is adopted to design the objective function of PSO, which simultaneously takes into account the average classification accuracy rate (ACC) of the SVM, the number of support vectors (SVs), and the number of selected features. Furthermore, mutation operators are introduced to overcome the premature convergence of the PSO algorithm. In addition, an improved binary PSO algorithm is employed to enhance the performance of PSO in the feature selection task. The performance of the proposed method is compared with that of other methods on a comprehensive set of 30 benchmark data sets. The empirical results demonstrate that the proposed method can not only obtain much more appropriate model parameters, a discriminative feature subset, and smaller sets of SVs, but also significantly reduce the computational time, giving high predictive accuracy.
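The weighted objective can be sketched as a convex combination of the three criteria; the weights and example numbers below are assumptions for illustration, not the paper's values.

```python
def ptvpso_fitness(acc, n_sv, n_feat, max_sv, max_feat,
                   w_acc=0.8, w_sv=0.1, w_feat=0.1):
    """Illustrative weighted objective in the spirit of PTVPSO-SVM:
    reward cross-validation accuracy while penalizing large
    support-vector sets and large feature subsets (all ratios in [0, 1])."""
    return (w_acc * acc
            + w_sv * (1.0 - n_sv / max_sv)
            + w_feat * (1.0 - n_feat / max_feat))

# Same accuracy, fewer selected features -> higher fitness:
f1 = ptvpso_fitness(acc=0.95, n_sv=40, n_feat=10, max_sv=200, max_feat=30)
f2 = ptvpso_fitness(acc=0.95, n_sv=40, n_feat=25, max_sv=200, max_feat=30)
```

Folding model size into the fitness is what lets a single swarm trade accuracy against compactness instead of optimizing accuracy alone.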
ABSTRACT: The stochastic blockmodel (SBM) has recently come into the spotlight in the domains of social network analysis and statistical machine learning, as it enables us to decompose and then analyze an exploratory network without any a priori information about its intrinsic structure. However, prohibitive computational cost limits SBM learning algorithms capable of model selection to small networks with hundreds of nodes. This paper presents a fine-grained SBM and its fast learning algorithm, named FSL, which ingeniously combines the component-wise EM (CEM) algorithm and minimum message length (MML) to achieve parallel parameter estimation and model evaluation. The FSL significantly reduces the time complexity of the learning algorithm and scales to networks with thousands of nodes. The experimental results indicate that the FSL achieves the best tradeoff between effectiveness and efficiency, greatly reducing learning time while preserving competitive learning accuracy. Moreover, it is noteworthy that our proposed method shows excellent generalization ability in the application of link prediction.
ABSTRACT: Given a network and a group of target nodes, the task of proximity alignment is to find a sequence of nodes that are the most relevant to the targets in terms of the linkage structure of the network. Proximity alignment finds important applications in many areas, such as online recommendation in e-commerce and infectious disease control in public healthcare. In spite of great efforts made to design various metrics of similarity and centrality in terms of network structure, to the best of our knowledge, no studies in the literature address the issue of proximity alignment by explicitly and adequately exploring the intrinsic connections between macroscopic community structure and microscopic node proximities. However, the influence of community structure on proximity alignment is indispensable, not only because communities are ubiquitous in real-world networks but also because they characterize node proximity in a more natural way. In this work, a novel proximity alignment method called the PAA is proposed to address this problem. The PAA first decomposes the given network into communities based on its global structure and then computes node proximities based on the local structure of the communities. In this way, the solution of the PAA is expected to be more reasonable, in the sense that both global and local relevance among nodes are sufficiently considered during proximity aligning. To handle large-scale networks, the PAA is implemented with a proposed online-offline scheme, in which expensive computations such as community detection are done offline, so that online queries can be answered quickly by calculating node proximities efficiently over indexed communities. The efficacy and the applications of the PAA have been validated and demonstrated. Our work shows that the PAA outperforms existing methods and enables us to explore real-world networks from a novel perspective.
No preview · Article · Jan 2014 · Integrated Computer Aided Engineering
ABSTRACT: Discovery of communities in complex networks is a fundamental data analysis task in various domains. Generative models, which have been actively discussed recently, are a promising class of techniques for identifying modular properties of networks. However, most of them cannot preserve the degree sequence of networks, which distorts the community detection results. Rather than using a blockmodel as most current works do, here we generalize a configuration model, namely a null model of modularity, to solve this problem. By decomposing and combining subgraphs according to soft community memberships, our model incorporates the ability to describe community structures, something the original model does not have. It also retains the original model's property of fixing the expected degree sequence to be the same as that of the observed network. We combine the community property and degree sequence preservation into a single unified model, which gives better community results than other models. We learn the model using nonnegative matrix factorization and determine the number of communities by applying consensus clustering. We test this approach on both synthetic benchmarks and real-world networks, and compare it with two similar methods. The experimental results demonstrate the superior performance of our method over competing methods in detecting both disjoint and overlapping communities.
No preview · Article · Sep 2013 · Journal of Statistical Mechanics Theory and Experiment
ABSTRACT: Network community mining algorithms aim at efficiently and effectively discovering all communities in a given network. Many related methods have been proposed and applied in different areas, including social network analysis, gene network analysis, and web clustering engines. Most of the existing methods for mining communities are centralized. In this paper, we present a multi-agent based decentralized algorithm, in which a group of autonomous agents work together to mine a network through a proposed self-aggregation and self-organization mechanism. Thanks to its decentralized nature, our method is potentially suitable for dealing with distributed networks, whose global structures are hard to obtain due to their geographical distribution, decentralized control, or huge size. The effectiveness of our method has been tested on different benchmark networks.
No preview · Article · Jun 2013 · Mathematical and Computer Modelling
ABSTRACT: In this paper, we propose a multi-layer ant-based algorithm, MABA, which detects communities in networks by locally optimizing modularity using individual ants. The basic version of MABA, namely SABA, combines a self-avoiding label propagation technique with a simulated annealing strategy for ant diffusion in networks. Once communities are found by SABA, the method can be reapplied to a higher-level network in which each obtained community is regarded as a new vertex. This process is repeated iteratively, which corresponds to MABA. Thanks to the intrinsic multi-level nature of our algorithm, it has the potential to unfold multi-scale hierarchical structures. Furthermore, MABA mitigates the resolution limit of modularity. The proposed MABA has been evaluated on both computer-generated benchmarks and widely used real-world networks, and has been compared with a set of competitive algorithms. Experimental results demonstrate that MABA is both effective and efficient (running in near-linear time with respect to the size of the network) for discovering communities.
No preview · Article · Mar 2013 · Advances in Complex Systems
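As a minimal illustration of the label-propagation ingredient in SABA/MABA, the sketch below runs plain asynchronous label propagation on a toy graph. The self-avoiding rule, the simulated annealing strategy, and the multi-level coarsening are all omitted, and the deterministic tie-break is an implementation choice for reproducibility, not part of the paper's method.

```python
def label_propagation(adj, max_sweeps=50):
    """Plain asynchronous label propagation: each node repeatedly adopts
    the most frequent label among its neighbors until no label changes.
    Ties are broken toward the largest label so the run is deterministic."""
    labels = {v: v for v in adj}
    for _ in range(max_sweeps):
        changed = False
        for v in sorted(adj):
            counts = {}
            for u in adj[v]:
                counts[labels[u]] = counts.get(labels[u], 0) + 1
            best = max(counts.values())
            new = max(l for l, c in counts.items() if c == best)
            if new != labels[v]:
                labels[v], changed = new, True
        if not changed:
            break
    return labels

# Two triangles bridged by a single edge (2-3):
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
labels = label_propagation(adj)
```

On this toy graph the procedure settles into one label per triangle, which is the kind of local consensus that SABA's ants then refine and MABA coarsens level by level.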
ABSTRACT: In order to further improve the performance of current genetic algorithms for discovering communities, a local search based genetic algorithm, GALS, is proposed here. The core of GALS is a local search based mutation technique. To overcome the drawbacks of traditional mutation methods, the paper develops the concept of the marginal gene, and the local monotonicity of the modularity function Q is deduced from each node's local view. Based on these two elements, a new mutation method combined with a local search strategy is presented. GALS has been evaluated on both synthetic benchmarks and several real networks, and compared with currently competitive algorithms. Experimental results show that GALS is highly effective and efficient for discovering communities.
No preview · Article · Mar 2013 · International Journal of Computational Intelligence Systems
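Since GALS optimizes the modularity function Q, a direct sketch of Q may help. This is Newman's standard definition computed naively in O(n^2), not code from the paper.

```python
def modularity(adj, labels):
    """Newman's modularity Q for an undirected graph (adjacency lists) and
    a node -> community map: Q = (1/2m) * sum_ij [A_ij - k_i*k_j/2m]
    over pairs i, j in the same community."""
    m2 = sum(len(nbrs) for nbrs in adj.values())        # 2m (each edge twice)
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    q = 0.0
    for i in adj:
        for j in adj:
            if labels[i] == labels[j]:
                a = 1.0 if j in adj[i] else 0.0
                q += a - deg[i] * deg[j] / m2
    return q / m2

# Two triangles bridged by a single edge (2-3):
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
q_split = modularity(adj, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1})
q_whole = modularity(adj, {v: 0 for v in adj})
```

Splitting the two triangles scores higher than lumping everything together, which is exactly the signal a modularity-optimizing mutation exploits.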
ABSTRACT: Community structure is ubiquitous in real-world networks, and community detection is of fundamental importance in many applications. Although considerable efforts have been made to address the task, the objective of seeking a good trade-off between effectiveness and efficiency, especially in the case of large-scale networks, remains challenging. This paper explores the nature of community structure from a probabilistic perspective and introduces a novel community detection algorithm, named PMC (probabilistically mining communities), to meet this challenging objective. In PMC, community detection is modeled as a constrained quadratic optimization problem that can be efficiently solved by a random walk based heuristic. The performance of PMC has been rigorously validated through comparisons with six representative methods on both synthetic and real-world networks of different scales. Moreover, two applications of analyzing real-world networks by means of PMC have been demonstrated.
No preview · Article · Jan 2013 · Data & Knowledge Engineering
ABSTRACT: The real-world continuous double auction (CDA) market is a dynamic environment. However, most existing agent bidding strategies are designed only for static markets, so a new bidding-strategy design is necessary for more practical simulations and applications. In this paper, we present a novel agent-based computing approach called the GDX Plus (GDXP) model. In the proposed model, trades are decided according to historical market events combined with forecasts of market trends. The GDXP model employs a dynamic adjustment mechanism that adapts the bidding strategy to shocks in a dynamic environment. The experimental results comparing GDXP with other typical models, in both static and dynamic CDA markets, demonstrate the effectiveness of the GDXP model.
No preview · Article · Jan 2013 · Web Intelligence and Agent Systems
ABSTRACT: Malaria transmission can be affected by multiple, even hidden, factors, making it difficult to predict, in a timely and accurate manner, the impact of elimination and eradication programs that have been undertaken and the potential resurgence and spread that may continue to emerge. One current approach is to develop and deploy surveillance systems in an attempt to identify resurgences as early as possible and thus enable policy makers to modify and implement strategies for further preventing transmission. Most of the surveillance data are temporal and spatial in nature. From an interdisciplinary point of view, it is interesting to ask the following important and challenging question: based on the available surveillance data in temporal and spatial forms, how can we build a more effective surveillance mechanism for monitoring and early detection of the relative prevalence and transmission patterns of malaria? Existing clustering-based surveillance software systems do not infer the underlying transmission networks of malaria. However, such networks can be quite informative and insightful, as they characterize how malaria transmits from one place to another. They can in turn allow public health policy makers and researchers to uncover hidden, interacting factors such as environment, genetics, and ecology, and to discover and predict malaria transmission patterns and trends. The network perspective further extends present approaches to modelling malaria transmission based on a set of chosen factors. In this article, we survey related work on transmission network inference, discuss how such an approach can be utilized to develop an effective computational means for inferring malaria transmission networks from partial surveillance data, and outline the methodological steps and issues involved in its formulation and validation.
Full-text · Article · Nov 2012 · Infectious Diseases of Poverty
ABSTRACT: Discovery of communities in complex networks is a fundamental data analysis problem with applications in various domains. Most existing approaches have focused on discovering communities of nodes, while recent studies have shown the great advantages and utility of knowing the communities of links in networks. From this new perspective, we propose a link dynamics based algorithm, called UELC, for identifying the link communities of networks. In UELC, the stochastic process of a link-node-link random walk is employed to unfold an embedded bipartition structure of links in a network. The local mixing properties of the Markov chain underlying the random walk are then utilized to extract two emerging link communities. Further, the random walk and bipartitioning processes are wrapped in an iterative subdivision strategy to recursively identify link partitions that segregate the network links into multiple subdivisions. We evaluate the performance of the new method on synthetic benchmarks and demonstrate its utility on real-world networks. Our experimental results show that our method is highly effective for discovering link communities in complex networks. As a comparison, we also extend UELC to extracting communities of nodes, and show that it is effective for node community identification.
No preview · Article · Oct 2012 · Journal of Statistical Mechanics Theory and Experiment
ABSTRACT: Network communities refer to groups of vertices within which the connecting links are dense but between which they are sparse. A network community mining problem (NCMP for short) is concerned with finding all such communities in a given network. A wide variety of applications can be formulated as NCMPs, ranging from social and/or biological network analysis to web mining and searching. So far, many algorithms addressing NCMPs have been developed, and most of them fall into the categories of either optimization based or heuristic methods. Distinct from the existing studies, the work presented in this paper explores the notion of network communities and their properties based on the dynamics of a naturally introduced stochastic model. In the paper, a relationship between the hierarchical community structure of a network and the local mixing properties of such a stochastic model is established using large-deviation theory. Topological information regarding the community structures hidden in networks can be inferred from their spectral signatures. Based on this relationship, the work proposes a general framework for characterizing, analyzing, and mining network communities. Utilizing the two basic properties of metastability, i.e., being locally uniform and temporarily fixed, an efficient implementation of the framework, called the LM algorithm, has been developed that can scalably mine communities hidden in large-scale networks. The effectiveness and efficiency of the LM algorithm have been theoretically analyzed as well as experimentally validated.
Preview · Article · Mar 2012 · IEEE Transactions on Knowledge and Data Engineering
ABSTRACT: Accurately and proactively providing users with their potentially interesting information or services is the main task of a recommender system. Collaborative filtering is one of the most widely adopted recommender methods, but it suffers from sparse rating data, which severely degrades the quality of recommendations. To address this issue, this article proposes a novel method, named FTRA (Fusing Trust and Ratings), that tries to improve the performance of collaborative filtering recommendation by elaborately integrating two sparse information sources: the conventional rating data given by users and the social trust network among the same users. The performance of FTRA is rigorously validated by comparing it with six representative methods on a real-world dataset. The experimental results show that FTRA outperforms all other competitors in terms of both precision and recall. More importantly, our work suggests that the strategy of augmenting sparse rating data by fusing trust networks significantly improves the quality of conventional collaborative filtering recommendation, and the quality could be further improved by designing more effective integration schemes.
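The idea of fusing trust with ratings can be sketched as a weighted prediction. The crude rating-overlap affinity, the blending weight `alpha`, and the toy data below are illustrative assumptions; FTRA's actual integration scheme is more elaborate.

```python
def predict(ratings, trust, user, item, alpha=0.5):
    """Predict user's rating for item as a weighted average of other users'
    ratings, where each neighbor's weight blends explicit trust with a
    crude co-rating overlap (both in [0, 1]); returns None with no data."""
    num = den = 0.0
    for u, r in ratings.items():
        if u == user or item not in r:
            continue
        overlap = len(set(r) & set(ratings[user]))      # items rated by both
        sim = overlap / len(r)                          # crude affinity proxy
        w = alpha * trust.get((user, u), 0.0) + (1 - alpha) * sim
        num += w * r[item]
        den += w
    return num / den if den else None

# Toy data: "a" has one rating, so rating-based similarity alone is weak,
# but the trust edge toward "b" pulls the prediction toward b's score:
ratings = {"a": {"i1": 5}, "b": {"i1": 5, "i2": 4}, "c": {"i1": 1, "i2": 2}}
trust = {("a", "b"): 1.0}
pred = predict(ratings, trust, "a", "i2")
```

With the trust edge the prediction moves toward the trusted user "b" (3.5 here versus 3.0 without trust), which is the core intuition behind augmenting sparse ratings with a trust network.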