-
ABSTRACT: The central criterion of feature selection is that good feature sets contain features that are highly correlated with the output, yet uncorrelated with each other. Based on this criterion, we address the problem of feature selection through correlation-based feature clustering and support vector machine (SVM) based feature ranking. Correlation-based clustering is proposed to group features into clusters based on the pairwise correlation between features. As a result, a feature is highly correlated with every other feature in the same cluster but uncorrelated with the features in other clusters. From each cluster, we select one feature as the delegate based on its influence quantities on the output. The influence quantities are measured by the feature sensitivity in the SVM. The proposed approach can identify relevant features and eliminate redundancy among them effectively. The effectiveness of the proposed approach is demonstrated through comparisons with other methods using real-world data with different dimensions. © 2011 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.
IEEJ Transactions on Electrical and Electronic Engineering 02/2011; 6(2):173 - 179. · 0.36 Impact Factor
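The cluster-then-pick-a-delegate idea in the abstract above can be illustrated with a minimal sketch. It rests on several assumptions that are not from the paper: the greedy clustering rule, the 0.8 threshold, the toy data, and above all the use of absolute correlation with the output as a stand-in for SVM feature sensitivity. All function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def cluster_features(columns, threshold=0.8):
    """Greedy clustering (an assumption, not the paper's procedure): a feature
    joins the first cluster whose seed feature it correlates with at
    |r| >= threshold, otherwise it starts a new cluster."""
    clusters = []
    for j, col in enumerate(columns):
        for cluster in clusters:
            if abs(pearson(col, columns[cluster[0]])) >= threshold:
                cluster.append(j)
                break
        else:
            clusters.append([j])
    return clusters

def select_delegates(columns, target, threshold=0.8):
    """Pick one feature per cluster. Relevance here is |corr(feature, target)|,
    a simple proxy standing in for the SVM sensitivity used in the paper."""
    clusters = cluster_features(columns, threshold)
    return [max(c, key=lambda j: abs(pearson(columns[j], target))) for c in clusters]

# Toy data: each inner list is one feature column.
columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0],   # f0
    [2.1, 3.9, 6.2, 8.0, 9.9],   # f1, nearly a multiple of f0 (redundant)
    [5.0, 1.0, 4.0, 2.0, 3.0],   # f2, weakly related to f0
]
target = [1.0, 2.0, 3.0, 4.0, 5.0]

clusters = cluster_features(columns)
delegates = select_delegates(columns, target)
print(clusters, delegates)
```

On this toy input the two redundant features fall into one cluster and only one of them is kept, which is the redundancy-elimination effect the abstract describes.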
-
ABSTRACT: This paper proposes a local linear multi-SVM method based on a composite kernel for solving classification tasks in gene function prediction. The proposed method realizes a nonlinear separating boundary by estimating a series of piecewise linear boundaries. Firstly, according to the distribution information of the training data, a guided partitioning approach composed of separating boundary detection and a clustering technique is used to obtain local subsets, and each subset is utilized to capture prior knowledge of the corresponding local linear boundary. Secondly, a composite kernel is introduced to realize the local linear multi-SVM model. Instead of building multiple local SVM models separately, the prior knowledge of the local subsets is used to construct a composite kernel, and the local linear multi-SVM model is then realized by using the composite kernel in exactly the same way as a single SVM model. Experimental results on benchmark datasets demonstrate that the proposed method improves the classification performance efficiently.
Nature and Biologically Inspired Computing (NaBIC), 2010 Second World Congress on; 01/2011
-
ABSTRACT: This paper proposes an improved Hierarchical Multi-label Classification (HMC) method for gene function prediction. The HMC task is transformed into a series of binary SVM classification tasks. By introducing the hierarchy constraint into the learning procedures, two measures that incorporate prior information are implemented to improve the HMC performance. Firstly, for imbalanced functional classes, a hierarchical SMOTE is proposed as over-sampling preprocessing to improve the SVM learning performance. Secondly, an improved True Path Rule consistency approach is introduced to combine the results of the binary probabilistic SVM classifications. It can improve the classification results and guarantee the hierarchy constraint of classes.
Intelligent Systems Design and Applications (ISDA), 2010 10th International Conference on; 01/2011
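The hierarchy constraint mentioned in the abstract above (a class can be predicted only if its ancestors are) can be sketched as follows. This is the plain form of the constraint, not the paper's improved True Path Rule approach; the class labels, probabilities, and the 0.5 cut-off are hypothetical.

```python
def enforce_hierarchy(probs, parent):
    """Cap each class probability at its (already capped) parent's probability,
    so a child is never more likely than any of its ancestors."""
    consistent = {}

    def resolve(label):
        if label not in consistent:
            p = probs[label]
            par = parent.get(label)
            if par is not None:
                p = min(p, resolve(par))
            consistent[label] = p
        return consistent[label]

    for label in probs:
        resolve(label)
    return consistent

# Hypothetical class hierarchy: child -> parent (None marks a top-level class).
parent = {"01": None, "01.01": "01", "01.01.03": "01.01", "02": None}
# Hypothetical per-class outputs of independent binary probabilistic classifiers.
probs = {"01": 0.6, "01.01": 0.8, "01.01.03": 0.3, "02": 0.1}

consistent = enforce_hierarchy(probs, parent)
predicted = sorted(c for c, p in consistent.items() if p >= 0.5)
print(consistent, predicted)
```

Here the raw output for "01.01" (0.8) exceeds that of its parent "01" (0.6), so it is capped at 0.6. Any threshold applied afterwards yields a label set that respects the hierarchy.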
-
ABSTRACT: Multi-label classification is an extension of the traditional classification problem in which each instance is associated with a set of labels. Recent research has shown that the ranking approach is an effective way to solve this problem. In multi-label sets, classes are often related to each other, and some implicit constraint rules exist among the labels. We therefore present a novel multi-label ranking algorithm that uses pairwise constraint rules mined from the training set to enhance the existing method. In this method, the one-against-all decomposition technique is first used to divide a multi-label problem into binary sub-problems. A rank list is generated by combining the probabilistic outputs of the binary Support Vector Machine (SVM) classifiers. Label constraint rules are learned by minimizing the ranking loss. Experimental evaluation on well-known multi-label benchmark datasets shows that our method improves the classification accuracy efficiently, compared with some existing methods.
Systems Man and Cybernetics (SMC), 2010 IEEE International Conference on; 11/2010
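The ranking loss that the abstract above minimizes is a standard multi-label measure. A minimal sketch for a single instance follows. The scores and label names are made up, and counting ties as errors is one common convention rather than something the abstract specifies.

```python
def rank_list(scores):
    """Labels ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)

def ranking_loss(scores, relevant):
    """Fraction of (relevant, irrelevant) label pairs that are ordered wrongly,
    i.e. the irrelevant label scores at least as high as the relevant one."""
    irrelevant = [label for label in scores if label not in relevant]
    pairs = [(r, i) for r in relevant for i in irrelevant]
    if not pairs:
        return 0.0
    wrong = sum(1 for r, i in pairs if scores[r] <= scores[i])
    return wrong / len(pairs)

# Hypothetical probabilistic outputs of four one-against-all classifiers.
scores = {"a": 0.9, "b": 0.4, "c": 0.6, "d": 0.1}
relevant = {"a", "b"}

ranking = rank_list(scores)
loss = ranking_loss(scores, relevant)
print(ranking, loss)
```

Of the four (relevant, irrelevant) pairs, only ("b", "c") is misordered, so the loss is 0.25.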
-
ABSTRACT: Sequencing by hybridization is a promising cost-effective technology for high-throughput DNA sequencing via microarray chips. However, due to the effects of spectrum errors rooted in experimental conditions, fast and accurate reconstruction of the original sequences has become a challenging problem. In the last decade, a variety of analyses and designs have been tried to overcome this problem, with different strategies making different tradeoffs between speed and accuracy. Motivated by the idea that the errors could be identified by analyzing the interrelation of spectrum elements, this paper presents a new constructive heuristic algorithm, featuring an accurate reconstruction guided by a set of well-defined criteria and rules. The experiments on benchmark instance sets demonstrate that the proposed method can reconstruct long DNA sequences more accurately than current approaches in the literature.
BioInformatics and BioEngineering (BIBE), 2010 IEEE International Conference on; 07/2010
-
International Joint Conference on Neural Networks, IJCNN 2010, Barcelona, Spain, 18-23 July, 2010; 01/2010
-
Proceedings of the IEEE Congress on Evolutionary Computation, CEC 2010, Barcelona, Spain, 18-23 July 2010; 01/2010
-
International Joint Conference on Neural Networks, IJCNN 2010, Barcelona, Spain, 18-23 July, 2010; 01/2010
-
IEICE Transactions. 01/2010; 93-A:1792-1799.
-
ABSTRACT: In this paper, a multi-label classification method based on label ranking and a delicate boundary Support Vector Machine (SVM) is proposed for functional genomics applications. Firstly, an improved probabilistic SVM with a delicate decision boundary is used as the scoring approach to obtain a proper label rank. Secondly, an instance-dependent thresholding strategy is proposed to decide the classification results. A d-fold validation approach is utilised to determine a set of target thresholds for all training samples as teachers, and an appropriate instance-dependent threshold for each testing instance is then obtained by applying a k-Nearest Neighbours (KNN) strategy to this teacher threshold set.
International Journal of Computational Biology and Drug Design 01/2010; 3(2):133-45.
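The instance-dependent thresholding step in the abstract above can be sketched as follows. The abstract does not say how the neighbours' teacher thresholds are combined, so averaging them is an assumption here, as are the feature vectors, thresholds, scores, and function names.

```python
import math

def knn_threshold(x, teachers, k=3):
    """Average the teacher thresholds of the k training instances nearest to x.
    `teachers` is a list of (feature_vector, threshold) pairs."""
    nearest = sorted(teachers, key=lambda t: math.dist(x, t[0]))[:k]
    return sum(th for _, th in nearest) / len(nearest)

def predict_labels(scores, threshold):
    """Keep every label whose score reaches the instance's own threshold."""
    return sorted(label for label, s in scores.items() if s >= threshold)

# Hypothetical teacher set: training instances paired with target thresholds.
teachers = [
    ((0.0, 0.0), 0.2),
    ((0.0, 1.0), 0.4),
    ((1.0, 0.0), 0.6),
    ((5.0, 5.0), 0.9),
]

test_instance = (0.1, 0.1)
threshold = knn_threshold(test_instance, teachers, k=3)
labels = predict_labels({"f1": 0.7, "f2": 0.35, "f3": 0.5}, threshold)
print(threshold, labels)
```

The three nearest teachers give a threshold of about 0.4, so the labels scored 0.7 and 0.5 are kept and the one scored 0.35 is dropped. A test instance near the distant fourth teacher would get a stricter cut-off.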
-
ABSTRACT: Protein structure prediction (PSP) is one of the most important problems in computational biology. This chapter introduces a novel hybrid Estimation of Distribution Algorithm (EDA) to solve the PSP problem on the HP model. Firstly, a composite fitness function containing the information of the folding structure core (H-Core) is introduced to replace the traditional fitness function of the HP model. The new fitness function is expected to select better individuals for the probabilistic model of the EDA. Secondly, local search with guided operators is utilized to refine the solutions found and so improve the efficiency of the EDA. Thirdly, an improved backtracking-based repairing method is introduced to repair invalid individuals sampled by the probabilistic model of the EDA. It can significantly reduce the number of backtracking search operations and the computational cost for long protein sequences. Experimental results demonstrate that the new method outperforms the basic EDA method. At the same time, it is very competitive with other existing algorithms for the PSP problem on lattice HP models.
12/2009: pages 193-214;
-
ABSTRACT: In this paper, we propose a novel deductive rule-mining method to mine the association rules that a given user actually needs. Unlike most existing methods, which inductively mine frequent itemsets from candidate 2-itemsets up to candidate (n-1)-itemsets and produce huge numbers of rough rules from these frequent itemsets, our method avoids producing the large numbers of frequent itemsets that are contained in longer frequent itemsets, and it interacts with users by letting them pick the items they are interested in to deduce the final interesting association rules. Moreover, it can respond dynamically at any time when users want to check whether the frequent itemsets they are interested in have been found. Several dynamic response strategies are proposed. These dynamic response algorithms can find most long frequent itemsets early on, so users can find the rules they are interested in within a short time with high probability. Our method can therefore also be applied to online data mining.
ICCAS-SICE, 2009; 09/2009
-
ABSTRACT: Many evolutionary algorithm (EA) based methods have been proposed to solve the protein structure prediction (PSP) problem in the HP-lattice model. One common difficulty of those methods is the existence of invalid individuals produced by geometrical constraints in the conformation of the protein (i.e. self-avoidance in the chain). A backtracking method is often used to repair the invalid individuals of the genetic search in those methods. However, the basic backtracking method has a disadvantage: the computational cost of repairing is very heavy for long sequence instances. This paper proposes an improved backtracking-based repairing method for long sequence protein folding. A detection procedure is added to the backtracking method to avoid entering invalid closed areas when selecting directions for the residues. Experimental results show that the proposed method can significantly reduce the number of backtracking search operations and the computational cost for long protein sequences.
ICCAS-SICE, 2009; 09/2009
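For readers unfamiliar with the HP-lattice model referred to in the abstract above, the following minimal sketch shows what makes an individual invalid (the chain revisits a lattice site) and how a valid conformation is scored. It assumes a 2D square lattice and an absolute-direction encoding, neither of which is stated in the abstract. It does not implement the paper's backtracking repair or its detection procedure.

```python
# Absolute moves on a 2D square lattice (an assumed encoding).
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def fold(moves):
    """Place residues along the move string starting at the origin.
    Returns the list of coordinates, or None if the chain revisits a site
    (a violation of self-avoidance, i.e. an invalid individual)."""
    pos = (0, 0)
    coords = [pos]
    seen = {pos}
    for m in moves:
        dx, dy = MOVES[m]
        pos = (pos[0] + dx, pos[1] + dy)
        if pos in seen:
            return None
        seen.add(pos)
        coords.append(pos)
    return coords

def energy(sequence, coords):
    """Standard HP energy: -1 for each pair of H residues that are lattice
    neighbours but not consecutive in the chain."""
    index = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Looking only right and up counts each neighbouring pair once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts

sequence = "HPPH"                 # toy 4-residue chain
coords = fold("RUL")              # folds into a unit square
score = energy(sequence, coords)  # the two H residues end up adjacent
invalid = fold("RULD")            # fourth move returns to the origin
print(coords, score, invalid)
```

A repair method of the kind the abstract discusses would take a move string such as "RULD" and change moves until `fold` no longer returns None.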
-
ABSTRACT: In a standard support vector machine (SVM), the training process has O(n^3) time and O(n^2) space complexities, where n is the size of the training dataset. Thus, it is computationally infeasible for very large datasets. Reducing the size of the training dataset is a natural way to address this problem. SVM classifiers depend only on the support vectors (SVs) that lie close to the separation boundary. Therefore, we need to retain the samples that are likely to be SVs. In this paper, we propose a method based on the edge detection technique to detect these samples. To preserve the overall distribution properties, we also use a clustering algorithm such as K-means to calculate the centroids of clusters. The samples selected by the edge detector and the centroids of the clusters are used to reconstruct the training dataset. The reconstructed, smaller training dataset makes the training process much faster without degrading the classification accuracy.
Neural Networks, 2009. IJCNN 2009. International Joint Conference on; 07/2009
-
ABSTRACT: The protein structure prediction (PSP) problem is one of the most important problems in computational biology. This paper proposes a novel method based on Estimation of Distribution Algorithms (EDAs) to solve the PSP problem on the HP model. Firstly, a composite fitness function containing the information of folding structure core formation is introduced to replace the traditional fitness function of the HP model. It can help to select better individuals for the probabilistic model of the EDA. In addition, a set of guided operators is used to increase the diversity of the population and the likelihood of escaping from local optima. Secondly, an improved backtracking repairing algorithm is proposed to repair invalid individuals sampled by the probabilistic model of the EDA for long sequence protein instances. A feasibility detection procedure is added to avoid entering invalid closed areas when selecting directions for the residues. Thus, it can significantly reduce the number of backtracking operations and the computational cost for long protein sequences. Experimental results demonstrate that the proposed method outperforms the basic EDA method. At the same time, it is very competitive with the other existing algorithms for the PSP problem on lattice HP models.
Evolutionary Computation, 2009. CEC '09. IEEE Congress on; 06/2009
-
World Congress on Nature & Biologically Inspired Computing, NaBIC 2009, 9-11 December 2009, Coimbatore, India; 01/2009
-
JACIII. 01/2009; 13:91-96.
-
JACIII. 01/2009; 13:407-415.
-
Appl. Soft Comput. 01/2009; 9:393-403.