Maximum Margin Bayesian Network Classifiers

Dept. of Electr. Eng., Graz Univ. of Technol., Graz, Austria
IEEE Transactions on Pattern Analysis and Machine Intelligence (Impact Factor: 4.8). 04/2012; DOI: 10.1109/TPAMI.2011.149
Source: IEEE Xplore

ABSTRACT We present a maximum margin parameter learning algorithm for Bayesian network classifiers using a conjugate gradient (CG) method for optimization. In contrast to previous approaches, we maintain the normalization constraints on the parameters of the Bayesian network during optimization, i.e., the probabilistic interpretation of the model is not lost. This enables us to handle missing features in discriminatively optimized Bayesian networks. In experiments, we compare the classification performance of maximum margin parameter learning to conditional likelihood and maximum likelihood learning approaches. Discriminative parameter learning significantly outperforms generative maximum likelihood estimation for naive Bayes and tree augmented naive Bayes structures on all considered data sets. Furthermore, maximizing the margin dominates the conditional likelihood approach in terms of classification performance in most cases. We provide results for a recently proposed maximum margin optimization approach based on convex relaxation [1]. While the classification results are highly similar, our CG-based optimization is computationally up to orders of magnitude faster. Margin-optimized Bayesian network classifiers achieve classification performance comparable to support vector machines (SVMs) using fewer parameters. Moreover, we show that unanticipated missing feature values during classification can be easily processed by discriminatively optimized Bayesian network classifiers, a case where discriminative classifiers usually require mechanisms to complete unknown feature values in the data first.
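The missing-feature claim in the abstract depends on the parameters remaining normalized probabilities. For a naive Bayes structure, marginalizing over an unobserved feature then reduces to omitting its factor. The sketch below illustrates this with a toy two-feature model. All names and numbers (`class_prior`, `cpts`, `nb_log_joint`, `log_margin`) are hypothetical and not taken from the paper. The log-ratio margin shown is one common multiclass definition, used here as an assumption; the abstract does not state the paper's exact objective, and no CG optimization is performed.

```python
import math

# Hypothetical toy model: two classes, two binary features.
# Every conditional table sums to 1, i.e. the parameters are normalized.
class_prior = {"a": 0.5, "b": 0.5}
cpts = [
    {"a": {0: 0.9, 1: 0.1}, "b": {0: 0.2, 1: 0.8}},  # P(x1 | c)
    {"a": {0: 0.3, 1: 0.7}, "b": {0: 0.6, 1: 0.4}},  # P(x2 | c)
]


def nb_log_joint(class_prior, cpts, x):
    """Return {c: log P(c, observed features)} for a naive Bayes model.

    A value of None marks a missing feature. Because each table is
    normalized, summing a missing feature out of the joint is the same
    as skipping its factor.
    """
    scores = {}
    for c, prior in class_prior.items():
        log_p = math.log(prior)
        for j, value in enumerate(x):
            if value is None:
                continue  # marginalized out: the factor sums to 1
            log_p += math.log(cpts[j][c][value])
        scores[c] = log_p
    return scores


def classify(class_prior, cpts, x):
    """Pick the class with the largest joint probability."""
    scores = nb_log_joint(class_prior, cpts, x)
    return max(scores, key=scores.get)


def log_margin(class_prior, cpts, x, true_class):
    """Log-ratio of the true class's joint to the strongest competitor's.

    Positive means the sample is classified correctly (assumed definition).
    """
    scores = nb_log_joint(class_prior, cpts, x)
    best_other = max(v for c, v in scores.items() if c != true_class)
    return scores[true_class] - best_other


label_full = classify(class_prior, cpts, [0, 1])
label_missing = classify(class_prior, cpts, [0, None])  # x2 unobserved
margin = log_margin(class_prior, cpts, [0, 1], "a")
```

If the tables were unnormalized weights, as in a purely discriminative reparameterization, skipping a factor would no longer correspond to a marginal, which is why such classifiers usually need the missing values imputed first.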

  • ABSTRACT: The margin criterion for parameter learning in graphical models has gained significant impact in recent years. We use the maximum margin score for discriminatively optimizing the structure of Bayesian network classifiers. Furthermore, greedy hill-climbing and simulated annealing search heuristics are applied to determine the classifier structures. In the experiments, we demonstrate the advantages of maximum margin optimized Bayesian network structures in terms of classification performance compared to traditionally used discriminative structure learning methods. Stochastic simulated annealing requires fewer score evaluations than greedy heuristics. Additionally, we compare generative and discriminative parameter learning on both generatively and discriminatively structured Bayesian network classifiers. Margin-optimized Bayesian network classifiers achieve classification performance similar to that of support vector machines. Moreover, missing feature values during classification can be handled by discriminatively optimized Bayesian network classifiers, a case where purely discriminative classifiers usually require mechanisms to complete unknown feature values in the data first.
    Pattern Recognition 02/2013; 46(2):464–471. · 2.63 Impact Factor
  • ABSTRACT: Analyzing brain networks from neuroimages is becoming a promising approach to identifying novel connectivity-based biomarkers for Alzheimer’s disease (AD). In this regard, brain “effective connectivity” analysis, which studies the causal relationships among brain regions, is highly challenging and offers many research opportunities. Most of the existing works in this field use generative methods. Despite their success in data representation and other important merits, generative methods are not necessarily discriminative, which may cause them to overlook subtle but critical disease-induced changes. In this paper, we propose a learning-based approach that integrates the benefits of generative and discriminative methods to recover effective connectivity. In particular, we employ the Fisher kernel to bridge the generative models of sparse Bayesian networks (SBN) and the discriminative classifiers of SVMs, and convert the SBN parameter learning to Fisher kernel learning by minimizing a generalization error bound of SVMs. Our method is able to simultaneously boost the discriminative power of both the generative SBN models and the SBN-induced SVM classifiers via the Fisher kernel. The proposed method is tested on analyzing brain effective connectivity for AD from ADNI data, and demonstrates significant improvements over the state-of-the-art work.
    The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, Oregon; 06/2013
  • ABSTRACT: This paper studies Fisher linear discriminants (FLDs) based on classification accuracies for imbalanced datasets. An optimal threshold is derived from a series of empirical formulas and is related not only to sample sizes but also to distribution regions. A mixed binary-decimal coding system is suggested to make very dense datasets sparse and to enlarge the class margins, on condition that the neighborhood relationships of samples are nearly preserved. Within-class scatter matrices that are singular or approximately singular should be moderately reduced in dimensionality rather than perturbed by tiny additive terms. The weight vectors can be further updated by an epoch-limited (three at most) iterative learning strategy, provided that the current training error rates decrease accordingly. Putting the above ideas together, this paper proposes a type of integrated FLD. Extensive experimental results on real-world datasets demonstrate that the integrated FLDs have clear advantages over conventional FLDs in learning and generalization performance on imbalanced datasets.
    Pattern Recognition. 02/2014; 47(2):789-805.
