Maximum Margin Bayesian Network Classifiers

Dept. of Electr. Eng., Graz Univ. of Technol., Graz, Austria
IEEE Transactions on Pattern Analysis and Machine Intelligence (Impact Factor: 5.69). 04/2012; 34(3):521 - 532. DOI: 10.1109/TPAMI.2011.149
Source: IEEE Xplore

ABSTRACT We present a maximum margin parameter learning algorithm for Bayesian network classifiers using a conjugate gradient (CG) method for optimization. In contrast to previous approaches, we maintain the normalization constraints on the parameters of the Bayesian network during optimization, i.e., the probabilistic interpretation of the model is not lost. This enables us to handle missing features in discriminatively optimized Bayesian networks. In experiments, we compare the classification performance of maximum margin parameter learning to conditional likelihood and maximum likelihood learning approaches. Discriminative parameter learning significantly outperforms generative maximum likelihood estimation for naive Bayes and tree augmented naive Bayes structures on all considered data sets. Furthermore, maximizing the margin dominates the conditional likelihood approach in terms of classification performance in most cases. We provide results for a recently proposed maximum margin optimization approach based on convex relaxation [1]. While the classification results are highly similar, our CG-based optimization is computationally up to orders of magnitude faster. Margin-optimized Bayesian network classifiers achieve classification performance comparable to support vector machines (SVMs) using fewer parameters. Moreover, we show that unanticipated missing feature values during classification can be easily processed by discriminatively optimized Bayesian network classifiers, a case where discriminative classifiers usually require mechanisms to complete unknown feature values in the data first.
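
The missing-feature claim in the abstract can be illustrated with a minimal sketch, assuming a discrete naive Bayes structure. Because every conditional probability table stays normalized, marginalizing an unobserved feature multiplies the joint by 1, so that feature's factor can simply be skipped. The function names, the toy parameters, and the log-ratio form of the margin are assumptions made for illustration; the abstract does not spell out the paper's exact objective.

```python
import numpy as np

def log_joint(log_prior, log_cpts, x):
    """Return log P(c, x_observed) for every class c of a naive Bayes model.

    log_prior: shape (C,), log P(c), normalized over c.
    log_cpts:  list of arrays; log_cpts[j][c, v] = log P(X_j = v | c),
               each row normalized over v.
    x:         sequence of ints, with None marking a missing feature.
    """
    scores = log_prior.copy()
    for j, v in enumerate(x):
        if v is None:
            # Summing a normalized row over all values gives 1 (log 1 = 0),
            # so a missing feature contributes nothing to the score.
            continue
        scores += log_cpts[j][:, v]
    return scores

def log_margin(log_prior, log_cpts, x, c_true):
    """Assumed margin: log-score of the true class minus the best rival's."""
    scores = log_joint(log_prior, log_cpts, x)
    rivals = np.delete(scores, c_true)
    return scores[c_true] - rivals.max()

# Hypothetical toy model: 2 classes, 2 binary features.
log_prior = np.log(np.array([0.5, 0.5]))
log_cpts = [
    np.log(np.array([[0.9, 0.1], [0.2, 0.8]])),  # P(X_0 | c)
    np.log(np.array([[0.7, 0.3], [0.4, 0.6]])),  # P(X_1 | c)
]

full = log_margin(log_prior, log_cpts, [0, 0], c_true=0)
partial = log_margin(log_prior, log_cpts, [0, None], c_true=0)
```

A positive margin means the sample is classified correctly. A margin-based learner would adjust the parameters to enlarge such margins on the training data while keeping each row normalized, which is what keeps the skip-the-factor treatment of missing features valid.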

Available from: Franz Pernkopf, Aug 07, 2015
  • Source
    • "The resulting algorithm, called stochastic discriminative EM (sdEM), is an online-EM-type algorithm that can train generative probabilistic models belonging to the exponential family using a wide range of discriminative loss functions, such as the negative conditional log-likelihood or the Hinge loss. In opposite to other discriminative learning approaches [26], models trained by sdEM can deal with missing data and latent variables in a principled way either when being learned or when making predictions, because at any moment they always define a joint probability distribution. sdEM could be used for learning using large scale data sets due to its stochastic approximation nature and, as we will show, because it allows to compute the natural gradient of the loss function with no extra cost [3]."
    ABSTRACT: Stochastic discriminative EM (sdEM) is an online-EM-type algorithm for discriminative training of probabilistic generative models belonging to the exponential family. In this work, we introduce and justify this algorithm as a stochastic natural gradient descent method, i.e. a method which accounts for the information geometry in the parameter space of the statistical model. We show how this learning algorithm can be used to train probabilistic generative models by minimizing different discriminative loss functions, such as the negative conditional log-likelihood and the Hinge loss. The resulting models trained by sdEM are always generative (i.e. they define a joint probability distribution) and, in consequence, can deal with missing data and latent variables in a principled way either when being learned or when making predictions. The performance of this method is illustrated by several text classification problems for which a multinomial naive Bayes and a latent Dirichlet allocation based classifier are learned using different discriminative loss functions.
  • Source
    • "a) Discriminatively versus generatively optimized parameters: Here, we compare the classification performance of BNCs with MAP parameters and of BNCs with MM parameters over varying numbers of bits used for quantization. MM parameters are determined using the algorithm described in [4]. The structures considered are NB and TAN-CMI. "
    ABSTRACT: Bayesian network classifiers (BNCs) are typically implemented on today's desktop computers. However, many real world applications require classifier implementation on embedded or low power systems. Aspects for this purpose have not been studied rigorously. We partly close this gap by analyzing reduced precision implementations of BNCs. In detail, we investigate the quantization of the parameters of BNCs with discrete valued nodes including the implications on the classification rate (CR). We derive worst-case and probabilistic bounds on the CR for different bit-widths. These bounds are evaluated on several benchmark datasets. Furthermore, we compare the classification performance and the robustness of BNCs with generatively and discriminatively optimized parameters, i.e. parameters optimized for high data likelihood and parameters optimized for classification, with respect to parameter quantization. Generatively optimized parameters are more robust for very low bit-widths, i.e. fewer classifications change because of quantization. However, classification performance is better for discriminatively optimized parameters for all but very low bit-widths. Additionally, we perform analysis for margin-optimized tree augmented network (TAN) structures which outperform generatively optimized TAN structures in terms of CR and robustness.
    IEEE Transactions on Pattern Analysis and Machine Intelligence 01/2014; DOI:10.1109/TPAMI.2014.2353620 · 5.69 Impact Factor
  • Source
    • "Recent progress in [10] [11] for learning discriminative BNs follows the conventional two-stage approach and works for discrete variables. They may not be suitable for brain network analysis where the brain regional measurements are usually continuous variables. "
    ABSTRACT: Analyzing brain networks from neuroimages is becoming a promising approach in identifying novel connectivity-based biomarkers for the Alzheimer’s disease (AD). In this regard, brain “effective connectivity” analysis, which studies the causal relationship among brain regions, is highly challenging and of many research opportunities. Most of the existing works in this field use generative methods. Despite their success in data representation and other important merits, generative methods are not necessarily discriminative, which may cause the ignorance of subtle but critical disease-induced changes. In this paper, we propose a learning-based approach that integrates the benefits of generative and discriminative methods to recover effective connectivity. In particular, we employ Fisher kernel to bridge the generative models of sparse Bayesian networks (SBN) and the discriminative classifiers of SVMs, and convert the SBN parameter learning to Fisher kernel learning via minimizing a generalization error bound of SVMs. Our method is able to simultaneously boost the discriminative power of both the generative SBN models and the SBN-induced SVM classifiers via Fisher kernel. The proposed method is tested on analyzing brain effective connectivity for AD from ADNI data, and demonstrates significant improvements over the state-of-the-art work.
    The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, Oregon; 06/2013