Maximum Margin Bayesian Network Classifiers

Dept. of Electr. Eng., Graz Univ. of Technol., Graz, Austria
IEEE Transactions on Pattern Analysis and Machine Intelligence (Impact Factor: 4.8). 04/2012; DOI: 10.1109/TPAMI.2011.149
Source: IEEE Xplore

ABSTRACT: We present a maximum margin parameter learning algorithm for Bayesian network classifiers that uses a conjugate gradient (CG) method for optimization. In contrast to previous approaches, we maintain the normalization constraints on the parameters of the Bayesian network during optimization, so the probabilistic interpretation of the model is preserved. This enables us to handle missing features in discriminatively optimized Bayesian networks. In experiments, we compare the classification performance of maximum margin parameter learning with conditional likelihood and maximum likelihood learning approaches. Discriminative parameter learning significantly outperforms generative maximum likelihood estimation for naive Bayes and tree-augmented naive Bayes structures on all considered data sets. Furthermore, maximizing the margin outperforms the conditional likelihood approach in terms of classification performance in most cases. We also provide results for a recently proposed maximum margin optimization approach based on convex relaxation [1]. While the classification results are highly similar, our CG-based optimization is up to orders of magnitude faster. Margin-optimized Bayesian network classifiers achieve classification performance comparable to support vector machines (SVMs) while using fewer parameters. Moreover, we show that discriminatively optimized Bayesian network classifiers can easily handle feature values that are unexpectedly missing at classification time, a case in which discriminative classifiers usually require a mechanism to fill in the unknown feature values first.
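The abstract describes the method only at a high level. The following is a minimal sketch, not the authors' algorithm, of two ideas it relies on, shown on a discrete naive Bayes model: a hinge loss on the log-margin between the true class and its competitors, and parameters that remain normalized throughout gradient-based optimization, which is what allows a missing feature to be marginalized simply by dropping its factor. Several choices here are assumptions made for illustration: the softmax reparameterization of each CPT row, the log-sum-exp smoothing of the max over competing classes, the margin threshold `gamma`, and plain batch gradient ascent in place of CG. All class and function names are hypothetical.

```python
import math


def softmax(ws):
    """Numerically stable softmax; the result always sums to 1."""
    m = max(ws)
    es = [math.exp(w - m) for w in ws]
    s = sum(es)
    return [e / s for e in es]


class SoftmaxNaiveBayes:
    """Discrete naive Bayes whose CPT rows are softmax(w) of free weights w,
    so every table stays normalized whatever values the optimizer assigns to w."""

    def __init__(self, n_classes, cards):
        self.n_classes = n_classes
        self.cards = cards                # number of values of each feature
        self.w_class = [0.0] * n_classes  # weights of P(C)
        # w_feat[j][c] holds the weights of the row P(X_j | C = c)
        self.w_feat = [[[0.0] * k for _ in range(n_classes)] for k in cards]

    def log_joint(self, x, c):
        """log P(C = c, x). A feature given as None is marginalized out; for naive
        Bayes that just drops its factor, because sum_v P(X_j = v | c) = 1."""
        lp = math.log(softmax(self.w_class)[c])
        for j, v in enumerate(x):
            if v is not None:
                lp += math.log(softmax(self.w_feat[j][c])[v])
        return lp

    def predict(self, x):
        return max(range(self.n_classes), key=lambda c: self.log_joint(x, c))

    def cpt_rows(self):
        yield softmax(self.w_class)
        for rows in self.w_feat:
            for w in rows:
                yield softmax(w)


def log_margin(model, x, y):
    """d = log P(y | x) - logsumexp_{c != y} log P(c | x); P(x) cancels, so joints suffice.
    Also returns the soft-max weights of the competing classes."""
    lj = [model.log_joint(x, c) for c in range(model.n_classes)]
    others = [c for c in range(model.n_classes) if c != y]
    m = max(lj[c] for c in others)
    lse = m + math.log(sum(math.exp(lj[c] - m) for c in others))
    return lj[y] - lse, {c: math.exp(lj[c] - lse) for c in others}


def add_grad(model, x, c, scale, g_class, g_feat):
    """Accumulate scale * d log P(c, x) / dw, using d log softmax(w)[v] / dw_k = [k == v] - softmax(w)[k]."""
    p = softmax(model.w_class)
    for k in range(model.n_classes):
        g_class[k] += scale * ((k == c) - p[k])
    for j, v in enumerate(x):
        if v is not None:
            p = softmax(model.w_feat[j][c])
            for k in range(len(p)):
                g_feat[j][c][k] += scale * ((k == v) - p[k])


def train_max_margin(model, data, gamma=1.0, lr=0.4, epochs=200):
    """Batch gradient ascent on sum_m min(gamma, d_m), i.e. descent on the hinge loss max(0, gamma - d_m)."""
    n = len(data)
    for _ in range(epochs):
        g_class = [0.0] * model.n_classes
        g_feat = [[[0.0] * k for _ in range(model.n_classes)] for k in model.cards]
        for x, y in data:
            d, q = log_margin(model, x, y)
            if d >= gamma:
                continue  # margin already large enough: no gradient from this sample
            add_grad(model, x, y, 1.0, g_class, g_feat)
            for c, qc in q.items():
                add_grad(model, x, c, -qc, g_class, g_feat)
        for k in range(model.n_classes):
            model.w_class[k] += lr * g_class[k] / n
        for j, rows in enumerate(g_feat):
            for c, row in enumerate(rows):
                for k, g in enumerate(row):
                    model.w_feat[j][c][k] += lr * g / n
    return model


if __name__ == "__main__":
    # Hypothetical toy data: the class equals feature 0; feature 1 is only weakly related.
    data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 1), ((1, 1), 1), ((0, 0), 0), ((1, 1), 1)]
    model = train_max_margin(SoftmaxNaiveBayes(2, [2, 2]), data)
    print([model.predict(x) for x, _ in data])
    print(model.predict((1, None)))  # feature 1 missing at classification time
```

Replacing the update loop with a conjugate gradient routine (for example `scipy.optimize.minimize(..., method="CG")` with an analytic gradient) would be closer to the setup described in the abstract. Plain gradient ascent keeps the sketch dependency-free. Because the softmax rows sum to 1 for any weights, no projection step is needed to keep the model a proper distribution.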

  • ABSTRACT: In this work, we tackle the problem of structure learning for Bayesian network classifiers (BNCs). Searching for an appropriate structure is challenging because the number of possible structures grows exponentially with the number of attributes. We formulate this search problem as a large Markov decision process (MDP), which allows us to tackle it with sequential decision-making methods. Furthermore, we devise a Monte Carlo tree search algorithm to find a tractable solution for the MDP. A bandit-based action selection strategy provides a systematic way of guiding the search, making the search in the large space of unrestricted structures tractable. Classification results on different datasets show that this method can significantly boost the performance of structure learning for BNCs.
    Proceedings of the 19th international conference on Neural Information Processing - Volume Part II; 11/2012
  • ABSTRACT: This paper studies Fisher linear discriminants (FLDs) based on classification accuracies for imbalanced datasets. An optimal threshold, which depends not only on the sample sizes but also on the distribution regions, is derived from a series of empirically developed formulas. A mixed binary-decimal coding system is proposed to make very dense datasets sparse and to enlarge the class margins while nearly preserving the neighborhood relationships of the samples. Within-class scatter matrices that are singular or nearly singular should be moderately reduced in dimensionality rather than regularized by adding tiny perturbations. The weight vectors can be further updated by an epoch-limited (at most three epochs) iterative learning strategy, provided that the current training error rate decreases accordingly. Combining these ideas, this paper proposes a type of integrated FLD. Extensive experimental results on real-world datasets demonstrate that the integrated FLDs have clear advantages over conventional FLDs in both learning and generalization performance on imbalanced datasets.
    Pattern Recognition. 02/2014; 47(2):789-805.
  • ABSTRACT: Bayesian network classifiers are probabilistic classifiers that achieve good classification rates in various applications. These classifiers consist of a directed acyclic graph and a set of conditional probability densities, which, for discrete-valued nodes, can be represented by conditional probability tables. In this paper, we investigate the effect of quantizing these conditional probabilities. We derive worst-case and best-case bounds on the classification rate using interval arithmetic. Furthermore, we determine performance bounds that hold with a user-specified confidence using quantization theory. Our results emphasize that only small bit-widths are necessary to achieve good classification rates.
    IEEE International Conference on Acoustics, Speech and Signal Processing; 01/2013
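The quantization entry in the list mentions worst-case bounds on the classification rate obtained with interval arithmetic. As a rough illustration of that worst-case idea only, and not the bounds derived in that paper, the sketch rounds the log-probabilities of a small naive Bayes model to b bits and checks when a decision is guaranteed to survive quantization. Assumptions: uniform rounding of log-probabilities on the interval [lo, 0] with lo = -4, no probability small enough to be clipped, and the made-up parameters `PRIOR` and `CPTS`. All function names are hypothetical.

```python
import itertools
import math

# Hypothetical naive Bayes model: 2 classes, 2 binary features; CPTS[j][c][v] = P(X_j = v | C = c).
PRIOR = [0.6, 0.4]
CPTS = [[[0.8, 0.2], [0.3, 0.7]],
        [[0.5, 0.5], [0.1, 0.9]]]


def level_step(bits, lo):
    """Spacing of 2**bits uniform levels covering [lo, 0]."""
    return -lo / (2 ** bits - 1)


def quantize_log(p, step):
    """Round log(p) to the nearest level; the error is at most step / 2
    provided log(p) >= lo (clipping is not modeled here)."""
    return round(math.log(p) / step) * step


def scores(prior, cpts, x, logf):
    """Naive Bayes scores log P(c) + sum_j log P(x_j | c) for every class c."""
    return [logf(prior[c]) + sum(logf(cpts[j][c][v]) for j, v in enumerate(x))
            for c in range(len(prior))]


def argmax(s):
    return max(range(len(s)), key=s.__getitem__)


def certified(exact, n_terms, step):
    """Interval argument: each score is a sum of n_terms rounded values, so it moves by
    at most n_terms * step / 2. If the winner leads by more than n_terms * step,
    quantization cannot change the decision."""
    best, second = sorted(exact, reverse=True)[:2]
    return best - second > n_terms * step


if __name__ == "__main__":
    n_terms = 1 + len(CPTS)  # the prior plus one factor per feature
    for bits in (2, 3, 8):
        step = level_step(bits, lo=-4.0)
        for x in itertools.product([0, 1], repeat=len(CPTS)):
            exact = scores(PRIOR, CPTS, x, math.log)
            quant = scores(PRIOR, CPTS, x, lambda p: quantize_log(p, step))
            print(bits, x, argmax(exact), argmax(quant), certified(exact, n_terms, step))
```

Raising the bit-width shrinks the step and certifies more decisions. With the made-up model above, every input is certified at 8 bits, while at 3 bits only the input with the largest score gap is.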

Full text available from May 26, 2014.