Conference Paper

Comparing Frequentist and Bayesian Approaches for Forecasting Binary Inference Performance

Abstract

In this paper, we compare forecasts of the quality of inferences made by an inference enterprise, generated from a frequentist perspective and from a Bayesian perspective. An inference enterprise (IE) is an organizational entity that uses data, tools, people, and processes to make mission-focused inferences. When evaluating changes to an IE, the quality of the inferences that a new, hypothetical IE would make is uncertain. We can model the uncertainty in quality or performance metrics, such as recall, precision, and false positive rate, as probability distributions generated through either a frequentist approach or a Bayesian approach. In the frequentist approach, we run several experiments evaluating inference quality and fit a distribution to the results. In the Bayesian approach, we update prior beliefs about performance with empirical results. We compare the two approaches on eighteen forecast questions and score both sets of forecasts against ground-truth answers. Both approaches forecast similar mean performance, but the frequentist approach systematically produces wider confidence intervals. Consequently, the frequentist approach out-scores the Bayesian approach on metrics that are sensitive to interval width.
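To make the contrast concrete, below is a minimal Python sketch of the two estimation routes for a single binary metric (recall). The Beta-Binomial prior and update, the normal fit, and all counts are illustrative assumptions for this sketch; the preview does not specify the models actually used in the paper.

    # Illustrative sketch: two ways to turn experimental results into a
    # distribution over a binary performance metric (recall). All models and
    # numbers here are assumptions, not the paper's actual setup.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    true_recall = 0.78        # hypothetical ground-truth recall of the IE
    n_positives = 200         # positives scored in each experiment
    n_experiments = 30

    # Simulate repeated experiments: true positives found in each run.
    tp_per_run = rng.binomial(n_positives, true_recall, size=n_experiments)
    recalls = tp_per_run / n_positives

    # Frequentist route: fit a distribution to the experiment-level results.
    fitted = stats.norm(loc=recalls.mean(), scale=recalls.std(ddof=1))
    freq_lo, freq_hi = fitted.interval(0.90)

    # Bayesian route: update a Beta prior on recall with the pooled counts.
    prior_a, prior_b = 1.0, 1.0               # uniform prior belief
    tp = tp_per_run.sum()                     # total true positives observed
    fn = n_experiments * n_positives - tp     # total positives missed
    posterior = stats.beta(prior_a + tp, prior_b + fn)
    bayes_lo, bayes_hi = posterior.interval(0.90)

    print(f"frequentist 90% interval: ({freq_lo:.3f}, {freq_hi:.3f})")
    print(f"Bayesian    90% interval: ({bayes_lo:.3f}, {bayes_hi:.3f})")

In this toy setup the Bayesian update pools counts across runs, while the frequentist fit spreads over between-run variation, so the credible interval comes out narrower than the fitted frequentist interval; that is consistent with the width pattern the abstract reports, though the mechanism in the paper itself may differ.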