A quantum framework for likelihood ratios

Keywords: Pseudodiagnosticity; Quantum probability; Quantum Bayes' theorem; Relational Information Theory
RACHAEL BOND
December 12th, 2015
University of Sussex
The annual scientific meeting of the
Mathematical, Statistical, & Computing Psychology Section
of the British Psychological Society
 
 
r.l.bond@sussex.ac.uk www.rachaelbond.com
@rachael_bond rlb.me/pdf1215
Contents
1. Pseudodiagnosticity
2. Is probability subjective?
3. Describing an objective reality
4. Deconstructing the contingency table
5. Quantum mechanics 101
6. Describing the wave function
7. Solving the “c” functions
8. The objective covariate probability
9. The implications for psychology
10. The relational information seeker
11. Conclusions
References
1. Pseudodiagnosticity
Doherty, Mynatt, Tweney, & Schiavo [1]:

“An undersea explorer has found a pot with a square base that has been made from smooth clay. Using the information below, you must decide from which of two nearby islands it came. You may select one more piece of information to help you make your decision.”
             Shell Is.   Coral Is.
# Finds         10          10
% Smooth        80           ?
% Sq. base       ?           ?
Doherty et al. expected their participants to select the datum paired with the given “anchor information” in order to calculate a Bayes' ratio. The majority didn't.
“Pseudodiagnosticity is clearly dysfunctional.”
~ Doherty, Mynatt, Tweney, & Schiavo (1979) [1], p. 121
What if all the data are known?

                      Shell Is.   Coral Is.
# Finds (base rate)      10          10
# Smooth clay             8           7
# Square base             6           5
To calculate the posterior probability of each island using Bayes' theorem, the expression

P(Shell Is. | smooth ∩ square) = P(smooth ∩ square | Shell Is.) / [ P(smooth ∩ square | Shell Is.) + P(smooth ∩ square | Coral Is.) ]

must be solved (given even base rates). However, the measures of covariate intersection, ie., P(smooth ∩ square | island), are unknowns.
Doherty et al. suggest that the data should be treated as conditionally independent. This allows for a simple estimation of P(smooth ∩ square | island) from the multiplication of marginal probabilities, eg., 0.8 × 0.6 = 0.48 for Shell Is. and 0.7 × 0.5 = 0.35 for Coral Is.
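As a minimal sketch (not the authors' code), the conditional-independence estimate and the resulting classical Bayes' ratio can be computed directly from the table above:

```python
# Classical Bayes' ratio for the "pot" example under the
# conditional-independence assumption suggested by Doherty et al.
# Counts are taken from the contingency table above.

def posterior_shell(smooth, square, n, smooth_c, square_c, n_c):
    """Posterior for Shell Is. with even base rates, assuming the
    covariates are conditionally independent within each island."""
    p_shell = (smooth / n) * (square / n)          # 0.8 * 0.6 = 0.48
    p_coral = (smooth_c / n_c) * (square_c / n_c)  # 0.7 * 0.5 = 0.35
    return p_shell / (p_shell + p_coral)

p = posterior_shell(8, 6, 10, 7, 5, 10)
print(round(p, 3))  # 0.578, the "classical Bayes" value quoted later
```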
However, it would also be reasonable to note that the covariate intersections form ranges, bounded by max(0, a + b − n) and min(a, b): ie., 4–6 finds for Shell Is. and 2–5 finds for Coral Is.
This means that it is also possible to calculate a probability from the mean value of these ranges.
Or, to take the mean value of the minimum→maximum probability range.
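A sketch of these two range-based estimates, under one plausible reading of the minimum→maximum probability range (Shell at its minimum against Coral at its maximum, and vice versa):

```python
# Bounds on the covariate-intersection counts, and the two
# range-based probability estimates described above. The pairing
# used for the min->max probability range is an assumption.

def bounds(a, b, n):
    """Possible sizes of the intersection of two sets of sizes a and b
    drawn from n items (Frechet-style bounds)."""
    return max(0, a + b - n), min(a, b)

shell_lo, shell_hi = bounds(8, 6, 10)   # (4, 6)
coral_lo, coral_hi = bounds(7, 5, 10)   # (2, 5)

# Probability from the mean of the count ranges
shell_mid = (shell_lo + shell_hi) / 2   # 5.0
coral_mid = (coral_lo + coral_hi) / 2   # 3.5
p_mid = shell_mid / (shell_mid + coral_mid)
print(round(p_mid, 3))                  # 0.588

# Mean of the minimum -> maximum probability range
p_min = shell_lo / (shell_lo + coral_hi)  # 4/9
p_max = shell_hi / (shell_hi + coral_lo)  # 6/8
print(round((p_min + p_max) / 2, 3))      # 0.597
```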
Other possible approaches include regression analysis, which would assume a low level of collinearity, or using an expectation-maximisation algorithm (eg., see Dempster, Laird, & Rubin, 1977 [2]).
2. Is probability subjective?
Given the variety of probability values which may be reasonably calculated, one may conclude that there is no objectively correct likelihood ratio.
The subjective nature of probability has moved to the
centre of statistical research since Bruno de Finetti
claimed that “probability does not exist”.
(de Finetti, 1974)[3]
de Finetti's subjective view of probability may be found in epistemological research and modern statistics, eg., the “quantum Bayesian” work of Caves, Fuchs, & Schack (2002) [4].

Bruno de Finetti (1906-1985)
“As far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality.”
(Geometry & Experience, 1921)

Albert Einstein (1879-1955)
3. Describing an objective reality
Aristotle (384-322 BCE) argued that “reality” is described by the unity of form and substance: “substance” being what something is made from, and “form” being its innate characteristics.

In the contingency table, the “substances” (ie., the differentiating characteristics) and their “forms” (ie., their values) are known. Yet an objective probability value cannot be calculated from this description of the table's reality.
In the “Tractatus” (1922) Wittgenstein said that “the world is the totality of facts”, and that “it is the relationship between facts and there being all the facts”.

Ludwig Wittgenstein (1889-1951)
Jacques Derrida believed that the relationships between facts can only be discovered through a process of “deconstruction”.

Jacques Derrida (1930-2004)
4. Deconstructing the contingency table
Assuming, for the moment, the case of even base rates, the contingency table

                Shell Is.   Coral Is.
# Smooth clay       8           7
# Square base       6           5

may be deconstructed into 4 sub-contingency tables, one per cell value (8, 6, 7, and 5) ...
... each of which provides two pieces of “pure” information generated from the facts of the characteristic and the island. These are not logically separable.
While the relationships between Shell Is. and Coral Is. are known (they are mutually exclusive), the relationships between smooth clay and square base cannot be stated.
What is needed is a mathematical approach which allows the covariate intersections to be directly mapped to Shell Is. and Coral Is.

In other words, the contingency table's internal relationships must be rewritten in a way that includes the covariate intersections, but does not make any structural changes. This can only be achieved by using the mathematics of quantum mechanics.
5. Quantum mechanics 101
There are many competing models of quantum mechanics: multiverse theory, string theory, decoherence theory, and the Copenhagen interpretation.
In 1935 Niels Bohr suggested that psychology & quantum mechanics might be linked, but it is only recently that research has been conducted in this field.

Niels Bohr (1885-1962)
Instead of the joint probability spaces used in classical statistics, quantum mechanics works in vector spaces.

The vectors are normalised wave functions which are orthogonal to each other in n dimensions.

In psychology these vectors could, for instance, represent attitudes, beliefs, or intent, etc.
Using the Dirac (1939) [5] “bra-ket” notation, the wave functions are described by column vectors known as “kets”, written as |ψ⟩.

Their “complex conjugate transposes” form row-vector “bras”, written as ⟨ψ|.

Any ket multiplied by its own bra is “orthonormal”, meaning that ⟨ψ|ψ⟩ = 1.
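As a minimal illustration of this notation with NumPy (the states here are assumed for demonstration, not taken from the presentation):

```python
# Kets as column vectors, bras as their conjugate transposes,
# and the tensor product acting as a logical "AND".
import numpy as np

ket_shell = np.array([[1], [0]], dtype=complex)   # |Shell>
ket_coral = np.array([[0], [1]], dtype=complex)   # |Coral>

bra_shell = ket_shell.conj().T                    # <Shell|

# A ket multiplied by its own bra gives 1 (orthonormality) ...
print((bra_shell @ ket_shell).item().real)        # 1.0
# ... while orthogonal kets give 0
print((bra_shell @ ket_coral).item().real)        # 0.0

# Tensor product of two kets, eg., "smooth AND Shell"
ket_smooth = np.array([[1], [0]], dtype=complex)
print(np.kron(ket_smooth, ket_shell).flatten().real)  # [1. 0. 0. 0.]
```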
6. Describing the wave function
The four pieces of “pure” information may be written as kets, eg., |smooth⟩ ⊗ |Shell⟩. The tensor product acts as a logical “AND”, re-enforcing the inseparability of the characteristic and the island.
Each of the kets is automatically orthonormal and forms an eigenstate basis of a Hilbert (vector) space.
It is tempting to describe the covariate intersection as being the simple entanglement of the smooth and square-base kets. However, this would give an expression which would mix the whole of smooth with the whole of square base.
Instead we need to look at the “inner products”, which are usually interpreted as giving the probability amplitude of a ket collapsing into a bra.
The bra can only collapse into the ket if the inner product contains both characteristics, ie., both smooth and square base. As a consequence, the inner product is a measure of covariate overlap.
The reverse, complex conjugate transposed, inner product is also true.
Because both inner products are real, and consistent with the conditional independence of smooth and square base, it follows that they are also equal to each other.
Thus, the complete quantum contingency table consists of 4 orthonormal kets and 2 inner products, with all other bra-kets equal to zero. It exactly matches the classical description.
To provide a full Hilbert space description, the inner products must be mapped to (ie., incorporated into) the base kets. This may be achieved using the Gram-Schmidt process (see, eg., Strang, 1980) [6].
The Gram-Schmidt process orthonormalizes the base kets with respect to the inner product, and acts as a unitary operator to generate a new isomorphic representation of the original Hilbert space.
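A sketch of classical Gram-Schmidt orthonormalization with NumPy; the vectors here are arbitrary illustrations of the procedure, not the presentation's actual kets:

```python
# Gram-Schmidt: project out each previously accepted direction,
# then normalise what remains.
import numpy as np

def gram_schmidt(vectors):
    """Return an orthonormal basis spanning the same space."""
    basis = []
    for v in vectors:
        w = v.astype(float).copy()
        for u in basis:
            w -= (u @ w) * u          # remove the component along u
        norm = np.linalg.norm(w)
        if norm > 1e-12:              # skip linearly dependent vectors
            basis.append(w / norm)
    return np.array(basis)

B = gram_schmidt([np.array([1.0, 1.0]), np.array([1.0, 0.0])])
print(np.allclose(B @ B.T, np.eye(2)))  # True: the new basis is orthonormal
```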
In doing so, it returns four base kets that give a full system description and include the inner products. This allows the fully normalized system wave function to be described.
The correct expression for the covariate intersection may be found through rearrangement.
This expression fully generalizes, and the individual elements may be weighted to incorporate the prior distributions.
7. Solving the “c” functions
There are known features of the “c” functions which may be used to generate constraints. These include “data dependence”: the functions must be, in some way, dependent upon the data in the table;
a “valid probability range”: the values must fall between 0 and 1;
“complementarity”: the law of total probability requires that the sum of all probabilities = 1;
“symmetry”: the exchanging of rows in the contingency table should not affect the calculated probability value, and if the columns are exchanged then the values should map;
“known probabilities”: there are certain contingency table structures which must return specific probabilities.
Using these principles and constraints demonstrates that the “c” functions are anti-symmetric bivariate functional equations, to which only one solution exists.
8. The objective covariate probability
Substituting in the derived functional expressions allows for a final probability to be calculated.
9. The implications for psychology
“Calculating probabilities for predicting performance”

With only 10 data points in the “pot” example, there is not much difference between 0.5896 (QT) and 0.578 (classical Bayes' theorem), and it is unlikely to affect ordinal predictions. However, in modelling phenomena based on thousands, or millions, of data points (eg., in perception, memory, social learning, etc.) this difference will matter a lot more.
“Predicting new phenomena”

Bayesian learning lends itself to modelling systems that develop linearly. However, humans often show nonlinear, sometimes seemingly nondeterministic, behaviours, such as sudden switches in strategy that don't necessarily accord with the available data.
10. The relational information seeker
We conducted an experiment with a larger, 3x4, contingency table, giving the participants (n=150) 5 degrees of freedom in their selections.

For the first 4 selections, the choices made followed an information gain model, based on Shannon's entropy, with a significant result for each choice (using a Chi-squared test of predicted selection against random).
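An information-gain model of this kind can be sketched as the expected reduction in Shannon entropy over the two hypotheses from inspecting one cell; this is an assumed form for illustration, not the study's actual model, and the example likelihoods are invented:

```python
# Expected information gain of observing one binary datum,
# measured as the expected drop in Shannon entropy over H1 vs H2.
import math

def entropy(p):
    """Shannon entropy (bits) of a Bernoulli distribution."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def expected_gain(prior, lh_h1, lh_h2):
    """Expected entropy reduction from a datum whose likelihood
    under each hypothesis is lh_h1 and lh_h2 respectively."""
    p_d = prior * lh_h1 + (1 - prior) * lh_h2        # P(datum observed)
    post_d = prior * lh_h1 / p_d                      # posterior if observed
    post_nd = prior * (1 - lh_h1) / (1 - p_d)         # posterior if not
    expected_post = p_d * entropy(post_d) + (1 - p_d) * entropy(post_nd)
    return entropy(prior) - expected_post

# Eg., a cell with likelihoods 0.8 vs 0.7 under even priors (assumed values)
print(expected_gain(0.5, 0.8, 0.7))
```

Near-matched likelihoods such as these yield very little expected gain, which is why such cells count as diagnostically “weak” information.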
However, the final selection demonstrated a strategy change towards “weak” information. This suggests that the search process only follows information theory in-so-far as it is required to identify the diagnostically important relationships.

This is not the same as mental model building. Rather, information search refines the mental representation created by the question.
It is unclear whether these relationships are classical or quantum in nature.
11. Conclusions
Any full description of objective reality may have to include mathematical concepts that only exist in quantum mechanics.

Quantum mechanics can describe models, and provide solutions to them, which lie beyond the scope of classical mathematics.

Bayes' theorem is a special case of a more general, quantum mechanical expression.
Download this presentation from
http://rlb.me/pdf1215
RACHAEL BOND
University of Sussex
PROFESSOR TOM ORMEROD
University of Sussex
PROFESSOR YANG-HUI HE
City University; Nankai University;
Merton College, Oxford University
References
[1] Doherty, M.E., Mynatt, C.R., Tweney, R.D., & Schiavo, M.D. (1979).
Pseudodiagnosticity. Acta Psychologica, vol. 43(2), pp. 111-121. doi:
10.1016/0001-6918(79)90017-9
[2] Dempster, A.P., Laird, N.M., & Rubin, D.B. (1977). Maximum likelihood
from incomplete data via the EM algorithm. Journal of the Royal
Statistical Society. Series B (Statistical Methodology), vol. 39(1), pp. 1-38.
[3] de Finetti, B. (1974). Theory of probability: A critical introductory
treatment. New York, New York: Wiley.
[4] Caves, C.M., Fuchs, C.A., & Schack, R. (2002). Unknown quantum states:
The quantum de Finetti representation. Journal of Mathematical
Physics, vol. 43(9), pp. 4537-4559. doi: 10.1063/1.1494475
[5] Dirac, P.A.M. (1939). A new notation for quantum mechanics.
Mathematical Proceedings of the Cambridge Philosophical Society, vol.
35(03), pp. 416-418. doi: 10.1017/S0305004100021162
[6] Strang, G. (1980). Linear algebra and its applications (2nd ed.). New
York, New York: Academic Press.