Page 1
Practical Bayesian Data
Analysis from a Former
Frequentist
Frank E Harrell Jr
Division of Biostatistics and Epidemiology
Department of Health Evaluation Sciences
University of Virginia School of Medicine
Box 800717 Charlottesville VA 22908 USA
fharrell@virginia.edu
hesweb1.med.virginia.edu/biostat
MASTERING STATISTICAL ISSUES IN DRUG DEVELOPMENT
HENRY STEWART CONFERENCE STUDIES
15-16 MAY 2000
Page 2
Abstract
Traditional statistical methods attempt to provide objective information
about treatment effects through the use of easily computed P–values. How-
ever, much controversy surrounds the use of P–values, including statistical
vs. clinical significance, artificiality of null hypotheses, 1–tailed vs. 2–tailed
tests, difficulty in interpreting confidence intervals, falsely interpreting non–
informative studies as “negative”, arbitrariness in testing for equivalence, trad-
ing off type I and type II error, using P–values to quantify evidence, which
statistical test should be used for 2 × 2 frequency tables, α–spending and
adjusting for multiple comparisons, whether to adjust final P–values for the
intention of terminating a trial early even though it completed as planned, com-
plexity of group sequential monitoring procedures, and whether a promising but
statistically insignificant trial can be extended. Bayesian methods allow calcu-
lation of probabilities that are usually of more interest to consumers, e.g. the
probability that treatment A is similar to treatment B or the probability that
treatment A is at least 5% better than treatment B, and these methods are
simpler to use in monitoring ongoing trials. Bayesian methods are controver-
sial in that they require the use of a prior distribution for the treatment effect,
and calculations are more complex in spite of the concepts being simpler. This
talk will discuss advantages of estimation over hypothesis testing, basics of
the Bayesian approach, approaches to choosing the prior distribution and ar-
guments for favoring non–informative priors in order to let the data speak for
themselves, pros and cons of traditional and Bayesian inference, relating the
bootstrap to the Bayesian approach, possible study design criteria, sample size
and power issues, and implications for study design and review. The talk will
use several examples from clinical trials including GUSTO (t–PA vs. streptoki-
nase for acute MI), a meta–analysis of possible harm from short–acting nifedip-
ine, and interpreting results from an unplanned interim analysis. BUGS code
will be given for these examples. The presentation will show how the Bayesian
approach can solve many common problems such as not having to deal with
how to “spend α” when considering multiple endpoints and sequential analy-
ses. An example clinical trial design that allows for continuous monitoring for
efficacy, safety, and similarity for two endpoints is given.
Page 4
Major Topics and Suggested Schedule
• 9:00a – 10:30a
– Overview of Methods for Quantifying and Acting
on Evidence
– Frequentist Statistical Inference
– What’s Wrong with Hypothesis Testing?
– Confidence Intervals
– Overview of Bayesian Approach
– The Standardized Likelihood Function
– Bayesian Inferential Methods
• 10:30a – 11:00a: Break
• 11:00a – 12:30p
– Three Types of Multiplicity
– The Bootstrap
– 2 × 2 Table Example
– Software
Page 5
– Examples from Clinical Trials
• 12:30p – 1:30p: Lunch
• 1:30p – 2:15p
– Meta–Analysis Example
– Unplanned Interim Analysis Example
– Example Study Designs
– Power and Sample Size
– Acceptance of Methods by Regulators &
Industry
– Summary
• 2:15p – 3:30p: General discussion
Page 6
Outline
• Quantifying Evidence vs. Decision Making
• Frequentist Statistical Inference
– Methods
– Advantages
– Disadvantages and Controversies
• What’s Wrong with Hypothesis Testing?
– The Applied Statistician’s Creed
– Has hypothesis testing hurt science?
• Confidence Intervals
• Bayesian Approach
– Brief Overview of Methods
– Advantages
– Disadvantages and Controversies
• The Standardized Likelihood Function
• Bayesian Inferential Methods
Page 7
– Choosing the Prior Distribution
– One–Sample Binomial
– Two–Sample Binomial
– Two–Sample Gaussian
– One–Sample Gaussian
– Deriving Posterior Distributions
– Using Posterior Distributions
• Sequential Testing
• Subgroup Analysis
• Inference for Multiple Endpoints
• The Bootstrap
• 2 × 2 Table Example: Traditional, Bayes, Bootstrap
• Software: BUGS and S-PLUS
• Examples from Clinical Trials
• Suggested Design Criteria
• Example Study Design
Page 8
• Power and Sample Size
• Implications for Design & Evaluation
• Acceptance of Methods by Regulators & Industry
• Summary
Page 9
Overview
• Point estimate for population treatment difference
• Probability of a statistic conditional on an
assumption we hope to gather evidence against
• Binary decision based on this P–value
• Selection of the variable of interest by a stepwise
variable selection algorithm
• Interval estimate: the set of all parameter values that,
if hypothesized to hold, would not be rejected at the
1 − α level; or
an interval that gives the desired coverage probability
for a parameter estimate
• Probability of a parameter (e.g., population
treatment difference) conditional on current data
• Entire probability distribution for the parameter
Page 10
• Optimal binary decision given model, prior beliefs,
loss function (e.g., patient utilities), data
• Relative evidence: odds ratio, likelihood ratio,
Bayes factor
E.g.: Whatever my prior belief about the therapy,
after receiving the current data the odds that the
new therapy has positive efficacy are 18 times as
high as they were before these data were available
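A worked example of this kind of updating (numbers
assumed for illustration): with prior odds of 1:1 and a
Bayes factor of 18, posterior odds = 18 × 1 = 18, i.e.,
Pr[efficacy > 0 | data] = 18/19 ≈ 0.95; a skeptic with
prior odds of 1:9 would arrive at posterior odds of
18/9 = 2, i.e., a probability of 2/3.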
Page 11
Medical Diagnosis Framework
• Traditional (frequentist) approach analogous to
consideration of probabilities of test outcomes |
disease status (sensitivity, specificity)
• Post–test probabilities of disease are much more
useful
• Debate about use of direct probability models (e.g.,
logistic) vs. classification
– Recursive partitioning (CART)
– Discriminant analysis
– Classify based on P̂ from logistic model
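As an illustration of the post–test idea, a minimal
S-PLUS/R sketch (sens, spec, and prevalence values are
assumed, not taken from any study):

   # Post-test probability of disease given a positive test,
   # by Bayes' rule: sens*prev / (sens*prev + (1-spec)*(1-prev))
   post.test <- function(sens, spec, prev)
     sens * prev / (sens * prev + (1 - spec) * (1 - prev))
   post.test(sens = 0.90, spec = 0.80, prev = 0.30)  # about 0.66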
Page 12
Decisions vs. Simply Quantifying Evidence
• Decision tree to structure options and outcomes
• Uncertainty about each outcome quantified using
probabilities
• Consequences valued on utility scale
• Derive thresholds corresponding to different
actions
• Classic decision–making example: Berry et al. [12]
– Vaccine trial in children in a Navajo reservation
– Goal: minimize the number of cases of
Haemophilus influenzae type b in the Navajo
Nation
Page 13
Problems with “Canned” Decisions
• See Spiegelhalter (1986): Probabilistic prediction
in patient management and clinical trials [71]
However, such a complete specification and
analysis of the problem, even when
accompanied by elaborate sensitivity
analyses, often does not appear convincing
or transparent to the practising clinician.
Indeed, Feinstein has stated that
‘quantitative decision analysis is
unsatisfactory for the realities of clinical
medicine’, primarily because of the problem
in ascribing an agreed upon measure of
‘utility’ to a health outcome
• In medical diagnosis framework, utilities and
patient preferences are not defined until the patient
is in front of the doctor
• Example: decision re: cardiac cath is based on
patient age, beyond how age enters into pre–test
prob. of coronary disease
• It is presumptuous for the analyst to make
classifications into “diseased” and “non–diseased”
• The preferred published output of diagnostic
modeling is P̂(D|X)
• In therapeutic studies, probabilities of efficacy and
of cost are very useful; decisions can be made at
the point of care when utilities are available (and
relevant)
Page 15
Methods
• Attempt to demonstrate S by assuming S̄ and
showing that the observed data are unlikely
• Treat unknowns as constants
• Choose a test statistic T
• Compute Pr[T as or more impressive than the one
observed | H0]
• Probabilities “refer to the frequency with which
different values of statistics (arising from sets of
data other than those which have actually
happened) could occur for some fixed but unknown
values of the parameters” [15]
Page 16
Advantages
• Simple to think of unknown parameter as a
constant
• P–values relatively easy to compute
• Accepted by most of the world
• Prior beliefs not needed at computation time
• Robust nonparametric tests are available without
modeling
Page 17
Disadvantages and Controversies
• “Have to decide which ‘reference set’ of groups of
data which have not actually occurred we are going
to contemplate” [15]; what is “impressive”?
• Conditions on what is unknowable (parameters)
and does not condition on what is already known
(the data)
• H0: no effect is a boring hypothesis that is often not
really of interest. It is more of a mathematical
convenience.
• Do we really think that most treatments have truly
an effect of 0.0 in “negative trials”?
• Does not address clinical significance
• If real effect is mean decrease in BP by 0.2 mmHg,
large enough n will yield P < 0.05
• By historical accident, α = 0.05 is often used as a
magic cutoff
Page 18
• Controversy surrounding 1–tailed vs. 2–tailed tests
[68, Chapter 12]
• No method for trading off type I and type II error
• No uniquely accepted P–value for 2 × 2 table!
What is “extreme”: of all possible tables or all
tables with same total no. of deaths?
No consensus on the optimum procedure for
obtaining a P–value (e.g., Pearson χ² vs. Fisher’s
so–called exact test, continuity correction,
likelihood ratio test, new unconditional tests).
• For ECMO trial, 13 P–values have been computed
for the same 2 × 2 table, ranging from 0.001 to 1.0
• P–values very often misinterpretedᵃ
• Must interpret P–values in light of other evidence
since it is a probability for a statistic, not for drug
benefit
ᵃHalf of 24 cardiologists gave the correct response to a 4–choice
question [24].
Page 19
• Berger and Berry: n = 17 matched pairs,
P = 0.049, the maximum Pr[H0] = 0.79
• P = 0.049 deceptive because it involves
probabilities of more extreme unobserved data8
• In testing a point H0, P = 0.05 “essentially does
not provide any evidence against the null
hypothesis” (Berger et al. [9]) — Pr[H1 | P = 0.05]
will be near 0.5 in many cases if the prior probability
of the truth of H0 is near 0.5
• Confidence intervals frequently misinterpreted —
consumers act as if “degree of confidence” is
uniform within the interval
• Very hard to directly answer interesting questions
such as Pr[similarity]
• Standard statistical methods use subjective input
from “the producer rather than the consumer of the
data” [8]
• P–values can only be used to provide evidence
against a hypothesis, not to give evidence in favor
of a hypothesis. Schervish [67] gives examples where
P–values are incoherent: if one uses a P–value to
gauge the evidence in favor of an interval
hypothesis for a certain dataset, the P–value
based on the same dataset but for a more
restrictive sub–hypothesis (i.e., one specifying a
subset of the interval) actually gives more support
(larger P).
• Equal P-values do not provide equal evidence
about a hypothesis [63]
• If P < 0.05 is used as a binary event, evidence is
stronger in larger studies [63] [68, pp. 179-183]
• If the actual P-value is used, evidence is stronger in
smaller studies [63]
• Goodman [41] showed how P–values can provide
misleading evidence by considering the “replication
probability” — the prob. of getting a significant result in
a second study given the P–value from the first study
and given true treatment effect = observed effect in the
first study

Initial P–value   Probability of Replication
.10               .37
.05               .50
.01               .73
.005              .80
.001              .91
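The table can be reproduced from the normal
approximation (a minimal sketch; two–sided α = 0.05
assumed):

   # Replication probability: power of a second identical study
   # when the true effect equals the observed effect
   p <- c(.10, .05, .01, .005, .001)    # initial two-sided P-values
   z <- qnorm(1 - p/2)                  # corresponding z statistics
   round(pnorm(z - qnorm(0.975)), 2)    # gives .37 .50 .73 .80 .91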
• See also Berger & Sellke [7]
• See [65, 27] for interpretations of P–values under
alternative hypotheses
• Why are P–values still used?
Feinstein [33] believes their status “...is a lamentable
demonstration of the credulity with which modern scientists
will abandon biologic wisdom in favor of any quantitative
ideology that offers the specious allure of a mathematical
replacement for sensible thought.”
Page 22
The Multiplicity Mess
• Much controversy about need for/how to adjust for
multiple comparisons
• Do you want Pr[Reject | this H0 true] = 0.05, or
Pr[Reject | this and other H0s true] = 0.05?
• If the latter, C.L.s must use e.g. a 1 − α/k
confidence level (k = number of comparisons)
→ precision of a parameter estimate depends on
what other parameters were estimated
• Rothman [62]: “The theoretical basis for advocating a
routine adjustment for multiple comparisons is the
‘universal null hypothesis’ that ‘chance’ serves as the
first–order explanation for observed phenomena. This
hypothesis undermines the basic premises of empirical
research, which holds that nature follows regular laws
that may be studied through observations. A policy of
not making adjustments for multiple comparisons is
preferable because it will lead to fewer errors of
interpretation when the data under evaluation are not
Page 23
random numbers but actual observations on nature.
Furthermore, scientists should not be so reluctant to
explore leads that may turn out to be wrong that they
penalize themselves by missing possibly important
findings.”
• Cook and Farewell [21]: If results are intended to be
interpreted marginally, there may be no need for
controlling experimentwise error rate. See also
[68, pp. 142-143].
• Need to distinguish between H0: at least one of
five endpoints is improved by the drug and H0: the
fourth endpoint is improved by the drug
• Many conflicting alternative adjustment methods
• Bonferroni adjustment is consistent with a Bayesian
prior distribution which specifies that the probability
that all null hypotheses are true is a constant (say
0.5) no matter how many hypotheses are tested [80]
• Even with careful Bonferroni adjustment, a trial with
20 endpoints could be declared a success if only
one endpoint was “significant” after adjustment;
Bayesian approach allows more sensible
specification of “success”
• Much controversy about need for adjusting for
sequential testing. Frequentist approach is
complicated.
Example: 5 looks at data as trial proceeds
Looks had no effect, trial proceeded to end
Usual P = 0.04, need to adjust upwards for
having looked
Two studies with identical experiments and data but
with investigators with different intentions → one
might claim “significance”, the other not (Berry [10])
Example: one investigator may treat an interim
analysis as a final analysis, another may intend to
wait.
• It gets worse — need to adjust “final” point
estimates for having done interim analyses
• Freedman et al. [36] give an example where such
adjustment yields 0.95 CI that includes 0.0 even for
data indicating that study should be stopped at the
first interim analysis
• As frequentist methods use intentions (e.g.,
stopping rule), they are not fully objective [8]
If the investigator died after reporting the
data but before reporting the design of the
experiment, it would be impossible to
calculate a P–value or other standard
measures of evidence.
• Since P–values are probabilities of obtaining a
result as or more extreme than the study’s result
under repeated experimentation, frequentists
interpret results by inferring “what would have
occurred following results that were not observed
at analyses that were never performed” [29].
Page 26
What’s Wrong with Hypothesis Testing?
• Hypotheses are often “straw men” that are
imagined by the investigator just to fit into the
traditional statistical framework
• Hypotheses are often inappropriately chosen (e.g.,
H0: ρ = 0)
• Most phenomena of interest are not all–or–nothing
but represent a continuum
• See [50] for an interesting review
Page 27
The Applied Statistician’s Creed
• Nester [56]:
(a) TREATMENTS — all treatments differ;
(b) FACTORS — all factors interact;
(c) CORRELATIONS — all variables are
correlated;
(d) POPULATIONS — no two populations
are identical in any respect;
(e) NORMALITY — no data are normally
distributed;
(f) VARIANCES — variances are never
equal;
(g) MODELS — all models are wrong;
(h) EQUALITY — no two numbers are the
same;
(i) SIZE — many numbers are very small.
Page 28
• → no two treatments actually yield identical patient
outcomes
• → Most hypotheses are irrelevant
Page 29
Has Hypothesis Testing Hurt Science?
• Many studies are powered to be able to detect a
huge treatment effect
• → sample size too small → confidence interval too
wide to be able to reliably estimate treatment
effects
• “Positive” study can have C.L. of [.1,.99] for effect
ratio
• “Negative” study can have C.L. of [.1,10]
• Physicians, patients, payers need to know the
magnitude of a therapeutic effect more than
whether or not it is zero
• “It is incomparably more useful to have a plausible
range for the value of a parameter than to know,
with whatever degree of certitude, what single
value is untenable.” — Oakes [58]
• Study may yield precise enough estimates of
relative treatment effects but not of absolute effects
• C.L. for cost–effectiveness ratio may be extremely
wide
• Hypothesis testing usually entails fixing n; many
studies stop with P = 0.06 when adding 20 more
patients could have resulted in a conclusive study
• Many “positive” studies are due to large n and not
to clinically meaningful treatment effects
• Hypothesis testing usually implies inflexibility [69]
Page 31
• Cornfield [23]:
“Of course a re–examination in the light of results of
the assumptions on which the pre– observational
partition of the sample space was based would be
regarded in some circles as bad statistics. It would,
however, be widely regarded as good science. I do
not believe that anything that is good science can be
bad statistics, and conclude my remarks with the
hope that there are no statisticians so inflexible as to
decline to analyze an honest body of scientific data
simply because it fails to conform to some favored
theoretical scheme. If there are such, however,
clinical trials, in my opinion, are not for them.”
• If H0is rejected, practitioners often behave as if
point estimate of treatment effect is population
value
Page 32
Confidence Intervals
• Misinterpreted twice as often as P–values
• Are one–dimensional: consumers interpret a
confidence interval for OR of [.35,1.01] as saying
that a 1% increase in mortality is as likely as a
10% decrease
• Confidence plots (with continuously varying 1 − α)
can help [13, 28], but their interpretation is complex
Page 33
Methods
• Attempt to answer question by computing
probability of the truth of a statement
• Let S denote a statement about the drug effect,
e.g., patients on drug live longer than patients on
placebo
• Want something like Pr[S | data]
• If θ is a parameter of interest (e.g., log odds ratio or
difference in mean blood pressure), need a
probability distribution of θ | data
• Pr[θ|data] ∝ Pr[data|θ]Pr[θ]
• Pr[θ] is the prior distribution for θ
• Assuming θ is an unknown random variable
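A minimal sketch of this calculation by discrete
approximation (data and flat prior assumed for
illustration):

   # Posterior by grid approximation:
   # Pr[theta | data] proportional to Pr[data | theta] * Pr[theta]
   theta <- seq(0.001, 0.999, length = 999)  # grid over parameter
   prior <- rep(1, length(theta))            # flat prior
   lik   <- theta^7 * (1 - theta)^3          # e.g., 7 successes in n=10
   post  <- lik * prior
   post  <- post / sum(post)                 # normalize to sum to 1
   sum(post[theta > 0.5])                    # Pr[theta > 0.5 | data]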
Page 34
Advantages
• “intended for measuring support for hypotheses
when the data are fixed (the true state of affairs
after the data are observed)” [67]
• “inferences are based on probabilities associated
with different values of parameters which could
have given rise to the fixed set of data which has
actually occurred” [15]
• Results in a probability most clinicians think they’re
gettingᵃ
• Can compute (posterior) probability of interesting
events, e.g.
Pr[drug is beneficial]
Pr[drug A clinically similar to drug B]
Pr[drug A is > 5% better than drug B] [19]
ᵃNineteen of 24 cardiologists rated the posterior probability as
the quantity they would most like to know, from among three
choices [24].
Page 35
Pr[mortality reduction ≥ 0 ∩ cost reduction > 0]
Pr[mortality reduction ≥ 0 ∪ (mortality reduction
> 0.02 ∩ cost reduction > −$5000)]
Pr[mortality reduction ≥ 0 ∪ (cost reduction
> 0 ∩ morbidity reduction ≥ 0)]
Pr[ICER ≤ $30,000/ life year saved]
• Provides formal mechanism for using prior
information/bias — Pr[θ]
• Places emphasis on estimation and graphical
presentation rather than hypothesis testing
• Avoids 1–tailed/2–tailed issue
• Posterior (Berry prefers “current”) probabilities can
be interpreted out of context better than P–values
• If Pr[drug B is better than drug A] = 0.92, this is
true whether drug C was compared to drug D or not
• Avoids many of the complexities of sequential
monitoring —
P–value adjustment is needed for frequentist
methods because repeatedly computed test
statistics no longer have a χ² or normal
distribution;
a posterior probability is still a probability → can
monitor continuously (see the sketch at the end of
this list)
• Allows accumulating information (from this as well
as other trials) to be used as trial proceeds
• No need for sufficient statistics
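A minimal sketch of such continuous monitoring for a
binary endpoint (flat priors and the interim counts are
assumptions, not from a real trial):

   # Posterior prob. that treatment lowers the event rate,
   # recomputable at any interim look with no alpha-spending
   post.prob.better <- function(s1, n1, s2, n2, nsim = 10000) {
     th1 <- rbeta(nsim, s1 + 1, n1 - s1 + 1)  # control event rate
     th2 <- rbeta(nsim, s2 + 1, n2 - s2 + 1)  # treated event rate
     mean(th1 > th2)                          # Pr[theta1 > theta2 | data]
   }
   post.prob.better(20, 100, 12, 100)  # e.g., at an interim look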
Page 37
Controversies
• Posterior probabilities may be hard to compute
(often have to use numerical methods)
• How does one choose a prior distribution Pr[θ]? [49]
– Biased prior — expert opinion:
difficult, can be manipulated, medical experts
often wrong, whose opinion do you use? [34]
– Skeptical prior (often useful in sequential
monitoring)
– Unbiased (flat, non–informative) prior
– Truncated prior — allows one to pre–specify,
e.g., that there is no chance the odds ratio could
be outside [1/10, 10]
• For monitoring, Spiegelhalter et al. [74] suggest using
a “community of priors” (see [22] for pros and cons):
– Skeptical prior with mean 0 against which judge
early stopping for efficacy
Page 38
– Enthusiastic prior with mean δA (hypothesized
effect) against which judge early stopping for no
difference
• Rank–based analyses need to use models:
Wilcoxon → proportional odds ordinal logistic
model
logrank → Cox PH model
Page 39
Invalid Bayesian Analyses
• Choosing an improper model for the data (can be
remedied by adding e.g. non–normality parameter
with its own prior [15])
• Sampling to a foregone conclusion if a continuous
prior is used but the investigators and the
consumers were convinced that the probability that
the treatment effect is exactly zero is > 0ᵃ
ᵃThis is easily solved by using a prior with a lump of probability
at zero.
Page 40
• Suppression of the latest data by an unscrupulous
investigator:
Current results using 200 patients nearly
conclusive in favor of drug
Decide to accrue 50 more patients to draw firm
conclusion
Results of 50 less favorable to drug
Base final analysis on the 200 patientsᵃ
ᵃNote the martingale property of posterior probabilities:
E[Pr(θ1 > θ2 | data, future data)] = Pr(θ1 > θ2 | data).
Page 41
The Standardized Likelihood Function
• Unknown parameter θ, data vector y
• Let likelihood function be l(θ|y)
• Standardized likelihood:
p(θ|y) = l(θ|y) / ∫ l(θ|y) dθ     (1)
• Don’t need to choose a prior if willing to take the
normalized likelihood as a basis for calculating
probabilities of interest (Fisher’s fiducial
distributions)
Page 42
One–Sample Binomial
• Y1, Y2, ..., Yn ∼ Bernoulli(θ)
• s = number of “successes”
• l(θ|y) = θ^s (1 − θ)^(n−s)
• ∫ l(θ|y) dθ = β(s + 1, n − s + 1)
• p(θ|y) = θ^s (1 − θ)^(n−s) / β(s + 1, n − s + 1)
• Solving for θ so that each tail area of p(θ|y) = α/2
gives an exact 1 − α C.L. for the 1–sample binomial
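A sketch of the last step (assumed data; the flat–prior
posterior is a beta(s + 1, n − s + 1) density):

   # Exact central 1 - alpha interval from posterior tail areas
   s <- 7; n <- 20; alpha <- 0.05
   qbeta(c(alpha/2, 1 - alpha/2), s + 1, n - s + 1)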
Page 43
Basis
• p(θ|y) ∝ l(θ|y)p(θ)
• l(θ|y) = likelihood function
• Function through which data y modifies the prior
knowledge of θ [15]
• Has the information about θ that comes from the
data
Page 44
Choosing the Prior Distribution
• Stylized or “automatic” priors [34, 49]
• Data quickly overwhelm all but the most skeptical
priors, especially in clinical applications
• In scientific inference, let data speak for themselves
• → A priori relative ignorance; draw inferences
appropriate for an unprejudiced observer [15]
• Scientific studies usually not undertaken if precise
estimates already known. Also, problems with
informed consent.
• Even when researcher has strong prior beliefs,
more convincing to analyze data using a reference
prior dominated by likelihood [15]
• Box and Tiao [15] advocate locally uniform priors —
considers local behavior of prior in region where
the likelihood is appreciable, prior assumed not
large outside that range
Page 45
→ posterior ≈ standardized likelihood
• Choice of metric φ for uniformity of prior:
Such that likelihood for φ(θ) completely
determined except for location (≈ variance
stabilizing transformation) — likelihood is data
translated
“Then to say we know little a priori relative to
what the data is going to tell us, may be
expressed by saying that we are almost equally
willing to accept one value of φ(θ) as another.” [15]
→ Highest likelihood intervals symmetric in φ(θ)
• Example: Gaussian dist. → φ(σ) = log(σ), or if
σ is used, prior ∝ 1/σ
Page 46
Consumer Specification of Prior
• Place statistics describing study results on web
page
Posterior computed and displayed using Java
applet (Lehmann & Nguyen [53])
• Highly flexible approximate approach: store 1000
bootstrap estimates θ̂, then quickly take a weighted
sample from these to apply a non–uniform prior [57]
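A sketch of that approximate approach (the stored
estimates and the prior below are stand–ins, not from
[57] itself):

   # Reweight stored bootstrap estimates by a consumer's prior
   theta.hat <- rnorm(1000, 0.1, 0.05)    # stand-in for bootstrap estimates
   prior.wt  <- dnorm(theta.hat, 0, 0.2)  # consumer's prior density at each
   post.draw <- sample(theta.hat, 1000, replace = T, prob = prior.wt)
   mean(post.draw > 0)                    # approx. Pr[theta > 0 | data, prior]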
Page 47
One–Sample Binomial, Continued
• θ̂ = ȳ (proportion)
• sin⁻¹ √θ̂ → nearly data–translated likelihood, and a
locally uniform prior is nearly noninformative [15]
• Nearly noninformative prior on the original scale:
∝ [θ(1 − θ)]^(−1/2)
• Posterior using this prior is
p(θ|y) = θ^(s−1/2) (1 − θ)^(n−s−1/2) / β(s + 1/2, n − s + 1/2)
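A sketch of this posterior for assumed data:

   # Posterior under the nearly noninformative prior above
   s <- 7; n <- 20
   qbeta(c(0.025, 0.975), s + 0.5, n - s + 0.5)  # 0.95 posterior interval
   1 - pbeta(0.5, s + 0.5, n - s + 0.5)          # Pr[theta > 0.5 | data]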
Page 48
Two–Sample Binomial
• Posterior using locally uniform priors on the
data–translated scale:
p(θ1, θ2|y) =
[θ1^(s1−1/2) (1 − θ1)^(n1−s1−1/2) / β(s1 + 1/2, n1 − s1 + 1/2)] ×
[θ2^(s2−1/2) (1 − θ2)^(n2−s2−1/2) / β(s2 + 1/2, n2 − s2 + 1/2)]
• Can integrate to get the posterior distribution of any
quantity of interest, e.g., the odds ratio
[θ1/(1 − θ1)] / [θ2/(1 − θ2)]
• See Hashemi et al. [44] for much more information
about posterior distributions of ORs and other
effect measures
• See Howard [45] for a discussion of the need to use
priors that require θ1 and θ2 to be dependent
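Rather than integrating analytically, one can sample (a
sketch; the counts are assumed):

   # Monte Carlo posterior of the odds ratio from the
   # independent beta posteriors above
   s1 <- 20; n1 <- 100; s2 <- 12; n2 <- 100
   th1 <- rbeta(10000, s1 + 0.5, n1 - s1 + 0.5)
   th2 <- rbeta(10000, s2 + 0.5, n2 - s2 + 0.5)
   or  <- (th1/(1 - th1)) / (th2/(1 - th2))
   quantile(or, c(0.025, 0.5, 0.975))  # posterior median, 0.95 interval
   mean(or > 1)                        # Pr[odds ratio > 1 | data]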
Page 49
Two–Sample Gaussian [15]
• Y1 ∼ N(µ1, σ²) ind. of Y2 ∼ N(µ2, σ²)
• µ1, µ2, log σ ∼ constant independentlyᵃ
• ν = n1 + n2 − 2
• νs² = Σ(y1i − ȳ1)² + Σ(y2i − ȳ2)²
• δ = µ2 − µ1, δ̂ = ȳ2 − ȳ1
• p(δ, σ²|y) = p(σ²|s²) p(δ|σ², δ̂)
• νs²/σ² ∼ χ²ν;
p(δ|σ², δ̂) = N(δ̂, σ²(1/n1 + 1/n2))
• Integrate out σ² to get the marginal posterior dist. of
δ ∼ tν[δ̂, s²(1/n1 + 1/n2)]
ᵃThe prior for σ is ∝ 1/σ.
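A sketch of the resulting inference (the two samples
below are made up):

   # Marginal posterior of delta is a shifted, scaled t on nu d.f.
   y1 <- c(5.1, 4.8, 6.0, 5.5, 5.2); y2 <- c(6.2, 5.9, 7.1, 6.4, 6.8)
   n1 <- length(y1); n2 <- length(y2); nu <- n1 + n2 - 2
   s2 <- (sum((y1 - mean(y1))^2) + sum((y2 - mean(y2))^2)) / nu
   delta.hat <- mean(y2) - mean(y1)
   se <- sqrt(s2 * (1/n1 + 1/n2))
   1 - pt(-delta.hat/se, nu)               # Pr[delta > 0 | data]
   delta.hat + qt(c(0.025, 0.975), nu)*se  # 0.95 posterior interval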
Page 50
One–Sample Gaussian
• Y ∼ N(µ, σ²), σ known
• µ ∼ N(µ0, σ0²)
• µ|y ∼ N(µ*, σ*²)
• µ* = (w0µ0 + wy) / (w0 + w)
• σ*² = 1 / (w0 + w)
• w0 = 1/σ0², w = 1/σ²
• σ0 → ∞ : µ ∼ N(y, σ²)
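A sketch of this conjugate update (all values assumed):

   # Posterior mean is a precision-weighted average of prior mean and y
   mu0 <- 0; sigma0 <- 2      # prior mean and SD
   y <- 1.5; sigma <- 1       # observation and known SD
   w0 <- 1/sigma0^2; w <- 1/sigma^2
   mu.post  <- (w0*mu0 + w*y) / (w0 + w)   # = 1.2
   var.post <- 1 / (w0 + w)                # = 0.8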