Page 1
Introducing Monte Carlo Methods with R
Christian P. Robert
Université Paris Dauphine
xian@ceremade.dauphine.fr
George Casella
University of Florida
casella@ufl.edu
Page 2
Monte Carlo Methods with R: Introduction [1]
Based on
• Introducing Monte Carlo Methods with R, 2009, Springer-Verlag
• Data and R programs for the course available at
http://www.stat.ufl.edu/~casella/IntroMonte/
Page 3
Monte Carlo Methods with R: Basic R Programming [2]
Chapter 1: Basic R Programming
“You’re missing the big picture,” he told her. “A good album should be
more than the sum of its parts.”
Ian Rankin
Exit Music
This Chapter
◮ We introduce the programming language R
◮ Input and output, data structures, and basic programming commands
◮ The material is both crucial and unavoidably sketchy
Page 4
Monte Carlo Methods with R: Basic R Programming [3]
Basic R Programming
Introduction
◮ This is a quick introduction to R
◮ There are entire books devoted to R
⊲ R Reference Card
⊲ available at http://cran.r-project.org/doc/contrib/Short-refcard.pdf
◮ Take Heart!
⊲ The syntax of R is simple and logical
⊲ The best, and in a sense the only, way to learn R is through trial-and-error
◮ Embedded help commands help() and help.search()
⊲ help.start() opens a Web browser linked to the local manual pages
Page 5
Monte Carlo Methods with R: Basic R Programming [4]
Basic R Programming
Why R ?
◮ There exist other languages, most (all?) of them faster than R, like Matlab, and
even free, like C or Python.
◮ The language combines a sufficiently high power (for an interpreted language)
with a very clear syntax both for statistical computation and graphics.
◮ R is a flexible language that is object-oriented and thus allows the manipulation
of complex data structures in a condensed and efficient manner.
◮ Its graphical abilities are also remarkable
⊲ Possible interfacing with LaTeX using the package Sweave.
Page 6
Monte Carlo Methods with R: Basic R Programming [5]
Basic R Programming
Why R ?
◮ R offers the additional advantages of being a free and open-source system
⊲ There is even an R newsletter, R-News
⊲ Numerous (free) Web-based tutorials and user’s manuals
◮ It runs on all platforms: Mac, Windows, Linux and Unix
◮ R provides a powerful interface
⊲ Can integrate programs written in other languages
⊲ Such as C, C++, Fortran, Perl, Python, and Java.
◮ It is increasingly common to see people who develop new methodology simultaneously producing an R package
◮ Can interface with WinBugs
Page 7
Monte Carlo Methods with R: Basic R Programming [6]
Basic R Programming
Getting started
◮ Type ’demo()’ for some demos; demo(image) and demo(graphics)
◮ ’help()’ for on-line help, or ’help.start()’ for an HTML browser interface to help.
◮ Type ’q()’ to quit R.
◮ Additional packages can be loaded via the library command, as in
> library(combinat) # combinatorics utilities
> library(datasets) # The R Datasets Package
⊲ There exist hundreds of packages available on the Web.
> install.packages("mcsm")
◮ A library call is required each time R is launched
Page 8
Monte Carlo Methods with R: Basic R Programming [7]
Basic R Programming
R objects
◮ R distinguishes between several types of objects
⊲ scalar, vector, matrix, time series, data frames, functions, or graphics.
⊲ An R object is mostly characterized by a mode
⊲ The different modes are
- null (empty object),
- logical (TRUE or FALSE),
- numeric (such as 3, 0.14159, or 2+sqrt(3)),
- complex, (such as 3-2i or complex(1,4,-2)), and
- character (such as "Blue", "binomial", "male", or "y=a+bx"),
◮ The R function str applied to any R object will show its structure.
Page 9
Monte Carlo Methods with R: Basic R Programming [8]
Basic R Programming
Interpreted
◮ R operates on those types as a regular function would operate on a scalar
◮ R is interpreted ⇒ Slow
◮ Avoid loops in favor of matrix manipulations
Page 10
Monte Carlo Methods with R: Basic R Programming [9]
Basic R Programming – The vector class
> a=c(5,5.6,1,4,-5)    build the object a containing a numeric vector
                       of dimension 5 with elements 5, 5.6, 1, 4, -5
> a[1]                 display the first element of a
> b=a[2:4]             build the numeric vector b of dimension 3
                       with elements 5.6, 1, 4
> d=a[c(1,3,5)]        build the numeric vector d of dimension 3
                       with elements 5, 1, -5
> 2*a                  multiply each element of a by 2
                       and display the result
> b%%3                 provides each element of b modulo 3
Page 11
Monte Carlo Methods with R: Basic R Programming [10]
Basic R Programming
More vector class
> e=3/d                build the numeric vector e of dimension 3
                       with elements 3/5, 3, -3/5
> log(d*e)             multiply the vectors d and e term by term
                       and transform each term into its natural logarithm
> sum(d)               calculate the sum of d
> length(d)            display the length of d
Page 12
Monte Carlo Methods with R: Basic R Programming [11]
Basic R Programming
Even more vector class
> t(d)                 transpose d, the result is a row vector
> t(d)*e               elementwise product between two vectors
                       with identical lengths
> t(d)%*%e             matrix product between two vectors
                       with identical lengths
> g=c(sqrt(2),log(10)) build the numeric vector g of dimension 2
                       with elements √2, log(10)
> e[d==5]              build the subvector of e that contains the
                       components e[i] such that d[i]=5
> a[-3]                create the subvector of a that contains
                       all components of a but the third
> is.vector(d)         display TRUE if d is a vector and FALSE otherwise
Page 13
Monte Carlo Methods with R: Basic R Programming [12]
Basic R Programming
Comments on the vector class
◮ The ability to apply scalar functions to vectors: Major Advantage of R.
⊲ > lgamma(c(3,5,7))
⊲ returns the vector with components (log Γ(3), log Γ(5), log Γ(7)).
◮ Functions that are specially designed for vectors include
sample, permn, order, sort, and rank
⊲ All manipulate the order in which the components of the vector occur.
⊲ permn is part of the combinat library
◮ The components of a vector can also be identified by names.
⊲ For a vector x, names(x) is a vector of characters of the same length as x
Page 14
Monte Carlo Methods with R: Basic R Programming [13]
Basic R Programming
The matrix, array, and factor classes
◮ The matrix class provides the R representation of matrices.
◮ A typical entry is
> x=matrix(vec,nrow=n,ncol=p)
⊲ Creates an n × p matrix whose elements are those of the vector vec of dimension np
◮ Some manipulations on matrices
⊲ The standard matrix product is denoted by %*%,
⊲ while * represents the term-by-term product.
⊲ diag gives the vector of the diagonal elements of a matrix
⊲ crossprod replaces the product t(x)%*%y on either vectors or matrices
⊲ crossprod(x,y) more efficient
⊲ apply is easy to use for functions operating on matrices by row or column
Page 15
Monte Carlo Methods with R: Basic R Programming [14]
Basic R Programming
Some matrix commands
> x1=matrix(1:20,nrow=5)          build the numeric matrix x1 of dimension
                                  5 × 4 with first row 1, 6, 11, 16
> x2=matrix(1:20,nrow=5,byrow=T)  build the numeric matrix x2 of dimension
                                  5 × 4 with first row 1, 2, 3, 4
> a=x1%*%t(x2)                    matrix product
> c=x1*x2                         term-by-term product between x1 and x2
> dim(x1)                         display the dimensions of x1
> b[,2]                           select the second column of b
> b[c(3,4),]                      select the third and fourth rows of b
> b[-2,]                          delete the second row of b
> rbind(x1,x2)                    vertical merging of x1 and x2
> cbind(x1,x2)                    horizontal merging of x1 and x2
> apply(x1,1,sum)                 calculate the sum of each row of x1
> as.matrix(1:10)                 turn the vector 1:10 into a 10 × 1 matrix
◮ Lots of other commands that we will see throughout the course
Page 16
Monte Carlo Methods with R: Basic R Programming [15]
Basic R Programming
The list and data.frame classes
The Last One
◮ A list is a collection of arbitrary objects known as its components
> li=list(num=1:5,y="color",a=T) create a list with three arguments
◮ The last class we briefly mention is the data frame
⊲ A list whose elements are possibly made of differing modes and attributes
⊲ But have the same length
> v1=sample(1:12,30,rep=T)           simulate 30 independent uniform {1,2,...,12}
> v2=sample(LETTERS[1:10],30,rep=T)  simulate 30 independent uniform {a,b,...,j}
> v3=runif(30)                       simulate 30 independent uniform on [0,1]
> v4=rnorm(30)                       simulate 30 independent standard normals
> xx=data.frame(v1,v2,v3,v4)         create a data frame
◮ R code
Page 17
Monte Carlo Methods with R: Basic R Programming [16]
Probability distributions in R
◮ R, or the Web, has just about all probability distributions
◮ Prefixes: p (cdf), d (density), q (quantile), r (random generation)
Distribution     Core     Parameters        Default Values
Beta             beta     shape1, shape2
Binomial         binom    size, prob
Cauchy           cauchy   location, scale   0, 1
Chi-square       chisq    df
Exponential      exp      1/mean            1
F                f        df1, df2
Gamma            gamma    shape, 1/scale    NA, 1
Geometric        geom     prob
Hypergeometric   hyper    m, n, k
Log-normal       lnorm    mean, sd          0, 1
Logistic         logis    location, scale   0, 1
Normal           norm     mean, sd          0, 1
Poisson          pois     lambda
Student          t        df
Uniform          unif     min, max          0, 1
Weibull          weibull  shape
Page 18
Monte Carlo Methods with R: Basic R Programming [17]
Basic and not-so-basic statistics
t-test
◮ Testing equality of two means
> x=rnorm(25)   #produces a N(0,1) sample of size 25
> t.test(x)

        One Sample t-test

data:  x
t = -0.8168, df = 24, p-value = 0.4220
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -0.4915103  0.2127705
sample estimates:
mean of x
-0.1393699
Page 19
Monte Carlo Methods with R: Basic R Programming [18]
Basic and not-so-basic statistics
Correlation
◮ Correlation
> attach(faithful)   #resident dataset
> cor.test(faithful[,1],faithful[,2])

        Pearson's product-moment correlation

data:  faithful[, 1] and faithful[, 2]
t = 34.089, df = 270, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.8756964 0.9210652
sample estimates:
      cor
0.9008112
◮ R code
Page 20
Monte Carlo Methods with R: Basic R Programming [19]
Basic and not-so-basic statistics
Splines
◮ Nonparametric regression with loess function or using natural splines
◮ Relationship between nitrogen level in soil and abundance of a bacteria AOB
◮ Natural spline fit (dark)
⊲ With ns=2 (linear model)
◮ Loess fit (brown) with span=1.25
◮ R code
Page 21
Monte Carlo Methods with R: Basic R Programming [20]
Basic and not-so-basic statistics
Generalized Linear Models
◮ Fitting a binomial (logistic) glm to the probability of suffering from diabetes for
a woman within the Pima Indian population
> glm(formula = type ~ bmi + age, family = "binomial", data = Pima.tr)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.7935  -0.8368  -0.5033   1.0211   2.2531

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.49870    1.17459  -5.533 3.15e-08 ***
bmi          0.10519    0.02956   3.558 0.000373 ***
age          0.07104    0.01538   4.620 3.84e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 256.41 on 199 degrees of freedom
Residual deviance: 215.93 on 197 degrees of freedom
AIC: 221.93

Number of Fisher Scoring iterations: 4
Page 22
Monte Carlo Methods with R: Basic R Programming [21]
Basic and not-so-basic statistics
Generalized Linear Models – Comments
◮ Concluding with the significance of both the body mass index bmi and the age
◮ Other generalized linear models can be defined by using a different family value
> glm(y ~ x, family=quasi(var="mu^2", link="log"))
⊲ Quasi-Likelihood also
◮ Many many other procedures
⊲ Time series, anova,...
◮ One last one
Page 23
Monte Carlo Methods with R: Basic R Programming [22]
Basic and not-so-basic statistics
Bootstrap
◮ The bootstrap procedure uses the empirical distribution as a substitute for the
true distribution to construct variance estimates and confidence intervals.
⊲ A sample X1, ..., Xn is resampled with replacement
⊲ The empirical distribution has a finite but large support made of n^n points
◮ For example, with data y, we can create a bootstrap sample y∗using the code
> ystar=sample(y,replace=T)
⊲ For each resample, we can calculate a mean, variance, etc
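The resampling step can be sketched in Python (the slides use R's sample(y, replace=T)); the data vector y below is hypothetical, for illustration only:

```python
import random
import statistics

def bootstrap_means(y, b=1000, seed=1):
    """Resample y with replacement b times; return the b bootstrap means."""
    rng = random.Random(seed)
    n = len(y)
    return [statistics.mean(rng.choices(y, k=n)) for _ in range(b)]

# hypothetical data, for illustration only
y = [4.2, 5.1, 6.3, 5.5, 4.8, 5.9, 6.1, 5.0]
means = bootstrap_means(y, b=2000)
se_boot = statistics.stdev(means)  # bootstrap estimate of the standard error of the mean
```

The standard deviation of the bootstrap means estimates the standard error of the sample mean without any normality assumption.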
Page 24
Monte Carlo Methods with R: Basic R Programming [23]
Basic and not-so-basic statistics
Simple illustration of bootstrap
[Figure: histogram of the bootstrap means of x (x axis: Bootstrap Means, y axis: Relative Frequency)]
◮ A histogram of 2500 bootstrap means
◮ Along with the normal approximation
◮ Bootstrap shows some skewness
◮ R code
Page 25
Monte Carlo Methods with R: Basic R Programming [24]
Basic and not-so-basic statistics
Bootstrapping Regression
◮ The bootstrap is not a panacea
⊲ Not always clear which quantity should be bootstrapped
⊲ In regression, bootstrapping the residuals is preferred
◮ Linear regression
Yij = α + β xi + εij,
α and β are the unknown intercept and slope, the εij are the iid normal errors
◮ The residuals from the least squares fit are given by
ε̂ij = yij − α̂ − β̂ xi,
⊲ We bootstrap the residuals
⊲ Produce a new sample (ε̂*ij)ij by resampling from the ε̂ij's
⊲ The bootstrap samples are then y*ij = ŷij + ε̂*ij
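A minimal Python sketch of this residual bootstrap, assuming one observation per xi (so a plain simple-regression fit); resampled residuals are added back to the fitted values and the model is refit. All names here are illustrative:

```python
import random

def fit_ls(x, y):
    """Least squares intercept and slope for simple linear regression."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    beta = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
           sum((xi - xbar) ** 2 for xi in x)
    alpha = ybar - beta * xbar
    return alpha, beta

def residual_bootstrap(x, y, b=1000, seed=1):
    """Resample residuals, build y* = fitted + resampled residual, refit each time."""
    rng = random.Random(seed)
    alpha, beta = fit_ls(x, y)
    fitted = [alpha + beta * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    draws = []
    for _ in range(b):
        ystar = [fi + ei for fi, ei in zip(fitted, rng.choices(resid, k=len(resid)))]
        draws.append(fit_ls(x, ystar))
    return draws  # list of (alpha*, beta*) pairs
```

Histograms of the resulting intercepts and slopes give the bootstrap distributions shown on the next slide.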
Page 26
Monte Carlo Methods with R: Basic R Programming [25]
Basic and not-so-basic statistics
Bootstrapping Regression – 2
[Figure: histograms of the bootstrap intercept and slope estimates (x axes: Intercept, Slope; y axes: Frequency)]
◮ Histogram of 2000 bootstrap samples
◮ We can also get confidence intervals
◮ R code
Page 27
Monte Carlo Methods with R: Basic R Programming [26]
Basic R Programming
Some Other Stuff
◮ Graphical facilities
⊲ Can do a lot; see plot and par
◮ Writing new R functions
⊲ h=function(x)(sin(x)^2+cos(x)^3)^(3/2)
⊲ We will do this a lot
◮ Input and output in R
⊲ write.table, read.table, scan
◮ Don’t forget the mcsm package
Page 28
Monte Carlo Methods with R: Random Variable Generation [27]
Chapter 2: Random Variable Generation
“It has long been an axiom of mine that the little things are infinitely the
most important.”
Arthur Conan Doyle
A Case of Identity
This Chapter
◮ We present practical techniques that can produce random variables
◮ From both standard and nonstandard distributions
◮ First: Transformation methods
◮ Next: Indirect Methods - Accept–Reject
Page 29
Monte Carlo Methods with R: Random Variable Generation [28]
Introduction
◮ Monte Carlo methods rely on
⊲ The possibility of producing a supposedly endless flow of random variables
⊲ For well-known or new distributions.
◮ Such a simulation is, in turn,
⊲ Based on the production of uniform random variables on the interval (0,1).
◮ We are not concerned with the details of producing uniform random variables
◮ We assume the existence of such a sequence
Page 30
Monte Carlo Methods with R: Random Variable Generation [29]
Introduction
Using the R Generators
R has a large number of functions that will generate the standard random variables
> rgamma(3,2.5,4.5)
produces three independent generations from a G(5/2,9/2) distribution
◮ It is therefore,
⊲ Counter-productive
⊲ Inefficient
⊲ And even dangerous,
◮ To generate from those standard distributions
◮ If it is built into R , use it
◮ But....we will practice on these.
◮ The principles are essential to deal with distributions that are not built into R.
Page 31
Monte Carlo Methods with R: Random Variable Generation [30]
Uniform Simulation
◮ The uniform generator in R is the function runif
◮ The only required entry is the number of values to be generated.
◮ The other optional parameters are min and max; for example, the R code
> runif(100, min=2, max=5)
produces 100 U(2,5) random variables.
Page 32
Monte Carlo Methods with R: Random Variable Generation [31]
Uniform Simulation
Checking the Generator
◮ A quick check on the properties of this uniform generator is to
⊲ Look at a histogram of the Xi’s,
⊲ Plot the pairs (Xi,Xi+1)
⊲ Look at the estimated autocorrelation function
◮ Look at the R code
> Nsim=10^4             #number of random numbers
> x=runif(Nsim)
> x1=x[-Nsim]           #vectors to plot
> x2=x[-1]              #adjacent pairs
> par(mfrow=c(1,3))
> hist(x)
> plot(x1,x2)
> acf(x)
Page 33
Monte Carlo Methods with R: Random Variable Generation [32]
Uniform Simulation
Plots from the Generator
[Figure]
◮ Histogram (left), pairwise plot (center), and estimated autocorrelation function (right) of a sequence of 10^4 uniform random numbers generated by runif.
Page 34
Monte Carlo Methods with R: Random Variable Generation [33]
Uniform Simulation
Some Comments
◮ Remember: runif does not involve randomness per se.
◮ It is a deterministic sequence based on a random starting point.
◮ The R function set.seed allows the same sequence to be reproduced.
> set.seed(1)
> runif(5)
[1] 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819
> set.seed(1)
> runif(5)
[1] 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819
> set.seed(2)
> runif(5)
[1] 0.0693609 0.8177752 0.9426217 0.2693818 0.1693481
◮ Setting the seed determines all the subsequent values
Page 35
Monte Carlo Methods with R: Random Variable Generation [34]
The Inverse Transform
◮ The Probability Integral Transform
⊲ Allows us to transform a uniform into any random variable
◮ For example, if X has density f and cdf F, then we have the relation
F(x) = ∫_{−∞}^{x} f(t) dt,
and we set U = F(X) and solve for X
◮ Example 2.1
⊲ If X ∼ Exp(1), then F(x) = 1 − e^{−x}
⊲ Solving for x in u = 1 − e^{−x} gives x = −log(1 − u)
Page 36
Monte Carlo Methods with R: Random Variable Generation [35]
Generating Exponentials
> Nsim=10^4                                  #number of random variables
> U=runif(Nsim)
> X=-log(U)                                  #transforms of uniforms
> Y=rexp(Nsim)                               #exponentials from R
> par(mfrow=c(1,2))                          #plots
> hist(X,freq=F,main="Exp from Uniform")
> hist(Y,freq=F,main="Exp from R")
◮ Histograms of exponential random variables
⊲ Inverse transform (left)
⊲ R command rexp (right)
⊲ Exp(1) density on top
Page 37
Monte Carlo Methods with R: Random Variable Generation [36]
Generating Other Random Variables From Uniforms
◮ This method is useful for other probability distributions
⊲ Ones obtained as a transformation of uniform random variables
◮ Logistic pdf: f(x) = (1/β) e^{−(x−µ)/β} / [1 + e^{−(x−µ)/β}]^2,  cdf: F(x) = 1/[1 + e^{−(x−µ)/β}].
◮ Cauchy pdf: f(x) = (1/(πσ)) · 1/[1 + ((x−µ)/σ)^2],  cdf: F(x) = 1/2 + (1/π) arctan((x−µ)/σ).
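Both cdfs invert in closed form, so the inverse transform applies directly. A Python sketch (the course itself works in R; the function names are made up for illustration):

```python
import math
import random

def rlogistic(n, mu=0.0, beta=1.0, seed=1):
    """Inverse transform: solve u = 1/(1+exp(-(x-mu)/beta)) for x."""
    rng = random.Random(seed)
    return [mu + beta * math.log(u / (1 - u))
            for u in (rng.random() for _ in range(n))]

def rcauchy(n, mu=0.0, sigma=1.0, seed=1):
    """Inverse transform: solve u = 1/2 + arctan((x-mu)/sigma)/pi for x."""
    rng = random.Random(seed)
    return [mu + sigma * math.tan(math.pi * (u - 0.5))
            for u in (rng.random() for _ in range(n))]
```

In each case the sample median should sit near µ, a quick sanity check that the inversion is right.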
Page 38
Monte Carlo Methods with R: Random Variable Generation [37]
General Transformation Methods
◮ When a density f is linked in a relatively simple way
⊲ To another distribution easy to simulate
⊲ This relationship can be used to construct an algorithm to simulate from f
◮ If the Xi’s are iid Exp(1) random variables,
⊲ Three standard distributions can be derived as
Y = 2 Σ_{j=1}^{ν} Xj ∼ χ²_{2ν},  ν ∈ N*,

Y = β Σ_{j=1}^{a} Xj ∼ G(a, β),  a ∈ N*,

Y = (Σ_{j=1}^{a} Xj) / (Σ_{j=1}^{a+b} Xj) ∼ Be(a, b),  a, b ∈ N*,

where N* = {1, 2, ...}.
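These three transformations can be sketched in Python using only Exp(1) draws; the function names are illustrative, and β is treated as a scale parameter (an assumption about the G(a, β) convention here):

```python
import math
import random

rng = random.Random(42)

def rexp1():
    """One Exp(1) draw by inverse transform (1-u keeps the log argument in (0,1])."""
    return -math.log(1.0 - rng.random())

def rchisq_even(nu):
    """Chi-square with 2*nu degrees of freedom: twice the sum of nu Exp(1)'s."""
    return 2.0 * sum(rexp1() for _ in range(nu))

def rgamma_int(a, beta):
    """G(a, beta) for integer shape a: beta (a scale) times the sum of a Exp(1)'s."""
    return beta * sum(rexp1() for _ in range(a))

def rbeta_int(a, b):
    """Be(a, b) for integer a, b: first a of a+b Exp(1)'s over their total."""
    x = [rexp1() for _ in range(a + b)]
    return sum(x[:a]) / sum(x)
```

Sample means of large draws should land near the theoretical means (2ν, aβ, and a/(a+b) respectively), a cheap check on the construction.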
Page 39
Monte Carlo Methods with R: Random Variable Generation [38]
General Transformation Methods
χ²_6 Random Variables
◮ For example, to generate χ²_6 random variables, we could use the R code
> U=runif(3*10^4)
> U=matrix(data=U,nrow=3)   #matrix for sums
> X=-log(U)                 #uniform to exponential
> X=2*apply(X,2,sum)        #sum up to get chi squares
◮ Not nearly as efficient as calling rchisq, as can be checked by the R code
> system.time(test1());system.time(test2())
   user  system elapsed
  0.104   0.000   0.107
   user  system elapsed
  0.004   0.000   0.004
◮ test1 corresponds to the R code above
◮ test2 corresponds to X=rchisq(10^4,df=6)
Page 40
Monte Carlo Methods with R: Random Variable Generation [39]
General Transformation Methods
Comments
◮ These transformations are quite simple and will be used in our illustrations.
◮ However, there are limits to their usefulness,
⊲ No odd degrees of freedom
⊲ No normals
◮ For any specific distribution, efficient algorithms have been developed.
◮ Thus, if R has a distribution built in, it is almost always worth using
Page 41
Monte Carlo Methods with R: Random Variable Generation [40]
General Transformation Methods
A Normal Generator
◮ Box–Muller algorithm - two normals from two uniforms
◮ If U1 and U2 are iid U[0,1]
◮ The variables X1 and X2, defined by

X1 = √(−2 log(U1)) cos(2πU2),  X2 = √(−2 log(U1)) sin(2πU2),

◮ Are iid N(0,1) by virtue of a change of variable argument.
◮ The Box–Muller algorithm is exact, not a crude CLT-based approximation
◮ Note that this is not the generator implemented in R
⊲ It uses the probability inverse transform
⊲ With a very accurate representation of the normal cdf
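The Box–Muller step can be sketched in Python (illustrative only; as noted above, this is not what R's rnorm does):

```python
import math
import random

def box_muller(n, seed=1):
    """Generate n N(0,1) variates, two per pair of uniforms."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u1, u2 = rng.random(), rng.random()
        r = math.sqrt(-2.0 * math.log(1.0 - u1))  # 1-u1 avoids log(0)
        out.append(r * math.cos(2.0 * math.pi * u2))
        out.append(r * math.sin(2.0 * math.pi * u2))
    return out[:n]
```

Since the algorithm is exact, the sample mean and variance of a large draw should match 0 and 1 up to Monte Carlo error.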
Page 42
Monte Carlo Methods with R: Random Variable Generation [41]
General Transformation Methods
Multivariate Normals
◮ Can simulate a multivariate normal variable using univariate normals
⊲ Cholesky decomposition of Σ = AA′
⊲ Y ∼ Np(0,I) ⇒ AY ∼ Np(0,Σ)
◮ The R function rmnorm replicates those steps
⊲ It is provided by the mnormt package
⊲ Can also calculate the probability of hypercubes with the function sadmvn
> sadmvn(low=c(1,2,3),upp=c(10,11,12),mean=rep(0,3),var=B)
[1] 9.012408e-05
attr(,"error")
[1] 1.729111e-08
◮ B is a positive-definite matrix
◮ This is quite useful since the analytic derivation of this probability is almost always impossible.
Page 43
Monte Carlo Methods with R: Random Variable Generation [42]
Discrete Distributions
◮ To generate discrete random variables we have an “all-purpose” algorithm.
◮ Based on the inverse transform principle
◮ To generate X ∼ Pθ, where Pθ is supported by the integers,
⊲ We can calculate, once and for all (assuming we can store them), the probabilities
p0 = Pθ(X ≤ 0), p1 = Pθ(X ≤ 1), p2 = Pθ(X ≤ 2), ... ,
⊲ And then generate U ∼ U[0,1] and take
X = k if pk−1 < U < pk.
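This all-purpose algorithm can be sketched in Python, with the cumulative probabilities computed and stored once (function names are illustrative); here it is exercised on a Bin(10, .3) cdf built from scratch:

```python
import random
from math import comb

def rdiscrete(cdf, n, seed=1):
    """All-purpose discrete generator: return k such that cdf[k-1] < U <= cdf[k],
    where cdf[k] = P(X <= k) has been precomputed and stored."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        u = rng.random()
        k = 0
        while k < len(cdf) - 1 and cdf[k] < u:
            k += 1
        out.append(k)
    return out

# cumulative probabilities of a Bin(10, .3) distribution
pmf = [comb(10, k) * 0.3 ** k * 0.7 ** (10 - k) for k in range(11)]
cdf = []
s = 0.0
for pk in pmf:
    s += pk
    cdf.append(s)
sample = rdiscrete(cdf, 5000)
```

The sample mean should sit near np = 3 for a draw of this size.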
Page 44
Monte Carlo Methods with R: Random Variable Generation [43]
Discrete Distributions
Binomial
◮ Example To generate X ∼ Bin(10, .3)
⊲ The probability values are obtained by pbinom(k,10,.3)
p0 = 0.028, p1 = 0.149, p2 = 0.382, ..., p10 = 1,
⊲ And to generate X ∼ P(7), take
p0 = 0.0009, p1 = 0.0073, p2 = 0.0296, ... ,
⊲ Stopping the sequence when it reaches 1 with a given number of decimals.
⊲ For instance, p20 = 0.999985.
◮ Check the R code
Page 45
Monte Carlo Methods with R: Random Variable Generation [44]
Discrete Distributions
Comments
◮ Specific algorithms are usually more efficient
◮ Improvement can come from a judicious choice of the probabilities first computed.
◮ For example, if we want to generate from a Poisson with λ = 100
⊲ The algorithm above is woefully inefficient
⊲ We expect most of our observations to be in the interval λ ± 3√λ
⊲ For λ = 100 this interval is (70,130)
⊲ Thus, starting at 0 is quite wasteful
◮ A first remedy is to “ignore” what is outside of a highly likely interval
⊲ In the current example P(X < 70) + P(X > 130) = 0.00268.
Page 46
Monte Carlo Methods with R: Random Variable Generation [45]
Discrete Distributions
Poisson R Code
◮ R code that can be used to generate Poisson random variables for large values
of lambda.
◮ The sequence t contains the integer values in the range around the mean.
> Nsim=10^4; lambda=100
> spread=3*sqrt(lambda)
> t=round(seq(max(0,lambda-spread),lambda+spread,1))
> prob=ppois(t, lambda)
> X=rep(0,Nsim)
> for (i in 1:Nsim){
+   u=runif(1)
+   X[i]=t[1]+sum(prob<u) }
◮ The last line of the program checks to see what interval the uniform random
variable fell in and assigns the correct Poisson value to X.
Page 47
Monte Carlo Methods with R: Random Variable Generation [46]
Discrete Distributions
Comments
◮ Another remedy is to start the cumulative probabilities at the mode of the discrete distribution
◮ Then explore neighboring values until the cumulative probability is almost 1.
◮ Specific algorithms exist for almost any distribution and are often quite fast.
◮ So, if R has it, use it.
◮ But R does not handle every distribution that we will need.
Page 48
Monte Carlo Methods with R: Random Variable Generation [47]
Mixture Representations
◮ It is sometimes the case that a probability distribution can be naturally represented as a mixture distribution
◮ That is, we can write it in the form
f(x) =
?
Y
g(x|y)p(y) dy
or
f(x) =
?
i∈Y
pifi(x) ,
⊲ The mixing distribution can be continuous or discrete.
◮ To generate a random variable X using such a representation,
⊲ we can first generate a variable Y from the mixing distribution
⊲ Then generate X from the selected conditional distribution
Page 49
Monte Carlo Methods with R: Random Variable Generation [48]
Mixture Representations
Generating the Mixture
◮ Continuous

f(x) = ∫_Y g(x|y) p(y) dy  ⇒  y ∼ p(y) and X ∼ f(x|y), then X ∼ f(x)

◮ Discrete

f(x) = Σ_{i∈Y} pi fi(x)  ⇒  i ∼ pi and X ∼ fi(x), then X ∼ f(x)
◮ Discrete Normal Mixture R code
⊲ p1·N(µ1, σ1) + p2·N(µ2, σ2) + p3·N(µ3, σ3)
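A sketch of such a discrete normal mixture in Python (the weights and component parameters below are made up for illustration; the slides' own R code is in the course material):

```python
import random

def rnormmix(n, probs, mus, sigmas, seed=1):
    """Discrete mixture: draw component i with probability p_i, then X ~ N(mu_i, sigma_i)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        i = rng.choices(range(len(probs)), weights=probs)[0]
        out.append(rng.gauss(mus[i], sigmas[i]))
    return out

# hypothetical two-component mixture: 0.5 N(-3,1) + 0.5 N(3,1)
xs = rnormmix(5000, [0.5, 0.5], [-3.0, 3.0], [1.0, 1.0])
```

A histogram of xs would show the two modes; the overall mean is the probability-weighted average of the component means.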
Page 50
Monte Carlo Methods with R: Random Variable Generation [49]
Mixture Representations
Continuous Mixtures
◮ Student's t density with ν degrees of freedom

X|y ∼ N(0, ν/y)   and   Y ∼ χ²_ν.

⊲ Generate from a χ²_ν, then from the corresponding normal distribution
⊲ Obviously, using rt is slightly more efficient
◮ If X is negative binomial X ∼ Neg(n, p)

⊲ X|y ∼ P(y)   and   Y ∼ G(n, β),

⊲ R code generates from this mixture
[Figure: histogram of a sample generated from the negative binomial mixture (x axis: x, y axis: Density)]
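The negative binomial mixture can be sketched in Python. The standard library has no Poisson generator, so rpois below is a hand-rolled Knuth-style one; the Gamma scale (1−p)/p is an assumption about the parameterization, chosen so that E[X] = n(1−p)/p:

```python
import math
import random

def rnegbin_mix(nsim, size, p, seed=1):
    """Neg(size, p) via the mixture: Y ~ G(size, (1-p)/p), then X|Y ~ P(Y)."""
    rng = random.Random(seed)

    def rpois(lam):
        # Knuth's product-of-uniforms Poisson generator (fine for moderate lam)
        limit = math.exp(-lam)
        k, prod = 0, rng.random()
        while prod > limit:
            prod *= rng.random()
            k += 1
        return k

    out = []
    for _ in range(nsim):
        y = rng.gammavariate(size, (1 - p) / p)  # scale (1-p)/p: assumed convention
        out.append(rpois(y))
    return out
```

The sample mean of a large draw should land near size·(1−p)/p, matching the marginal negative binomial.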