Simulating Correlated Multivariate Pseudorandom Numbers
Ali A. Al-Subaihi
Institute of Public Administration
Riyadh 11141, Saudi Arabia
subaihia@ipa.edu.sa
Abstract
A modification of the Kaiser and Dickman (1962) procedure for generating multivariate
random numbers with specified intercorrelations is proposed. The procedure works with
positive as well as non-positive definite population correlation matrices. A SAS module
that runs the procedure is also provided.
Key words: Random Numbers, Multivariate Random Numbers, Random Number Generation.
1. INTRODUCTION
A fundamental task of quantitative research is to make probability-based inferences about
a population characteristic, $\theta$, based on an estimator, $\hat{\theta}$, computed from a "representative"
sample drawn from that population. However, the statistics of classical parametric inference
inform us about how the population works only to the extent that the necessary mathematical
assumptions are met, and violation of any of these assumptions may harm the accuracy of the
research conclusion(s). In regression, for instance, when the slope of the independent
variable (x) is statistically significant and BLUE (the best linear unbiased estimator), a
researcher can confidently state what happens to the dependent variable (y) when x changes
by one unit, provided that the usual mathematical assumptions of regression (independence,
homogeneity of variance, and normality) are met. If any of these assumptions is violated,
inferences from ordinary least squares (OLS) may be seriously off the mark (Mooney, 1997).
In the real world, however, there are numerous situations where the data violate at least
one of the mathematical assumptions of the inferential statistic to be used in the analysis.
Therefore, statisticians as well as researchers are interested in how the violation of the
mathematical assumption(s) affects an inferential statistic's power (the probability of
correctly rejecting a false null hypothesis) and/or its Type I error rate (the probability
of incorrectly rejecting a true null hypothesis). Fortunately, meaningful investigation of
many problems in statistics can be done through Monte Carlo simulations (Brooks & Robey, 1999).
1.1 Statement of the Problem
Monte Carlo simulation is a computer-based technique that enables one to artificially
generate random samples from populations with known characteristics, so that some variables
can be controlled and others manipulated in order to investigate the effect of the
manipulation on the robustness of a statistic. For more details about Monte Carlo
simulation, the reader is referred to Brooks and Robey (1999) and Mooney (1997).
Monte Carlo simulations that require correlated data from normal and nonnormal
populations are frequently used to investigate the small-sample properties of competing
statistics or to compare estimation techniques (Headrick & Sawilowsky, 1999). The most
widely used technique for generating correlated data from a normal population (which is
this paper's focus) is the Kaiser and Dickman (1962) procedure. The procedure uses a
component analysis (e.g., principal components, square-root components) of a positive
definite population correlation matrix (R) to generate multivariate random numbers with
specified intercorrelations.

The procedure depends mainly on the decomposition of R and works only with a
positive definite R. This limitation is an obstacle that often confronts users, because
not all real-world situations yield a positive definite R. Thus, a modification of this
procedure (or the development of a new one) is highly desirable.
1.2 Purpose of the Study
The study proposes a modification of the Kaiser and Dickman procedure in order to relax
this limitation. That is, the paper presents techniques that enable users of the Kaiser
and Dickman procedure to generate multivariate correlated pseudorandom numbers from a
non-positive definite R.
2. REVIEW OF THE LITERATURE
The importance of the problem of generating correlated pseudorandom numbers stems
from the considerable attention that has been given to Monte Carlo studies; such studies
have long been of interest to applied statisticians and continue to receive wide attention
in the recent statistical literature.

The first to introduce a method for generating correlated pseudorandom numbers was
Hoffman (1959). He presented a simple technique that can simulate only two correlated
variables. Kaiser and Dickman (1962) then extended Hoffman's procedure to m ≥ 2, where m
is the number of correlated variables. They proposed a method for generating sample and
population score matrices and sample correlation matrices from a given R. The procedure
utilizes component analysis to generate multivariate random numbers. The disadvantage of
the Kaiser and Dickman procedure is that it depends on a matrix factorization that
requires positive definiteness, a condition that frequently does not hold.
Fleishman (1978) noted that real-world distributions of variables are typically
characterized by their first four moments (i.e., mean, variance, skew, and kurtosis) and
presented a procedure for generating normal as well as nonnormal random numbers with
these moments specified. The procedure accomplishes this by simply taking a linear
combination of a random number drawn from a normal distribution and its square and cube.

Tadikamalla (1980) criticized Fleishman's procedure because it produces variables
whose exact distributions are not known, so that they lack probability-density and
cumulative-distribution functions, and because it cannot produce distributions with all
possible combinations of skew and kurtosis.

Tadikamalla (1980) also proposed five alternative procedures for generating nonnormal
random numbers and compared them all with Fleishman's power method for speed of
execution, simplicity of implementation, and generality of their ability to generate
nonnormal distributions.
Vale and Maurelli (1983) extended the Fleishman (1978) and Kaiser and Dickman
(1962) procedures to the multivariate case. They proposed a method for generating
multivariate normal and nonnormal distributions with specified intercorrelations and
marginal means, variances, skews, and kurtoses. However, like the Kaiser and Dickman
(1962) procedure, this procedure fails to work when R is not positive definite. Moreover,
it fails to generate the desired intercorrelations when the conditional distributions
possess extreme skew and/or heavy tails, even for sample sizes as large as N = 100
(Headrick & Sawilowsky, 1999).
Headrick and Sawilowsky (1999) proposed a method that combines a generalization
of Theorems 1 and 2 from Knapp and Swoyer (1967) with the Fleishman (1978) procedure.
The procedure generates multivariate nonnormal distributions whose average
intercorrelations are closer to the population parameters for skewed and/or heavy-tailed
distributions and for small sample sizes. Although the procedure eliminates the need to
factor the R that underlies the random deviates, it still requires R to be positive
definite.
In brief, all procedures that can be used to generate multivariate pseudorandom
numbers, except Fleishman's (1978), require a positive definite R, directly or indirectly,
and do not work without it. This study presents a modified technique that minimizes the
reliance on the positive-definiteness condition for R.
3. DESCRIPTION OF THE PROCEDURES
3-1 The Kaiser and Dickman procedure

The Kaiser and Dickman (1962) procedure is generally applied to generate multivariate
normal random numbers and relies on a matrix decomposition. A Cholesky factorization
(or any factorization, for that matter) is performed on the R that is to underlie the
random numbers. To generate a multivariate random number, one random number is generated
for each component, and each random variable is defined as the sum of the products of the
variable's component loadings and the random number corresponding to each of the
components. When the input random numbers are normal, the resulting random numbers are
multivariate normal with population intercorrelations equal to those of the matrix
originally decomposed.
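As a minimal illustration, the following SAS/IML sketch applies this idea with an assumed 3 × 3 positive definite R and an assumed sample size (both illustrative, not taken from the paper); it uses the same ROOT/RANNOR idiom as the module in Appendix A:

proc iml;
/* Minimal sketch of the Kaiser-Dickman approach; R and n are assumed
   illustrative values, not taken from the paper. */
R={1 .5 .3,
   .5 1 .4,
   .3 .4 1};                     /* positive definite population R */
n=1000;                          /* sample size */
U=root(R);                       /* upper triangular U with R = U`*U */
Z=Rannor(Repeat(0,n,ncol(R)));   /* n x m independent N(0,1) deviates */
X=Z*U;                           /* rows are multivariate normal with corr R */
print (corr(X))[label="Sample correlation matrix"];
quit;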
3-2 The Modified Procedure
The modified procedure is also used to generate multivariate normal pseudorandom numbers
from R. However, unlike the other procedures, it works with a positive as well as a
non-positive definite R.

To generate a correlated multivariate data matrix $\mathbf{D}_{n \times m} = [\mathbf{Y}_{n \times q}\ \mathbf{X}_{n \times k}]$ (with m = q + k)
having a specific R, we rewrite R as a block matrix,

$$\mathbf{R} = \begin{bmatrix} \mathbf{A} & \mathbf{B}^{T} \\ \mathbf{B} & \mathbf{C} \end{bmatrix}$$

where $\mathbf{A}_{q \times q}$ and $\mathbf{C}_{k \times k}$ are the correlation matrices of Y and X, respectively, and
$\mathbf{B}_{k \times q}$ is the correlation matrix between Y and X. The partition is chosen deliberately
to ensure the positive definiteness of both A and C. This is a necessary condition
(i.e., the method does not work without it).
Next, one of the following techniques may be applied.
3-2-1 When A is 1 × 1 (q = 1)

This technique is applied to generate multivariate pseudorandom numbers when R can be
partitioned into three matrices, $\mathbf{A}_{1 \times 1}$, $\mathbf{B}_{k \times 1}$, and $\mathbf{C}_{k \times k}$, where k > 1 is the
number of x's, such that

$$\mathbf{R} = \begin{bmatrix} \mathbf{A} & \mathbf{B}^{T} \\ \mathbf{B} & \mathbf{C} \end{bmatrix} = \begin{bmatrix} 1 & \rho_{yx_1} & \rho_{yx_2} & \cdots & \rho_{yx_k} \\ \rho_{yx_1} & 1 & \rho_{x_1x_2} & \cdots & \rho_{x_1x_k} \\ \rho_{yx_2} & \rho_{x_1x_2} & 1 & \ddots & \vdots \\ \vdots & \vdots & \ddots & \ddots & \rho_{x_{k-1}x_k} \\ \rho_{yx_k} & \rho_{x_1x_k} & \cdots & \rho_{x_{k-1}x_k} & 1 \end{bmatrix}$$
The technique is performed as follows:

1. A total of k correlation matrices, each containing the correlation between y and one
x individually, are created:

$$\mathbf{B}_i = \begin{bmatrix} 1 & \rho_{yx_i} \\ \rho_{yx_i} & 1 \end{bmatrix}, \qquad i = 1, 2, \ldots, k.$$

Then, a multivariate normal data matrix $\mathbf{X}_{n \times k}$ is generated through the equation

$$\mathbf{X} = \mu + \hat{\mathbf{X}}\mathbf{U}$$

where X is the multivariate normal data matrix, $\mu$ is the vector containing the variable
means, $\hat{\mathbf{X}}_{n \times k}$ contains vectors of independent standard normal variates, and $\mathbf{U}$ is the
Cholesky upper triangular matrix of $\mathbf{C}_{k \times k}$. The factorization $\mathbf{C} = \mathbf{U}^{T}\mathbf{U}$ is known as
the Cholesky factorization.

2. A data vector $\mathbf{y}_{n \times 1}$ of standard normal variates is generated and concatenated
horizontally with each column of $\mathbf{X}$ individually, $\mathbf{T}_i = [\mathbf{y}\ \mathbf{X}_i]$, $i = 1, 2, \ldots, k$. This
gives a total of k matrices, each of size n × 2.

3. A total of k vectors $\mathbf{Z}_i$ of size n × 1 are generated independently through the
equation

$$\mathbf{Z}_i = \mu + \mathbf{T}_i \mathbf{U}_{i,c2}, \qquad i = 1, 2, \ldots, k$$

where $\mathbf{U}_{i,c2}$ is the second column of the Cholesky upper triangular matrix of the $\mathbf{B}_i$
created in step 1. The vector $\mathbf{y}$ is then concatenated horizontally with the $\mathbf{Z}_i$,
$i = 1, 2, \ldots, k$, to create the data matrix $\mathbf{D}_{n \times (k+1)} = [\mathbf{y}\ \mathbf{Z}_1\ \mathbf{Z}_2 \cdots \mathbf{Z}_k]$.
The correlation matrix of the resulting data matrix will most likely differ from the
given population matrix (R), and the change takes place mainly in C. Thus, to obtain a
data matrix whose correlation matrix is close to the desired R (i.e., whose average
intercorrelation among the x's is near the target), we manipulate the actual correlation
values among the x's and repeat the process.
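A compact SAS/IML sketch of these three steps is given below, assuming zero means ($\mu = 0$) and using, for concreteness, the y-x correlations and x-block of Simulation Example 1 below; the paper's general module appears in Appendix A:

proc iml;
/* Sketch of the q = 1 technique; ryx and C are the values used in
   Simulation Example 1 below; zero means are assumed. */
n=20;                                 /* sample size */
ryx={.5 .5 0 0};                      /* correlations between y and each x */
C={1 .8 .8 .8,
   .8 1 .8 .8,
   .8 .8 1 .8,
   .8 .8 .8 1};                       /* positive definite corr among the x's */
k=ncol(ryx);
X=Rannor(Repeat(0,n,k))*root(C);      /* step 1: X = Xhat*U with C = U`*U */
y=Rannor(Repeat(0,n,1));              /* step 2: standard normal y */
D=y;                                  /* will become D = [y Z1 ... Zk] */
do i=1 to k;
   Bi=(1 || ryx[i]) // (ryx[i] || 1); /* 2 x 2 matrix Bi with rho_yx_i */
   Ui=root(Bi);                       /* upper Cholesky factor of Bi */
   Ti=y || X[,i];                     /* step 2: Ti = [y Xi] */
   Zi=Ti*Ui[,2];                      /* step 3: Zi = Ti * (2nd column of Ui) */
   D=D || Zi;
end;
print (corr(D))[label="Empirical correlation matrix of D"];
quit;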
The average intercorrelation among the x's is assessed using the measure proposed by
Kaiser (1968),

$$\gamma = \frac{\lambda - 1}{O - 1} \qquad (3.1)$$

where $\lambda$ is the largest eigenvalue of the correlation matrix and O is the number of
variables (which is k when measuring the average intercorrelation among the x's). The
larger the value of $\gamma$, the greater the correlation; if $\gamma = 0$, there is no correlation,
and $\gamma = 1$ arises when the correlations among the variables in the set are quite high.
Accordingly, one should expect $\gamma$ to take values near the values of $\rho_x$.
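As a small worked illustration (using an assumed equicorrelated 4 × 4 block with ρ = .8), the following SAS/IML fragment computes γ exactly as the Appendix A module does; for an equicorrelated block, $\lambda = 1 + (O-1)\rho$, so γ recovers ρ itself:

proc iml;
/* Kaiser's gamma, equation (3.1); Rx is an assumed illustrative input. */
Rx={1 .8 .8 .8,
    .8 1 .8 .8,
    .8 .8 1 .8,
    .8 .8 .8 1};
lambda=max(eigval(Rx));         /* largest eigenvalue of Rx (= 3.4 here) */
O=ncol(Rx);                     /* number of variables */
gamma=(lambda-1)/(O-1);         /* gamma = (lambda - 1)/(O - 1) = .8 here */
print gamma;
quit;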
Simulation Example 1
Suppose a data matrix of size 20 × 5 with specific intercorrelations needs to be generated,
and suppose (with rows and columns ordered y, x1, x2, x3, x4)

$$\mathbf{R} = \begin{bmatrix} 1 & .5 & .5 & 0 & 0 \\ .5 & 1 & .8 & .8 & .8 \\ .5 & .8 & 1 & .8 & .8 \\ 0 & .8 & .8 & 1 & .8 \\ 0 & .8 & .8 & .8 & 1 \end{bmatrix}$$
The Kaiser and Dickman procedure cannot be used here to generate the required
data because the provided R is not positive definite. Thus, the new procedure is
implemented instead (the SAS module in Appendix A can be utilized to simulate the data
using the modified procedure).

Table 1 shows the correlation matrix (Ri) used in the procedure and the correlation
matrix of the simulated data (Rei) at attempt i. At the second attempt (when the
correlation among the x's was replaced by 0.84 in the original R), we had data with a
correlation matrix (Re2) that is closer to the original R (which equals R1) than those
of the first (Re1) and third (Re3) attempts.

Notice that the Rei in the table are of full rank (i.e., rank(Rei) = 5) and are the
matrices compared with the original R; R2, R3, etc. are only used within the method to
simulate the needed data.
TABLE 1 The Theoretical and Empirical Correlation Values

R1 (the correlation matrix used):

        y      x1     x2     x3     x4
  y     1      0.5    0.5    0      0
  x1    0.5    1      0.8    0.8    0.8
  x2    0.5    0.8    1      0.8    0.8
  x3    0      0.8    0.8    1      0.8
  x4    0      0.8    0.8    0.8    1

Re1 (the correlation matrix obtained), γ = 0.736:

        y        x1       x2       x3       x4
  y     1        0.5013   0.4948  -4E-04   -0.006
  x1    0.5013   1        0.8497   0.6954   0.689
  x2    0.4948   0.8497   1        0.6919   0.6896
  x3   -4E-04    0.6954   0.6919   1        0.7982
  x4   -0.006    0.689    0.6896   0.7982   1

R2:

        y      x1     x2     x3     x4
  y     1      0.5    0.5    0      0
  x1    0.5    1      0.84   0.84   0.84
  x2    0.5    0.84   1      0.84   0.84
  x3    0      0.84   0.84   1      0.84
  x4    0      0.84   0.84   0.84   1

Re2, γ = 0.776:

        y        x1       x2       x3       x4
  y     1        0.4964   0.5008   0.0011   0.0050
  x1    0.4964   1        0.8819   0.7317   0.7363
  x2    0.5008   0.8819   1        0.7272   0.7305
  x3    0.0011   0.7317   0.7272   1        0.8432
  x4    0.0050   0.7363   0.7305   0.8432   1

R3:

        y      x1     x2     x3     x4
  y     1      0.45   0.45   0      0
  x1    0.45   1      0.84   0.84   0.84
  x2    0.45   0.84   1      0.84   0.84
  x3    0      0.84   0.84   1      0.84
  x4    0      0.84   0.84   0.84   1

Re3, γ = 0.786:

        y        x1       x2       x3       x4
  y     1        0.4558   0.4552   0.0068  -0.004
  x1    0.4558   1        0.8761   0.7515   0.7466
  x2    0.4552   0.8761   1        0.7517   0.7478
  x3    0.0068   0.7515   0.7517   1        0.8424
  x4   -0.004    0.7466   0.7478   0.8424   1

Note: Ri is the correlation matrix used to simulate the required data; Rei is the average correlation matrix of the data
simulated 1000 times; γ is Kaiser's gamma.
The real and CPU times needed for 1000 replications were 0.10 and 0.07 seconds,
respectively, on an HP machine with a 1.70 GHz processor and 256 MB of RAM. The standard
error matrix of Re2 is as follows:

Se =
        y        x1       x2       x3       x4
  y     0        0.0054   0.0054   0.0069   0.0070
  x1    0.0054   0        0.0018   0.0036   0.0037
  x2    0.0054   0.0018   0        0.0038   0.0038
  x3    0.0069   0.0036   0.0038   0        0.0024
  x4    0.0070   0.0037   0.0038   0.0024   0
3-2-2 When A is q × q and q ≥ 2

This technique is applied when R can be partitioned into three matrices, $\mathbf{A}_{q \times q}$,
$\mathbf{B}_{k \times q}$, and $\mathbf{C}_{k \times k}$, where q > 1 and k > 1 are the numbers of y's and x's,
respectively, such that

$$\mathbf{R} = \begin{bmatrix} \mathbf{A} & \mathbf{B}^{T} \\ \mathbf{B} & \mathbf{C} \end{bmatrix} = \begin{bmatrix} 1 & \rho_{y_1y_2} & \cdots & \rho_{y_1y_q} & \rho_{y_1x_1} & \cdots & \rho_{y_1x_k} \\ \rho_{y_1y_2} & 1 & \cdots & \rho_{y_2y_q} & \rho_{y_2x_1} & \cdots & \rho_{y_2x_k} \\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ \rho_{y_1y_q} & \rho_{y_2y_q} & \cdots & 1 & \rho_{y_qx_1} & \cdots & \rho_{y_qx_k} \\ \rho_{y_1x_1} & \rho_{y_2x_1} & \cdots & \rho_{y_qx_1} & 1 & \cdots & \rho_{x_1x_k} \\ \vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\ \rho_{y_1x_k} & \rho_{y_2x_k} & \cdots & \rho_{y_qx_k} & \rho_{x_1x_k} & \cdots & 1 \end{bmatrix}$$
In this case, the technique is performed as follows:

1. A total of q × k correlation matrices, each containing the correlation between one y
and one x, are created:

$$\mathbf{B}_{ij} = \begin{bmatrix} 1 & \rho_{y_ix_j} \\ \rho_{y_ix_j} & 1 \end{bmatrix}, \qquad i = 1, 2, \ldots, q;\ j = 1, 2, \ldots, k.$$

Then a multivariate normal data matrix $\mathbf{X}_{n \times k}$ is generated through the equation

$$\mathbf{X} = \mu + \hat{\mathbf{X}}\mathbf{U}$$

where X is the multivariate normal data matrix, $\mu$ is the vector containing the variable
means, $\hat{\mathbf{X}}_{n \times k}$ contains vectors of independent standard normal variates, and $\mathbf{U}$ is the
Cholesky upper triangular matrix of $\mathbf{C}_{k \times k}$.
2. Similarly, a multivariate normal data matrix $\mathbf{Y}_{n \times q}$ is generated using the Cholesky
upper triangular matrix of $\mathbf{A}_{q \times q}$. Next, each column of Y is concatenated horizontally
with each column of X individually (i.e., $\mathbf{T}_{ij} = [\mathbf{Y}_i\ \mathbf{X}_j]$, $i = 1, 2, \ldots, q$;
$j = 1, 2, \ldots, k$), which gives a total of q × k matrices, each of size n × 2.

3. A total of q × k vectors $\mathbf{Z}_{ij}$ of size n × 1 are generated through the equation

$$\mathbf{Z}_{ij} = \mu + \mathbf{T}_{ij}\mathbf{U}_{ij,c2}, \qquad i = 1, 2, \ldots, q;\ j = 1, 2, \ldots, k$$

where $\mathbf{U}_{ij,c2}$ is the second column of the Cholesky upper triangular matrix of the $\mathbf{B}_{ij}$
created in step 1. Afterward, for each $\mathbf{y}_i$ individually, the vectors $\mathbf{Z}_{ij}$,
$j = 1, 2, \ldots, k$, are concatenated horizontally to create a data matrix
$\tilde{\mathbf{X}}_{i,\,n \times k} = [\mathbf{Z}_{i1}\ \mathbf{Z}_{i2} \cdots \mathbf{Z}_{ik}]$. The matrices $\tilde{\mathbf{X}}_i$, $i = 1, 2, \ldots, q$, are then summed
to create the data matrix $\tilde{\mathbf{X}}_{n \times k} = \sum_{i=1}^{q} \tilde{\mathbf{X}}_i$, and the matrix Y is concatenated
horizontally with $\tilde{\mathbf{X}}$ to create the required data matrix $\mathbf{D}_{n \times (q+k)} = [\mathbf{Y}\ \tilde{\mathbf{X}}]$.

Similar to the first case, the correlation matrix of the resulting data matrix
$\mathbf{D}_{n \times (q+k)}$ will most likely differ from the population matrix, and the change, once
again, takes place mainly in C. Thus, to obtain a data matrix whose correlation matrix is
close to the desired population correlation matrix (i.e., whose average intercorrelation
among the x's is near the target), we manipulate the actual correlation values among the
x's and repeat the process.
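The steps above can be condensed into the following SAS/IML sketch, shown with assumed illustrative blocks (those of the R1 in Simulation Example 2 below) and zero means. Note that the Appendix A module averages, rather than sums, the $\tilde{\mathbf{X}}_i$; since correlations are unaffected by rescaling a column, the two choices yield the same correlation structure:

proc iml;
/* Sketch of the q >= 2 technique; A, C, and Byx are the blocks of the
   R1 used in Simulation Example 2 below; zero means are assumed. */
n=20;
A={1 .5 .5,
   .5 1 .5,
   .5 .5 1};                          /* corr among the y's (positive definite) */
C={1 .4 .4,
   .4 1 .4,
   .4 .4 1};                          /* corr among the x's (positive definite) */
Byx={.7 .7 .1,
     .7 .7 .1,
     .7 .7 .1};                       /* rho_{y_i x_j}: rows i, columns j */
q=nrow(Byx); k=ncol(Byx);
Y=Rannor(Repeat(0,n,q))*root(A);      /* step 2: Y from chol(A) */
X=Rannor(Repeat(0,n,k))*root(C);      /* step 1: X from chol(C) */
Xt=Repeat(0,n,k);                     /* running sum of the Xtilde_i */
do i=1 to q;
   do j=1 to k;
      Bij=(1 || Byx[i,j]) // (Byx[i,j] || 1);   /* 2 x 2 matrix Bij */
      Uij=root(Bij);                            /* upper Cholesky of Bij */
      Xt[,j]=Xt[,j]+(Y[,i] || X[,j])*Uij[,2];   /* step 3: accumulate Zij */
   end;
end;
D=Y || Xt;                            /* required data matrix D = [Y Xtilde] */
print (corr(D))[label="Empirical correlation matrix of D"];
quit;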
Simulation Example 2
Suppose a data matrix of size 20 × 6 with specific intercorrelations needs to be generated,
and suppose (with rows and columns ordered y1, y2, y3, x1, x2, x3)

$$\mathbf{R} = \begin{bmatrix} 1 & .5 & .5 & .7 & .7 & .1 \\ .5 & 1 & .5 & .7 & .7 & .1 \\ .5 & .5 & 1 & .7 & .7 & .1 \\ .7 & .7 & .7 & 1 & .4 & .4 \\ .7 & .7 & .7 & .4 & 1 & .4 \\ .1 & .1 & .1 & .4 & .4 & 1 \end{bmatrix}$$
Again, the Kaiser and Dickman procedure does not work in this situation because
the provided R is not positive definite. Consequently, the modified procedure is
implemented as an alternative (the SAS module in Appendix A can again be utilized to
simulate the data using the modified procedure; an example parameterization is sketched below).
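For instance, to reproduce the first attempt (R1) of this example, the parameter block at the top of the Appendix A module would be set as follows (a hypothetical substitution; the rest of the module is unchanged):

NS=20; /* No. of subjects */
PopCor={1  .5 .5 .7 .7 .1,
        .5 1  .5 .7 .7 .1,
        .5 .5 1  .7 .7 .1,
        .7 .7 .7 1  .4 .4,
        .7 .7 .7 .4 1  .4,
        .1 .1 .1 .4 .4 1};
%Let NY=3; /* No. of the y's (q) */
%Let NX=3; /* No. of the x's (k) */
%Let NPC=9; /* No. of yx correlations = NY*NX */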
Table 2 shows the correlation matrix (Ri) used in the procedure and the correlation
matrix of the simulated data (Rei) at attempt i. At the third attempt (when, relative to
the original population correlation matrix, the correlations between the y's and the x's
had been raised to 0.9 and the correlations among the x's lowered to 0.1), we had data
with a correlation matrix (Re3) that is closer to the original population correlation
matrix (R1) than those of the first (Re1) and second (Re2) attempts.
TABLE 2 The Theoretical and Empirical Correlation Values

R1 (the correlation matrix used):

        y1     y2     y3     x1     x2     x3
  y1    1      0.5    0.5    0.7    0.7    0.1
  y2    0.5    1      0.5    0.7    0.7    0.1
  y3    0.5    0.5    1      0.7    0.7    0.1
  x1    0.7    0.7    0.7    1      0.4    0.4
  x2    0.7    0.7    0.7    0.4    1      0.4
  x3    0.1    0.1    0.1    0.4    0.4    1

Re1 (the correlation matrix obtained), γ = 0.4585:

        y1       y2       y3       x1       x2       x3
  y1    1        0.4959   0.4931   0.5165   0.5101   0.0700
  y2    0.4959   1        0.4972   0.5079   0.5079   0.0730
  y3    0.4931   0.4972   1        0.5119   0.5054   0.0635
  x1    0.5165   0.5079   0.5119   1        0.6329   0.3546
  x2    0.5101   0.5079   0.5054   0.6329   1        0.3672
  x3    0.0700   0.0730   0.0635   0.3546   0.3672   1

R2:

        y1     y2     y3     x1     x2     x3
  y1    1      0.5    0.5    0.9    0.9    0.1
  y2    0.5    1      0.5    0.9    0.9    0.1
  y3    0.5    0.5    1      0.9    0.9    0.1
  x1    0.9    0.9    0.9    1      0.4    0.4
  x2    0.9    0.9    0.9    0.4    1      0.4
  x3    0.1    0.1    0.1    0.4    0.4    1

Re2, γ = 0.4982:

        y1       y2       y3       x1       x2       x3
  y1    1        0.5072   0.5061   0.7071   0.7015   0.0666
  y2    0.5072   1        0.5098   0.7074   0.7063   0.0726
  y3    0.5061   0.5098   1        0.7067   0.7068   0.0748
  x1    0.7071   0.7074   0.7067   1        0.8448   0.2731
  x2    0.7015   0.7063   0.7068   0.8448   1        0.2768
  x3    0.0666   0.0726   0.0748   0.2731   0.2768   1

R3:

        y1     y2     y3     x1     x2     x3
  y1    1      0.5    0.5    0.9    0.9    0.1
  y2    0.5    1      0.5    0.9    0.9    0.1
  y3    0.5    0.5    1      0.9    0.9    0.1
  x1    0.9    0.9    0.9    1      0.1    0.1
  x2    0.9    0.9    0.9    0.1    1      0.1
  x3    0.1    0.1    0.1    0.1    0.1    1

Re3, γ = 0.3981:

        y1       y2       y3       x1       x2       x3
  y1    1        0.4984   0.4969   0.7028   0.7015   0.0649
  y2    0.4984   1        0.4965   0.6989   0.7002   0.0604
  y3    0.4969   0.4965   1        0.7002   0.6995   0.0631
  x1    0.7028   0.6989   0.7002   1        0.7613   0.1193
  x2    0.7015   0.7002   0.6995   0.7613   1        0.1163
  x3    0.0649   0.0604   0.0631   0.1193   0.1163   1

Note: Ri is the correlation matrix used to simulate the required data; Rei is the average correlation matrix of the data
simulated 1000 times; γ is Kaiser's gamma.
The real and CPU times needed for 1000 replications were 0.57 and 0.10 seconds,
respectively, on an HP machine with a 1.70 GHz processor and 256 MB of RAM. The standard
error matrix of Re2 is as follows:

Se =
        y1       y2       y3       x1       x2       x3
  y1    0        0.0055   0.0056   0.0039   0.0039   0.0073
  y2    0.0055   0        0.0057   0.0039   0.0041   0.0071
  y3    0.0056   0.0057   0        0.0040   0.0041   0.0073
  x1    0.0039   0.0039   0.0040   0        0.0033   0.0073
  x2    0.0039   0.0041   0.0041   0.0033   0        0.0072
  x3    0.0073   0.0071   0.0073   0.0073   0.0072   0
4. CONCLUSION
The paper has suggested a modification of the Kaiser and Dickman (1962) procedure for
generating multivariate normal pseudorandom numbers. As demonstrated in the numerical
examples, the procedure can produce random numbers with intercorrelations near the
population correlation matrix in situations where other procedures do not work at all.
Although there are a few situations (especially when A or C is not positive definite) in
which the procedure does not work, it can still be used to generate numbers close to the
preferred intercorrelations by manipulating the correlations among the y's or the x's,
besides the correlations between the y's and the x's.
APPENDIX (A)
OPTIONS ls=100 ps=60 nodate nonumber;
proc iml;
/********* The Parameters **************/
NS=20; /* No. of subjects */
/* The population correlation matrix is entered as YY YX, XY XX */
PopCor={1 .5 .5 .7 .7 .1,
.5 1 .5 .7 .7 .1,
.5 .5 1 .7 .7 .1,
.7 .7 .7 1 .2 .2,
.7 .7 .7 .2 1 .2,
.1 .1 .1 .2 .2 1};
%Let NY=3; /* No. of the y's */
%Let NX=3; /* No. of the x's */
%Let NPC=9; /* No. of yx correlations = NY*NX */
/***************************************************/
NV=&NY+&NX; /* No. of Variables */
CorY= PopCor[1:&NY,1:&NY]; /* Corr. among the y's */
CorX= PopCor[&NY+1:NV,&NY+1:NV]; /* Corr. among the x's */
CorYX= PopCor[&NY+1:NV,1:&NY]; /* Corr. betw. the y's & the x's */
do i=1 to ncol(CorYX); /* Corr. betw. the y's & the x's as a column*/
CorYXs=CorYXs//CorYX[,i];
end;
%macro loop(NPC);
%Do i=1 %to &NPC; /* Build the Bi correlation matrices */
Cryx&i=I(2);
Cryx&i[1,2]=CorYXs[&i,1];
Cryx&i[2,1]=CorYXs[&i,1];
%end;
%mend loop;
%loop (&npc);
X=Rannor(Repeat(0,NS,&NX))*root(CorX); /* The X data matrix */
y=Rannor(Repeat(0,NS,&NY))*root(CorY); /* The Y data matrix */
DaXs=0*j(ns,&NX);
%macro loop2 (NY);
%Let k=0;
%do j= 1 %to &NY;
%do i=1 %to &NX;
%Let c=%eval(&i+&K);
%put c=&c;
dat=(Y[,&j]||X[,&i])*(root(CrYX&c));
dat2=dat2||dat[,2];
%end;
%Let k=&c;
daXs=daXs+Dat2;
free dat2;
%end;
%mend loop2;
%loop2 (&NY );
daXs=daXs*(1/&NY); /* average the q stacks; rescaling does not alter correlations */
data=Y||daXs; /* The final data matrix */
eg=eigval(corr(DaXs));
CXs=(eg[<>,1]-1)/(&NX-1); /* The average Correlations among all x's */
eg=eigval(corr(Y));
CYs=(eg[<>,1]-1)/(&NY-1); /* The average Correlations among all y's */
Call=corr(data); /* Correlations among all data */
ca=call[1:&NY,(&NY+1):(&NY+&NX)];
print 'The Correlations between Xs and Ys',ca,
'The average Correlations among all Xs = ' CXs,
'The average Correlations among all Ys = ' CYs,
'The total correlation matrix of the data', call;
quit;
REFERENCES
[1] Brooks, G. P., & Robey, R. R. (1999), "Monte Carlo Simulation for Perusal and
Practice", Paper presented at the meeting of the American Educational Research
Association, Montreal, Quebec, Canada.
[2] Fleishman, A. (1978), "A Method for Simulating Non-Normal Distributions",
Psychometrika, 43 (4), 521-532.
[3] Headrick, T. C., & Sawilowsky, S. S. (1999), "Simulating Correlated Multivariate
Nonnormal Distributions: Extending the Fleishman Power Method",
Psychometrika, 64 (1), 25-35.
[4] Hoffman, P. J. (1959), "Generating Variables with Arbitrary Properties",
Psychometrika, 24, 265-267.
[5] Kaiser, H. F., & Dickman, K. (1962), "Sample and Population Score Matrices and
Sample Correlation Matrices From an Arbitrary Population Correlation Matrix",
Psychometrika, 27 (2), 179-182.
[6] Mooney, C. Z. (1997), "Monte Carlo Simulation", Sage Publications, Thousand
Oaks, CA.
[7] Tadikamalla, P. R. (1980), "On Simulating Non-Normal Distributions",
Psychometrika, 45 (2), 273-279.
[8] Vale, C. D., & Maurelli, V. A. (1983), "Simulating Multivariate Nonnormal
Distributions", Psychometrika, 48 (3), 465-471.
[9] Knapp, T. R., & Swoyer, V. H. (1967), "Some Empirical Results Concerning the
Power of Bartlett's Test of the Significance of a Correlation Matrix", American
Educational Research Journal, 4 (1), 13-17.
[10] Kaiser, H. F. (1968), "A Measure of the Average Intercorrelation", Educational
and Psychological Measurement, 28 (2), 245-247.