Available via license: CC BY-NC-SA 4.0
ITAMARACÁ: A NOVEL SIMPLE WAY TO GENERATE PSEUDO-RANDOM NUMBERS
Daniel Henrique Pereira*
Email: researchdh.pereira@gmail.com
Orcid: https://orcid.org/0000-0003-4750-9659
*Business Administration student at Pontifical Catholic University of Minas Gerais
Abstract: This paper presents Itamaracá, a novel simple way to generate pseudo-random numbers. Overall, Itamaracá tends to pass several statistical tests, such as frequency, chi-square, autocorrelation, run sequence, and run tests. For comparison, the results of the RandBetween function in Microsoft Excel and of true random numbers from Random Org were also analyzed, and their distinctive characteristics were compared with those of the proposed model. The goal of this study is to contribute to the growing portfolio of existing Pseudo-Random Number Generators (PRNGs).
Keywords: Pseudo-random number generator, Itamaracá, Computer Science
INTRODUCTION
According to Knuth (1998), the generation of random numbers has many practical applications, such as simulation, sampling, numerical analysis, recreation, computer programming, decision making, and studies in cryptography and aesthetics (computer science).
Although statistical tests do not ensure that a generating model is in fact good for practical applications, Vieira et al. (2004) list a series of properties that a "good" random number generator should have to be considered minimally acceptable for use. These include uniformity, independence, long period, ease of implementation and efficiency, replicability, portability, and disjoint subsequences.
Das et al. (2018) state that, in most cases, a Pseudo-Random Number Generator (PRNG) does not satisfy all desirable requirements, which is natural given its deterministic nature. Thus, each PRNG tends to suit specific practical applications: some may be useful for simulation, while others are geared exclusively toward information security. In any case, of all the properties listed above by Vieira et al. (2004), the ones that stand out most in the literature are the uniformity and independence of the numbers generated by the algorithm.
This study presents a PRNG called Itamaracá, as well as an even more simplified version of it. The goal is to present a new, simple way to generate random numbers and thus contribute to the portfolio of generators in this field of study.
1 METHODOLOGY
This study can be considered both qualitative and quantitative. Convenience sampling was used to select the generated random number sequences analyzed.
To evaluate the proposed model, a series of statistical tests and tools was used, focusing above all on the uniformity and independence criteria.
For the uniformity criterion, the following were used: frequency analysis with histograms; frequency analysis with the chi-square test; and graphical analysis of the distribution of the generated numbers with a scatter plot.
The evaluation also included an analysis of the mean, the standard deviation, and the repeated numbers in the sample of generated random numbers.
For the independence criterion, this study used the autocorrelation function and the Run Test, a binary test considering both odd and even values and values below or above the median. In addition, line graphs were used to observe the behavior of the generated numbers throughout the series, and Shannon's entropy was used to measure the degree of independence and the level of disorder.
For autocorrelation, the values of the first 10 lags (excluding lag 0) and their average were analyzed.
The sample size varied by test: the Run Test used the first 100 numbers generated; the graphical tests (line graph, scatter plot, and histogram) used 1,000 numbers; and the chi-square test, autocorrelation, repeated-number analysis, and Shannon entropy used sequences of 10,000 numbers.
Data acquisition, manipulation, and visualization, including the generated results, were carried out with Microsoft Excel and the Random Org platform.
2 THEORETICAL FRAMEWORK
2.1 Application fields of random numbers
According to Rosa (2016), random number generation has many practical applications in our daily lives and in the most diverse fields of study, such as physics, statistics, computer science, astronomy, astrophysics, and medicine. The author points out, however, that the application closest to our daily lives is data encryption, which is extremely useful for giving a higher level of anonymity to data such as a bank's customer records or a patient's records in a health system.
Random numbers are also of great importance in event simulation, ranging from the behavior of demand and the resulting profit or loss of a company to, as in Rosa's (2016) example, the simulation of urban traffic in a large city.
Random numbers are also commonly encountered in leisure and entertainment, for example in lottery games, gambling, and the random selection of a music or video file on smartphones, among many other examples.
2.2 Differentiation between random numbers and pseudo random numbers
According to Rosa (2016), when we do not delve deeply into the subject and rely only on common sense, it is natural to treat random numbers (RN) and pseudo-random numbers as the same idea. Nevertheless, there are distinctions between them that must be made.
Pseudo-random numbers are, in essence, purely deterministic: there is a function f(x) that, given the same initial condition, is expected to produce the same results no matter how many times the experiment is repeated. In physics, this is very clear in uniform motion: given the elapsed time and the corresponding distance traveled, we always obtain the same average speed in a given unit of measurement.
On the other hand, before we understand what random numbers are, we must first consider what is meant by random. Simplistically, something truly random is something that occurs without any defined cause, that is, without a cause-and-effect relationship. Its main characteristic is unpredictability. Wikipedia, for example, gives some of its various meanings:
The expression randomness expresses breakdown of order, purpose, causation, or unpredictability in non-
scientific terminology. A random process is the repetitive process whose outcome does not describe a
deterministic pattern, but follows a probability distribution (WIKIPEDIA, 2021).
With this insight into what randomness consists of, True Random Numbers (TRN), according to Rosa (2016), are those that possess the characteristics described above, which are usually found in physical phenomena arising from nature, ranging from the throwing of dice to electrical circuits and radioactive decay.
Moreover, according to Hamid and Abdullah (2015), the two terms differ with respect to four pillars: approach, efficiency, determinism, and periodicity, as shown in the figure below:
Figure 1. Differences between Pseudo-Random Numbers and True-Random Numbers by Hamid and Abdullah (2015).
Figure 1, from Hamid and Abdullah (2015), shows that, regarding approach, pseudo-random numbers are usually generated by an algorithm based on a mathematical formula. Therefore, in computer science, for example, these algorithms are implemented so that the computer can reproduce these numbers for a given purpose.
True random numbers, on the other hand, must come from a source in the natural environment, without any human intervention, such as atmospheric noise, radioactive decay, or electrical circuits, before they are brought into computer language.
Regarding efficiency, pseudo-random number algorithms tend to be more responsive, generating numbers quickly and easily. True random number generators, by contrast, are slower to generate numbers given their very nature.
Regarding determinism, pseudo-random number algorithms are entirely deterministic: given the same initial condition, we can repeat the experiment and obtain the same results. True random number generators do not have this characteristic, so we cannot obtain the same results even if we start from the same initial conditions, since the numbers come not from a mathematical formulation but from some physical phenomenon of nature.
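As a minimal illustration of the determinism pillar, the Python sketch below (an assumption-based example, not taken from the cited authors) seeds the standard library's `random.Random` class, a Mersenne Twister PRNG, twice with the same arbitrarily chosen value and shows that both runs yield the same sequence. `os.urandom` appears only as a contrast: it accepts no seed, so an earlier call cannot be replayed, although it is an operating-system source rather than necessarily a physical TRNG. The helper name `seeded_sequence` is hypothetical.

```python
import os
import random


def seeded_sequence(seed: int, count: int, low: int = 1, high: int = 10_000) -> list[int]:
    """Draw `count` integers in [low, high] from a PRNG initialised with `seed`."""
    rng = random.Random(seed)  # private generator; the global random state is untouched
    return [rng.randint(low, high) for _ in range(count)]


# Same seed (same initial condition) -> exactly the same sequence.
first_run = seeded_sequence(seed=42, count=5)
second_run = seeded_sequence(seed=42, count=5)
print(first_run == second_run)  # True

# os.urandom accepts no seed, so this call cannot be reproduced later.
print(os.urandom(4).hex())
```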
Regarding the last pillar, periodicity, pseudo-random number algorithms are, due to their deterministic character, expected to repeat their sequence of numbers eventually: a generator with a finite number of internal states must sooner or later return to a state it has already visited, and from then on the same numbers reappear. True random number generators, in contrast, tend to have no periodicity, although individual numbers in a sequence may still repeat by coincidence.
It must therefore be made clear that so-called pseudo-random numbers, according to Rosa (2016), are not in fact random. However, as long as we cannot identify and establish patterns in a given sequence of generated numbers, we can consider such numbers "random". In this sense, still according to Rosa (2016), not only in common sense but also in much of the literature on the subject, the terms pseudo-random and truly random are commonly treated as synonyms.
2.3 Understanding Pseudo Random Number Generators (PRNG)
Pseudo-Random Number Generators (PRNGs) are, as the name suggests, generators of numbers that appear random but are produced by a mathematical function f(x).
It is common, in PRNG models, to come across some basic concepts such as seed, cycle, tail and
period.
A "seed" is the set of values, whether chosen arbitrarily or not, that starts the number-generation process. The term alludes to that from which something "is born" or "arises". In the literature, the seed is usually denoted by the letter S.
The "cycle" is the set of numbers generated before the sequence starts repeating itself.
The “tail” is the initial part of the numbers that are generated and that normally does not make up
the cycle.
Finally, the "period" is the sum of the tail and the cycle, that is, the full sequence of numbers generated before repetition.
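The definitions of tail, cycle and period can be made concrete with a short sketch, assuming a generator whose next value depends only on its current state. The function `tail_and_cycle` and the toy generator `toy_step` (x² + 1 mod 10) are hypothetical examples, not part of any cited model. The sketch records the index at which each state first appears, so the first repeated state marks where the cycle begins.

```python
from typing import Callable, Hashable, TypeVar

State = TypeVar("State", bound=Hashable)


def tail_and_cycle(step: Callable[[State], State], seed: State) -> tuple[int, int]:
    """Return (tail length, cycle length) of the sequence seed, step(seed), ..."""
    first_seen: dict[Hashable, int] = {}  # state -> index of its first appearance
    state, index = seed, 0
    while state not in first_seen:
        first_seen[state] = index
        state = step(state)
        index += 1
    tail = first_seen[state]  # numbers generated before the cycle starts
    cycle = index - tail      # numbers generated before the cycle repeats
    return tail, cycle


def toy_step(x: int) -> int:
    # Deliberately tiny generator, for illustration only.
    return (x * x + 1) % 10


tail, cycle = tail_and_cycle(toy_step, seed=3)
print(tail, cycle, tail + cycle)  # 1 6 7 -> tail, cycle, period
```

Starting from seed 3, the sequence is 3, 0, 1, 2, 5, 6, 7, 0, ..., so the tail is 1 number long, the cycle is 6 numbers long, and the period, as defined above, is 7. Storing every state is fine for toy generators; for large state spaces, Floyd's or Brent's cycle-detection algorithms find the same quantities with constant memory.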
2.4 Properties of good Pseudo Random Number Generators (PRNG)
According to Vieira et al. (2004), several aspects determine whether a generating algorithm is good or not, such as:
Uniformity: the generating model should produce numbers that are well distributed across the class intervals considered. In this sense, all numbers in a given range should have the same probability of occurring (1/N for N possible values), and the generator should therefore pass statistical tests such as the chi-square test.
Independence: the sequence of numbers considered random, y1, y2, y3, ..., yn, must be independent, that is, for example, not autocorrelated. The next result does not depend on previous results, at least in principle, which makes it difficult to predict the next numbers generated in the sequence.
Long period: a pseudo-random number generator must be able to generate a large quantity of numbers before the sequence repeats itself, preferably without falling into repeating cycles.
Ease of implementation and efficiency: the generator should produce random numbers at a low "computational cost"; among other aspects, the fewer algebraic operations, the better.
Replicability: pseudo-random number generators must be repeatable, that is, the same parameters must yield the same results. This aspect is very relevant for simulations and testing, for example.
Portability: the generator should work on different types of computers.
Disjoint subsequences: the generator should be able to produce numbers further along the sequence without going through all the intermediate states.
2.5 Known statistical tools and tests for evaluating pseudo-random number generators
2.5.1 Frequency and Chi-square Test
Stevenson (1981) shows that the chi-square test is a goodness-of-fit (adherence) test used to evaluate claims about the distribution of a population.
Figure 2. Chi-square Test formula by Lacerda et al. (2002).
Lacerda et al. (2002) state that the chi-square statistic can be defined as the sum, over all intervals k, of the squared difference between the observed number of results (Nk), that is, the actual values obtained within interval k, and the expected number (mk), weighted by the expected number, as shown in Figure 2 above.
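Written out from the description above, with K denoting the number of class intervals (a symbol introduced here), the statistic takes the standard chi-square goodness-of-fit form:

```latex
D^2 = \sum_{k=1}^{K} \frac{(N_k - m_k)^2}{m_k}
```

For n generated numbers and a uniform distribution over K equal intervals, m_k = n/K, and D² is compared with a chi-square distribution with K − 1 degrees of freedom.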
Chi-square test results are good when the D² value is as small as possible, especially considering the limit given by the degrees of freedom and the desired confidence level. If the value exceeds this limit (the critical value), the numbers cannot be considered well distributed with respect to the distribution being tested, for example, a uniform distribution.
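The statistic D² and the decision rule can be sketched in Python. This is a minimal example, assuming the expected count in each interval is the total frequency divided by the number of intervals (uniform hypothesis). As a check, it uses the observed frequencies reported in Table 2, which reproduces the value of about 2.50 shown there; the critical value 16.92 is the one the text uses for 9 degrees of freedom at 95% confidence (for other settings, SciPy's `scipy.stats.chi2.ppf(0.95, df)` returns it). The function name `chi_square_statistic` is hypothetical.

```python
from typing import Sequence


def chi_square_statistic(observed: Sequence[float], expected: Sequence[float]) -> float:
    """D^2: sum over class intervals of (observed - expected)^2 / expected."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of intervals")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Observed frequencies of the 10 class intervals reported in Table 2 (Section 3.1.1).
observed = [1018, 991, 994, 985, 1000, 1033, 1008, 999, 999, 974]
# Uniform hypothesis: the total spread evenly over the 10 intervals (about 1,000 each).
expected = [sum(observed) / len(observed)] * len(observed)

d2 = chi_square_statistic(observed, expected)
critical_value = 16.92  # 9 degrees of freedom, 95% confidence, as used in the text
print(round(d2, 2), d2 <= critical_value)  # 2.5 True
```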
2.5.2 Line graphs
Ander-Egg (1971) states that line graphs are widely used for data visualization because they are simple to construct on a Cartesian plane. The author also emphasizes that, although it is not one of the best forms of statistical representation, a line graph can be useful for identifying possible patterns in the behavior of a variable along a generated series.
Figure 3. Example of a line graph by Bencardino (2012).
2.5.3 Histogram
According to Levine et al. (2013), a histogram is a bar chart of grouped numerical data in which vertical bars represent the frequency or percentage in each group (class interval). Figure 4, from Bencardino (2012), shows a graphical example of a histogram.
Figure 4. Graphical display of a histogram by Bencardino (2012).
2.5.4 Scatter Plot
According to Levine et al. (2013), a scatter plot is a good graphical way to analyze whether there is any relationship between two variables X and Y, since each pair of values is drawn as a point in two dimensions. In the study of random numbers, it is therefore an interesting way to analyze the uniformity and independence of the data through the points plotted between the horizontal and vertical axes of the graph.
Bencardino (2012) gives an example of a scatter plot, shown in Figure 5.
Figure 5. Example of a Scatter Plot by Bencardino (2012).
2.5.5 Average and Standard Deviation
As stated by Levine et al. (2013), the mean is a well-known measure of central tendency with many practical applications. It serves as a "balance point" or "neutralizer" for a set of data. It is calculated by dividing the sum of the values by the number of values, as shown in Figure 6, from Bencardino (2012).
Figure 6. Formula for the average by Bencardino (2012).
According to Stevenson (1981), the standard deviation is a measure of dispersion defined as the square root of the average of the squared deviations from the mean, that is, the square root of the variance. Figure 7 shows how it is calculated.
Figure 7. Formula for the standard deviation by Stevenson (1981).
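Figures 6 and 7 can be expressed as the short Python sketch below. The example values are hypothetical, and since the text does not say whether the population form (dividing by n) or the sample form (dividing by n − 1) of the standard deviation was used, both are shown; the standard library's `statistics` module provides `pstdev` and `stdev` for the two forms.

```python
import math
import statistics
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)


def population_std(values: Sequence[float]) -> float:
    """Square root of the average squared deviation from the mean (divides by n)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical example values
print(mean(data), population_std(data))  # 5.0 2.0
# Standard-library equivalents; statistics.stdev divides by n - 1 (sample form).
print(statistics.pstdev(data), round(statistics.stdev(data), 4))  # 2.0 2.1381
```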
2.5.6 Run Test
According to Rosa (2016), the Run Test is one way to evaluate whether a given sequence of generated numbers comes from a random process.
The same author notes that the probability of a number Xn being greater or less than a given value Xi that "separates" the classes follows a binomial distribution.
Rosa (2016) also emphasizes that, before calculating the Z statistic of the Run Test, the data from a generated sequence must go through several steps.
First, the data must be split into two binary categories, such as numbers "above" or "below" a value Xi, where Xi is usually the median of the complete sequence of generated numbers; alternatively, the categories can be "even numbers" and "odd numbers", for example.
Suppose that, in a sequence of 31 numbers, the following distribution of even (E) and odd (O) numbers was observed:
EEEOOEEOEOOOOEOEEEOOOEOEEOOEEEO
According to Rosa (2016), the "number of runs" of a sequence is the number of contiguous groups of identical symbols. In the series above, for example, the number of runs is 16 (EEE, OO, EE, O, E, OOOO, and so on). The observed number of runs is then compared with the expected number of runs, calculated as shown in Figure 8 below.
Figure 8. Formula to calculate the number of expected runs by Rosa (2016).
After calculating the expected number of runs, we calculate the dispersion of the number of runs (its variance and standard deviation), following the formulation given by Rosa (2016) in the figure below.
Figure 9. Standard deviation of the number of sequences by Rosa (2016).
Finally, we calculate the Z statistic. For large samples (n1 > 20 and n2 > 20), as stated by Rosa (2016), the number of runs can be approximated by a normal distribution, and the following hypotheses are tested:
H0: the sequence of numbers can be considered random.
H1: the sequence of numbers cannot be considered random.
The Z statistic is the difference between the observed and the expected number of runs, divided by the standard deviation of the number of runs (the square root of its variance).
Figure 10. Z-test by Rosa (2016).
The null hypothesis is rejected if the absolute value of Z exceeds the critical value, as shown below.
Figure 11. Rejection of the null hypothesis by Rosa (2016).
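The counting and Z-statistic steps above can be sketched in Python, assuming the standard Wald-Wolfowitz formulas E(R) = 2·n1·n2/(n1 + n2) + 1 and Var(R) = 2·n1·n2·(2·n1·n2 − n1 − n2) / [(n1 + n2)²·(n1 + n2 − 1)], with Z = (R − E(R)) / √Var(R). Figures 8 to 10 are not reproduced in this text, but these expressions give the values reported in Sections 3.1.3 and 3.1.4 (for example, n1 = 41, n2 = 59 and 50 runs yield E(R) = 49.38, Var(R) ≈ 23.15 and Z ≈ 0.129). The function name `runs_test` is hypothetical.

```python
import math
from typing import Sequence


def runs_test(labels: Sequence[bool]) -> dict[str, float]:
    """Wald-Wolfowitz runs test for a sequence of two categories (True / False)."""
    labels = list(labels)
    n1 = sum(labels)          # size of the True category
    n2 = len(labels) - n1     # size of the False category
    if n1 == 0 or n2 == 0:
        raise ValueError("both categories must appear at least once")
    # A new run starts every time the category changes.
    runs = 1 + sum(a != b for a, b in zip(labels, labels[1:]))
    n = n1 + n2
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)  # divide by the standard deviation
    return {"n1": n1, "n2": n2, "runs": runs,
            "expected": expected, "variance": variance, "z": z}


# Even/odd example from the text: True marks an even number (E).
example = runs_test([c == "E" for c in "EEEOOEEOEOOOOEOEEEOOOEOEEOOEEEO"])
print(example["runs"])  # 16
# Two-sided decision at 5% significance (the normal approximation needs n1, n2 > 20).
print(abs(example["z"]) > 1.96)  # False
```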
2.5.7 Shannon Entropy
Shannon entropy, a very important concept in information theory, is named, according to Wu et al. (2012), after Claude Shannon, who introduced it in 1948 in his work "A Mathematical Theory of Communication".
This concept can be understood as "a measure of the uncertainty associated with a random variable. Specifically, Shannon entropy quantifies the expected value of the information contained in a message" (Wu et al., 2012, p. 2).
Shannon entropy, as a measure of probabilistic uncertainty, can be defined by the following expression:
Figure 12. Shannon Entropy Equation by Wu et al. (2012).
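A short sketch of the entropy H = −Σ p_i · log(p_i), where p_i is the relative frequency of each distinct value in the sequence. The logarithm base and any grouping of values into classes are assumptions here (base 2, i.e. bits, with no grouping); the text does not state which choices produced the values in Table 4, so this sketch is not expected to reproduce them. The function name `shannon_entropy` is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def shannon_entropy(values: Iterable[Hashable], base: float = 2.0) -> float:
    """H = -sum(p_i * log(p_i)), with p_i the relative frequency of each distinct value."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one value is required")
    # p * log(1/p) equals -p * log(p) and avoids returning -0.0 for a single symbol.
    return sum((c / total) * math.log(total / c, base) for c in counts.values())


print(round(shannon_entropy([1, 2, 3, 4]), 6))  # 2.0 (four equally likely symbols, in bits)
print(shannon_entropy("aaaa"))  # 0.0 (no uncertainty)
```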
2.5.8 Itamaracá
The name of the model proposed in this study, Itamaracá, comes from the Tupi-Guarani language, in which it means something like "stone that sings", a reference to something "random" or "unexpected".
Like every pseudo-random number generator, Itamaracá has some distinctive features in its mathematical algorithm. The model requires values for three seeds, S0, S1, and S2, as well as a value for N, the maximum desired value of the numbers "drawn" in the range 1 to N, where N ∈ ℕ.
Once the three seeds S0, S1, and S2 have been selected, the calculation is divided into two main and very simple steps: Pn (n Process) and Final Calculation.
Sections 2.5.8.1 and 2.5.8.2 below show the model as originally found, in its "crude" form; Section 2.5.8.3 presents the even more simplified version gradually derived from it, with ease of number generation and computational cost in mind.
2.5.8.1 Pn (n Process)
In this step, we take the absolute value of the differences between the three seeds, which "move" along the sequence as new numbers are generated.
Pn = ABS (S2 – S1 + S1 – S0)
Where,
Pn = n Process
ABS = Absolute Value
S0, S1 and S2 = seeds within the range of N, selected at the user's discretion
After this process, we need to obtain its final result.
2.5.8.2 Final Calculation
In this step, we multiply the result Pn obtained in the first step by the square root of a number Xrn, chosen so that the root is close to 2. This allows both even and odd numbers to be generated in the sequence and, at the same time, yields numbers well distributed within the range of N.
FRNSn = ABS {N - [Pn * SQUARE ROOT(Xrn)]}, with 1 < Xrn < 4
Where:
FRNSn = Value of the sequence of random numbers generated at step n.
ABS = Absolute value.
N = Maximum value within a range selected by a user criterion.
Pn = n Process
Xrn = Rational number with an arbitrary number of decimal places in the range 1 to 4.
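The two steps above can be sketched in Python as follows. This is a hypothetical implementation written for illustration, not the author's spreadsheet: it assumes each output is truncated to an integer and that the seeds shift as in the worked example of Section 3.1 (S0 ← S1, S1 ← S2, S2 ← new number). With the initial values of Table 1 and Xrn = 3.9 it reproduces the first two numbers of that example (2,831 and 6,976). Because it does not implement the optional "line break" used there for the third number, its output differs from the reported sequence from the third value on.

```python
import math


def itamaraca(s0: int, s1: int, s2: int, n_max: int, xrn: float, count: int) -> list[int]:
    """Sketch of the 'crude' Itamaracá: Pn step followed by the final calculation.

    Assumptions: outputs are truncated to integers, seeds shift as in the worked
    example of Section 3.1, and the optional "line break" is not modelled.
    """
    if not 1 < xrn < 4:
        raise ValueError("Xrn must lie strictly between 1 and 4")
    root = math.sqrt(xrn)  # between 1 and 2; the text suggests a value close to 2
    numbers = []
    for _ in range(count):
        pn = abs(s2 - s1 + s1 - s0)                # n Process
        frns = math.floor(abs(n_max - pn * root))  # final calculation
        numbers.append(frns)
        s0, s1, s2 = s1, s2, frns                  # seeds "move" along the sequence
    return numbers


print(itamaraca(4120, 1300, 490, n_max=10_000, xrn=3.9, count=2))  # [2831, 6976]
```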
2.5.8.3 Simplified version of Itamaracá
Since much of the literature points to computational cost as a concern, and the simpler a PRNG model is, the better, an even simpler way to obtain a sequence of random numbers, Simplified Itamaracá, is presented below.
The simplified version of Itamaracá, its final version and the one understood to be closest to computational feasibility, drops features that are unnecessary in the original version. In the n Process (Pn) step, the expression ABS (S2 – S1 + S1 – S0) is exactly equal to ABS (S2 – S0), since the S1 terms cancel out. Therefore, ABS (S2 – S0) can be used in the Pn step, achieving the same result with the smallest number of mathematical operations.
Likewise, instead of extracting the square root of an arbitrarily chosen value 1 < Xrn < 4, we can directly use a "fixed" rational number close to 2, with an arbitrary number of decimal places. This value is multiplied by Pn, and the product is subtracted from N, taking the absolute value. The simplified algorithm is as follows:
FRNSn = ABS [N – (Pn * Xrn)]
Where:
FRNSn = Value of the sequence of random numbers generated at step n.
ABS = Absolute value.
N = Maximum value within a range selected by a user criterion.
Pn = n Process
Xrn = “Fixed” rational number close to 2 with an arbitrary number of decimal places.
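A matching sketch of Simplified Itamaracá follows. It is a hypothetical implementation that assumes integer truncation and seeds shifting S0 ← S1, S1 ← S2, S2 ← new number, as in the worked example of Section 3.1. It additionally assumes 1 < Xrn < 2, which keeps every output between 0 and N; the text only asks for a value "close to 2". With Xrn set to √3.9 it produces exactly the same numbers as the crude formula, since ABS (S2 – S1 + S1 – S0) and ABS (S2 – S0) are identical.

```python
import math


def itamaraca_simplified(s0: int, s1: int, s2: int, n_max: int, x: float, count: int) -> list[int]:
    """Sketch of Simplified Itamaracá: FRNSn = ABS[N - (Pn * Xrn)] with Pn = ABS(S2 - S0).

    Assumptions: outputs are truncated to integers, seeds shift as in Section 3.1,
    and 1 < x < 2 so that every output stays within 0..N (hypothetical bound).
    """
    if not 1 < x < 2:
        raise ValueError("x should be a fixed value between 1 and 2, close to 2")
    numbers = []
    for _ in range(count):
        pn = abs(s2 - s0)  # S1 cancels out of S2 - S1 + S1 - S0
        frns = math.floor(abs(n_max - pn * x))
        numbers.append(frns)
        # S1 is no longer used in Pn, but it still becomes S0 in the next step.
        s0, s1, s2 = s1, s2, frns
    return numbers


# With x = sqrt(3.9) the output matches the crude version exactly.
print(itamaraca_simplified(4120, 1300, 490, n_max=10_000, x=math.sqrt(3.9), count=2))  # [2831, 6976]
```

Note that S1 still has to be stored: it no longer appears in Pn, but it becomes S0 in the following step.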
3 RESULTS AND DISCUSSIONS
3.1 Itamaracá Evaluation
To illustrate how the Itamaracá model works, the following initial values for N and for the seeds S0, S1, and S2 are used:
Table 1
Initial values for Itamaracá
N = 10,000
S0 = 4,120
S1 = 1,300
S2 = 490
The first four numbers generated are computed below:
1st number:
P1 = ABS (490 – 1300 + 1300 – 4120) = 3,630
FRNS1 = ABS {10,000 - [3,630 * SQUARE ROOT(3.9)]} = 2,831
2nd number:
P2 = ABS (2,831 – 490 + 490 – 1,300) = 1,531
FRNS2 = ABS {10,000 – [1,531 * SQUARE ROOT(3.9)]} = 6,976
3rd number:
Here, we can optionally introduce a "line break" between the differences of the seed values. In this case, P3 takes the form:
P3 = ABS (S2n+1 – S2 + S1 – S0)
Substituting the values into the algorithm:
P3 = ABS (6,976 – 2,831 + 490 – 1,300) = 3,335
FRNS3 = ABS {10,000 - [3,335 * SQUARE ROOT(3.9)]} = 3,415
The "line break" is optional and may or may not be inserted in the model, at the user's discretion. In principle, the statistical results do not differ much; its only purpose is to make the model harder to reverse, that is, to make it harder for someone to recover the initial seed values, which could matter if the model were used for data encryption, for example. The scientific community is invited to study this subject further.
4th number:
From the fourth generated number onward, we return to the original Pn.
P4 = ABS (3,415 – 6,976 + 6,976 – 2,831) = 583
FRNS4 = ABS {10,000 - [583 * SQUARE ROOT(3.9)]} = 8,848
The first four numbers generated by the Itamaracá algorithm were therefore 2,831, 6,976, 3,415, and 8,848.
3.1.1 Frequency and Chi-Square Analysis
As Table 2 shows, the numbers are well distributed: each class interval receives approximately 10% of the occurrences, as expected. Since each number is drawn at random, it is perfectly normal to observe some fluctuations above and below the expected probability. Therefore, Itamaracá can be expected to behave close to a uniform distribution when generating pseudo-random numbers.
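Table 2
Results of the chi-square test for 10,000 numbers generated by Itamaracá

| n | Class interval | Freq. (x) | Freq. (%) | Prob. | Expected freq. (C) | (A-C)^2/C |
|---|---|---|---|---|---|---|
| 1 | 0 to 1,000 | 1,018 | 10.18% | 1/10 | 1,000.1 | 0.32 |
| 2 | 1,001 to 2,000 | 991 | 9.91% | 1/10 | 1,000.1 | 0.08 |
| 3 | 2,001 to 3,000 | 994 | 9.94% | 1/10 | 1,000.1 | 0.04 |
| 4 | 3,001 to 4,000 | 985 | 9.85% | 1/10 | 1,000.1 | 0.23 |
| 5 | 4,001 to 5,000 | 1,000 | 10.00% | 1/10 | 1,000.1 | 0.00 |
| 6 | 5,001 to 6,000 | 1,033 | 10.33% | 1/10 | 1,000.1 | 1.08 |
| 7 | 6,001 to 7,000 | 1,008 | 10.08% | 1/10 | 1,000.1 | 0.06 |
| 8 | 7,001 to 8,000 | 999 | 9.99% | 1/10 | 1,000.1 | 0.00 |
| 9 | 8,001 to 9,000 | 999 | 9.99% | 1/10 | 1,000.1 | 0.00 |
| 10 | 9,001 to 10,000 | 974 | 9.74% | 1/10 | 1,000.1 | 0.68 |
| Σ | | 10,000 | 100.00% | | | 2.50 |

A = observed frequency, Freq. (x); C = expected frequency.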
In addition to the frequency distribution analysis, the algorithm was also evaluated with the chi-square test.
Considering a 95% confidence level and dividing the range of N into 10 class intervals, we obtain 9 degrees of freedom. Thus, for the algorithm to be considered valid under the uniformity criterion, the chi-square value cannot exceed the critical value of 16.92.
Table 2 shows that the Itamaracá result for 10,000 generated numbers was 2.50. Therefore, we can conclude that it also tends to pass the test.
3.1.2 Repeated Numbers
Of the 10,000 numbers generated by the Itamaracá algorithm, 3,694 were repeated numbers.
If a pseudo-random number generator follows a uniform distribution, whether between 0 and 1 or over whole numbers in a range, every number within the interval has the same probability of being drawn at each step (1/N for N possible values), regardless of whether it has appeared before.
Thus, it is natural that both PRNGs and TRNGs produce some repeated values.
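The claim that repeats are natural can be quantified. The sketch below assumes that "repeated numbers" means the number of draws minus the number of distinct values (the text does not define the count precisely) and that every value in the range is equally likely. Under those assumptions, drawing 10,000 numbers from 10,000 equally likely values is expected to produce about 3,679 repeats, close to the counts reported for Itamaracá, RandBetween and Random Org in Table 4 (3,653 to 3,763). The function name `expected_repeats` is hypothetical.

```python
def expected_repeats(draws: int, possible_values: int) -> float:
    """Expected count of repeated draws (draws minus distinct values) under ideal uniform sampling.

    Each of the `possible_values` values is missed by every draw with probability
    (1 - 1/possible_values) ** draws, which gives the expected number of distinct values.
    """
    if draws < 0 or possible_values < 1:
        raise ValueError("draws must be >= 0 and possible_values >= 1")
    expected_distinct = possible_values * (1 - (1 - 1 / possible_values) ** draws)
    return draws - expected_distinct


print(round(expected_repeats(10_000, 10_000)))  # 3679
```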
3.1.3 Run Test (Even/Odd)
To apply the Run Test, the following first 100 random numbers generated were considered: 2,831; 6,976; 3,415; 8,848; 6,303; 4,296; 1,010; 453; 2,412; 7,232; 3,386; 8,076; 8,333; 231; 5,493; 4,390; 1,785; 2,678; 6,619; 455; 5,609; 8,006; 4,912; 8,624; 8,778; 2,365; 2,361; 2,672; 9,392; 3,885; 7,606; 6,472; 4,890; 4,636; 6,374; 7,070; 5,194; 7,670; 8,815; 2,848; 478; 6,464; 2,861; 5,295; 7,693; 457; 446; 4,312; 2,388; 6,164; 6,342; 2,192; 2,156; 1,734; 9,095; 3,703; 6,112; 4,109; 9,198; 3,906; 9,599; 9,208; 471; 8,027; 7,668; 4,215; 2,471; 264; 2,198; 9,461; 8,162; 1,778; 5,172; 4,095; 5,426; 9,499; 672; 613; 7,548; 3,579; 4,141; 3,271; 9,391; 368; 4,266; 120; 9,512; 359; 9,528; 9,967; 8,975; 8,907; 7,906; 7,889; 7,990; 9,834; 6,158; 6,383; 3,184; 4,126.
The Run Test was performed at a 5% significance level (95% confidence), for which the critical value is ±1.96: Z values above +1.96 or below -1.96 lead to rejecting the hypothesis that the numbers are independent.
Considering the alternation between even and odd numbers, the following results were obtained:
n1 (Odd) = 41
n2 (Even) = 59
Number of runs: 50
Expected number of runs = 49.38
Variance of the number of runs = 23.15 (standard deviation ≈ 4.81)
Z test = 0.128859
3.1.4 Run Test (Median)
For the Run Test based on the alternation between numbers above and below the median, the median of this sequence is 5,042. Therefore, numbers below 5,042 are coded as "0" and numbers above 5,042 as "1".
Using the same first 100 numbers as in the Run Test (Even/Odd), the following results were obtained:
n1 (0) = 50
n2 (1) = 50
Number of runs: 53
Expected number of runs = 51
Variance of the number of runs = 24.75 (standard deviation ≈ 4.97)
Z test = 0.402015
3.1.5 Autocorrelation
Given a sample of 10,000 generated numbers, we get the following autocorrelation results for up to
10 lags:
Table 3
Autocorrelation observed in Itamaracá through 10,000 generated numbers

| k lags | Autocorrelation (-1 to +1) |
|---|---|
| Lag 0 | 1 |
| Lag 1 | -0.011056 |
| Lag 2 | 0.014225 |
| Lag 3 | 0.010617 |
| Lag 4 | -0.006973 |
| Lag 5 | -0.020135 |
| Lag 6 | -0.008179 |
| Lag 7 | 0.114837 |
| Lag 8 | -0.016052 |
| Lag 9 | -0.009193 |
| Lag 10 | -0.011365 |
As Table 3 shows, the autocorrelation values are generally very close to zero, which, according to the literature, indicates a high level of independence. Lag 7 shows a somewhat higher value than the others; squaring it gives 0.0132, meaning that only about 1.32% of the variance of a generated number would be explained by the number generated seven steps earlier.
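Lag-k autocorrelation values such as those in Table 3 can be computed with the sketch below. It uses the common sample estimator r_k = Σ (x_t − m)(x_{t+k} − m) / Σ (x_t − m)², where m is the sample mean; the text does not state which estimator was set up in Excel, so this choice is an assumption. The last line repeats the squaring of the lag 7 value discussed above. The function name `autocorrelation` is hypothetical.

```python
from typing import Sequence


def autocorrelation(values: Sequence[float], lag: int) -> float:
    """Sample autocorrelation r_k = sum_t (x_t - m)(x_{t+k} - m) / sum_t (x_t - m)^2."""
    n = len(values)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(values)")
    m = sum(values) / n
    denominator = sum((v - m) ** 2 for v in values)
    if denominator == 0:
        raise ValueError("autocorrelation is undefined for a constant sequence")
    numerator = sum((values[t] - m) * (values[t + lag] - m) for t in range(n - lag))
    return numerator / denominator


print(autocorrelation([1, 2, 3, 4, 5], lag=1))  # 0.4
print(round(0.114837 ** 2, 4))  # 0.0132: share of variance linked to lag 7 in Table 3
```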
3.1.6 Some considerations about Itamaracá
Itamaracá, in both its "crude" version and its even more simplified version (which behave identically), has in general shown itself to be a good random number generator, especially in the criteria that evaluate independence and uniformity. Applications with values much higher than those demonstrated in this paper have also produced statistical results similar to those obtained in this study. Another point worth highlighting is that no rule was observed for choosing the seed values: it is enough to choose them arbitrarily within the range 1 to N, where N ∈ ℕ is their maximum value.
Like every pseudo-random number generator, Itamaracá has some identified limitations. For example, at some point, probably after a large quantity of numbers has been generated, the same sequence of numbers may start to repeat: because the seeds move along the sequence, once the most recently generated numbers coincide, in the same order, with the initial seeds (or any earlier set of seeds, S2 to S0), the generator starts a new cycle that repeats the numbers generated in the period that has ended.
3.2 Comparing the results of the proposed model with the RandBetween function by Microsoft Excel and the TRNG by the Random Org platform
Table 4
Comparing the results between Itamaracá, RandBetween and TRNG by Random Org considering 10,000 numbers generated

| | Itamaracá | RandBetween | Random Org |
|---|---|---|---|
| Chi-Square Test | 2.50 | 8.56 | 3.65 |
| Repeated numbers/N | 3,694 | 3,653 | 3,763 |
| Average; Standard Deviation | 5,084; 2,867 | 5,005; 2,890 | 4,925; 2,905 |
| Run Test (Even/Odd) | 0.128859 | 1.047364 | 0.004101 |
| Run Test (Median) | 0.402015 | 0.808377 | 0.603023 |
| Autocorrelation (Average of the first 10 lags different from 0) | 0.002827 | -0.006046 | 0.000980 |
| Shannon Entropy | 3.45355 | 3.45355 | 3.45284 |
Applying the same criteria used for Itamaracá, we observe that all models passed the frequency distribution analysis and the subsequent chi-square test, with Itamaracá showing the lowest, and therefore most desirable, value.
The second aspect analyzed was the number of repeated values in the list when the quantity of numbers generated equals N, in this case 10,000. All three models produced similar results. The TRNG showed a slightly higher value than the others, which is unexpected at first but still within normal variation, since every number in the range has the same chance of appearing in events that follow a uniform distribution.
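Reading "Repeated numbers/N" as the number of draws that repeat a value already drawn (that is, N minus the number of distinct values) is an assumption, but under that reading the expected count for N independent uniform draws from 1 to N can be computed exactly. The expected number of distinct values is N(1 − (1 − 1/N)^N), so the expected number of repeats is N(1 − 1/N)^N, which tends to N/e and is about 3,679 for N = 10,000.

```python
from typing import Sequence


def repeated_count(values: Sequence[int]) -> int:
    """Number of draws that repeat a value already seen: total minus distinct."""
    return len(values) - len(set(values))


def expected_repeated(n_draws: int, n_max: int) -> float:
    """Expected repeated_count for n_draws independent uniform draws from 1..n_max."""
    # A given value is missed by every draw with probability (1 - 1/n_max) ** n_draws.
    expected_distinct = n_max * (1 - (1 - 1 / n_max) ** n_draws)
    return n_draws - expected_distinct


print(repeated_count([4, 7, 4, 4, 9]))           # 2
print(round(expected_repeated(10_000, 10_000)))  # 3679
```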
Regarding the mean and standard deviation, for a uniform distribution of numbers the mean should ideally be close to the median, which is equal to 5,042. All the algorithms analyzed produced means very close to the median, with the RandBetween function standing out. The respective standard deviations are also within the expected range, especially when compared with the TRNG.
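For reference, integers drawn uniformly from 1 to N have theoretical mean (N + 1)/2 and standard deviation √((N² − 1)/12), that is, 5,000.5 and about 2,886.75 for N = 10,000. The sketch below computes the sample statistics alongside these reference values. Whether Table 4 uses the sample or the population standard deviation is not stated, so the population form used here is an assumption; for 10,000 values the two differ by only about 0.005%.

```python
import statistics
from typing import Sequence, Tuple


def sample_summary(values: Sequence[int]) -> Tuple[float, float, float]:
    """Mean, population standard deviation and median of the generated values."""
    return statistics.fmean(values), statistics.pstdev(values), statistics.median(values)


def uniform_reference(n_max: int) -> Tuple[float, float]:
    """Theoretical mean and standard deviation of a discrete uniform on 1..n_max."""
    mean = (n_max + 1) / 2
    sd = ((n_max ** 2 - 1) / 12) ** 0.5
    return mean, sd


mean, sd = uniform_reference(10_000)
print(mean, round(sd, 2))  # 5000.5 2886.75
```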
In the Run Test, considering both the alternation between odd and even numbers and the alternation between numbers below and above the median, all models passed the test. A 95% confidence level (5% significance level) was considered, for which the critical values are ±1.96; a statistic whose absolute value exceeds 1.96 would reject the hypothesis that the numbers are independent.
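The Run Test described above can be sketched with the Wald-Wolfowitz runs statistic: each number is labelled with one of two categories (even/odd, or above/below the median), the runs of equal labels are counted, and the count R is standardized with its mean and variance under the hypothesis of independence. Dropping values equal to the median is a common convention and an assumption here, since the section does not say how ties were handled.

```python
import math
import statistics
from typing import List, Sequence


def runs_z(labels: Sequence[bool]) -> float:
    """Wald-Wolfowitz runs test: z statistic for a sequence of two categories."""
    labels = list(labels)
    n1 = sum(labels)              # number of True labels
    n2 = len(labels) - n1         # number of False labels
    if n1 == 0 or n2 == 0:
        raise ValueError("both categories must occur")
    n = n1 + n2
    runs = 1 + sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)
    mu = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    if var <= 0:
        raise ValueError("too few observations to estimate the variance")
    return (runs - mu) / math.sqrt(var)


def even_odd_labels(values: Sequence[int]) -> List[bool]:
    """True for even numbers, False for odd numbers."""
    return [v % 2 == 0 for v in values]


def median_labels(values: Sequence[int]) -> List[bool]:
    """True above the median, False below it; ties with the median are dropped."""
    med = statistics.median(values)
    return [v > med for v in values if v != med]


def passes_at_5_percent(z: float) -> bool:
    """Two-sided test: independence is rejected when |z| > 1.96."""
    return abs(z) <= 1.96
```

For example, a sequence of ten even numbers followed by ten odd ones has only two runs and gives z ≈ −4.14, well outside ±1.96, so independence would be rejected.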
The autocorrelation values of all three models are also very close to zero. The best result, however, is that of the Random Org TRNG, which is consistent with its nature: lower values are expected from it than from PRNGs such as Ita and the RandBetween function by Microsoft Excel.
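The autocorrelation row of Table 4 averages the coefficients of the first 10 nonzero lags. The sketch below uses the standard sample autocorrelation r_k = Σ_{t=1}^{n−k} (x_t − x̄)(x_{t+k} − x̄) / Σ_{t=1}^{n} (x_t − x̄)²; the exact estimator used for the table is not stated, so this choice is an assumption.

```python
from typing import Sequence


def autocorrelation(values: Sequence[float], lag: int) -> float:
    """Sample autocorrelation at the given lag (mean-centred, normalised by lag 0)."""
    n = len(values)
    if not 1 <= lag < n:
        raise ValueError("lag must satisfy 1 <= lag < len(values)")
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant sequence")
    num = sum((values[t] - mean) * (values[t + lag] - mean) for t in range(n - lag))
    return num / denom


def mean_autocorrelation(values: Sequence[float], max_lag: int = 10) -> float:
    """Average of r_1 .. r_max_lag, as in the Table 4 row (lag 0 is excluded)."""
    return sum(autocorrelation(values, k) for k in range(1, max_lag + 1)) / max_lag
```

A value near zero indicates no linear dependence between a number and the numbers that follow it at short lags.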
Regarding Shannon entropy, both Ita and RandBetween, which are pseudo-random number generators, have entropy values very close to the value obtained by the TRNG. This is a positive point for both.
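Shannon entropy, H = −Σ p_i log₂ p_i over the relative frequencies p_i of the observed symbols, can be computed as in the sketch below. The reference list points to an online calculator (Kozlowski) rather than to a specific procedure, so which symbols were counted for Table 4 is an assumption; the function accepts either the generated numbers themselves or the characters of their text representation. For comparison, a text over 11 distinct symbols (for example ten digits and one separator) has at most log₂ 11 ≈ 3.459 bits per symbol, which is of the same magnitude as the values in Table 4.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def shannon_entropy(symbols: Iterable[Hashable]) -> float:
    """Shannon entropy in bits per symbol of the empirical distribution of `symbols`."""
    counts = Counter(symbols)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("entropy of an empty sequence is undefined")
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


numbers = [3, 14, 15, 92, 65]
print(shannon_entropy(numbers))                       # entropy over the values
print(shannon_entropy(",".join(map(str, numbers))))  # entropy over the characters
```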
When Itamaracá, the proposed model, is compared with RandBetween and the Random Org TRNG, one model may sometimes present “better” results than the others even though all of them are within acceptable limits. This is natural, and the outcome changes with each new simulation: with new seeds and new parameters for pseudo-random number generators such as Itamaracá and RandBetween by Microsoft Excel, and with each new data retrieval for TRNGs such as Random Org. The sample size considered can also affect the results.
The results summarized above show that all three generators passed the statistical tests analyzed.
3.3 Comparing the graphical visualizations of the proposed model, the RandBetween function by Microsoft Excel and the TRNG by the Random Org platform
Figure 13. Scatter Plot for Ita.
Figure 14. Scatter Plot for RandBetween.
Figure 15. Scatter Plot for TRNG.
Figure 16. Line Graph for Ita.
Figure 17. Line Graph for RandBetween.
Figure 18. Line Graph for TRNG.
Figure 19. Histogram for Ita.
Figure 20. Histogram for RandBetween.
Figure 21. Histogram for TRNG.
The figures in this section show that all the compared models display characteristics consistent with reliable random number generators.
CONCLUSION
The generation of random numbers is very important for several fields of study and for practical applications that contribute to the development of humankind. The present study presented a new and simple Pseudo Random Number Generator (PRNG) called "Itamaracá" (Ita in abbreviated form). Like all PRNG algorithms, the Ita model has some limitations, but overall it showed good results in the statistical tests considered. Thus, as one more model in the portfolio, it is fully available for use and, above all, for new studies, especially those applied to specific objectives and real problems.
REFERENCES
Ander-Egg, Ezequiel. Introducción a las Técnicas de Investigación Social. [Introduction to Social
Research Techniques]. Buenos Aires: Editorial HVMANITAS. 1971.
Bencardino, C.M. Estadística Básica Aplicada [Basic Applied Statistics]. Bogotá, D.C.: Ecoe
Editions. 2012.
Das, S., Maity, K., Bhattacharjee, K. (2018). A Search for Good Pseudo-random Number Generators:
Survey and Empirical Studies. [Preprint submitted to Elsevier]. Department of Information
Technology, Indian Institute of Engineering Science and Technology.
Knuth, D. The Art of Computer Programming, Vol. 2: Seminumerical Algorithms. Reading, MA: Addison-Wesley. 1969.
Herny Ramadhani Mohd Husny Hamid, Norhaiza Ya Abdullah. Physical Authentication Using Random Number Generated (RNG) Keypad Based on One Time Pad (OTP) Concept. 2015 Fourth International Conference on Cyber Security, Cyber Warfare, and Digital Forensic (CyberSec). 2015.
Kozlowski, L. Shannon entropy calculator. www.shannonentropy.netmark.pl
Lacerda, W.S., Freitas, M.E.A., Pereira, A.R., Jr. Geração de Números Aleatórios [Random Number Generation]. Sinergia, v.3, n.2, p.154-161. 2002.
https://www.repositorio.ufop.br/handle/123456789/1643?locale=pt_BR
Levine, D.M., Stephan, D.F., Krehbiel, T.C., Berenson, M.L. Estatística Teoria e Aplicações usando o Microsoft Excel em Português [Statistics Theory and Applications using Microsoft Excel in Portuguese]. Rio de Janeiro: LTC. 2013.
Randomness. (2021, October 27). In Wikipedia. https://en.wikipedia.org/wiki/Randomness
Rosa, C.A. (2016). Números Aleatórios: Geração, Qualidade e Aplicações [Random Numbers: Generation, Quality and Applications]. Universidade Federal do ABC.
Stevenson, W.J. Estatística Aplicada a Administração [Statistics Applied to Management]. São
Paulo: HARBRA. 1981.
Vieira, C.E.C., Ribeiro, C.C., Souza, R.C. Geradores de Números Aleatórios [Random Number
Generators]. Rio de Janeiro: PUC Rio. 2004.
W. Yue, Y. Zhou, G. Saveriades, S. Agaian, J. P. Noonan, and P. Natarajan, "Local Shannon entropy measure with statistical tests for image randomness," Inf. Sci., vol. 222, pp. 323–342, Feb. 2013.