FOUNDATIONAL STATISTICAL METHODS IN COMPARATIVE DESIGN FOR SIMULATION EXPERIMENTS

Ahmed Tolba¹ and Maximilian Selmair²
¹ "Friedrich List" Faculty of Transport and Traffic Sciences, Technical University of Dresden, Email: ahmed@tolba.xyz
² Email: maximilian@selmair.de
KEYWORDS
Modeling and Simulation, Experimental Design, Comparative Experiments, Statistical Methods, Two-Sample T-Test, One-Way ANOVA
Abstract

This study presents a comprehensive examination of the application of traditional statistical methods to simulation modeling within the hypothetical context of comparing manual and automated production lines in manufacturing. Through a detailed methodology involving the AnyLogic simulation platform and Minitab for statistical analysis, we emphasize the significance of power analysis, two-sample t-tests, and one-way ANOVA in validating and optimizing simulation models. Our hypothetical findings demonstrate the potential of statistical analysis to identify significant efficiency improvements, with a particular focus on the implications of process modifications on automated production lines. The primary contribution of this research lies in illustrating the practical application of statistical tools in simulation studies, serving as a manual for simulation modelers in logistics and manufacturing sectors. By foregrounding the statistical methods over specific operational improvements, this study aims to bridge the gap in literature regarding the integration of foundational statistical analysis within simulation modeling, offering valuable insights for enhancing decision-making and optimization in manufacturing simulations.
INTRODUCTION AND HISTORICAL PERSPECTIVE
In the dynamic realms of logistics and manufacturing, simulation modeling has emerged as a cornerstone technique, enabling practitioners to navigate complex systems and optimize operational efficiency. Recent advancements in these sectors underscore the growing need for robust statistical methods to validate and refine simulation models.

However, a significant gap persists in the literature concerning the direct application of foundational statistical methods, such as hypothesis testing, confidence intervals, ANOVA, and sample size choice, within simulation contexts specific to logistics and manufacturing. While contemporary research often gravitates towards the integration of advanced machine learning methods in simulations, as highlighted in studies like "Comparing statistical and machine learning methods for time series forecasting in data-driven logistics" (Schmid et al., 2023), or delves into combining optimization algorithms with simulation techniques for complex scenarios, the application of traditional statistical methods remains underexplored.

This paper seeks to bridge this gap, illustrating the practical application and significance of these traditional statistical tools in simulation models relevant to logistics and manufacturing. By linking theoretical statistical concepts to industry-specific challenges and applications, this study aims to enhance the toolkit of simulation modelers, thereby augmenting decision-making and optimization capabilities in these sectors. The inclusion of case studies and real-world scenarios from logistics and manufacturing will further demonstrate the necessity and impact of robust statistical analysis in simulations, offering valuable insights and practical guidance to professionals in this field.
LITERATURE REVIEW

The integration of statistical methods into simulation modeling for logistics and manufacturing has been pivotal in advancing operational efficiencies. Yet, the landscape of academic research reveals a pronounced tilt towards sophisticated, often machine learning-based, analytical techniques at the expense of traditional statistical methods. Notwithstanding the proven efficacy of machine learning in enhancing simulation outcomes, this bias overlooks the fundamental importance and applicability of hypothesis testing, ANOVA, and confidence intervals, tools that offer nuanced insights into model validation and optimization.

Emerging research underscores the promise of blending advanced algorithms with simulation to tackle the inherent complexities of modern logistical and manufacturing systems (Tolk, 2022). However, this progressive stride inadvertently minimizes the dialogue around traditional statistical methods, which are critical in grounding simulation models in robust analytical frameworks.

The literature review identifies a notable research gap: the nuanced application of traditional statistical tools in enhancing the precision and validity of simulation models within the logistical and manufacturing domains. By spotlighting this oversight, the review underscores the imperative to reintegrate foundational statistical analysis into the discourse, advocating for a balanced approach that leverages both cutting-edge and time-honored methodologies to enrich simulation modeling practices.
BACKGROUND

Sample Size Choice

In the context of simulation modeling, particularly for logistics and manufacturing, determining the appropriate sample size is crucial for ensuring the accuracy and reliability of the results. The lack of known population parameters, such as the standard deviation, necessitates the use of the t-distribution for calculating sample sizes (Serdar et al., 2021). Theoretically, the correct sample size helps in accurately capturing the dynamics of the system being modeled, thus making the simulation outcomes statistically valid and representative of real-world scenarios (Liu et al., 2014).
Formula and Practical Application with T-Distribution

Given that the population standard deviation is usually unknown in simulation contexts, the t-distribution is used for calculating both the sample size and the margin of error. The formula for determining the sample size is:

$$n = \left( \frac{t_{\alpha/2,\,df} \cdot s}{E} \right)^2 \quad (1)$$

Here, $n$ is the required sample size, $t_{\alpha/2,\,df}$ is the t-value from the t-distribution for the desired confidence level and degrees of freedom ($df$), $s$ is the estimated standard deviation from a pilot study or previous data, and $E$ is the margin of error (Jones et al., 2003).

The margin of error formula, which uses the t-distribution, is:

$$E = t_{\alpha/2,\,df} \cdot \frac{s}{\sqrt{n}} \quad (2)$$

This margin of error indicates the maximum acceptable difference between the sample statistic and the population parameter, providing a measure of precision for the simulation results.
Implementing Sample Size Calculation in a Real-World Scenario

Consider a manufacturing company evaluating the implementation of an automated production line. They start with a pilot study, simulating a few weeks of operation to estimate the standard deviation $s$ of the production throughput. Based on this preliminary data and the desired precision for their throughput estimates, they use the above t-distribution formulas to calculate the required sample size for their full-scale simulation. This calculation will guide them on the number of additional production weeks they should simulate to confidently assess the performance of the automated line.

This methodical approach ensures that the simulation modeling undertaken by the company is robust, providing reliable data to inform strategic decisions about the new production line's efficiency and effectiveness.
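Because the t-value in Equation (1) depends on the degrees of freedom, and therefore on the unknown $n$, the calculation is naturally iterative. The following Python sketch illustrates one way to carry it out; the pilot-study values ($s$ = 90 units, $E$ = 25 units) are hypothetical and serve only to demonstrate the mechanics:

```python
import math

from scipy import stats

def required_sample_size(s, E, confidence=0.95, n0=2, max_iter=100):
    """Iteratively solve n = (t_{alpha/2, df} * s / E)^2.

    The t-value depends on df = n - 1, which in turn depends on n,
    so we start from an initial guess and iterate until n stabilises.
    """
    alpha = 1 - confidence
    n = n0
    for _ in range(max_iter):
        t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
        n_new = max(math.ceil((t_crit * s / E) ** 2), 2)
        if n_new == n:
            return n
        n = n_new
    return n

# Hypothetical pilot-study values: s = 90 units, margin of error E = 25 units
print(required_sample_size(s=90.0, E=25.0))
```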
Hypothesis Testing

Hypothesis testing in simulation modeling is a statistical method used to make inferences about the population from which the simulation samples are drawn. It is a critical step in validating the assumptions and outcomes of the model (Kim, 2015). The process involves setting up a null hypothesis ($H_0$), which posits no effect or no difference, and an alternative hypothesis ($H_1$), suggesting the presence of an effect or a difference. This method is applied in logistics and manufacturing simulations to assess the efficacy of different operational strategies and the impact of changes in system parameters (Beal, 1989).
Formulas and Calculation

The process includes formulating hypotheses, selecting an appropriate test, and calculating the test statistic. For a t-test, the test statistic is calculated using the formula:

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \quad (3)$$

where $\bar{x}$ is the sample mean, $\mu_0$ is the hypothesized population mean, $s$ is the sample standard deviation, and $n$ is the sample size. The p-value is then determined to assess the evidence against the null hypothesis.
Implementation in a Real-World Scenario

In practice, a manufacturing company may use hypothesis testing to compare the efficiency of automated versus manual production lines. After simulating both scenarios, they would calculate the mean efficiencies, determine the test statistic, and calculate the p-value. A p-value less than the significance level, such as 0.05, would lead them to reject the null hypothesis and consider the automated line more efficient, providing a statistically sound basis for potential implementation.
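As a minimal illustration of this workflow in code, the two-sample analogue of Equation (3) can be computed with SciPy; the throughput samples below are simulated placeholders, not the study's data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
# Hypothetical weekly throughput samples from the two simulated lines
manual = rng.normal(loc=1020, scale=73, size=25)
auto = rng.normal(loc=1215, scale=72, size=25)

# Two-sample t-test; H0: equal means, H1: means differ
t_stat, p_value = stats.ttest_ind(manual, auto, equal_var=True)
print(f"t = {t_stat:.2f}, p = {p_value:.4g}")
if p_value < 0.05:
    print("Reject H0: the lines differ significantly in mean throughput.")
```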
Confidence Intervals

Confidence intervals are a critical statistical tool in simulation modeling, offering a range within which we can be reasonably sure that a population parameter resides. In logistics and manufacturing simulations, confidence intervals help modelers gauge the precision and reliability of estimates such as average production times, system throughput, or demand levels, providing a quantifiable measure of variability and uncertainty (Althubaiti, 2023).
Formulas and Calculation

To construct a confidence interval around a sample mean, we follow these steps:

1. Calculate the sample mean ($\bar{x}$), representing the average of the simulation outputs.
2. Estimate the standard error as $SE = s / \sqrt{n}$, where $s$ is the sample standard deviation and $n$ is the sample size.
3. Determine the margin of error by multiplying the standard error with the t-value corresponding to the desired confidence level ($t_{\alpha/2,\,df}$).
4. The confidence interval is then $\bar{x} \pm \text{Margin of Error}$.
Implementation in a Real-World Scenario

Consider a manufacturing company estimating the average throughput of an automated production line. After simulating the line for a set number of weeks, they calculate the sample mean throughput and standard error. By referencing the t-distribution table for a 95 % confidence level, they find the t-value to determine the margin of error, constructing a confidence interval that provides insights into the expected range of true average throughput.
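The four steps above translate directly into a few lines of Python; the throughput values here are invented for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical weekly throughput outputs from the simulated automated line
throughput = np.array([1190.0, 1225.0, 1248.0, 1175.0, 1210.0,
                       1232.0, 1198.0, 1241.0, 1205.0, 1220.0])

n = len(throughput)
mean = throughput.mean()                      # step 1: sample mean
se = throughput.std(ddof=1) / np.sqrt(n)      # step 2: standard error s / sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)         # step 3: t-value for a 95 % CI
margin = t_crit * se                          # step 3: margin of error

# Step 4: the interval is mean +/- margin of error
print(f"95 % CI: ({mean - margin:.1f}, {mean + margin:.1f})")
```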
ANALYSIS OF VARIANCE

ANOVA, or Analysis of Variance, is a statistical technique utilized in simulation modeling to compare the means of three or more independent groups. In the realm of logistics and manufacturing simulations, ANOVA is crucial for evaluating various scenarios, strategies, or configurations, determining whether differences in simulation parameters result in significant outcome disparities in aspects such as production efficiency, cost, or time. Essentially, ANOVA assesses if group means originate from the same population or from different ones (Corty and Corty, 2011).
Formulas and Calculation

The ANOVA process includes:

1. Calculating group and overall means.
2. Computing the sum of squares for between-group variability (SSB) and within-group variability (SSW) using:

$$SSB = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2$$

$$SSW = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$$

where $n_i$ is the number of observations in group $i$, $\bar{x}_i$ is the mean of group $i$, $x_{ij}$ is the $j$th observation in group $i$, and $\bar{x}$ is the overall mean.

3. Calculating the F-statistic:

$$F = \frac{MSB}{MSW}$$

with mean squares calculated by:

$$MSB = \frac{SSB}{k-1}, \qquad MSW = \frac{SSW}{N-k}$$

where $k$ is the number of groups and $N$ is the total number of observations.

4. Comparing the F-statistic to the critical F-value to determine significance.
Implementation in a Real-World Scenario

In practice, a manufacturing company could apply ANOVA to compare the efficiencies of various production line configurations. By calculating the F-statistic and referencing the F-distribution, they can discern whether efficiency differences are statistically significant, thereby informing decisions on the optimal configuration for their production line.
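A short sketch can make the SSB/SSW arithmetic above concrete and cross-check it against a library routine; the efficiency samples are hypothetical:

```python
import numpy as np
from scipy import stats

# Hypothetical efficiency samples for three production line configurations
groups = [
    np.array([95.1, 96.4, 94.8, 95.9, 96.2]),   # configuration A
    np.array([97.3, 98.1, 96.9, 97.8, 98.4]),   # configuration B
    np.array([95.5, 96.0, 95.2, 96.3, 95.8]),   # configuration C
]

# Manual computation of SSB, SSW and F, mirroring the formulas above
grand_mean = np.concatenate(groups).mean()
ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)
k, N = len(groups), sum(len(g) for g in groups)
f_manual = (ssb / (k - 1)) / (ssw / (N - k))

# Cross-check against SciPy's one-way ANOVA
f_scipy, p_value = stats.f_oneway(*groups)
print(f"F (manual) = {f_manual:.2f}, F (scipy) = {f_scipy:.2f}, p = {p_value:.4f}")
```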
METHODOLOGY

This study sets out to compare the operational efficiency of a proposed automated manufacturing line against the existing manual production line, utilizing a simulation-based approach for initial assessment and statistical analysis for validation.
Development of the Simulation Model

The simulation model, constructed within the AnyLogic environment, draws inspiration from the engineering design specifications of the proposed automated line. The primary focus of this model is to simulate the weekly output, providing a quantitative basis for comparison against the manual line's existing performance data, which is presumed accurate and comprehensive.
Statistical Analysis of Simulation Outputs

Upon completion of the simulation runs, the resultant data was systematically prepared for analysis in Minitab. This preparation phase ensured data integrity and compatibility for subsequent statistical evaluations.
Comparative Analysis Using Two-Sample T-Test

Central to our investigation is the determination of whether the automated line demonstrates a statistically significant improvement in throughput. This inquiry necessitated a two-sample t-test (Kim and Park, 2019), preceded by a meticulous calculation to establish an appropriate sample size that guarantees a 95 % confidence level (Lakens, 2022). Minitab's robust analytical capabilities facilitated this process, enabling a thorough examination and interpretation of the simulation outputs.

The specific steps undertaken in Minitab included:

1. Power and sample size calculation for the two-sample t-test, considering a significant difference of 100 units and a power value of 0.95, with the pooled standard deviation used as an estimate for variability (an open-source equivalent is sketched after this list).
2. Execution of the two-sample t-test with an alternative hypothesis that the means are not equal, maintaining a 95 % confidence level throughout the testing process.
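For readers without Minitab, a roughly equivalent sample size calculation can be sketched with statsmodels. The standardized effect size is the targeted difference divided by the pooled standard deviation; the value of 90.55 units is the estimate reported in the Results:

```python
from statsmodels.stats.power import TTestIndPower

# Standardized effect size: targeted difference / pooled standard deviation
effect_size = 100 / 90.55

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=effect_size, alpha=0.05,
                                   power=0.95, alternative="two-sided")
print(f"required sample size per group: {n_per_group:.1f}")
```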
Evaluation of Process Modifications

A year following the hypothetical implementation of the automated line, deemed superior in terms of throughput, the study extends to assess the impact of three specific modifications introduced to one of its processes. Leveraging actual performance data from the operational automated line, the study conducted a one-way ANOVA test in Minitab. The aim was to ascertain the statistical significance of each modification on the line's efficiency. The process was consistent with the initial comparative analysis, involving critical sample size determination to ensure the reliability of the ANOVA test results (Hasan et al., 2020).
The sequential steps in Minitab for this phase were:

1. Power and sample size calculation for one-way ANOVA, accounting for four levels of the factor under consideration and using the standard deviation of the automated line, with a 95 % confidence level.
2. Performance of one-way ANOVA under the assumption of equal variances across all groups and a two-sided 95 % confidence interval (Chandrakantha, 2014).
3. Implementation of Tukey's post-hoc test in cases of significant findings, to make pairwise comparisons between the groups at a 95 % confidence level (Midway et al., 2020), as illustrated in the sketch after this list.
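As a hedged open-source counterpart to step 3, statsmodels provides Tukey's HSD directly; the group names mirror those used later in the Results, while the simulated outputs themselves are placeholders:

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(seed=7)
# Hypothetical weekly outputs for the baseline and three adjustments
data = np.concatenate([
    rng.normal(1200, 84, 90),   # auto (baseline)
    rng.normal(1197, 73, 90),   # adj1
    rng.normal(1192, 85, 90),   # adj2
    rng.normal(1313, 91, 90),   # adj3
])
labels = np.repeat(["auto", "adj1", "adj2", "adj3"], 90)

# Tukey's HSD: pairwise comparisons at a family-wise 95 % confidence level
result = pairwise_tukeyhsd(endog=data, groups=labels, alpha=0.05)
print(result.summary())
```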
The methodology outlined herein provides a comprehensive framework for not only comparing the manual and automated production lines through simulation and statistical analysis but also for evaluating the effectiveness of subsequent process improvements in the automated line. By detailing the sequential steps employed in both AnyLogic and Minitab, this section aims to offer a clear and structured approach to undertaking such comparative analyses.
RESULTS

The analysis performed aimed to rigorously evaluate the efficiency differences between a manual production line and an automated line, as well as to assess the impact of specific process modifications to the automated line. This section presents the findings from the power and sample size calculations, the two-sample t-test, the one-way ANOVA, and the post-hoc analysis.
Determination of Sample Size for Two-Sample T-Test

The power analysis for the two-sample t-test aimed to determine the appropriate sample size needed to detect a specified difference between the manual and automated production lines with a high degree of confidence. The targeted difference was set at 100 units, considered the minimum operationally significant change to justify the transition to an automated line. Under the assumptions of a 0.05 significance level ($\alpha$) and an estimated pooled standard deviation of 90.55 units, the power analysis indicated that a sample size of 23 units per group would achieve an actual power of approximately 0.956. This power exceeds the target power of 0.95, suggesting a high probability of correctly identifying a true difference of the specified magnitude should it exist.
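The reported actual power can be reproduced with statsmodels' power function, assuming the same pooled standard deviation; this is a verification sketch, not the Minitab computation itself:

```python
from statsmodels.stats.power import TTestIndPower

# Achieved power for n = 23 per group, difference = 100 units, sd = 90.55
power = TTestIndPower().power(effect_size=100 / 90.55, nobs1=23,
                              alpha=0.05, alternative="two-sided")
print(f"actual power = {power:.3f}")   # close to the reported 0.956
```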
Figure 1: Power Curve for 2-Sample t-Test
Figure 1 illustrates the power curve for our two-sample t-test, with the x-axis representing the difference in means between the two production lines and the y-axis depicting the power of the test, or the probability of detecting a true difference. The curve peaks at the specified difference of 100 units, aligning with the calculated sample size and confirming the robustness of the test under our study parameters.
Comparative Analysis Using Two-Sample T-Test

The two-sample t-test was conducted to compare the mean weekly outputs of the manual and automated production lines. Descriptive statistics are presented in Table 1, highlighting the sample size, mean, standard deviation, and standard error of the mean for both production types.

Table 1: Descriptive Statistics for Manual and Automated Production Lines

Sample   N    Mean     StDev   SE Mean
manual   25   1017.6   72.6    15
auto     25   1216.1   71.6    14
The estimation for the difference between the two lines is summarized in Table 2. The confidence interval for the difference between the means does not include zero, suggesting a statistically significant difference.

Table 2: 95 % Confidence Interval for the Difference in Means

Difference   95 % CI for Difference
-198.5       (-239.5, -157.5)
The null hypothesis for the t-test, stating that there is no difference between the manual and automated production lines, was rejected with a t-value of -9.73 and a p-value of less than 0.001, as shown in Table 3. This indicates a highly significant difference in mean output, favoring the automated line.

Table 3: T-Test Results for Manual vs. Automated Production Lines

Null Hypothesis       T-Value   P-Value
$\mu_1 - \mu_2 = 0$   -9.73     < 0.001
Figure 2 displays the boxplot for the weekly output of the manual and automated production lines, visually representing the central tendency and dispersion of the data. The boxplot illustrates that the automated line not only has a higher median output but also shows a comparatively similar range of variability to the manual line, reinforcing the results of the t-test.

Figure 2: Boxplot of Weekly Output for Manual and Automated Production Lines

In summary, the statistical analysis unequivocally indicates that the automated production line outperforms the manual line in terms of mean weekly output.
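The reported t-value can be reproduced directly from the summary statistics in Table 1 with a short SciPy sketch, using the pooled, equal-variance formulation:

```python
from scipy import stats

# Reproduce the t-test from the summary statistics in Table 1
t_stat, p_value = stats.ttest_ind_from_stats(
    mean1=1017.6, std1=72.6, nobs1=25,
    mean2=1216.1, std2=71.6, nobs2=25,
    equal_var=True,
)
print(f"t = {t_stat:.2f}, p = {p_value:.2e}")   # t close to -9.73, p << 0.001
```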
Determination of Sample Size for One-Way ANOVA

To ensure sufficient power for detecting significant differences between the four levels of the automated production line, including the baseline and three subsequent adjustments, a power analysis was performed. The target was to detect a maximum difference of 50 units, with an assumption of an 80 unit standard deviation, a common level of variability in similar manufacturing processes.
Figure 3: Power Curve for One-Way ANOVA
Figure 3 displays the power curve generated from the analysis, with the x-axis indicating the maximum difference in means being tested, and the y-axis representing the power, or the probability of detecting a difference if one truly exists. The curve reaches the desired power level at the 50 unit difference mark, confirming the adequacy of the calculated sample size. Based on the analysis, a sample size of 89 for each of the four levels yields an actual power slightly above the target, at approximately 0.950247, indicating a high likelihood of detecting the stipulated difference in means among the group levels with the chosen significance level ($\alpha$) of 0.05.
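A comparable calculation can be sketched in statsmodels, under one conventional translation of a "maximum difference" into Cohen's f: the least-favourable configuration in which two extreme means are separated by the maximum difference and all other means sit at the grand mean. This convention is an assumption on our part, not a documented equivalence with Minitab's procedure:

```python
import numpy as np
from statsmodels.stats.power import FTestAnovaPower

# Translate a maximum difference of 50 units (sd = 80, k = 4 groups) into
# Cohen's f, assuming the least-favourable two-extreme-means configuration:
# f = max_diff / (sd * sqrt(2k))
max_diff, sd, k = 50.0, 80.0, 4
effect_size = max_diff / (sd * np.sqrt(2 * k))

# solve_power returns the TOTAL sample size across all k groups
n_total = FTestAnovaPower().solve_power(effect_size=effect_size,
                                        alpha=0.05, power=0.95, k_groups=k)
print(f"total N = {n_total:.0f}, per group = {n_total / k:.0f}")
# on the order of the 89 per group reported above
```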
Evaluation of Process Modifications Using One-Way ANOVA

The one-way ANOVA was employed to evaluate the impact of three process modifications on the automated production line. The analysis was predicated on the assumption of equal variances across all groups, with the significance level set at $\alpha = 0.05$.

Table 4: Factor Information for One-Way ANOVA

Factor   Levels   Values
Factor   4        auto, adj1, adj2, adj3
The ANOVA results, summarized in Table 5, revealed that the factor consisting of the different production line configurations, including the baseline and adjustments, contributed significantly to the variability in production output, as indicated by an F-value of 44.04 and a p-value less than 0.001.

Table 5: Analysis of Variance for Production Line Configurations

Source   DF    Adj SS    Adj MS   F-Value   P-Value
Factor   3     923350    307783   44.04     < 0.001
Error    356   2488238   6989
Total    359   3411588
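The reported p-value can be checked directly from the F distribution with the degrees of freedom in Table 5:

```python
from scipy import stats

# Survival function of the F distribution gives the p-value for
# F = 44.04 with 3 and 356 degrees of freedom (Table 5)
p_value = stats.f.sf(44.04, dfn=3, dfd=356)
print(f"p = {p_value:.2e}")   # far below 0.001
```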
Descriptive statistics for each level are presented in Table 6, where the pooled standard deviation was calculated to be 83.6028 units. Notably, adj3 exhibited a significantly higher mean production output compared to the other configurations.
Table 6: Descriptive Statistics and Confidence Intervals for Production Line Configurations

Factor   N    Mean      StDev   95 % CI
auto     90   1199.11   84.36   (1181.78, 1216.44)
adj1     90   1197.09   72.69   (1179.76, 1214.42)
adj2     90   1191.81   85.04   (1174.48, 1209.14)
adj3     90   1312.80   91.24   (1295.47, 1330.13)
The interval plot in Figure 4 visualizes the 95 % confidence intervals for the mean production output of each group. The plot underscores the distinction of adj3 from the other configurations, aligning with the ANOVA results and confirming a statistically significant difference.

Figure 4: Interval Plot of 95 % Confidence Intervals for Mean Production Output
These findings corroborate the hypothesis that not all production line configurations yield equivalent outputs, with adj3 demonstrating a notably improved performance.
Post-Hoc Comparisons Following ANOVA

Following a significant one-way ANOVA result, Tukey's post-hoc test was conducted to compare the mean outputs of each adjustment to the automated line. This test identifies which pairs of group means are significantly different at the 95 % confidence level.
Figure 5: Tukey Simultaneous 95 % Confidence Intervals for Differences of Means

Figure 5 illustrates the 95 % confidence intervals for the differences in means between the groups. Intervals that do not cross the zero-difference line indicate a significant difference between the group means. As shown, the intervals for adj3 − auto, adj3 − adj1, and adj3 − adj2 do not include zero, signifying that adj3 has a significantly higher mean output compared to the auto baseline and the other adjustments.
The grouping information (Table 7) provides a clear distinction where adj3 stands alone in Group A, indicating its superior performance, and all other conditions (auto, adj1, and adj2) are grouped in B, having no significant difference among them.

The adj3 condition significantly outperformed all other conditions, while there was no significant difference in output between the auto, adj1, and adj2 conditions.
Implications of Groupings

The distinct separation of adj3 into Group A has profound operational implications. It suggests that the modifications underlying adj3 are not only statistically significant but could also lead to considerable enhancements in production efficiency. Stakeholders should consider adj3 for potential scalability and further optimization of manufacturing processes. It is essential to analyze the cost-benefit ratio of this adjustment to ensure that the implementation is economically viable and brings tangible improvements to production capacity.
Future Directions

While adj3 demonstrates a clear benefit, the lack of significant differences among auto, adj1, and adj2 provides an opportunity to explore other potential benefits these adjustments may offer, such as reduced maintenance costs or improved product quality. Further research could also explore combining aspects of the various adjustments to determine if a synergistic effect could yield even greater improvements.
Table 7: Grouping Information Using the Tukey Method

Factor   N    Mean      Grouping
adj3     90   1312.80   A
auto     90   1199.11   B
adj1     90   1197.09   B
adj2     90   1191.81   B
Table 8: Tukey Simultaneous Tests for Differences of Means

Difference of Levels   Difference of Means   SE of Difference   95 % CI          T-Value   Adjusted P-Value
adj1 − auto            -2.0                  12.5               (-34.0, 30.0)    -0.16     0.998
adj2 − auto            -7.3                  12.5               (-39.3, 24.7)    -0.59     0.936
adj3 − auto            113.7                 12.5               (81.7, 145.7)    9.12      0.000
adj2 − adj1            -5.3                  12.5               (-37.3, 26.7)    -0.42     0.974
adj3 − adj1            115.7                 12.5               (83.7, 147.7)    9.28      0.000
adj3 − adj2            121.0                 12.5               (89.0, 153.0)    9.71      0.000
Discussion

This study's findings offer compelling evidence of the substantial efficiency gains achievable through the transition from manual to automated production lines in the manufacturing sector. The statistical analyses, grounded in rigorous power and sample size calculations, highlight the automated line's superior performance, with a significant difference in mean weekly outputs compared to the manual line. Such quantifiable benefits underscore the value of investing in automation technologies, aligning with the broader industry trend towards digitalization and smart manufacturing.
The application of a two-sample t-test provided a robust framework for evaluating these differences, reinforcing the importance of statistical methods in validating simulation results. The significant p-value obtained from the t-test not only confirms the efficiency advantage of the automated line but also emphasizes the necessity for careful statistical planning and analysis in simulation studies, particularly when assessing operational interventions in manufacturing.
Furthermore, the exploration of process modifications through one-way ANOVA and Tukey's post-hoc analysis sheds light on the potential for incremental improvements within automated systems. The standout performance of adj3 demonstrates that specific adjustments can lead to marked efficiency improvements. This finding is pivotal for manufacturing engineers and managers, suggesting that ongoing optimization of automated lines can yield substantial operational benefits beyond the initial automation gains.
However, the lack of significant differences among the other adjustments (auto, adj1, and adj2) suggests a nuanced landscape of process optimization, where not all modifications lead to measurable improvements in output. This observation underscores the complexity of manufacturing systems and the need for a methodical approach to process enhancement, where statistical analysis plays a crucial role in discerning the impact of various changes.
Conclusion

In conclusion, this study provides a methodical and statistically sound investigation into the efficiency improvements offered by automated manufacturing lines over traditional manual lines. Through detailed simulation modeling and rigorous statistical analysis, we have demonstrated that automation significantly enhances production efficiency. Moreover, the analysis of process modifications revealed that targeted interventions (adj3) could further optimize performance, offering a roadmap for continuous improvement in manufacturing operations.
The findings reinforce the importance of embracing statistical methodologies in the evaluation of manufacturing processes, ensuring that decisions are informed by robust data analysis. As the industry moves further into the realm of Industry 4.0, the insights garnered from this study serve as a testament to the transformative potential of automation and the critical role of statistical analysis in underpinning effective decision-making and operational excellence.
Future research should broaden the application of these statistical methods beyond specific case studies, exploring their potential in a wide array of simulation scenarios across various sectors. This approach not only enhances the utility of traditional statistical analysis in simulation modeling but also encourages a more holistic understanding of its impact on decision-making processes. By prioritizing the exploration of statistical methodologies in simulations, future studies can contribute to advancing the field in the era of digital transformation and data-driven insights.
References

Alaa Althubaiti. Sample size determination: A practical guide for health researchers. Journal of General and Family Medicine, 24(2):72–78, 2023. doi: 10.1002/jgf2.600.

S. L. Beal. Sample size determination for confidence intervals on the population mean and on the difference between two population means. Biometrics, 45(3):969, 1989. ISSN 0006-341X. doi: 10.2307/2531696.

Leslie Chandrakantha. Learning ANOVA concepts using simulation. In Proceedings of the 2014 Zone 1 Conference of the American Society for Engineering Education, pages 1–5. IEEE, 2014. ISBN 978-1-4799-5233-5. doi: 10.1109/ASEEZone1.2014.6820644.

Eric W. Corty and Robert W. Corty. Setting sample size to ensure narrow confidence intervals for precise estimation of population values. Nursing Research, 60(2):148–153, 2011. doi: 10.1097/NNR.0b013e318209785a.

Imran Hasan, Esmaeil Bahalkeh, and Yuehwern Yih. Evaluating intensive care unit admission and discharge policies using a discrete event simulation model. SIMULATION, 96(6):501–518, 2020. ISSN 0037-5497. doi: 10.1177/0037549720914749.

S. R. Jones, S. Carley, and M. Harrison. An introduction to power and sample size estimation. Emergency Medicine Journal, 20(5):453–458, 2003. doi: 10.1136/emj.20.5.453.

Tae Kyun Kim. T test as a parametric statistic. Korean Journal of Anesthesiology, 68(6):540–546, 2015. ISSN 2005-6419. doi: 10.4097/kjae.2015.68.6.540.

Tae Kyun Kim and Jae Hong Park. More about the basic assumptions of t-test: normality and sample size. Korean Journal of Anesthesiology, 72(4):331–335, 2019. ISSN 2005-6419. doi: 10.4097/kja.d.18.00292.

Daniël Lakens. Sample size justification. Collabra: Psychology, 8(1), 2022. doi: 10.1525/collabra.33267.

Xiaofeng Steven Liu, Brandon Loudermilk, and Thomas Simpson. Introduction to sample size choice for confidence intervals based on t statistics. Measurement in Physical Education and Exercise Science, 18(2):91–100, 2014. ISSN 1091-367X. doi: 10.1080/1091367X.2013.864657.

Stephen Midway, Matthew Robertson, Shane Flinn, and Michael Kaller. Comparing multiple comparisons: practical guidance for choosing the best multiple comparisons test. PeerJ, 8:e10387, 2020. ISSN 2167-8359. doi: 10.7717/peerj.10387.

Lena Schmid, Moritz Roidl, and Markus Pauly. Comparing statistical and machine learning methods for time series forecasting in data-driven logistics: a simulation study, 2023.

Ceyhan Ceran Serdar, Murat Cihan, Doğan Yücel, and Muhittin A. Serdar. Sample size, power and effect size revisited: simplified and practical approaches in pre-clinical, clinical and laboratory studies. Biochemia Medica, 31(1):010502, 2021. doi: 10.11613/BM.2021.010502.

Andreas Tolk. Simulation-based optimization: Implications of complex adaptive systems and deep uncertainty. Information, 13(10):469, 2022. doi: 10.3390/info13100469.
AUTHOR BIOGRAPHIES

AHMED TOLBA is a dynamic Simulation and Mathematical Modeling professional with a rich background in Statistical Data Analysis and Process Improvement. He has a proven track record in simulating material flow in Tesla Gigafactory Berlin-Brandenburg, leveraging tools like AnyLogic, Minitab, and Tableau. His career journey includes impactful roles, such as a Planning & Inventory Analyst at Brimore Cairo, where he significantly optimized operating profit. With an academic foundation in Industrial Engineering from Egypt-Japan University of Science & Technology and ongoing advanced studies in Transportation Economics at TU Dresden, Ahmed embodies a blend of technical expertise and strategic acumen, poised to address complex challenges in logistics and manufacturing sectors. His e-mail address is: ahmed@tolba.xyz
MAXIMILIAN SELMAIR possesses profound expertise in Discrete Event Simulation and Material Flow Optimization, with a career path characterized by innovation and problem-solving projects. His journey includes accompanying the ramp-up of Tesla's Gigafactory in Berlin from its inception to its final stage of manufacturing 6,000 cars per day. As a Senior Simulation Engineer within Tesla, he led multifaceted projects, utilizing AnyLogic simulation software to dynamically analyze and optimize production systems. This experience underscores Maximilian's dedication to advancing industrial engineering through both research and practical implementation, a commitment he now extends as a freelancing partner across various business fields and companies. His e-mail address is: maximilian@selmair.de and his web page can be found at maximilian.selmair.de