
Estimating Time to Completion: Uncertain Processing Times in Games


Estimating Time to Completion
Uncertain Processing Times in Games
Azlan Iqbal
College of Information Technology
Universiti Tenaga Nasional
Selangor, Malaysia
Abstract—In this article we propose a method that enables computer programmers to generate an estimate of the time remaining in tasks that have uncertain or theoretically unpredictable processing times. The method uses the real-time mean – given the time presently elapsed and the number of task components completed thus far – to predict or estimate the amount of time remaining for the overall task. Experimental results based on a board game task in two categories suggest that the estimate can be considered reliable (i.e. within a 10% margin of error) not very long after the task has begun; in general, after approximately just 20% of it has been completed. The estimated time to completion is particularly meaningful to end-users who are otherwise left waiting, or having to leave and return many times to check whether the task has completed. The method, in principle, can also be applied to other tasks with uncertain processing times, game-related or otherwise.
Keywords-time; estimate; task; programming; game
I. INTRODUCTION
In computer gaming, and in other applications as well, we sometimes find ourselves having to wait for the computer to finish performing a particular task without knowing how much longer it will take. This is often an unnecessary inconvenience, caused by the programmers not having incorporated any time-estimating feature. The user typically has to dedicate a machine to the application or check back with it from time to time (based on experience) until the task is completed. For
example, in a board game the computer may be programmed to
examine the ‘game tree’ to a particular depth before making its
move [1]. Since the number of nodes or positions can be
different at each depth level and furthermore may entail the use
of different heuristics for each position – not to mention the
processing power available on that specific machine – the
amount of time required for the computer to perform a
satisfactory move is difficult to predict [2].
The problem may be compounded when the computer needs to, say, analyze an entire database of games, e.g. for the
purpose of assessing their ‘correctness’, or generating
automatic commentary. This is because each game likely
contains a different number of moves and may be more or less
complex than others. A third example is when we install a
game, computer application or operating system. The estimate
provided of the amount of time remaining to completion is
usually vague and likely based on a generic software/hardware
setup, or ‘ideal’ installation conditions.
In this article, we propose a straightforward and intuitive
method of estimating the time remaining in tasks of such nature
by evaluating the average time consumption of the task’s
discrete components. Section II explains the method in detail.
Section III presents the experimental setup and results based on
a game task test case. Section IV discusses the results. Section
V concludes with a summary and direction for future work.
II. THE METHOD
This method works best with computational tasks that have discrete components whose completion times can be easily
measured. For example, if a database of games is to be
analyzed or processed, the individual games can serve as the
discrete components. This means that the moment the first
game is analyzed we can determine the amount of time it took,
and so forth for the second and third games. If a single game
itself is the task, the discrete components may be the moves by
each side. In principle, after the second discrete component has
completed, we can easily calculate the average time required
for a component. As more components are finished, the
average becomes more stable or reliable. This average can be
multiplied by the number of remaining discrete components to
give an estimate of how much time is left.
Equation (1) shows the basic formula; etc denotes the real-
time ‘estimated time to completion’ or remaining time, et
denotes the presently ‘elapsed time’, nc denotes the number of
discrete components completed thus far, and nt the number of
discrete components in total.
etc = (et ÷ nc) × (nt − nc) (1)
To illustrate this further, let us say that we are analyzing a
database of 500 games. The first game (i.e. a discrete
component of the overall task in this case) takes 10 seconds to
analyze. The second game takes 15 seconds to analyze. The
average analysis time for a game is therefore (10 + 15) ÷ 2 or
12.5 seconds. The number of remaining discrete components is
500 − 2 = 498. So the remaining time = 12.5 × 498 seconds or
1 hour 43 minutes and 45 seconds. If the third game took 9 seconds to analyze, the average is brought down to (10 + 15 + 9) ÷ 3 or approximately 11.3 seconds, and the number of remaining components to 500 − 3 = 497, so the estimated remaining time is now reduced to 11.3 × 497 seconds, or approximately 1 hour, 33 minutes and 36 seconds. This information is therefore constantly changing; rapidly at first, but stabilizing over time. Fig. 1 shows the pseudo-code for the estimated time to completion (ETC).

This research, as a component of two larger projects, is sponsored in part by the Ministry of Science, Technology and Innovation (MOSTI) in Malaysia under their eScienceFund research grant (01-02-03-SF0188), and the Ministry of Higher Education (MOHE) in Malaysia under their Fundamental Research Grant Scheme (FRGS/1/10/TK/UNITEN/02/2).

© 2011 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Figure 1. ETC function pseudo-code.
The question of interest here is: at which point or after how
long can we assume the estimated time to completion for the
overall task is reliable, i.e. within an acceptable margin of
error? We may be tempted to think – without the benefit of
experimental evidence – that halfway or 50% through the total
number of discrete task components we should have a reliable
estimate. While this is a reasonable guess and probably true, it
may be more than is necessary.
III. EXPERIMENTAL SETUP AND RESULTS
In order to test this method, we used the task of evaluating the aesthetics of a database of mate-in-3 chess combinations.1
Iqbal’s model of aesthetics bases its assessment on 17 aesthetic
features, only three of which occur in all combinations to some
degree [3-5]. Each aesthetic feature (e.g. violation of heuristics,
economy, sparsity) has its own unique evaluation function. A
feature, if detected in a move of the combination, is assessed
based on its corresponding aesthetic function. The overall
aesthetic assessment of a mate-in-3 combination consists of 31
points of evaluation, e.g. the pin and fork are each evaluated in
all three moves whereas violation of heuristics only in the first
move. The result is presented as a summative score of all the
evaluation points. The speed of evaluation for a combination is
therefore dependent on factors such as the presence of features,
complexity of the position at each move, and processing power
of the computer. Further details of the aesthetics model are not
relevant to the time estimation method proposed but interested
readers may refer to aforementioned references.
The overall task is therefore the complete aesthetic
assessment of the database, and the task components the
assessment of each combination within the database. In order
to test the method, two types of databases (four in each) and
three different computers were used. The databases were of
compositions by human composers (COMP) and combinations
taken from tournament games between humans (TG). Each
type had databases of size 10, 100, 1,000 and 10,000. The
computers used included the following specifications: Intel®
Core™ 2 Duo CPU E4600 @ 2.40GHz; Pentium® Dual-Core CPU E5200 @ 2.50GHz; Intel® Core™ 2 CPU T5300 @ 1.73GHz. All were running Windows XP SP3.

1 A ‘combination’ refers to a move sequence taken from a real game, or a composition and its solution.
TG combinations tend to take longer to analyze because the
whole game needs to be played out (until checkmate) and then
‘rewound’ three moves before aesthetic assessment can take
place. For this reason, they were analyzed separately and
served as a useful comparison against the COMP
combinations. For each database of a particular type, the
aesthetic assessment was run on the three machines and the
average discrepancy (in %) between the actual time taken to
complete the overall task and the estimated time to completion
calculated (after each component task) was recorded
automatically. The discrepancy percentage (dp) is always a
positive value and calculated as shown in (2); act denotes the
actual time taken to complete the overall task on a particular
machine, and etc the estimated time to completion calculated
after a component task on the same machine.
dp = | [(act − etc) ÷ act] | × 100 (2)
Time lengths were measured as integers and in seconds.
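Implemented literally as stated in (2), the discrepancy percentage is a one-line computation; the sketch below is ours (the function and argument names are not from the paper):

```python
def discrepancy_pct(act, etc_value):
    """Equation (2): dp = |(act - etc) / act| * 100.

    act       -- actual time taken to complete the overall task (seconds)
    etc_value -- estimated time to completion recorded after a component
    Returns the discrepancy as a non-negative percentage.
    """
    if act == 0:
        raise ValueError("actual completion time must be non-zero")
    return abs((act - etc_value) / act) * 100.0
```

For example, `discrepancy_pct(200, 180)` gives 10.0, i.e. exactly at the 10% margin used in this study.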
The results for the COMP and TG databases are as follows;
charts are presented separately for clarity.
Figure 2. COMP 10
Figure 3. COMP 100
Function ETC (start_time, task_count, task_total) as TIME
current_time = GET_CURRENT_TIME
time_elapsed = current_time − start_time
average_time_per_component = time_elapsed ÷ task_count
remaining_time = average_time_per_component × (task_total − task_count)
ETC = remaining_time
End Function
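For concreteness, the ETC pseudo-code above can be sketched in Python. The version below takes the elapsed time directly rather than a start time (our choice, for testability); the names are ours, not from the paper:

```python
def etc(elapsed, completed, total):
    """Estimated time to completion per equation (1):
    etc = (et / nc) * (nt - nc).

    elapsed   -- time elapsed so far, in seconds (et)
    completed -- number of discrete components finished so far (nc > 0)
    total     -- total number of discrete components (nt)
    """
    if completed <= 0:
        raise ValueError("at least one component must have completed")
    average = elapsed / completed            # real-time mean per component
    return average * (total - completed)     # projected remaining seconds
```

Using the paper's worked example, `etc(25, 2, 500)` gives 6225 seconds, i.e. 1 hour 43 minutes and 45 seconds.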
Figure 4. COMP 1,000
Figure 5. COMP 10,000
Figure 6. TG 10
Figure 7. TG 100
Figure 8. TG 1,000
Figure 9. TG 10,000
Figs. 2 through 5 show how the mean discrepancies for
each of the different COMP databases changed as each
composition was analyzed, whereas Figs. 6 through 9 show the
same for the TG databases. For instance, in Fig. 6, we can see
how after combination 5 was analyzed, the estimate of the time
remaining was accurate (0% discrepancy) but after
combination 6 was analyzed, the estimated time to completion
was off by 15%. A mean discrepancy of no more than 10% was considered a sufficiently reliable estimation for users. The point or threshold at and after which the discrepancies never exceed this – along with the percentage of the database that had already been analyzed at that point – is given in Table I below.
TABLE I. RELIABILITY THRESHOLDS AND PORTION OF DATABASE COMPLETED

Size      COMP                     TG
          Threshold  Completed     Threshold  Completed
10        4          40%           7          70%
100       29         29%           9          9%
1,000     49         4.9%          32         3.2%
10,000    36         0.36%         342        3.42%
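The threshold values reported in Table I can be derived mechanically from a series of per-component discrepancy percentages; a minimal sketch (the function name and signature are ours):

```python
def reliability_threshold(dps, margin=10.0):
    """Return the 1-based index of the first component at and after which
    every discrepancy percentage in `dps` stays within `margin`,
    or None if the discrepancies never settle within it.
    """
    for i in range(len(dps)):
        # Check that no later discrepancy exceeds the margin.
        if all(d <= margin for d in dps[i:]):
            return i + 1
    return None
```

For instance, for the series `[50, 12, 8, 9, 11, 5, 4, 3]` the threshold is component 6, since the 11% discrepancy at component 5 is the last one above the 10% margin.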
IV. DISCUSSION
The COMP databases contain combinations that are quicker to analyze than the TG databases. This means that a typical component task is shorter in the former than the latter. In fact,
based on the size 10,000 database results on each of the three
machines used, a COMP combination (i.e. one component
task), on average, took 0.18 seconds to analyze whereas a TG
combination took 0.42 seconds; more than twice as long.2 This
difference may be reflected in the thresholds of both. The
COMP databases averaged 29.5 combinations (SD 18.9)
whereas the TG databases averaged 97.5 combinations (SD
163.4). The median number of combinations using data from
both groups is 30.5. In terms of the portion of the database that
was analyzed at the threshold point, the COMP databases
averaged 18.57% (SD 0.19) whereas the TG databases
averaged 21.41% (SD 0.33).
Since the number of data points per group is relatively low, the application of nonlinear regression and curve-fitting techniques was considered unsuitable. There are therefore
primarily two ways to interpret the results. First, if we consider
irrelevant the size of the database to be analyzed, we could say
that after about 30 combinations, we have a reliable estimated
time to completion for COMP databases, but for TG databases,
we should wait for about 100 combinations to complete.
Alternatively, we could rely on the median of between 30 and 31 combinations, given either database type. Second, if we
consider the database size relevant, we could say that for both
COMP and TG databases, we should wait for between 18% and 21%
of the combinations in the database to be analyzed before the
estimated time to completion is reliable.
A more general and reasonable compromise given either
database type or a hybrid might therefore be to wait for 20% of
the overall task to complete, or just 30 component task runs to
complete; whichever is possible and comes first. This also
applies, in principle, to any sort of computational task where its
component tasks may or may not have significantly (but not
extremely) different processing times. This recommendation
would therefore be an improvement over the ‘expected’ or
intuitive halfway point or 50% that some might assume. The
evidence suggests that we do not need to wait that long to
obtain a reliable estimate; not even close, in fact.
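The "20% or 30 runs, whichever comes first" recommendation can be folded into an application as a simple reliability check on the displayed estimate; a minimal sketch under our own naming:

```python
def estimate_is_reliable(completed, total):
    """True once at least 30 components, or at least 20% of all
    components, have completed -- whichever comes first.
    (For tasks with fewer than 30 components, only the 20% rule
    can apply, matching 'whichever is possible'.)
    """
    if total <= 0:
        raise ValueError("total number of components must be positive")
    return completed >= 30 or completed / total >= 0.20
```

An application could query this function after each component completes and only start showing (or highlighting) the estimated time to completion once it returns True.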
It is important to remember that there is likely no statistical
method of determining the precise percentage or number of
component tasks to wait for because the actual mean can only
be determined after the overall task is complete [6-7]. The issue
therefore relates back to the much-debated question of what is
a good sample size for estimating the mean of a population; or
rather, in this case, what is the minimum sample size for
determining a reliable-enough estimate of the time remaining.
V. CONCLUSION
In this article, we proposed a method of determining the estimated time to completion for computational tasks that have
uncertain processing times, but can be seen as consisting of
smaller component tasks. A game-related task was tested based
on two groups, one with a significantly longer average
processing time for its components. The nature of the task itself
is likely irrelevant to the method because the main variable in
2 This was calculated based on the cumulative assessment time of the three
size 10,000 databases for each group, divided by 30,000. Since precision was
accurate only to one second (not fractions), and the relevant information
recorded only the elapsed time since the overall task began, individual task
processing times for the purpose of calculating the standard deviation (SD)
properly were not available. That information is, in any case, not particularly
relevant here.
question is the average time consumed thus far at any given
point or threshold, and its reliability in estimating the time
remaining. To the best of our knowledge this method, while
seemingly obvious or intuitive, lacked experimental validation.
Experimental results suggested that a reasonable approach
would be to wait for about 20% of the entire task to complete,
or after 30 component task runs, whichever is possible and
comes first. This recommendation therefore suits overall tasks
that are neither too short nor have too few discrete components.
Good examples would include processing games with many
moves or processing fairly large databases. For the specific
task groups that were tested, or for any other task with similar
component task processing times, the experimental results as
shown in the previous section should be directly applicable.
For other tasks, game-related or otherwise, users may rely on
the general “20% or 30 run” rule, or the results of experiments
such as the ones conducted here but based on those other tasks.
Perhaps the main direction for future work is therefore to
compare the findings of similar experiments to determine the
extent to which our recommendations can be accepted.
In software applications, this information is useful to users
who often find themselves waiting for their game-related or
computational tasks to complete. The estimated time to
completion can be indicated in the game or application using a
simple color code, e.g. red prior to the 20% threshold and
green afterwards to indicate that the estimate is now reliable to
within a 10% margin of error (and improving).
REFERENCES
[1] J. Schaeffer et al., “Checkers is Solved,” Science, vol. 317, no. 5844, pp. 1518–1522, September 2007.
[2] S. Bushinsky, “Deus Ex Machina – A Higher Creative Species in the
Game of Chess,” AI Magazine, vol. 30, no. 3, pp. 63-70, 2009.
[3] M. A. M. Iqbal, A Discrete Computational Aesthetics Model for a Zero-
sum Perfect Information Game, Ph.D. Thesis, University of Malaya,
Kuala Lumpur, Malaysia, 2008.
[4] A. Iqbal, “Aesthetics in Mate-in-3 Combinations, Part I – Combinatorics
and Weights,” ICGA Journal, vol. 33, no. 3, pp. 140–148, 2010.
[5] A. Iqbal, “Aesthetics in Mate-in-3 Combinations, Part II – Normality,”
ICGA Journal, vol. 33, no. 4, pp. 202–211, 2010.
[6] D. Rumsey, Statistics Essentials for Dummies, Wiley Publishing Inc.,
NJ, USA, 2010.
[7] R. A. Donnelly Jr., The Complete Idiot’s Guide to Statistics, Alpha,
Penguin Group, USA, 2004.
Azlan Iqbal received the B.Sc. and M.Sc.
degrees in computer science from Universiti Putra
Malaysia (2000 and 2001, respectively) and the
Ph.D. degree in computer science (artificial
intelligence) from the University of Malaya in
2009. He has been with the College of
Information Technology, Universiti Tenaga
Nasional since 2002, where he is senior lecturer.
He is a member of the IEEE and AAAI, and chief
editor of the electronic Journal of Computer
Science and Information Technology (eJCSIT).
His research interests include computational
aesthetics and computational creativity in games.