A Computational Model of Time-Dilation
Charles Davi
June 26, 2019
Abstract
We propose a model of time-dilation that follows from the application
of concepts from information theory and computer theory to physical sys-
tems. Our model predicts equations for time-dilation that are identical
in form to those predicted by the special theory of relativity. In short,
our model can be viewed as an alternative set of postulates rooted in in-
formation theory and computer theory that imply that time-dilation will
occur. We also show that our model is consistent with the general theory
of relativity’s predictions for time-dilation due to gravity.
1 Introduction
Prior to the twentieth century, physicists appear to have approached nature with
a general presumption that fundamental physical properties such as energy and
charge are continuous. This was likely the result of a combination of prevailing
philosophical views and mathematical convenience, given that continuous
functions are generally easier to calculate than discrete functions. However, this
view of nature began to unravel in the early twentieth century as a result of the
experiments of Robert A. Millikan, and others, who demonstrated that charge
appeared to be an integer multiple of a single value, the elementary charge e
[17], and confirmed Einstein’s predictions for the energy of electrons ejected due
to the photoelectric effect, which suggested a quantized, particle theory of light
[18]. In the century that followed these experiments, the remarkable success of
quantum mechanics as a general matter demonstrated that whether or not the
ultimate, underlying properties of nature are in fact discrete, the behavior of
nature can nonetheless be predicted to a high degree of precision using models
that make use of discrete values. This historical progression from a presump-
tion of continuous values, towards the realization that fundamental properties
of nature such as charge are quantized, was facilitated in part by the devel-
opment of experimental techniques that were able to make measurements at
increasingly smaller scales, and the computational power of the computer itself,
which facilitated the use of discrete calculations that would be impossible to ac-
complish by hand. Though admittedly anecdotal, this progression suggests the
possibility that at a sufficiently small scale of investigation, perhaps we would
find that all properties of nature are in fact discrete, and thus, all apparently
continuous phenomena are simply the result of scale. Below we show that if
we assume that all natural phenomena are both discrete, and capable of being
described by computable functions, then we can achieve a computational model
of time-dilation that predicts equations that are generally identical in form to
those predicted by the special theory of relativity.
1.1 The Information Entropy
Assume that the distribution of characters in a string x is {p_1, . . . , p_n}, where p_i is the number of instances of the i-th character in some alphabet Σ = {a_1, . . . , a_n}, divided by the length of x. For example, if x = (ab), then our alphabet is Σ = {a, b}, and p_1 = p_2 = 1/2, whereas if x = (aaab), then p_1 = 3/4 and p_2 = 1/4. The minimum average number of bits per character required to encode x as a binary string without loss of information, taking into account only the distribution of characters within x, is given by^1

$$H(x) = -\sum_{i=1}^{n} p_i \log(p_i). \tag{1}$$
We call H(x) the information entropy of x. The intuition underlying the information entropy is straightforward, though the derivation of equation (1) is far from obvious, and is in fact considered the seminal result of information theory, first published by Claude Shannon in 1948 [20]. To establish an intuition, consider the second string x = (aaab), and assume that we want to encode x as a binary string. We would therefore need to assign a binary code to each of a and b. Since a appears more often than b, if we want to minimize the length of our encoding of x, then we should assign a shorter code to a than we do to b. For example, if we signify the end of a binary code with a 1, we could assign the code 1 to a, and 01 to b.^2 As such, our encoding of x would be 11101, and since x contains 4 characters, the average number of bits per character in our encoding of x is 5/4. Now consider the first string x = (ab). In this case, there are no opportunities for this type of compression because all characters appear an equal number of times. The same would be true of x = (abcbca), or x = (qs441z1zsq), each of which has a uniform distribution of characters. In short, we can take advantage of the statistical structure of a string, assigning longer codes to characters that appear less often, and shorter codes to characters that appear more often. If all characters appear an equal number of times, then there are no opportunities for this type of compression.
^1 Unless stated otherwise, all logarithms referenced in this paper are base 2.
^2 Rather than make use of a special delimiting character to signify the end of a binary string, we could instead make use of a "prefix code". A prefix code is an encoding with the property that no code is a prefix of any other code within the encoding. For example, if we use the code 01 in a given prefix code, then we cannot use the code 010, since 01 is a prefix of 010. By limiting our encoding in this manner, upon reading the code 01, we would know that we have read a complete code that corresponds to some number or character. In contrast, if we include both 01 and 010 in our encoding, then upon reading an 01, it would not be clear whether we have read a complete code, or the first 2 bits of 010.
In general, if a string x is drawn from an alphabet with n characters, and the distribution of these characters within x is uniform, then H(x) = log(n), which is the maximum value of H(x) for a string of any length drawn from an alphabet with n characters.
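To make this concrete, the following minimal Python sketch (our illustration, not part of the paper) computes H(x) directly from the character frequencies of a string, using base-2 logarithms as in footnote 1, and reproduces the encoding of x = (aaab) with the codes used above. The function name and sample strings are ours.

    from collections import Counter
    from math import log2

    def entropy(x: str) -> float:
        """H(x): bits per character, based only on the distribution of characters in x."""
        counts = Counter(x)
        n = len(x)
        return -sum((c / n) * log2(c / n) for c in counts.values())

    print(entropy("ab"))      # uniform over {a, b}: 1.0 bit per character
    print(entropy("aaab"))    # (3/4) log(4/3) + (1/4) log(4) ~= 0.811 bits per character
    print(entropy("abcbca"))  # uniform over {a, b, c}: log(3) ~= 1.585 bits per character

    # The encoding of (aaab) from the text: a -> 1, b -> 01, giving 11101, i.e. 5/4 bits per character.
    codes = {"a": "1", "b": "01"}
    encoded = "".join(codes[c] for c in "aaab")
    print(encoded, len(encoded) / len("aaab"))   # 11101 1.25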
Shannon showed in [20] that a minimum encoding of x would assign a code of length l_i = log(1/p_i) to each a_i ∈ Σ. If the length of x is N, then each a_i will appear Np_i times within x. Thus, the minimum total number of bits required to encode x using this type of statistical compression is $\sum_{i=1}^{n} N p_i l_i = N H(x)$. Therefore, the minimum average number of bits per character required to encode a string of length N is H(x). Note that H(x) is not merely a theoretical measure of information content, since there is always an actual binary encoding of x for which the average number of bits per character is approximately H(x) [14]. Thus, H(x) is a measure of the average number of bits required to actually store or transmit a single character in x. However, the value of H(x) is a function of only the distribution of characters within x, and therefore, does not take into account other opportunities for compression. For example, a string of the form x = a^N b^N c^N has an obvious structure, yet H(x) = log(3) is maximized, since x has a uniform distribution of characters.^3 Thus, even if a string has a high information entropy, the string could nonetheless have a simple structure.

^3 For example, if N = 2, then x = aabbcc.
1.2 The Information Content of a System
Despite the limitations of H(x), we can still use H(x) to measure the information content of representations of physical systems, understanding that we are able to account for only the statistical structure of the representation. We begin with a very simple example: consider a system comprised of N particles that initially all travel in the same direction, but that over time have increasingly random, divergent motions. We could represent the direction of motion of each particle relative to some fixed axis using an angle θ. If we fix the level of detail of our representation of this system by breaking θ into the groups A = [0, π/2), B = [π/2, π), C = [π, 3π/2), and D = [3π/2, 2π), then we could represent the direction of motion of each particle in the system at a given moment in time as a character from Σ = {A, B, C, D} (see Figure 1 below). Note that this is clearly not a complete and accurate representation of the particles, since we have, for example, ignored the magnitude of the velocity of each particle. Nonetheless, we can represent the direction of motion of all of the particles at a given moment in time as a string of characters drawn from Σ of length N. For example, if at time t the direction of motion of each particle is θ = 0, then we could represent the motions of the particles at t as the string x = (A · · · A), where the length of x, denoted |x|, is equal to N. As such, the distribution of motion is initially entirely concentrated in group A, and the resultant distribution of characters within x is {1, 0, 0, 0}. The information entropy of {1, 0, 0, 0} is $-\sum_{i=1}^{4} p_i \log(p_i) = 0$ bits,^4 and therefore, the minimum average number of bits per character required to encode this representation of the particles at t is 0 bits. Over time, the particles will have increasingly divergent motions, and as such, the distribution of characters within x will approach the uniform distribution, which is in this case {1/4, 1/4, 1/4, 1/4}, which has an information entropy of log(4) = 2 bits. Thus, the information entropy of this representation of the particles will increase over time.

^4 As is typical when calculating H(x), we assume that 0 log(0) = 0.
Figure 1: A mapping of angles to Σ = {A, B, C, D}.
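The following Python sketch (ours, not the paper's) simulates this example under an assumed dynamics that the paper does not specify: each particle's direction drifts by a small random amount per step. It bins θ into the four quadrants of Figure 1 and prints the entropy of the resulting representation, which starts at 0 bits and climbs toward log(4) = 2 bits.

    import random
    from collections import Counter
    from math import log2, pi

    def entropy(labels):
        """Bits per character of the label string, written as a sum of p * log(1/p)."""
        n = len(labels)
        return sum((c / n) * log2(n / c) for c in Counter(labels).values())

    N = 1000
    angles = [0.0] * N   # at t = 0 every particle moves with theta = 0, i.e. group A

    for t in range(0, 101, 20):
        # bin each angle into A = [0, pi/2), B = [pi/2, pi), C = [pi, 3pi/2), D = [3pi/2, 2pi)
        labels = ["ABCD"[int(theta // (pi / 2)) % 4] for theta in angles]
        print(t, round(entropy(labels), 3))   # starts at 0.0 and approaches log(4) = 2.0
        # assumed dynamics: each direction drifts by a small random amount per step
        for _ in range(20):
            angles = [(theta + random.uniform(-0.3, 0.3)) % (2 * pi) for theta in angles]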
We could argue that, as a result, the information content of the system it-
self increases over time, but this argument is imprecise, since this particular
measure of information content is a function of the chosen representation, even
though the behavior of the system can impact the information content of the
representation. For example, if we made use of a finer gradation of the angle θ
above, increasing the number of groups, we would increase the number of char-
acters in our alphabet, thereby increasing the maximum information content
of the representation, without changing the system in any way. However, this
does not imply that representations are always arbitrary. For example, if some property of a system can take on only n discrete values, then a representation of the system that restricts the value of this property to one of these n values is not arbitrary. The point is that as a practical matter, our selection of
certain properties will almost certainly be incomplete, and measured at some
arbitrary level of precision, which will result in an arbitrary amount of infor-
mation. Thus, as a practical matter, we probably cannot answer the question
of how much information is required to completely and accurately represent a
physical system. We can, however, make certain assumptions that would allow
us to construct a complete and accurate representation of a system, and then
measure the information content of that representation.
Assumption 1.1. There is a finite set of n measurable properties Γ such that (1) for any measurable property P ∉ Γ, the value of P can be derived from the values of the properties P_i ∈ Γ, and (2) there is no P_i ∈ Γ such that the value of P_i can be derived from the other n − 1 properties in Γ \ {P_i}.
We call each of the properties P_i ∈ Γ a basis property. Note that we are not suggesting that all properties are a linear combination of the basis properties within Γ. Rather, as discussed in Sections 1.3 and 2 below, we assume that all other measurable properties can be derived from Γ using computable functions. For example, if mass and velocity are included in Γ, then Assumption 1.1 implies that momentum would not be included in Γ, since momentum can be derived from mass and velocity using a computable function.^5
Assumption 1.2. For any closed system,^6 each P_i ∈ Γ can take on only a finite number of possible values.

For example, assume that a closed system S contains a finite number N of elements.^7 Assumption 1.1 implies that there is a single set of basis properties Γ from which all other measurable properties of any given element can be derived. As such, in this case, S consists of a finite number of elements, each with a finite number of measurable basis properties, and Assumption 1.2 implies that each such basis property can take on only a finite number of possible values.
Assumption 1.3. For any system, all measurable properties of the system, as
of a given moment in time, can be derived from the values of the basis properties
of the elements of the system, as of that moment in time.
Together, Assumptions 1.1 and 1.3 allow us to construct a complete and accurate representation of a system at a given moment in time. Specifically, if we were able to measure the basis properties of every element of S at time t, then we could construct a representation of the state of S as a set of N strings S(t) = {s_1, . . . , s_N}, with each string representing an element of S, where each string s_i = (v_1, . . . , v_n) consists of n values, with v_j representing the value of the j-th basis property of the i-th element of S at time t. Because S(t) contains the values of the basis properties of every element of S at time t, Assumption 1.3 implies that we can derive the value of any property of S at time t from the representation S(t) itself. For example, Assumption 1.3 implies that there is some computable function f that can calculate the momentum ρ of S at time t when given S(t) as input. Expressed symbolically, ρ = f(S(t)).^8 Thus, S(t) contains all of the information necessary to calculate any measurable property of S at time t, and therefore, we can take the view that S(t) constitutes a complete and accurate representation of the state of S at time t. Note that Assumption 1.3 does not imply that we can determine all future properties of S given the values of the basis properties of its elements at time t, but rather, that the value of any property of S that exists at time t can be derived from the values of the basis properties of its elements as of time t.^9 Of course, we are not suggesting that we can construct such a representation as a practical matter, but rather, we will use the concept of S(t) as a theoretical tool to analyze the information content of systems generally, and ultimately, construct a model of time-dilation.

^5 We will discuss computable functions in greater detail in Sections 1.3 and 2 below, but for now, a computable function can be defined informally as any function that can be implemented using an algorithm.
^6 We view a system as closed if it does not interact with any other systems or exogenous forces, and is bounded within some definite volume.
^7 We deliberately use the generic term "element", which we will clarify in Section 2 below.
^8 We assume that all particles and systems have definite locations, and definite values for all of their measurable properties. As such, our model, as presented herein, is necessarily incomplete, since it does not address systems governed by the laws of quantum mechanics.
^9 We will discuss calculating future states of systems in Section 1.4 below.
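As an illustration of what S(t) and a computable function f on S(t) might look like, the sketch below (ours, not the paper's) assumes, as in the earlier example, that mass and a one-dimensional velocity are among the basis properties in Γ, with discrete integer values per Assumption 1.2; the momentum function plays the role of f, with ρ = f(S(t)).

    from typing import List, NamedTuple

    class Element(NamedTuple):
        """One string s_i = (v_1, ..., v_n): the basis-property values of one element.
        Mass and a one-dimensional velocity are assumed here purely for illustration,
        with discrete integer values (Assumption 1.2)."""
        mass: int
        velocity: int

    def momentum(state: List[Element]) -> int:
        """A computable function f with rho = f(S(t)): derives a measurable property
        of S at time t from the basis-property values of its elements at that time."""
        return sum(e.mass * e.velocity for e in state)

    S_t = [Element(mass=2, velocity=3), Element(mass=1, velocity=-5)]  # S(t) = {s_1, ..., s_N}, N = 2
    print(momentum(S_t))  # 2*3 + 1*(-5) = 1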
Recall that Assumption 1.2 implies that each basis property of S can take on only a finite number of possible values. If we assume that the n basis properties are independent, and that basis property P_i can take on k_i possible values, then the basis properties of each element of S can have any one of K = k_1 · · · k_n possible combinations of values. If we distinguish between the N elements of S, and assume that the basis properties of each element are independent from those of the other elements, then there are K^N possible combinations of values for the basis properties of every element of S. Since the values of the basis properties of the elements of S determine all measurable properties of S, it follows that any definition of the overall state of S will reference either the basis properties of the elements of S, or measurements derived from the basis properties of the elements of S. As such, any definition of the overall state of S will ultimately reference a particular combination of values for the basis properties of the elements of S. For example, Assumption 1.3 implies that the temperature of a system is determined by the values of the basis properties of its elements. Therefore, the maximum number of states of S is equal to the number of unique combinations of values for the basis properties of the elements of S, regardless of our choice of the definition of the overall state of S.^10 We can assign each such state a number from 1 to K^N, and by S_i we denote a representation of the i-th state of S. That is, S(t) denotes a representation of the state of S at time t, whereas S_i denotes a representation of the i-th possible state of S in some arbitrary ordering of the states of S. Thus, for any given moment in time t, there exists an S_i such that S(t) = S_i. By |S| = K^N we denote the number of possible states of S.

^10 We will revisit this topic in the context of computable functions on the basis properties of a system in Section 2 below.
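As a purely illustrative worked example with hypothetical values (not taken from the paper): if each element has n = 2 independent basis properties that can take on k_1 = 3 and k_2 = 4 possible values, and S contains N = 5 elements, then

$$K = k_1 k_2 = 3 \times 4 = 12, \qquad |S| = K^N = 12^5 = 248{,}832, \qquad \log(|S|) = 5\log(12) \approx 17.9 \text{ bits}.$$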
Now imagine that we measure the value of every basis property of every element of S over some long period of time, generating M samples of the state of S, and that for each sample we store the number assigned to the particular state of S we observe. For example, if we observe S_j, then we would add the number j to our string. Thus, in this case, Σ = {1, . . . , |S|} is the alphabet, and the resultant string is a string of numbers x = (n_1 · · · n_M), representing the M states of S observed over time. Further, assume that we find that the distribution of the states of S over that interval of time is ϕ = {p_1, . . . , p_|S|}, where p_i is the number of times S_i is observed divided by the number of samples M. We could then encode x as a binary string, and the minimum average number of bits required to identify a single state of S would be $H(x) = -\sum_{i=1}^{|S|} p_i \log(p_i)$. Note that we are not encoding the values of the basis properties of the elements within S(t), but we are instead representing each observed state of S with a number, and encoding the resultant string of numbers. That is, each possible combination of values for the basis properties of the elements of S corresponds to a particular unique overall state of S, which, when observed, we represent with a number. In contrast, in Section 1.4 below, we will use the Kolmogorov complexity to measure the information contained in an encoding of S(t) itself. If the distribution ϕ is stable over time, then we write H(S) to denote the information entropy of x, which we call the representational entropy of S. Thus, H(S) is the average number of bits per state necessary to identify the particular states of S that are observed over time. If ϕ is the uniform distribution, then we have

$$H(S) = \log(|S|). \tag{2}$$
Thus, the representational entropy of a system that is equally likely to be in any one of its possible states is equal to the logarithm of the number of possible states. We note that equation (2) is similar in form to the thermodynamic entropy of a system, k_B ln(Ω), where k_B is the Boltzmann constant, and Ω is the number of microstates the system can occupy given its macrostate. This is certainly not a novel observation, and the literature on the connections between information theory and thermodynamic entropy is extensive (see [3] and [16]). In fact, the similarity between the two equations was noted by Shannon himself in [20]. However, the goal of our model is to achieve time-dilation, and thus, a review of this topic is beyond the scope of this paper. Finally, note that if ϕ is stable, then we can interpret p_i as the probability that S(t) = S_i for any given t, and therefore, we can view H(S) as the expected number of bits necessary to identify a single state of S.
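The following sketch (ours) samples hypothetical state labels and estimates H(S) from the observed distribution; with a uniform ϕ the estimate approaches log(|S|), as in equation (2). The number of states and the skewed weights are assumptions chosen purely for illustration.

    import random
    from collections import Counter
    from math import log2

    def representational_entropy(samples):
        """H of the observed distribution of state labels (bits per observed state)."""
        M = len(samples)
        return -sum((c / M) * log2(c / M) for c in Counter(samples).values())

    num_states = 16        # hypothetical |S|
    M = 100_000            # number of samples of the state of S

    # Uniform phi: the estimate approaches log(|S|) = 4 bits, as in equation (2).
    uniform = [random.randrange(1, num_states + 1) for _ in range(M)]
    print(representational_entropy(uniform))   # ~ 4.0 bits

    # A non-uniform phi yields a smaller representational entropy.
    skewed = random.choices(range(1, num_states + 1), weights=[16] + [1] * 15, k=M)
    print(representational_entropy(skewed))    # ~ 2.9 bits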
We can also view S as a medium in which we can store information. The number of bits that can be stored in S is also equal to log(|S|), regardless of the distribution ϕ, which we call the information capacity of S. Under this view, we do not observe S and record its state, but rather, we "write" the current state of S by fixing the value of every basis property of every element of S, and use that state to represent a number or character. As such, when we "read" the current state of S, measuring the value of every basis property of every element of S, we can view each possible current state S_i as representing some number or character, including, for example, the number i. As such, S can represent any number from 1 to |S|, and thus, the information contained in the current state of S is equivalent to the information contained in a binary string of length log(|S|).^11 For example, whether the system is a single switch that can be in any one of 16 states, or a set of 4 switches, each of which can be in any one of 2 states, in either case measuring the current state of the system can be viewed as equivalent to reading log(16) = 4 bits of information. Thus, each state of S can be viewed as containing log(|S|) bits of information, which we call the information content of S.

^11 Note that a binary string of length log(|S|) has |S| states, and as such, can code for all numbers from 1 to |S|.
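A small sketch of the switch example (ours): whether S is one 16-state switch or four 2-state switches, reading its current state yields log(16) = 4 bits. The snippet "writes" the number 11 into four binary switches by fixing their states, and "reads" it back by measuring them.

    from math import log2

    print(log2(16))       # one switch with 16 states: 4.0 bits per read
    print(4 * log2(2))    # four switches with 2 states each: also 4.0 bits per read

    # "Write" the number 11 by fixing the state of each of the four binary switches,
    # then "read" it back by measuring every switch.
    switches = [(11 >> k) & 1 for k in range(4)]
    print(switches)                                          # [1, 1, 0, 1]  (11 = 0b1011)
    print(sum(bit << k for k, bit in enumerate(switches)))   # 11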
Note that the representational entropy of a system is a measure of how much
information is necessary to identify the states of a system that are observed over
time, which, although driven by the behavior of the system, is ultimately a mea-
sure of an amount of information that will be stored outside of the system itself.
In contrast, the information capacity and information content of a system are
measures of the amount of information physically contained within the system.
Though the information capacity and the information content are always equal,
conceptually, it is worthwhile to distinguish between the two, since the infor-
mation capacity tells us how much information a system can store as a general
matter, whereas the information content tells us how much information is ob-
served when we measure the basis properties of every element of a given state
of the system.
Finally, note that if a system is closed, then no exogenous information has
been “written” into the system. Nonetheless, if we were to “read” the current
state of a closed system, we would read log(|S|) bits of information. The in-
formation read in that case does not represent some exogenous symbol, but is
instead the information that describes the basis properties of the system. Thus,
the amount of information observed when we measure the basis properties of
every element of a given state of the system is log(|S|) bits.
1.3 The Kolmogorov Complexity
Consider again a string of the form x = a^N b^N c^N. As noted above, x has an obvious structure, yet H(x) = log(3), which is the maximum information entropy for a string drawn from an alphabet with 3 characters. Thus, the information entropy is not a measure of randomness, since it can be maximized by strings that are clearly not random in any sense of the word. Assume that N = 10^8, and that as such, at least |x|H(x) = 3 × 10^8 log(3) bits are required to encode x using a statistical encoding. Because x has such an obvious structure, we can write and store a short program that generates x, which will probably require fewer bits than encoding and storing each character of x. Note that for any given programming language, there will be some shortest program that generates x, even if we can't prove as a practical matter that a given program is the shortest such program.
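To make the point concrete, here is one such short program (a sketch of ours, certainly not a minimal one): its text does not grow with N beyond the digits needed to write N, whereas a purely statistical encoding of x still requires about |x|H(x) = 3 × 10^8 log(3) bits.

    from math import log2

    N = 10**8

    def generate_x() -> str:
        """A short program that outputs x = a^N b^N c^N; its length is a loose upper
        bound on K(x) and is essentially independent of N (beyond ~log(N) bits for N)."""
        return "a" * N + "b" * N + "c" * N

    # Not called here, since x itself would occupy 3 * 10^8 characters.
    # A statistical encoding of x, by contrast, needs roughly |x| * H(x) bits:
    print(3 * N * log2(3))   # ~4.75e8 bits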
This is the intuition underlying the Kolmogorov complexity of a binary string x, denoted K(x), which is, informally, the length of the shortest program, measured in bits, that generates x as output. More formally, given a Universal Turing Machine U (a "UTM") and a binary string x, K(x) is the length of the shortest binary string y for which U(y) = x [12].^12 Note that K(x) does not consider the number of operations necessary to compute U(y), but only the length of y, the binary string that generates x. Thus, K(x) is not a measure of overall efficiency, since a program could be short, but nonetheless require an unnecessarily large number of operations. Instead, K(x) is a measure of the information content of x, since at most K(x) bits are necessary to generate x on a UTM.

^12 Note that some applications of K(x) depend upon whether the UTM is a "prefix machine", which is a UTM whose inputs form a prefix code, and thus, do not require special delimiters to indicate the end of a string. For simplicity, all UTMs referenced in this paper are not prefix machines, and thus, an integer n can be specified as the input to a UTM using log(n) bits.
We will not discuss the theory of computability in any depth, but it is necessary that we briefly mention the Church-Turing Thesis, which, stated informally, asserts that any computation that can be performed by a device, or a human being, using some mechanical process, can also be performed by a UTM [22] [21].^13 In short, Turing's formulation of the thesis asserts that every mechanical method of computation can be simulated by a UTM. Historically, every method of computation that has ever been proposed has been proven to be either equivalent to a UTM, or a more limited method that can be simulated by a UTM. As such, the Church-Turing Thesis is not a mathematical theorem, but is instead a hypothesis that has turned out to be true as an empirical matter. The most important consequence of the thesis for purposes of this paper is that any mathematical function that can be expressed as an algorithm is assumed to be a computable function, which is a function that can be calculated by a UTM. However, it can be shown that there are non-computable functions, which are functions that cannot be calculated by a UTM, arguably the most famous of which was defined by Turing himself, in what is known as the "Halting Problem" [21].^14 Unfortunately, K(x) is a non-computable function [23], which means that there is no program that can, as a general matter, take a binary string x as input, and calculate K(x). However, K(x) is nonetheless a powerful theoretical measure of information content.
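Although K(x) itself cannot be computed, any lossless compressor yields a computable upper bound on it, up to the constant length of the corresponding decompression routine. The sketch below is our illustration of this point, using Python's standard zlib module on a small instance of the highly structured string discussed above; it is not part of the paper's model.

    import zlib

    x = b"a" * 1000 + b"b" * 1000 + b"c" * 1000   # a small instance of a^N b^N c^N

    compressed = zlib.compress(x, level=9)
    # len(compressed) * 8 bits, plus the constant cost of the decompressor, is a
    # computable upper bound on K(x); K(x) itself remains uncomputable.
    print(len(x) * 8, "bits uncompressed")
    print(len(compressed) * 8, "bits for a compressed description")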
For example, consider the string x = (aaabbaaabbbb)^N. The distribution of characters in this string is uniform, and as such, H(x) = log(2) is maximized. However, we could of course write a short program that generates this string for a given N. Because such a program can be written in some programming language, it is therefore computable, and can be simulated by U(y) = x, for some y. Therefore, K(x) ≤ |y|. While this statement may initially seem trivial, it implies that K(