
A Computational Model of Time-Dilation

Charles Davi

June 26, 2019

Abstract

We propose a model of time-dilation that follows from the application of concepts from information theory and computer theory to physical systems. Our model predicts equations for time-dilation that are identical in form to those predicted by the special theory of relativity. In short, our model can be viewed as an alternative set of postulates rooted in information theory and computer theory that imply that time-dilation will occur. We also show that our model is consistent with the general theory of relativity's predictions for time-dilation due to gravity.

1 Introduction

Prior to the twentieth century, physicists appear to have approached nature with a general presumption that fundamental physical properties such as energy and charge are continuous. This was likely the result of a combination of prevailing philosophical views and mathematical convenience, given that continuous functions are generally easier to calculate than discrete functions. However, this view of nature began to unravel in the early twentieth century as a result of the experiments of Robert A. Millikan, and others, who demonstrated that charge appeared to be an integer multiple of a single value, the elementary charge e [17], and confirmed Einstein's predictions for the energy of electrons ejected due to the photoelectric effect, which suggested a quantized, particle theory of light [18]. In the century that followed these experiments, the remarkable success of quantum mechanics as a general matter demonstrated that, whether or not the ultimate, underlying properties of nature are in fact discrete, the behavior of nature can nonetheless be predicted to a high degree of precision using models that make use of discrete values. This historical progression from a presumption of continuous values, towards the realization that fundamental properties of nature such as charge are quantized, was facilitated in part by the development of experimental techniques that were able to make measurements at increasingly smaller scales, and the computational power of the computer itself, which facilitated the use of discrete calculations that would be impossible to accomplish by hand. Though admittedly anecdotal, this progression suggests the possibility that at a sufficiently small scale of investigation, perhaps we would find that all properties of nature are in fact discrete, and thus, all apparently continuous phenomena are simply the result of scale. Below we show that if we assume that all natural phenomena are both discrete, and capable of being described by computable functions, then we can achieve a computational model of time-dilation that predicts equations that are generally identical in form to those predicted by the special theory of relativity.

1.1 The Information Entropy

Assume that the distribution of characters in a string x is {p_1, . . . , p_n}, where p_i is the number of instances of the i-th character in some alphabet Σ = {a_1, . . . , a_n}, divided by the length of x. For example, if x = (ab), then our alphabet is Σ = {a, b}, and p_1 = p_2 = 1/2, whereas if x = (aaab), then p_1 = 3/4 and p_2 = 1/4. The minimum average number of bits per character required to encode x as a binary string without loss of information, taking into account only the distribution of characters within x, is given by,

H(x) = −∑_{i=1}^{n} p_i log(p_i).¹    (1)

We call H(x) the information entropy of x. The intuition underlying the information entropy is straightforward, though the derivation of equation (1) is far from obvious, and is in fact considered the seminal result of information theory, first published by Claude Shannon in 1948 [20]. To establish an intuition, consider the second string x = (aaab), and assume that we want to encode x as a binary string. We would therefore need to assign a binary code to each of a and b. Since a appears more often than b, if we want to minimize the length of our encoding of x, then we should assign a shorter code to a than we do to b. For example, if we signify the end of a binary code with a 1, we could assign the code 1 to a, and 01 to b.² As such, our encoding of x would be 11101, and since x contains 4 characters, the average number of bits per character in our encoding of x is 5/4. Now consider the first string x = (ab). In this case, there are no opportunities for this type of compression because all characters appear an equal number of times. The same would be true of x = (abcbca), or x = (qs441z1zsq), each of which has a uniform distribution of characters. In short, we can take advantage of the statistical structure of a string, assigning longer codes to characters that appear less often, and shorter codes to characters that appear more often. If all characters appear an equal number of times, then there are no opportunities for this type of compression.

¹ Unless stated otherwise, all logarithms referenced in this paper are base 2.

² Rather than make use of a special delimiting character to signify the end of a binary string, we could instead make use of a "prefix-code". A prefix code is an encoding with the property that no code is a prefix of any other code within the encoding. For example, if we use the code 01 in a given prefix code, then we cannot use the code 010, since 01 is a prefix of 010. By limiting our encoding in this manner, upon reading the code 01, we would know that we have read a complete code that corresponds to some number or character. In contrast, if we include both 01 and 010 in our encoding, then upon reading an 01, it would not be clear whether we have read a complete code, or the first 2 bits of 010.


In general, if a string x is drawn from an alphabet with n characters, and the distribution of these characters within x is uniform, then H(x) = log(n), which is the maximum value of H(x) for a string of any length drawn from an alphabet with n characters.

Shannon showed in [20] that a minimum encoding of x would assign a code of length l_i = log(1/p_i) to each a_i ∈ Σ. If the length of x is N, then each a_i will appear N p_i times within x. Thus, the minimum total number of bits required to encode x using this type of statistical compression is ∑_{i=1}^{n} N p_i l_i = N H(x). Therefore, the minimum average number of bits per character required to encode a string of length N is H(x). Note that H(x) is not merely a theoretical measure of information content, since there is always an actual binary encoding of x for which the average number of bits per character is approximately H(x) [14]. Thus, H(x) is a measure of the average number of bits required to actually store or transmit a single character in x. However, the value of H(x) is a function of only the distribution of characters within x, and therefore, does not take into account other opportunities for compression. For example, a string of the form x = a^N b^N c^N has an obvious structure, yet H(x) = log(3) is maximized, since x has a uniform distribution of characters.³ Thus, even if a string has a high information entropy, the string could nonetheless have a simple structure.
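For illustration, the following short sketch computes the empirical character distribution of a string and evaluates equation (1); the helper name information_entropy and any example strings beyond those discussed above are our own choices, not part of the paper.

```python
from collections import Counter
from math import log2

def information_entropy(x: str) -> float:
    """Compute H(x), the information entropy of the character
    distribution of x, in bits per character (base-2 logarithm)."""
    counts = Counter(x)
    n = len(x)
    # p_i is the number of instances of the i-th character divided by |x|.
    return -sum((c / n) * log2(c / n) for c in counts.values())

# Examples from Section 1.1: a uniform distribution maximizes H(x),
# while a skewed distribution allows shorter average codes.
print(information_entropy("ab"))                          # 1.0 bit per character
print(information_entropy("aaab"))                        # ~0.811 bits per character
print(information_entropy("a" * 5 + "b" * 5 + "c" * 5))   # log(3) ~ 1.585 bits
```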

1.2 The Information Content of a System

Despite the limitations of H(x), we can still use H(x) to measure the information content of representations of physical systems, understanding that we are able to account for only the statistical structure of the representation. We begin with a very simple example: consider a system comprised of N particles that initially all travel in the same direction, but that over time have increasingly random, divergent motions. We could represent the direction of motion of each particle relative to some fixed axis using an angle θ. If we fix the level of detail of our representation of this system by breaking θ into groups of A = [0, π/2), B = [π/2, π), C = [π, 3π/2), and D = [3π/2, 2π), then we could represent the direction of motion of each particle in the system at a given moment in time as a character from Σ = {A, B, C, D} (see Figure 1 below). Note that this is clearly not a complete and accurate representation of the particles, since we have, for example, ignored the magnitude of the velocity of each particle. Nonetheless, we can represent the direction of motion of all of the particles at a given moment in time as a string of characters drawn from Σ of length N. For example, if at time t the direction of motion of each particle is θ = 0, then we could represent the motions of the particles at t as the string x = (A ··· A), where the length of x, denoted |x|, is equal to N. As such, the distribution of motion is initially entirely concentrated in group A, and the resultant distribution of characters within x is {1, 0, 0, 0}. The information entropy of {1, 0, 0, 0} is −∑_{i=1}^{4} p_i log(p_i) = 0 bits,⁴ and therefore, the minimum average number of bits per character required to encode this representation of the particles at t is 0 bits. Over time, the particles will have increasingly divergent motions, and as such, the distribution of characters within x will approach the uniform distribution, which is in this case {1/4, 1/4, 1/4, 1/4}, which has an information entropy of log(4) = 2 bits. Thus, the information entropy of this representation of the particles will increase over time.

³ For example, if N = 2, then x = aabbcc.

⁴ As is typical when calculating H(x), we assume that 0 log(0) = 0.

[Figure 1: A mapping of angles to Σ = {A, B, C, D}.]
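The behavior described above can be sketched in a few lines of code, assuming a particular, purely illustrative randomization rule for the particle directions; the function names, the number of particles, and the drift per step below are our own assumptions, chosen only to show the entropy of the quadrant representation rising from 0 toward log(4) = 2 bits.

```python
import random
from collections import Counter
from math import log2, pi

def quadrant(theta: float) -> str:
    """Map an angle to a character in Sigma = {A, B, C, D} by quadrant."""
    return "ABCD"[int((theta % (2 * pi)) / (pi / 2)) % 4]

def entropy(symbols: list) -> float:
    """Information entropy of the character distribution, in bits per character."""
    counts = Counter(symbols)
    n = len(symbols)
    h = -sum((c / n) * log2(c / n) for c in counts.values())
    return h + 0.0  # normalizes -0.0 to 0.0

N = 10_000
angles = [0.0] * N  # at time t = 0, every particle moves with theta = 0

for step in range(8):
    x = [quadrant(theta) for theta in angles]
    print(f"step {step}: H(x) = {entropy(x):.3f} bits per character")
    # Illustrative randomization: each direction drifts by a random amount per step.
    angles = [theta + random.uniform(-pi / 4, pi / 4) for theta in angles]
```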

We could argue that, as a result, the information content of the system itself increases over time, but this argument is imprecise, since this particular measure of information content is a function of the chosen representation, even though the behavior of the system can impact the information content of the representation. For example, if we made use of a finer gradation of the angle θ above, increasing the number of groups, we would increase the number of characters in our alphabet, thereby increasing the maximum information content of the representation, without changing the system in any way. However, this does not imply that representations are always arbitrary. For example, if some property of a system can take on only n discrete values, then a representation of the system that restricts the value of this property to one of these n values is not arbitrary. The point is that as a practical matter, our selection of certain properties will almost certainly be incomplete, and measured at some arbitrary level of precision, which will result in an arbitrary amount of information. Thus, as a practical matter, we probably cannot answer the question of how much information is required to completely and accurately represent a physical system. We can, however, make certain assumptions that would allow us to construct a complete and accurate representation of a system, and then measure the information content of that representation.

Assumption 1.1. There is a finite set of n measurable properties Γ such that (1) for any measurable property P ∉ Γ, the value of P can be derived from the values of the properties P_i ∈ Γ, and (2) there is no P_i ∈ Γ such that the value of P_i can be derived from the other n − 1 properties in Γ − P_i.

We call each of the properties P_i ∈ Γ a basis property. Note that we are not suggesting that all properties are a linear combination of the basis properties within Γ. Rather, as discussed in Sections 1.3 and 2 below, we assume that all other measurable properties can be derived from Γ using computable functions. For example, if mass and velocity are included in Γ, then Assumption 1.1 implies that momentum would not be included in Γ, since momentum can be derived from mass and velocity using a computable function.⁵

Assumption 1.2. For any closed system,⁶ each P_i ∈ Γ can take on only a finite number of possible values.

For example, assume that a closed system S contains a finite number of N elements.⁷ Assumption 1.1 implies that there is a single set of basis properties Γ from which all other measurable properties of any given element can be derived. As such, in this case, S consists of a finite number of elements, each with a finite number of measurable basis properties, and Assumption 1.2 implies that each such basis property can take on only a finite number of possible values.

Assumption 1.3. For any system, all measurable properties of the system, as of a given moment in time, can be derived from the values of the basis properties of the elements of the system, as of that moment in time.

Together, Assumptions 1.1 and 1.3 allow us to construct a complete and accurate representation of a system at a given moment in time. Specifically, if we were able to measure the basis properties of every element of S at time t, then we could construct a representation of the state of S as a set of N strings S(t) = {s_1, . . . , s_N}, with each string representing an element of S, where each string s_i = (v_1, . . . , v_n) consists of n values, with v_j representing the value of the j-th basis property of the i-th element of S at time t. Because S(t) contains the values of the basis properties of every element of S at time t, Assumption 1.3 implies that we can derive the value of any property of S at time t from the representation S(t) itself. For example, Assumption 1.3 implies that there is some computable function f that can calculate the momentum ρ of S at time t when given S(t) as input. Expressed symbolically, ρ = f(S(t)).⁸ Thus, S(t) contains all of the information necessary to calculate any measurable property of S at time t, and therefore, we can take the view that S(t) constitutes a complete and accurate representation of the state of S at time t. Note that Assumption 1.3 does not imply that we can determine all future properties of S given the values of the basis properties of its elements at time t, but rather, that the value of any property of S that exists at time t can be derived from the values of the basis properties of its elements as of time t.⁹ Of course, we are not suggesting that we can construct such a representation as a practical matter, but rather, we will use the concept of S(t) as a theoretical tool to analyze the information content of systems generally, and ultimately, construct a model of time-dilation.

⁵ We will discuss computable functions in greater detail in Sections 1.3 and 2 below, but for now, a computable function can be defined informally as any function that can be implemented using an algorithm.

⁶ We view a system as closed if it does not interact with any other systems or exogenous forces, and is bounded within some definite volume.

⁷ We deliberately use the generic term "element", which we will clarify in Section 2 below.

⁸ We assume that all particles and systems have definite locations, and definite values for all of their measurable properties. As such, our model, as presented herein, is necessarily incomplete, since it does not address systems governed by the laws of quantum mechanics.
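To make the notation concrete, here is a minimal sketch of S(t) as a set of basis-property tuples, with momentum as an example of a property derived by a computable function f. The choice of mass and one-dimensional velocity as the basis properties, and every name below, are illustrative assumptions rather than part of the model.

```python
from typing import NamedTuple

class Element(NamedTuple):
    """One element of S, represented by its basis-property values (v_1, ..., v_n).
    Here we assume, purely for illustration, that the basis properties are
    mass and a one-dimensional velocity."""
    mass: float
    velocity: float

# S(t): the state of the system at time t, one tuple of basis-property values per element.
S_t = [Element(mass=1.0, velocity=2.0),
       Element(mass=2.0, velocity=-0.5),
       Element(mass=0.5, velocity=4.0)]

def f(state: list) -> float:
    """A computable function deriving a property of S from S(t):
    here, the total momentum rho, i.e. the sum of m * v over all elements."""
    return sum(e.mass * e.velocity for e in state)

rho = f(S_t)  # rho = f(S(t)), in the spirit of Assumption 1.3
print(rho)    # 3.0
```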

Recall that Assumption 1.2 implies that each basis property of S can take on only a finite number of possible values. If we assume that the n basis properties are independent, and that basis property P_i can take on k_i possible values, then the basis properties of each element of S can have any one of K = k_1 ··· k_n possible combinations of values. If we distinguish between the N elements of S, and assume that the basis properties of each element are independent from those of the other elements, then there are K^N possible combinations of values for the basis properties of every element of S. Since the values of the basis properties of the elements of S determine all measurable properties of S, it follows that any definition of the overall state of S will reference either the basis properties of the elements of S, or measurements derived from the basis properties of the elements of S. As such, any definition of the overall state of S will ultimately reference a particular combination of values for the basis properties of the elements of S. For example, Assumption 1.3 implies that the temperature of a system is determined by the values of the basis properties of its elements. Therefore, the maximum number of states of S is equal to the number of unique combinations of values for the basis properties of the elements of S, regardless of our choice of the definition of the overall state of S.¹⁰ We can assign each such state a number from 1 to K^N, and by S_i we denote a representation of the i-th state of S. That is, S(t) denotes a representation of the state of S at time t, whereas S_i denotes a representation of the i-th possible state of S in some arbitrary ordering of the states of S. Thus, for any given moment in time t, there exists an S_i such that S(t) = S_i. By |S| = K^N we denote the number of possible states of S.

⁹ We will discuss calculating future states of systems in Section 1.4 below.

¹⁰ We will revisit this topic in the context of computable functions on the basis properties of a system in Section 2 below.

Now imagine that we measure the value of every basis property of every element of S over some long period of time, generating M samples of the state of S, and that for each sample we store the number assigned to the particular state of S we observe. For example, if we observe S_j, then we would add the number j to our string. Thus, in this case, Σ = {1, . . . , |S|} is the alphabet, and the resultant string is a string of numbers x = (n_1 ··· n_M), representing the M states of S observed over time. Further, assume that we find that the distribution of the states of S over that interval of time is ϕ = {p_1, . . . , p_{|S|}}, where p_i is the number of times S_i is observed divided by the number of samples M. We could then encode x as a binary string, and the minimum average number of bits required to identify a single state of S would be H(x) = −∑_{i=1}^{|S|} p_i log(p_i). Note that we are not encoding the values of the basis properties of the elements within S(t), but we are instead representing each observed state of S with a number, and encoding the resultant string of numbers. That is, each possible combination of values for the basis properties of the elements of S corresponds to a particular unique overall state of S, which, when observed, we represent with a number. In contrast, in Section 1.4 below, we will use the Kolmogorov complexity to measure the information contained in an encoding of S(t) itself. If the distribution ϕ is stable over time, then we write H(S) to denote the information entropy of x, which we call the representational entropy of S. Thus, H(S) is the average number of bits per state necessary to identify the particular states of S that are observed over time. If ϕ is the uniform distribution, then we have,

H(S) = log(|S|).    (2)

Thus, the representational entropy of a system that is equally likely to be in any one of its possible states is equal to the logarithm of the number of possible states. We note that equation (2) is similar in form to the thermodynamic entropy of a system k_B ln(Ω), where k_B is the Boltzmann constant, and Ω is the number of microstates the system can occupy given its macrostate. This is certainly not a novel observation, and the literature on the connections between information theory and thermodynamic entropy is extensive (see [3] and [16]). In fact, the similarity between the two equations was noted by Shannon himself in [20]. However, the goal of our model is to achieve time-dilation, and thus, a review of this topic is beyond the scope of this paper. Finally, note that if ϕ is stable, then we can interpret p_i as the probability that S(t) = S_i for any given t, and therefore, we can view H(S) as the expected number of bits necessary to identify a single state of S.
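As a worked illustration of |S| = K^N and equation (2), suppose a hypothetical system has N elements whose n basis properties can take k_1, . . . , k_n possible values each; the numbers below are toy values of our own choosing.

```python
from math import log2, prod

# Toy values (illustrative only): n = 3 basis properties with
# k_1 = 4, k_2 = 2, k_3 = 3 possible values, and N = 10 elements.
k = [4, 2, 3]
N = 10

K = prod(k)            # combinations of basis-property values per element
num_states = K ** N    # |S| = K^N possible states of the system

# If the distribution over states is uniform, equation (2) gives the
# representational entropy H(S) = log(|S|) bits, which also equals the
# information capacity of S discussed below.
H_S = log2(num_states)
print(K, num_states, H_S)   # 24, 24**10, ~45.85 bits
```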

We can also view S as a medium in which we can store information. The number of bits that can be stored in S is also equal to log(|S|), regardless of the distribution ϕ, which we call the information capacity of S. Under this view, we do not observe S and record its state, but rather, we "write" the current state of S by fixing the value of every basis property of every element of S, and use that state to represent a number or character. As such, when we "read" the current state of S, measuring the value of every basis property of every element of S, we can view each possible current state S_i as representing some number or character, including, for example, the number i. As such, S can represent any number from 1 to |S|, and thus, the information contained in the current state of S is equivalent to the information contained in a binary string of length log(|S|).¹¹ For example, whether the system is a single switch that can be in any one of 16 states, or a set of 4 switches that can be in any one of 2 states, in either case, measuring the current state of the system can be viewed as equivalent to reading log(16) = 4 bits of information. Thus, each state of S can be viewed as containing log(|S|) bits of information, which we call the information content of S.

¹¹ Note that a binary string of length log(|S|) has |S| states, and as such, can code for all numbers from 1 to |S|.

Note that the representational entropy of a system is a measure of how much information is necessary to identify the states of a system that are observed over time, which, although driven by the behavior of the system, is ultimately a measure of an amount of information that will be stored outside of the system itself. In contrast, the information capacity and information content of a system are measures of the amount of information physically contained within the system. Though the information capacity and the information content are always equal, conceptually, it is worthwhile to distinguish between the two, since the information capacity tells us how much information a system can store as a general matter, whereas the information content tells us how much information is observed when we measure the basis properties of every element of a given state of the system.

Finally, note that if a system is closed, then no exogenous information has been "written" into the system. Nonetheless, if we were to "read" the current state of a closed system, we would read log(|S|) bits of information. The information read in that case does not represent some exogenous symbol, but is instead the information that describes the basis properties of the system. Thus, the amount of information observed when we measure the basis properties of every element of a given state of the system is log(|S|) bits.

1.3 The Kolmogorov Complexity

Consider again a string of the form x = a^N b^N c^N. As noted above, x has an obvious structure, yet H(x) = log(3), which is the maximum information entropy for a string drawn from an alphabet with 3 characters. Thus, the information entropy is not a measure of randomness, since it can be maximized given strings that are clearly not random in any sense of the word. Assume that N = 10^8, and that as such, at least |x|H(x) = 3 × 10^8 log(3) bits are required to encode x using a statistical encoding. Because x has such an obvious structure, we can write and store a short program that generates x, which will probably require fewer bits than encoding and storing each character of x. Note that for any given programming language, there will be some shortest program that generates x, even if we can't prove as a practical matter that a given program is the shortest such program.
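For instance, a minimal sketch of such a generating program is shown below; its source text occupies a few hundred bits, whereas a statistical encoding of x for N = 10^8 requires roughly 3 × 10^8 log(3) ≈ 4.75 × 10^8 bits.

```python
# A short program that generates x = a^N b^N c^N for N = 10**8.
# Storing this source text takes far fewer bits than storing x itself,
# which is the intuition behind the Kolmogorov complexity of x.
# (Materializing x needs ~300 MB of memory; the point is the brevity
# of the program, not its efficiency.)
N = 10 ** 8
x = "a" * N + "b" * N + "c" * N
print(len(x))  # 3 * 10**8 characters
```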

This is the intuition underlying the Kolmogorov complexity of a binary string x, denoted K(x), which is, informally, the length of the shortest program, measured in bits, that generates x as output. More formally, given a Universal Turing Machine U (a "UTM") and a binary string x, K(x) is the length of the shortest binary string y for which U(y) = x [12].¹² Note that K(x) does not consider the number of operations necessary to compute U(y), but only the length of y, the binary string that generates x. Thus, K(x) is not a measure of overall efficiency, since a program could be short, but nonetheless require an unnecessary number of operations. Instead, K(x) is a measure of the information content of x, since at most K(x) bits are necessary to generate x on a UTM.

¹² Note that some applications of K(x) depend upon whether the UTM is a "prefix machine", which is a UTM whose inputs form a prefix-code, and thus, do not require special delimiters to indicate the end of a string. For simplicity, all UTMs referenced in this paper are not prefix machines, and thus, an integer n can be specified as the input to a UTM using log(n) bits.

We will not discuss the theory of computability in any depth, but it is necessary that we briefly mention the Church-Turing Thesis, which, stated informally, asserts that any computation that can be performed by a device, or human being, using some mechanical process, can also be performed by a UTM [22] [21].¹³ In short, Turing's formulation of the thesis asserts that every mechanical method of computation can be simulated by a UTM. Historically, every method of computation that has ever been proposed has been proven to be either equivalent to a UTM, or a more limited method that can be simulated by a UTM. As such, the Church-Turing Thesis is not a mathematical theorem, but is instead a hypothesis that has turned out to be true as an empirical matter. The most important consequence of the thesis for purposes of this paper is that any mathematical function that can be expressed as an algorithm is assumed to be a computable function, which is a function that can be calculated by a UTM. However, it can be shown that there are non-computable functions, which are functions that cannot be calculated by a UTM, arguably the most famous of which was defined by Turing himself, in what is known as the "Halting Problem" [21].¹⁴ Unfortunately, K(x) is a non-computable function [23], which means that there is no program that can, as a general matter, take a binary string x as input, and calculate K(x). However, K(x) is nonetheless a powerful theoretical measure of information content.
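Although K(x) cannot be computed exactly, any general-purpose compressor yields a computable upper bound on it, up to the constant length of the decompression routine. The sketch below uses zlib purely to illustrate this point; it is our own aside, not a claim made in the paper.

```python
import zlib

N = 1000
x = ("a" * N + "b" * N + "c" * N).encode()

# The compressed length (plus the fixed size of the decompression routine)
# is an upper bound on K(x): a short description that regenerates x exactly.
compressed = zlib.compress(x, level=9)
assert zlib.decompress(compressed) == x

print(len(x), len(compressed))  # e.g. 3000 bytes vs. a few dozen bytes
```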

For example, consider the string x = (aaabbaaabbbb)^N. The distribution of characters in this string is uniform, and as such, H(x) = log(2) is maximized. However, we could of course write a short program that generates this string for a given N. Because such a program can be written in some programming language, it is therefore computable, and can be simulated by U(y) = x, for some y. Therefore, K(x) ≤ |y|. While this statement may initially seem trivial, it implies that K(