FORECASTING IN LIGHT OF BIG DATA
HYKEL HOSNI AND ANGELO VULPIANI
Abstract. Predicting the future state of a system has always been a natural motivation for science and for practical applications. Beyond its obvious technical and societal relevance, the topic is also interesting from a conceptual point of view, because forecasting lends itself to two equally radical, yet opposite, methodologies: a reductionist one, based on first principles, and a naïve-inductivist one, based only on data. The latter view has recently gained attention in response to the availability of unprecedented amounts of data and increasingly sophisticated algorithmic analytic techniques. The purpose of this note is to assess critically the role of big data in reshaping the key aspects of forecasting, and in particular the claim that bigger data leads to better predictions. Drawing on the representative example of weather forecasts, we argue that this is not generally the case. We conclude by suggesting that a clever and context-dependent compromise between modelling and quantitative analysis stands out as the best forecasting strategy, as anticipated nearly a century ago by Richardson and von Neumann.
Nothing is more practical than a good theory (L. Boltzmann)
1. Introduction and motivation
Uncertainty spans our lives, and forecasting is how we cope with it, individually, socially, institutionally, and scientifically. As a consequence, the concept of forecast is a multifaceted one. Science, as a whole, moves forward by making and testing forecasts. Political institutions make substantial use of economic forecasting to devise their policies. Most of us rely on weather forecasts to plan our daily activities. Thus, in forecasting, the boundaries between the natural and the social sciences are often crossed, as are the boundaries between the scientific, technological and ethical domains.
This rather complex picture has been significantly enriched, over the past few years, by the rapidly increasing availability of methods for collecting and processing vast amounts of data. This has revived substantial interest in purely inductive methods, which are expected to serve the most disparate
needs, from commercial services to data-driven science. Data brokers sell to third parties the digital footprints recorded by our internet activities or credit card transactions. Those can be put to a number of different uses, not all of them ethically neutral. For instance, aggressive forms of personalised marketing identify women who are likely to be pregnant based on their internet activity, and health-related web searches have been shown to influence individual credit scores [26]. At the same time, data-intensive projects lie at the heart of extremely ambitious, cutting-edge scientific enterprises, including the US "Brain Research through Innovative Neurotechnologies" initiative (http://www.braininitiative.nih.gov/) and the African-Australian Square Kilometre Array, a radio telescope array consisting of thousands of receivers (http://skatelescope.org/).¹
Those examples illustrate clearly that big data spans radically diverse domains. This, together with its close association with machine learning, has recently been fuelling an all-encompassing enthusiasm, loosely rooted in a twofold presupposition: first, the idea that big data will lead to much better forecasts; second, that it will do so across the board, from scientific discovery to medical, financial, commercial and political applications. It is this enthusiasm which has recently led to making a case for the predictive-analytics analogue of the universal Turing machine, unblushingly referred to as The Master Algorithm [12].
Based on this twofold presupposition, big data and predictive analytics are expected to have a major impact on society, on technology, and all the way up to the scientific method itself [21]. The extent to which those promises are likely to be fulfilled is currently a matter for debate across a number of disciplines [1, 15, 22, 17, 3, 23], while some early success stories rather quickly turned into macroscopic failures [16]. This note adds to the methodological debate by challenging both aspects of the presupposition behind big data enthusiasm. First, more data may lead to worse predictions. Second, a suitably specified context is crucial for forecasts to be scientifically meaningful. Both points will be made with reference to a highly representative forecasting problem: weather prediction.
The remainder of the paper is organised as follows. Section 2 begins by recalling that the very meaning of scientific prediction depends significantly on an underlying theoretical context. We then move on, in Section 3, to challenge the naïve inductivist view which goes hand in hand with big data enthusiasm. In a rather elementary setting we illustrate the practical impossibility of inferring future behaviour from the past when the dimension of the problem is moderately large. Section 4 develops this further by emphasising that forecasts depend significantly on the modeller's ability to identify the proper level of description of the target system. To this end we draw on the history of weather forecasting, where the early attempts at arriving at a quantitative solution turned out to be unsuccessful precisely because they took into account too much data. The representativeness of the example suggests that this constitutes a serious challenge to the view according to which big data could make do with the analysis of correlations alone.

¹See, e.g., [2] for an appraisal of how experiments of this kind may lead to a paradigm shift in the philosophy of science.
The main lesson can be put as follows: as anticipated nearly a century ago by Richardson and von Neumann, a clever and context-dependent trade-off between modelling and quantitative analysis stands out as the best strategy for meaningful prediction. This flies in the face of the by now infamous claim put forward in 2008 in Wired by its then editor C. Anderson, that "the data deluge makes the scientific method obsolete". In our experience academics have a tendency to roll their eyes when confronted with this and similar claims, and hasten to add that non-academic publications should not be given so much credit. We believe otherwise. Indeed, we think that the importance of the cultural consequences of such claims is reason enough for academics to take scientific and methodological issue with them, independently of their publication venue. Whilst Anderson's argument fails to stand methodological scrutiny, as the present paper recalls, its key message (big data enthusiasm) has clearly percolated through society at large. This may lead to very serious social and ethical shortcomings, for the combination of statistical methods and machine learning techniques for predictive analytics is currently finding cavalier application in a number of very sensitive intelligence and policing activities, as we now briefly recall.

This clearly illustrates that the scope of the epistemological problem tackled by this note extends far beyond the scientific method and academic silos.
1.1. From SKYNET to PredPol. Early in 2016 a debate took place concerning alleged drone attacks in Pakistan. The controversial article by C. Grothoff and J.M. Porup² opened as follows:

In 2014, the former director of both the CIA and NSA proclaimed that "we kill people based on metadata." Now, a new examination of previously published Snowden documents suggests that many of those people may have been innocent.

²http://arstechnica.co.uk/security/2016/02/the-nsas-skynet-program-may-be-killing-thousands-of-innocent-people/
Recall that SKYNET is the US National Security Agency's programme aimed at monitoring mobile phone networks in Pakistan. Leaked documents [31] show that the primary goal of this programme is the identification of potential affiliates of the Al Qaeda network. Further information suggests that SKYNET builds on classification techniques, fed primarily with GSM data drawn from the entire Pakistani population. This obviously puts the classification method at high risk of overfitting, given, of course, that the vast majority of the population is not linked to terrorist activities. Not surprisingly then, the Snowden papers revealed a rather telling result of SKYNET's sophisticated machine learning, which attached to Ahmad Zaidan, a bureau chief for Al-Jazeera in Islamabad, the highest probability of being an Al Qaeda courier.
Two points are worth observing. First note, as some commentators have reported [30], that the classification of Zaidan as strongly linked to Al Qaeda cannot be dismissed as utterly wrong. It all depends, of course, on what we mean by "being linked". As a journalist in the field he was certainly "linked" to the organisation, and very much so if one counts the two interviews he did with Osama Bin Laden. But of course "being linked" with a terror organisation may mean something entirely different, namely being actively involved in the pursuit of its goals. This fundamental bit of contextual information is probably impossible to infer for a classification technique, even the most accurate one. And the SKYNET algorithms are far from the most accurate, which brings us to the second noteworthy point. The leaked documents assess the rate of false positives of the classification method used by SKYNET at between 0.008% and 0.18%. Since the surveillance programme gathers data from a population of 55 million people, this leads to up to 99 thousand Pakistanis who may have been wrongly labelled as "terrorists". Whether or not this actually led to deadly attacks through the "Find-Fix-Finish" strategy based on Predator drones, this example illustrates the limits of the claimed universality of the combination of big data and machine learning. For if the SKYNET programme were about detecting unsolicited emails, rather than potential terror suspects, a false positive rate of 0.008% would be considered exceptionally good. It is far from good if it may lead to highly defamatory accusations, if not outright death, for thousands of innocent people. The observation that terrorist identification and spam detection are completely different problems, with incomparable social, legal, and ethical implications, though apparently trivial, may easily be overlooked as a consequence of the big data enthusiasm.
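The arithmetic behind these figures is elementary but worth making explicit. The short Python sketch below (our own illustration; only the population size and error rates are taken from the figures quoted above) shows how a seemingly tiny false positive rate translates into tens of thousands of wrongly flagged people when applied to an entire population.

# Base-rate illustration for the SKYNET example discussed in the text.
# The population size and false positive rates are those quoted above.
population = 55_000_000
false_positive_rates = [0.00008, 0.0018]   # 0.008% and 0.18%

for fpr in false_positive_rates:
    wrongly_flagged = fpr * population
    print(f"false positive rate {fpr:.3%}: "
          f"about {wrongly_flagged:,.0f} people wrongly flagged")

# Expected output:
# false positive rate 0.008%: about 4,400 people wrongly flagged
# false positive rate 0.180%: about 99,000 people wrongly flagged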
On a less spectacular, but no less worrying, scale, this can be seen to feed the increasing enthusiasm for predictive policing. Police departments in the United States and in Europe have recently been purchasing commercially available software to predict crimes. California-based PredPol³ is widely used across the country and by some police departments in the United Kingdom. The New York Times reports⁴ that Coplogic⁵ has contracts with 5,000 police departments in the US. Keycrime⁶ is a Milan-based firm which has recently been contracted by the Italian police. The list could easily be extended. Predictive policing's main selling point is of course cost reduction. If we can predict where the next crime is going to be committed, we can optimise patrolling. Being more precise requires fewer resources and less taxpayers' money, and it delivers surgical results. But context is once again neglected. When introducing the methods and techniques underlying predictive policing, the authors of the 190-page RAND report [27] on the subject note that
These analytical tools, and the IT that supports them, are
largely developed by and for the commercial world.
This, we believe, suffices to illustrate the relevance and urgency of a matter
which we now move on to discuss in greater generality. To this end we shall
begin by recalling a seemingly obvious, and yet surprisingly often overlooked,
feature of the forecasting problem, namely that not all forecasts are equal.
2. On forecasting
Laplace grasped rather clearly one important feature of how probability and uncertainty relate to information when he pointed out that probability depends partly on our knowledge and partly on our ignorance. What we do know clearly affects our understanding of what we don't know and, consequently, our ability to estimate its probability.

It cannot be surprising, then, that the meaning of scientific prediction or forecast changes with the growth of science. In [25], for instance, it is suggested that one can get a clearer understanding of what physics is by being specific about the accepted meaning of physical predictions.
³http://www.predpol.com/
⁴The Risk to Civil Liberties of Fighting Crime With Big Data, 6 November 2016.
⁵http://www.coplogic.com/
⁶http://www.keycrime.com/
The origins of the very concept of scientific forecast can in fact be traced back to the beginning of modern physics, the paradigmatic example being classical mechanics: the deterministic world in which (for a limited class of phenomena) one can submit definite Yes/No predictions to experimental testing. A major conceptual revolution took place in the mid 1800s with the introduction of probabilistic prediction, a notion which has since taken on three distinct interpretations. The first relates to the introduction of statistical mechanics, and is indeed responsible for introducing a novel, stochastic, view of the laws of nature. The second started at the beginning of the 1900s with the discovery of quantum mechanics. The third, which is coming of age, relates to the investigation of complex systems. It is also observed in [25] that this development of the meaning of scientific forecasting amounted to its progressive weakening. Whilst the concept of stochastic prediction in statistical mechanics is clearly weaker than the Yes/No prediction of the next solar eclipse, it can be regarded as stronger than predictions about complex systems, which may involve probability intervals. The upside of increasingly weaker notions of forecast is the extension of the applicability of physics to a wider set of problems. The downside is a loss of precision.
It is interesting to note that the first major shift in perspective – from the binary forecasts of classical mechanics to the probabilistic ones of statistical mechanics – can be motivated from an informational point of view. To illustrate, we borrow from a classic presentation of Ergodic Theory [13], in which a gas with k molecules contained in a three-dimensional box is considered. Since particles can move in any direction of the (Euclidean) space, we are looking at a system with n = 3k degrees of freedom. Assuming complete information about the molecules' masses and the forces they exert, the instantaneous state of the system can be described fully – at least in principle – by fixing n spatial coordinates and the n corresponding velocities, i.e. by picking a point in 2n-dimensional Euclidean space. We are now interested in looking at how the system evolves in time according to some underlying physical law. In practice, though, the information we do possess is seldom enough to determine the answer.

[This led Gibbs to] abandon the deterministic study of one state (i.e., one point in phase space) in favor of a statistical study of an ensemble of states (i.e., a subset of phase space). Instead of asking "what will the state of the system be at time t?", we should ask "what is the probability that at time t the state of the system will belong to a specified subset of phase space?". [13]
This (to our present lights) very natural observation had enormous consequences. So it is likewise natural to ask, today, whether the present ability to acquire, store, and analyse unprecedented amounts of data may take the concept of forecast to the next level.

In what follows we address this question in an elementary setting. In particular, we ask whether, using our knowledge of the past states of a system and without the use of models for the evolution equations, meaningful predictions about the future are possible. Our answer is negative, to the extent that rather severe difficulties are immediately found even in a very abstract and simplified situation. As we shall point out, the most difficult challenge for this view is to identify the "proper level" of abstraction of the system. This is apparent in the paramount case of weather forecasting discussed in Section 4. We will see there that the key to finding the "proper level" of abstraction lies in identifying the "relevant variables" and the effective equations which rule their time evolution. It is important to stress that the procedure of building such a description does not follow a fixed protocol, applicable in all contexts once certain conditions are met. It should rather be considered as a sort of art, based on the intuition and the experience of the researcher.
3. An extreme inductivist approach to forecasting using Big Data
According to a vaguely defined yet rather commonly held view [15], big data may allow us to dispense with theory, modelling, or even hypothesising. All of this would be accomplished, across domains, by smart enough machine learning algorithms operating on large enough data sets. On this extreme inductivist conception, forecasting depends solely on data. Does this provide us with a new meaning of prediction, and indeed one which will render scientific modelling as we currently understand it obsolete?
Two hypotheses, which are seldom made explicitly, are needed to articulate
an affirmative answer:
(1) Similar premisses lead to similar conclusions (Analogy);
(2) Systems which exhibit a certain behaviour, will continue doing so
(Determinism).
Note that both assumptions are clearly at work in the very idea of predictive policing recalled above. For predicting who is going to commit the next crime, and where this is going to happen, requires one to think of the disposition to commit crimes as a persistent feature of certain people, who in turn tend to share certain specific features. Those analogies, and the deterministic character of the 'disposition to commit crimes', make it very easy to mistake correlation for causation. Racial profiling is the most obvious, but certainly not the sole, ethical concern currently being raised in connection with the first performance assessments of predictive policing [32].
Let us go back to our key point by noting that Analogy and Determinism have long been debated in connection with forecasting and scientific prediction. That if a system behaves in a certain way it will do so again seems a rather natural claim but, as pointed out by Maxwell,⁷ it is not such an obvious assumption after all.
It is a metaphysical doctrine that from the same antecedents
follow the same consequents. [...] But it is not of much use in
a world like this, in which the same antecedents never again
concur, and nothing ever happens twice. [...] The physical
axiom which has a somewhat similar aspect is “That from
like antecedents follow like consequents”.
In his Essai philosophique sur les probabilités Laplace argued that analogy and induction, along with a "happy tact", provide the principal means for "approaching certainty" in situations in which the probabilities involved are "impossible to submit to calculus". Laplace then hastened to warn the reader against the subtleties of reasoning by induction and the difficulties of pinning down the right "similarity" between causes and effects which is required for the sound application of analogical reasoning.

More recently, de Finetti sought to redo the foundations of probability by challenging the very idea of repeated events, which constitutes the starting point of frequentist approaches à la von Mises, a view which is not central to Kolmogorov's axiomatisation, but for which the Russian mathematician voiced some sympathy. In a vein rather similar to Maxwell's, de Finetti argued extensively [10, 11] that thinking of events as "repeatable" is a modelling assumption. If the modeller thinks that two events are in fact instances of the same phenomenon, she/he should state that as a subjective and explicit assumption.
This assumption is certainly not mentioned in the extreme inductivist big data narrative, which advocates an approach to forecasting that uses just knowledge of the past, without the aid of theory. Let us then turn our attention to this view, and frame the question in the simplest possible terms. We are interested in forecasts in which future states of a system are predicted solely on the basis of known past states. If this turns out to be problematic in a highly abstract situation, then it can hardly be expected to work in contexts marred by high model-uncertainty, like the ones of interest for big data applications.

⁷Quoted in Lewis Campbell and William Garnett, The Life of James Clerk Maxwell, Macmillan, London (1882); reprinted by Johnson Reprint, New York (1969), p. 440.
Basically [34], one looks for a past state of the system "near" to the present one: if such a state can be found at day $k$, then it makes sense to assume that tomorrow the system will be "near" to day $k+1$. In more formal terms, given the series $\{x_1, \dots, x_M\}$, where $x_j$ is the vector describing the state of the system at time $j\Delta t$, we look in the past for an analogous state, that is a vector $x_k$ with $k < M$ "near enough" to $x_M$ (i.e. such that $|x_k - x_M| < \epsilon$, where $\epsilon$ is the desired degree of accuracy). Once we find such a vector, we "predict" the future at times $M + n > M$ by simply assuming for $x_{M+n}$ the state $x_{k+n}$. It all seems quite easy, but it is not at all obvious that an analog can be found.
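As an illustration, the following Python sketch implements this analog-based prediction on a synthetic time series. The chaotic logistic map merely stands in for "data whose evolution law we pretend not to know"; the series length, accuracy threshold and forecast horizon are arbitrary choices of ours, not prescriptions from the text.

import numpy as np

def generate_series(length, x0=0.4):
    """Iterate the logistic map x -> 4x(1-x): a toy chaotic series standing in
    for observed data whose evolution law we pretend not to know."""
    x = np.empty(length)
    x[0] = x0
    for i in range(length - 1):
        x[i + 1] = 4.0 * x[i] * (1.0 - x[i])
    return x

def analog_forecast(series, horizon, eps):
    """Method of analogs: find the past state closest to the last observed one
    and copy the evolution that followed it."""
    present = series[-1]
    candidates = series[: len(series) - horizon - 1]   # analogs with a known future
    k = int(np.argmin(np.abs(candidates - present)))   # index of the best analog
    if abs(series[k] - present) > eps:
        raise ValueError("no analog within the requested accuracy")
    return series[k + 1 : k + 1 + horizon]

full = generate_series(100_000)
observed, truth = full[:-10], full[-10:]     # hold out the last 10 points as "the future"
prediction = analog_forecast(observed, horizon=10, eps=1e-3)
for t, (p, v) in enumerate(zip(prediction, truth), start=1):
    print(f"step {t:2d}: predicted {p:.4f}   true {v:.4f}")

With a long series and a one-dimensional attractor the analog tracks the true evolution for several steps before chaos amplifies the small initial mismatch; the point of the next paragraphs is that in higher dimension a usable analog typically cannot be found at all.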
The problem of finding an analog is closely linked to the celebrated Poincaré recurrence theorem:⁸ after a suitable time, a deterministic system with a bounded phase space returns to a state near its initial condition [28, 14]. Thus an analog surely exists, but how far back do we have to go to find it? The answer was given by the Polish mathematician Mark Kac, who proved a lemma [14] to the effect that the average return time to a region $A$ is proportional to the inverse of the probability $P(A)$ that the system is in $A$.
To understand how hard it is to observe a recurrence, and hence to find an analog, consider a system of dimension $D$.⁹ The probability $P(A)$ of being in a region $A$ that extends in every direction by a fraction $\epsilon$ is proportional to $\epsilon^D$, therefore the mean recurrence time is $O(\epsilon^{-D})$. If $D$ is large (say, larger than 10), even for not very high levels of precision (for instance 5%, that is $\epsilon = 0.05$), the return time is so large that in practice a recurrence is never observed.

⁸In its original version the Poincaré recurrence theorem states that, given a Hamiltonian system with a bounded phase space $\Gamma$ and a set $A \subset \Gamma$, all the trajectories starting from $x \in A$ will return to $A$ after some time, repeatedly and infinitely many times, except for a set of initial conditions of zero probability. Actually, though this is seldom stressed in elementary courses, the theorem can be easily extended to dissipative ergodic systems, provided one only considers initial conditions on the attractor, and "zero probability" is interpreted with respect to the invariant probability on the attractor [6].

⁹To be precise, if the system is dissipative, $D$ is the fractal dimension $D_A$ of the attractor [4].

Figure 1. The relative precision of the best analog as a function of the length $M$ of the sequence. The data have been obtained numerically [4] from a simplified climatic model introduced by Lorenz, with two different choices of the parameters, see [18]; the vector $x$ is in $\mathbb{R}^N$ with $N = 20$ and $N = 21$. The reference slope $M^{-1/D_A}$ is also shown.
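To get a feel for these numbers, the following Python sketch (our own illustration) first tabulates the Kac estimate $\epsilon^{-D}$ of the mean recurrence time for a few dimensions, and then performs a rough empirical check of the scaling on a trivially simple ergodic toy system, namely $D$ independent logistic maps; the choice of map, box size and number of trials is arbitrary.

import numpy as np

# Kac's lemma: the mean recurrence time to a region of probability P(A) is 1/P(A).
# For a box of linear size eps in a D-dimensional system, P(A) ~ eps**D, so the
# mean recurrence time grows as eps**(-D): hopeless already for moderate D.
eps = 0.05                                  # 5% precision, as in the text
for D in (1, 3, 6, 10, 20):
    print(f"D = {D:2d}: mean recurrence time ~ eps^-D = {eps ** -D:.1e} steps")

# Rough empirical check on a toy ergodic system: D independent logistic maps
# x -> 4x(1-x).  We measure the first return of the state to a box of half-width
# eps around its starting point and compare with Kac's estimate 1/P(box), using
# the known invariant density rho(x) = 1/(pi*sqrt(x(1-x))) of the map.
rng = np.random.default_rng(0)
def rho(x):
    return 1.0 / (np.pi * np.sqrt(x * (1.0 - x)))

for D in (1, 2, 3):
    observed, predicted = [], []
    for _ in range(50):                     # 50 independent starting points
        x = rng.uniform(0.1, 0.9, size=D)
        for _ in range(1000):               # burn-in onto the invariant measure
            x = 4.0 * x * (1.0 - x)
        x0, xc = x.copy(), x.copy()
        predicted.append(1.0 / np.prod(2.0 * eps * rho(x0)))
        for t in range(1, 1_000_000):       # cap the search, just in case
            xc = 4.0 * xc * (1.0 - xc)
            if np.all(np.abs(xc - x0) < eps):
                observed.append(t)
                break
    print(f"D = {D}: mean return time {np.mean(observed):8.0f}"
          f"   Kac estimate {np.mean(predicted):8.0f}")

The empirical averages and the Kac estimates agree in order of magnitude, and both grow by roughly a factor $1/\epsilon$ with each added dimension: the exponential wall discussed in the text.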
That is to say, the required analog, whose existence is guaranteed in theory, often cannot be expected to be found in practice, even if complete and precise information about the system is available to us.
Fig. 1 shows how, even for moderately large values of the fractal dimension of the attractor $D_A$, a good analog can be obtained only from time series of enormous length. If $D_A$ is small (in the example $D_A \simeq 3.1$), for an analog with a precision of 1% a sequence of length $O(10^2)$ is enough; on the contrary, for $D_A \simeq 6.6$ we need a very long sequence, at least $O(10^9)$.
In addition, usually we do not know the vector $x$ describing the state of the system. Such a rather serious difficulty is well known in statistical physics; it has been stressed, e.g., by Onsager and Machlup [24] in their seminal work on fluctuations and irreversible processes, with the caveat "how do you know you have taken enough variables, for it to be Markovian?", and by Ma [20]: "the hidden worry of thermodynamics is: we do not know how many coordinates or forces are necessary to completely specify an equilibrium state".
Takens [33] gave an important contribution to this topic: he showed that from the study of a time series $\{u_1, \dots, u_M\}$, where $u_j$ is an observable sampled at the discrete times $j\Delta t$, it is possible (if we know that the system is deterministic and is described by a finite-dimensional vector, and $M$ is large enough) to determine the proper variable $x$. Unfortunately, at a practical level, the method has rather severe limitations:
a) It works only if we know a priori that the system is deterministic;
b) The protocol fails if the dimension of the attractor is large enough
(say more than 5 or 6).
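For concreteness, the following Python sketch shows the delay-coordinate construction at the heart of Takens' result: from a scalar observable one builds vectors of m lagged values, and then applies the analog method in the reconstructed space. The system (the Lorenz 1963 model observed through its x coordinate), the embedding dimension m, the lag and the series length are all illustrative choices of ours, not prescriptions from the text.

import numpy as np

def lorenz_observable(n_steps, dt=0.01):
    """Integrate the Lorenz 1963 system (simple Euler scheme) and return only
    the x coordinate, mimicking a scalar measurement of an unknown system."""
    sigma, rho, beta = 10.0, 28.0, 8.0 / 3.0
    state = np.array([1.0, 1.0, 1.0])
    obs = np.empty(n_steps)
    for i in range(n_steps):
        x, y, z = state
        state = state + dt * np.array([sigma * (y - x),
                                       x * (rho - z) - y,
                                       x * y - beta * z])
        obs[i] = state[0]
    return obs

def delay_embed(u, m, lag):
    """Build delay vectors X_j = (u_j, u_{j-lag}, ..., u_{j-(m-1)lag})."""
    start = (m - 1) * lag
    return np.column_stack([u[start - i * lag : len(u) - i * lag] for i in range(m)])

u = lorenz_observable(60_000)
m, lag, horizon = 3, 10, 50
X = delay_embed(u, m, lag)

# Analog forecast in the reconstructed space: find the past delay vector
# closest to the last one and copy the observable's subsequent evolution.
present = X[-1]
candidates = X[: len(X) - horizon - 1]
k = int(np.argmin(np.linalg.norm(candidates - present, axis=1)))
offset = (m - 1) * lag                     # index of u corresponding to X[0]
predicted = u[offset + k + 1 : offset + k + 1 + horizon]
print("prediction for the next few samples:", np.round(predicted[:5], 3))

This works here only because the toy attractor is low-dimensional and the series is long; as the text explains next, the required series length explodes with the dimension.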
Once again Kac’s lemma sheds light on the key difficulty encountered here:
the minimum size of the time size Mallowing for the use of Taken’s approach
increases as CMwith C=O(100) [34,4]. Therefore this method cannot
be used, apart for special cases (with a small dimension), to build up a
model from the data. All extreme inductivist approaches will have to come
to terms with this fundamental fact. One of the few success of the method
of the analogs is the tidal prediction from past history. This in spite of the
fact that tides are chaotic; the reason is the low number of effective degrees
of freedom involved [4].
4. Weather forecasting: the mother of all approaches to prediction
Weather forecasts provide a very good illustration of some central aspects of predictive models, not least because of the extreme accuracy which this field has managed to achieve over the past decades. And yet this accuracy could be attained only when it became clear that too much data would be detrimental to the accuracy of the model. Indeed, as we now briefly review, in the early days weather forecasts featured a naive form of inductivism not dissimilar to the one fuelling the big data enthusiasm.

Let us stress that the main limit to predictions based on analogs is not the sensitivity to initial conditions typical of chaos: as realised by Lorenz [4], the main issue is actually to find good analogs.
The first modern steps in weather forecasting are due to Richardson [29, 19] who, in his visionary work, introduced many of the ideas on which modern meteorology is based. His approach was, to a certain extent, in line with genuine reductionism, and may be summarised as follows: the atmosphere evolves according to the hydrodynamic (and thermodynamic) equations for the velocity, the density, and so on; therefore, future weather can be predicted, in principle at least, by solving the proper partial differential equations, with initial conditions given by the present state of the atmosphere. Richardson's key idea for forecasting the weather was correct, but in order to put it into practice it was necessary to introduce one further ingredient that he could not possibly have known [5]. A few decades later, von Neumann and Charney noticed that the equations originally proposed by Richardson, even though correct, are not suitable for weather forecasting [19, 9]. The apparently paradoxical reason is that they are too accurate: they also describe high-frequency wave motions that are irrelevant for meteorology. It is therefore necessary to construct effective equations that get rid of the fast variables. The effective equations have great practical advantages, e.g. it is possible to adopt large integration time steps, making the numerical computations satisfactorily efficient. Even more importantly, they are able to capture the essence of the phenomena of interest, which could otherwise be hidden in too detailed a description, as in the case of the complete set of original equations. It is important to stress that the effective equations are not mere approximations of the original equations; they are obtained through a subtle mixture of hypotheses, theory and observations [9, 5].
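The flavour of this procedure can be conveyed by a toy example (entirely ours, and much simpler than anything relevant to meteorology): a slow variable coupled to a fast one that relaxes almost instantaneously. Eliminating the fast variable yields an effective equation for the slow one which, as in the meteorological case, can be integrated with much larger time steps while reproducing the behaviour of interest.

import numpy as np

# Full (two time-scale) system:
#   dx/dt = -x + sin(x) * z          (slow variable of interest)
#   dz/dt = (zbar(x) - z) / delta    (fast variable, relaxes on a time delta << 1)
# Effective equation obtained by eliminating the fast variable (z ~ zbar(x)):
#   dx/dt = -x + sin(x) * zbar(x)
def zbar(x):
    return 1.0 + 0.5 * np.cos(x)

delta = 1e-3

def integrate_full(x0, z0, t_end, dt):
    x, z = x0, z0
    for _ in range(int(t_end / dt)):
        x, z = x + dt * (-x + np.sin(x) * z), z + dt * (zbar(x) - z) / delta
    return x

def integrate_effective(x0, t_end, dt):
    x = x0
    for _ in range(int(t_end / dt)):
        x = x + dt * (-x + np.sin(x) * zbar(x))
    return x

# The full system needs dt well below delta; the effective one does not.
x_full = integrate_full(x0=2.0, z0=0.0, t_end=5.0, dt=1e-4)
x_eff  = integrate_effective(x0=2.0, t_end=5.0, dt=1e-2)
print(f"slow variable at t=5: full model {x_full:.4f}, effective model {x_eff:.4f}")

The two results agree to within the size of the neglected fast transient, while the effective model takes a hundred times fewer steps. In real meteorology, of course, identifying the fast variables to be eliminated is anything but mechanical, which is precisely the authors' point.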
5. Concluding remarks
The above argument shows that in weather forecasting the accuracy of prediction need not be monotonic in the sheer amount of data; indeed, beyond a certain point the opposite is true. This, in our opinion, is a serious methodological objection to the big data enthusiasm. Given its representativeness among forecasting problems, the conclusions drawn with respect to predicting the weather are far-reaching, and help to unify a number of observations that have recently been put forward along the same lines.
In many sciences and in engineering, an ever increasing gap between theory and experiment can be observed. This gap tends to widen particularly in the presence of complex features in natural systems [25]. In socio-economic systems the gap between the data and our scientific ability to actually understand them is typically enormous. Surely the availability of huge amounts of data, of sophisticated methods for their retrieval, and of unprecedented computational power for their analysis will help to move science and technology forward. But in spite of a persistent emphasis on a fourth paradigm (beyond the traditional ones, i.e. experiment, theory and computation) based only on data, there is as yet no evidence that data alone can bring about scientifically meaningful advances. On the contrary, as nicely illustrated by Crutchfield [8], up to now it seems that the only way to understand some non-trivial scientific or technological problem is to follow the traditional approach based on a clever combination of data, theory (and/or computations), intuition and a wise use of previous knowledge. Similar conclusions have been reached in the computational biosciences. The authors of [7] point out very clearly not only the methodological shortcomings (and ineffectiveness) of relying on data alone, but also unfold the implications of a methodologically unwarranted big data enthusiasm for the allocation of research funds to healthcare-related projects: "A substantial portion of funding used to gather and process data should be diverted towards efforts to discern the laws of biology".
Big data undoubtedly constitute a great opportunity for scientific and technological advance, with a potential for considerable socio-economic impact. To make the most of it, however, the ensuing developments at the interface of statistics, machine learning and artificial intelligence must be coupled with adequate methodological foundations, not least because of the serious ethical, legal and, more generally, societal consequences of the possible misuses of this technology. This note has contributed to elucidating the terms of this problem by focussing on the potential for big data to reshape our current understanding of forecasting. To this end we pointed out, in a very elementary setting, some serious problems that the naïve inductivist approach to forecasting must face: the idea according to which reliable predictions can be obtained solely on the grounds of our knowledge of the past runs into insurmountable difficulties, even in the most idealised and controlled modelling setting.
Chaos is often considered the main limiting factor for predictability in deterministic systems. However, this difficulty presupposes that the evolution laws of the system under consideration are known. If, on the contrary, the information on the system's evolution is based only on observational data, the bottleneck lies in Poincaré recurrences, which in turn depend on the number of effective degrees of freedom involved. Indeed, even under the most optimistic conditions, in which the state vector of the system is known with arbitrary precision, the amount of data necessary to make meaningful predictions would grow exponentially with the effective number of degrees of freedom, independently of the presence of chaos. However, when, as for tidal predictions, the number of degrees of freedom associated with the scales of interest is relatively small, the future can be successfully predicted from past history. In addition, in the absence of a theory, a purely inductive modelling methodology can only be based on time series and the method of analogs, with the difficulties already discussed [34].
We therefore conclude that the big data revolution is by all means a welcome one for the new opportunities it opens. However, the role of modelling cannot be discounted: larger datasets alone are not enough, and the lack of an appropriate level of description [9, 5] may make useful forecasting practically impossible.
References
[1] C. S. Calude and G. Longo. The Deluge of Spurious Correlations in Big Data. Foundations of Science, 21, 1–18, 2016.
[2] D. Casacuberta and J. Vallverdú. E-science and the data deluge. Philosophical Psychology, 27(1), 126–140, 2014.
[3] S. Canali. Big Data, epistemology and causality: Knowledge in and knowledge out in EXPOsOMICS. Big Data & Society, 3(2), 1–11, 2016.
[4] F. Cecconi, M. Cencini, M. Falcioni, and A. Vulpiani. The prediction of future from the past: an old problem from a modern perspective. American Journal of Physics, 80(11), 1001–1008, 2012.
[5] S. Chibbaro, L. Rondoni, and A. Vulpiani. Reductionism, Emergence and Levels of Reality. Springer-Verlag, Berlin, 2014.
[6] P. Collet and J.-P. Eckmann. Concepts and Results in Chaotic Dynamics: A Short Course. Springer-Verlag, Berlin, 2006.
[7] P. V. Coveney, E. R. Dougherty, and R. R. Highfield. Big data need big theory too. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 374, 1–11, 2016.
[8] J. P. Crutchfield. The dreams of theory. Wiley Interdisciplinary Reviews: Computational Statistics, 6, 75–79, 2014.
[9] A. Dahan Dalmedico. History and epistemology of models: meteorology as a case study. Archive for the History of Exact Sciences, 55, 395–422, 2001.
[10] B. de Finetti. Theory of Probability, Vol. 1. John Wiley and Sons, New York, 1974.
[11] B. de Finetti. Philosophical Lectures on Probability. Ed. A. Mura, translated by H. Hosni. Springer Verlag, Berlin, 2008.
[12] P. Domingos. The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World. Basic Books, New York, 2015.
[13] P. R. Halmos. Lectures on Ergodic Theory. Chelsea Publishing, London, 1956.
[14] M. Kac. On the notion of recurrence in discrete stochastic processes. Bulletin of the American Mathematical Society, 53, 1002–1010, 1947.
[15] R. Kitchin. Big Data, new epistemologies and paradigm shifts. Big Data & Society, 1, 1–12, 2014.
[16] D. Lazer, R. Kennedy, G. King, and A. Vespignani. The Parable of Google Flu: Traps in Big Data Analysis. Science, 343(6167), 1203–1205, 2014.
[17] S. Leonelli. Data-Centric Biology: A Philosophical Study. University of Chicago Press, Chicago, 2016.
[18] E. N. Lorenz. Predictability: a problem partly solved. In Proc. Seminar on Predictability (ECMWF, Reading, UK, 1996), pp. 1–18.
[19] P. Lynch. The Emergence of Numerical Weather Prediction: Richardson's Dream. Cambridge University Press, Cambridge, 2006.
[20] S. K. Ma. Statistical Mechanics. World Scientific, Singapore, 1985.
[21] V. Mayer-Schönberger and K. Cukier. Big Data: A Revolution That Will Transform How We Live, Work, and Think. Houghton Mifflin, New York, 2013.
[22] H. Nowotny. The Cunning of Uncertainty. Polity, London, 2016.
[23] M. Nural, M. E. Cotterell, and J. Miller. Using Semantics in Predictive Big Data Analytics. In Proceedings of the 2015 IEEE International Congress on Big Data (BigData Congress 2015), pp. 254–261, 2015.
[24] L. Onsager and S. Machlup. Fluctuations and irreversible processes. Physical Review, 91, 1505–1512, 1953.
[25] G. Parisi. Complex Systems: a Physicist's Viewpoint. Physica A, 263, 557–564, 1999.
[26] F. Pasquale. The Black Box Society. Harvard University Press, Cambridge, MA, 2015.
[27] W. L. Perry, B. McInnes, C. C. Price, S. C. Smith, and J. S. Hollywood. Predictive Policing: The Role of Crime Forecasting in Law Enforcement Operations. RAND Corporation, Santa Monica, 2013.
[28] H. Poincaré. Sur le problème des trois corps et les équations de la dynamique. Acta Mathematica, 13, 1–270, 1890.
[29] L. F. Richardson. Weather Prediction by Numerical Process. Cambridge University Press, Cambridge, 1922.
[30] M. Robbins. Has a rampaging AI algorithm really killed thousands in Pakistan? The Guardian, 18 February 2016. http://www.theguardian.com/science/the-lay-scientist/2016/feb/18/has-a-rampaging-ai-algorithm-really-ki
[31] SKYNET: Applying Advanced Cloud-based Behavior Analytics. The Intercept, 8 May 2015. https://theintercept.com/document/2015/05/08/skynet-applying-advanced-cloud-based-behavior-analytics
[32] J. Saunders, P. Hunt, and J. S. Hollywood. Predictions put into practice: a quasi-experimental evaluation of Chicago's predictive policing pilot. Journal of Experimental Criminology, 12, 1–25, 2016.
[33] F. Takens. Detecting strange attractors in turbulence. In D. Rand and L.-S. Young (Eds.), Dynamical Systems and Turbulence, Lecture Notes in Mathematics, 898, 366–381, 1981.
[34] A. S. Weigend and N. A. Gershenfeld (Eds.). Time Series Prediction: Forecasting the Future and Understanding the Past. Addison-Wesley, Reading, 1994.
(H.H.) Dipartimento di Filosofia, Università degli Studi di Milano, and (A.V.) Dipartimento di Fisica, Università degli Studi di Roma Sapienza, and Centro Linceo Interdisciplinare "Beniamino Segre", Accademia dei Lincei, Roma (Italy).

E-mail address: hykel.hosni@unimi.it; Angelo.Vulpiani@roma1.infn.it