Eur. Phys. J. Special Topics 230, 451–471 (2021)
© EDP Sciences, Springer-Verlag GmbH Germany, part of Springer Nature, 2021
https://doi.org/10.1140/epjst/e2020-000260-2

THE EUROPEAN PHYSICAL JOURNAL SPECIAL TOPICS

Regular Article
Democratizing earthquake predictability
research: introducing the RichterX platform
Yavor Kamer1,a, Shyam Nandan1,2, Guy Ouillon3, Stefan Hiemer1, and
Didier Sornette4,5
1 RichterX.com, Mittelweg 8, Langen 63225, Germany
2 Windeggstrasse 5, 8953 Dietikon, Zurich, Switzerland
3 Lithophyse, 4 rue de l'Ancien Sénat, 06300 Nice, France
4 ETH Zurich, Department of Management, Technology and Economics, Scheuchzerstrasse 7, 8092 Zurich, Switzerland
5 Institute of Risk Analysis, Prediction and Management (Risks-X), Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology (SUSTech), Shenzhen, P.R. China
Received 5 October 2020 / Accepted 7 October 2020
Published online 19 January 2021
Abstract. Predictability of earthquakes has been vigorously debated in the last decades, with the dominant (albeit contested) view being that earthquakes are inherently unpredictable. The absence of a framework to rigorously evaluate earthquake predictions has led to prediction efforts being viewed with scepticism. Consequently, funding for earthquake prediction has dried up and the community has shifted its focus towards earthquake forecasting. The field has benefited from collaborative efforts to organize prospective earthquake forecasting contests by introducing protocols, model formats and rigorous tests. However, these regulations have also created a barrier to entry. Methods that do not share the assumptions of the testing protocols, or whose outputs are not compatible with the contest format, cannot be accommodated. In addition, the results of the contests are communicated via a suite of consistency and pair-wise tests that are often difficult to interpret for those not well versed in statistical inference. Due to these limiting factors, while scientific output in earthquake seismology has been on the rise, participation in such earthquake forecasting contests has remained rather limited. In order to revive earthquake predictability research and encourage wide-scale participation, here we introduce a global earthquake prediction platform by the name RichterX. The platform allows for testing of any earthquake prediction in a user-defined magnitude, space, and time window anywhere on the globe. Predictions are assigned a reference probability based on a rigorously tested real-time global statistical earthquake forecasting model. In this way, we are able to accommodate methods issuing alarm-based predictions as well as probabilistic earthquake forecasting models. We formulate two metrics to evaluate the participants' predictive skill and demonstrate their consistency through synthetic tests.
a e-mail: yaver.kamer@gmail.com
1 Introduction
Earthquake prediction is a hard problem, which has remained an elusive holy grail
of seismology. Unfortunately, the current incentive structures are pushing researchers
away from hard problems where results are rarely positive. Negative results are less
likely to lead to a publication or a citation. While the utility of these quantities
is being put into question [8], they are still widely used as performance criteria in
academia.
To avoid negative results and public reactions associated with failed earthquake
predictions, the seismological community has mainly shifted its focus to descriptive
case-studies, long-term probabilistic hazard analysis, and probabilistic forecasting
experiments. However, by not engaging with the prediction problem, we have effectively left it to be exploited by less reputable actors. These actors often emerge during times of crisis, spreading disinformation that leads to public anxiety. As a result, it has become common to view any sort of prediction effort with suspicion and often negative prejudice, forgetting that the scientific principle requires hypotheses to be tested rather than disregarded due to previously held beliefs. It can be argued that many
of these prediction claims are not formulated as falsifiable hypotheses, yet it is our
duty as scientists to assist those interested by providing guidelines, protocols, and
platforms facilitating the scientific method.
To help revive the earthquake prediction effort and bring scientific rigor to the
field, here we propose a general platform to facilitate the process of issuing and eval-
uating earthquake predictions. The platform is general as it allows for the testing of
both deterministic alarm based predictions and probabilistic forecasts. The common
metrics proposed to evaluate the respective skills of these two different model classes
will put methods relying on different theories, physical mechanisms and datasets on
the same footing. In this way, we aim to achieve larger participation, to facilitate the
inclusion of different methods from various fields, and to foster collaborative learning
through regular feedback.
The paper is structured as follows. First, we introduce the general requirements
that a public earthquake prediction platform must satisfy and briefly explain how
RichterX addresses these. Second, we describe the implementation of our global real-
time earthquake forecasting model that the RichterX platform uses to inform the
public of short term earthquake probabilities, and that is also taken as a reference
to evaluate all submitted predictions. Next, we introduce two complementary perfor-
mance metrics that allow us to assess the predictive performance of the participants.
Finally, we conduct synthetic contests with known ground truth and candidate mod-
els to test the consistency of the proposed metrics.
2 Characteristics of the earthquake prediction platform RichterX
Participation: Considering that earthquakes have a huge global impact, previous
and current forecasting experiments have reached out to only a small number of par-
ticipants [41,42]. Our platform aims to attract broader participation, not only from
the relatively small seismology community in academia but also other scientific disci-
plines including fields like machine learning, pattern recognition, data mining, remote
sensing, etc., active in information technologies and engineering applications. The
platform also encourages and rewards public participation to increase public aware-
ness of the earthquake hazard, motivate prediction efforts, and, more importantly,
allow citizens to participate in a scientific challenge. Previous prediction experiments
have been criticized in this regard because they have treated the public as mere sub-
jects of a scientific experiment and sometimes as means to higher ends (i.e., increased public awareness) [43].

Fig. 1. The RichterX platform accessible at www.richterX.com, as viewed on a mobile phone. (1) Forecast screen: (a) Map colors indicate the monthly M5 earthquake count; the black circle represents the target area of the prediction; the pop-up message reports the probability according to the RichterX model; (b) Three tabs with a slider for adjusting the radius, time duration and minimum magnitude of the prediction; (c) Toggle button to switch between probabilities to-occur and not-to-occur; the number of events to-occur can be specified via the up/down arrows; (d) Summary of the RichterX forecast in human-readable format. (2) Forecast at the same location with the radius reduced from 300 km to 100 km. (3) Forecast screen with the Predict toggle on: (a) Slider for setting the prediction stake. (4) Prediction preview screen showing a summary and the round at which the prediction closes.

Participating in a global earthquake prediction contest will
allow the public to gain hands-on insight, internalize the current achievements and
difficulty of the problem. To achieve this, we have built a minimal graphical user
interface, compatible with desktop devices as well as most mobile phones, allowing
anyone to participate (see Fig. 1). We also provide an application programming inter-
face (API), allowing more sophisticated participants to submit predictions using a
computer algorithm. Moreover, we see the language barrier as one of the main fac-
tors hindering participation. We will, therefore, make the platform and the relevant
publications available in multiple languages.
Privacy: The negative connotation associated with failed predictions is an important
factor hampering prediction efforts. The RichterX platform provides the participants
with the option to anonymize their identity, allowing them to focus on the scientific
question instead of worrying about the possible loss of reputation.
Transparency: Results and conclusions of any forecasting or prediction contest must
be accessible to the general public. The results of previous forecasting contests such as
CSEP and RELM have been published, but these papers are often behind paywalls.
The CSEP public website containing results of several models and tests, although
rather technical and not very intuitive for the general public, has since gone offline. In
our view, transparency and ease of access to contest results reinforce responsibility
and accountability.
Assuming that science is conducted to enhance public utility, the public is entitled
to know of its progression, which entails not only successes but also failures. Thus,
we are committed to making the results openly available to the public. In addition to
results about each earthquake (whether it was predicted or not), metrics regarding
the overall performance of each participant are updated on an hourly basis and published in the form of regularly issued public reports. The provision of this information will
counter false prediction allegations, serve as a verifiable track record, and allow the
public to distinguish between one-time guesses and skilled predictions.
Global coverage: Earthquakes do not occur randomly in space, but cluster on tec-
tonic features such as subduction zones, continental plate boundaries, volcanic regions
and intraplate faults. These active features span across the whole globe and produce
large earthquakes continuously. Previous forecasting experiments have focused mainly
on regions with very good instrumental coverage, available only in a small number
of countries (USA, Japan, New Zealand, Iceland, Italy, etc.) [42]. Our goal is to fos-
ter a worldwide earthquake prediction effort by providing the community and the
public with a global reference earthquake forecasting model. With the help of such
a reference model, our platform will be able to accommodate any regional model
by evaluating it against the same global baseline, putting regional added value in a
global perspective.
Real-time updates: Temporal clustering is another main feature of earthquake
occurrence: the probability that an earthquake will occur in a given space window
can vary greatly in time. Thus, if a prediction is to be evaluated according to a
reference model probability, such a reference model should be updated in near real-
time as soon as a new event occurs. Together with global coverage, this requirement
poses serious computational demands that have hindered the implementation of such
models. Recent advances in the field of statistical earthquake modeling [17,24,33,34]
have allowed us to undertake this challenge. Having secured the computational and
hosting capabilities, the RichterX platform is able to provide the global community
with worldwide earthquake forecasts updated on an hourly basis.
Active learning: The reference model provided on the RichterX platform aims to
reflect the current state of the art in statistical seismology. Hence it is not set in stone
but is subject to further improvements as the participants, through successful predic-
tions, effectively highlight regions and time frames where the model performance is
lacking. In this way, the participants serve as referees continuously peer-reviewing the
reference model, which thereby is permanently improving, providing the community
with a higher bar to surpass.
Feedback mechanism: The goal of our prediction experiment is to provide the par-
ticipants with meaningful feedback, allowing them to test their hypotheses, models,
assumptions, and auxiliary data. Through repeated iteration of submitting predic-
tions, testing, and receiving feedback, we expect the participants to improve their
prediction performance. For the public observers, the results should be presented
transparently and succinctly, allowing for an intuitive comparison of the participants’
performances. Therefore, we have developed a skill assessment scheme that is both
easy to understand for the public and powerful in distinguishing between different
predictive skills. It is important to note that the participants may lose interest if the
experiment takes too long to deliver results. The provided feedback may also lose its
relevance if not provided in a timely fashion. The RichterX platform issues results
on a bi-weekly basis, together with cumulative metrics spanning longer durations. In contrast,
consider that previous earthquake forecasting experiments by CSEP were carried out
for 5 years [27], with preliminary results being released only after 2.5 years [41].
Incentives: We hope that the opportunity to easily test and compare different
hypotheses and models on a global scale would provide enough stimulus for the
academic community. At the same time, it is important to recognize that science can
be costly. Apart from the devoted time, many published studies are behind paywalls,
data processing requires expensive hardware, and some data sources can be subject
to fees. Thus, we believe amateur scientists, students, and the general public can
be incentivized to participate by providing rewards, with “scientific microfundings”
similar to microcredits in the business field. These can be monetary or in the form
of technical equipment or journal subscriptions. Some studies have raised concerns
that improper use of monetary rewards can reduce the intrinsic motivation of the
participants [28]. However, recent studies have shown that financial rewards have a
positive effect on engagement and satisfaction [4,12]. The delivery of such monetary
rewards is now much easier due to the popularization of crypto-currencies [10,26].
These recent developments allow us to financially transact with the successful partic-
ipants without requiring a bank account, which almost half of the world’s population
does not have access to [5].
Social responsibility: It is essential to recognize that earthquake prediction is not
only a scientific goal but also a topic that has the potential to affect the lives of many
people, especially those living in seismically active regions. The contest participants
should be aware that the events that they are trying to predict are not just num-
bers on a screen, but actual catastrophes causing human suffering. We believe that
providing a mechanism for expressing solidarity with the victims can help raise this
awareness. To this end, the RichterX platform encourages the participants to donate
their rewards to charitable organizations such as GiveWell, Humanity Road [44] and
UNICEF [11], which take part in global earthquake relief efforts. Recent studies indi-
cate that the use of decentralized ledger technologies can improve transparency and
accountability in humanitarian operations [7]. Therefore, all donations on RichterX
are made using cryptocurrencies and recorded on the blockchain, allowing for anyone
to verify the amount and destination independently. In this way, we hope to prevent
a possible detachment between a community that engages with earthquakes from a
scientific perspective and people who suffer their physical consequences.
3 A global, real-time reference earthquake forecasting model
3.1 Introduction
The characteristics summarized in the previous section have emerged due to the expe-
rience gained from previous earthquake prediction and forecasting experiments. In
his address to the Seismological Society of America in 1976, Clarence Allen proposed
that an earthquake prediction should be assigned a reference probability indicating
how likely it is to occur by chance [2]. Indeed, the development of a model that
can assign a probability for any time window anywhere in the world has been one
of the main hurdles. There have been several accomplishments in the modeling of
global seismicity. Those efforts began with models based on smoothing locations of
observed large earthquakes [22], progressing to combining past seismicity with strain
rate estimates [3,21]. Recently, the Global Earthquake Model working group led a
collaborative effort to harmonize many regional models [39]. Although these models
are important milestones, they model seismicity as a memoryless, stationary process.
As a result, they do not capture the time-dependent aspect of earthquake occurrence.
The choice of treating earthquakes as a stationary process is likely motivated by the
risk assessment practices in the civil engineering and insurance industry. Yet, we
believe that, as the seismology community reassesses its assumptions and develops
more realistic models, the industry will, in turn, adapt to these changes.
Based on empirical laws derived from observed seismicity, the Epidemic Type
Aftershock Sequence (ETAS) model was introduced to enhance stationary statistical
earthquake models by accounting for the time-space clustering of seismicity [38].
Retrospective and prospective studies show that statistical models outperform models
derived from physical concepts such as rate-and-state, stress transfer, seismic gaps,
or characteristic earthquakes [6,23,48]. The recent developments in ETAS modeling
have not only made a global scale application possible, but they have also highlighted
the importance of abolishing assumptions about the distribution of simulated events
[34,35]. Details about the model development, testing, and prospects can be found in
the accompanying paper [36]. Here we describe the real-time online implementation
and operation of the model in the context of the platform.
3.2 Data
The RichterX platform employs a dedicated server, the so-called “grabber”, that
periodically connects to web-based earthquake data center feeds. The grabber com-
pares our current local database with the remote host for the addition of new or the
deletion of old events. If any change is detected, the grabber synchronizes our current
event catalog with the remote database. We are obtaining data from multiple global
agencies, such as the GFZ Geofon [14] and INGV Early-Est [29], but our primary
data source is the USGS ComCat feed [45]. Our data polling frequency is usually
around once every few minutes but can be increased automatically during elevated
seismic activity.
3.3 Model selection, calibration, and forward simulations
We have developed multiple candidate models that are different variations of the
ETAS model or use different input datasets. We use pseudo-prospective testing to
select among these models. See details of the experiment and competing models in
the accompanying paper [36]. In this procedure, only data recorded before a certain
time is considered and divided into sets of training and validation; competing models
are trained on the training data, and their forecasting performances are compared
on the validation set. The performances are averaged by moving forward in time and
repeating the tests. The top-ranking model, and its final parameter set optimized
over the whole dataset, is deployed online on the platform servers. These servers use
the real-time earthquake event data provided by the grabber as input and conduct
forward simulations on an hourly basis. The result of these forward simulations is a
collection of synthetic event datasets that represent a spatio-temporal projection of
how global seismicity will evolve.
The ETAS model is stochastic, i.e., its output samples statistical distributions,
and therefore multiple forward simulations are needed to obtain an accurate rep-
resentation of the underlying probabilities. Each such simulation produces a global
synthetic catalog containing location, time, and magnitudes of events for the next 30
days. The total number of events produced at each real-time update can reach sev-
eral millions. These simulated events are uploaded onto our online database, where
they can be queried via the web-based user interface on www.richterX.com. Using this
interface, the participants can select any point on the globe, define a circular region, a
time window, and a magnitude range that they are interested in (see Fig. 1). The full
distribution of the simulated events within the user-specified time-space-magnitude
range is then used to calculate the probability of earthquake occurrence. In essence,
this probability corresponds to the number of simulations having events satisfying
the user-defined criteria divided by the total number of simulations.
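This probability calculation can be sketched as follows; the catalog representation and the haversine helper are our own simplifications, not the platform's internal data structures.

from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def occurrence_probability(simulations, lat, lon, radius_km, t0, t1, min_mag, min_events=1):
    """Fraction of simulated catalogs containing at least `min_events` events inside
    the user-defined space-time-magnitude window.

    `simulations` is a list of synthetic catalogs; each catalog is a list of
    (time, latitude, longitude, magnitude) tuples produced by one ETAS run.
    """
    hits = 0
    for catalog in simulations:
        n = sum(1 for (t, ev_lat, ev_lon, mag) in catalog
                if t0 <= t <= t1 and mag >= min_mag
                and haversine_km(lat, lon, ev_lat, ev_lon) <= radius_km)
        if n >= min_events:
            hits += 1
    return hits / len(simulations)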
To cope with the computational demands of these simulations, we have scheduled
several servers to run periodically in a staggered fashion. In this way, we can assure
that the model forecasts are updated within less than an hour after each earthquake.
The servers are also distributed in different locations to add redundancy in case of
service interruptions.
4 How RichterX works
Earthquake predictions and forecasts have been issued and studied for decades. How-
ever, there is still confusion about their definition and proper formulation. We believe
it is essential to be strict about terminology. Science advances by accumulating evi-
dence in support or against hypotheses, and vague statements can become a missed
opportunity for testing and obtaining such evidence.
4.1 Earthquake forecast and earthquake prediction
We define an earthquake forecast as the statement of a probability that a minimum
number of events will occur within a specific time-space-magnitude window. There-
fore, a statement cannot be regarded as an earthquake forecast if either one of these
four parameters is omitted. For instance, the operational aftershock forecasts issued
by the USGS do not specify a space window for the issued probabilities (Field et al.,
2014; USGS, 2019), and therefore cannot be tested. Similarly, any ambiguity in the
parameters also renders the statement untestable. For instance, the statement “The
probability (that the San Andreas fault) will rupture in the next 30 years is thought
to lie somewhere between 35% and 70%” [20] does not satisfy the forecast definition
because both rupture size and occurrence probability are ambiguous. Unfortunately,
this is a common malpractice, and public officials often communicate probabilities by
giving a range [9]. The range is usually due to several models or different scenarios
leading to different probabilities. Using different approaches and assumptions is to be
encouraged; however, the resulting probability should be communicated as a single
value. There exist various techniques on how models can be weighed and ensembled
according to their predictive performances [13].
We define earthquake prediction as the statement that a minimum number of
earthquakes will, or will not occur in a specific time-space-magnitude window. Under
our definition, an earthquake prediction always results in a binary outcome: it is either
true or false. This definition is more general than its commonly used predecessors
[18,20,49] because it considers the negative statement, that an earthquake will not
occur, as an equally valid earthquake prediction. By construction, if the probability of
an earthquake to occur in a space-time-magnitude window is P, the probability of an
earthquake not to occur is 1 − P. While P is often small, it can exceed 0.5 immediately
after large earthquakes or during seismic swarms. In such cases, a prediction of no
occurrence carries more information potential, as it refers to a more unlikely outcome.
In this way, negative predictions can serve as a feedback mechanism that counters
overestimated earthquake forecast probabilities.
Once an earthquake prediction is issued, it is considered to be in a pending (i.e.
open) state. The “To-occur” predictions, which predict the occurrence of an event
or events, are closed as true if the number of predicted events is observed in the
predefined space-time-magnitude window, or as false otherwise. The “Not-to-occur”
predictions, which predict that no event will occur, are closed as true if there are no
events in their predefined space-time-magnitude windows, or as false if at least one
such event occurs.
The definitions of earthquake forecast and earthquake predictions are similar, as
they both refer to a constrained space-time-magnitude window. They differ in that
the former conveys the expected outcome with a real number while the latter uses
a binary digit. In that sense, regardless of the observed outcome, a forecast carries
more information compared to a prediction. Forecasts are also more straightforward
to evaluate; any set of independent earthquake forecasts (i.e., having non-overlapping
space-time windows) can be evaluated based on the sum of their log-likelihood, which
is analogous to their joint likelihood:

LL = \sum_{i=1}^{N} \log\left[ O_i P_i + (1 - O_i)(1 - P_i) \right]    (1)

where N is the total number of forecasts, P_i denotes the probability of each forecast and O_i represents the outcome as 1 (true) or 0 (false). The larger the sum, the higher the joint likelihood and hence the more skillful a forecast set is. The performance evaluation of prediction sets is covered in the Performance Assessment section.

Fig. 2. (1) Ranks screen showing the scores for a selected round: (a) round beginning and end dates; (b) table showing anonymized user names, skill class and current rX score, see Section 3.3 for details. (2) M5+ target events colored blue for predicted and red for not-predicted: (a) magnitude vs. time plot; (b) spatial distribution of the events. (3) Results screen for a participant: (a) filter criteria; (b) results table listing prediction locations, round number and status; (c) expandable row with further details. (4) Prediction details screen: (a) magnitude-time plot highlighting the elapsed portion of the prediction window in red; (b) spatial distribution of events around the prediction circle; (c) prediction statement with space-time-magnitude and event number details.
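As an illustration, the joint log-likelihood of equation (1) translates directly into a few lines of Python; the (P, O) pair representation is our own choice for this sketch.

import math

def joint_log_likelihood(forecasts):
    """Sum of log-likelihoods of independent forecasts, as in equation (1).

    `forecasts` is an iterable of (P, O) pairs: the stated probability and the
    observed outcome (1 if the forecast came true, 0 otherwise).
    """
    return sum(math.log(o * p + (1 - o) * (1 - p)) for p, o in forecasts)

# Example: three forecasts, two of which came true.
print(joint_log_likelihood([(0.2, 1), (0.7, 1), (0.1, 0)]))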
4.2 Rules and regulations
The goal of the RichterX platform is to organize a prediction contest that provides
timely feedback, skill assessment, and incentives for participation. Therefore, we have
tried to devise a system that fosters collaborative competition and rewards skill while
maintaining fairness. To attract broader participation, we tried to make the contest’s
regulations intuitive and straightforward without sacrificing statistical rigor. Here,
we will present these rules and the reasons behind them.
4.2.1 Limited number of predictions
Each participant is allowed to place a maximum of 100 predictions every 24 hours.
This prediction budget is expressed in units of so-called “earthquake coins” (EQC). It
is recharged continuously in real-time, such that after 15 minutes, the participant
accumulates 1 EQC and can submit another prediction. The accumulated budget
cannot exceed 100 EQC. Hence if participants want to submit more predictions, they
have to wait. In this way, we hope to encourage the participant to engage with the
platform regularly and follow the evolution of seismicity and think thoroughly before
using it to submit predictions. We expect the participants to perceive their limited
prediction budget as valuable, since it is scarce.
4.2.2 One user – one account
Each participant is allowed to have only one account on the platform. Since we
are providing monetary rewards as an incentive for public participation, users could
increase their chance of getting a reward by creating multiple accounts and placing
random predictions. We have addressed this by requiring each user to validate their
account via a mobile phone application, i.e., a chatbot. The bot runs on the messaging
platform Telegram and verifies the user by requiring them to enter a secret code. If
the code is correct, the user is matched with their unique Telegram ID, which requires
a valid mobile phone number. All reward-related operations are verified through this
unique ID.
It is important to note that policies limiting participation rate and preventing
multiple accounts are common in online courses and contests such as Kaggle [37,47].
However, previous earthquake forecasting competitions conducted by CSEP, and also
its upcoming second phase CSEP2 [40], do not impose such policies. As a result,
participants who submit several versions of a model can increase their chance of
obtaining a good score, creating a disadvantage for participants who submit only a
single model.
4.2.3 Submitting earthquake predictions
The user interface provided on the RichterX platform allows the participants to query
our global reference model and obtain its forecasts within the following ranges: time
duration from 1 to 30 days, a circular region with radius from 30 to 300 km, lower
magnitude limit from M5+ to M9.9+ and a number of events from 1+ to 9+. Once
these parameters are set, the platform will report a probability of occurrence P (or of non-occurrence 1 − P). The participant can then submit a prediction assigned with
this model probability. This probability is used to assess the participant’s prediction
skill by accounting for the outcome of their closed predictions.
In addition to the time, space, magnitude, and number parameters, the user can
also specify a so-called “stake” for each prediction. The stake acts as a multiplier
allowing the participants to submit the same prediction several times, provided that
it is within their prediction budget (EQCs). Therefore the stake can be thought of
as a proxy for the confidence attributed to a prediction.
The reference model updates automatically on an hourly basis. Thus, when a
new earthquake occurs, the region in its vicinity becomes unavailable for submitting
predictions. Once the new earthquake is incorporated as an input and the model has
been updated, the region becomes available for the submission of new predictions.
This allows us to fairly compare users and our reference model, as both parties are
fed with the same amount of information. The radius of the blocked area (R_b) scales as a function of the event magnitude according to the empirical magnitude-surface rupture length scaling [46] given in the following equation:

R_b = 10 + 10^{-3.55 + 0.74 M} \ \mathrm{(km)}    (2)

This assures that the projection of the fault rupture, where most aftershocks are expected to occur, remains within the restricted region regardless of the rupture direction. The additional 10 km in the R_b term accounts for the typical global location uncertainty.
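For illustration, equation (2) is a one-line function; the example magnitudes are arbitrary.

def blocked_radius_km(magnitude):
    """Blocking radius of equation (2): a 10 km location-uncertainty buffer plus the
    empirical surface rupture length 10**(-3.55 + 0.74 * M) from [46]."""
    return 10.0 + 10.0 ** (-3.55 + 0.74 * magnitude)

# Roughly 11 km for an M5 event and about 70 km for an M7.2 event.
print(blocked_radius_km(5.0), blocked_radius_km(7.2))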
4.2.4 Evaluation of earthquake predictions
Target events are all M5+ events, as reported by the USGS ComCat [45]. Predic-
tions are evaluated on a bi-weekly round basis. A time frame of only 14 days may
seem too short, yet our target region is the whole Earth rather than a specific local-
ity. To put this in perspective, the first regional forecasting experiment, the Regional
Earthquake Likelihood Models (RELM), was limited to the state of California, USA,
took place during a 5 year period of 2006–2010 and had a total of 31 target events
[27]. This corresponds to roughly half of the global bi-weekly M5+ event count (mean
63, median 58 since 1980).
5 Performance assessment metrics
5.1 Conditions for proper metrics
In the case of probabilistic forecasts, a scoring rule is said to be proper if it incen-
tivizes the forecaster to convey their actual estimated probability [19]. In other words,
a proper scoring rule does not affect the probability issued by the forecaster. An
improper scoring rule, however, can be exploited by modifying one’s forecast in a
certain way specific to the scoring rule. For example, if a scoring rule does not penal-
ize false positives, then participants can gain an advantage by issuing more alarms
than they usually would have. Deterministic predictions do not convey the informa-
tion of probability; thus, the definition of properness given above becomes irrelevant
[19]. Yet it is useful to consider a more general definition: a proper scoring rule, in
the context of a contest, aligns the goals of the organizers and the incentives of the
participants.
One goal of the RichterX platform is to encourage broad public and academic
participation from various fields of expertise. Therefore, we need a scoring rule that
is statistically rigorous, easy to understand, and applicable on short time scales. The
scoring rule should also ensure that the public participants are rewarded propor-
tionally to their predictive skills, as opposed to a winner-takes-all approach, while
incentivizing their regular participation. Another important goal is to provide the sci-
entific community with a generalized platform where different models and hypotheses
(be it alarm based or probabilistic) can be evaluated to determine performance and
provide feedback to researchers. For this second goal, the scoring rule needs to be
exclusively focused on skill and be generally applicable. To achieve both goals, we
have chosen to implement two scoring strategies that complement each other. These
are the RichterX score and the information ratio score. In the following section,
we will describe how these two scores are implemented and used jointly in the
competition.
5.2 RichterX Score (rX)
The definition of the rX score is straightforward; each submitted prediction counts
as a negative score equal to the prediction stake (s). If a prediction comes true, the
value of the stake (s) multiplied by the odds (1/p) is added to the score; if a prediction fails, the score remains at −s:

R = \begin{cases} s\,(1-p)/p & \text{if true} \\ -s & \text{if false} \end{cases}    (3)

This can be rewritten as

R = O\,\frac{s - s\,p}{p} + (O - 1)\,s    (4)

where O = 1 if the prediction is true and O = 0 if false.
Our goal is to incentivize the participants to challenge our model and highlight
regions or time frames where it can be improved; thus, we want to reward those who
perform better than our reference model. The expected gain from any prediction,
according to the model, can be written as the probability-weighted sum of wins and
losses:
E[R] = p \cdot s\,\frac{1-p}{p} + (1 - p)(-s) = 0.    (5)
The expected gain of a participant is thus zero (positive scores indicating a better
performance than our model). The significance of a positive score (i.e. the probability
for a participant to improve on it by chance assuming that the model is correct) has
to be estimated for each participant. The latter performance estimator will become
more reliable as a participant accumulates independent submitted predictions.
At the end of each bi-weekly contest round, each participant’s score is calculated
using all their N predictions closed during the round. Thus, summing expression (4) over all N predictions yields:

R = \sum_{i=1}^{N} \left[ O_i \frac{S_i - S_i P_i}{P_i} + (O_i - 1) S_i \right]    (6)

where S_i is the stake, P_i is the prediction probability given by the reference model, and O_i is a binary variable representing the prediction result as 1 (true) or 0 (false).
Monetary reward is distributed proportionally among all participants with positive
scores.
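A minimal sketch of the round score of equation (6), with predictions represented as (stake, reference probability, outcome) triples of our own choosing:

def rx_score(predictions):
    """RichterX score of equation (6).

    `predictions` is an iterable of (S, P, O) triples: stake, reference-model
    probability of the predicted statement, and outcome (1 true, 0 false).
    """
    return sum(o * (s - s * p) / p + (o - 1) * s for s, p, o in predictions)

# A true prediction with stake 2 at P = 0.25 gains 6; a false one loses its stake.
print(rx_score([(2, 0.25, 1), (1, 0.6, 0)]))  # 6 - 1 = 5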
The scores are reset to 0 at the beginning of each round to encourage new partic-
ipants to join the competition. However, resetting the scores each round introduces a
problem. Since the only participation cost is the invested time, participants who see
that their negative scores are reset at the beginning of each round are incentivized
to make low-probability/high reward predictions, especially towards the end of the
round. If a few such predictions come true, the user can get a positive score, and if
they end up with a negative score, the participants would just have to wait for the
next round for their scores to be reset, and then they can try again. This would be
problematic because the participants can start treating the experiment as a game of
chance with no penalty for false predictions, rather than a contest of skill.
To counter this, we apply a carry-over function that introduces a memory effect
for negative scores: if a participant ends the round with a negative score no lower than −100, they carry over 10% of this negative score to the next round as a penalty. The carry-over percentage increases proportionally with the magnitude of the negative score and is capped at 90%, as given in the following equation:

C_t = \begin{cases} \min\{ |\Delta R_{t-1}| / 1000,\ 0.9 \}\, \Delta R_{t-1} & \text{if } \Delta R_{t-1} < -100 \\ 0.1\, \Delta R_{t-1} & \text{if } 0 > \Delta R_{t-1} \ge -100 \end{cases}    (7)
Hence, a participant with a score of −200 would carry over a penalty of −40, while a user with −1000 would carry over −900 to the next round. In this way,
participants are incentivized to obtain positive scores consistently, instead of inter-
mittently. Nevertheless, since predictions can be submitted at any time, a participant
may stop submitting new predictions as soon as they have reached a positive score.
This problem, which has already been discussed by [19], is somewhat alleviated by
distributing the reward proportionally to the participant’s score with respect to the
combined total of all other positive participants. Therefore, there is always an incen-
tive to continue participating as other users become positive and start claiming larger
portions of the fixed overall reward.
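For illustration, the carry-over rule of equation (7) in code; the min-based cap is our reading of the 90% limit and of the worked examples above.

def carry_over(round_score):
    """Negative-score carry-over of equation (7); returns 0 for non-negative scores."""
    if round_score >= 0:
        return 0.0
    if round_score >= -100:                            # mild band: 10% of the deficit
        return 0.1 * round_score
    fraction = min(abs(round_score) / 1000.0, 0.9)     # grows with the deficit, capped at 90%
    return fraction * round_score

print(carry_over(-200), carry_over(-1000))  # -40.0, -900.0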
Another aspect of the rX score is that it is a function of the prediction stake.
Two participants with the same predictions but different stakes will get different
scores. Assuming they possess some information gain, regular participants will be
able to submit the same prediction repeatedly, thereby increasing their stake, and
obtain higher scores compared to those who follow a different strategy, testing their
predictions regardless of their returns. This makes sense in the context of a com-
petition where the participants provide added value by testing our reference model
through their predictions. Yet, as we are not necessarily interested in the optimiza-
tion of staking strategies, there is also a need to assess the predictive skill of each
participant regardless of their staking weights. For this purpose, we employ a second
metric.
5.3 Information ratio score (IR)
To assess the predictive skill of a participant, we need to answer the following two
questions: Firstly, how much better is the participant’s performance compared to the
reference model? And secondly, is this performance significant? To answer the first
question, we calculate a metric called the “information ratio” (IR):
\mathrm{IR} = \frac{\frac{1}{N}\sum_{i=1}^{N} O_i}{\frac{1}{N}\sum_{i=1}^{N} P_i}    (8)
IR is essentially the participant’s success rate (fraction of true predictions among
all predictions) divided by the reference model probability averaged over all predic-
tions (i.e., the model’s expected success rate) of the participant. This formulation
implies that there is an upper bound of IR = 1/min(P_i) and incentivizes the partici-
pants to achieve higher success rates in regions and time frames for which the model
gives low probabilities. Assuming that the reference model is true, the expected IR
value for any set of predictions would tend to 1.
To answer the question of whether a participant’s IR is statistically significant, we
employ Monte Carlo sampling to build an IR distribution given their set of submitted
predictions. This distribution is independent of the actual prediction outcomes as
we sample the model probability of each prediction P_i to generate several possible outcomes O'_i:

x_i \sim U(0,1), \qquad O'_i = \begin{cases} 1 & \text{if } x_i < P_i \\ 0 & \text{if } x_i \ge P_i \end{cases}    (9)

where U(a, b) is the uniform distribution within bounds a and b. We then calculate the IR_m of each outcome set according to equation (8), where m denotes the index of
the Monte Carlo sample. This forms the null-distribution that is used to benchmark
the actual IR value of the participant. The ratio of the sampled model IR values
that are above or equal to the participant’s value (α) can then be interpreted as the
probability of observing an IR at least as high as the participant’s, i.e., the p-value
under the null hypothesis that the reference model is true:
g_m = \begin{cases} 1 & \text{if } \mathrm{IR}_m \ge \mathrm{IR}_u \\ 0 & \text{if } \mathrm{IR}_m < \mathrm{IR}_u \end{cases}, \qquad \alpha = \frac{1}{M} \sum_{m=1}^{M} g_m    (10)

where IR_u is the participant's information ratio, and M is the number of Monte Carlo samples used to sample the distribution. If α ≤ 0.05, then the participant is considered to be significantly better than the reference model.
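Both the IR of equation (8) and its significance α from equations (9) and (10) admit a compact Monte Carlo sketch; this is a simplified illustration, not the platform's production code.

import random

def information_ratio(outcomes, probabilities):
    """IR of equation (8): observed success rate over the model-expected success rate."""
    return (sum(outcomes) / len(outcomes)) / (sum(probabilities) / len(probabilities))

def ir_significance(outcomes, probabilities, n_samples=10000, rng=random):
    """Significance alpha of equation (10): the fraction of model-simulated outcome
    sets whose IR is at least as high as the participant's."""
    ir_user = information_ratio(outcomes, probabilities)
    exceed = 0
    for _ in range(n_samples):
        simulated = [1 if rng.random() < p else 0 for p in probabilities]   # equation (9)
        if information_ratio(simulated, probabilities) >= ir_user:
            exceed += 1
    return exceed / n_samples

# Example: four true predictions out of five, each with a low reference probability.
probs = [0.2, 0.15, 0.3, 0.25, 0.2]
outs = [1, 1, 1, 0, 1]
print(information_ratio(outs, probs), ir_significance(outs, probs))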
5.4 Accounting for overlapping predictions
In all previous equations, we have assumed that the submitted predictions are inde-
pendent, i.e., that they do not overlap in space and time. This assumption simplifies
the derivations of expected success rate, allowing for probabilities of independent
predictions to be averaged, and also makes it easier to calculate significance levels by
sampling each prediction independently during the Monte Carlo procedure. However,
the participants are free to submit predictions at any time, anywhere on the globe.
Since predictions are submitted as circles with a maximum radius, participants who
want to cover larger areas completely will have to submit several overlapping predic-
tions. We also see that some participants re-issue predictions at the same locations
when an earthquake does not occur (assuming some local stress accumulation or a
characteristic period) or when it occurs (expecting aftershocks). Updating a hypoth-
esis as new information becomes available is the hallmark of the scientific method.
In the ideal case, if a precursory signal becomes gradually more prominent as an
earthquake approaches, one can expect overlapping predictions with narrower space-
time windows to be issued. Therefore, instead of constraining the participants by forbidding overlapping predictions, we prefer to account for such predictions in the evaluation.
The question of evaluating overlapping predictions has been investigated previ-
ously by Harte and Vere-Jones, Harte et al. [15,16], who introduced the entropy score
as a pseudo-likelihood to evaluate M8 predictions, which are expressed as a set of
overlapping circles [25]. The entropy score is rather complicated and “awkward”, as
the authors put it, thus we have refrained from using it as we would like to keep the
performance criteria as intuitive as possible for the general public. The Molchan dia-
gram, which accounts for the total time-space volume covered by prediction alarms,
can also be employed to deal with predictions overlapping in space and time [31,32].
It is worth noting that Molchan and Romashkova [30] successfully adapted their
methodology to the M8 predictions using specific features, such as constant large
circle sizes and large magnitudes, to assess its predictive skill. This is rather differ-
ent from our application, which involves evaluating and comparing different sets of
predictions that can each be a mix of to-occur and not-to-occur, with varying circle
sizes.
For the particular case of the RichterX prediction contest, the rX score is additive
and already incorporates the concept of “stake” that has the same effect as re-issuing
the same prediction; thus, it does not require any modification. However, the over-
lapping predictions constitute a problem for the IR score and its significance α. This
can be seen with a simple example of two non-overlapping to-occur predictions that
require two earthquakes to come true. In comparison, two identically overlapping predictions would come true with a single event. Intuitively, it follows that true independent predictions are “worth” more in terms of significance than overlapping ones.

Fig. 3. Left: a set of overlapping predictions showing time and space domain. Right: a sample of 4 sets containing only non-overlapping predictions obtained by the selective sampling procedure described in the text.
To take into account the presence of overlapping predictions, we employ a sampling
approach, whereby we begin with the full set of overlapping and non-overlapping pre-
dictions of each participant and, by selective sampling, create sets consisting only of
non-overlapping predictions (see Fig. 3). The IR metric and the associated αvalues
are calculated for each of these sampled sets, and the resulting averages are assigned
as the participant’s skill and significance. The selective sampling of each participant’s
predictions is performed in the following steps:
1. Considering all closed predictions in a given round, we calculate the distance
between the prediction centers for all predictions that overlap in the time domain.
2. If the distance between the centers of two predictions that overlap in time is less
than the sum of their radii, then these predictions are labeled as “overlapping”.
Predictions that do not overlap with any other prediction are labeled as “non-
overlapping”.
3. After all predictions are labeled, the overlapping predictions are put in the “can-
didate” set. We begin by randomly selecting one of these candidates and remove
all the predictions that overlap it (both in space and time).
4. We put the selected prediction in the “selection” set and repeat the procedure
by randomly selecting one of the predictions in the candidate set. We repeat this
until the candidate set is exhausted.
5. We then add all the non-overlapping predictions to the selection set. This set
constitutes a sample set of independent predictions that we then use to calculate the IR score and α values as described above. We calculate an average value for both metrics by repeating this sampling procedure several times; a minimal sketch of the sampling step is given below.
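The following sketch of the sampling procedure represents predictions as dictionaries holding a centre, radius and time window; the key names and the great-circle overlap test are our own simplifications.

import random
from math import radians, sin, cos, asin, sqrt

def centre_distance_km(a, b):
    """Great-circle distance between the centres of two predictions."""
    la1, lo1, la2, lo2 = map(radians, (a["lat"], a["lon"], b["lat"], b["lon"]))
    h = sin((la2 - la1) / 2) ** 2 + cos(la1) * cos(la2) * sin((lo2 - lo1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))

def overlaps(a, b):
    """Step 2: overlap in time and centre distance smaller than the sum of the radii."""
    share_time = a["t0"] <= b["t1"] and b["t0"] <= a["t1"]
    return share_time and centre_distance_km(a, b) < a["radius_km"] + b["radius_km"]

def sample_independent_set(predictions, rng=random):
    """Steps 3-5: draw one set of mutually non-overlapping predictions."""
    overlapping, isolated = [], []
    for i, p in enumerate(predictions):
        others = predictions[:i] + predictions[i + 1:]
        (overlapping if any(overlaps(p, q) for q in others) else isolated).append(p)
    selection, candidates = [], list(overlapping)
    while candidates:
        pick = rng.choice(candidates)                       # step 3: random candidate
        selection.append(pick)
        candidates = [q for q in candidates                 # step 4: drop its overlaps
                      if q is not pick and not overlaps(pick, q)]
    return selection + isolated                             # step 5: add isolated ones

Repeating sample_independent_set and averaging the resulting IR and α values yields the skill and significance assigned to the participant.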
Based on the significance threshold (α ≤ 0.05) combined with the IR metric, we categorize the participants into the following skill classes: (A) significant participants with IR ≥ 2 and at least 5 independent predictions; (B) significant participants with IR ≥ 1.33 and at least 5 independent predictions; (C) participants with IR > 1 but who fail to satisfy either the significance, prediction number or IR criteria to become an A or B; (D) all participants with IR < 1. It can be argued that requesting a
minimum number of predictions may affect the participants’ behavior; some might
start placing predictions that they would not have placed just to reach the limit. We
concede that the contest regulations will affect participant behavior in one way or
another and deem such effects admissible as long as they do not hinder the goals
of the competition. Participants who achieve skill classes of A or B are rewarded
additionally to the reward distributed proportionally to the rX score. By distribut-
ing rewards according to two different but complementary performance metrics, we
hope to make exploiting a single metric less enticing and to incentivize demonstrating
actual skill. The rX score is relatively easier to calculate since the score of each new
prediction is simply added to the current balance. However, the skill classes based on
the IR score are more difficult to calculate because each new prediction affects the
average prediction probability and estimating significance requires numerical simu-
lation. We acknowledge that such statistical concepts can be intimidating for the
general public and hinder participation. Therefore, we have implemented a recom-
mendation algorithm that uses the currently closed predictions of each participant
to suggest an additional number of true predictions with a probability sufficient to
achieve skill classes B or A. The flowchart of the recommendation algorithm is given
in Figure 4. In essence, the algorithm estimates what is the minimum number and
highest reference model probability of additional true predictions that would satisfy
both the significance and the IR criteria. If the participant has achieved skill class
B, the algorithm would recommend predictions for achieving skill class A, while for
classes C and D the recommendation would aim at B. In principle, similar recom-
mendations can be calculated not necessarily for the minimum but for any number of
predictions; the minimum probabilities would increase as the number of predictions
increases. Figure 5 shows the outputs of the recommendation system based on the
closed predictions of two different participants.
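In the spirit of Figure 4, the following simplified sketch searches for the smallest number of additional true predictions, at the highest trial probability, that would satisfy the IR and significance criteria. The probability grid, the brute-force search, and the reuse of a Monte Carlo α estimator are our simplifications, not the platform's exact logic.

import random

def recommend(probabilities, outcomes, ir_target=1.33, alpha_max=0.05,
              p_grid=(0.5, 0.4, 0.3, 0.2, 0.1, 0.05), k_max=20, n_mc=5000):
    """Smallest number k of additional true predictions, at the highest trial
    probability p on the grid, reaching `ir_target` with alpha <= `alpha_max`."""
    def ir(outs, probs):
        return (sum(outs) / len(outs)) / (sum(probs) / len(probs))

    def alpha(outs, probs):
        target = ir(outs, probs)
        hits = sum(ir([1 if random.random() < q else 0 for q in probs], probs) >= target
                   for _ in range(n_mc))
        return hits / n_mc

    for k in range(1, k_max + 1):               # fewest additional predictions first
        for p in p_grid:                        # easiest (highest) probability first
            outs = list(outcomes) + [1] * k
            probs = list(probabilities) + [p] * k
            if ir(outs, probs) >= ir_target and alpha(outs, probs) <= alpha_max:
                return k, p
    return None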
6 Synthetic consistency tests
The two score metrics introduced in the previous section are intended to assess the predictive skills of individual participants as well as of probabilistic forecasting models that can be sampled with deterministic predictions through an application programming interface. Fairness in the distribution of rewards and in the reputation-based contest is an essential factor that motivates participants. Moreover, from a scientific point of view, it is crucial to establish that the proposed metrics are powerful enough to discriminate between good and bad models, such that research can be focused on more promising directions.
To test the consistency of the proposed metrics, we conduct a simplified synthetic
ranking test. The test consists of three main components: (1) the ground truth model
that generates the events; (2) several competing models that issue predictions trying
to predict the generated events; (3) a reference model that is used as the basis of
prediction probabilities entering the rX and IR metrics. The synthetic prediction contest is carried out by all of the competing models issuing N_p predictions based on
their expectations and the reference model probability. The outcome of the submitted
predictions is dictated by the ground truth model. The scores are then calculated
using the outcomes and the reference model probabilities assigned to the predictions
submitted by the candidate models. The synthetic test is carried out in these steps:
1. The ground truth model is defined as a 1D probability vector with N_p elements: T = U(0.01, 0.99).
2. Outcomes (occurrence or non-occurrence) are generated by sampling each of the individual probabilities in the T vector to create an outcome vector O as per equation (9).
3. A set of m progressively worse candidate models C_i is created by perturbing the ground truth model by adding uniform random noise with increasing amplitude.
The perturbed probabilities are capped to remain within the [0.01, 0.99] interval:

x \sim U(0,1), \qquad C_i = \max\left( \min\left( T + \frac{i\,(x - 0.5)}{m},\ 0.99 \right),\ 0.01 \right)    (11)
4. For each of the N_p predictions, a candidate model indexed j decides to issue a to-occur or a not-to-occur prediction by choosing the prediction type with the maximal expected return:
E[\mathrm{Occ}] = C_i(j)\,\frac{1 - R(j)}{R(j)} - (1 - C_i(j)), \qquad E[\mathrm{Noc}] = (1 - C_i(j))\,\frac{R(j)}{1 - R(j)} - C_i(j)    (12)

where E[Occ] and E[Noc] denote a candidate model's expected return for a to-occur and a not-to-occur prediction, respectively.
5. All the issued predictions are assigned as true or false according to the outcome vector O, and each candidate model receives rX and IR scores.

Fig. 4. Flow chart of the recommendation algorithm estimating the probability and number of true predictions needed to achieve a higher skill class. SR: success rate, ARP: average RichterX probability, IRtar: target information ratio.

Fig. 5. (1) Summary of prediction outcomes for a participant: (a) Two bar charts indicating the outcome of to-occur and not-to-occur predictions as false (red) or true (blue); the length of each bar scales with the prediction stake; (b) Map showing the location of the to-occur and not-to-occur predictions as purple up or beige down arrows. (2) Skill assessment plot for a participant: (a) Current skill class, from A to D, significance value (1 − α) and number of independent predictions; (b) Recommendation containing the number of true predictions with a given probability needed to achieve a higher skill class; (c) Success rate vs. average RichterX probability. Participants with fewer than 5 independent predictions are shown as triangles, others as squares. Colors indicate the skill class: A bright green, B green, C yellow, D gray. The selected participant is indicated by a symbol with thicker edges. (3) Same as (1) but for a different participant. (4) Same as (2) but for a different participant.
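The synthetic contest of steps 1-5 condenses to a short script; the default numbers are illustrative and the per-element noise draw in step 3 is our reading of equation (11).

import random

def synthetic_contest(n_pred=1000, n_models=500, ref_rank=250, rng=random):
    """Steps 1-5: ground truth, outcomes, perturbed candidates, predictions and scores."""
    truth = [rng.uniform(0.01, 0.99) for _ in range(n_pred)]               # step 1
    outcome = [1 if rng.random() < t else 0 for t in truth]                # step 2

    def candidate(i):                                                      # step 3, eq. (11)
        return [min(max(t + i * (rng.random() - 0.5) / n_models, 0.01), 0.99) for t in truth]

    models = [candidate(i) for i in range(1, n_models + 1)]
    reference = models[ref_rank - 1]

    rx_scores, ir_scores = [], []
    for c in models:
        rx, hits, expected = 0.0, 0, 0.0
        for j in range(n_pred):
            e_occ = c[j] * (1 - reference[j]) / reference[j] - (1 - c[j])  # eq. (12)
            e_noc = (1 - c[j]) * reference[j] / (1 - reference[j]) - c[j]
            predict_occ = e_occ >= e_noc                                   # step 4
            came_true = outcome[j] == (1 if predict_occ else 0)
            p_ref = reference[j] if predict_occ else 1 - reference[j]
            rx += (1 - p_ref) / p_ref if came_true else -1.0               # eq. (4), stake 1
            hits += came_true
            expected += p_ref
        rx_scores.append(rx)
        ir_scores.append((hits / n_pred) / (expected / n_pred))            # eq. (8)
    return rx_scores, ir_scores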
We expect the consistency to improve with an increasing number of predictions
N_p. Thus, we conducted the synthetic test for N_p = [100, 1000, 5000]. Figure 6 shows the results of the synthetic tests for the case when the reference model is chosen as the median model (i = 250) and when it is chosen as the worst (i = 500), respectively. We can see that, as the number of predictions increases, the
fluctuation in both score metrics decreases, highlighting a linear relationship with
the true rank. The skill of the reference model relative to the candidate models also
plays an important role in interpreting the consistency results.
Since we created the candidate models by adding an increasing amount of noise
to the ground truth model, we also know the true ranking. We can study the scoring consistency by comparing the ranking obtained by each metric to the true ranking via Kendall's rank correlation coefficient τ [1], given in the following equation, where p_c and p_d are the numbers of concordant (i.e., having the same relative order) and discordant pairs, and n is the number of elements being ranked:
468 The European Physical Journal Special Topics
Fig. 6. The rX (left column) and IR (right column) scores for the 500 competing models
resulting from Npindependent predictions. Increasing Npvalues are shown in darker shades.
Top row plots the results when the reference model is chosen as rank 250, i.e the average
model. Bottom row corresponds to the reference chosen as rank 500, i.e. the worst model.
τ=2 (pcpd)
n(n1) (13)
The coefficient τ is equal to 1 when the two rankings are identical, −1 when they are the reverse of each other, and close to zero when they are unrelated. Figure 7 plots the τ coefficients as a function of an increasing number of predictions, for different reference model choices. We observe that the IR score is more powerful in retrieving the original ranking, resulting in consistently higher τ values, both at small and large numbers of predictions and regardless of the choice of the reference model.
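As a self-contained illustration of this ranking-consistency check, the short Python sketch below (our own naming, not code from the platform) counts concordant and discordant pairs and evaluates equation (13) for two rankings without ties.

from itertools import combinations

def kendall_tau(true_rank, inferred_rank):
    # Kendall's rank correlation coefficient of equation (13), assuming no ties.
    n = len(true_rank)
    p_c = p_d = 0
    for i, j in combinations(range(n), 2):
        s = (true_rank[i] - true_rank[j]) * (inferred_rank[i] - inferred_rank[j])
        if s > 0:
            p_c += 1  # concordant pair: same relative order in both rankings
        elif s < 0:
            p_d += 1  # discordant pair: reversed order
    return 2.0 * (p_c - p_d) / (n * (n - 1))

Identical rankings give τ = 1 (all pairs concordant), fully reversed rankings give τ = −1, and unrelated rankings give values near zero, matching the interpretation above.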
Another important observation is that, when the reference model is chosen as the best model (i.e. very close to the generating one), the ranking becomes inconsistent. Recall from equation (5) that, if the reference model is the generating model, we expect the rX score for any set of predictions to have an expected value of zero. Similarly, for the IR metric, the expected value would be 1 (see Eq. (9)). Figure 8 confirms this by plotting the individual model scores when the reference model is chosen as the best. For any number of predictions, the score values fluctuate around 0 and 1 for rX and IR respectively, explaining the near-zero rank correlation values observed in Figure 7.
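This behaviour can also be read directly off the reconstructed form of equation (12): if a candidate's probabilities coincide with those of the reference, i.e. C_i(j) = R(j), both prediction types have zero expected return,
\[
E[\mathrm{Occ}] = R(j)\,\frac{1 - R(j)}{R(j)} - \bigl(1 - R(j)\bigr) = 0,
\qquad
E[\mathrm{Noc}] = \bigl(1 - R(j)\bigr)\,\frac{R(j)}{1 - R(j)} - R(j) = 0,
\]
so no decision rule can achieve a positive expected return on any prediction against the generating model, consistent with the zero expected rX score mentioned above.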
Fig. 7. Rank correlation between the true rank and the inferred rank of the 500 models as a function of an increasing number of predictions. Results based on the IR and rX scores are shown as solid and dashed lines respectively. Different colors are used to show the results when the true rank of the reference model is chosen as 1 (best), 100, 250, 400 or 500 (worst).
Fig. 8. The same as Figure 6 when the reference model is chosen as rank 1, i.e. the best model.
It is important to note that the complete inability to distinguish between worse models occurs only when the reference model is very close to the truth. In the context of a prediction contest, this would mean that we have reached our final goal and that further research cannot provide added value. In reality, we know that our current models have considerable room for improvement. We have demonstrated that, when this is the case, the proposed metrics are able to rank models both better and worse than the reference (see Figs. 6 and 7). Nevertheless, if there are concerns regarding the ranking of models that are much worse than the RichterX reference model, these can easily be addressed by invoking a very weak reference model, such as one obtained by smoothing past seismicity and assuming Poissonian rates, as sketched below.
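As a minimal sketch of such a weak reference (our simplification, which ignores the spatial smoothing kernel and simply uses the long-term event count in the queried region), a stationary Poisson rate estimated from past seismicity can be converted into an occurrence probability for the requested window.

import math

def poisson_reference_probability(n_past_events, catalog_years, window_days):
    # Probability of at least one qualifying event in the prediction window,
    # assuming past events in the region follow a stationary Poisson process.
    rate_per_day = n_past_events / (catalog_years * 365.25)
    return 1.0 - math.exp(-rate_per_day * window_days)

For instance, 40 qualifying events observed over a 40-year catalog yield a probability of about 0.019 for a 7-day window; such a reference is deliberately easy to beat, so that even poor candidate models can still be ranked against one another.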
7 Conclusion
The RichterX platform aims to rekindle the earthquake prediction effort by organizing a prediction contest inviting large-scale participation both from the public and from different fields of academia. To facilitate this contest, we have implemented a real-time global earthquake forecasting model that can estimate short-term earthquake occurrence probabilities anywhere in the world. On the one hand, this platform makes the contest highly accessible, allowing anyone with just a mobile phone to submit a prediction. On the other hand, it allows the public to query earthquake occurrence probabilities in real time for any specific region, which becomes vital information, especially after large mainshocks. In this way, with a single platform we hope to achieve three main goals: (1) inform the public about short-term earthquake probabilities anywhere on the globe in real time; (2) serve as a public record empowering the media and public officials to counter claims of earthquake prediction after the fact; (3) allow researchers from various fields to easily participate in an earthquake prediction contest and challenge state-of-the-art global statistical seismology models.
Wide-scale participation has the potential to bring forward and allow for the testing of various data sources that may or may not carry precursory information. Current earthquake forecasting contests, which rely on the systematic reporting of earthquake rates for large regions at predefined space-time resolutions, are not suitable for testing intermittent observations such as earthquake lights, groundwater chemistry, electromagnetic and thermal anomalies, and so on. The RichterX platform can easily accommodate alarm-based predictions derived from such data sources. In addition, through synthetic ranking tests, we have shown that the proposed performance metrics can distinguish between probabilistic models that are better or worse than the reference model, and retrieve the true performance ranking.
Publisher’s Note The EPJ Publishers remain neutral with regard to jurisdictional claims
in published maps and institutional affiliations.
References
1. H. Abdi, in Encyclopedia of Measurement and Statistics (Sage, Thousand Oaks, CA,
2007), p. 508–510
2. C.R. Allen, Bull. Seismol. Soc. Am. 66, 2069 (1976)
3. P. Bird, D.D. Jackson, Y.Y. Kagan, C. Kreemer, R.S. Stein, Bull. Seismol. Soc. Am.
105, 2538 (2015)
4. F. Cappa, J. Laut, M. Porfiri, L. Giustiniano, Comput. Human Behav. 89, 246 (2018)
5. A. Chaia, A. Dalal, T. Goland, M.J. Gonzalez, J. Morduch, R. Schiff, Half the world
is unbanked: financial access initiative framing note (Financial Access Initiative, New
York, 2009)
6. R. Console, M. Murru, F. Catalli, G. Falcone, Seismol. Res. Lett. 78, 49 (2007)
7. G. Coppi, L. Fast, Blockchain and distributed ledger technologies in the humanitar-
ian sector (Hpg commissioned report, London, 2019), http://hdl.handle.net/10419/
193658
8. M.A. Edwards, S. Roy, Academic research in the 21st Century: Maintaining scientific
integrity in a climate of perverse incentives and hypercompetition (2017), https://www.
liebertpub.com/doi/abs/10.1089/ees.2016.0223
9. Erdstösse im Wallis, Zahlreiche Erdstösse schrecken Menschen im Wallis auf [Tremors in Valais: numerous tremors startle people in Valais] (2019), https://www.tagesanzeiger.ch/panorama/vermischtes/naechtliches-erdbeben-
erschuettert-das-wallis/story/13668757
10. A. Extance, Nature 526, 21 (2015)
11. C. Fabian, Innov. Technol. Governance Globalization 12, 30 (2018)
12. D. Fiorillo, Ann. Public Cooperative Econ. 82, 139 (2011)
13. D. Fletcher, Model averaging (Springer, 2019)
14. GEOFON, Deutsches GeoForschungsZentrum GFZ (1993)
15. D. Harte, D. Vere-Jones, Pure Appl. Geophys. 162, 1229 (2005)
16. D. Harte, D.-F. Li, M. Vreede, D. Vere-Jones, Q. Wang, New Zealand J. Geol. Geophys.
50, 117 (2007)
17. S. Hiemer, Y. Kamer, Seismol. Res. Lett. 87, 327 (2016)
18. D.D. Jackson, Proc. Nat. Acad. Sci. USA 93, 3772 (1996)
19. I.T. Jolliffe, Meteorol. Appl. 15, 25 (2008)
20. T.H. Jordan, Seismol. Res. Lett. 77, 3 (2006)
21. Y.Y. Kagan, Worldwide Earthquake Forecasts (2017)
22. Y.Y. Kagan, D.D. Jackson, Geophys. J. Int. 143, 438 (2000)
23. Y.Y. Kagan, D.D. Jackson, R.J. Geller, Seismol. Res. Lett. 83, 951 (2012)
24. Y. Kamer, S. Hiemer, J. Geophys. Res. Solid Earth 120, 5191 (2015)
25. V.I. Keilis-Borok, V.G. Kossobokov, Phys. Earth Planet Inter. 61, 73 (1990)
26. Y.M. Kow, First Monday 22 (2017)
27. Y.-T.T. Lee, D.L. Turcotte, J.R. Holliday, M.K. Sachs, J.B. Rundle, C.-C.C. Chen, K.F.
Tiampo, Proc. Nat. Acad. Sci. USA 108, 16533 (2011)
28. M.R. Lepper, D. Greene, The hidden costs of reward: New perspectives on the psychology
of human motivation (Lawrence Erlbaum, Oxford, England, 1978)
29. A. Lomax, A. Michelini, Pure Appl. Geophys. 170, 1385 (2013)
30. G. Molchan, L. Romashkova, arXiv:1005.3175 (2010)
31. G.M. Molchan, Phys. Earth Planet. Inter. 61, 84 (1990)
32. G.M. Molchan, Tectonophysics 193, 267 (1991)
33. S. Nandan, G. Ouillon, S. Wiemer, D. Sornette, J. Geophys. Res. Solid Earth 122, 5118
(2017)
34. S. Nandan, G. Ouillon, D. Sornette, S. Wiemer, Seismol. Res. Lett. 90, 1650 (2019)
35. S. Nandan, G. Ouillon, D. Sornette, S. Wiemer, J. Geophys. Res. Solid Earth 124, 8404
(2019)
36. S. Nandan, Y. Kamer, G. Ouillon, S. Hiemer, D. Sornette, Eur. Phys. J. Special Topics
230, 425 (2021)
37. C.G. Northcutt, A.D. Ho, I.L. Chuang, Comput. Edu. 100, 71 (2016)
38. Y. Ogata, J. Am. Stat. Assoc. 83, 9 (1988)
39. M. Pagani, J. Garcia, D. Monelli, G. Weatherill, A. Smolka, Ann. Geophys. 58 (2015),
https://www.annalsofgeophysics.eu/index.php/annals/article/view/6677
40. W. Savran, P. Maechling, M. Werner, D. Schorlemmer, D. Rhoades, W. Marzocchi, J.
Yu, T. Jordan, The Collaboratory for the Study of Earthquake Predictability Version
2 (CSEP2): Testing Forecasts that Generate Synthetic Earthquake Catalogs (EGUGA,
2019), p. 12445
41. D. Schorlemmer, J.D. Zechar, M.J. Werner, E.H. Field, D.D. Jackson, T.H. Jordan,
Pure Appl. Geophys. 167, 859 (2010)
42. D. Schorlemmer, M.J. Werner, W. Marzocchi, T.H. Jordan, Y. Ogata, D.D. Jackson,
S. Mak, D.A. Rhoades, M.C. Gerstenberger, N. Hirata, M. Liukis, P.J. Maechling,
A. Strader, M. Taroni, S. Wiemer, J.D. Zechar, J. Zhuang, Seismol. Res. Lett. 89,
1305 (2018)
43. A. Sol, H. Turan, Sci. Eng. Ethics 10, 655 (2004)
44. K. Starbird, L. Palen, Working & sustaining the virtual disaster desk, in Proceedings
of the ACM Conference on Computer Supported Cooperative Work, CSCW, New York,
USA, 2013 (ACM Press, New York, USA, 2013)
45. U.S. Geological Survey Earthquake Hazards Program, Advanced National Seismic Sys-
tem (ANSS) comprehensive catalog of earthquake events and products (2017)
46. D.L. Wells, K.J. Coppersmith, Bull. Seismol. Soc. Am. 84, 974 (1994)
47. J. Whitehill, Climbing the kaggle leaderboard by exploiting the log-loss oracle, Technical
report (2018)
48. J. Woessner, S. Hainzl, W. Marzocchi, M.J. Werner, A.M. Lombardi, F. Catalli,
B. Enescu, M. Cocco, M.C. Gerstenberger, S. Wiemer, J. Geophys. Res. 116, 1 (2011)
49. H.O. Wood, B. Gutenberg, Earthquake Prediction (1935)