
Eur. Phys. J. Special Topics 230, 451–471 (2021)

© EDP Sciences, Springer-Verlag GmbH Germany, part of Springer Nature, 2021

https://doi.org/10.1140/epjst/e2020-000260-2

THE EUROPEAN PHYSICAL JOURNAL SPECIAL TOPICS

Regular Article

Democratizing earthquake predictability research: introducing the RichterX platform

Yavor Kamer1,a, Shyam Nandan1,2, Guy Ouillon3, Stefan Hiemer1, and Didier Sornette4,5

1 RichterX.com, Mittelweg 8, Langen 63225, Germany
2 Windeggstrasse 5, 8953 Dietikon, Zurich, Switzerland
3 Lithophyse, 4 rue de l'Ancien Sénat, 06300 Nice, France
4 ETH Zurich, Department of Management, Technology and Economics, Scheuchzerstrasse 7, 8092 Zurich, Switzerland
5 Institute of Risk Analysis, Prediction and Management (Risks-X), Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology (SUSTech), Shenzhen, P.R. China

Received 5 October 2020 / Accepted 7 October 2020

Published online 19 January 2021

Abstract. Predictability of earthquakes has been vigorously debated in the last decades, with the dominant, albeit contested, view being that earthquakes are inherently unpredictable. The absence of a framework to rigorously evaluate earthquake predictions has led to prediction efforts being viewed with scepticism. Consequently, funding for earthquake prediction has dried up and the community has shifted its focus towards earthquake forecasting. The field has benefited from collaborative efforts to organize prospective earthquake forecasting contests by introducing protocols, model formats and rigorous tests. However, these regulations have also created a barrier to entry. Methods that do not share the assumptions of the testing protocols, or whose outputs are not compatible with the contest format, cannot be accommodated. In addition, the results of the contests are communicated via a suite of consistency and pair-wise tests that are often difficult to interpret for those not well versed in statistical inference. Due to these limiting factors, while scientific output in earthquake seismology has been on the rise, participation in such earthquake forecasting contests has remained rather limited. In order to revive earthquake predictability research and encourage wide-scale participation, here we introduce a global earthquake prediction platform by the name RichterX. The platform allows for the testing of any earthquake prediction in a user-defined magnitude-space-time window anywhere on the globe. Predictions are assigned a reference probability based on a rigorously tested real-time global statistical earthquake forecasting model. In this way, we are able to accommodate methods issuing alarm-based predictions as well as probabilistic earthquake forecasting models. We formulate two metrics to evaluate the participants' predictive skill and demonstrate their consistency through synthetic tests.

a e-mail: yaver.kamer@gmail.com

452 The European Physical Journal Special Topics

1 Introduction

Earthquake prediction is a hard problem, which has remained an elusive holy grail of seismology. Unfortunately, the current incentive structures push researchers away from hard problems, where results are rarely positive. Negative results are less likely to lead to a publication or a citation. While the utility of these quantities is being put into question [8], they are still widely used as performance criteria in academia.

To avoid the negative results and public reactions associated with failed earthquake predictions, the seismological community has mainly shifted its focus to descriptive case-studies, long-term probabilistic hazard analysis, and probabilistic forecasting experiments. However, by not engaging with the prediction problem, we have effectively left it to be exploited by less reputable actors. These actors often emerge during times of crisis, spreading disinformation that leads to public anxiety. As a result, it has become common to view any sort of prediction effort with suspicion and often negative prejudice, forgetting that the scientific principle requires hypotheses to be tested rather than disregarded due to previously held beliefs. It can be argued that many of these prediction claims are not formulated as falsifiable hypotheses, yet it is our duty as scientists to assist those interested by providing guidelines, protocols, and platforms facilitating the scientific method.

To help revive the earthquake prediction effort and bring scientific rigor to the field, here we propose a general platform to facilitate the process of issuing and evaluating earthquake predictions. The platform is general in that it allows for the testing of both deterministic alarm-based predictions and probabilistic forecasts. The common metrics proposed to evaluate the respective skills of these two different model classes put methods relying on different theories, physical mechanisms and datasets on the same footing. In this way, we aim to achieve larger participation, to facilitate the inclusion of different methods from various fields, and to foster collaborative learning through regular feedback.

The paper is structured as follows. First, we introduce the general requirements that a public earthquake prediction platform must satisfy and briefly explain how RichterX addresses them. Second, we describe the implementation of the global real-time earthquake forecasting model that the RichterX platform uses to inform the public of short-term earthquake probabilities, and that also serves as the reference against which all submitted predictions are evaluated. Next, we introduce two complementary performance metrics that allow us to assess the predictive performance of the participants. Finally, we conduct synthetic contests with known ground truth and candidate models to test the consistency of the proposed metrics.

2 Characteristics of the earthquake prediction platform RichterX

Participation: Considering that earthquakes have a huge global impact, previous and current forecasting experiments have reached only a small number of participants [41,42]. Our platform aims to attract broader participation, not only from the relatively small seismology community in academia but also from other scientific disciplines, including fields such as machine learning, pattern recognition, data mining and remote sensing, that are active in information technologies and engineering applications. The platform also encourages and rewards public participation to increase public awareness of the earthquake hazard, motivate prediction efforts, and, more importantly, allow citizens to participate in a scientific challenge. Previous prediction experiments have been criticized in this regard because they have treated the public as mere subjects of a scientific experiment and sometimes as a means to higher ends (i.e., increased public awareness) [43]. Participating in a global earthquake prediction contest will allow the public to gain hands-on insight and internalize the current achievements and the difficulty of the problem. To achieve this, we have built a minimal graphical user interface, compatible with desktop devices as well as most mobile phones, allowing anyone to participate (see Fig. 1). We also provide an application programming interface (API), allowing more sophisticated participants to submit predictions using a computer algorithm. Moreover, we see the language barrier as one of the main factors hindering participation. We will, therefore, make the platform and the relevant publications available in multiple languages.

The Global Earthquake Forecasting System 453

Fig. 1. The RichterX platform accessible at www.richterX.com, as viewed on a mobile phone. (1) Forecast screen: (a) map colors indicate the monthly M5 earthquake count; the black circle represents the target area of the prediction; a pop-up message reports the probability according to the RichterX model; (b) three tabs with a slider for adjusting the radius, time duration and minimum magnitude of the prediction; (c) toggle button to switch between probabilities to-occur and not-to-occur; the number of events to-occur can be specified via the up/down arrows; (d) summary of the RichterX forecast in human-readable format. (2) Forecast at the same location with the radius reduced from 300 km to 100 km. (3) Forecast screen with the Predict toggle on: (a) slider for setting the prediction stake. (4) Prediction preview screen showing a summary and the round at which the prediction closes.

Privacy: The negative connotation associated with failed predictions is an important factor hampering prediction efforts. The RichterX platform provides the participants with the option to anonymize their identity, allowing them to focus on the scientific question instead of worrying about a possible loss of reputation.

Transparency: Results and conclusions of any forecasting or prediction contest must be accessible to the general public. The results of previous forecasting contests such as CSEP and RELM have been published, but these papers are often behind paywalls. The CSEP public website containing the results of several models and tests, although rather technical and not very intuitive for the general public, has since gone offline. In our view, transparency and ease of access to contest results reinforce responsibility and accountability.

Assuming that science is conducted to enhance public utility, the public is entitled to know of its progression, which entails not only successes but also failures. Thus, we are committed to making the results openly available to the public. In addition to results about each earthquake (whether it was predicted or not), metrics regarding the overall performance of each participant are updated on an hourly basis and in the form of regularly issued public reports. The provision of this information will counter false prediction allegations, serve as a verifiable track record, and allow the public to distinguish between one-time guesses and skilled predictions.

Global coverage: Earthquakes do not occur randomly in space but cluster on tectonic features such as subduction zones, continental plate boundaries, volcanic regions and intraplate faults. These active features span the whole globe and continuously produce large earthquakes. Previous forecasting experiments have focused mainly on regions with very good instrumental coverage, available only in a small number of countries (USA, Japan, New Zealand, Iceland, Italy, etc.) [42]. Our goal is to foster a worldwide earthquake prediction effort by providing the community and the public with a global reference earthquake forecasting model. With the help of such a reference model, our platform is able to accommodate any regional model by evaluating it against the same global baseline, putting regional added value in a global perspective.

Real-time updates: Temporal clustering is another main feature of earthquake occurrence: the probability that an earthquake will occur in a given space window can vary greatly in time. Thus, if a prediction is to be evaluated against a reference model probability, that reference model should be updated in near real-time as soon as a new event occurs. Together with global coverage, this requirement poses serious computational demands that have hindered the implementation of such models. Recent advances in the field of statistical earthquake modeling [17,24,33,34] have allowed us to undertake this challenge. Having secured the computational and hosting capabilities, the RichterX platform is able to provide the global community with worldwide earthquake forecasts updated on an hourly basis.

Active learning: The reference model provided on the RichterX platform aims to reflect the current state of the art in statistical seismology. Hence, it is not set in stone but is subject to further improvements as the participants, through successful predictions, effectively highlight regions and time frames where the model's performance is lacking. In this way, the participants serve as referees continuously peer-reviewing the reference model, which thereby keeps improving, providing the community with an ever higher bar to surpass.

Feedback mechanism: The goal of our prediction experiment is to provide the participants with meaningful feedback, allowing them to test their hypotheses, models, assumptions, and auxiliary data. Through repeated iterations of submitting predictions, testing, and receiving feedback, we expect the participants to improve their prediction performance. For public observers, the results should be presented transparently and succinctly, allowing for an intuitive comparison of the participants' performances. Therefore, we have developed a skill assessment scheme that is both easy to understand for the public and powerful in distinguishing between different predictive skills. It is important to note that participants may lose interest if the experiment takes too long to deliver results. The feedback may also lose its relevance if it is not provided in a timely fashion. The RichterX platform therefore issues results on a bi-weekly basis, together with cumulative metrics spanning longer durations. In contrast, consider that previous earthquake forecasting experiments by CSEP were carried out for 5 years [27], with preliminary results being released only after 2.5 years [41].

Incentives: We hope that the opportunity to easily test and compare different hypotheses and models on a global scale provides enough stimulus for the academic community. At the same time, it is important to recognize that science can be costly. Apart from the devoted time, many published studies are behind paywalls, data processing requires expensive hardware, and some data sources can be subject to fees. Thus, we believe amateur scientists, students, and the general public can be incentivized to participate by providing rewards, with "scientific microfundings" similar to microcredits in the business field. These can be monetary or in the form of technical equipment or journal subscriptions. Some studies have raised concerns that improper use of monetary rewards can reduce the intrinsic motivation of the participants [28]. However, recent studies have shown that financial rewards have a positive effect on engagement and satisfaction [4,12]. The delivery of such monetary rewards is now much easier due to the popularization of crypto-currencies [10,26]. These recent developments allow us to transact financially with successful participants without requiring a bank account, which almost half of the world's population does not have access to [5].

Social responsibility: It is essential to recognize that earthquake prediction is not only a scientific goal but also a topic with the potential to affect the lives of many people, especially those living in seismically active regions. The contest participants should be aware that the events they are trying to predict are not just numbers on a screen, but actual catastrophes causing human suffering. We believe that providing a mechanism for expressing solidarity with the victims can help raise this awareness. To this end, the RichterX platform encourages the participants to donate their rewards to charitable organizations such as GiveWell, Humanity Road [44] and UNICEF [11], which take part in global earthquake relief efforts. Recent studies indicate that the use of decentralized ledger technologies can improve transparency and accountability in humanitarian operations [7]. Therefore, all donations on RichterX are made using cryptocurrencies and recorded on the blockchain, allowing anyone to verify the amount and destination independently. In this way, we hope to prevent a possible detachment between the community that engages with earthquakes from a scientific perspective and the people who suffer their physical consequences.

3 A global, real-time reference earthquake forecasting model

3.1 Introduction

The characteristics summarized in the previous section have emerged from the experience gained in previous earthquake prediction and forecasting experiments. In his address to the Seismological Society of America in 1976, Clarence Allen proposed that an earthquake prediction should be assigned a reference probability indicating how likely it is to occur by chance [2]. Indeed, the development of a model that can assign a probability for any time window anywhere in the world has been one of the main hurdles. There have been several accomplishments in the modeling of global seismicity. Those efforts began with models based on smoothing the locations of observed large earthquakes [22], progressing to combining past seismicity with strain rate estimates [3,21]. Recently, the Global Earthquake Model working group led a collaborative effort to harmonize many regional models [39]. Although these models are important milestones, they treat seismicity as a memoryless, stationary process. As a result, they do not capture the time-dependent aspect of earthquake occurrence. The choice of treating earthquakes as a stationary process is likely motivated by the risk assessment practices of the civil engineering and insurance industries. Yet we believe that, as the seismology community reassesses its assumptions and develops more realistic models, the industry will, in turn, adapt to these changes.

Based on empirical laws derived from observed seismicity, the Epidemic Type Aftershock Sequence (ETAS) model was introduced to enhance stationary statistical earthquake models by accounting for the time-space clustering of seismicity [38]. Retrospective and prospective studies show that statistical models outperform models derived from physical concepts such as rate-and-state, stress transfer, seismic gaps, or characteristic earthquakes [6,23,48]. Recent developments in ETAS modeling have not only made a global-scale application possible, but have also highlighted the importance of abolishing assumptions about the distribution of simulated events [34,35]. Details about the model's development, testing, and prospects can be found in the accompanying paper [36]. Here we describe the real-time online implementation and operation of the model in the context of the platform.

3.2 Data

The RichterX platform employs a dedicated server, the so-called "grabber", that periodically connects to web-based earthquake data center feeds. The grabber compares our local database with the remote host, checking for the addition of new events or the deletion of old ones. If any change is detected, the grabber synchronizes our event catalog with the remote database. We obtain data from multiple global agencies, such as the GFZ Geofon [14] and INGV Early-Est [29], but our primary data source is the USGS ComCat feed [45]. Our data polling frequency is usually around once every few minutes but can be increased automatically during elevated seismic activity.
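The grabber's compare-and-synchronize step can be sketched as follows. This is a minimal illustration, not the platform's actual implementation; the `Event` class and `sync_catalog` function are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    event_id: str
    magnitude: float

def sync_catalog(local: dict, remote_events: list) -> dict:
    """Mirror the remote feed into the local catalog: add events that are
    new, refresh revised ones, and drop events deleted by the agency."""
    remote = {ev.event_id: ev for ev in remote_events}
    added = sorted(eid for eid in remote if eid not in local)
    deleted = sorted(eid for eid in local if eid not in remote)
    local.clear()
    local.update(remote)  # local catalog now matches the remote host
    return {"added": added, "deleted": deleted}
```

A poller would call such a function every few minutes and trigger a model update whenever the returned change sets are non-empty.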

3.3 Model selection, calibration, and forward simulations

We have developed multiple candidate models that are different variations of the ETAS model or use different input datasets. We use pseudo-prospective testing to select among these models; details of the experiment and the competing models are given in the accompanying paper [36]. In this procedure, only data recorded before a certain time is considered and divided into training and validation sets; competing models are trained on the training data, and their forecasting performances are compared on the validation set. The performances are averaged by moving forward in time and repeating the tests. The top-ranking model, with its final parameter set optimized over the whole dataset, is deployed online on the platform servers. These servers use the real-time earthquake event data provided by the grabber as input and conduct forward simulations on an hourly basis. The result of these forward simulations is a collection of synthetic event catalogs that represent a spatio-temporal projection of how global seismicity will evolve.

The ETAS model is stochastic, i.e., its output samples statistical distributions; therefore, multiple forward simulations are needed to obtain an accurate representation of the underlying probabilities. Each such simulation produces a global synthetic catalog containing the locations, times, and magnitudes of events for the next 30 days. The total number of events produced at each real-time update can reach several million. These simulated events are uploaded onto our online database, where they can be queried via the web-based user interface on www.richterX.com. Using this interface, the participants can select any point on the globe and define a circular region, a time window, and a magnitude range that they are interested in (see Fig. 1). The full distribution of the simulated events within the user-specified time-space-magnitude range is then used to calculate the probability of earthquake occurrence. In essence, this probability corresponds to the number of simulations containing events that satisfy the user-defined criteria divided by the total number of simulations.
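The probability computation described above can be sketched as follows, assuming each synthetic catalog is a list of `(lat, lon, day, magnitude)` tuples; the function names are illustrative, not the platform's API.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points on the Earth."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(p1) * math.cos(p2)
         * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def occurrence_probability(simulations, lat, lon, radius_km,
                           t0, t1, min_mag, min_events=1):
    """Fraction of synthetic catalogs containing at least `min_events`
    events inside the queried space-time-magnitude window."""
    hits = 0
    for catalog in simulations:
        n = sum(1 for (elat, elon, t, m) in catalog
                if t0 <= t <= t1 and m >= min_mag
                and haversine_km(lat, lon, elat, elon) <= radius_km)
        if n >= min_events:
            hits += 1
    return hits / len(simulations)
```

For example, if only one of four synthetic catalogs contains a qualifying event, the reported probability is 0.25.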

To cope with the computational demands of these simulations, we have scheduled several servers to run periodically in a staggered fashion. In this way, we can ensure that the model forecasts are updated within less than an hour after each earthquake. The servers are also distributed across different locations to add redundancy in case of service interruptions.


4 How RichterX works

Earthquake predictions and forecasts have been issued and studied for decades. However, there is still confusion about their definition and proper formulation. We believe it is essential to be strict about terminology: science advances by accumulating evidence for or against hypotheses, and vague statements become missed opportunities for testing and obtaining such evidence.

4.1 Earthquake forecast and earthquake prediction

We define an earthquake forecast as the statement of a probability that a minimum number of events will occur within a specific time-space-magnitude window. Therefore, a statement cannot be regarded as an earthquake forecast if any one of these four parameters is omitted. For instance, the operational aftershock forecasts issued by the USGS do not specify a space window for the issued probabilities (Field et al., 2014; USGS, 2019) and therefore cannot be tested. Similarly, any ambiguity in the parameters renders the statement untestable. For instance, the statement "The probability (that the San Andreas fault) will rupture in the next 30 years is thought to lie somewhere between 35% and 70%" [20] does not satisfy the forecast definition because both the rupture size and the occurrence probability are ambiguous. Unfortunately, this is a common malpractice, and public officials often communicate probabilities by giving a range [9]. The range is usually due to several models or different scenarios leading to different probabilities. Using different approaches and assumptions is to be encouraged; however, the resulting probability should be communicated as a single value. There exist various techniques for weighing and ensembling models according to their predictive performances [13].

We define an earthquake prediction as the statement that a minimum number of earthquakes will, or will not, occur in a specific time-space-magnitude window. Under our definition, an earthquake prediction always results in a binary outcome: it is either true or false. This definition is more general than its commonly used predecessors [18,20,49] because it considers the negative statement, that an earthquake will not occur, as an equally valid earthquake prediction. By construction, if the probability of an earthquake occurring in a space-time-magnitude window is P, the probability of an earthquake not occurring is 1 − P. While P is often small, it can exceed 0.5 immediately after large earthquakes or during seismic swarms. In such cases, a prediction of non-occurrence carries more information potential, as it refers to the more unlikely outcome. In this way, negative predictions can serve as a feedback mechanism that counters overestimated earthquake forecast probabilities.

Once an earthquake prediction is issued, it is considered to be in a pending (i.e., open) state. "To-occur" predictions, which predict the occurrence of one or more events, are closed as true if the predicted number of events is observed in the predefined space-time-magnitude window, and as false otherwise. "Not-to-occur" predictions, which predict that no event will occur, are closed as true if there are no events in their predefined space-time-magnitude windows, and as false if at least one such event occurs.
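These closing rules reduce to a simple decision once the prediction window has elapsed; a minimal sketch with illustrative names:

```python
def close_prediction(kind: str, min_events: int, n_observed: int) -> bool:
    """Outcome of a prediction after its window has elapsed.

    kind: "to_occur"     -> true iff at least `min_events` qualifying
                            events occurred in the window;
          "not_to_occur" -> true iff the window stayed empty.
    """
    if kind == "to_occur":
        return n_observed >= min_events
    if kind == "not_to_occur":
        return n_observed == 0
    raise ValueError(f"unknown prediction kind: {kind!r}")
```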

The definitions of earthquake forecasts and earthquake predictions are similar, as both refer to a constrained space-time-magnitude window. They differ in that the former conveys the expected outcome with a real number while the latter uses a binary digit. In that sense, regardless of the observed outcome, a forecast carries more information than a prediction. Forecasts are also more straightforward to evaluate: any set of independent earthquake forecasts (i.e., having non-overlapping space-time windows) can be evaluated based on the sum of their log-likelihoods, which is analogous to their joint likelihood:

LL = \sum_{i=1}^{N} \log\big( O_i P_i + (1 - O_i)(1 - P_i) \big)    (1)

where N is the total number of forecasts, P_i denotes the probability of each forecast, and O_i represents the outcome as 1 (true) or 0 (false). The larger the sum, the higher the joint likelihood and hence the more skillful the forecast set. The performance evaluation of prediction sets is covered in the Performance Assessment section.

Fig. 2. (1) Ranks screen showing the scores for a selected round: (a) round beginning and end dates; (b) table showing anonymized user names, skill class and current rX score; see Section 3.3 for details. (2) M5+ target events colored blue for predicted and red for not-predicted: (a) magnitude vs. time plot; (b) spatial distribution of the events. (3) Results screen for a participant: (a) filter criteria; (b) results table listing prediction locations, round number and status; (c) expandable row with further details. (4) Prediction details screen: (a) magnitude-time plot highlighting the elapsed portion of the prediction window in red; (b) spatial distribution of events around the prediction circle; (c) prediction statement with space-time-magnitude and event number details.
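Equation (1) is straightforward to evaluate in code; the sketch below assumes each forecast is given as a `(P_i, O_i)` pair:

```python
import math

def forecast_set_loglik(forecasts):
    """Joint log-likelihood (Eq. 1) of a set of independent binary
    forecasts, given as (probability P_i, outcome O_i in {0, 1}) pairs."""
    return sum(math.log(o * p + (1 - o) * (1 - p)) for p, o in forecasts)
```

A sharp, well-calibrated forecast set scores higher than an uninformative one: for two windows whose outcomes were (true, false), the set [(0.9, 1), (0.1, 0)] yields a larger sum than the coin-flip set [(0.5, 1), (0.5, 0)].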

4.2 Rules and regulations

The goal of the RichterX platform is to organize a prediction contest that provides timely feedback, skill assessment, and incentives for participation. Therefore, we have tried to devise a system that fosters collaborative competition and rewards skill while maintaining fairness. To attract broader participation, we have tried to make the contest's regulations intuitive and straightforward without sacrificing statistical rigor. Here, we present these rules and the reasons behind them.

4.2.1 Limited number of predictions

Each participant is allowed to place a maximum of 100 predictions every 24 hours. This prediction budget is expressed in units of so-called "earthquake coins" (EQC). It is recharged continuously in real time, such that after ∼15 minutes the participant accumulates 1 EQC and can submit another prediction. The accumulated budget cannot exceed 100 EQC; hence, if participants want to submit more predictions, they have to wait. In this way, we hope to encourage participants to engage with the platform regularly, follow the evolution of seismicity, and think thoroughly before using their budget to submit predictions. We expect the participants to perceive their limited prediction budget as valuable, since it is scarce.
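The recharge mechanism behaves like a capped token bucket: 100 EQC per 24 hours is one EQC every 14.4 minutes, i.e., roughly every 15 minutes. A minimal sketch, with an assumed function name:

```python
def eqc_balance(balance: float, elapsed_minutes: float,
                cap: float = 100.0, daily_recharge: float = 100.0) -> float:
    """Prediction budget after `elapsed_minutes` of continuous recharge,
    never exceeding the 100 EQC cap."""
    rate = daily_recharge / (24 * 60)  # EQC per minute
    return min(cap, balance + rate * elapsed_minutes)
```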

4.2.2 One user – one account

Each participant is allowed to have only one account on the platform. Since we provide monetary rewards as an incentive for public participation, users could increase their chance of getting a reward by creating multiple accounts and placing random predictions. We have addressed this by requiring each user to validate their account via a mobile phone application, i.e., a chatbot. The bot runs on the messaging platform Telegram and verifies the user by requiring them to enter a secret code. If the code is correct, the user is matched with their unique Telegram ID, which requires a valid mobile phone number. All reward-related operations are verified through this unique ID.

It is important to note that policies limiting participation rate and preventing multiple accounts are common in online courses and contests such as Kaggle [37,47]. However, previous earthquake forecasting competitions conducted by CSEP, and also its upcoming second phase CSEP2 [40], do not impose such policies. As a result, participants who submit several versions of a model can increase their chance of obtaining a good score, creating a disadvantage for participants who submit only a single model.

4.2.3 Submitting earthquake predictions

The user interface provided on the RichterX platform allows the participants to query our global reference model and obtain its forecasts within the following ranges: time duration from 1 to 30 days, a circular region with a radius from 30 to 300 km, a lower magnitude limit from M5+ to M9.9+, and a number of events from 1+ to 9+. Once these parameters are set, the platform reports a probability of occurrence P (or of non-occurrence 1 − P). The participant can then submit a prediction assigned with this model probability. This probability is used to assess the participant's prediction skill by accounting for the outcomes of their closed predictions.

In addition to the time, space, magnitude, and number parameters, the user can also specify a so-called "stake" for each prediction. The stake acts as a multiplier allowing the participants to submit the same prediction several times, provided that it is within their prediction budget (EQCs). The stake can therefore be thought of as a proxy for the confidence attributed to a prediction.

The reference model updates automatically on an hourly basis. Thus, when a new earthquake occurs, the region in its vicinity becomes unavailable for submitting predictions. Once the new earthquake has been incorporated as an input and the model has been updated, the region becomes available again for the submission of new predictions. This allows us to compare users and our reference model fairly, as both parties are fed with the same amount of information. The radius of the blocked area (R_b) scales as a function of the event magnitude according to the empirical magnitude-surface rupture length scaling [46], given in the following equation:

R_b = 10 + 10^{-3.55 + 0.74M} (km)    (2)

This ensures that the projection of the fault rupture, where most aftershocks are expected to occur, remains within the restricted region regardless of the rupture direction. The additional 10 km in the R_b term accounts for the typical global location uncertainty.
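Equation (2) translates directly into code; for instance, at M 7.5 the exponent is exactly 2, giving a blocked radius of 110 km:

```python
def blocked_radius_km(magnitude: float) -> float:
    """Blocked-area radius R_b of Eq. (2): a 10 km buffer for typical
    global location uncertainty plus the empirical magnitude-surface
    rupture length scaling term."""
    return 10.0 + 10.0 ** (-3.55 + 0.74 * magnitude)
```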

4.2.4 Evaluation of earthquake predictions

Target events are all M≥5 events, as reported by the USGS ComCat [45]. Predic-

tions are evaluated on a bi-weekly round basis. A time frame of only 14 days may

seem too short, yet our target region is the whole Earth rather than a speciﬁc local-

ity. To put this in perspective, the ﬁrst regional forecasting experiment, the Regional

Earthquake Likelihood Models (RELM), was limited to the state of California, USA,

took place during the 5-year period 2006–2010 and had a total of 31 target events

[27]. This corresponds to roughly half of the global bi-weekly M5+ event count (mean

63, median 58 since 1980).

5 Performance assessment metrics

5.1 Conditions for proper metrics

In the case of probabilistic forecasts, a scoring rule is said to be proper if it incen-

tivizes the forecaster to convey their actual estimated probability [19]. In other words,

a proper scoring rule does not aﬀect the probability issued by the forecaster. An

improper scoring rule, however, can be exploited by modifying one’s forecast in a

certain way speciﬁc to the scoring rule. For example, if a scoring rule does not penal-

ize false positives, then participants can gain an advantage by issuing more alarms

than they usually would have. Deterministic predictions do not convey the informa-

tion of probability; thus, the deﬁnition of properness given above becomes irrelevant

[19]. Yet it is useful to consider a more general deﬁnition: a proper scoring rule, in

the context of a contest, aligns the goals of the organizers and the incentives of the

participants.

One goal of the RichterX platform is to encourage broad public and academic

participation from various ﬁelds of expertise. Therefore, we need a scoring rule that

is statistically rigorous, easy to understand, and applicable on short time scales. The

scoring rule should also ensure that the public participants are rewarded propor-

tionally to their predictive skills, as opposed to a winner-takes-all approach, while

incentivizing their regular participation. Another important goal is to provide the sci-

entiﬁc community with a generalized platform where diﬀerent models and hypotheses

(be it alarm based or probabilistic) can be evaluated to determine performance and

provide feedback to researchers. For this second goal, the scoring rule needs to be

exclusively focused on skill and be generally applicable. To achieve both goals, we

have chosen to implement two scoring strategies that complement each other. These

are the RichterX score and the information ratio score. In the following section,

we will describe how these two scores are implemented and used jointly in the

competition.

5.2 RichterX Score (rX)

The definition of the rX score is straightforward: each submitted prediction counts as a negative score equal to the prediction stake (s). If a prediction comes true, the value of the stake multiplied by the odds (1/p) is added to the score; if a prediction fails, the score remains at −s:

\Delta R = \begin{cases} s\,\frac{1}{p} - s & \text{if true} \\ -s & \text{if false} \end{cases} \quad (3)

This can be rewritten as

\Delta R = O\,\frac{s - sp}{p} + (O - 1)\,s \quad (4)

where O = 1 if the prediction is true and O = 0 if false.
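Equations (3) and (4) translate directly into code; this minimal sketch (function name ours) returns the score change for a single closed prediction:

```python
def rx_delta(stake: float, p: float, occurred: bool) -> float:
    """Score change for one closed prediction (Eqs. 3-4): a correct
    prediction pays stake/p - stake; an incorrect one loses the stake."""
    o = 1 if occurred else 0
    return o * (stake - stake * p) / p + (o - 1) * stake
```

With p = 0.25, a successful unit-stake prediction scores +3 and a failed one −1, so the model-expected gain 0.25 · 3 + 0.75 · (−1) is zero, consistent with equation (5).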

Our goal is to incentivize the participants to challenge our model and highlight

regions or time frames where it can be improved; thus, we want to reward those who

perform better than our reference model. The expected gain from any prediction,

according to the model, can be written as the probability-weighted sum of wins and

losses:

E[R] = p\left(s\,\frac{1}{p} - s\right) + (1 - p)(-s) = 0. \quad (5)

The expected gain of a participant is thus zero (positive scores indicating a better

performance than our model). The significance of a positive score (i.e., the probability for a participant to attain it by chance, assuming that the model is correct) has

to be estimated for each participant. The latter performance estimator will become

more reliable as a participant accumulates independent submitted predictions.

At the end of each bi-weekly contest round, each participant’s score is calculated

using all their N predictions closed during the round. Thus, summing expression (4) over all N predictions yields:

R = \sum_{i=1}^{N} \left[ O_i\,\frac{S_i - S_i P_i}{P_i} + (O_i - 1)\,S_i \right] \quad (6)

where S_i is the stake, P_i is the prediction probability given by the reference model, and O_i is a binary variable representing the prediction result as 1 (true) or 0 (false).

Monetary reward is distributed proportionally among all participants with positive

scores.
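The proportional payout can be sketched as follows; this is a simplified illustration, and the function name and dict-based interface are our own:

```python
def distribute_reward(total: float, scores: dict) -> dict:
    """Split a fixed reward among participants with positive rX scores,
    proportionally to their scores; others receive nothing."""
    positive = {name: s for name, s in scores.items() if s > 0}
    pool = sum(positive.values())
    if pool == 0:
        return {}
    return {name: total * s / pool for name, s in positive.items()}
```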

The scores are reset to 0 at the beginning of each round to encourage new partic-

ipants to join the competition. However, resetting the scores each round introduces a

problem. Since the only participation cost is the invested time, participants who see

that their negative scores are reset at the beginning of each round are incentivized

to make low-probability/high-reward predictions, especially towards the end of the round. If a few such predictions come true, the participant gets a positive score; if they end up with a negative score, they just have to wait for the next round for their score to be reset, and then try again. This would be

problematic because the participants can start treating the experiment as a game of

chance with no penalty for false predictions, rather than a contest of skill.

To counter this, we apply a carry-over function that introduces a memory effect for negative scores: if a participant has a negative score not less than −100 at the end of the round, they carry over 10% of this negative score to the next round as a penalty. Below −100, the carry-over fraction increases proportionally with the magnitude of the negative score and caps off at 90%, as given in the following equation:

\Delta C_t = \begin{cases} \min\{\lvert \Delta R_{t-1} \rvert / 1000,\ 0.9\}\,\Delta R_{t-1} & \text{if } \Delta R_{t-1} < -100 \\ 0.1\,\Delta R_{t-1} & \text{if } 0 > \Delta R_{t-1} \ge -100 \end{cases} \quad (7)

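The carry-over rule can be sketched as follows; note the min operator, which matches the worked examples below (a 20% carry-over for −200, capped at 90% for −1000):

```python
def carry_over(score: float) -> float:
    """Negative-score carry-over to the next round (Eq. 7): 10% for
    scores in [-100, 0); below -100, a fraction |score|/1000 capped
    at 90%. Non-negative scores carry nothing."""
    if score >= 0:
        return 0.0
    if score >= -100:
        return 0.1 * score
    return min(abs(score) / 1000.0, 0.9) * score
```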

Hence, a participant with a score of −200 would carry over a penalty of −40,

while a user with −1000 would carry over −900 to the next round. In this way,

participants are incentivized to obtain positive scores consistently, instead of inter-

mittently. Nevertheless, since predictions can be submitted at any time, a participant

may stop submitting new predictions as soon as they have reached a positive score.

This problem, which has already been discussed by [19], is somewhat alleviated by

distributing the reward proportionally to the participant’s score with respect to the

combined total of all other positive participants. Therefore, there is always an incen-

tive to continue participating as other users become positive and start claiming larger

portions of the ﬁxed overall reward.

Another aspect of the rX score is that it is a function of the prediction stake.

Two participants with the same predictions but diﬀerent stakes will get diﬀerent

scores. Assuming they possess some information gain, regular participants will be

able to submit the same prediction repeatedly, thereby increasing their stake, and obtain higher scores compared to those who follow a different strategy, testing their

predictions regardless of their returns. This makes sense in the context of a com-

petition where the participants provide added value by testing our reference model

through their predictions. Yet, as we are not necessarily interested in the optimiza-

tion of staking strategies, there is also a need to assess the predictive skill of each

participant regardless of their staking weights. For this purpose, we employ a second

metric.

5.3 Information ratio score (IR)

To assess the predictive skill of a participant, we need to answer the following two

questions: Firstly, how much better is the participant’s performance compared to the

reference model? And secondly, is this performance significant? To answer the first

question, we calculate a metric called the “information ratio” (IR):

\mathrm{IR} = \frac{\frac{1}{N}\sum_{i=1}^{N} O_i}{\frac{1}{N}\sum_{i=1}^{N} P_i} \quad (8)

IR is essentially the participant’s success rate (fraction of true predictions among

all predictions) divided by the reference model probability averaged over all predic-

tions (i.e., the model’s expected success rate) of the participant. This formulation

implies that there is an upper bound of IR = 1/\min(P_i) and incentivizes the participants to achieve higher success rates in regions and time frames for which the model

gives low probabilities. Assuming that the reference model is true, the expected IR

value for any set of predictions would tend to 1.
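Equation (8) in code, as a minimal sketch:

```python
def information_ratio(outcomes, probs):
    """IR (Eq. 8): the participant's success rate divided by the
    reference model's expected success rate on the same predictions."""
    n = len(outcomes)
    return (sum(outcomes) / n) / (sum(probs) / n)
```

A participant who gets 2 of 4 predictions right where the model assigned 25% to each obtains IR = 0.5/0.25 = 2.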

To answer the question of whether a participant’s IR is statistically signiﬁcant, we

employ Monte Carlo sampling to build an IR distribution given their set of submitted

predictions. This distribution is independent of the actual prediction outcomes, as we sample the model probability P_i of each prediction to generate several possible outcomes O'_i:

x_i \in U(0,1), \qquad O'_i = \begin{cases} 1 & \text{if } x_i < P_i \\ 0 & \text{if } x_i \ge P_i \end{cases} \quad (9)

where U(a, b) is the uniform distribution within bounds a and b. We then calculate the IR_m of each outcome set according to equation (8), where m denotes the index of

the Monte Carlo sample. This forms the null-distribution that is used to benchmark

the actual IR value of the participant. The ratio of the sampled model IR values

that are above or equal to the participant’s value (α) can then be interpreted as the

probability of observing an IR at least as high as the participant’s, i.e., the p-value

under the null hypothesis that the reference model is true:

g_m = \begin{cases} 1 & \text{if } \mathrm{IR}_m \ge \mathrm{IR}_u \\ 0 & \text{if } \mathrm{IR}_m < \mathrm{IR}_u \end{cases}, \qquad \alpha = \frac{1}{M}\sum_{m=1}^{M} g_m \quad (10)

where IR_u is the participant's information ratio, and M is the number of Monte Carlo samples used to sample the distribution. If α ≤ 0.05, then the participant is

considered to be signiﬁcantly better than the reference model.
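The Monte Carlo procedure of equations (9) and (10) can be sketched as follows (function name and defaults are ours):

```python
import random

def ir_significance(probs, ir_user, m_samples=10000, seed=0):
    """Estimate alpha (Eq. 10): the fraction of Monte Carlo outcome
    sets, sampled under the reference model (Eq. 9), whose IR reaches
    the participant's IR."""
    rng = random.Random(seed)
    n = len(probs)
    expected = sum(probs) / n
    hits = 0
    for _ in range(m_samples):
        # Sample one outcome set under the reference model (Eq. 9).
        successes = sum(1 for p in probs if rng.random() < p)
        if (successes / n) / expected >= ir_user:
            hits += 1
    return hits / m_samples
```

For ten predictions at model probability 0.5, an IR of 2 requires all ten to come true, so the resulting α falls well below the 0.05 threshold.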

5.4 Accounting for overlapping predictions

In all previous equations, we have assumed that the submitted predictions are inde-

pendent, i.e., that they do not overlap in space and time. This assumption simpliﬁes

the derivations of expected success rate, allowing for probabilities of independent

predictions to be averaged, and also makes it easier to calculate signiﬁcance levels by

sampling each prediction independently during the Monte Carlo procedure. However,

the participants are free to submit predictions at any time, anywhere on the globe.

Since predictions are submitted as circles with a maximum radius, participants who

want to cover larger areas completely will have to submit several overlapping predic-

tions. We also see that some participants re-issue predictions at the same locations

when an earthquake does not occur (assuming some local stress accumulation or a

characteristic period) or when it occurs (expecting aftershocks). Updating a hypoth-

esis as new information becomes available is the hallmark of the scientiﬁc method.

In the ideal case, if a precursory signal becomes gradually more prominent as an

earthquake approaches, one can expect overlapping predictions with narrower space-

time windows to be issued. Therefore, instead of constraining the participants by forbidding overlapping predictions, we prefer to account for them explicitly.

The question of evaluating overlapping predictions has been investigated previ-

ously by Harte and Vere-Jones, Harte et al. [15,16], who introduced the entropy score

as a pseudo-likelihood to evaluate M8 predictions, which are expressed as a set of

overlapping circles [25]. The entropy score is rather complicated and “awkward”, as

the authors put it; thus, we have refrained from using it, as we would like to keep the

performance criteria as intuitive as possible for the general public. The Molchan dia-

gram, which accounts for the total time-space volume covered by prediction alarms,

can also be employed to deal with predictions overlapping in space and time [31,32].

It is worth noting that Molchan and Romashkova [30] successfully adapted their methodology to the M8 predictions using specific features, such as constant large

circle sizes and large magnitudes, to assess its predictive skill. This is rather diﬀer-

ent from our application, which involves evaluating and comparing diﬀerent sets of

predictions that can each be a mix of to-occur and not-to-occur, with varying circle

sizes.

Fig. 3. Left: a set of overlapping predictions shown in the time and space domains. Right: a sample of 4 sets containing only non-overlapping predictions obtained by the selective sampling procedure described in the text.

For the particular case of the RichterX prediction contest, the rX score is additive and already incorporates the concept of "stake", which has the same effect as re-issuing the same prediction; thus, it does not require any modification. However, overlapping predictions constitute a problem for the IR score and its significance α. This can be seen with a simple example of two non-overlapping to-occur predictions that require two earthquakes to come true. In comparison, two identically overlapping

predictions would come true with a single event. Intuitively, it follows that true inde-

pendent predictions are “worth” more in terms of signiﬁcance than overlapping ones.

To take into account the presence of overlapping predictions, we employ a sampling

approach, whereby we begin with the full set of overlapping and non-overlapping pre-

dictions of each participant and, by selective sampling, create sets consisting only of

non-overlapping predictions (see Fig. 3). The IR metric and the associated αvalues

are calculated for each of these sampled sets, and the resulting averages are assigned

as the participant’s skill and signiﬁcance. The selective sampling of each participant’s

predictions is performed in the following steps:

1. Considering all closed predictions in a given round, we calculate the distance

between the prediction centers for all predictions that overlap in the time domain.

2. If the distance between the centers of two predictions that overlap in time is less

than the sum of their radii, then these predictions are labeled as “overlapping”.

Predictions that do not overlap with any other prediction are labeled as “non-

overlapping”.

3. After all predictions are labeled, the overlapping predictions are put in the “can-

didate” set. We begin by randomly selecting one of these candidates and remove

all the predictions that overlap it (both in space and time).

4. We put the selected prediction in the “selection” set and repeat the procedure

by randomly selecting one of the predictions in the candidate set. We repeat this

until the candidate set is exhausted.

5. We then add all the non-overlapping predictions to the selection set. This set

constitutes a sample set of independent predictions that we then use to calculate

the IR score and αvalues as described above. We calculate an average value for

both metrics by repeating this sampling procedure several times.
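The selective sampling above can be sketched as follows; for illustration we use planar coordinates with Euclidean distance (the platform presumably works with great-circle distances) and represent each prediction as a (t_start, t_end, x, y, radius) tuple:

```python
import random

def sample_independent(predictions, rng=None):
    """One selective-sampling pass: repeatedly pick a random remaining
    prediction and discard everything overlapping it in space and time;
    predictions that overlap nothing always survive."""
    rng = rng or random.Random(0)

    def overlaps(a, b):
        in_time = a[0] <= b[1] and b[0] <= a[1]
        dist = ((a[2] - b[2]) ** 2 + (a[3] - b[3]) ** 2) ** 0.5
        return in_time and dist < a[4] + b[4]

    candidates = list(predictions)
    selected = []
    while candidates:
        pick = candidates.pop(rng.randrange(len(candidates)))
        selected.append(pick)
        candidates = [c for c in candidates if not overlaps(pick, c)]
    return selected
```

Averaging the IR and α values over many such passes yields the participant's reported skill and significance.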

Based on the signiﬁcance threshold (α≤0.05) combined with the IR metric, we

categorize the participants into the following skill classes: (A) signiﬁcant participants

with IR ≥2 and at least 5 independent predictions; (B) signiﬁcant participants with

IR ≥ 1.33 and at least 5 independent predictions; (C) participants with IR > 1 who fail to satisfy the significance, prediction-number, or IR criteria needed to become an A or B; (D) all participants with IR < 1. It can be argued that requesting a

minimum number of predictions may aﬀect the participants’ behavior; some might

start placing predictions that they would not have placed just to reach the limit. We


concede that the contest regulations will aﬀect participant behavior in one way or

another and deem such eﬀects admissible as long as they do not hinder the goals

of the competition. Participants who achieve skill classes of A or B are rewarded

additionally to the reward distributed proportionally to the rX score. By distribut-

ing rewards according to two diﬀerent but complementary performance metrics, we

hope to make exploiting a single metric less enticing and to incentivize demonstrating

actual skill. The rX score is relatively easier to calculate since the score of each new

prediction is simply added to the current balance. However, the skill classes based on

the IR score are more diﬃcult to calculate because each new prediction aﬀects the

average prediction probability and estimating signiﬁcance requires numerical simu-

lation. We acknowledge that such statistical concepts can be intimidating for the

general public and hinder participation. Therefore, we have implemented a recom-

mendation algorithm that uses the currently closed predictions of each participant

to suggest an additional number of true predictions with a probability suﬃcient to

achieve skill classes B or A. The ﬂowchart of the recommendation algorithm is given

in Figure 4. In essence, the algorithm estimates the minimum number, and the highest reference model probability, of additional true predictions that would satisfy

both the signiﬁcance and the IR criteria. If the participant has achieved skill class

B, the algorithm would recommend predictions for achieving skill class A, while for

classes C and D the recommendation would aim at B. In principle, similar recom-

mendations can be calculated not necessarily for the minimum but for any number of

predictions; the minimum probabilities would increase as the number of predictions

increases. Figure 5 shows the outputs of the recommendation system based on the

closed predictions of two diﬀerent participants.
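The classification into skill classes described in this section can be sketched as follows; the handling of IR exactly equal to 1 is our assumption, since the text only specifies IR > 1 for class C and IR < 1 for class D:

```python
def skill_class(ir, alpha, n_independent):
    """Assign skill classes A-D: A and B additionally require
    significance (alpha <= 0.05) and at least 5 independent predictions."""
    significant = alpha <= 0.05 and n_independent >= 5
    if significant and ir >= 2.0:
        return "A"
    if significant and ir >= 1.33:
        return "B"
    if ir > 1.0:
        return "C"
    return "D"  # IR <= 1 (the IR == 1 edge case is assumed to fall here)
```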

6 Synthetic consistency tests

We proposed the two score metrics introduced in the previous section to assess the predictive skills of individual participants as well as of probabilistic forecasting models that can be sampled with deterministic predictions through an application programming interface. Fairness in reward distribution and in the reputation-based contest is an essential factor that motivates participants. Moreover, from a scientific point of

view, it is crucial to establish that the proposed metrics are powerful enough to dis-

criminate between good and bad models such that research can be focused in more

promising directions.

To test the consistency of the proposed metrics, we conduct a simpliﬁed synthetic

ranking test. The test consists of three main components: (1) the ground truth model

that generates the events; (2) several competing models that issue predictions trying

to predict the generated events; (3) a reference model that is used as the basis of

prediction probabilities entering the rX and IR metrics. The synthetic prediction

contest is carried out by all of the competing models issuing Nppredictions based on

their expectations and the reference model probability. The outcome of the submitted

predictions is dictated by the ground truth model. The scores are then calculated

using the outcomes and the reference model probabilities assigned to the predictions

submitted by the candidate models. The synthetic test is carried out in these steps:

1. The ground truth model is defined as a 1D probability vector with N_p elements: T \sim U(0.01, 0.99)

2. Outcomes (occurrence or non-occurrence) are generated by sampling each of the individual probabilities in the T vector to create an outcome vector O, as per equation (9)

3. A set of m progressively worse candidate models C_i is created by perturbing the ground truth model by adding uniform random noise with increasing amplitude.

Fig. 4. Flow chart of the recommendation algorithm estimating the probability and number of true predictions needed to achieve a higher skill class. SR: success rate, ARP: average RichterX probability, IRtar: target information ratio.

The perturbed probabilities are capped to remain within the [0.01, 0.99] interval:

x_i \in U(0,1), \qquad C_i = \max\left\{\min\left\{T + \frac{i\,(x - 0.5)}{m},\ 0.99\right\},\ 0.01\right\} \quad (11)

4. For each of the N_p predictions, a candidate model indexed j decides to issue a to-occur or a not-to-occur prediction by choosing the prediction type with the maximal expected return.


Fig. 5. (1) Summary of prediction outcomes for a participant: (a) Two bar charts indicating

outcome of to-occur and not-to-occur predictions as false (red) or true (blue), the length

of each bar scales with the prediction stake; (b) Map showing the location of the to-occur

and not-to-occur predictions as purple up or beige down arrows. (2) Skill assessment plot

for a participant: (a) Current skill class, from A to D, signiﬁcance value (1−α) and number

of independent predictions; (b) Recommendation containing the number of true predictions

with a given probability needed to achieve a higher skill class; (c) Success rate vs average

RichterX probability. Participants with less than 5 independent predictions are shown as

triangles, others as squares. Colors indicate the skill class; A bright green, B green, C yellow,

D gray. The selected participant is indicated by a symbol with thicker edges. (3) Same as

(1) but for a diﬀerent participant. (4) Same as (2) but for a diﬀerent participant.

E[\mathrm{Occ}] = C_i(j)\,\frac{1 - R(j)}{R(j)} - (1 - C_i(j)), \qquad E[\mathrm{Noc}] = (1 - C_i(j))\,\frac{R(j)}{1 - R(j)} - C_i(j) \quad (12)

where E[Occ] and E[Noc] denote a candidate model's expected return for a to-occur and a not-to-occur prediction, respectively.

5. All the issued predictions are assigned as true or false according to the outcome

vector O, and each candidate model receives rX and IR scores.
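Steps 3 and 4 of the synthetic test can be sketched as follows (variable names are ours; truth stands for the ground-truth vector T):

```python
import random

def perturbed_models(truth, m, rng=None):
    """Step 3: m candidate models obtained by adding uniform noise of
    increasing amplitude to the ground truth, clipped to [0.01, 0.99]
    (Eq. 11)."""
    rng = rng or random.Random(0)
    return [[min(max(t + i * (rng.random() - 0.5) / m, 0.01), 0.99)
             for t in truth] for i in range(1, m + 1)]

def choose_prediction(c, r):
    """Step 4: pick the prediction type with the larger expected
    return (Eq. 12); c is the candidate model's occurrence probability,
    r the reference model's."""
    e_occ = c * (1 - r) / r - (1 - c)
    e_noc = (1 - c) * r / (1 - r) - c
    return ("to-occur", e_occ) if e_occ >= e_noc else ("not-to-occur", e_noc)
```

For instance, a candidate model that believes an event is twice as likely as the reference (c = 0.5 against r = 0.25) issues a to-occur prediction with positive expected return.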

We expect the consistency to improve with an increasing number of predictions N_p. Thus, we conducted the synthetic test for N_p = [100, 1000, 5000]. Figure 6 shows the results of the synthetic tests for the cases when the reference model is chosen as the median (C_i = 250) and as the worst (C_i = 500), respectively. We can see that, as the number of predictions increases, the

ﬂuctuation in both score metrics decreases, highlighting a linear relationship with

the true rank. The skill of the reference model relative to the candidate models also

plays an important role in interpreting the consistency results.

Since we created the candidate models by adding an increasing amount of noise to the ground truth model, we also know the true ranking. We can study the scoring consistency by comparing the ranking obtained by each metric to the true ranking via Kendall's rank correlation coefficient τ [1], given in the following equation, where p_c and p_d are the numbers of concordant (i.e., having the same relative order) and discordant pairs, and n is the number of elements being ranked:

\tau = \frac{2\,(p_c - p_d)}{n\,(n - 1)} \quad (13)

Fig. 6. The rX (left column) and IR (right column) scores for the 500 competing models resulting from N_p independent predictions. Increasing N_p values are shown in darker shades. Top row plots the results when the reference model is chosen as rank 250, i.e., the average model. Bottom row corresponds to the reference chosen as rank 500, i.e., the worst model.

The coefficient τ is equal to 1 when the two rankings are exactly the same, −1 when they are the reverse of each other, and close to zero when they are unrelated. Figure 7 plots the τ coefficients as a function of the increasing number of predictions, for different reference model choices. We observe that the IR score is more powerful in retrieving the original ranking, resulting in consistently higher τ values, both at small and large numbers of predictions and regardless of the choice of the reference model.
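Equation (13) can be computed directly from two rank lists, as in this minimal sketch:

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau (Eq. 13): concordant minus discordant pairs,
    normalized by the total number of pairs n(n-1)/2."""
    n = len(rank_a)
    pc = pd = 0
    for i, j in combinations(range(n), 2):
        sign = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if sign > 0:
            pc += 1  # concordant pair: same relative order
        elif sign < 0:
            pd += 1  # discordant pair: reversed order
    return 2 * (pc - pd) / (n * (n - 1))
```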

Another important observation is that, when the reference model is chosen as the best model (i.e., very close to the generating one), the ranking becomes inconsistent. Recall from equation (5) that, if the reference model is the generating model,

we expect the rX score for any set of predictions to have an expected value of zero.

Similarly, for the IR metric, the expected value would be 1 (see Eq. (9)). Figure 8

conﬁrms this by plotting the individual model scores when the reference model is

chosen as the best. For any number of predictions, the score values ﬂuctuate around

0 and 1 for rX and IR respectively, explaining the near zero rank correlation values

observed in Figure 7.


Fig. 7. Rank correlation between the true rank and the inferred rank of 500 models as a function of the increasing number of predictions. Results based on the IR and rX scores are shown as solid and dashed lines, respectively. Different colors are used to show the results when the true rank of the reference model is chosen as 1 (best), 100, 250, 400 or 500 (worst).

Fig. 8. The same as Figure 6 when the reference model is chosen as rank 1, i.e., the best model.

It is important to note that the complete inability to distinguish between worse

models will occur only when the reference model is very close to the truth. In the

context of a prediction contest, this would mean that we have reached our ﬁnal goal

and that further research cannot provide added value. In reality, we know that our

current models have a lot of room for improvement. We have demonstrated that,

when this is the case, the proposed metrics are able to rank models both better and worse than the reference (see Figs. 6 and 7). Nevertheless, if there are concerns

regarding the ranking of models that are much worse than the RichterX reference

model, these can be easily addressed by invoking a very weak reference model, such

as smoothing past seismicity assuming Poissonian rates.


7 Conclusion

The RichterX platform aims to rekindle the earthquake prediction effort by organizing a prediction contest inviting large-scale participation both from the public and from different fields of academia. To facilitate this contest, we have implemented a real-time global earthquake forecasting model that can estimate short-term earthquake

occurrence probabilities anywhere in the world. On one hand, this platform makes the

contest highly accessible, allowing anyone with just a mobile phone to submit a prediction. On the other hand, it allows the public to query earthquake occurrence probabilities in real time for any specific region, which becomes vital information, especially after large mainshocks. In this way, with a single platform we hope to achieve three

main goals: (1) Inform the public about short-term earthquake probabilities anywhere

on the globe in real time; (2) Serve as a public record empowering the media and public

oﬃcials to counter claims of earthquake prediction after the fact; (3) Allow researchers

from various ﬁelds to easily participate in an earthquake prediction contest and chal-

lenge state-of-the-art global statistical seismology models.

Wide-scale participation has the potential to bring forward, and allow for the testing of, various data sources that may or may not carry precursory information. Current

earthquake forecasting contests, which rely on systematic reporting of earthquake

rates for large regions in predeﬁned space-time resolutions, are not suitable for the

testing of intermittent observations, such as earthquake lights, groundwater chem-

istry, electromagnetic and thermal anomalies, and so on. The RichterX platform can

easily accommodate alarm based predictions based on such data sources. In addi-

tion, through synthetic ranking tests, we have shown that the proposed performance

metrics can distinguish between probabilistic models that are better or worse than

the reference model, and retrieve the true performance ranking.

Publisher’s Note The EPJ Publishers remain neutral with regard to jurisdictional claims

in published maps and institutional aﬃliations.

References

1. H. Abdi, in Encyclopedia of Measurement and Statistics (Sage, Thousand Oaks, CA,

2007), p. 508–510

2. C.R. Allen, Bull. Seismol. Soc. Am. 66, 2069 (1976)

3. P. Bird, D.D. Jackson, Y.Y. Kagan, C. Kreemer, R.S. Stein, Bull. Seismol. Soc. Am.

105, 2538 (2015)

4. F. Cappa, J. Laut, M. Porﬁri, L. Giustiniano, Comput. Human Behav. 89, 246 (2018)

5. A. Chaia, A. Dalal, T. Goland, M.J. Gonzalez, J. Morduch, R. Schiﬀ, Half the world

is unbanked: ﬁnancial access initiative framing note (Financial Access Initiative, New

York, 2009)

6. R. Console, M. Murru, F. Catalli, G. Falcone, Seismol. Res. Lett. 78, 49 (2007)

7. G. Coppi, L. Fast, Blockchain and distributed ledger technologies in the humanitar-

ian sector (Hpg commissioned report, London, 2019), http://hdl.handle.net/10419/

193658

8. M.A. Edwards, S. Roy, Academic research in the 21st Century: Maintaining scientiﬁc

integrity in a climate of perverse incentives and hypercompetition (2017), https://www.

liebertpub.com/doi/abs/10.1089/ees.2016.0223

9. Erdst¨osse im Wallis, Zahlreiche Erdst¨osse schrecken Menschen im Wallis auf (2019),

https://www.tagesanzeiger.ch/panorama/vermischtes/naechtliches-erdbeben-

erschuettert-das-wallis/story/13668757

10. A. Extance, Nature 526, 21 (2015)

11. C. Fabian, Innov. Technol. Governance Globalization 12, 30 (2018)

12. D. Fiorillo, Ann. Public Cooperative Econ. 82, 139 (2011)


13. D. Fletcher, Model averaging (Springer, 2019)

14. GEOFON, Deutsches GeoForschungsZentrum GFZ (1993)

15. D. Harte, D. Vere-Jones, Pure Appl. Geophys. 162, 1229 (2005)

16. D. Harte, D.F. Lp, M. Wreede, D. Vere-Jones, Q. Wang, New Zealand J. Geol. Geophys.

50, 117 (2007)

17. S. Hiemer, Y. Kamer, Seismol. Res. Lett. 87, 327 (2016)

18. D.D. Jackson, Proc. Nat. Acad. Sci. USA 93, 3772 (1996)

19. I.T. Jolliﬀe, Meteorol. Appl. 15, 25 (2008)

20. T.H. Jordan, Seismol. Res. Lett. 77, 3 (2006)

21. Y.Y. Kagan, Worldwide Earthquake Forecasts (2017)

22. Y.Y. Kagan, D.D. Jackson, Geophys. J. Int. 143, 438 (2000)

23. Y.Y. Kagan, D.D. Jackson, R.J. Geller, Seismol. Res. Lett. 83, 951 (2012)

24. Y. Kamer, S. Hiemer, J. Geophys. Res. Solid Earth 120, 5191 (2015)

25. V.I. Keilis-Borok, V.G. Kossobokov, Phys. Earth Planet Inter. 61, 73 (1990)

26. Y.M. Kow, First Monday 22 (2017)

27. Y.-T.T. Lee, D.L. Turcotte, J.R. Holliday, M.K. Sachs, J.B. Rundle, C.-C.C. Chen, K.F.

Tiampo, Proc. Nat. Acad. Sci. USA 108, 16533 (2011)

28. M.R. Lepper, D. Greene, The hidden costs of reward: New perspectives on the psychology

of human motivation (Lawrence Erlbaum, Oxford, England, 1978)

29. A. Lomax, A. Michelini, Pure Appl. Geophy. 170, 1385 (2013)

30. G. Molchan, L. Romashkova, arXiv:1005.3175 (2010)

31. G.M. Molchan, Phys. Earth Planet. Inter. 61, 84 (1990)

32. G.M. Molchan, Tectonophysics 193, 267 (1991)

33. S. Nandan, G. Ouillon, S. Wiemer, D. Sornette, J. Geophys. Res. Solid Earth 122, 5118

(2017)

34. S. Nandan, G. Ouillon, D. Sornette, S. Wiemer, Seismol. Res. Lett. 90, 1650 (2019)

35. S. Nandan, G. Ouillon, D. Sornette, S. Wiemer, J. Geophys. Res. Solid Earth 124, 8404

(2019)

36. S. Nandan, Y. Kamer, G. Ouillon, S. Hiemer, D. Sornette, Eur. Phys. J. Special Topics

230, 425 (2021)

37. C.G. Northcutt, A.D. Ho, I.L. Chuang, Comput. Edu. 100, 71 (2016)

38. Y. Ogata, J. Am. Stat. Assoc. 83, 9 (1988)

39. M. Pagani, J. Garcia, D. Monelli, G. Weatherill, A. Smolka, Ann. Geophys. 58 (2015),

https://www.annalsofgeophysics.eu/index.php/annals/article/view/6677

40. W. Savran, P. Maechling, M. Werner, D. Schorlemmer, D. Rhoades, W. Marzocchi, J.

Yu, T. Jordan, The Collaboratory for the Study of Earthquake Predictability Version

2 (CSEP2): Testing Forecasts that Generate Synthetic Earthquake Catalogs (EGUGA,

2019), p. 12445

41. D. Schorlemmer, J.D. Zechar, M.J. Werner, E.H. Field, D.D. Jackson, T.H. Jordan,

Pure Appl. Geophys. 167, 859 (2010)

42. D. Schorlemmer, M.J. Werner, W. Marzocchi, T.H. Jordan, Y. Ogata, D.D. Jackson,

S. Mak, D.A. Rhoades, M.C. Gerstenberger, N. Hirata, M. Liukis, P.J. Maechling,

A. Strader, M. Taroni, S. Wiemer, J.D. Zechar, J. Zhuang, Seismol. Res. Lett. 89,

1305 (2018)

43. A. Sol, H. Turan, Sci. Eng. Ethics 10, 655 (2004)

44. K. Starbird, L. Palen, Working & sustaining the virtual disaster desk, in Proceedings

of the ACM Conference on Computer Supported Cooperative Work, CSCW, New York,

USA, 2013 (ACM Press, New York, USA, 2013)

45. U.S. Geological Survey Earthquake Hazards Program, Advanced National Seismic Sys-

tem (ANSS) comprehensive catalog of earthquake events and products (2017)

46. D.L. Wells, K.J. Coppersmith, Bull. Seismol. Soc. Am. 84, 974 (1994)

47. J. Whitehill, Climbing the kaggle leaderboard by exploiting the log-loss oracle, Technical

report (2018)

48. J. Woessner, S. Hainzl, W. Marzocchi, M.J. Werner, A.M. Lombardi, F. Catalli,

B. Enescu, M. Cocco, M.C. Gerstenberger, S. Wiemer, J. Geophys. Res. 116, 1 (2011)

49. H.O. Wood, B. Gutenberg, Earthquake Prediction (1935)